Log-Right: Nginx IP Masking / Anonymize IP-logging

Pseudonymization and anonymization of logs sometimes seem to be among the biggest problems developers struggle with. Coming up with good patterns and defining the needs or procedures for them can seem almost impossible. Fortunately, there are many ways of dealing with this, such as proper design, separate log files and log filtering.

In this series, I will discuss a few of these to show that it is not that difficult. Some programs require some coding, some require configuration, and for others... just do it right to begin with, or blacklist, or apply risk treatment and define the residual risk.

Nginx IP-logging

To protect the privacy of visitors to websites, one can anonymize or pseudonymize IP addresses by destroying the last octet. You could call this IP masking. We will do this only at the point where the information is written to disk. For some security features you might need the full address, but for regular access logs (in other words: not audit or security logs) you don't, and in some countries you are actually forbidden to store it unless you have special concerns and/or use cases. (Germany is one of them: keeping full addresses for security and audit purposes is allowed, but not for the application.)

#https://www.nginx.com/resources/wiki/start/topics/examples/full/

...

http {
  include    conf/mime.types;
  include    /etc/nginx/proxy.conf;
  include    /etc/nginx/fastcgi.conf;
  index    index.html index.htm index.php;

  default_type application/octet-stream;
  log_format   main '$remote_addr - $remote_user [$time_local]  $status '
    '"$request" $body_bytes_sent "$http_referer" '
    '"$http_user_agent" "$http_x_forwarded_for"';
  access_log   logs/access.log  main;
  sendfile     on;
  tcp_nopush   on;
  server_names_hash_bucket_size 128; # this seems to be required for some vhosts

...

The lines of interest are:


  log_format   main '$remote_addr - $remote_user [$time_local]  $status '
    '"$request" $body_bytes_sent "$http_referer" '
    '"$http_user_agent" "$http_x_forwarded_for"';

And

  access_log   logs/access.log  main;

In the first part, you define the log format named "main"; in the second part, you define the output file and the format that is used for it.

Now let us assume you need the IPs, or the IP range, available in your application but not in the log files. Instead of destroying the variable $remote_addr (which would require a patch), we can just create a new variable:


map $remote_addr $remote_addr_anon {
    ~(?P<ip>\d+\.\d+\.\d+)\.    $ip.0;
    ~(?P<ip>[^:]+:[^:]+):       $ip::;
    default                     0.0.0.0;
}

The snippet above requires nginx 1.11.0 or later: since that version, a map value may combine variables and text (such as $ip.0). On older versions you will have to split this up into multiple steps to differentiate between IPv4 and IPv6.
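The multi-step split mentioned above could look roughly like the following. This is only a sketch of one possible reading, assuming that each map value is limited to either plain text or a single variable; the names $remote_addr_prefix, $remote_addr_suffix and anonip_split are made up for this illustration.

```nginx
# Step 1: keep only the network part of the address.
# ($remote_addr_prefix is a hypothetical name.)
map $remote_addr $remote_addr_prefix {
    ~(?P<ip>\d+\.\d+\.\d+)\.    $ip;
    ~(?P<ip>[^:]+:[^:]+):       $ip;
    default                     0.0.0;
}

# Step 2: pick the suffix that belongs to the address family.
# The regexes mirror step 1, in the same order, so both maps agree.
# ($remote_addr_suffix is a hypothetical name.)
map $remote_addr $remote_addr_suffix {
    ~\d+\.\d+\.\d+\.            .0;
    ~[^:]+:[^:]+:               ::;
    default                     .0;
}

# Step 3: glue both parts together in the log format itself.
log_format   anonip_split '$remote_addr_prefix$remote_addr_suffix - $remote_user [$time_local]  $status '
    '"$request" $body_bytes_sent "$http_referer" '
    '"$http_user_agent" "$http_x_forwarded_for"';
```

Addresses that match neither pattern end up as 0.0.0 plus .0, which gives the same 0.0.0.0 fallback as the single map.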

This must be done in the http{} context. map and log_format can only be defined there! Logging directives such as access_log can also be defined at the server (virtual host) level.
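To make the placement concrete, here is a minimal sketch of where each directive lives. The host name, the log file name and the shortened anonip format are placeholders for this illustration only.

```nginx
http {
    # map and log_format are only valid here, in the http context.
    map $remote_addr $remote_addr_anon {
        ~(?P<ip>\d+\.\d+\.\d+)\.    $ip.0;
        ~(?P<ip>[^:]+:[^:]+):       $ip::;
        default                     0.0.0.0;
    }

    # Shortened format, just to show the placement.
    log_format   anonip '$remote_addr_anon - $remote_user [$time_local] "$request" $status';

    server {
        listen       80;
        server_name  example.org;    # hypothetical host

        # access_log may be set per server and refers to the format by name.
        access_log   logs/example.access.log  anonip;
    }
}
```

Defining the format once at the http level and selecting it per server keeps the masking rule in a single place.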

This will take $remote_addr and store a masked copy of it in $remote_addr_anon. OK, maybe I should change anon to pseudo... but it usually no longer targets or singles out individuals. If I would now change:


  log_format   main '$remote_addr - $remote_user [$time_local]  $status '
    '"$request" $body_bytes_sent "$http_referer" '
    '"$http_user_agent" "$http_x_forwarded_for"';

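Into (keeping the original main format defined next to it, for the audit log):


  log_format   anonip '$remote_addr_anon - $remote_user [$time_local]  $status '
    '"$request" $body_bytes_sent "$http_referer" '
    '"$http_user_agent" "$http_x_forwarded_for"';
  access_log   logs/audit.log   main;     # full addresses, audit/security use only
  access_log   logs/access.log  anonip;   # masked addresses, regular access log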

We now have an access log with IPs reduced to something like $1.$2.$3.0 or 1234:5678::, while the audit log keeps the full addresses:

angelique@dawnbringer:/var/log/nginx$ sudo cat audit.log
108.162.221.120 - - [05/Jul/2018:09:13:24 +0200] "HEAD / HTTP/1.1" 301 0 "http://www.serveroffline.net" "Mozilla/5.0+(compatible; UptimeRobot/2.0; http://www.uptimerobot.com/)" "69.162.124.235"
108.162.221.72 - - [05/Jul/2018:09:13:50 +0200] "HEAD / HTTP/1.1" 301 0 "http://www.serveroffline.net" "Mozilla/5.0+(compatible; UptimeRobot/2.0; http://www.uptimerobot.com/)" "69.162.124.227"
108.162.221.72 - - [05/Jul/2018:09:14:50 +0200] "HEAD / HTTP/1.1" 301 0 "http://www.serveroffline.net" "Mozilla/5.0+(compatible; UptimeRobot/2.0; http://www.uptimerobot.com/)" "69.162.124.227"
angelique@dawnbringer:/var/log/nginx$ sudo cat access.log
108.162.221.0 - - [05/Jul/2018:09:16:50 +0200] "HEAD / HTTP/1.1" 301 0 "http://www.serveroffline.net" "Mozilla/5.0+(compatible; UptimeRobot/2.0; http://www.uptimerobot.com/)" "69.162.124.0"
108.162.221.0 - - [05/Jul/2018:09:17:50 +0200] "HEAD / HTTP/1.1" 301 0 "http://www.serveroffline.net" "Mozilla/5.0+(compatible; UptimeRobot/2.0; http://www.uptimerobot.com/)" "69.162.124.0"
108.162.221.0 - - [05/Jul/2018:09:18:50 +0200] "HEAD / HTTP/1.1" 301 0 "http://www.serveroffline.net" "Mozilla/5.0+(compatible; UptimeRobot/2.0; http://www.uptimerobot.com/)" "69.162.124.0"

Please be aware that $http_x_forwarded_for also contains IP addresses. When using reverse proxies or load balancers, the client IPs are usually stored here and $remote_addr will contain the IP address of the load balancer or proxy.
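One possible way to mask that header as well is a second map, sketched below. The names $xff_anon, the capture xff and the format anonip_xff are invented for this example, and it assumes the $remote_addr_anon map from earlier is already defined. The header can carry a comma-separated list of addresses; this sketch deliberately handles a single address only and logs "-" for everything else, so nothing unmasked slips through.

```nginx
# Hypothetical names: $xff_anon, the capture "xff", the format "anonip_xff".
map $http_x_forwarded_for $xff_anon {
    # exactly one IPv4 address: keep the first three octets
    ~^(?P<xff>\d+\.\d+\.\d+)\.\d+$                          $xff.0;
    # exactly one IPv6 address: keep the first two groups
    ~^(?P<xff>[0-9a-fA-F]+:[0-9a-fA-F]+):[0-9a-fA-F:.]*$    $xff::;
    # header absent, a list of addresses, or anything unexpected
    default                                                 -;
}

log_format   anonip_xff '$remote_addr_anon - $remote_user [$time_local]  $status '
    '"$request" $body_bytes_sent "$http_referer" '
    '"$http_user_agent" "$xff_anon"';
```

Dropping lists entirely is the cautious choice here; whether a chain of proxies needs a smarter rule depends on your own setup.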

If is evil! Please read https://www.nginx.com/resources/wiki/start/topics/depth/ifisevil/ before considering the if directive.

I wrote this article / series at the beginning of 2007. It has since been heavily modified. I published a new version in 2015 and removed the old article entirely as it was too outdated. I do my best to keep the information relevant and regularly check the contents. Please contact me if you find something unfitting or "out-of-date".

Author: Angelique Dawnbringer Published: 2015-09-12 09:08:26 Keywords:
  • log-right
  • nginx
  • ip
  • masking
  • log_format
  • $remote_addr
  • privacy
  • logging
Modified: 2018-07-05 09:47:28