Pseudonymization and anonymization of logs sometimes seem to be among the biggest problems developers struggle with. Coming up with good patterns and defining needs or procedures for them can seem almost impossible. Fortunately, there are many ways of dealing with this, such as proper design, separate log files and log filtering.
In this series, I will discuss a few of these to show that it is not that difficult. Some programs require a bit of coding, some only need configuration, and for others you either do it right from the start, blacklist the data, or apply risk treatment and define the residual risk.
Nginx IP-logging
To protect the privacy of website visitors, you can anonymize or pseudonymize IP addresses by destroying the last octet. You could call this IP masking. We will only do this where the information is written to disk. Some security features need the full address, but regular access logs (in other words: not audit or security logs) do not, and in some countries you are actually forbidden to store it unless you have special concerns and/or use cases. (Germany is one of them: logging for security and audit purposes is allowed, but not for the application itself.)
#https://www.nginx.com/resources/wiki/start/topics/examples/full/
...
http {
    include conf/mime.types;
    include /etc/nginx/proxy.conf;
    include /etc/nginx/fastcgi.conf;
    index index.html index.htm index.php;
    default_type application/octet-stream;
    log_format main '$remote_addr - $remote_user [$time_local] $status '
                    '"$request" $body_bytes_sent "$http_referer" '
                    '"$http_user_agent" "$http_x_forwarded_for"';
    access_log logs/access.log main;
    sendfile on;
    tcp_nopush on;
    server_names_hash_bucket_size 128; # this seems to be required for some vhosts
...
The lines of interest are:
log_format main '$remote_addr - $remote_user [$time_local] $status '
'"$request" $body_bytes_sent "$http_referer" '
'"$http_user_agent" "$http_x_forwarded_for"';
And
access_log logs/access.log main;
The first part defines the log format named "main". The second part defines the type of log (access_log), its output file and the format used to write it.
Now let us assume you need the IPs, or the IP range, to be available to your application but not in the log files. Instead of destroying the variable $remote_addr (which would require a patch), we can just create a new variable:
map $remote_addr $remote_addr_anon {
    ~(?P<ip>\d+\.\d+\.\d+)\.    $ip.0;
    ~(?P<ip>[^:]+:[^:]+):       $ip::;
    default                     0.0.0.0;
}
The snippet above requires nginx 1.11.0 or later, the first release in which a map result can combine a variable with literal text (as in $ip.0). On older versions you have to split this up into multiple steps to differentiate between IPv4 and IPv6.
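For releases where a map result cannot mix a variable with literal text, one way to split the work is sketched below. This is an assumption about how such a split could look, not a configuration taken from a running server. The variable names $ip_prefix and $ip_suffix and the capture names ip4 and ip6 are made up for illustration. Both maps use the same patterns in the same order, so they always agree on the address family.

```nginx
# Sketch only: belongs in the http{} context.
# Step 1: keep the part of the address that may be logged.
map $remote_addr $ip_prefix {
    ~(?P<ip4>\d+\.\d+\.\d+)\.    $ip4;      # IPv4: first three octets
    ~(?P<ip6>[^:]+:[^:]+):       $ip6;      # IPv6: first two groups
    default                      0.0.0;
}

# Step 2: pick the padding that matches the address family.
map $remote_addr $ip_suffix {
    ~\d+\.\d+\.\d+\.             ".0";      # IPv4
    ~[^:]+:[^:]+:                "::";      # IPv6
    default                      ".0";      # completes the 0.0.0 default above
}

# Step 3: glue both parts together in the log format.
log_format anonip '$ip_prefix$ip_suffix - $remote_user [$time_local] $status '
                  '"$request" $body_bytes_sent "$http_referer" '
                  '"$http_user_agent" "$http_x_forwarded_for"';
```

Each map result here is either plain text or a single variable, and the concatenation happens in log_format instead of inside the map.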
This must be done in the http{} context: map and log_format can only be defined there! The access_log directive itself can also be set at the server (virtual host) level.
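To make the placement concrete, here is a minimal sketch of where each directive could live. It is a fragment rather than a complete nginx.conf, and the server name example.org and the log path are placeholders, not values from a real setup.

```nginx
# Sketch only: example.org and the log path are placeholders.
http {
    # map and log_format are only valid here, in the http{} context.
    map $remote_addr $remote_addr_anon {
        ~(?P<ip>\d+\.\d+\.\d+)\.    $ip.0;
        ~(?P<ip>[^:]+:[^:]+):       $ip::;
        default                     0.0.0.0;
    }

    log_format anonip '$remote_addr_anon - $remote_user [$time_local] $status '
                      '"$request" $body_bytes_sent "$http_referer" '
                      '"$http_user_agent" "$http_x_forwarded_for"';

    server {
        listen      80;
        server_name example.org;

        # access_log may be set per virtual host and refers to the format by name.
        access_log  logs/example.org.access.log anonip;
    }
}
```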
This takes $remote_addr and stores the masked value in $remote_addr_anon. OK, maybe I should rename anon to pseudo, but the result usually no longer targets or singles out individuals. If I now change:
log_format main '$remote_addr - $remote_user [$time_local] $status '
'"$request" $body_bytes_sent "$http_referer" '
'"$http_user_agent" "$http_x_forwarded_for"';
Into:
log_format anonip '$remote_addr_anon - $remote_user [$time_local] $status '
'"$request" $body_bytes_sent "$http_referer" '
'"$http_user_agent" "$http_x_forwarded_for"';
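access_log logs/audit.log main;
access_log logs/access.log anonip;
Keep the original main format defined as well, since the audit log still refers to it. The audit log keeps the full address, while the access log now contains IPs truncated to something like a.b.c.0 for IPv4 or 1234:5678:: for IPv6: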
angelique@dawnbringer:/var/log/nginx$ sudo cat audit.log
108.162.221.120 - - [05/Jul/2018:09:13:24 +0200] "HEAD / HTTP/1.1" 301 0 "http://www.serveroffline.net" "Mozilla/5.0+(compatible; UptimeRobot/2.0; http://www.uptimerobot.com/)" "69.162.124.235"
108.162.221.72 - - [05/Jul/2018:09:13:50 +0200] "HEAD / HTTP/1.1" 301 0 "http://www.serveroffline.net" "Mozilla/5.0+(compatible; UptimeRobot/2.0; http://www.uptimerobot.com/)" "69.162.124.227"
108.162.221.72 - - [05/Jul/2018:09:14:50 +0200] "HEAD / HTTP/1.1" 301 0 "http://www.serveroffline.net" "Mozilla/5.0+(compatible; UptimeRobot/2.0; http://www.uptimerobot.com/)" "69.162.124.227"
angelique@dawnbringer:/var/log/nginx$ sudo cat access.log
108.162.221.0 - - [05/Jul/2018:09:16:50 +0200] "HEAD / HTTP/1.1" 301 0 "http://www.serveroffline.net" "Mozilla/5.0+(compatible; UptimeRobot/2.0; http://www.uptimerobot.com/)" "69.162.124.0"
108.162.221.0 - - [05/Jul/2018:09:17:50 +0200] "HEAD / HTTP/1.1" 301 0 "http://www.serveroffline.net" "Mozilla/5.0+(compatible; UptimeRobot/2.0; http://www.uptimerobot.com/)" "69.162.124.0"
108.162.221.0 - - [05/Jul/2018:09:18:50 +0200] "HEAD / HTTP/1.1" 301 0 "http://www.serveroffline.net" "Mozilla/5.0+(compatible; UptimeRobot/2.0; http://www.uptimerobot.com/)" "69.162.124.0"
Please be aware that $http_x_forwarded_for also contains IP addresses. Behind reverse proxies or load balancers, the client IPs usually end up in this header, and $remote_addr contains the IP address of the load balancer or proxy.
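The forwarded header can be masked with a second map in the same way. The following is a minimal sketch, not the configuration that produced the output above. The variable name $xff_anon and the capture name xff are assumptions, and $remote_addr_anon is assumed to be defined by the earlier map. It only handles a header holding a single address. A header with a comma-separated list falls through to the default, which discards it rather than logging it unmasked.

```nginx
# Sketch only: belongs in the http{} context; $xff_anon is a made-up name.
map $http_x_forwarded_for $xff_anon {
    ~^(?P<xff>\d+\.\d+\.\d+)\.\d+$     $xff.0;    # single IPv4 address
    ~^(?P<xff>[^:,]+:[^:,]+):[^,]*$    $xff::;    # single IPv6 address
    default                            "-";       # empty header or a list
}

log_format anonip '$remote_addr_anon - $remote_user [$time_local] $status '
                  '"$request" $body_bytes_sent "$http_referer" '
                  '"$http_user_agent" "$xff_anon"';
```

Dropping lists is the cautious choice here. If you need the masked chain, you would have to handle each hop, which a single regular expression in a map does not do cleanly.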
If is evil! Please read https://www.nginx.com/resources/wiki/start/topics/depth/ifisevil/ before considering the if directive for any of this.
I wrote this article/series at the beginning of 2007, and it has been heavily modified since. I published a new version in 2015 and removed the old article entirely, as it was too outdated. I do my best to keep the information relevant and regularly check the contents. Please contact me if you find something unfitting or out of date.