IP addresses

AMO records IP addresses for various user actions, allowing us to correlate user activity to find malicious and abusive actors.

Processing

SetRemoteAddrFromForwardedFor middleware is responsible for processing the various headers and meta information we receive and setting META['REMOTE_ADDR'] on the request object passed to each view.

The request flow is one of:

  • Client -> CDN -> Load balancer -> WSGI proxy -> addons-server

  • Client -> CDN -> CDN shield -> Load balancer -> WSGI proxy -> addons-server

  • Client -> Load balancer -> WSGI proxy -> addons-server

Currently:

  • CDN is CloudFront or Fastly

  • CDN shield is an additional PoP on the CDN that requests can go through (only enabled on Fastly)

  • Load Balancer is GKE Ingress (GCP)

  • WSGI proxy is nginx + uwsgi

The CDN is set up to add an X-Request-Via-CDN header set to a secret value known to us, so we can verify that the request did originate from the CDN.

If the request was shielded, the CDN sets the X-AMO-Request-Shielded header to "true". This header should only be trusted once X-Request-Via-CDN has been verified.
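The two checks described above can be sketched as follows. This is a minimal illustration and not the middleware's actual code: the function names, the way the secret is passed in, and the HTTP_X_AMO_REQUEST_SHIELDED key (the usual WSGI spelling of X-AMO-Request-Shielded) are assumptions.

```python
import hmac


def request_came_via_cdn(meta, secret):
    """Return True if the CDN header matches the shared secret.

    `meta` is a dict shaped like request.META; `secret` is the value the
    CDN is configured to send (where it is stored is an assumption here).
    """
    if not secret:
        # Without a configured secret nothing can be verified.
        return False
    value = meta.get("HTTP_X_REQUEST_VIA_CDN", "")
    # Constant-time comparison, so response timing does not leak the secret.
    return hmac.compare_digest(value.encode(), secret.encode())


def request_was_shielded(meta, secret):
    """Trust the shield header only after the CDN header has been verified."""
    return (
        request_came_via_cdn(meta, secret)
        and meta.get("HTTP_X_AMO_REQUEST_SHIELDED") == "true"
    )
```

A request that carries only the shield header, without a valid X-Request-Via-CDN value, is treated as unshielded here, since anyone talking to the origin directly could set that header.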

Nginx converts X-Request-Via-CDN and X-Forwarded-For to HTTP_X_REQUEST_VIA_CDN and HTTP_X_FORWARDED_FOR parameters, respectively.

The X-Forwarded-For header is potentially user input. Intermediary servers in the flow described above always append their own IP to the end of the list, so we can only trust specific positions counted from the right. Anything further to the left may have been supplied by the client and cannot be trusted.
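To illustrate why only right-hand positions are trustworthy, here is a small simulation of a client that sends a forged header through a CDN and the load balancer. The helper names and the IP addresses (taken from documentation ranges) are made up for the example.

```python
def append_hop(header_value, hop_ip):
    """Mimic an intermediary appending an IP to X-Forwarded-For."""
    if not header_value:
        return hop_ip
    return f"{header_value}, {hop_ip}"


def simulate_chain(client_supplied, hops):
    """Apply each appended IP in order and return the resulting list."""
    value = client_supplied
    for hop_ip in hops:
        value = append_hop(value, hop_ip)
    return [ip.strip() for ip in value.split(",")]


# The client lies and sends "10.9.9.9" as its own X-Forwarded-For.
# The CDN then appends the real client IP and its own IP, and the
# load balancer appends its own IP last.
ips = simulate_chain(
    "10.9.9.9",
    ["203.0.113.7", "198.51.100.1", "192.0.2.10"],
)
leftmost = ips[0]          # attacker-controlled value
third_to_last = ips[-3]    # real client IP recorded by the CDN
```

Reading from the left returns whatever the client chose to send, while counting back from the right always lands on a value written by infrastructure we control.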

The CDN always makes origin requests with an X-Forwarded-For header set to “Client IP, CDN IP”, so the client IP will be second to last for a CDN request. If the request was shielded, the shield PoP IP will be added so the client IP will be third to last.

On GCP, GKE Ingress appends its own IP to that header, resulting in a value of “Client IP, CDN IP, GKE Ingress IP” (or “Client IP, CDN IP, CDN Shield IP, GKE Ingress IP” for shielded requests), so the client IP will be third to last, or fourth to last if there was a CDN Shield.

We are no longer hosted on AWS, but it’s worth noting that on AWS, the classic ELB we were using did not make any alterations to X-Forwarded-For. For this reason, we only shift the client IP position we look at to account for the Load Balancer if the DEPLOY_PLATFORM environment variable is set to "gcp".

If the request didn’t come from the CDN and is a direct origin request, on AWS we can use REMOTE_ADDR, but on GCP we’d get the GKE Ingress IP, and the X-Forwarded-For value will be “Client IP, GKE Ingress IP”, so the client IP will be second to last.
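The position rules above can be summarised in one function. This is a sketch under stated assumptions and not the actual SetRemoteAddrFromForwardedFor implementation: the function name, its boolean parameters, and the choice to return None when the header is shorter than expected are all hypothetical.

```python
def client_ip_from_meta(meta, via_cdn, shielded, deploy_platform):
    """Pick the client IP by counting trusted hops from the right.

    `via_cdn` and `shielded` are assumed to have been verified already;
    `deploy_platform` stands in for the DEPLOY_PLATFORM variable.
    """
    trusted_hops = 0
    if via_cdn:
        trusted_hops += 1          # the CDN appended its own IP
        if shielded:
            trusted_hops += 1      # so did the shield PoP
    if deploy_platform == "gcp":
        trusted_hops += 1          # GKE Ingress appends its IP as well

    if trusted_hops == 0:
        # Direct origin request on a platform that leaves the header alone.
        return meta.get("REMOTE_ADDR")

    raw = meta.get("HTTP_X_FORWARDED_FOR", "")
    forwarded = [ip.strip() for ip in raw.split(",") if ip.strip()]
    position = trusted_hops + 1    # the client sits just left of the hops
    if len(forwarded) < position:
        return None                # hypothetical handling of a short header
    return forwarded[-position]
```

For example, a shielded CDN request on GCP has three trusted hops, so the function reads the fourth entry from the right, and any extra entries a client prepends to the header are ignored.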

Recording

Though several older models in the code have a dedicated field for this, more recent implementations should use IPLog, which is automatically populated when an ActivityLog action constant is defined with store_ip set to True. Note that if the keep property isn’t defined, we don’t keep the activity forever, and therefore the associated IPLog instance is ultimately deleted as well.
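As a rough illustration of how store_ip and keep interact, the sketch below uses stand-in classes. BaseAction, EXAMPLE_ACTION, OTHER_ACTION, their ids, and ip_retention are hypothetical and exist only for this example; the real action constants and their base class live in addons-server and may look different.

```python
class BaseAction:
    """Hypothetical stand-in for the base of ActivityLog action constants."""

    store_ip = False   # no IPLog unless an action opts in
    keep = False       # activity is eventually deleted unless kept


class EXAMPLE_ACTION(BaseAction):
    """Hypothetical action that records the IP and is kept forever."""

    id = 9999          # made-up id, for illustration only
    store_ip = True
    keep = True


class OTHER_ACTION(BaseAction):
    """Hypothetical action that records the IP but is not kept."""

    id = 9998          # made-up id, for illustration only
    store_ip = True


def ip_retention(action):
    """Describe what happens to the IP recorded for an action."""
    if not getattr(action, "store_ip", False):
        return "not recorded"
    if getattr(action, "keep", False):
        return "recorded and kept"
    return "recorded, deleted with the activity"
```

Here ip_retention(OTHER_ACTION) reports that the IP is recorded but goes away with the activity, which is the case the note above warns about.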