Blocking Referrer Spam

Since I launched my blog a couple of months ago I've been getting quite a lot of referrer spam. After a bit of research, I decided to create a filtered view in Google Analytics and make some changes to my Nginx configuration.

Google Analytics filters

I found the Definitive Guide to Removing Referral Spam by Analytics Edge to be a helpful resource on filtering Google Analytics views. The filters mentioned below are all variations on filters suggested in that article.

Note: Before you start creating any filters, make a copy of your unfiltered view. Apply your spam filters to the copy and keep the original for comparison.

Invalid hostnames filter

We'll start by filtering out invalid hostnames. In Google Analytics, go to Audience > Technology > Network > Hostname. If you see any hostnames in the list that aren't your own, you need this filter.

The Analytics Edge article recommends a valid hostname filter, but I chose not to use one. Only one spam hostname was showing up in my Google Analytics, so I decided an invalid hostname filter would suffice. This approach also carries no risk of excluding valid traffic.

To set up an invalid hostname filter:

  1. From your Google Analytics home screen, go to the Admin tab, select your view from the dropdown and click on "Filters" in the list below.
  2. Choose the Filter Type "Custom", select "Exclude", and choose the Filter Field "Hostname".
  3. In the Filter Pattern text field, add the hostnames you want to block. If you have more than one, separate them with a vertical bar (|). E.g. "badhost1.com|badhost2.com|badhost3.com".
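The Filter Pattern is treated as a regular expression, so `|` means "or" and an unescaped dot matches any character. If you'd like to build the pattern from a list and preview what it would exclude, here's a small Python sketch. It assumes Google Analytics applies the pattern as an unanchored match anywhere in the hostname, approximated here with `re.search`. `build_exclude_pattern` and `is_excluded` are hypothetical helper names, not part of any Google Analytics API.

```python
import re


def build_exclude_pattern(hostnames):
    """Join hostnames with "|" into one pattern, escaping dots so they match literally."""
    return "|".join(host.replace(".", r"\.") for host in hostnames)


def is_excluded(hostname, pattern):
    """Approximate an Exclude filter: the hit is dropped if the pattern matches anywhere."""
    return re.search(pattern, hostname) is not None


pattern = build_exclude_pattern(["badhost1.com", "badhost2.com", "badhost3.com"])
print(pattern)
for hostname in ["badhost2.com", "stephsharp.me", "badhost1xcom"]:
    print(hostname, "excluded" if is_excluded(hostname, pattern) else "kept")
```

Escaping is optional, and the unescaped form in step 3 still catches the intended hosts. Escaping simply stops a dot from standing in for any character, as the `badhost1xcom` case shows.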

Invalid hostnames filter

Fake referrers filter

To see your referral spam, go to Acquisition > All Traffic > Referrals.

For any fake referrers, you can create a filter similar to the one above, but this time with the Filter Field "Campaign Source".

Referral spam filter

This filter needs ongoing maintenance: you'll have to update it whenever a new spam referrer appears. Each filter pattern is limited to 255 characters, so I'm currently using four filters with the following patterns:

semalt.com|anticrawler.org|best-seo-offer.com|best-seo-solution.com|buttons-for-website.com|buttons-for-your-website.com|7makemoneyonline.com|-musicas*-gratis|kambasoft.com|savetubevideo.com|ranksonic|medispainstitute|offers.bycontext|100dollars-seo

12masterov.com|bard-real.com.ua|billiard-classic.com.ua|cardiosport.com.ua|ci.ua|customsua.com.ua|delfin-aqua.com.ua|dipstar.org|dvr.biz.ua|e-kwiaciarz.pl|este-line.com.ua|ghazel.ru|it-max.com.ua|maridan.com.ua|mebeldekor.com.ua

mirobuvi.com.ua|offers.bycontext.com|olgacvetmet.com|palvira.com.ua|trion.od.ua|наркомания.лечениенаркомании.com|алкоголизм.лечениенаркомании.com|med-zdorovie.com.ua|ranksonic.org|.*ranksonic.com|prodvigator.ua

trafficmonetize.org|4webmasters.org|success-seo.com|semaltmedia.com|videos-for-your-business.com
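Because of the 255-character limit, splitting a growing list across several filters by hand gets fiddly. The Python sketch below greedily packs referrer patterns into pipe-separated strings that each fit under the limit. `pack_patterns` is a hypothetical helper, and the five referrers in the example are just a sample taken from the lists above.

```python
MAX_PATTERN_LENGTH = 255  # per-filter limit on the Filter Pattern field


def pack_patterns(referrers, limit=MAX_PATTERN_LENGTH):
    """Greedily join referrers with "|" into patterns of at most `limit` characters."""
    patterns = []
    current = ""
    for ref in referrers:
        if len(ref) > limit:
            raise ValueError(f"{ref!r} is longer than {limit} characters on its own")
        candidate = f"{current}|{ref}" if current else ref
        if len(candidate) <= limit:
            current = candidate       # still fits: keep extending this pattern
        else:
            patterns.append(current)  # full: start a new filter pattern
            current = ref
    if current:
        patterns.append(current)
    return patterns


referrers = [
    "semalt.com",
    "buttons-for-website.com",
    "trafficmonetize.org",
    "4webmasters.org",
    "100dollars-seo.com",
]

# A deliberately small limit, so the split is visible with a short list.
for number, pattern in enumerate(pack_patterns(referrers, limit=40), start=1):
    print(f"Filter {number} ({len(pattern)} chars): {pattern}")
```

Each string it returns could then go into its own Campaign Source exclude filter. Keeping the list in order means new referrers simply land in the last filter until it fills up.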

Exclude bots and spiders

Finally, you can stop hits from known bots and spiders from appearing in your analytics. Go to Admin > View > View Settings > Bot Filtering and tick "Exclude all hits from known bots and spiders". You'll need to do this for each of your views.

Blocking referrer spam with Nginx

OK, now that we've cleaned up Google Analytics, we can turn our attention to the server.

Attempt 1

I started off following the approach used in this article to block referrer spam in my Nginx config file.

I created referral-spam.conf in /etc/nginx/global/:

##
# Referrer exclusions
##
if ($http_referer ~ "(semalt\.com|buttons-for-website\.com|trafficmonetize\.org|4webmasters\.org|100dollars-seo\.com)") {
  return 403;
}
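Before reloading, it can help to sanity-check which Referer values the regex in referral-spam.conf actually catches. Nginx's `~` operator does a case-sensitive, unanchored regex search, which Python's `re.search` approximates closely for a plain alternation like this one. The sketch below reuses that pattern; `would_block` is a hypothetical helper name.

```python
import re

# Same alternation as the `if ($http_referer ~ "...")` rule in referral-spam.conf.
REFERRER_SPAM = re.compile(
    r"(semalt\.com|buttons-for-website\.com|trafficmonetize\.org"
    r"|4webmasters\.org|100dollars-seo\.com)"
)


def would_block(referer):
    """Return True if the rule would answer 403 for this Referer header value."""
    # Unanchored search, like Nginx's "~": a match anywhere in the URL counts.
    return REFERRER_SPAM.search(referer) is not None


for referer in [
    "http://semalt.com/",
    "http://4webmasters.org/some/page",
    "https://www.google.com/",
    "",  # no Referer header at all: $http_referer is empty
]:
    print(f"{referer or '(none)':35} -> {'403' if would_block(referer) else 'allowed'}")
```

Like `~` in Nginx, this check is case-sensitive; `~*` is the case-insensitive Nginx equivalent. It also shows the maintenance problem: every new domain makes this single line longer.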

Then I added this line to my existing blog.conf file:

server {
  ...
  include /etc/nginx/global/*;
}

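Then I tested the config, reloaded Nginx and restarted Ghost:

nginx -t            # check the new config for syntax errors first
nginx -s reload     # reload Nginx with the new config
pm2 restart Ghost   # restart the Ghost app (not strictly needed for Nginx changes)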

This worked, but I could tell it was going to become unwieldy as the list of referrers grew. I wanted to find an approach that would be cleaner and easier to maintain with a longer list.

Attempt 2

After some more searching, I came across this article, which takes a slightly different approach: adding bad referrers to a blacklist file (gist here).

I created blacklist.conf in /etc/nginx/conf.d/:

##
# Referrer spam blacklist
##

map $http_referer $bad_referer {
    hostnames;

    default                           0;
    # "~*" = case-insensitive regex match; "\." matches a literal dot
    "~*social-buttons\.com"           1;
    "~*semalt\.com"                   1;
    "~*buttons-for-website\.com"      1;
    "~*trafficmonetize\.org"          1;
    "~*4webmasters\.org"              1;
    "~*100dollars-seo\.com"           1;
    "~*webmonetizer\.net"             1;
}
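Since the point of the blacklist is easy maintenance, one option is to keep the domains in a plain list and generate the map entries from it. The Python sketch below is one way to do that under a few assumptions. The helper names are hypothetical. It also writes each entry as a case-insensitive `~*` regex with the dots escaped, so `semalt\.com` only matches a literal dot and also catches `SEMALT.COM`. It prints the file to stdout; redirecting that into /etc/nginx/conf.d/blacklist.conf and reloading Nginx is left to you.

```python
HEADER = """##
# Referrer spam blacklist (generated)
##

map $http_referer $bad_referer {
    hostnames;

    default                           0;
"""


def map_entry(domain, width=33):
    """Format one map line: a case-insensitive regex key with escaped dots, value 1."""
    key = '"~*' + domain.replace(".", r"\.") + '"'
    # ljust lines the values up with "default"; the extra space guarantees a
    # separator even if a key is longer than `width`.
    return f"    {key.ljust(width)} 1;"


def render_blacklist(domains):
    """Build the whole blacklist.conf text from a list of domains (duplicates dropped)."""
    entries = [map_entry(d) for d in sorted(set(domains))]
    return HEADER + "\n".join(entries) + "\n}\n"


domains = [
    "social-buttons.com",
    "semalt.com",
    "buttons-for-website.com",
    "semalt.com",  # duplicates are harmless: set() removes them
]
print(render_blacklist(domains), end="")
```

Sorting and de-duplicating keeps the generated file stable, so a diff only shows the domains you actually added.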

All config files in the conf.d directory are included inside the http block of my nginx.conf file (map blocks are only allowed in the http context):

http {
    ...
    include /etc/nginx/conf.d/*.conf;
}

And then I updated my existing blog.conf file:

server {
    ...

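    # $bad_referer is 1 for blacklisted referrers and 0 (false) otherwise
    if ($bad_referer) {
        return 444;  # close the connection without sending a response
    }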
}

About the 444 HTTP status code:

444 No Response (Nginx)
Used in Nginx logs to indicate that the server has returned no information to the client and closed the connection (useful as a deterrent for malware).

Finally, I tested my blacklist like so:

curl -k --referer http://4webmasters.org http://stephsharp.me
curl: (52) Empty reply from server
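The same check can be scripted if you'd rather not rely on curl's error message. This Python sketch sends a request with a spoofed Referer header and treats a connection closed without any response as blocked. That is what `return 444` produces, and what curl reports as error 52. `is_blocked` is a hypothetical helper, and it only speaks plain HTTP on port 80 by default; for HTTPS you'd swap in `http.client.HTTPSConnection`.

```python
import http.client


def is_blocked(host, referer, port=80, timeout=5.0):
    """Return True if the server drops a request carrying this Referer.

    Nginx's `return 444` closes the connection without sending a response,
    which http.client reports as RemoteDisconnected (a ConnectionResetError
    subclass); a reset connection is treated the same way.
    """
    conn = http.client.HTTPConnection(host, port, timeout=timeout)
    try:
        conn.request("GET", "/", headers={"Referer": referer})
        conn.getresponse()  # any status line at all means we got a reply
        return False
    except ConnectionResetError:
        return True
    finally:
        conn.close()
```

For example, `is_blocked("stephsharp.me", "http://4webmasters.org")` should return True once the blacklist is live, while a referrer that isn't on the list should return False. A refused connection still raises, so a server that is simply down isn't mistaken for a block.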

Great! It's working as expected and I'm satisfied the blacklist will be easy enough to maintain for new referrers.

The results so far

I've had the Google Analytics filters in place for about a month now, and I'm happy with how they're working so far. I've added three more referrers to the filter in the last month, but they were minor offenders that barely affected my reports, so I could just as easily have ignored them.

The image below shows my sessions in the last few days without any filtering, and then with the hostname and referrer filters discussed in this post. That's a pretty big difference!

Number of sessions before and after spam filtering

As for Nginx, it's harder to tell what impact the config changes have had. These referrers still show up in Google Analytics (and as far as I know there's nothing I can do about that), but the volume of spam does appear to have slowed down a bit. It may just be a coincidence, but I'll keep an eye on it over the next month or two and see how it goes.