Posts tagged with: web

Blocking Referrer Spam

Since I launched my blog a couple of months ago I've been getting quite a lot of referrer spam. After a bit of research, I decided to create a filtered view in Google Analytics and make some changes to my Nginx configuration.

Google Analytics filters

I found the Definitive Guide to Removing Referral Spam by Analytics Edge to be a helpful resource on filtering Google Analytics views. The filters mentioned below are all variations on filters suggested in that article.

Note: Before you start creating any filters, make a copy of your unfiltered view. Apply your spam filters to the copy and keep the original for comparison.

Invalid hostnames filter

We'll start by filtering out invalid hostnames. In Google Analytics, go to Audience > Technology > Network > Hostname. If you see any hostnames in the list that aren't your own, you need this filter.

I chose not to use a valid hostname filter as suggested in the Analytics Edge article. I only had one spam hostname showing up in Google Analytics, so I decided an invalid hostname filter would suffice. Also, there's no risk of excluding valid traffic with this approach.

To set up an invalid hostname filter:

  1. From your Google Analytics home screen, go to the Admin tab, select your view from the dropdown and click on "Filters" in the list below.
  2. Choose the Filter Type "Custom", select "Exclude", and choose the Filter Field "Hostname".
  3. In the Filter Pattern text field, add the hostnames you want to block. The pattern is a regular expression, so escape the dots and, if you have more than one hostname, separate them with a vertical bar (|). E.g. "badhost1\.com|badhost2\.com|badhost3\.com".
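
As a rough illustration of how such a pattern behaves, here is a minimal Python sketch. It assumes the filter pattern is treated as an unanchored regular expression, which Python's `re.search` approximates; the hostnames and the helper names `build_filter_pattern` and `is_excluded` are made up for this example.

```python
import re

# Hypothetical spam hostnames; replace with the ones from your Hostname report.
SPAM_HOSTS = ["badhost1.com", "badhost2.com", "badhost3.com"]


def build_filter_pattern(hosts):
    """Join hostnames with '|' and escape dots so they only match a literal dot."""
    return "|".join(host.replace(".", r"\.") for host in hosts)


def is_excluded(hostname, pattern):
    """Return True if the pattern matches anywhere in the hostname."""
    return re.search(pattern, hostname) is not None


pattern = build_filter_pattern(SPAM_HOSTS)
print(pattern)                                # the string to paste into Filter Pattern
print(is_excluded("badhost2.com", pattern))   # a spam hostname
print(is_excluded("stephsharp.me", pattern))  # my own hostname
```

Without the escaping, a dot would match any character, so "badhost1.com" would also match something like "badhost1xcom".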


Fake referrers filter

To see your referral spam, go to Acquisition > All Traffic > Referrals.

For any fake referrers, you can create a filter similar to the one above, but this time with the Filter Field "Campaign Source".

Referral spam filter

This filter requires maintenance: you'll need to update it whenever a new spam referrer shows up. There's also a 255 character limit for each filter pattern, so a long list has to be split across several filters. I'm currently using four filters with the following patterns:

semalt.com|anticrawler.org|best-seo-offer.com|best-seo-solution.com|buttons-for-website.com|buttons-for-your-website.com|7makemoneyonline.com|-musicas*-gratis|kambasoft.com|savetubevideo.com|ranksonic|medispainstitute|offers.bycontext|100dollars-seo

12masterov.com|bard-real.com.ua|billiard-classic.com.ua|cardiosport.com.ua|ci.ua|customsua.com.ua|delfin-aqua.com.ua|dipstar.org|dvr.biz.ua|e-kwiaciarz.pl|este-line.com.ua|ghazel.ru|it-max.com.ua|maridan.com.ua|mebeldekor.com.ua

mirobuvi.com.ua|offers.bycontext.com|olgacvetmet.com|palvira.com.ua|trion.od.ua|наркомания.лечениенаркомании.com|алкоголизм.лечениенаркомании.com|med-zdorovie.com.ua|ranksonic.org|.*ranksonic.com|prodvigator.ua

trafficmonetize.org|4webmasters.org|success-seo.com|semaltmedia.com|videos-for-your-business.com
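
Splitting a growing list by hand gets tedious, so here is a minimal Python sketch of how the packing could be automated. It assumes the 255 character limit counts characters rather than bytes (which may matter for the Cyrillic entries), and `pack_patterns` is a hypothetical helper, not part of any Google Analytics tooling.

```python
MAX_PATTERN_LENGTH = 255  # per-filter limit mentioned above


def pack_patterns(entries, limit=MAX_PATTERN_LENGTH):
    """Greedily join entries with '|' so that no pattern exceeds the limit."""
    patterns = []
    current = ""
    for entry in entries:
        if len(entry) > limit:
            raise ValueError(f"entry longer than {limit} characters: {entry}")
        candidate = entry if not current else current + "|" + entry
        if len(candidate) <= limit:
            current = candidate  # entry still fits in the current pattern
        else:
            patterns.append(current)  # current pattern is full, start a new one
            current = entry
    if current:
        patterns.append(current)
    return patterns


# Example with a few of the referrers listed above.
referrers = ["semalt.com", "anticrawler.org", "best-seo-offer.com", "kambasoft.com"]
for number, pattern in enumerate(pack_patterns(referrers), start=1):
    print(f"Filter {number} ({len(pattern)} chars): {pattern}")
```

Each returned string can be pasted into its own filter; the order of the entries is preserved.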

Exclude bots and spiders

One last thing you can do is to exclude traffic from known bots and spiders from appearing in your analytics. Go to Admin > View > View Settings > Bot Filtering and tick "Exclude all hits from known bots and spiders". You'll need to do this for each of your views.

Blocking referrer spam with Nginx

OK, now that we've cleaned up Google Analytics, we can turn our attention to the server.

Attempt 1

I started off following the approach used in this article to block referrer spam in my Nginx config file.

I created referral-spam.conf in /etc/nginx/global/:

##
# Referrer exclusions
##
if ($http_referer ~ "(semalt\.com|buttons-for-website\.com|trafficmonetize\.org|4webmasters\.org|100dollars-seo\.com)") {
  return 403;
}

Then I added this line to my existing blog.conf file:

server {
  ...
  include /etc/nginx/global/*;
}

Then I reloaded Nginx and restarted Ghost:

nginx -s reload
pm2 restart Ghost

This worked, but I could tell it was going to become unwieldy as the list of referrers grew. I wanted to find an approach that would be cleaner and easier to maintain with a longer list.

Attempt 2

With some more searching I came across this article with a slightly different method of adding bad referrers to a blacklist file (gist here).

I created blacklist.conf in /etc/nginx/conf.d/:

##
# Referrer spam blacklist
##

map $http_referer $bad_referer {
    hostnames;

    default                           0;
    # Dots are escaped so they only match a literal "."
    "~social-buttons\.com"            1;
    "~semalt\.com"                    1;
    "~buttons-for-website\.com"       1;
    "~trafficmonetize\.org"           1;
    "~4webmasters\.org"               1;
    "~100dollars-seo\.com"            1;
    "~webmonetizer\.net"              1;
}

All config files in the conf.d directory are included in my nginx.conf file with:

include /etc/nginx/conf.d/*.conf;

And then I updated my existing blog.conf file:

server {
    ...

    if ($bad_referer) { 
        return 444; 
    }
}

About the 444 HTTP status code:

444 No Response (Nginx)
Used in Nginx logs to indicate that the server has returned no information to the client and closed the connection (useful as a deterrent for malware).

Finally, I tested my blacklist like so:

curl -k --referer http://4webmasters.org http://stephsharp.me
curl: (52) Empty reply from server

Great! It's working as expected and I'm satisfied the blacklist will be easy enough to maintain for new referrers.
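
To make that maintenance a little easier, the map block could be generated from a plain list of domains. The following Python sketch is one way to do it, not something I run in production: `render_blacklist` is a hypothetical helper, and it escapes the dots so that each entry only matches a literal ".".

```python
def render_blacklist(domains):
    """Render an Nginx map block that flags referrers containing any of the domains."""
    lines = [
        "map $http_referer $bad_referer {",
        "    default 0;",
    ]
    # Sort and de-duplicate so the generated file stays stable between runs.
    for domain in sorted(set(domains)):
        escaped = domain.replace(".", r"\.")
        lines.append(f'    "~{escaped}" 1;')
    lines.append("}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    spam_domains = ["semalt.com", "4webmasters.org", "trafficmonetize.org"]
    # Redirect the output to a file and copy it to /etc/nginx/conf.d/blacklist.conf.
    print(render_blacklist(spam_domains), end="")
```

After regenerating the file, Nginx still needs to be reloaded for the new entries to take effect.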

The results so far

I've had the Google Analytics filters in place for about a month now and I'm happy with how they're working so far. I've added three more referrers to the filter in the last month, but they were only minor offenders that barely affected my reports, so I could just as easily have ignored them.

The image below shows my sessions in the last few days without any filtering, and then with the hostname and referrer filters discussed in this post. That's a pretty big difference!

Number of sessions before and after spam filtering

As for Nginx, it's harder to tell what impact the config changes have had. These referrers still show up in Google Analytics (and as far as I know there's nothing I can do about that on the server, since much referral spam is sent straight to Google Analytics without ever requesting a page from my site), but the volume of spam does appear to have slowed down a bit. It may just be a coincidence, but I'll keep an eye on it for the next month or two and see how it goes.

Responsive Images

A couple of weeks ago I saw this tweet from Brad Frost and it was the nudge I needed to finally learn something about responsive images. My resources were Brad Frost's article on responsive images and Responsive Images in Practice by Eric Portis.

I wanted to improve the load time of the cover image on my blog. When I was setting up my blog I found an image I liked on Unsplash, cropped it slightly, and made it my cover image. Simple. It looked good on my retina screen at work, but it took forever to load on my phone and was unnecessarily using up my limited data.

The process

I created four versions of the cover image at different sizes, ranging from 2560px wide down to 480px. To load the appropriate image based on the screen resolution, I used the srcset attribute:

<img src="/content/images/cover/large.jpg"
    srcset="/content/images/cover/xlarge.jpg  2560w, 
            /content/images/cover/large.jpg   1920w,
            /content/images/cover/medium.jpg  960w,
            /content/images/cover/small.jpg   480w"
    sizes="100vw" 
    alt="Cover image" />

The img tag still has a src attribute for browsers that don’t support srcset. I also used the Picturefill polyfill as a fallback. The sizes attribute gives the browser an estimate of the image's display width, in this case the full width of the viewport (100vw).
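
To get a feel for what the browser does with these attributes, here is a simplified Python sketch of the selection. It assumes the browser picks the smallest candidate that is at least as wide as the slot in device pixels; real browsers are free to choose differently (for example, reusing a larger image already in the cache), and `pick_candidate` is a name I made up for this example.

```python
# The srcset candidates from the markup above: (URL, width in pixels).
CANDIDATES = [
    ("/content/images/cover/xlarge.jpg", 2560),
    ("/content/images/cover/large.jpg", 1920),
    ("/content/images/cover/medium.jpg", 960),
    ("/content/images/cover/small.jpg", 480),
]


def pick_candidate(viewport_width_px, device_pixel_ratio, candidates=CANDIDATES):
    """Pick the smallest image that covers the slot, or the widest if none does.

    With sizes="100vw" the slot is the full viewport width, so the number of
    device pixels to fill is viewport width times device pixel ratio.
    """
    needed = viewport_width_px * device_pixel_ratio
    by_width = sorted(candidates, key=lambda candidate: candidate[1])
    for url, width in by_width:
        if width >= needed:
            return url
    return by_width[-1][0]


print(pick_candidate(320, 1))   # small phone, standard density
print(pick_candidate(375, 2))   # phone with a 2x screen
print(pick_candidate(1440, 2))  # laptop with a 2x screen
```

This is why the small phone no longer has to download the 2560px image.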

To find out whether the responsive cover image actually improved performance on mobile screens, I measured the page weight across a range of screen resolutions using Chrome's Developer Tools (View > Developer > Developer Tools). In the Network tab, reload the page without the cache: hold down the Shift key while reloading, press Cmd+Shift+R, or tick the "Disable cache" checkbox at the top of the tab. Bypassing the cache matters because the Network tab only counts data actually transferred from the server. The page weight is displayed at the bottom of the Network tab, e.g. "1.9MB transferred".

Chrome Developer Tools - Network tab

Along the bottom of the Dev Tools window there is the Emulation tab. Drag it up into view and choose your screen resolution and pixel density.

Chrome Developer Tools - Emulation tab

If you don't see the Emulation tab, click the button to the left of the Settings icon in the top right corner of the Developer Tools to show the bottom drawer (it should be highlighted blue, as in the screenshot above).

The results


The responsive cover image reduces the page weight by 0.3MB on small mobile screens. I've also reduced the overall page weight by another 0.4MB by resizing the original cover image down to a maximum width of 2560px. The result is a saving of almost a third of the total page weight on the smallest screens.

While not quite as impressive as Eric Portis' results, it's not bad either, considering I only made the cover image responsive and left all the other images unchanged. More importantly, the page loads faster on mobile devices and uses 30% less data than it did previously.

Drawbacks

The main drawback to implementing a responsive cover image on my blog is that Ghost doesn't currently support this, so I can no longer access the cover image in my theme with {{@blog.cover}}. It would be great to be able to upload a cover image via the Ghost admin and have the option to specify multiple sizes for the image. There's an open GitHub issue for this feature, so hopefully it's not far away.

Configure Nginx to Serve Ghost and Non-Ghost Files

I set up this Ghost blog on DigitalOcean with the help of this tutorial. I wanted to be able to serve non-Ghost files hosted on the same DigitalOcean droplet as my blog for two reasons:

  1. I had an existing portfolio website I wanted to host on the same server and link to from the blog with the URL stephsharp.me/portfolio

  2. I wanted to be able to upload static files and link to them in my blog posts with a URL like stephsharp.me/files/file.zip. (See these two discussions of the topic on the Ghost support forum.)

Nginx config for Ghost

After following the tutorial to install Ghost, my Nginx config looked like this:

server {
    listen       80 default_server;
    server_name  stephsharp.me;
    root         /home/ghost/ghost;

    location / {
        proxy_pass http://localhost:2368/;
        proxy_set_header Host $host;
        proxy_buffering off;
    }
}

The root directory is /home/ghost/ghost, and I have just the one location block to serve Ghost content.

Nginx config to serve non-Ghost files

I then set up my preferred directory structure as follows:

|-- home
    |-- files
    |-- ghost
    |-- portfolio

To serve up non-Ghost files in the 'files' and 'portfolio' directories, I altered my Nginx config to have multiple location blocks. It has a default root directive outside the location blocks, and a different root directive within the location block serving Ghost.

I've also used try_files to provide appropriate fallbacks in the /files and /portfolio location blocks.

server {
    listen          80 default_server;
    server_name     stephsharp.me;
    root            /home;

    location / {
        root        /home/ghost/ghost;

        proxy_pass          http://localhost:2368/;
        proxy_set_header    Host $host;
        proxy_buffering     off;
    }

    location /files {
        try_files   $uri $uri/ =404;
    }

    location /portfolio {
        try_files   $uri $uri/ /portfolio/index.html =404;
    }
}

I put this config in a separate file on the server at /etc/nginx/conf.d/blog.conf and have this line in /etc/nginx/nginx.conf:

include /etc/nginx/conf.d/*.conf;

I could have just put the server block into the existing nginx.conf file, but I decided to separate it out into its own config file.
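
To show how requests get routed by the multi-location config above, here is a simplified Python sketch. It assumes plain prefix locations only, where Nginx picks the longest matching prefix, and it ignores the try_files fallbacks; `match_location` and `describe` are hypothetical names used just for this illustration.

```python
# The three prefix locations and the default root from the config above.
LOCATIONS = ["/", "/files", "/portfolio"]
ROOT = "/home"


def match_location(uri, locations=LOCATIONS):
    """Return the longest location prefix that the URI starts with."""
    matches = [location for location in locations if uri.startswith(location)]
    return max(matches, key=len)


def describe(uri):
    """Say roughly what Nginx does with a request for the given URI."""
    if match_location(uri) == "/":
        return "proxy to Ghost at http://localhost:2368"
    # Static locations inherit the server-level root, so the URI is appended to it.
    return "serve file " + ROOT + uri


for uri in ["/", "/files/file.zip", "/portfolio/index.html"]:
    print(uri, "->", describe(uri))
```

One consequence of prefix matching is that a URI such as /filesystem would also land in the /files location, because it starts with the same characters.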

Side note: Remember to reload Nginx with nginx -s reload when you make changes to the config files.