Insight into Blog Spam

Everyone hates blog spam, whether it’s comment or trackback spam, but I thought I would look into it and see if there are any common elements, and yes, I’m amazed how much it’s still used.  So I gathered some comment data from this blog and also from SEOMeetups.com.au, which is regularly spammed as people try to build links and drive referral traffic back to their affiliate links, social profiles, YouTube clips, clients’ websites and their own sites.

I managed to export and filter 10,119 spam comments and started playing around with the data in Excel to see what insights I could learn from blog spam left from the 31st July 2011 until the 12th January 2014.

The following items of interest were explored:

  • Did the spammer leave an email address?
  • What email service did they use?
  • What was their email domain extension?
  • What was their author URL on the comment?
  • Were there any targeted deep linking campaigns?
  • How many comments also contained links?
  • What was the average spam comment length?
  • What does the spam commenters’ IP profile look like?
  • What are the largest regions for spam?
  • Is there growth in comment spam?
  • Top cities by IP for comment spam
  • Is there a best time for moderation?
  • Top spam browsers

Do spammers always use email?

It seems that around 33% of the time they don’t leave an email address, and that’s because it’s trackback or pingback spam. There are plugins that can stop the bulk of this and they are already in place, but there is so much of it that a percentage still gets through, so maybe I can find a better spam solution.

Do spammers list emails

What email platform do spammers often use?

Note that this is not perfect, as spammers can easily leave fake/junk emails, but Gmail appears to be the preferred platform.  I found that 58.2% of all spammers were using Gmail. I assume part of the favoritism is that you can “send as” and easily forward emails from the spam Gmail address to their real accounts if they really wanted.  The ease of creating a churn and burn Gmail account is a big part of the problem, with Yahoo (5%), AOL (4%) and Hotmail (2.8%) far behind.  I’ve noticed that there is a long tail of 467 domains, including many churn & burn domains, that make up the remaining 29.9% of spam emails.

Email Platform used by Spammers
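
I did this breakdown in Excel, but for anyone who would rather script it, here is a minimal Python sketch of the same idea: count the domain part of each email and express it as a share of the non-blank emails. The function name and the sample addresses are made up for illustration and are not from the real export.

```python
from collections import Counter

def provider_shares(emails):
    """Return {email domain: percent of non-blank emails}, most common first."""
    domains = [
        email.rsplit("@", 1)[1].strip().lower()
        for email in emails
        if email and "@" in email  # skip blank trackback/pingback rows
    ]
    counts = Counter(domains)
    total = sum(counts.values())
    return {domain: round(100 * n / total, 1) for domain, n in counts.most_common()}

# Hypothetical sample rows, not taken from the real data set.
sample = ["a@gmail.com", "b@GMAIL.com", "", "c@yahoo.com", "d@gmail.com"]
print(provider_shares(sample))
```

Lower-casing the domain matters, otherwise Gmail.com and gmail.com would be counted as two different providers.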

What is the spammers email domain extension?

So what is the most popular domain extension that spammers are using? Once I removed blank emails, I found that .com extensions were linked to around 82.6% of all spam, but surprisingly .de represented 11.3%.  The other extensions were .net (2.8%), .org (0.9%), .co.uk (0.5%), .ru (0.3%), .pl (0.2%) and .es (0.2%), with the remaining 28 extensions representing only 1.2% of all spam.

Email Domain Extension

What is the comment spam URL?

It’s interesting that 1.59% of the Author URLs were https:// and 0.79% were left blank. Looking at the stats below, Penguin doesn’t appear to have greatly changed spam link builders’ strategies, as they are still pushing for deep links. Many of the oldest links point to domains; it was only from the start of 2012 that a majority of tactics switched to deep linking. Over the entire period of data, 65% of the link building attempted was deep linking, and 12 comments simply failed because they were so low quality that they added a broken link.

Link Building Type
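
The split between domain links and deep links can be scripted as well. The sketch below is one possible rule, not the exact one I used in Excel: it assumes a “deep link” is any Author URL with a path, query string or fragment beyond the bare host, and it treats anything without an http/https scheme and a dotted host as broken. The function name and category labels are my own.

```python
from urllib.parse import urlparse

def classify_author_url(url):
    """Label an Author URL as 'blank', 'broken', 'domain' or 'deep link'."""
    url = (url or "").strip()
    if not url:
        return "blank"
    parsed = urlparse(url)
    # Assumption: a usable link needs an http(s) scheme and a host with a dot.
    if parsed.scheme not in ("http", "https") or "." not in parsed.netloc:
        return "broken"
    # Anything beyond the bare host counts as a deep link.
    if parsed.path.strip("/") or parsed.query or parsed.fragment:
        return "deep link"
    return "domain"

print(classify_author_url("http://example.com/"))             # domain
print(classify_author_url("https://example.com/shoes/red"))   # deep link
```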

What about targeted campaigns?

Around 10% of all deep links could be easily categorised into targeted campaigns to promote YouTube videos, increase link authority to Wiki spam, profile URLs and, interestingly, social profiles. As you can see from the graph below, a number of these made up less than 1% of deep link spam, and it’s great to see that SEOs might have finally given up on Squidoo.  The last comment spam to promote a Squidoo URL was on the 14th April 2013 from a North American IP address; it seems European spammers gave up on spamming links to the Squidoo platform at the start of 2013. This data doesn’t include the various types of links that were dropped into the comment itself, it only includes links left in the Author profile URL.

Spam URL Targets

How many comments contained links?

Yes, it seems around 21% of comment spam still contains links, and 7.6% of those comments with spam links were generated by trackbacks and pingbacks. Of the trackback/pingback spam with links in the comments, I found 90% traced back to Asian IP addresses, which suggests more automated techniques are used by Asian link builders.  If you look at comment spam with links, Asia still leads (65.2%), followed by North America (16.5%) and Europe (14.1%).

Did Comment Contain Link

I only found 1 Australian IP address was used, but I found 13 comments dropping links that had an Australian domain in the Author URL, using 11 North American IPs and 2 Asian IPs.  Looking at the general trend with link dropping, it seems the spammers are getting smarter and now adding NoFollow links to the comment spam so they get the referral traffic with a potentially lower chance of a Penguin penalty.  72.1% of the links dropped carry the NoFollow tag, and in the chart below you can see this tactic aggressively spiked up in 2014, following both domains increasing their SEO authority and the number of blog posts that could be spammed.

Comment Link Drops
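
For anyone wanting to check the NoFollow share on their own spam queue, here is a rough Python sketch using only the standard library HTML parser. It only counts proper anchor tags with an href, so plain pasted URLs are ignored, and the class and function names are just illustrative.

```python
from html.parser import HTMLParser

class LinkCounter(HTMLParser):
    """Counts <a href> links in a comment and how many carry rel="nofollow"."""

    def __init__(self):
        super().__init__()
        self.total = 0
        self.nofollow = 0

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attrs = dict(attrs)
        if not attrs.get("href"):
            return
        self.total += 1
        # rel can hold several space-separated tokens, e.g. "external nofollow".
        rel_tokens = (attrs.get("rel") or "").lower().split()
        if "nofollow" in rel_tokens:
            self.nofollow += 1

def count_links(comment_html):
    """Return (total links, nofollow links) for one comment body."""
    parser = LinkCounter()
    parser.feed(comment_html)
    parser.close()
    return parser.total, parser.nofollow

sample = '<a href="http://x.example" rel="nofollow">x</a> and <a href="http://y.example">y</a>'
print(count_links(sample))  # (2, 1)
```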

I did also notice that a number of the comment links are pharmaceutical or fashion links, and many of the pharma links appear to be from compromised sites to other compromised sites. It’s hard to break down the comment links easily, as sometimes there are just one or two but often several per post. Sometimes it’s HTML code with specific anchor text and other times just a straight link, but many of the comments are gibberish or spun comments.

These are not comments along the lines of “that’s a great article but I found this point incorrect and here is a reference link” or “I explored this in more detail over on my blog and here is my link”; these are obvious low quality link building & spam tactics.

Because of the sheer number of comments, there is a small chance that there are some real comments buried deep in there somewhere.  Please note that I did find a handful of suitable comments, but the author’s name was an anchor text keyword with a deep link done for SEO, and since the email was just a throwaway address I nuked the comments, as it’s my blog and my choice to publish the comment or not. These types of SEO motivated comments will hurt your site if you auto-publish them, as per Google’s user-generated content guidelines.

Insight into length of comment spam?

Looking at the data, the average comment spam post is around 450 characters including HTML code and links.  But if I break that down further, trackback/pingback spam is around 270 characters while comment spam is far longer at 540 characters.
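
As a quick sanity check, the overall average should sit between the two group averages, weighted by how common each type is. The Python sketch below assumes the roughly 33% of comments with no email (mentioned earlier as trackback/pingback spam) is a fair estimate of the trackback/pingback share; that split is my assumption rather than a separately measured figure.

```python
def weighted_mean(groups):
    """groups is a list of (weight, value) pairs."""
    total_weight = sum(weight for weight, _ in groups)
    return sum(weight * value for weight, value in groups) / total_weight

# Assumed split: ~33% trackback/pingback at 270 chars, ~67% comments at 540 chars.
overall = weighted_mean([(0.33, 270), (0.67, 540)])
print(round(overall))  # 451
```

That lands very close to the 450 character overall average, so the three figures hang together.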

The longest spam comment was 32,759 characters: a rambling comment about ebooks linked to an author name of “medical billing business” with a profile link to a Squidoo page. The longest trackback comment was 4,150 characters and contained HTML code, random user agent referral data and some links. It used the domain as the author’s name, and the profile link was to a Blogspot spam blog about fashion items masked by a .com domain.

The shortest spam comment was 0 characters and was linked to an author name “Elizabeth Macey Sageturema” with a profile link to a domain, boarda_info, that redirected to publications.usa.gov, posted from a Berlin IP address.  The shortest trackback/pingback comment was 29 characters and was linked to an author name “Ugg”, with a profile link to Alresco_com, which looks like an official page for Ugg Australia for Denmark, and it came from a Chinese IP in the city of Fuzhou.

I did notice a number of what seemed to be harmless comments, but with enough data you can see that the IP address has spammed before.  The reason spammers do this is that they hope once you approve the first comment they can then start spamming your blog without you noticing.  The default setting in WordPress only requires that the comment author has a previously approved comment, so WordPress could change this default to manual approval and stop a huge amount of spam instantly on any new WordPress blog.

Comment author must have a previously approved comment

What does their IP profile look like?

I found that there were 10,119 comments from 4,432 unique IP addresses, but the interesting insight is that 51.8% of all my comment spam originated from Asian IP addresses.  North American IPs were half as prominent, which is surprising considering the number of cheap IPs available that people usually use to mask their comment spam link building tactics.

Spammer IP Region

What is each region’s largest year in blog spam?

From the chart below, sorted from the highest number with Asia at the top down to the Caribbean with 1 IP address, you can see that spam is shifting towards Asia.  2012 showed a large number of static IPs that appear to have pretty much vanished as these IPv4 addresses started to be sold or taken back by ISPs and hosting providers.

Largest Years in Spam

What is the growth in blog spam?

I looked at the top 3 regions linked to spam and you can see the massive uplift for Asian IPs: so far the 2014 spam comments from Asia alone are already 22% higher than the entire spam comment count for 2013.  Total comment spam for the first 12 days of 2014 is actually 56% higher than for all of 2013, which has me kinda worried as this puts spam comments up from 8.86/day to 417.83/day.  If comment spam continues at this rate, it will put the combined comment spam for the two blogs at around 152,209 comments by the end of 2014.

Growth in Spam?
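
The projection is simple arithmetic, and it can be reproduced from the two daily rates quoted above. This Python sketch assumes a 365 day year and a constant daily rate; because the daily rates are rounded, the results land close to, but not exactly on, the figures in the post (a straight multiplication gives roughly 152,500 for 2014).

```python
DAILY_RATE_2013 = 8.86     # spam comments per day across 2013 (from the post)
DAILY_RATE_2014 = 417.83   # spam comments per day for the first 12 days of 2014

def project_year(daily_rate, days=365):
    """Project a yearly total from a constant daily rate."""
    return daily_rate * days

total_2013 = project_year(DAILY_RATE_2013)      # about 3,234 comments
first_12_days_2014 = DAILY_RATE_2014 * 12       # about 5,014 comments
projected_2014 = project_year(DAILY_RATE_2014)  # about 152,508 comments

print(round(total_2013), round(first_12_days_2014), round(projected_2014))
print(f"12 days of 2014 vs all of 2013: {first_12_days_2014 / total_2013 - 1:+.0%}")
```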

What are the top cities for blog spam?

The winner, with 28% of all spam in our data set, is Fuzhou, followed by Guangzhou (11%) and, surprisingly, Chicago (5%).  Some honorable mentions go to comment spam IPs from Perth (58.3% of Oceania’s spam) and Guarujá, Brazil (19.2% of South America’s spam).

If we break down this data for Europe only, the top cities are Roubaix, France (18%), Czempin, Poland (7%), Stockholm, Sweden (6%), Gunzenhausen, Germany (4%), Kiev, Ukraine (4%), Rivne, Ukraine (4%), Milan, Italy (3%), Bournemouth, England (3%), London, England (2%), Madrid, Spain (2%) and Rotterdam, Netherlands (2%).

If we break down this data for North America only the top cities are Chicago (18%), Kansas City (13%), Henderson (6%), Clarks Summit (5%), Miami (5%), Las Vegas (4%), Dallas (4%), Buffalo (3%), Phoenix (3%), Los Angeles (3%), Seattle (3%), Toronto (3%), Scranton (2%), San Jose (2%), Montreal (2%), Tampa (2%).

Top Spam Cities

So how is 2014 tracking so far?

Based on the data, it appears most North American cities will easily surpass their 2013 spam comment numbers, and the following cities have already aggressively stepped up spam in 2014.  Compared to their 2013 count, their 2014 count so far is already higher for Kansas City (+42%), Henderson (+31%), Miami (+216%), Las Vegas (+197%), Buffalo (+108%), Seattle (+107%) and Montreal (+56%).

It seems that spammers using European city IPs are changing their tactics: while I think several cities will pass their 2013 totals fairly easily, they don’t show the aggressive growth of the US spam.  The spam comments from European IPs are also much more spread out, with 191 different cities generating comment spam compared to only 142 for North America.  The European cities whose IP addresses have already passed their 2013 numbers are Bournemouth, England (+154%), Wroclaw, Poland (+40%), Manchester, England (+1%) and Bucharest, Romania (+350%).

So does this mean people are faking/spoofing their IPs to come from these cities? I’d be interested to hear from SEOs & bloggers who are based in these regions, target these regions or attract a lot of visitors from these regions: have you seen a similar trend?

Is there a best time for moderation?

I had thought that there would be a specific time of the day, but because of the global network of spammers the comment spam is spread out at around 4% each hour.  There is a large spike from before 3pm until after 4pm and also from 6am to after 7am, but no real downtime, with the lowest numbers from 11pm until around 1am GMT.

Time of Spam Comments (GMT)
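
A perfectly even spread would be 100 / 24, or about 4.2% per hour, which is why “around 4% each hour” means there is no real quiet period. The Python sketch below shows one way to build the hourly breakdown; it assumes each comment has a GMT timestamp string in the form YYYY-MM-DD HH:MM:SS, and the function name and sample timestamps are made up for illustration.

```python
from collections import Counter
from datetime import datetime

def hourly_share(timestamps, fmt="%Y-%m-%d %H:%M:%S"):
    """Return {hour: percent of comments} for all 24 hours of the day."""
    hours = Counter(datetime.strptime(ts, fmt).hour for ts in timestamps)
    total = sum(hours.values())
    if total == 0:
        return {}
    # Include hours with no comments so gaps show up as 0.0 rather than missing.
    return {hour: round(100 * hours[hour] / total, 1) for hour in range(24)}

# Hypothetical sample timestamps, not taken from the real data set.
sample = [
    "2014-01-05 15:10:00",
    "2014-01-05 15:45:12",
    "2014-01-06 06:30:00",
    "2014-01-06 23:59:59",
]
shares = hourly_share(sample)
print(shares[15], shares[6], shares[0])  # 50.0 25.0 0.0
```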

What browsers are spammers using?

It seems that for 89% of spam comments the browser can’t be easily identified, which indicates that a vast amount of the spam is automated via bots and software.  I did notice a small number of known malware agents also placing comments and a tiny number of known botnets spamming the blogs.

Browser Software Used

Solutions to reduce comment spam?

This post focused on WordPress comment spam, so if you are using WordPress make sure your site is secure: read my Better Protect WordPress guide, make sure your CMS version is updated/patched regularly, and use a strong password.  Make sure you have turned off auto approvals so that a comment must be manually approved before it appears, force comment authors to fill out a name and email, and consider closing comments on articles older than a few months. There are plenty of other free solutions for personal blogs, such as Akismet and Cloudflare, that can help slow the torrent of spam or block the IPs from regions that are common for spammers.

Edit: I’ve added an infographic Akismet created at the end of 2013 and linked to their 2013 review on spam. They are also showing a massive rise in spam comments and are catching on average around 200,000,000 spam comments each day, which is a 40% rise on 2012 numbers.

2013-year-in-review

Thank you for reading this far!

I hope that you enjoyed this post. I welcome your comments, and hopefully there won’t be too many negative ones from people who use this tactic for their link building or promotional methods.  Feel free to leave a comment below and I’ll respond, or delete it if it’s automated spam.

35 Replies to “Insight into Blog Spam”

  • Viagra!

    Just kidding… Awesome post, Dave. Crazy to see the huge uplift in spam from Asia. I guess it’s not surprising when you consider the massive growth they’re experiencing in the web. Did I read that graph right when I see ~1000%+ growth in spam from Asia?

    Did you look at spam to ham ratios? Even though Mail.RU ranked low in the total spam comments, I would imagine its ham ratio is quite low.

    • Thanks Rob,

      Ah yes I must say I was shocked but China is certainly taking the lead when it comes to spam which is quite alarming on current growth rates. I think I approved maybe 5-10 comments during that period for each blog which is a little sad, but the only thing that this data doesn’t show is how many were blocked by Akismet or Cloudflare and never showed in the data.

      I booked marked this comment and will come here often…. ugg boots…

  • Wonderful article 🙂

    I would love to grab your data and run some analysis on the effectiveness of the spam, for example taking a sample of 30 sites and checking to see if they have achieved anything in the SERPS.

    Why the surge from China I wonder?

    Steve

    • Thanks Steve, it took part of Saturday night for initial data gathering and some research, and Sunday for writing it all up.

      I did wonder about posting which domains were being spammed via the author URL or the link drops in the comments, but I expected some false positives might be hidden in there. I did notice a lot of hacked sites and churn and burn domains. I’ll consider adding a link to the data from this article if there is enough interest.

      I think the surge in China is down to cheaper staff for link building and some developers building software tools to automate the whole process.

      David

  • Great post! I’m pretty sure not all comment spam has to do with SEO. Many spammers promote affiliate products, and just want to get clicks that they can cookie to generate affiliate commissions. Some sites make money from pageviews as well, and if it costs next to nothing to post 10,000 spam comments to get 100 pageviews, then it’s worth it for the spammers. Some spammers are also clever and will post a comment without a link to start off with, then post follow-up comments with links if the first comment was approved (to get around WordPress’ default comment spam behavior).

    Anyway, interesting analysis.

    • Thank you Takeshi, yes SEO is certainly a shrinking part of comment spam as many of the links dropped now carry NoFollow tags. Yes you are completely right that it takes almost nothing to post 10,000 or 100,000 spam comments, and some affiliate programs don’t care where you get the traffic from. So part of the issue might also be that affiliate programs make it so financially beneficial to spam blogs to get as many pageviews/clicks as they can.

      The other big problem I see is that many bloggers aren’t so strict on comments and will often approve what is obviously a spam comment just to lift the amount of comments on a post and even try and respond to the spam comment or just auto-approve everything.

      Yes the post something nice and then get past block one so you can spam the heck out of their blog is a nasty little trick but also one far too easy for spammers to resist.

    • Thank you Felix, yes but I was surprised both at the amount and the growth of spam comments from Asia. Bigger problem is that it seems there is also a growth in link building being outsourced to there and I don’t think any quality controls are in place based on what I’m seeing in the data.

  • Great post David, very interesting stuff.

    WordPress comment SPAM is getting ridiculously out of hand. If I were to turn off SPAM protection on my main blog I would easily see 1000 comments every day.

    What really baffles me is that the attempts spammers are making are getting worse, not better. Spintax is making less and less sense. Some spammers have gotten really savvy and are using gravatar connected email addresses which lends some believability but that is a small percentage.

    10k comments is such a small drop in the bucket when it comes to SPAM. Most 10k comments can be pumped out in under 10 minutes by 1 individual with a desktop PC on a DSL line. I’d love to see this same study with data from say Akismet (not at all discounting your efforts)

    Also regarding IP it should be noted that a lot of spammers either use open proxy servers, hacked proxy servers or a subscription based proxy service so location data is really hard to actually pinpoint. At the same time, just guessing Russia and China to be the winners of that race on a global scale.

    • Thank you Patrick very kind of you.

      I do wonder if WordPress really cares about the problem enough to make a decent effort to lock down some of these common exploits. So much of it’s automated without the visitor actually even viewing the post.

      Oh yes some of the quality of the comments can’t even be blamed on Google translate they are horrible. Here is just a few examples of how bad they are:
      Hey, kliler job on that one you guys!
      So excited I found this aritcle as it made things much quicker!
      A wonderful job. Super helpful ifnormation.
      Would you be enthusiastic about exchanging hyperlinks?
      I like your article, thanks! I also recommend (censored) website submitter.
      God, I feel like I souhld be takin notes! Great work
      Times are chagning for the better if I can get this online!
      Touchdown! That’s a really cool way of pttuing it!
      Your article was ecxlleent and erudite.

      Maybe WordPress could add a filter so that if a % of the words in a comment are misspelt, it will block the commenter from posting ever again?

      I do admit a vast amount of these are flagged instantly as spam but I do check once in a while as there can be false positives but even sorting/filtering through the spam is too slow and manual and could be improved.

      Yes I would expect larger blogs would be getting an insanely higher amount of comment spam and I did think about doing a followup test with 100,000, 1 million or 10 million comments and see how the data changes. The problem is that working with 25 different data points with just 10,000 comments was awkward in excel so scaling it up would require me to test fewer elements but I think I could make it scale.

      Ah yes, good point on the open/hacked proxies, I’m guessing that is one big caveat on the location data. But I would have thought many more spammers would be trying to hide that their location is China, not promote it, if they are doing link building for Australian or English domains?

      • I’ve wondered that as well. WP releases all of these major fixes and releases to the framework but is more or less ignoring the fact that hundreds of millions of blog comments and trackbacks are being created every day.

        I mean, think about the amount of sheer electricity being used!
        Would be happy to contribute all of my spam data to you as well.

        Yes your last paragraph re: Chinese / English clients does make sense, but I think (based on my own analytical data) that a lot of them are promoting their own affiliate sites. These are the last few anchor texts from my SPAM queue

        michael kors
        burberry watches
        budget vps
        beats by dr dre
        casino
        garcinia cambogia
        make money
        buy X followers

        Lately it seems like spammers are just pumping up parasite sites, web 2.0s or just doing pump and dumps on domains.

        • Oh yes I did see some very interesting things with the higher level spam comments that actually seemed to promote multiple clients or sites in the one email. Wow your spammers are much more focused many of mine are really pushing for the long tail link building, note that I stripped out the trackback spam that seems purely focused on longer tail with 3-5 keywords in anchor.

          cloud
          car insurance
          nike nike blazer
          Tassen Louis Vuitton
          homepage
          carsters_net
          canada goose
          the latest louis vuitton handbags
          Célèbre Femmes Parajumpers Kodiak Vert Veste
          nike air max bw classic
          coupon code for iherb
          directorytelevision_com
          chaussure basket ball
          http://www.kopskorrea.com
          dog training
          cheat codes for cityville
          sandersonandcompany
          chaussur air max
          air max bleu

          I’ll reach out to you in a few weeks on the spam data once I chat to a few other folks that were interested in adding some of theirs and once I finish the other backlog of posts I have on my plan.

  • I have worked in B2B portals where 2 million companies are listed, and we observed that our paid and free listing companies were getting spam links and comments from China.

    • Thanks Anoop for the insight, I didn’t find an easy way to filter the data to find out how many spam links in the author bio pointed to portals and directories. But I did see a bit of profile link building in the URLs so I’m sure it would be there in similar numbers if not higher.

  • Excellent analysis David, and I was also surprised to see the rapid growth in comment SPAM from Asia. It’s simply dwarfing the western hemisphere in volume, and I doubt it will let up in the next few years given the volume of population in Asia and especially China.

    RE: the 89% comments with no browser – which botnets and such did you identify? Also, is there any pattern that can be tracked from automated blog spam solutions like ScrapeBox? We don’t use it, but I’ve had colleagues insist it is undetectable if set up properly. We only operate within strict White Hat SEO boundaries, so this is mainly an informational question (so I can spot them on it before they get themselves into trouble).

    For my own website, which is also in WordPress, I managed to reduce the spam on the blog by using all 3 of the following:
    1. Akismet – in the early days, this was enough. After a couple of years blogging, though, it missed a lot.
    2. An .htaccess script that redirected no-referrer form submissions back to their own IP address. This one helped a lot, for a while. Even with the #3 below, we left this one live to discourage automated comment Spam.
    3. Move to Disqus. We finally gave up on the whole thing and decided to focus all comments on the discussion only. This probably discourages some real commenters from chiming in since there’s no link (unless they drop one in the comment itself), but we hardly ever have to worry about spam comments any more.

    Keep up the great work,
    Tommy

    • Thanks Tommy, very much appreciated your support and extensive response.

      Cloudflare partly confirmed they are seeing similar things on Twitter but waiting for an official comment from their team.

      I flagged Hotbar, Botnet (actually called that) and a few others, as the types of comments they left and the details they passed across into WordPress seemed to have a pattern. ScrapeBox can have a pattern, but yes, if you invest enough time and tweak the settings it can be harder to detect within the WordPress comment management interface, though more sophisticated sites can pick it up… everything leaves a footprint.

      If they are using rented IP addresses and the provider finds out they are running ScrapeBox, they might find their access revoked and themselves banned from using the service. Most of the proxy providers seem to keep a closer eye on behaviours than blog owners.

      I do have some research on moving to a different comment platform, probably one of those things I’m going to look at in the next few weeks. But yes kicking around a few just need some time to review options and get it implemented.

      Thanks again and glad you enjoyed the post
      David

  • Admiring the time and effort you put into your blog and in depth information you offer.

    It’s awesome to come across a blog every once in a while that isn’t the same old rehashed information.
    Excellent read! I’ve bookmarked your site and I’m adding your RSS feeds
    to my Google account.

    Admin Note: any questions about spam comment above please contact jeanna.menhennitt@hotmail.de

  • surely as if your website however you must confirm the spelling on several of your respective content. Some of options are filled using transliteration troubles we to uncover that really problematic to see the reality however I most certainly will definitely go back all over again.

    Editors Note: Thanks for the comment spam

  • I just now can’t disappear your site ahead of indicating which i definitely enjoyed the most common details anyone present for your friends? Will be returning incessantly to look into cross-check completely new posts

    Editors Note: Thanks for the comment spam

  • It’s a very handy item of data. My business is content you distributed this convenient information and facts here. Be sure to stop us up to date in this way. Thank you expressing.

    Editors Note: Thanks for the comment spam

  • I want to show some appreciation to this writer for rescuing me from this type of scenario. After surfing through the world-wide-web and seeing concepts that were not powerful, I assumed my life was done. Existing without the approaches to the problems you have solved by means of your good short post is a critical case, as well as those which could have in a negative way affected my career if I hadn’t encountered your website. Your main knowledge and kindness in maneuvering a lot of stuff was priceless. I don’t know what I would’ve done if I hadn’t come upon such a point like this. I am able to at this time look ahead to my future. Thanks so much for the specialized and effective guide. I will not be reluctant to refer your web sites to any individual who should receive direction on this subject.

    Admin Comment – Akismet cleared this spam comment (this concerns me) any questions email the original poster InmonIglesia87@googlemail.com who uses IP 188.129.63.41

  • Akismet no longer does it for me. It approves too many spam comments. I use a no hash plugin and this does a much better job because it actually detects how most spam bots work. Popular spam bots fail a hash check. Another common blog comment spam problem is the use of SPUN comment text. Ridiculous. Anyway, great post. The China spam observation is an eye opener. For sure!

    • Thanks Geno that’s actually a fairly decent tip, it’s something that I recall developers mentioning to me back a few years ago but certainly something I have to explore again. Great you enjoyed the post.

  • I’m extremely impressed with your writing skills and also with the layout on your
    weblog. Is this a paid theme or did you customize it yourself?
    Either way keep up the nice quality writing, it’s rare to see a great blog like this
    one nowadays.

    Editor Note: Any questions about this spam comment contact poster > clinton_ridley@freenet.de
