Insight into Blog Spam
Everyone hates blog spam, whether it’s comment or trackback spam, but I thought I would look into it to see if there are any common elements, and yes, I’m amazed how much it’s still used. So I gathered some comment data from this blog and also from SEOMeetups.com.au, which is regularly spammed as people try to build links and drive referral traffic back to their affiliate links, social profiles, YouTube clips, clients’ websites and their own sites.
I managed to export and filter 10,119 spam comments and started playing around with the data in Excel to see what insights I could learn from blog spam left from the 31st July 2011 until the 12th January 2014.
I explored the following items of interest:
- Did the spammer have an email?
- What email service did they use?
- What was their email domain extension?
- What was their author URL on the comment?
- Were there any targeted deep linking campaigns?
- How many comments also contained links?
- What was the average spam comment length?
- What does the spam commenters’ IP profile look like?
- What are the largest regions for spam?
- Is there growth in comment spam?
- Top cities by IP for comment spam
- Is there a best time for moderation?
- Top spam browsers
Do spammers always use email?
It seems that around 33% of the time they don’t leave an email address, and that’s because it’s trackback or pingback spam. There are plugins that can stop the bulk of this, and they are already in place, but there is so much spam that a percentage still gets through, so maybe I can find a better spam solution.
What email platform do spammers often use?
Note that this is not perfect, as spammers can easily leave fake/junk emails, but Gmail appears to be the preferred platform. I found that 58.2% of all spammers were using Gmail. I assume part of the favoritism is that you can “send as” and easily forward emails from the spam Gmail account to their real accounts if they really wanted. The ease of creating a churn and burn Gmail account is a big part of the problem, as Yahoo (5%), AOL (4%) and Hotmail (2.8%) trail well behind. I’ve noticed that there is a long tail of 467 domains, including many churn & burn domains, that make up the remaining 29.9% of spam emails.
What is the spammers’ email domain extension?
So what is the most popular domain extension that spammers are using? Once I removed “blank emails” I found that .com extensions were linked to around 82.6% of all spam, but surprisingly .de represented 11.3% of spam. The other extensions were .net (2.8%), .org (0.9%), .co.uk (0.5%), .ru (0.3%), .pl (0.2%) and .es (0.2%), with the remaining 28 extensions only representing 1.2% of all spam.
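For anyone wanting to repeat the email breakdowns above outside Excel, here is a minimal Python sketch. It is not how the numbers in this post were produced. The function names and sample addresses are made up for illustration, and it assumes the extension is simply the last label of the domain, so .co.uk would be counted as .uk unless you special-case it.

```python
from collections import Counter


def email_domain(email):
    """Return the lower-cased domain of an email, or None if blank/malformed."""
    email = (email or "").strip().lower()
    if "@" not in email:
        return None
    return email.rsplit("@", 1)[1] or None


def extension(domain):
    """Last label of the domain, e.g. 'com' or 'de' ('co.uk' becomes 'uk')."""
    return domain.rsplit(".", 1)[-1] if domain else None


def share_by(values):
    """Map each distinct value to its percentage share, skipping None (blank emails)."""
    counts = Counter(v for v in values if v is not None)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {key: round(100 * n / total, 1) for key, n in counts.most_common()}


# Hypothetical sample rows; a real run would read the exported comment emails.
sample_emails = ["a@gmail.com", "b@GMAIL.com", "", "c@yahoo.de", "d@gmail.com"]
domains = [email_domain(e) for e in sample_emails]

print(share_by(domains))                        # share per email provider
print(share_by(extension(d) for d in domains))  # share per domain extension
```

Blank emails are dropped before the percentages are calculated, which matches the “once I removed blank emails” step described above.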
What is the comment spam URL?
It’s interesting that 1.59% of the Author URLs were for https:// and 0.79% were left blank. Looking at the stats below, Penguin doesn’t appear to have greatly changed spam link builders’ strategies, as they are still pushing for deep links. Most of the oldest links are to domains; it was only from the start of 2012 that the majority of tactics switched to deep linking. Over the entire period of data, 65% of the link building that was tried was deep linking, and 12 attempts actually just failed because they were so low quality that they added a broken link.
What about targeted campaigns?
Around 10% of all deep links could be easily categorised into targeted campaigns to promote YouTube videos, increase link authority to Wiki spam, profile URLs and, interestingly, social profiles. As you can see from the graph below, a number of these made up less than 1% of deep link spam, and it’s great to see that SEOs might have finally given up on Squidoo. The last comment spam to promote a Squidoo URL was on the 14th April 2013 from a North American IP address, and it seems European spammers gave up on spamming links to the Squidoo platform at the start of 2013. This data doesn’t include the various types of links that were dropped into the comment; it just includes links left in the Author profile URL.
How many comments also contained links?
Yes, it seems that around 21% of comment spam still contains links, and 7.6% of those comments with spam links were generated by trackbacks and pingbacks. Of the trackback/pingback spam with links in the comments, I found 90% were traced back to Asian IP addresses, which suggests more automated techniques are used by Asian link builders. If you look at comment spam with links, Asia still leads (65.2%), followed by North America (16.5%) and Europe (14.1%).
I only found 1 Australian IP address was used, but I found 13 comments dropping links that had an Australian domain in the Author URL, using 11 North American IPs and 2 Asian IPs. Looking at the general trend with link dropping, it seems the spammers are getting smarter and are now adding NoFollow links to the comment spam so they get the referral traffic with a potentially lower chance of a Penguin penalty. 72.1% of the links dropped carry the NoFollow tag, and in the chart below you can see this tactic spiked aggressively in 2014, following both domains increasing their SEO authority and the number of blog posts that could be spammed.
I did also notice that a number of the comment links are pharmaceutical or fashion links, and many of the pharma links appear to be from compromised sites to other compromised sites. It’s hard to break down the comment links easily, as sometimes there are just one or two but often several per post; sometimes it’s HTML code with specific anchor text and other times just a straight link, but many of the comments are gibberish or spun comments.
These are not comments along the lines of “that’s a great article but I found this point incorrect and here is a reference link” or “I explored this in more detail over on my blog and here is my link”; these are obvious low quality link building & spam tactics.
Because of the sheer number of comments there is a small chance that there are some real comments buried deep in there somewhere. Please note that I did find a handful of suitable comments, but the author’s name was an anchor text keyword with a deep link done for SEO, and since the email was just a throwaway address I nuked the comments, as it’s my blog and my choice to publish the comment or not. These types of SEO-motivated comments will hurt your site if you auto-publish them, as per Google’s user-generated content guidelines.
Insight into length of comment spam?
Looking at the data, the average comment spam post is around 450 characters including HTML code and links. But if I break that down further, trackback/pingback spam is around 270 characters while comment spam is far longer at 540 characters.
The longest spam comment was 32,759 characters, a rambling comment about ebooks linked to an author name “medical billing business”, and the profile link was to a Squidoo page. The longest trackback comment was 4,150 characters; it contained HTML code, random user agent referral data and some links, used the domain as the author’s name, and the profile link was to a Blogspot spam blog about fashion items masked by a .com domain.
The shortest spam comment was 0 characters and was linked to an author name “Elizabeth Macey Sageturema” and the domain boarda_info, which redirected to publications.usa.gov, and it came from a Berlin IP address. The shortest trackback/pingback comment was 29 characters and was linked to an author name “Ugg”; the profile link was to Alresco_com, which looks like an official page for Ugg Australia for Denmark, and the comment came from a Chinese IP in the city of Fuzhou.
I did notice a number of what seemed to be harmless comments, but with enough data you can see that the IP address has spammed before. The reason spammers do this is that they hope once you approve the first comment they can then start spamming your blog without you noticing. The default WordPress setting only requires a commenter to have one previously approved comment, so WordPress could change this default to manual approval for every comment and stop a huge amount of spam instantly on any new WordPress blog.
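The “this IP has spammed before” check can be sketched in a few lines of Python. This is only an illustration under my own assumptions: the dictionary keys, function names and IP addresses (taken from the documentation ranges) are hypothetical and not part of WordPress or any plugin.

```python
def has_spammed_before(ip, known_spam_ips):
    """True if this IP already appears in the set of IPs from earlier spam comments."""
    return ip.strip() in known_spam_ips


def hold_for_moderation(comment, known_spam_ips):
    """Hold a comment whenever its IP has a spam history, however harmless the text looks."""
    return has_spammed_before(comment["ip"], known_spam_ips)


# Hypothetical spam history built from previously rejected comments.
known_spam_ips = {"203.0.113.7", "198.51.100.23"}

harmless_looking = {"author": "Sam", "ip": "203.0.113.7", "text": "Nice post, thanks!"}
new_visitor = {"author": "Alex", "ip": "192.0.2.44", "text": "Nice post, thanks!"}

print(hold_for_moderation(harmless_looking, known_spam_ips))
print(hold_for_moderation(new_visitor, known_spam_ips))
```

Both sample comments have identical text, so only the IP history separates them, which is the point of the check.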
What does their IP profile look like?
I found that there were 10,119 comments from 4,432 unique IP addresses, but the interesting insight is that 51.8% of all my comment spam originated from Asian IP addresses. North American IPs were half as prominent, which is surprising considering the number of cheap IPs available that people usually use to mask their comment spam link building tactics.
What is each region’s largest year in blog spam?
From the chart below, sorted from the highest number with Asia at the top down to the Caribbean with 1 IP address, you can see that spam is shifting towards Asia. 2012 showed a large number of static IPs that appear to have pretty much vanished as these IPv4 addresses started to be sold or taken back by ISPs and hosting providers.
I looked at the top 3 regions linked to spam and you can see the massive uplift for Asian IPs; so far the 2014 spam comments from Asia alone are already 22% higher than all spam comments for 2013. The comment spam for the first 12 days of 2014 is actually 56% higher than for all of 2013, which has me kinda worried as this puts spam comments up from 8.86/day to 417.83/day. If comment spam continues at this rate, it will put the combined comment spam for the two blogs at around 152,209 comments by the end of 2014.
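The growth arithmetic can be re-run from the two daily rates alone. This is a rough sketch, assuming a 365-day year and the rounded rates quoted above; the function names are made up for illustration. Because the daily rates are rounded to two decimals, the results should only be expected to land near the 56% and 152,209 figures, not to match them exactly.

```python
DAYS_IN_YEAR = 365  # assumption: neither 2013 nor 2014 is a leap year


def total_for(rate_per_day, days=DAYS_IN_YEAR):
    """Total comments produced by a constant daily rate over a number of days."""
    return rate_per_day * days


def percent_higher(new, old):
    """How much higher `new` is than `old`, as a percentage."""
    return (new / old - 1) * 100


rate_2013 = 8.86    # spam comments per day across 2013 (from the post)
rate_2014 = 417.83  # spam comments per day for the first 12 days of 2014

total_2013 = total_for(rate_2013)
first_12_days_2014 = total_for(rate_2014, days=12)
uplift = percent_higher(first_12_days_2014, total_2013)
projection_2014 = total_for(rate_2014)

print(f"2013 total: {total_2013:.0f}")
print(f"First 12 days of 2014: {first_12_days_2014:.0f} ({uplift:.0f}% higher)")
print(f"Projected 2014 total at this rate: {projection_2014:.0f}")
```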
What are the top cities for blog spam?
The winner, with 28% of all spam in our data set, is Fuzhou, followed by Guangzhou (11%) and, surprisingly, Chicago (5%). Some honorable mentions go to comment spam IPs from Perth (58.3% of Oceania’s spam) and Guarujá, Brazil (19.2% of South America’s spam).
If we break this data down for Europe only, the top cities are Roubaix, France (18%), Czempin, Poland (7%), Stockholm, Sweden (6%), Gunzenhausen, Germany (4%), Kiev, Ukraine (4%), Rivne, Ukraine (4%), Milan, Italy (3%), Bournemouth, England (3%), London, England (2%), Madrid, Spain (2%) and Rotterdam, Netherlands (2%).
If we break this data down for North America only, the top cities are Chicago (18%), Kansas City (13%), Henderson (6%), Clarks Summit (5%), Miami (5%), Las Vegas (4%), Dallas (4%), Buffalo (3%), Phoenix (3%), Los Angeles (3%), Seattle (3%), Toronto (3%), Scranton (2%), San Jose (2%), Montreal (2%) and Tampa (2%).
So how is 2014 tracking so far?
Based on the data, it appears most US cities will easily surpass their 2013 spam comment numbers, and the following cities have already aggressively stepped up spam in 2014. Compared to their 2013 counts, this is how much higher their current 2014 counts already are: Kansas City (+42%), Henderson (+31%), Miami (+216%), Las Vegas (+197%), Buffalo (+108%), Seattle (+107%) and Montreal (+56%).
It seems that spammers using IPs from European cities are changing their tactics: while I think several will pass their 2013 numbers fairly easily, they don’t show the aggressive growth of the US spam. The spam comments from European IPs are also much more spread out, with 191 different cities generating comment spam compared to only 142 for North America. The European cities whose IP addresses have already passed their 2013 numbers are Bournemouth, England (+154%), Wroclaw, Poland (+40%), Manchester, England (+1%) and Bucharest, Romania (+350%).
So does this mean people are faking/spoofing their IPs to come from these cities? I’d be interested to hear from SEOs & bloggers who are in these regions, target these regions or attract a lot of visitors from these regions: have you seen a similar trend?
Is there a best time for moderation?
I had thought that there would be a specific time of the day, but because of the global network of spammers it seems the comment spam is spread out at around 4% each hour. There is a large spike from before 3pm until after 4pm and also from 6am to after 7am, but no real downtime, with the lowest numbers from 11pm until around 1am GMT.
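A perfectly even spread over 24 hours would put 100 / 24, or roughly 4.2%, of comments in each hour, so the “around 4%” figure is what a flat distribution looks like. Below is a minimal Python sketch of how an hourly breakdown like this could be built. It assumes each comment has a timezone-aware timestamp; the function names and sample timestamps are hypothetical.

```python
from collections import Counter
from datetime import datetime, timedelta, timezone


def hourly_share(timestamps):
    """Percentage of comments falling in each GMT hour (0-23).

    Timestamps must be timezone-aware so they can be converted to GMT/UTC.
    """
    counts = Counter(ts.astimezone(timezone.utc).hour for ts in timestamps)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {hour: round(100 * counts.get(hour, 0) / total, 1) for hour in range(24)}


def busiest_hours(shares, n=3):
    """The n GMT hours with the largest share of comments."""
    return sorted(shares, key=shares.get, reverse=True)[:n]


# Hypothetical sample; a real run would use the exported comment dates.
gmt_minus_1 = timezone(timedelta(hours=-1))
stamps = [
    datetime(2014, 1, 5, 15, 10, tzinfo=timezone.utc),
    datetime(2014, 1, 5, 15, 40, tzinfo=timezone.utc),
    datetime(2014, 1, 6, 6, 5, tzinfo=timezone.utc),
    datetime(2014, 1, 6, 23, 0, tzinfo=gmt_minus_1),  # midnight in GMT
]

shares = hourly_share(stamps)
print(busiest_hours(shares, n=1))
```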
What browsers are spammers using?
It seems that the browsers behind 89% of spam comments can’t be easily identified, which indicates that a vast amount of the spam is automated via bots and software. I did notice a small number of known malware agents also placing comments and a tiny number of known botnets spamming the blogs.
Solutions to reduce comment spam?
This post focused on WordPress comment spam, so if you are using WordPress make sure your site is secure: read my Better Protect WordPress guide, make sure your CMS version is updated/patched regularly, and use a strong password. Also make sure you have turned off auto approvals so that a comment must be manually approved before it appears, force comment authors to fill out a name and email, and consider closing comments on articles older than a few months. There are plenty of other free solutions for personal blogs, such as Akismet and Cloudflare, that can help slow the torrent of spam or block the IPs from regions that are common for spammers.
Edit: I’ve added an infographic Akismet created at the end of 2013 and linked to their 2013 review on spam. They are also showing a massive rise in spam comments and are catching on average around 200,000,000 spam comments each day, which is a 40% rise on 2012 numbers.
Thank you for reading this far!
I hope that you enjoyed this post and welcome your comments and hopefully not too many negative comments from people that use this tactic for their link building or promotional methods. Feel free to leave a comment below and I’ll respond or delete it if it’s automated spam.