This quick post explains how to stop the notorious site scrapers, RSSing.com, from stealing your content. In fact, this technique can be used to stop virtually any site that uses HTML frames to scrape your pages. Once again, the solution is one line of .htaccess to the rescue.
Readers reach out..
Recently a reader asked about stopping RSSing.com from stealing their content:
Do you have anything or even have any interest in building anything that stops the feed scraper RSSing.com? I notice they’ve got some channels going on you, too. […] Google “Perishable Press + Rssing.com” for a typical Google listing. I discovered your channel listings by doing so looking to see if you already had a script out.
People have been stealing my content for over 10 years now, so I’m very used to it. Still, I think it’s bad practice, so I decided to pop on over to the alleged site and check it out for myself. Sure enough, there are over 30 article summaries posted, each of which links to a framed version of the complete article. And it’s not just this site: some of my other sites are scraped as well.
Who/what is RSSing.com
So what is RSSing.com? Who cares. Apparently it’s just another site that likes to steal other people’s content instead of doing something unique or helpful. It doesn’t matter, really, and honestly I’m not even going to block them because I can always use the extra traffic. And besides they don’t outrank me on anything important so double no cares given. I’m sharing this information for my readers and to help fellow seekers of useful security techniques.
Maybe first try asking..
Before pulling out the big guns, maybe first try just “asking” the RSSing folks to kindly stop stealing your stuff. They even have a contact form all set up for this very purpose. Not sure if they honor all requests immediately or what, so if you happen to have experience with this strategy, please share in the comment section. Here is a screenshot to help you find it:
FWIW IMHO they’re the ones who should be asking to use your content in the first place. Not the other way around. Putting the burden on everyone else is just not cool. Anyone who assumes that everyone wants their content to be stolen is utterly clueless.
Knock ’em dead (kid)
If you’re reading this, I assume you want to block RSSing from framing your content. The first thing to understand is that they are using two different methods to scrape:
- They scrape and post excerpts directly from your feed (cached in their database)
- They scrape your full post content via HTML frames (not cached in their database)
So the scraping via feed excerpt is not such a huge deal, and it really is difficult to prevent since they are housing your content in their own database. Anyone who publishes their content via RSS feed is subject to this sort of thing. Nonetheless, if you are serious about stopping lowlifes from stealing your feed content, check out my article How to Deal with Content Scrapers. The framing, on the other hand, is easy to stop. Just add the following snippet to your site’s .htaccess file:
# break out of frames
<IfModule mod_headers.c>
	Header always append X-Frame-Options SAMEORIGIN
</IfModule>
That little snippet tells the server to include an X-Frame-Options header along with responses to all requests. The value of this header is SAMEORIGIN, which means that any frame request that does not originate from your domain will be blocked. So you can use HTML frames all day long if they originate from your own site. All other domains, however, will not be able to frame your pages. That is, until some clever lazy content thief figures out a way to bypass the restriction. So apply and be done, but keep an eye on things and stay vigilant.
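As a hedged aside, browsers that support Content Security Policy treat the frame-ancestors directive as the modern counterpart of X-Frame-Options. The following sketch is not part of the original technique. It assumes mod_headers is enabled and that your site does not already send its own Content-Security-Policy header (if it does, fold the directive into your existing policy instead of adding this rule):

```apache
# Assumption: optional companion to the X-Frame-Options rule,
# for browsers that honor the CSP frame-ancestors directive
<IfModule mod_headers.c>
	# 'self' permits framing only by pages from the same origin
	Header always set Content-Security-Policy "frame-ancestors 'self'"
</IfModule>
```

Sending both headers is a reasonable belt-and-suspenders setup: older browsers fall back on X-Frame-Options, while newer ones give precedence to frame-ancestors when it is present.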
So for now, it’s bye-bye RSSing.com and bye-bye content framing in general.
For those who are wondering about the effect of the previous .htaccess technique, here is a screenshot showing how my scraped pages were displayed at RSSing.com before applying the prescribed snippet:
And here is a screenshot showing how my scraped pages were displayed at RSSing.com after applying the .htaccess snippet:
As mentioned before, I am not blocking RSSing from framing my content. These screenshots are for demonstration purposes only. Basically, if you employ the previous .htaccess technique, every RSSing page that frames your content will display a blank white page inside the frame. That should go a long way toward keeping Google from ranking the framing pages above your own.