Detecting when your blog posts get censored by Google (or any search engine) 2012-09-25

Governments and companies keep asking Google to “forget” certain URLs, with the result that millions of URLs are removed from the search index each month, according to Google itself (see links earlier).

Now if you happen to blog about a risky topic, your blog posts (or any other kind of web page) may be removed from the Google search index without prior notice. So you may want to know whether (some of) your content can still be found easily.

My approach would be to

  1. Generate a random checksum (e.g. a SHA-1 hash of random data, see below)
  2. Make sure that this checksum does not get any hits on Google yet
  3. Embed the checksum somewhere in the post, maybe at the front or the very end
  4. Search for that checksum every few days
  5. If the results show your post, it must still be in the search index, i.e. it has not been censored
  6. (Automate the previous two steps)
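Steps 4 and 5 could be automated roughly as follows. This is only a sketch under assumptions: `search` is a hypothetical callable that you would have to supply yourself (it takes a query string and returns the URLs of the hits), and the function names and example URLs are made up for illustration. How to query a particular search engine is deliberately left open.

```python
from typing import Callable, Dict, Iterable, List

# Hypothetical type: takes a query string, returns the URLs of the hits.
SearchFn = Callable[[str], Iterable[str]]


def post_is_indexed(token: str, post_url: str, search: SearchFn) -> bool:
    """Return True if searching for the token yields the post's URL."""
    # Quote the token so the engine is asked for the exact string.
    hits = search(f'"{token}"')
    wanted = post_url.rstrip("/")
    # Ignore a trailing slash when comparing URLs.
    return any(hit.rstrip("/") == wanted for hit in hits)


def missing_posts(posts: Dict[str, str], search: SearchFn) -> List[str]:
    """Return the URLs of all posts whose token no longer finds them.

    `posts` maps each post URL to the token embedded in that post.
    """
    return [url for url, token in posts.items()
            if not post_is_indexed(token, url, search)]
```

Run `missing_posts` from a scheduled job every few days and alert on a non-empty result. Note that a missing hit only says the post was not found; a fresh post that has not been crawled yet looks the same as a removed one.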

On Linux I run

# cat /proc/sys/kernel/random/uuid | sha1sum
8f6a8cfc66bc3523eac19b1402568bc2ae7950ae -

to make a checksum for this very blog post. As it’s already part of the post, I don’t need to add it at the end once more, neat :-)
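On systems without /proc, the same kind of checksum could be produced in Python. This is a sketch, not the command used for this post: `make_token` is a name made up here, and `uuid.uuid4()` stands in for the kernel's random UUID file.

```python
import hashlib
import uuid


def make_token() -> str:
    """Return 40 hex characters: the SHA-1 of a random UUID."""
    # The kernel file yields the UUID as text followed by a newline,
    # so the same shape of input is hashed here.
    random_text = f"{uuid.uuid4()}\n"
    return hashlib.sha1(random_text.encode("ascii")).hexdigest()


if __name__ == "__main__":
    print(make_token())
```

The SHA-1 step adds no randomness; it merely turns the UUID into a 40-character hex string. `secrets.token_hex(20)` from the standard library would give a string of the same length directly.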

I hope this technique works for someone. Good luck.

The Detecting when your blog posts get censored by Google (or any search engine) by Sebastian Pipping, unless otherwise expressly stated, is licensed under a Creative Commons Attribution-ShareAlike 3.0 Unported License.

3 Comments
avx September 26th, 2012

Nice theory, but imho too complicated and error-prone. AFAIK it’s not guaranteed that Google (and others) really pick up everything in a post – at least that is what happened to me.

Why not simply generate a random string and put it directly in the URL?

sping September 27th, 2012

If you have/take the liberty to make ugly long URLs, that may work well. Thanks for bringing it up.

avx September 28th, 2012

Why should it be ugly? The simplest way would be to use some post ID, which is usually static and provided by (e.g.) WordPress and Drupal.

I’ve got my URLs set up as: domain.tld/$id/$yyyy/$mm/$dd/$title with some trickery in the background resolving domain.tld/$id to the correct post and thus serving as a short URL.
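A minimal sketch of such a scheme, assuming posts are kept in a dictionary keyed by numeric ID. The `Post` type and both function names are hypothetical and not taken from WordPress or Drupal.

```python
from datetime import date
from typing import Dict, NamedTuple, Optional


class Post(NamedTuple):
    published: date
    slug: str


def canonical_path(post_id: int, post: Post) -> str:
    """Build /$id/$yyyy/$mm/$dd/$title for a post."""
    d = post.published
    return f"/{post_id}/{d.year:04d}/{d.month:02d}/{d.day:02d}/{post.slug}"


def resolve(path: str, posts: Dict[int, Post]) -> Optional[str]:
    """Map any path starting with a known post ID to its full path.

    Only the first path segment is consulted, so /$id alone works as a
    short URL. Returns None for unknown or non-numeric IDs.
    """
    first = path.strip("/").split("/", 1)[0]
    try:
        post_id = int(first)
    except ValueError:
        return None
    post = posts.get(post_id)
    return canonical_path(post_id, post) if post else None
```

Because only the ID is looked up, the date and title parts of the URL can change later without breaking old links.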
