Thou Shalt Not Search Adult Tumblr Blogs
If you’ve got an adult blog on Tumblr, there’s a good chance Tumblr uses robots.txt to exclude the search engines from indexing it. Did you know that?
Two weeks ago in The Pornocalypse Comes For Us All, I wrote:
Who is next? My guess would be Tumblr. Tumblr is, of all the big platforms, perhaps the most porn friendly; there’s lots of porn on there and the Terms of Service do not prohibit it… But Tumblr is, famously, a popular platform in search of a revenue-generating business model. And we’ve learned that the suits have no loyalty to the porn users who made their platform popular. So, my bold prediction is that as Tumblr casts about for a business model, one of their steps will be to “clean this place up”…
And now, guess what? I’ve discovered that Tumblr uses robots.txt to bar all search engine access to blogs flagged as adult. If you’ve got an adult Tumblr, go look at your own settings. Do you see that first checkbox, the one that says “allow search engines to index your blog”?
That checkbox is a lie. It’s nicely checked, it’s not greyed out, but if your blog is flagged “adult” it’s a lie. Do you see the “Learn more about what this means” link under the “Your blog was flagged NSFW” selector? It leads to this page, where Tumblr asks users to self-flag their blogs appropriately:
Please respect the choices of people in our community and flag your blog as NSFW or Adult from your blog Settings page.
- NSFW blogs contain occasional nudity or mature/adult-oriented content.
- Adult blogs contain substantial nudity or mature/adult-oriented content.
If you’re not sure if you should flag your blog you can leave it unflagged, but keep in mind that we might flag it later if we see a lot of mature/adult-oriented content.
To answer the question “What happens to blogs that are flagged NSFW or Adult?” Tumblr offers this handy chart. The key piece of information is the white space indicated by my red superimposed arrow:
That’s right — where the “Blog indexed by Google” row intersects the “Adult Blogs” column, we find a ringing silence.
Would you have noticed? None of the adult Tumblr bloggers I know ever did. I knew from my porn researching that adult Tumblrs tended to be poorly represented in Google search results, but I chalked it up to the sheer scale of Tumblr and Google’s growing bias against returning porn search results. Nope, I found out the truth in one stark moment of astonishment, summed up by this image:
Let’s click the “See wickedknickers.tumblr.com robots.txt page” link:
From me: Aghast. Fucking. Gulp.
In robot, that means, roughly “All robots: stay out!” No search spiders allowed. No Internet Archive crawler. The Wicked Knickers tumblr is there, but you have to know about it, or you have to be linked to it. You won’t find it in Google, you won’t find it in any other search engine that honors robots.txt, and when Tumblr decides to stop hosting it, you won’t find the pages in the Wayback Machine — it will be gone for good, lost to humanity unless somebody with the technical chops and outlaw sensibilities of Archive Team finds a way to archive it anyway, robots.txt be damned.
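For readers who want to see how a well-behaved crawler reads a file like that, here is a minimal Python sketch using the standard library's robots.txt parser. The file text in BLOCK_ALL is my assumption of what a blanket "stay out" file looks like (the screenshot above is the real evidence), and the crawler names are only illustrative examples of user-agent tokens.

```python
from urllib.robotparser import RobotFileParser

# ASSUMPTION: a typical "everybody stay out" robots.txt. The post describes
# the effect of Tumblr's file; this text is a stand-in, not a copy of it.
BLOCK_ALL = """\
User-agent: *
Disallow: /
"""


def may_crawl(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if a crawler that honors robots.txt may fetch the URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # Illustrative crawler tokens; any polite robot gets the same answer.
    for agent in ("Googlebot", "ia_archiver", "DuckDuckBot"):
        allowed = may_crawl(BLOCK_ALL, agent, "http://wickedknickers.tumblr.com/")
        print(agent, "allowed" if allowed else "blocked")
```

Under that assumed file, every crawler that asks is told it is blocked, for the front page and for every post beneath it. Nothing enforces the answer; a robot only stays out if it chooses to obey.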
Wicked Knickers is just an example, one that has some meaning to me because it’s one of the first Tumblr blogs I ever noticed, and I’ve been linking to it since 2010. That’s almost 6,000 vintage erotica posts since January 2009, and none of those pages are in Google or the Wayback Machine. It was only when I twigged to that anomaly that I finally understood what Tumblr is doing to adult blogs.
In all the years that I’ve been preaching Bacchus’s First Rule (“Anything worth doing on the internet is worth doing on your own domain that you control”), I’ll confess that I never considered the power of robots.txt, or what it means to be putting stuff on an internet site where somebody else controls what robots.txt says. Not only do they control your visibility to search engines, they control whether history will remember what you said. That strikes me as a high price to pay for a “free” blogging platform.
It’s worth noting that there’s still rather a lot we don’t know about the Tumblr robots.txt blockade on adult Tumblr sites. Unanswered questions include:
- Does Tumblr have any flexibility on this? Would their support, if asked, remove or modify the robots.txt barrier in specific cases?
- When did Tumblr start using robots.txt to block Google from adult blogs? Has it always been like this, or is it a recent innovation?
- Why does Tumblr display the misleading checkbox that falsely implies that search engines can see flagged adult blogs?
- What is the actual reason for excluding adult Tumblrs from search engine and (especially) archive crawls?
In an unusual move for me, I actually reached out to press@tumblr.com, told Tumblr I was going to write this post, and asked them for answers to those questions. That was on May 11th. No response so far. If they ever do answer, I’ll be sure to update this post.
“When did Tumblr start using robots.txt to block Google from adult blogs? Has it always been like this, or is it a recent innovation?”
I’m not entirely sure, but I would guess that the robots.txt block is fairly recent. It’s just anecdotal evidence, but I remember that the results for a certain search phrase that I use from time to time “lost” most of their Tumblr entries sometime this year (March-April?). Until now I thought that Google had just adapted its PageRank algorithm…
Thanks, Endymion. I don’t know either, but I do “feel” like the block is somewhat recent. I don’t have a particular query in mind, but it seems to me I used to get many more Google results featuring adult tumblrs. Like you, I’d assumed Google was to blame for the change.
Would this problem also affect searches on engines other than Google?
Fyooz, Yes. Very much yes.
Bacchus, I think I first noticed tumblr at all because of one of your posts about two to two and a half years ago. I immediately started poking around for bdsm / kinky tumblrs…
I found one or two random, sparsely posted ones. I found dozens and dozens by following links. I would suggest this policy is at least that old.
(Yes, I am an old man of the internet. I still think gopher is a real hacker’s protocol.)
Fyooz, urtharda is correct. That robots.txt prevents EVERY search engine (that’s ethical and obeys robots.txt) from indexing the adult blogs. It would take a search engine that says “fuckit, I’m gonna ignore robots.txt” to get these adult tumblrs. And I’ve never heard of such a search engine.
I concur that this is recent. I use the reverse image search on google all the time to attribute images correctly and those results always tended to be full of tumblr posts. I thought they’d been a lot sparser recently and this would suggest why.
It seems deeply stupid. Google have already nerfed their image search to make it far less likely to return porn images unless you’re really explicit in the query about what you want. So I’ve no idea what tumblr think they gain by doing this (other than making their platform less appealing). It’s not like people were getting flooded by tumblr porn in image searches if they weren’t expecting it.
-paltego
Actually one further interesting data point to note. I just very unscientifically surveyed the first 15 tumblrs in my tumble list (http://www.femd...ages/ ).
Almost every one that had a simple unadorned robots file disallowed the root. For example: http://growlbad...s.txt
Everyone that provided links to a sitemap in the robots file didn’t disallow the root and was searchable. For example: http://continuo...s.txt
Not sure if that’s intentional or accidental, but it certainly looks like a potential workaround for tumblr owners.
Also note it’s not strictly true to say you can’t index anything for robots-blocked pages. Engines still have access to things like the anchor text data, which gives a limited signal allowing them to index some sort of stub page. Obviously it’s a pretty poor substitute for actually indexing the page contents.
-paltego
Paltego, I don’t see how the sitemap links in robots.txt point to a potential workaround. Unless I’m confused, you’re just looking at sites that Tumblr has not flagged as “adult”. For unflagged sites, Tumblr allows free crawling and offers the robots a helpful set of site maps; for flagged sites, the map goes away and the robots are told to stay out. But it’s all controlled by Tumblr, not by the blog owner. So where’s the workaround?
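For anyone who wants to repeat that kind of survey on their own list of blogs, here is a rough Python sketch of the two patterns being discussed: a bare file that disallows the root versus a file that lists sitemaps and leaves the root open. The function name, the "ExampleBot" agent, and both sample files are hypothetical; the sketch only inspects robots.txt text you already have in hand, and it needs Python 3.8 or later for site_maps().

```python
from urllib.robotparser import RobotFileParser


def summarize_robots(robots_txt: str, base_url: str = "http://example.tumblr.com/") -> dict:
    """Report whether the blog root is blocked and which sitemaps are listed."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {
        # "ExampleBot" is a made-up crawler name matched by the "*" rules.
        "root_blocked": not parser.can_fetch("ExampleBot", base_url),
        # site_maps() returns None when the file lists no sitemaps.
        "sitemaps": parser.site_maps() or [],
    }


# ASSUMPTION: stand-ins for the two kinds of file seen in the survey.
FLAGGED_STYLE = "User-agent: *\nDisallow: /\n"
UNFLAGGED_STYLE = (
    "User-agent: *\n"
    "Disallow:\n"
    "Sitemap: http://example.tumblr.com/sitemap1.xml\n"
)

if __name__ == "__main__":
    print(summarize_robots(FLAGGED_STYLE))
    print(summarize_robots(UNFLAGGED_STYLE))
```

A result with the root blocked and no sitemaps matches the "flagged" pattern; an open root with sitemaps matches the "unflagged" one. Either way, the file is served by Tumblr, so this only tells you what they decided about a blog, not how to change it.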
This whole “adult” blog is new. I thought I was crazy because I didn’t remember having that option when I created my blog last month.
I did a search and the cache of the page about NSFW blogs doesn’t say anything about “adult” blogs: http://webcache...=clnk
I think only “adult” blogs have the disallow in the robots.txt. My blog is flagged as NSFW (because the adult option wasn’t there) and it doesn’t have that – at least for now.
Another thing that worries me is that Tumblr doesn’t allow me to change from NSFW to Adult; the option is greyed out. I hope that doesn’t get me in trouble in the future.
(Sorry for any mistakes, English is not my first language)
Lilitthd, thanks for that data point and especially the pre-adult cached web page. (I’ve taken a screenshot for posterity.)
What’s interesting is that, apparently, before Tumblr made a distinction between “NSFW” and “Adult” blogs, their docs page said “When your blog is flagged NSFW … Your blog isn’t indexed by Google.” But it doesn’t sound like your NSFW blog got the robots.txt. Apparently there’s still much to understand about this.
In my very unscientific survey I was making the (possibly naive) assumption that since those tumblrs had all been around for some time and all featured adult content in every single post, they’d obviously be detected and tagged as adult. As you point out if they’re not tagged like that then it doesn’t mean anything. Although it does suggest tumblr’s porn detection algorithms suck.
Having read Lilitthd’s comment, maybe some are NSFW and some adult. Which seems a very odd distinction to make.
-paltego
Automated filters are always terribly imprecise, especially when somebody is first ramping them up, as seems to be the case here. It’s clear that a lot of adult blogs haven’t been flagged by Tumblr. Not yet, anyway.
[…] Tumblr users leave comments on my Thou Shalt Not Search Adult Tumblr Blogs post, it’s becoming clearer that the new robots.txt that prohibits search engines from […]
@paltego: apparently Tumblr thinks the frequency of adult content is what is important. I don’t see any logic in banning blogs that post porn all the time from the search index while allowing blogs that “occasionally” post porn to be indexed.
And of course they won’t say how often still counts as occasional. If I publish 10 adult posts and 1 non-adult post, is that occasional? What about 5 to 1? Or 5 to 5?
This is just ridiculous.
@Bacchus: yep, that’s right. If I see any changes – either in the robots.txt or the option for flagging my blog as adult, I will report back.
I also made a capture of the cached page with Evernote.
[…] the one that doesn’t actually “allow search engines to index your blog”, this checkbox appears to actually work in […]
[…] (via Thou Shalt Not Search Adult Tumblr Blogs — ErosBlog: The Sex Blog) […]
Here’s a Tumblr user discussing the blocking robots.txt as early as April 23, 2013: http://3danceha...oogle However it sounds like the duality of “NSFW” and “Adult” blogs had not yet been implemented at that time.
More data points and we’ll eventually figure out just when this happened.
[…] to exclude the search engines from indexing sites labeled as “adult”. This was reported in accurate detail by Bacchus at the long-respected ErosBlog. (it must be noted that Tumblr does not seem to be using the Robots Meta Tag. Do you know about […]
Duck Duck Go “wickedknickers”
Fewer than 50 pages in the Duck Duck Go index and all of them say “We would like to show you a description here, but the site you’re looking at won’t allow us.” Duck Duck Go is just as constrained by the robots.txt as Google, sadly.
Duck Duck Go “wickedknickers” show all pages
[…] bought Tumblr. {everybody cries} Tumblr’s already locked away all the porn blogs so they can’t be searched or archived. Maybe it’s time to back up your adult Tumblr […]
I just googled “Wicked Knickers” and it took me right to the site. It did refuse to describe the content, blaming robots.txt, but it did show me the site.
That’s as expected. This is about Tumblr making it impossible for you to search for the contents of these blogs.
What goodyear says. I googled “Wicked Knickers” and got the link at the top, though without a description because of robots.txt. I guess this depends on location and differs by country.
Hiding tumblr adult blogs from search engines started in May; everything was ok before. This came after Yahoo bought Tumblr.
No, it started at least a week or two before the Yahoo sale. But pretty obviously (in hindsight) because the negotiation was under way.
I’ve noticed this in the past month with my own blog. I initially flagged it as NSFW, ‘adult’ not being an option, and didn’t have a problem finding it in Google. Right now it’s still NSFW, and like Lilitthd, I can’t change it to adult (even though it fits the definition).
I’m not a big-time blogger, so page rank and the like doesn’t affect me too much – but if you’re familiar with Tumblr, you’ll know how absolutely useless their search function is when it comes to things other than tags. This is a main concern of mine. Searching for any phrases or words in the text of posts doesn’t work, so I have to do it through Google… but if my blog is flagged as adult, I won’t be able to navigate my own posts anymore outside of obsessive tagging.
That, and it’s my workaround for multiple tag searching. I’m not sure what to do now, with this new policy.
I just noticed it as well. Around 19 April, Google Analytics started showing my visits plummeting, and from what I see, it has cut my monthly visitors in half. Such a sneaky move not to mention it at all.
[…] may be next. Back in May, Erosblog noticed that Google wasn’t indexing or archiving Tumblrs identified as adult. Since adult tumblrs account for over 16% of Tumblr traffic, the potential impact is huge – and […]
[…] was among the first to discover back on May 15 that Tumblr was using an exclusionary robots.txt file to hide the contents of blogs flagged […]
The annoying solution:
1. Clean out all NSFW text and images. Try to get the Tumblr pages indexed again.
2. Create another site with the NSFW contents.
3. Link to the new NSFW site.
[…] It’s time to be very clear about the words “NSFW” and “Adult” in the Tumblr context. They mean different things. Until recently, users were faced with two self-flagging options. This is the relevant screenshot from my May 15 post: […]
[…] I say “breaking the news” because Bacchus from Eros Blog had actually discovered this in May. […]
Totally agree, it’s time to be clear about the difference between “NSFW” and “Adult” in the Tumblr context. They mean different things.
[…] the email does not say so, I predict that explicit-content blogs will go back to flying that robots.txt that makes them invisible to the search engines, too. No search-discovery for Tumblr […]