Google’s Digital Dementia: It’s Forgetting Stuff
As if we didn’t have enough problems, there’s a mounting body of evidence that Google now has an attention span somewhat shorter than ten years. After ten years or so, Google forgets things. Or, perhaps, Google just can’t be bothered to index these older web pages, because there’s no money in it.
A commenter mentioned this after my post wherein I spoke of the pain of the Kink.com transition to their “new” (2016) Kink Unlimited product that broke many hundreds of my old links. It turns out that blogging pioneer and web-bones architect Tim Bray noticed the Google-dementia phenomenon about a year ago, writing that “Google has stopped indexing the older parts of the Web.”
Bray had discovered that his old blog posts weren’t turning up in Google searches even when he chased them with extremely precise search terms. I had noticed the same thing, but I assumed it was the “Google hates porn” filter that was killing me. (More on this later.)
Bray also noticed that Bing and DuckDuckGo were finding his old posts just fine. The implication is that it’s not some inherent “the web has gotten too big to index” problem, but rather a deliberate choice by Google to focus on newer, fresher material. Bray:
My mental model of the Web is as a permanent, long-lived store of humanity’s intellectual heritage. For this to be useful, it needs to be indexed, just like a library. Google apparently doesn’t share that view.
Indeed.
A couple of days later, Marco Fioretti expanded on Bray’s post with his own examples of the things Google forgets, and had this additionally to say:
Unless we’re all missing something here, it seems more correct to say that Google forgets stuff that is more than 10 years old. If this is the case, Google will remember and index a smaller part of the web every year. Google may do so simply because it would be impossible to do more, for economical and/or technological constraints, which sooner or later would also hit its competitors. But this only makes bigger the problem of what to remember, what to forget and above all who and how should remember and forget.
Neither Bray nor Fioretti applied the term “dementia” to Google. I got that term from an earlier (2017) blog post by open-data maven Tony Hirst, which was referenced in the comments on Bray’s post. Hirst posits that Google is getting both paranoid (because of SEO and other factors) and forgetful. To Hirst, Google seems rooted in the past, crediting signals of link authority that people are mostly not using these days (publication of links on websites) and unable to properly weight or remember the social media signals that accompany most links today. It’s a different problem, to be sure, from the one that Bray and Fioretti highlighted, but the terminology seems applicable here too.
My observations, from my perspective inside the adult/porn parts of the web, are parallel with Hirst’s. Google’s digital dementia is even more severe with respect to adult URLs, because our #pornocalypse-driven exclusion from so much social media means that our links are automatically absent from so many of Google’s modern page quality signals and ranking algorithms.
Here’s my own example, showing the type of digital dementia Bray highlighted. There’s an ErosBlog post from 2005 called Dildoes In the Subway (that’s the post title). As of this writing, if you search for those four words in quotes, Google will admit to knowing of four places on the web (including three on ErosBlog) where that phrase exists, but Google doesn’t seem to know that the post itself exists:
Bing? Bing still has possession of all its faculties, and returns the proper post as the first search result:
I’ve been seeing this phenomenon for years, but honestly? I just assumed it was a porn thing. Google hates stinky porn sites like mine, and is always pretending not to know about pages that are actually in its index. Usually what this means is that you haven’t used enough “porn words” in your search query to convince Big Brother Google that you realio-trulio want a porn result, so the porn result is being hidden from you for your own good. But that’s probably not the case here, because “dildoes” ought to be porny enough. And anyway, we can test this; adding the “site:erosblog.com” search filter should override the “it’s for your own good” anti-porn filters:
Nope! Google is being adamant here; it knows of three places on ErosBlog that mention this post, but the post itself? Not in the Google index any more.
Just in case you’re skeptical or curious, though, here’s what it looks like when you’re searching for an ErosBlog page that actually is (unlike the Dildoes In the Subway page) in Google’s dementia-ridden memory, only Google doesn’t want to show it to you, because stinky porn. I wrote a post in 2005 called The Pony Girls Of Ancient Egypt that contains the unique-on-the-web (until I hit the publish button on this post) phrase “a charioteer boffing a woman”.
Google knows about it. Google hasn’t forgotten it. Google has the charioteer-boffing in its index, all right:
But apparently “boffing” is an insufficiently pornographic word to signify that I am an adult who wants to see porn, genuinely and truly. Because, even though I have all the so-called “safe search” settings turned as far off as Google will allow these days, here’s what Google pretends to know about my Egyptian pony girls once I remove the site:erosblog.com search constraint. That’s right, it’s Sergeant Schultz time: they know nothing! Pony girls? Boffing charioteers? New phone, new search engine, who dis?
Increasingly I find myself going to Bing when I need completeness in a search result. Google’s digital dementia, it turns out, is part of why that has become necessary.
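If you want to repeat this comparison with your own old pages, the two tests above (an exact phrase in quotes, then the same phrase with a site: filter) can be turned into search URLs for several engines at once. What follows is only a minimal sketch: the `ENGINES` table and the `build_queries` helper are my own illustrative names, and the URL patterns are the publicly visible ones as I understand them, which the engines could change at any time.

```python
from urllib.parse import urlencode

# Assumed public search-URL patterns; not guaranteed stable by any engine.
ENGINES = {
    "google": "https://www.google.com/search",
    "bing": "https://www.bing.com/search",
    "duckduckgo": "https://duckduckgo.com/",
}


def build_queries(phrase, site=None):
    """Return {engine: url} for an exact-phrase search.

    The phrase is wrapped in double quotes; if `site` is given, a
    site: filter is appended to restrict results to that domain.
    """
    query = f'"{phrase}"'
    if site:
        query += f" site:{site}"
    encoded = urlencode({"q": query})
    return {name: f"{base}?{encoded}" for name, base in ENGINES.items()}


if __name__ == "__main__":
    # Print one URL per engine, with and without the site: filter.
    phrase = "a charioteer boffing a woman"
    for site in (None, "erosblog.com"):
        for name, url in build_queries(phrase, site=site).items():
            print(f"{name}: {url}")
```

The script only prints URLs to open by hand in a browser. It does not fetch or parse result pages, so whether a given engine still remembers the page is something you have to eyeball yourself.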
Shorter URL for sharing: https://www.erosblog.com/?p=22990
And that is one reason why I follow this world: what is done very obviously to sexual speech on the Internet is often being done more quietly to other kinds a few years later.
Maciej Ceglowski has written about what will happen when the adware bubble bursts, but there will be a second stage where Google and FB go the way of IBM, Blackberry, Yahoo, and MS and some hired gun of a CEO tries even harder to monetize their petabytes of personal information.
That search works just fine with the Yahoo search engine.
Yeah. It’s just Google, apparently, that’s dumping old pages from its index.
I’m pretty sure it’s an unintended side-effect of Google pushing for security, and devaluing sites that have not switched to https.
Well, old sites are less likely to have switched, so there’s rough correlation. But it’s very rough. So much so that I find myself largely unmoved by this theory.
The https theory would not explain the given example of Tim Bray, would it?
Not unless he’s upgraded his website to https since he wrote the post, which seems — given that he’s an early adopter of all kinds of web-related technologies — unlikely.
So, not so much Google Web Search, more GoogleWeb Search. Where the GoogleWeb is just that part of the Web that Google can monetise against. It’s just as well that I switched to DuckDuckGo for most of my searches.
It doesn’t help, of course, that Google and Facebook won’t let the other access their link data. Combine that with the de-emphasis of non-HTTPS websites and anything remotely adult, and we have walled gardens that don’t need walls, because no one will be aware that there’s an ‘outside’.
[…] I read a rather depressing post by Bacchus over at ErosBlog the other day. […]
Yeah, I’m finding Google pretty much useless these days. More than twenty years ago, AltaVista was better than Google is now. I’m using DuckDuckGo now.
For researching old and more obscure topics, DEVONtechnologies’ DEVONagent (Mac only, unfortunately) gives you a nice meta search engine on your local machine. It searches across all the standard search engines you specify, aggregates and compares the result sets and can filter results for spam, linkbait sites and so on – as well as extract images from the search results. Bacchus, I always thought that might be a tool for you :-)
(I am just a happy user of the product and in no way affiliated with the company)