How To Back Up An Adult Tumblr (2018 Edition)
OK, Monday was our day to smack foreheads and react to the Tumblr porn ban. Tuesday was our day to mourn the loss of communities. For people who need a new home, yesterday began the migration process, or at least the search for a destination. Today? For me it’s a day to act. I’m a porn curator, and I’m aghast at the collections that will burn on December 17. What little I can do to fight those fires starts today.
Five years ago when I first saw this shitshow coming, I researched and advised the community on backup options for adult tumblrs. But since then, Tumblr has taken several technical steps to make porn tumblrs harder to view or copy. And fundamentally, I’m a technical idiot. I gloomily thought all my old advice was obsolete, which is why I wrote this fundamentally wrong depressed Eeyore shit on Monday:
I haven’t tested, but I don’t think the advice in that post would work any more, now that you have to be logged in to Tumblr even to view your own porn blog. (I could be wrong.)
Reader, I was wrong. I consulted my good friend Dr. Faustus at Erotic Mad Science, who is technically sharper than me and who has spent much of the past five years diligently rescuing material of interest from Tumblr and turning it into a deep and intricate set of self-hosted blogs. He pointed me at some instructional material that set me straight. It turns out that I can pick the best backup solution from my old article, add one screenshot with a single new setting (the password/login you need to be able to see porn on Tumblr), and we are back in backup business! (Product best when used before December 17, 2018, contents may settle in shipping.)
How To Back Up An Adult Tumblr
These are instructions for using HTTrack/WinHTTrack Website Copier to make a local mirror of any Tumblr in a folder on your hard drive. HTTrack is a free-software GPL general utility for copying and mirroring websites, available for most current versions of Windows as well as for a wide variety of Linux/Unix flavors. The Windows version presents a fairly old-fashioned interface with a bunch of cryptic options, but most of them come pre-set with sensible defaults that you actually don’t need to mess with. Plus, there’s good documentation.
Download the software and install it. Windows will probably try to scare you out of completing the install. Be bold, be brave. Trust your open source software developers.
Now run the software. You’ll be presented with a welcome screen where you need to click “Next”. Then, this screen:
The arrows show you the two fields that need your attention. All you really need to do is give this backup project a name and tell the software where to save the backup. Then hit “Next”:
The two red arrows on the screen above point to the buttons that need your attention. The first one opens the screen where you tell HTTrack the URL of the porn tumblr you want to download. It will be something like http://yourtumblr.tumblr.com (this doesn’t have to be your own tumblr URL; it can be any tumblr you want to save, like one that nobody has updated in five years that you know will be going dark in two weeks). This is also where you input your Tumblr login information, to prove to Tumblr that you’re special enough to view porn. This does have to be your login, no matter whose blog you are downloading.
The second arrow points to the “Set options” button, which is absolutely vital. Push it and you’ll see this:
I’ve pointed arrows at three optional settings tabs that you may want to adjust, and at the one mandatory options tab where you must change a setting. I’m going to ignore the optional ones for now, except to say that you would tinker with these if you wanted to change the sorts of media files you’re saving beyond the basic .gif, .jpg, and .png (you’d need to do this if you were saving a Tumblr that had .wav files or .zip files or .mp3s), or if you need to keep this program from slamming your internet connection too hard. It’s the mandatory “Spider” tab you really need to click:
See the box where it says “follow robots.txt rules”? Change it using the drop-down menu to say “no robots.txt rules” so that this software (your robot) will know to ignore Tumblr’s robot-hostile electronic “keep out” signs.
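If you’re curious what a “keep out” sign looks like to a robot, here is a minimal Python sketch using the standard library’s robots.txt parser. The robots.txt content below is a made-up example that bars every crawler; it is not a copy of Tumblr’s actual file, and the blog URL is the same placeholder used above.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content that shuts out every crawler.
# This is an illustration only, not Tumblr's real file.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /
"""


def allowed_by_robots(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if a crawler that obeys robots.txt may fetch the URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# A polite crawler reading the sample file would refuse to fetch anything.
print(allowed_by_robots(SAMPLE_ROBOTS_TXT, "HTTrack", "http://yourtumblr.tumblr.com/"))
```

A crawler set to follow rules like these stops before it downloads a single page, which is why the setting has to change.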
Yay! We’re almost done. Hit the “OK” button, hit “Next”, hit “Finish”, and your site copying should begin.
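For the command-line inclined: HTTrack also ships as a plain `httrack` program, and the steps above can be roughly sketched as a command. This Python snippet only builds and prints that command. I’m assuming from my reading of the HTTrack documentation that `-O` sets the output folder and `-s0` means “never follow robots.txt”, so check the docs for your version before trusting it. It also leaves out the login step entirely, so by itself it probably won’t get you into an adult blog. The folder name is a placeholder.

```python
import shlex


def build_httrack_command(url: str, dest_folder: str, ignore_robots: bool = True) -> list:
    """Build an argument list for the httrack command-line program.

    Assumed flags (verify against your HTTrack version's documentation):
      -O <folder>  where to write the mirror
      -s0          never follow robots.txt rules
    """
    command = ["httrack", url, "-O", dest_folder]
    if ignore_robots:
        command.append("-s0")
    return command


# Placeholder URL and folder name; nothing is downloaded here.
command = build_httrack_command("http://yourtumblr.tumblr.com", "tumblr-backup")
print(shlex.join(command))
```

To actually run it you could hand the list to `subprocess.run(command)`, but the GUI steps above are the ones I have tested.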
How long it will take to finish depends on the available bandwidth of your net connection, the memory and processing speed of your computer, and on whether you tweaked any of the options that control things like how many simultaneous connections your computer is making and how many files it’s trying to download in parallel. It also depends on how many pages there are on the Tumblr blog you are backing up, and on how big the images are. The default settings seem to be fairly gentle about not maxing out your internet connection or putting an unruly amount of strain on the server at the site you are trying to copy. Using default settings and a fairly crappy internet connection, I downloaded an ancient porn blog overnight last night. It had 3,300 posts and took up 1.8GB on my hard drive.
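If you want a ballpark for disk space before you start, here is a back-of-the-envelope sketch in Python. It scales from the single data point above (3,300 posts, 1.8GB) and assumes 1GB means a billion bytes. One blog is a thin basis for an estimate, and image-heavy or video-heavy blogs could land far from it.

```python
# The one measured data point from this post.
SAMPLE_POSTS = 3_300
SAMPLE_BYTES = 1.8e9  # 1.8GB, treating 1GB as 10**9 bytes (an assumption)


def estimate_mirror_bytes(post_count: int) -> float:
    """Scale the sample blog's size linearly by post count."""
    bytes_per_post = SAMPLE_BYTES / SAMPLE_POSTS
    return post_count * bytes_per_post


# Rough guess for a hypothetical 10,000-post blog.
print(f"{estimate_mirror_bytes(10_000) / 1e9:.1f} GB")
```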
What does success look like? You’ll have a folder on your hard drive with the name you provided on the first options screen. If you open it, you will find many sub-folders, and much that may seem mysterious. You should also find a file called “index.html” — and if you click on it, it should open in a new browser window where you’ll be looking at your backed up Tumblr site, using nothing but the files on your hard drive.
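If you’d rather not eyeball it, a small Python sketch can run the same sanity check: is there an index.html at the top of the folder, how many files came down, and how much space do they take? The function name and the folder name in the comment are my own inventions, not anything HTTrack provides.

```python
from pathlib import Path


def summarize_mirror(folder: str) -> dict:
    """Report whether a mirror folder has index.html, plus file count and size."""
    root = Path(folder)
    files = [path for path in root.rglob("*") if path.is_file()]
    return {
        "has_index": (root / "index.html").is_file(),
        "file_count": len(files),
        "total_bytes": sum(path.stat().st_size for path in files),
    }


# Example use, with a placeholder folder name:
#   print(summarize_mirror("tumblr-backup"))
```

A mirror with no index.html, or with only a handful of files, is a hint that the copy failed and is worth re-running.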
What have we not accomplished? Well, you’ve made what should be a full and true copy, but it’s not a nice clean export in some standard format that you could use to easily import all your posts into another content management system or blogging tool. HTML files and related images are scattered through a system of directories and subdirectories that, while logical, may not be the simplest thing to work with. Using the data you’ve got, a clever computer person could generate an XHTML document (or something similar) that could be semi-automatically imported into (say) WordPress. But it would take parsing; it would take work. Figuring out how to take the copy you just made and turn it back into a non-Tumblr website is a solvable problem, but how easy or hard it might be to actually do it depends on your access to computer expertise and tools. For now, you’re safe in the knowledge that you’ve got all the posts you’ve made this past however-many years. You’ve got the images, you’ve got their metadata (any tags you set for them and any credits you may have reblogged or included) and you’ve got the clever things you said about them, all, safe on your hard drive.
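To give a taste of what “it would take parsing” means, here is a minimal Python sketch that walks a mirror folder and pulls the page title out of every saved HTML file. It is only a first step toward a real export. The class and function names are mine, and I’m assuming the saved pages are ordinary HTML with a title tag; a real conversion to something WordPress can import would need to dig out post bodies, tags, and dates as well.

```python
from html.parser import HTMLParser
from pathlib import Path


class TitleParser(HTMLParser):
    """Collect the text inside the first-level <title> element of a page."""

    def __init__(self):
        super().__init__()
        self._in_title = False
        self.title = ""

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def page_title(html_text: str) -> str:
    """Return the stripped title of an HTML document, or '' if none."""
    parser = TitleParser()
    parser.feed(html_text)
    return parser.title.strip()


def index_mirror(folder: str) -> dict:
    """Map each saved .html file (relative path) to its page title."""
    root = Path(folder)
    titles = {}
    for path in sorted(root.rglob("*.html")):
        text = path.read_text(encoding="utf-8", errors="replace")
        titles[path.relative_to(root).as_posix()] = page_title(text)
    return titles
```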
Now would be a good time to back up your hard drive. I’m just sayin’.
Additional caveats and warnings and gotchas and thoughts:
1) This might or might not work on your own porn tumblr after December 17. Tumblr hasn’t said it will delete porn, just that it will make it invisible to everyone but the owner. But I don’t trust them not to continue deleting stuff. I wouldn’t wait. And for any blog you don’t own, this will almost certainly stop working. Every old porn tumblr that is no longer being updated? This is a way to make a copy on your hard drive. Hurry! Don’t wait. Hell is coming, but so is December 17. December 17 will get here first.
2) You have to type your Tumblr login and password into HTTrack to use this method. There’s a good chance that login information gets recorded in the many complex files HTTrack writes to your hard drive. If you ever turn those files over to someone who is, say, helping you convert those files into a WordPress blog — or if, say, you send them off to the Internet Archive for posterity — your password could be exposed. It might be smart to set up a “burner password” on Tumblr that you never used anywhere else and will never use again before firing up HTTrack.
3) There is a method offered by Tumblr for doing a fancier and more complete export of the Tumblr blogs that belong to you, including some of the social interaction metadata. I have not covered it here because it appears not to be very reliable; in every case available to me for testing, it failed permanently with a “processing backup” message that never ends or goes away. So I decided not to waste your time with the instructions for making that attempt. I have heard precisely one account on Twitter of a successful official export.
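On caveat 2: before you hand a backup folder to anyone, you can at least search it for your password. This is a minimal Python sketch, assuming the password would appear either as plain text or URL-encoded. I don’t know exactly how or where HTTrack stores login details, so a clean result here is reassuring but not proof. It reads each file fully into memory, which is fine for a sketch but slow on very large mirrors.

```python
from pathlib import Path
from urllib.parse import quote


def files_containing(folder: str, secret: str) -> list:
    """List files under folder whose bytes contain the secret.

    Checks the plain text and the URL-encoded form (an assumption about
    how a crawler might store credentials embedded in a URL).
    """
    needles = {secret.encode("utf-8"), quote(secret, safe="").encode("utf-8")}
    hits = []
    for path in Path(folder).rglob("*"):
        if path.is_file():
            data = path.read_bytes()
            if any(needle in data for needle in needles):
                hits.append(path)
    return sorted(hits)


# Example use, with placeholder values:
#   for path in files_containing("tumblr-backup", "my-burner-password"):
#       print(path)
```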
I hope this helps!
Shorter URL for sharing: https://www.erosblog.com/?p=21921
I have to disagree on the Tumblr backup. I just did the export on four of my Tumblrs, each containing several thousands of photographs. It did take a long time, over 24 hours with “processing backup” showing but eventually it did change to “Download” and I successfully downloaded all the content as a ZIP file. So I would suggest trying this method, but be very patient.
Fred, glad it worked for you. In my case, I have access to a handful of old Tumblrs, most of them small special projects of various kinds dating back six or seven years. Every one of them has been “processing backup” for 48 hours or so, and this includes tumblrs that have fewer than twenty posts. I saw one person on Twitter who has been “processing backup” for two months! I just don’t feel right advising people to “be very patient” when we are up against a two week deadline under these circumstances.
I used the official Tumblr download method last month when I closed my account. I just set it going, then closed the browser tab. Eventually, I got an email from Tumblr to say that the download was ready. Mine clocked in at over 25GB zipped, 28GB unzipped. I had been (re)blogging there since early 2013, so not a huge surprise.
It seems that Tumblr’s restrictions for accessing adult content (i.e. being forced to log in) have long only applied to users coming in with normal User-Agent strings. If one uses a browser plugin to spoof those, and appears e.g. as the BingBot search engine crawler, access is still possible without login. Sooo… I’m pretty certain that is a useful tidbit of information for those trying to spider things at the last minute.
Perhaps I am just one of the lucky ones. I take your point that we have a hard deadline and it has to be backed up before deletion by Tumblr. It’s down to what works, and an instant fix that you suggest is probably the best way forward given the restrictions.
Good luck to everyone.
I just tried this, and while it worked perfectly to backup my two sfw blogs, even with fixing the spider settings and messing around, trying numerous times, it kept giving me the “empty mirror” error message and not processing anything. I’m trying tumblr’s export now as the blog has less than 500 posts total, but I’m still uneasy that this otherwise perfect system suddenly isn’t working for explicit blogs
That’s disturbing, M. I have not seen that error message; it’s worked (so far) for me on every blog I’ve run it against.
Did you remember to change the setting to ignore robots.txt? Because that’s the one setting that would allow SFW blogs and balk at NSFW ones. I think I see where you said you changed it, but it’s all I can think of.
I keep getting the mirror error message too, and I followed all of your instructions. Oh well.
idk if im doing something wrong but its not working for me. i followed the steps, and made sure my blogs werent hidden but i kept getting a mirror error from 2 different blogs of mine
quote
**MIRROR ERROR!**
HTTrack has detected that the current mirror is empty, if it was an update, the previous mirror has been restored.
Reason: the first page(s) either could not be found, or a connection problem occured.
=> Ensure that the website still exists, and/or check your proxy settings! <=
end quote
I am one of these people who can follow instructions and then spell them out in plain words when they work for me, but still be at a complete loss when they don’t work. Which is my situation here, sadly. I don’t know what these mirror errors mean. I am really hoping an HTTRack expert will parachute into this comment thread and miraculously explain to all of us what is going wrong, for the people who are having trouble.
I believe I may have at least identified part of the issue; for my blogs that tumblr assigns https:// to, httrack will not mirror, but it has no problem with any blogs that don’t have the s after http. Strange, but consistent, in my case.
Sorry to double-comment; but I found that my problem was that blogs set to “always serve over ssl” will automatically be given an https:// address, and turning this off takes the s off the URL and lets the blog be backed up. However, blogs marked explicit don’t have this option, and it’s on automatically.
M, no worries about multiple comments, all clues are welcome!
Tumblr has a lot of inconsistencies. The SSL https stuff is indeed a useful clue; it is not surprising that blogs served securely are either impossible to mirror with HTTRack or will require some sort of shenanigans with settings or proxy servers that we have not been briefed on.
However, among the test blogs I have access to, I have at least one that is explicit and the SSL slider is still free to slide back and forth. Doesn’t mean you aren’t seeing explicit blogs that are force-locked with SSL on; I believe you. Tumblr is capricious. I’m just saying it’s not — thankfully — universal; because if it were, I wouldn’t have been able to be pulling favorite porn blogs down non-stop for the last few days.
If HTTRack only works over HTTP then the solution is to use a different crawler. I have just googled for
https crawler ssl download tumblr
and there appear several good-looking candidates to use. Hopefully, this will help someone.
I figured out how to get around the https:// issue… on this window (http://www.eros...r.jpg), after you fill in the address and login information on the Add URL button, a complicated-looking URL will be generated on the text window below.
It starts with http://.
Just add the s.
That was it.
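For anyone scripting this rather than typing into the text window, the same fix can be sketched in a few lines of Python: swap the scheme from http to https and leave everything else in the URL alone. The function name is made up for illustration, and the URL is the placeholder from the post.

```python
from urllib.parse import urlsplit, urlunsplit


def force_https(url: str) -> str:
    """Return the URL with an http scheme upgraded to https; others unchanged."""
    parts = urlsplit(url)
    if parts.scheme == "http":
        parts = parts._replace(scheme="https")
    return urlunsplit(parts)


print(force_https("http://yourtumblr.tumblr.com/"))
```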
Woot! Good job PVP, thank you!
Thanks Bacchus for this page.
And thanks to pvp for getting https instead of http straightened out.
Tumblr took about 36 hours to get my zip file ready to download.
Though I left one too long and got a “Link has expired” message when I tried to export it.
The last time, tumblr showed that the export file was ready to go.
So I tried downloading the tumblr export zip.
Five times I got a “failed” download.
Using HTTrack/WinHTTrack Website Copier with the clear instructions of Bacchus was easy.
And once I added the s to create https the download started.
Though I am getting a lot of bad messages:
Warning: link is probably looping, type unknown, aborting: https://XXXXX.tumblr.com/post/180889444444/amospoe-when-did-the-future-switch-from-being#_=_
Looks like tumblr is deleting a lot of images before I can download them.
Very friendly of them. Thanks Verizon. /s