Internet Archive Wayback Machine

Dulloldfart

Squirrel Extraordinaire
If you know that the Internet Archive archives digital things, including websites, and you have seen preserved snapshots there of website pages from the past, you might get fooled into thinking the coverage was vaguely similar to Google's. If you search Google for something that exists on ESMB, for instance, you will usually find it indexed in Google within an hour or less. It's marvellous — complete coverage.

But it doesn't work like that! Not at all. I got curious a few days ago and looked up paulsrobot.com (not paulsrobot3.com) in the search engine (Wayback Machine) there, top centre on the front page at http://archive.org. The archive held maybe 5 pages from the site, out of about 130. One thing you can do to rectify things, however, is to submit pages to them individually. Here is how:

1. In the search box (Wayback Machine), enter the root url, in this case paulsrobot.com.

2. Assuming this page has been captured (snapshot in archives), it will show you one of the captured pages, and in tiny blue print near the top it will say "20 captures" or however many it is. Click on this link.

3. You will see a timeline, years along the top and a 12-month calendar underneath, that shows every time the page has been captured. Click on a recent one, assuming you're allowed to access that page.

4. In this specific case, it shows the current page and the address bar shows https://web.archive.org/web/20140518052252/http://paulsrobot.com/

5. Now, you can click on a link to another page in that site, or manually type in a page at the top. Let's do, say, paulsrobot.com/session-audio-gen-start-of-session.html. Right now, this brings up a page that says

Hrm. Wayback Machine doesn't have that page archived. This page is available on the web! Help make the Wayback Machine more complete! Save this url in the Wayback Machine

6. That last sentence is a link. Click it, and that page as it is now will be saved in the archive for eternity (kind of). A notice then appears saying it has been saved.

7. Now you can enter another page per number 5 above, rinse and repeat, until you run out of patience or urls.
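For anyone who would rather script the address-bar part of the steps above, here is a minimal Python sketch of how those URLs fit together. The snapshot format (a 14-digit timestamp followed by the original address) is taken from the example in step 4. The "save" prefix is my assumption about where the "Save this url" link points; it is not stated in the post, so check it in your browser before relying on it. The function names are made up for illustration.

```python
from datetime import datetime

# Prefix seen in the address bar in step 4.
WAYBACK_PREFIX = "https://web.archive.org/web/"
# Assumption: the "Save this url" link leads to an address of this shape.
SAVE_PREFIX = "https://web.archive.org/save/"


def snapshot_url(timestamp: str, page_url: str) -> str:
    """Build the address of one capture: prefix + YYYYMMDDhhmmss + original URL."""
    return f"{WAYBACK_PREFIX}{timestamp}/{page_url}"


def parse_snapshot_url(url: str):
    """Split a capture address into (capture time, original page URL)."""
    if not url.startswith(WAYBACK_PREFIX):
        raise ValueError("not a Wayback Machine snapshot URL")
    rest = url[len(WAYBACK_PREFIX):]
    # Everything before the first slash is the timestamp; the rest is the page.
    timestamp, _, original = rest.partition("/")
    captured = datetime.strptime(timestamp, "%Y%m%d%H%M%S")
    return captured, original


def save_url(page_url: str) -> str:
    """Build the (assumed) address that asks the archive to capture a page now."""
    return SAVE_PREFIX + page_url


if __name__ == "__main__":
    when, page = parse_snapshot_url(
        "https://web.archive.org/web/20140518052252/http://paulsrobot.com/"
    )
    print(when.isoformat(), page)
    print(save_url("http://paulsrobot.com/session-audio-gen-start-of-session.html"))
```

Read this way, the long number in step 4 is just the capture date and time: 20140518052252 is 18 May 2014 at 05:22:52.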

From experience, if you have a lot of pages you want to be saved for posterity, start with a list of exact urls and go through it methodically. Otherwise you will waste a lot of time trying to find pages that haven't already been archived. You might think you could just enter a single directory, say paulsrobot.com/, and it would index all the pages in that directory. Unfortunately, no, so if you've got a lot of pages you've got a lot of work ahead of you.
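If the list of urls is long, the "which ones are missing?" part can be sketched in Python. This is not what I did by hand; it assumes the Internet Archive's availability lookup at archive.org/wayback/available, which (as far as I know) answers with a small JSON document whose "archived_snapshots" entry is empty when nothing has been captured. The function names are hypothetical, and the page list at the bottom is only an example.

```python
import json
import urllib.parse
import urllib.request

# Assumption: the archive's public availability lookup lives at this address.
AVAILABILITY_API = "https://archive.org/wayback/available"


def availability_query(page_url: str) -> str:
    """Build the lookup address for one page."""
    return AVAILABILITY_API + "?" + urllib.parse.urlencode({"url": page_url})


def fetch_json(url: str) -> dict:
    """Download a URL and decode the JSON reply (needs network access)."""
    with urllib.request.urlopen(url, timeout=30) as reply:
        return json.load(reply)


def is_archived(reply: dict) -> bool:
    """True if the reply reports a usable closest snapshot."""
    closest = reply.get("archived_snapshots", {}).get("closest")
    return bool(closest and closest.get("available"))


def find_missing(page_urls, fetch=fetch_json):
    """Return the pages with no capture yet, in their original order.

    `fetch` can be swapped for a stand-in function when testing offline.
    """
    return [
        page for page in page_urls
        if not is_archived(fetch(availability_query(page)))
    ]


if __name__ == "__main__":
    pages = [
        "paulsrobot.com/",
        "paulsrobot.com/session-audio-gen-start-of-session.html",
    ]
    for page in find_missing(pages):
        print("not archived yet:", page)
```

The pages it prints are the ones worth submitting by hand as in steps 5 and 6. If you run it over a hundred-odd urls, pause between requests to be polite to their servers.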

-----

Who would want to do this? Anyone who wants to keep a snapshot of a web page archived for everyone. It may be your own site(s). Or it may be someone else's page that you think might disappear sooner or later and you think it should be preserved . . . .

Note that if you are the website owner (more or less) you can get your site pages deleted from the Wayback Machine, so it's not an infallible way of archiving others' stuff. If it's really important, archive it and additionally take a screenshot yourself.

Paul
 

Dulloldfart

Squirrel Extraordinaire
None of Facebook gets archived by the Wayback Machine. Not even the front page. None of Twitter. Precious little of ESMB. I checked a few websites/blogs of Scn-type people who add regular posts/articles to their sites: almost nothing has been archived. This means that the world is in danger of losing these documents for ever.

Does it matter?

Well, it depends, doesn't it?

Paul
 

Dave B.

Maximus Ultimus Mostimus
It's OK, I'm continuously archiving Twitter and Facebook on titanium plates and storing them in a mountain fortress for the benefit of future generations.
 