Thursday, September 22, 2016

Darn Broken Links!

The World Wide Web has become the go-to source for information in our society. About 47 billion web pages are easily accessible on it, and that number accounts only for the surface web.* To be accessed, a web page needs an address.** A page can lose its address if its owner fails to pay the web-hosting bill, if the server breaks down, if the files that make up the page are moved or deleted, or if the owner simply decides to change the content. We even have a name for this: link rot.

This can be a problem when conducting research. In academia, writers rely on citations to show the reader where they found information. In law, jurists and expert witnesses likewise use citations to point the reader to the sources on which a decision or an expert's opinion is based.***

Before the internet, the citation process was straightforward: once something was published in print, it was recorded in a physical, unchanging form. Web pages are different, since there are many reasons a link may become unavailable over time.

So what can you do if you come across link rot while conducting research? Well, there is one thing you can try: check out the Wayback Machine.

What is the Wayback Machine?

The Internet Archive's Wayback Machine is a web archive that lets users view web pages as they appeared at different points in time. Simply go to the Wayback Machine website and enter, in its search box, the URL of the page whose past versions you'd like to see. If the page has been archived, you will be brought to a page with a bar graph indicating how often it has been archived.
The bar graph consists of one box for each year since 1996. Within each box are up to 12 vertical black lines, one for each month of that year, and the height of each line shows how many times the page was archived that month. Clicking a box brings up a calendar for that year, with the days on which a snapshot of the page was archived highlighted. To see the page as it was, click a highlighted date.
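If you'd like to reason about the captures yourself, the grouping the bar graph performs can be sketched in a few lines of Python. This is a minimal sketch, assuming you already have a list of capture timestamps in the 14-digit YYYYMMDDhhmmss form that appears in the Wayback Machine's replay addresses (for example https://web.archive.org/web/20160117083000/http://example.com/). The sample timestamps and the function names (snapshot_url, captures_per_month, captures_on_day) are hypothetical and only for illustration.

```python
from collections import Counter


def snapshot_url(timestamp, original_url):
    """Build a Wayback Machine replay address from a 14-digit timestamp
    (YYYYMMDDhhmmss) and the page's original URL."""
    return f"https://web.archive.org/web/{timestamp}/{original_url}"


def captures_per_month(timestamps):
    """Count captures per (year, month): one count per vertical line
    in the bar graph."""
    counts = Counter()
    for ts in timestamps:
        counts[(int(ts[:4]), int(ts[4:6]))] += 1
    return counts


def captures_on_day(timestamps, year, month, day):
    """Return the captures taken on one calendar day, i.e. the snapshots
    behind a single highlighted date."""
    prefix = f"{year:04d}{month:02d}{day:02d}"
    return [ts for ts in timestamps if ts.startswith(prefix)]


if __name__ == "__main__":
    # Hypothetical sample data, not real captures of any site.
    sample = ["20160105120000", "20160117083000", "20160117200000", "20160302101500"]
    print(dict(captures_per_month(sample)))
    for ts in captures_on_day(sample, 2016, 1, 17):
        print(snapshot_url(ts, "http://example.com/"))
```

In practice the timestamps could come from the Internet Archive's CDX server, but this sketch deliberately makes no network calls so the grouping logic is easy to follow on its own.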

Note: The Wayback Machine is a free and very popular service, so you may sometimes get an error message because its servers are busy. Be patient.
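For those comfortable with a little scripting, the Internet Archive also offers a public Availability API at https://archive.org/wayback/available, which, as I understand it, returns the archived snapshot closest to a requested URL (and an optional timestamp) as JSON. The Python sketch below assumes the response shape shown in the Internet Archive's documentation, with the snapshot under archived_snapshots.closest. The helper names and the sample response are hypothetical, and only find_archived_copy touches the network.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

AVAILABILITY_ENDPOINT = "https://archive.org/wayback/available"

# Illustrative response in the assumed shape; not fetched live.
SAMPLE_RESPONSE = {
    "archived_snapshots": {
        "closest": {
            "available": True,
            "url": "http://web.archive.org/web/20130919044612/http://example.com/",
            "timestamp": "20130919044612",
            "status": "200",
        }
    }
}


def availability_query(url, timestamp=None):
    """Build the request URL; `timestamp` (e.g. "20060101") asks for the
    snapshot closest to that date."""
    params = {"url": url}
    if timestamp:
        params["timestamp"] = timestamp
    return AVAILABILITY_ENDPOINT + "?" + urlencode(params)


def closest_snapshot(response):
    """Return the archived copy's URL from a parsed response, or None."""
    closest = response.get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available"):
        return closest.get("url")
    return None


def find_archived_copy(url, timestamp=None):
    """Query the live API. Busy servers or network errors return None,
    so the caller can simply try again later."""
    try:
        with urlopen(availability_query(url, timestamp), timeout=30) as resp:
            return closest_snapshot(json.load(resp))
    except (OSError, ValueError):  # URLError/timeouts, or a non-JSON reply
        return None


if __name__ == "__main__":
    print(availability_query("example.com", "20060101"))
    print(closest_snapshot(SAMPLE_RESPONSE))
```

Keeping the parsing separate from the network call means the logic can be checked offline, and a busy-server error is treated as "nothing found yet" rather than a crash.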

I recently had occasion to research expert witness reports. In one of these reports (link requires Westlaw password), I was surprised to find that many of the references to an advocacy organization's website (PFLAG) were broken because the organization had changed its site. Using the Wayback Machine, I was able to retrieve all five of the broken links in the report, including one PDF file. Your results may vary, but if you come across a broken link while doing research, the Wayback Machine may be a good solution.
* Surface web means the part of the web that is indexed by commercial search engines (e.g., Google). The surface web accounts for about 4% of the total web. The non-indexed parts of the web are called the deep web and the dark web. To learn more about those, click here.
** Web addresses are usually referred to as Uniform Resource Locators, or URLs. For more detailed information about URLs, click here.
*** Legal scholars have recognized link rot as a problem. See, e.g., Jonathan Zittrain, Kendra Albert & Lawrence Lessig, Perma: Scoping and Addressing the Problem of Link and Reference Rot in Legal Citations, 127 Harv. L. Rev. F. 176 (2014).

