Tuesday, April 24, 2012

The Mystery Box

Google, Bing, and other search sites make finding information online look easy. You put a few words into a search box, like “chicken curry recipes” or “vacuum cleaner review,” and one of the first few results is likely to be good enough. But the standard for “good enough” is quite a bit higher when you need legal information. So let’s think a little more deeply about the search process.

As a researcher starting an electronic search, you need to know two things: what you want to find and how to ask for it.

“What you want to find,” in its broadest sense, is just information about a particular legal concept: your information need. For instance, you may want to find the elements of larceny in Vermont. Sometimes you may want to find the concept in a particular format, like a statute or regulation, and legal databases generally let you specify which of these formats you are searching (or use post-search filters to pick a format). All of this is important and could be developed much further, but in the interests of blog post length we’ll let that lie to focus on “how to ask for it.”[1]

Natural Language Searching

One way to ask a database for something is to do a natural language search. In natural language searching, you enter keywords that characterize the type of information you are looking for. Think of it as much like a basic Google search.

The goal of a natural language search is to return the documents the database predicts you will find useful, that is, relevant to your search. It makes this prediction with an algorithm that compares your query to its documents and looks for the ones that are about[2] your issue. The documents it scores as most relevant appear at the top of the results list, unless you choose to sort them differently. Westlaw and Lexis both have natural language options, and both WestlawNext and Lexis Advance default to natural language search, each using a complex proprietary algorithm. When you enter your keywords, try to use three to six, at least on your first search. Too few keywords make it hard for the database to judge relevance, and too many can make each individual word count for less in the ranking. You can phrase your search as a question, but there is no real benefit to doing so.

This process of finding and sorting relevant documents can lead to some interesting consequences. For instance:

  • The top ten or so search results could easily be nearly tied in projected relevance.
  • None of the results could be particularly relevant. The system is going to give you a certain number of results, no matter what. As an example, try searching for case law on a dog flying an airplane without a pilot’s license.
  • Some results may not contain one or more of the words you searched for.
All of this highlights the importance of closely reading many of your results, not just the top one or two on your results list.
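To make these consequences concrete, here is a toy ranking sketch in Python. Westlaw and Lexis use complex proprietary algorithms, so this is not how either of them works. It assumes a simple textbook tf-idf scoring scheme (the kind of model covered in Introduction to Information Retrieval), and the four “documents” and the rank function are hypothetical, made up for illustration.

```python
import math
from collections import Counter

# Hypothetical toy corpus: invented snippets, not real case law.
DOCS = {
    "doc_a": "larceny elements intent to permanently deprive vermont statute",
    "doc_b": "vermont larceny conviction affirmed on appeal",
    "doc_c": "elements of burglary in vermont",
    "doc_d": "dog bite liability owner negligence",
}


def tokenize(text):
    """Lowercase the text and split it on whitespace."""
    return text.lower().split()


def rank(query, docs):
    """Score every document with a basic tf-idf sum and return
    (doc_id, score) pairs sorted from highest to lowest score.

    tf-idf is a textbook stand-in here; commercial legal databases
    use their own proprietary relevance algorithms."""
    tokenized = {doc_id: tokenize(text) for doc_id, text in docs.items()}
    n_docs = len(tokenized)

    # Document frequency: how many documents contain each term.
    df = Counter()
    for terms in tokenized.values():
        df.update(set(terms))

    scores = {}
    for doc_id, terms in tokenized.items():
        tf = Counter(terms)
        score = 0.0
        for term in tokenize(query):
            if tf[term]:
                # Rarer terms (lower df) get a larger idf weight.
                idf = math.log(n_docs / df[term])
                score += tf[term] * idf
        scores[doc_id] = score

    # Every document gets a score, so every document comes back ranked,
    # even one that matches none of the query words.
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)
```

Ranking the query “elements larceny vermont” puts doc_a first. Then doc_b and doc_c tie for second, even though doc_b never mentions “elements” and doc_c never mentions “larceny,” and the unrelated doc_d still comes back last with a score of zero. A real system weighs far more signals, but even this tiny example shows near-ties, missing words, and irrelevant results in the list.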

Natural language is a good way to start your research in areas with which you are unfamiliar, or when you are looking for broad concepts. Next time, I’ll focus on how to take more control of your search with “terms and connectors” searching.

[1] Legal databases are moving toward allowing users to put a citation in the same box they use for search. However, that is really just a [welcome] nod to user convenience, and I do not count retrieving a document from a known citation as “search.”

[2] This is rather simplified. To learn more than you would ever want to about information retrieval, see the online textbook Introduction to Information Retrieval.
