

Search strategies for multiple collections

June, 2006

The holy grail of corporate intranets is a single, one-stop-shop search box that will retrieve documents and data from all internal and external collections. For many people that means Google or one of its competitors (Yahoo, MSN, etc.). In fact, however, no single search engine can find information across all three of the places it is typically stored: corporate networks, proprietary applications (e.g., document management systems), and commercial services (e.g., Lexis-Nexis).

Still, the search for the holy grail is active on several fronts. Search engine vendors (including Google) are adding features and forming alliances with academic institutions and commercial publishers. Corporate information managers are writing custom code that allows one search engine to scan multiple specialized repositories, and innovators like Amazon.com are launching experimental services that let users create their own one-stop-search service.

A common denominator in all these efforts is the ability to access, store, and manipulate metadata. The basic idea is that if you can't scan a content collection due to technical incompatibilities or access restrictions, you can at least get access to its attributes and descriptors (its metadata). In this article we discuss the problem of searching multiple collections, look at five strategies for creating a common search interface, and give some tips on how to apply them in a corporate environment. The five strategies are:

  • connector code;
  • XML feed;
  • federated (distributed) search;
  • metadata harvesting;
  • search engine/metadata "bridges."

Which strategy you choose depends in part on the characteristics of the target collection.
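
To make the metadata idea concrete, here is a minimal sketch in Python of a single search box that indexes only descriptors (title and keywords) and returns pointers back to the owning collection. It is an illustration under stated assumptions, not any vendor's implementation: the class names, the collection names, and the sample records are all hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class MetadataRecord:
    """Descriptors for one document; the content itself stays in its collection."""
    collection: str   # hypothetical collection name, e.g. "dms"
    doc_id: str       # identifier used by the owning collection
    title: str
    keywords: tuple   # tuple (not list) so the record stays hashable


class MetadataIndex:
    """A tiny inverted index built from metadata only."""

    def __init__(self):
        self._terms = defaultdict(set)

    def add(self, record):
        # Index every word of the title plus every keyword, lowercased.
        words = record.title.lower().split()
        words += [k.lower() for k in record.keywords]
        for word in words:
            self._terms[word].add(record)

    def search(self, query):
        # Return records matching ALL query terms, in a stable order.
        terms = query.lower().split()
        if not terms:
            return []
        hits = set.intersection(*(self._terms.get(t, set()) for t in terms))
        return sorted(hits, key=lambda r: (r.collection, r.doc_id))


# Hypothetical records drawn from three separate collections.
index = MetadataIndex()
index.add(MetadataRecord("dms", "D-17", "Merger agreement draft", ("contracts",)))
index.add(MetadataRecord("intranet", "P-3", "Merger FAQ for staff", ("hr",)))
index.add(MetadataRecord("commercial", "C-9", "Antitrust case law", ("merger", "litigation")))

results = index.search("merger")
```

Each hit names the collection and document identifier, so the user can be sent to the source system to retrieve the item under that system's own access rules.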

What's a "collection"?
For our purposes here, the word "collection" can mean two different things:


Created on July 2, 2006 | Updated on July 3, 2006