A taxonomy for the Homeland Security Digital Library

October, 2005

by Angela Pitts with commentary by Jean Graef

In this article, Angela Pitts, Taxonomy Developer for the Homeland Security Digital Library (HSDL), describes her experience selecting and using auto-categorization software and a new search engine to create a navigation structure for the library's Web site.

Angela's narrative is followed by additional details of her work, presented as questions and answers with the editor.

Introduction
Soon after the Department of Homeland Security was established in 2002, its Office of Domestic Preparedness sponsored a project to assemble an interdisciplinary collection of electronic documents. Our job was to develop an organization scheme (taxonomy) that would make it easy for users to access the collection and help librarians evaluate and manage it. Under the direction of Lillian Gassie, former principal investigator and information architect for the HSDL, we implemented the taxonomy using Teragram, a rules-based auto-categorization program, and the FAST search engine. The system allows users to narrow the scope of a search by topic, event, or geography. ...
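As a rough illustration of the idea only, and not the HSDL implementation, rules-based categorization and narrowing by topic, event, or geography can be sketched in Python. Everything in the sketch is hypothetical: the rule table, the keywords, the category names, and the functions `categorize` and `narrow` are assumptions made for the example, and rule languages in commercial categorization tools are typically far richer than simple keyword matching.

```python
# Hypothetical rule table: facet -> category -> trigger keywords.
# These names and keywords are invented for illustration.
RULES = {
    "topic": {
        "Public health": ["quarantine", "vaccine"],
        "Border security": ["border", "customs"],
    },
    "event": {
        "Hurricane": ["hurricane", "storm surge"],
    },
    "geography": {
        "Louisiana": ["louisiana", "new orleans"],
    },
}


def categorize(text):
    """Return {facet: set of categories} whose keywords appear in the text."""
    lowered = text.lower()
    tags = {}
    for facet, categories in RULES.items():
        tags[facet] = {
            name
            for name, keywords in categories.items()
            if any(keyword in lowered for keyword in keywords)
        }
    return tags


def narrow(docs, **filters):
    """Keep documents tagged with every requested facet value."""
    return [
        doc
        for doc in docs
        if all(value in doc["tags"][facet] for facet, value in filters.items())
    ]


# Tag a tiny made-up collection, then narrow it by event and geography.
titles = [
    "Hurricane response plan for New Orleans",
    "Vaccine stockpile guidance",
    "Customs inspection procedures at the border",
]
docs = [{"title": title, "tags": categorize(title)} for title in titles]
hits = narrow(docs, event="Hurricane", geography="Louisiana")
print([doc["title"] for doc in hits])
```

In this sketch each document is tagged once, when it enters the collection, so narrowing a search becomes a cheap filter over stored tags rather than a fresh analysis of the text.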

... plus more on these topics:

Purpose of the collection

Development of the HSDL taxonomy

Categorizing documents

Selecting the approach, evaluating the tools

How semantic rule-building works

Results of auto-categorization

Integrating auto-categorization and search

Questions
Q: Is your collection of documents 100% electronic? If not, how do you handle print material in your taxonomy? If it is all electronic, how are the documents stored: in a file system on a Web server, or in a database?

Q: Who are Content Developers and how do they use the taxonomy to evaluate and manage content?

Q: How did you deal with cross references between different concepts in different disciplines? Can users access definitions while they're browsing topics and terms?

Q: Did you include an A-Z index? If not, why not?

Q: How do you deal with names for people, organizations, and products/services? Did you create a name authority? Do you use an entity extraction program to identify names in documents?

Q: Did you revise your taxonomy in the aftermath of Hurricane Katrina? If so, how?

Q: How do you determine the effectiveness of the taxonomy and retrieval methods? Did you conduct usability studies? Do you have anecdotal evidence that the taxonomy helped people solve a problem?

Q: How long did it take you to develop the taxonomy (including rule development)? How much time do you spend each month on taxonomy maintenance?

Q: What do you use for metadata management?

Q: In your diagram "Categorization/search integration workflow," can you explain the "Start retrieve documents" circle in more detail? Who or what is doing the retrieving? How does the person or program decide whether to go to the Parse PDF files step or the Metadata repository step, or are both paths always taken regardless of document format or source?