When less is more
"Pruning" and "harvesting" the R&D knowledge base
While many knowledge base editors are still working to make information accessible to everyone who might need it, it's also necessary to cull the undergrowth, add value to what's left, and identify trends and patterns as the information is used.
But "pruning and harvesting" means different things to different people. In scientific applications, for example, it's important to acquire documents that may be used in patent applications. Furthermore, research scientists are often promoted and compensated partly based on how many times their work is cited in scientific literature. Comprehensiveness on a very narrow subject is important, and within a specialty, information providers have a near monopoly. For this and other reasons, scientific information is expensive, and companies don't want to pay for services that supply documents that aren't relevant.
On the other hand, comprehensiveness is less important in business applications. What's needed is enough current, accurate information to make a decision, solve a problem, or prepare a presentation. Information is often available from multiple sources (including interviews), and competition among information providers keeps the cost of secondary information (i.e. published documents) low. In this case, it's more important to find a low-cost information service with a good search engine than to invest heavily in harvesting nuggets from large commercial databases.
The R&D challenge
Building an intranet (private, internal Internet) to provide access to all this information is the first step. Many intranet documents originate within the company, but as much as 50% can come from bibliographic databases (where each record describes an article or book) such as Compendex.
Separating the wheat from the chaff
In one high-tech packaging materials company, a knowledge base publishing team of programmers, librarians, and engineers has developed a software tool and an evaluation process to meet both objectives: pruning the information sources and harvesting value from what remains.
Pruning R&D information sources
1. Initial assessment. The team spends one to four weeks studying and analyzing a bibliographic database. How many relevant sources (i.e. journal titles, conferences) does it cover? How many relevant terms or descriptors are included in its "controlled vocabulary" (a list of standardized terms used to describe the contents of an article)?
2. Run trial searches. The initial analysis helps the evaluation team formulate some test queries, which are run against the database over a six-week period. The results of these searches (similar to what you'd get using an Internet "search engine" like Altavista, Excite or Infoseek) are sent to subject matter experts. Their job is to weed out articles that are clearly of no interest. The goal at this stage is to eliminate the chaff.
3. Refine searches. Next the team finalizes its list of target sources for the database under consideration, refines the queries that will be used to extract information from it, and negotiates with the database vendor for access to the desired documents. The goal is to pay only for what the company needs without complicated accounting. In other words, once an article is made accessible on the intranet, anyone can read or download it. Not all database vendors, however, are willing to go along with this arrangement. Many require companies to specify the number or the identity of people that need access.
4. Run searches. The refined queries are run periodically against the targeted sources, and the results are analyzed for possible purchase. The homework pays off: if steps 1-3 have been completed and the queries are properly structured, 30% to 60% of the citations retrieved using the system can be relevant.
5. Make a purchasing decision. The result of a search is a list of titles with a brief summary or abstract. These lists are sent to subject matter experts, who decide whether the full text of each article should be purchased and made available on the intranet. Typically, experts spend about fifteen minutes per day reviewing the "hit lists." Not only do they mark citations for purchase, but they also classify each one under a particular product, project, or technology (see the section below on enhancement).
6. Add to intranet. Purchased articles are added to the intranet, where they are automatically indexed and linked to relevant material. For example, the knowledge base publishing team developed software that automatically links a patent description from one database to a list of related references cited by a patent's author or examiner from another database. A commercial example of patent management software is Aurigin's Intellectual Property Asset Management system.
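The automatic linking described in step 6 amounts to a join between two databases: a patent record carries the reference identifiers cited by its author or examiner, and those identifiers are resolved against a separate bibliographic database to produce intranet hyperlinks. Here is a minimal sketch of that idea; all record layouts, identifiers, and field names are hypothetical, since the company's actual software is not described in detail.

```python
# Hypothetical records from a patent database: each patent carries the
# identifiers of references cited by its author or examiner.
patents = [
    {"patent_id": "US5000001", "cited_refs": ["R100", "R101"]},
    {"patent_id": "US5000002", "cited_refs": ["R101", "R102"]},
]

# Hypothetical records from a separate bibliographic database,
# keyed by reference identifier.
references = {
    "R100": "Polymer barrier films, J. Packaging Sci., 1996",
    "R101": "Gas permeability of laminates, Mater. Rev., 1995",
    "R102": "Adhesives for flexible packaging, Coatings J., 1997",
}

def link_patent_to_references(patent, ref_db):
    """Resolve a patent's cited reference ids against another database,
    returning the bibliographic entries that could be hyperlinked on
    the patent's intranet page."""
    return [ref_db[r] for r in patent["cited_refs"] if r in ref_db]

for p in patents:
    print(p["patent_id"], "->", link_patent_to_references(p, references))
```

The point is not the two-line join itself but the normalization behind it: the linking only works because both databases have been loaded into the intranet in a form where citation identifiers can be matched mechanically.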
Design principles
1. Avoid proprietary solutions. Managing information from a variety of internal and external sources means dealing with multiple file formats. Knowledge management programs designed for a specific application (e.g. collaboration, records management, scientific research) often cannot index or store data from another application. For example, an intranet search engine may not be able to find information contained in E-mail messages.
2. Separate the retrieval mechanism from the source documents. This makes it easier to add new file formats, accommodate new information sources, and "tweak" the search and analysis software to add new functions.
3. Build in porous boundaries. The knowledge base editor stresses the value of a system with porous boundaries -- one that allows peripheral information to come through. Solutions are often found around the edges of established research categories or even in another discipline altogether. A multi-disciplinary editorial team helps ensure that these sources are not excluded.
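Principle 2 can be made concrete with an adapter layer: each source (E-mail, bibliographic records, and so on) is wrapped so it yields documents in one normalized shape, and the retrieval mechanism searches only that shape. The following sketch is illustrative, with class and method names of my own invention rather than from any product named above.

```python
class DocumentSource:
    """Common interface that every source adapter implements."""
    def documents(self):
        raise NotImplementedError

class EmailSource(DocumentSource):
    def __init__(self, messages):
        self._messages = messages
    def documents(self):
        # Normalize e-mail into the same (title, text) shape as other sources.
        return [(m["subject"], m["body"]) for m in self._messages]

class BibliographicSource(DocumentSource):
    def __init__(self, records):
        self._records = records
    def documents(self):
        return [(r["title"], r["abstract"]) for r in self._records]

def search(sources, term):
    """The retrieval mechanism knows nothing about file formats, only
    about the normalized (title, text) pairs each adapter yields."""
    hits = []
    for src in sources:
        for title, text in src.documents():
            if term.lower() in text.lower():
                hits.append(title)
    return hits

email = EmailSource([{"subject": "Re: barrier films",
                      "body": "New laminate results attached."}])
bib = BibliographicSource([{"title": "Laminate study",
                            "abstract": "Laminate permeability data."}])
print(search([email, bib], "laminate"))
```

Adding a new file format then means writing one new adapter; the search code, and any "tweaks" to it, remain untouched, which is exactly the flexibility the principle argues for.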