SharePoint Search: An Enterprise Contender?

February, 2008

by Jean Graef

This article was originally published in the Enterprise Search Sourcebook, February 6, 2008. See also SharePoint Thesaurus Web Part

Is the search component of Microsoft’s SharePoint suite a viable option for enterprise search? Some of our members have already chosen it, some have tried and rejected it, and many more are considering it as a serious contender. Gartner lists it along with Google in its “Challenger” category.(1)  The reason is that, with the 2007 release of the product (now called Microsoft Office SharePoint Server or MOSS), SharePoint search now has most of the basic features we’ve come to expect in enterprise search along with low cost and tight integration with existing SharePoint installations and other Microsoft applications. As one person put it, “It isn’t the best in class, but it’s good enough.”

Whether you deploy MOSS for enterprise search depends on your technology strategy and budget, how much you’ve invested in metadata and taxonomies, and how you plan to search multiple content repositories. If you use SharePoint for collaboration and content management but choose another product for enterprise search, you’ll need to consider two kinds of complementary products:

  • taxonomy management programs that integrate with MOSS search;
  • search engines that can search SharePoint content.

Either way, you’ll need a strategy that integrates SharePoint’s bottom-up (decentralized) publication and management model with the top-down (centralized) enterprise search deployment model. You want users to be able to find resources – documents, Web sites, people – regardless of company location or technology yet not be overwhelmed by the minutiae of documents generated by local collaboration.

MOSS 2007: A big improvement
MOSS 2007 search, a big improvement over the 2003 version, provides the basic functionality we’ve come to expect from search engines, including:

  • search “scopes” that allow users to broaden or narrow a search based on a content collection (e.g. intranet, department, or team);
  • security trimming, so users only see those results that they’re authorized to access;
  • synonym suggestion (Did you mean …?) and term highlighting in search results;
  • improved relevancy and ranking algorithms using such factors as click distance, hyperlink anchor text, URL depth, and metadata extraction;
  • editorial control over which documents show at the top of the search results through “Best Bets” and “Authoritative Pages”;
  • greater control over what content is included in the search results through crawl rules, immediate removal of any site or item from the search index, and multiple start addresses per content source;
  • better usage reports. Previously, administrators reported having trouble extracting data from search logs. Now they can create their own reports from a variety of built-in templates.
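
To make the first two items concrete, here is a minimal Python sketch of how a search scope and security trimming might narrow a result list. It is an illustration only, not how MOSS implements either feature: the `Doc` record, the group names, the example URLs, and the `search` function are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Doc:
    url: str
    scope: str                 # hypothetical content collection, e.g. "intranet"
    allowed_groups: frozenset  # groups permitted to read this document
    text: str


def search(docs, query, user_groups, scope=None):
    """Return URLs of matching docs, limited to `scope` and to what the user may read."""
    q = query.lower()
    hits = []
    for d in docs:
        if scope is not None and d.scope != scope:
            continue  # outside the selected scope
        if not (d.allowed_groups & user_groups):
            continue  # security trimming: user shares no group with the document
        if q in d.text.lower():
            hits.append(d.url)
    return hits


DOCS = [
    Doc("http://intranet.example.com/policy", "intranet", frozenset({"everyone"}), "Travel policy"),
    Doc("http://hr.example.com/salaries", "department", frozenset({"hr"}), "Salary policy bands"),
]
```

In this sketch a user outside the hypothetical "hr" group never sees the salary document, regardless of how well it matches the query.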

In addition, MOSS 2007 search has two new features:

  • People search that allows users to find employees not only by department or job title but also by expertise, social distance, and common interests. Integration with Microsoft’s Active Directory saves implementation time, especially if the directory contains information about security levels and what people actually do.
  • Business Data Catalog allows users to search content stored in SAP, Oracle, or other databases.

MOSS vs Google
How effective are these new and improved features? One large, global organization compared MOSS 2007 search with Google using a test collection of half a million documents. From a relevancy standpoint, both gave similar results without using metadata cues. Three-fourths of the 500 users enrolled in the test said that MOSS 2007 search was better than what they had before (a combination of SharePoint 2003 and a well-known enterprise search engine). Testers especially liked:

1. More informative results. Document summaries let users tell what each document was about without opening it – a big time saver.

2. Simplicity. People didn’t need to learn how to search. They got reasonably good results by typing a word or phrase in the search box.

3. Integration with desktop applications. SharePoint search is available in the upper right-hand corner of the Internet Explorer 7 browser and is integrated with Windows desktop search. Users only see results that they are permitted to access.

The search manager also reported that MOSS 2007 is easier to administer and maintain, though he said that the index update process is still too time consuming. He liked the variety of usage reports, especially the one that shows the most popular search terms that have no Best Bets assigned to them (i.e. the editors have not selected one or more documents or sites to display at the top of the results list).
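
The report the search manager singled out is easy to picture as a small log analysis. The Python sketch below assumes a plain list of logged queries and a dictionary of Best Bet keywords; both structures, the sample data, and the function name are hypothetical and do not reflect MOSS's log format or reporting API.

```python
from collections import Counter


def top_terms_without_best_bets(query_log, best_bets, n=10):
    """Most frequent queries that have no Best Bet keyword assigned."""
    counts = Counter(q.strip().lower() for q in query_log if q.strip())
    keywords = {k.lower() for k in best_bets}
    uncovered = [(term, c) for term, c in counts.most_common() if term not in keywords]
    return uncovered[:n]


# Hypothetical sample data
QUERY_LOG = ["vacation", "Vacation", "expense report", "vacation",
             "expense report", "org chart", "benefits"]
BEST_BETS = {
    "vacation": ["http://hr.example.com/leave"],
    "benefits": ["http://hr.example.com/benefits"],
}
```

Run against the sample data, the function surfaces "expense report" and "org chart" as candidates for new Best Bets.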

Room for improvement
Even those who like MOSS for search point out that there’s still room for improvement. Features they would like to see include:

  • Wildcard search – Enables a user to replace one or more letters of a search word with an asterisk. Visitors can search effectively for names and products without knowing the exact spelling.
  • Video/image/audio search – Search audio and video files; display image search results as thumbnail images.
  • Support for near operator – Enables a user to enter two search terms and specify their proximity to each other (e.g. “defense near 2002” would find documents in which the words “defense” and “2002” occur within a specified number of words of each other in the document text).
  • Document highlighting – MOSS highlights search terms in the list of results but not in the target Web page.
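
For readers unfamiliar with the first and third items, the following Python sketch shows what wildcard and proximity matching do at the level of a single document. It assumes simple word tokenization and uses the standard library's `fnmatch` for the asterisk (which also treats `?` and `[...]` as special characters); the function names are hypothetical, and this is not how any of the add-on products implement these features.

```python
import fnmatch
import re


def tokenize(text):
    """Lowercase the text and split it into words."""
    return re.findall(r"\w+", text.lower())


def wildcard_match(pattern, text):
    """True if any word in `text` matches `pattern`, where '*' stands for any letters."""
    return any(fnmatch.fnmatchcase(word, pattern.lower()) for word in tokenize(text))


def near(text, a, b, max_distance=10):
    """True if words `a` and `b` occur within `max_distance` words of each other."""
    words = tokenize(text)
    pos_a = [i for i, w in enumerate(words) if w == a.lower()]
    pos_b = [i for i, w in enumerate(words) if w == b.lower()]
    return any(abs(i - j) <= max_distance for i in pos_a for j in pos_b)
```

For example, `wildcard_match("gra*f", "Article by Jean Graef")` succeeds even though the searcher did not know the exact spelling of the name.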

Many, if not all, of these features are available through third-party add-ons from vendors such as Coveo and Mondosoft Ontolica. Unlike other search engine vendors, who provide new features exclusively through the upgrade process, Microsoft encourages its customers to purchase enhancement packages created by independent developers. These add-ons, however, increase the total cost of MOSS search deployment.

Influence of strategy and budget
MOSS search is especially compelling for those organizations that have standardized on Microsoft products as a way to reduce the costs of systems integration and support – or because Microsoft is a major business partner for software consulting services. On the other hand, MOSS is less appealing to organizations that subscribe to a best-of-breed strategy where products from multiple vendors believed to be best at what they do are purchased and then integrated.

Moreover, companies selecting MOSS tend to look at search as part of a single system in which:

  • people are key information resources;
  • much of the firm’s intellectual capital resides locally in word processor documents, spreadsheets, and presentations (as opposed to centralized in formal document libraries);
  • increasing knowledge worker productivity is an explicit goal;
  • many document and content management functions occur in semi-autonomous knowledge centers.

In other words, MOSS search is well suited to organizations that have standardized on the Microsoft technology platform, use SharePoint for collaboration, have a decentralized organization structure, and are in knowledge-intensive industries (e.g. R&D, software consulting).

Investments in metadata and taxonomies
Organizations that have invested in populating content with metadata and creating extensive taxonomies naturally want to leverage this effort to enhance enterprise search. MOSS search can use existing metadata in documents as well as some relationships from an external thesaurus.

The MOSS search crawler will discover metadata embedded within documents, then use it to filter search results and display options in Advanced Search. However, the administrator must first map the crawled metadata elements to “managed properties” (attributes such as author, title, and URL that can be used in search scopes and queries). The Dublin Core metadata library comes with MOSS out of the box.
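
The mapping step can be pictured as a lookup table from crawled names to managed names. The Python sketch below is a conceptual model only: the property names, the mapping table, and both functions are hypothetical, and in MOSS the mapping is configured by the administrator rather than written as code.

```python
# Hypothetical mapping: crawled property name -> managed property name
CRAWLED_TO_MANAGED = {
    "dc:creator": "Author",
    "dc:title": "Title",
    "CustomerRegion": "CustomerServiceRegion",
}


def to_managed(crawled):
    """Keep only crawled properties that have been mapped to a managed property."""
    managed = {}
    for name, value in crawled.items():
        target = CRAWLED_TO_MANAGED.get(name)
        if target is not None:
            managed[target] = value
    return managed


def filter_by_property(docs, prop, value):
    """Return the documents whose managed property `prop` equals `value`."""
    return [d for d in docs if d.get(prop) == value]
```

In this model a crawled element with no mapping is simply unavailable for filtering, which is why the mapping has to come first.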


Some common metadata elements are mapped by default, but it’s also possible to create new managed properties for such attributes as customer name, customer service rep, or customer service region. Managed properties can be incorporated into document and site templates to make it easier to add metadata values at creation time, but MOSS provides no auto-categorization program to add metadata retrospectively to an existing document collection.


Using a thesaurus with MOSS search
With MOSS keywords and synonyms it is possible to use some thesaurus data and relationships to expand a search or influence the order of documents in the results list. Keywords in a search engine context are somewhat different from terms in a thesaurus that is used for classification or browse purposes. In a traditional thesaurus, there are preferred terms, non-preferred (USE) terms, broader terms, narrower terms, and related terms. In the MOSS “thesaurus” file (used to expand or redirect a query), there are only three kinds of relationships:

  • expand search (user enters “Internet Explorer,” MOSS also returns documents containing “IE” and “IE7”);
  • replace query term (user enters “NT5,” MOSS replaces it with “W2K”);
  • rank order (in the search results, MOSS ranks documents containing the word “automobile” with a weight of ”1.0” ahead of those containing the phrase “beach wagon” with a weight of “0.7”).

In MOSS you can also associate definitions with keywords.
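
The three relationships can be modeled in a few lines of Python. This sketch uses the examples from the list above, but the data structures and function names are hypothetical; it does not reproduce the MOSS thesaurus XML format or its matching rules.

```python
# Hypothetical in-memory model of the three relationship types
EXPANSIONS = [{"internet explorer", "ie", "ie7"}]   # any member expands to all members
REPLACEMENTS = {"nt5": "w2k"}                       # query term -> replacement
WEIGHTS = {"automobile": 1.0, "beach wagon": 0.7}   # term -> ranking weight


def rewrite_query(term):
    """Return the list of terms actually searched for a user-entered term."""
    t = term.strip().lower()
    if t in REPLACEMENTS:
        return [REPLACEMENTS[t]]
    for group in EXPANSIONS:
        if t in group:
            return sorted(group)
    return [t]


def rank(docs):
    """Order documents by the highest weight of any weighted term they contain."""
    def score(text):
        low = text.lower()
        return max((w for term, w in WEIGHTS.items() if term in low), default=0.0)
    return sorted(docs, key=score, reverse=True)
```

Note what is missing from the model: there is no place to record broader, narrower, or related terms, which is the gap discussed next.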

It’s not possible to simply import a traditional thesaurus into the MOSS thesaurus XML format because they’re two different animals. For one thing, a search thesaurus (i.e. a list of synonyms) should contain words that real users will type in the search box (from search logs) – not terms created by a professional indexer (though there will be some overlap). For another, a traditional thesaurus may contain phrases such as “packaging law & legislation,” while a search thesaurus should contain single words or, at most, two-word phrases. Finally, there’s no way to show broader/narrower relationships in search results (e.g. as “see also” links or an expandable hierarchy of related topics).

At least two organizations we know of have bumped into size and performance limitations with the MOSS thesaurus (Microsoft says there’s a 10 MB limit).

Changing the order of search results
One of the major uses of a thesaurus is to classify documents (i.e. assign terms to them). Organizations that have used a thesaurus in this way, either by using human indexers or an auto-categorization program, can leverage some of this work in MOSS through Best Bets and Authoritative Pages.

With Best Bets, MOSS administrators can associate keywords with specific Web pages or sites. When a user types the keyword into the search box, MOSS displays those sites designated as Best Bets at the top of the results list (or in a sidebar) and marks them with an icon, such as a star.

With Authoritative Pages, administrators increase or decrease the relevance of content within search results by assigning one of four levels to a Web page or site: most authoritative, second-level authoritative, third-level authoritative, or sites to demote in the ranking. Pages that are not assigned an authoritative level are weighted based on their “click distance” from an authoritative page. Click distance is the number of links a user must follow to get from an authoritative page to the content item.
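
Click distance is essentially a shortest-path calculation over the link graph. The Python sketch below computes it with a breadth-first search starting from the authoritative pages. It assumes the link graph is available as a dictionary of page to linked pages; the page names and function name are hypothetical, and how MOSS combines the distance with the authoritative levels and other ranking factors is not modeled here.

```python
from collections import deque


def click_distances(links, authoritative):
    """Fewest clicks from any authoritative page to each reachable page (breadth-first search)."""
    dist = {page: 0 for page in authoritative}
    queue = deque(authoritative)
    while queue:
        page = queue.popleft()
        for target in links.get(page, ()):
            if target not in dist:  # first visit is the shortest path in BFS
                dist[target] = dist[page] + 1
                queue.append(target)
    return dist


# Hypothetical link graph: page -> pages it links to
LINKS = {
    "home": ["hr", "it"],
    "hr": ["benefits"],
    "it": ["benefits", "wiki"],
    "wiki": ["archive"],
    "orphan": ["home"],
}
```

Pages that cannot be reached from any authoritative page, such as "orphan" in the sample graph, get no distance at all in this sketch.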

So, while it’s possible to tweak MOSS search results using a variety of techniques along with some data from an existing thesaurus, it’s a labor-intensive endeavor. For this reason, some organizations with large, complex taxonomies opt to purchase third-party thesaurus management software that integrates with SharePoint – an approach that Microsoft endorses. Examples of MOSS-compatible taxonomy management tools include Factiva Synaptica, Data Harmony Machine-Aided Indexer, SchemaLogic SchemaServer, and Interse I-box.

A consistent search experience
Users want a simple, effective way to search all available content collections – whether they reside in SharePoint, on the company’s intranet, in databases, or in external information services.  The ideal is a single search box, a results page that contains relevant listings without duplicates, and a way to match user security profiles with content access levels in each content source.

Within MOSS, an administrator can create a Shared Services Provider (SSP) and instruct it to crawl all the content sources deemed necessary for a particular business function. Sources can include SharePoint content, the company intranet, database applications such as SAP and Oracle, and external information services such as FindLaw. The crawl results are stored in a single index, which makes the search relatively fast and efficient.

However, large organizations typically have multiple SSPs. To allow a user to search all of them from a single user interface, you can purchase a third-party application such as Mondosoft’s Ontolica (see the federated search option on the Ontolica Web site). Or you can select an enterprise search engine that can crawl and index SharePoint content. Examples include Autonomy, FAST, Longitude (BA-Insight), Oracle, Recommind, Vivisimo, and others.

Is MOSS 2007 right for you?
Organizations that use SharePoint for collaboration and content management should consider MOSS for enterprise search. Its tight integration with Microsoft applications, especially Office, low cost, and new search features make it a serious contender. Because MOSS is designed for bottom-up implementation, it’s important to get input from business units as well as the enterprise search team and taxonomy manager, if there is one.

Several of our members have mentioned the effort needed to customize MOSS search and set up interfaces to other business applications through the MOSS Business Data Catalog. Added to that is the cost of purchasing third-party programs for enhanced search features and taxonomy management. We suspect that for many organizations, the question is not “Should we use MOSS as our enterprise search engine?” but rather “What’s the best way to integrate our non-Microsoft enterprise search engine with MOSS?”

Created on February 6, 2008 | Updated on January 4, 2010