Log analysis is an important
technique in the knowledge base editor's toolkit. But what is it? What
are its benefits and limitations? How does it work? This article provides
answers to these questions and discusses our use of logs at the Montague
Institute. In a companion article,
John Morelli describes his experiences with file and web logs.
What is "log analysis"?
Log analysis is the process of examining recorded data
about online user behavior -- e.g. date/time of access, tasks performed,
and any errors encountered. Logs are produced automatically by many kinds
of programs -- the operating system (e.g. Windows 2000), a web server
(e.g. Apache), a search engine (e.g. Inktomi) or a database (e.g. Oracle).
In some cases, editors can dictate what information is logged, how it's
formatted, and what time period each log file covers.
To be useful for analysis, log data must be sorted
and summarized. The least expensive, most versatile tool for this purpose
is a spreadsheet program like Microsoft Excel, but specialized tools are
also available (for a comparison chart, see "Automated
Monitoring of Customer Access").
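As a rough illustration of what "sorted and summarized" can mean in practice, the sketch below counts log entries by task and by day, much as a spreadsheet pivot table would. The comma-separated layout (date/time, user, task, error code) and the sample rows are assumptions invented for this example; real logs from an operating system, web server, search engine, or database each have their own formats.

```python
import csv
import io
from collections import Counter

# Hypothetical log layout (an assumption, not any product's real format):
# date/time, user, task, error code (empty when no error occurred).
SAMPLE_LOG = """\
2000-03-01 09:15,alice,search,
2000-03-01 09:17,bob,view,
2000-03-01 09:20,alice,view,404
2000-03-02 10:02,carol,search,
2000-03-02 10:05,bob,search,500
"""

def summarize(log_text):
    """Count entries per task (most frequent first) and per day (by date)."""
    rows = list(csv.reader(io.StringIO(log_text)))
    by_task = Counter(task for _, _, task, _ in rows)
    by_day = Counter(stamp.split()[0] for stamp, _, _, _ in rows)
    return by_task.most_common(), sorted(by_day.items())

task_counts, daily_counts = summarize(SAMPLE_LOG)
print(task_counts)
print(daily_counts)
```

The same two summaries could be produced in a spreadsheet by sorting on the task or date column and counting rows; a script mainly helps when the log is too large to open comfortably.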
What can you learn from logs?
Logs can help answer questions such as:
Which applications (e.g. word processing,
e-mail, fax) do people use most often?
Which articles on your web site are most popular?
Which articles have been accessed by a specific
user?
What are the top 20 words or phrases that
people use to search your web site or database?
What kinds of errors have users encountered?
Are unauthorized users accessing your data?
What sequence of actions do users perform
in the course of a research project?
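Two of the questions above, the most common search phrases and the articles accessed by a specific user, can be sketched in a few lines. This is only a minimal example under stated assumptions: the record layouts, user names, article paths, and function names are all hypothetical, and a real search or web log would first need to be parsed into this shape.

```python
from collections import Counter

# Hypothetical, already-parsed log records (invented for illustration).
SEARCHES = [          # (user, search phrase)
    ("alice", "Taxonomy"),
    ("bob", "taxonomy"),
    ("carol", "knowledge base"),
    ("alice", "log analysis"),
    ("bob", "Knowledge Base"),
    ("carol", "taxonomy "),
]
PAGE_VIEWS = [        # (user, article)
    ("alice", "/articles/logs.html"),
    ("bob", "/articles/taxonomy.html"),
    ("alice", "/articles/taxonomy.html"),
    ("alice", "/articles/logs.html"),
]

def top_phrases(searches, n=20):
    """Return the n most frequent phrases, ignoring case and stray spaces."""
    counts = Counter(phrase.strip().lower() for _, phrase in searches)
    return counts.most_common(n)

def articles_for_user(page_views, user):
    """Return the distinct articles one user accessed, in first-seen order."""
    seen = []
    for who, article in page_views:
        if who == user and article not in seen:
            seen.append(article)
    return seen

print(top_phrases(SEARCHES, n=2))
print(articles_for_user(PAGE_VIEWS, "alice"))
```

Normalizing case and whitespace before counting is a design choice: without it, "Taxonomy" and "taxonomy" would be tallied as different phrases and the ranking would understate popular terms.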