Trouble-shooting Quick Search issues

Triaster uses the Keyoti search system to index files and present search results on a process library website. When trouble-shooting Search issues, a few key files can help.


Triaster\TriasterServer2011\KeyotiSearch\
    IndexDirectory\
        Numerically-named index files
        Indexer.txt
        lock
        ParserProvider.txt
        Reader.txt
    IndexLog.txt

‘IndexLog.txt’ file

When a re-index is run as a post-publish task, actions are logged in ‘IndexLog.txt’. Useful information includes:

  • Timestamps, so that you know whether records represent expected indexing activity
  • The sources (e.g. which maps and documents)
  • Errors, including for specific sources, and whether indexing was run at all

The timestamp on the file itself will also indicate whether indexing has happened when expected. An old timestamp would suggest a problem.
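The timestamp check can be scripted. The following is a minimal sketch, assuming Python is available on the server; the function name `is_stale` and the one-day threshold in the usage comment are illustrative choices, not part of Triaster or Keyoti.

```python
from datetime import datetime, timedelta
from pathlib import Path


def is_stale(path, max_age, now=None):
    """Return True if the file at `path` was last modified more than `max_age` ago."""
    now = now or datetime.now()
    modified = datetime.fromtimestamp(Path(path).stat().st_mtime)
    return now - modified > max_age


# Example use (the threshold is an assumption; match it to your publish schedule):
#   is_stale("IndexLog.txt", timedelta(days=1))
```

A result of True only says that the log has not been written recently. Whether that is a problem depends on when the last publish was expected to run.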

Errors for specific sources aren’t unusual. Searchable locations commonly contain files that cannot be handled, and these are often identified in the ‘Indexer.txt’ and ‘Reader.txt’ files, which are described below.

Index files

The index files are a set of numerically-named files. In a complete index, the files all share the same number and differ only in their extensions. File names with different numbers suggest either that indexing is currently underway or that a previous run crashed. The timestamps on those files should indicate which.

If indexing has crashed, then the index files should be deleted. A corrupt index can prevent further indexing.
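To see at a glance whether the numerically-named files share one number, they can be grouped by name. This is a rough sketch under the assumption that an index file is any file whose name before the extension consists only of digits; the function names are hypothetical.

```python
from collections import defaultdict
from pathlib import Path


def numeric_index_groups(index_dir):
    """Map each numeric file-name stem to the sorted extensions found for it."""
    groups = defaultdict(list)
    for entry in Path(index_dir).iterdir():
        # Assumption: index files are exactly those with an all-digit stem.
        if entry.is_file() and entry.stem.isdigit():
            groups[entry.stem].append(entry.suffix)
    return {stem: sorted(exts) for stem, exts in groups.items()}


def looks_complete(index_dir):
    """True when every numerically-named file shares a single number."""
    return len(numeric_index_groups(index_dir)) == 1
```

If `looks_complete` returns False, the file timestamps still need checking by hand to tell a running index from a crashed one. An empty folder also returns False, since there is no index at all.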

‘Indexer.txt’, ‘Reader.txt’ files

These contain records of files, as represented by their HTTP URLs.

Problems reading and indexing files are likely to be identified here. Often these are of no concern, for example system files such as the ‘thumbs.db’ files that Windows Explorer creates.

However, there may be characters in a path that are forbidden in URLs, preventing the file from being indexed. For example, by default, IIS (Microsoft’s web server for Windows) doesn’t allow ‘+’ in a URL.

A document with a ‘+’ in its file name won’t appear in Search results. In this case, either rename the document to remove the problem characters, or enable ‘double-escaping’ in IIS (the ‘allowDoubleEscaping’ request filtering setting) to allow their use in URLs.
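Affected documents can be listed before anyone notices them missing from Search results. The sketch below walks a folder and reports paths containing problem characters. The function name is hypothetical, and the default character set holds only ‘+’, the one character named above; extend it if other characters prove troublesome in your environment.

```python
from pathlib import Path


def files_with_problem_characters(root, problem_chars="+"):
    """Return sorted paths under `root` whose relative path contains a problem character."""
    root = Path(root)
    return sorted(
        path
        for path in root.rglob("*")
        if path.is_file()
        and any(ch in str(path.relative_to(root)) for ch in problem_chars)
    )
```

The relative path is checked rather than just the file name, because a ‘+’ in a folder name ends up in the URL as well.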

‘lock’ file

When indexing is running, a ‘lock’ file is written to the ‘IndexDirectory’ folder. This prevents another re-index from being initiated. When indexing completes, the file should be removed. However, if indexing crashes, it’s likely to remain. Its timestamp should indicate whether it represents indexing that is currently underway.

If indexing has crashed, this file should be deleted.

‘ParserProvider.txt’ file

This file identifies MIME-types that are not recognised. A MIME type is used by a web server as a way of identifying a file based on its nature and format, and will determine how that web server serves a file.

We’ve encountered only one related issue, in which documents were hosted on a different server. That server’s MIME-type configuration for some file types didn’t match the standard definitions, and files of those types weren’t being returned in Search results. This was apparent from the ‘ParserProvider.txt’ file. Correcting the MIME-types on the document-hosting server resolved the issue.
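A comparison of this kind can be approximated in a few lines. The sketch below assumes you have already noted the Content-Type that the document server returns for a few sample files, and it uses Python’s built-in `mimetypes` table as a stand-in for the standard definitions. That table is not what Keyoti uses, and the function name is hypothetical.

```python
import mimetypes


def mime_mismatches(served_types):
    """Return {file name: (served, expected)} where the served type differs from the expected one."""
    mismatches = {}
    for name, served in served_types.items():
        expected, _ = mimetypes.guess_type(name)
        # Ignore parameters such as "; charset=utf-8" and differences in letter case.
        served_base = served.split(";")[0].strip().lower()
        if expected is not None and served_base != expected:
            mismatches[name] = (served_base, expected)
    return mismatches
```

File types that Python doesn’t recognise are skipped rather than reported, so an empty result doesn’t prove that every type is configured correctly.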

Summary

When investigating Search behaviour, these files in particular can offer helpful information:

  • IndexLog.txt
  • ‘.txt’ files in the ‘IndexDirectory’ folder, especially ‘Indexer.txt’ and ‘Reader.txt’, and perhaps in more specific circumstances, ‘ParserProvider.txt’

Timestamps on these files show how recently indexing ran.

To remove a corrupted index and ensure subsequent indexing can run, delete these files from the ‘IndexDirectory’ folder:

  • Numerically-named index files (only the index files are named numerically)
  • ‘lock’ file
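The clean-up above can be sketched as a small script. It assumes, as before, that index files are exactly those whose name before the extension is all digits. The function names are hypothetical, and `dry_run` defaults to True so that nothing is deleted until the list has been reviewed. Confirm from the timestamps that indexing isn’t still running before deleting anything.

```python
from pathlib import Path


def crashed_index_leftovers(index_dir):
    """List the numerically-named index files and the 'lock' file, if present."""
    return sorted(
        entry
        for entry in Path(index_dir).iterdir()
        # Assumption: index files are exactly those with an all-digit stem.
        if entry.is_file() and (entry.stem.isdigit() or entry.name == "lock")
    )


def remove_crashed_index(index_dir, dry_run=True):
    """Delete the leftovers; with dry_run=True, only report what would be deleted."""
    targets = crashed_index_leftovers(index_dir)
    if not dry_run:
        for target in targets:
            target.unlink()
    return [target.name for target in targets]
```

The ‘.txt’ files are left in place, since they aren’t part of the index itself and may still be useful for diagnosis.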
