Thursday, February 10, 2011

SharePoint 2010 Search - Thesaurus Files

This year is the year of search for us. We're looking at implementing fast, focusing on metadata and better tagging in our content, and improving the overall search experience for our users. With that in mind, I just wanted to touch on creating Thesaurus files in SharePoint 2010 Search, as I've found the content on Technet and MSDN to be a bit cryptic.

Modifying Thesaurus Files
If you've been referencing Technet (http://technet.microsoft.com/en-us/library/dd361734.aspx), there is some pretty good information on how to create thesaurus files to improve your search. There are two main priniples in a thesaurus file: expansion sets and replacement sets.

To create an expansion set, which essentially maps synonyms to a specific keyword, you simply need to add the following code to your thesasurus file:

<expansion>
    <sub>401k</sub>
    <sub>401 k</sub>
    <sub>401-k</sub>
</expansion>

An important item to note, parentheses are not acceptable in a thesaurus file. I found this out the hard way when attempting to add <sub>401(k)</sub>, which is the approved standard for 401(k). Luckily for us, the parentheses are ignored in the keyword, so 401(k) will still appear in the search results, even when the user keys in 401-k.

To create a replacement set, the schema is similar:

<expansion>
    <pat>IE8</pat>
    <pat>IE7</pat>
    <sub>Internet Explorer</sub>
</expansion>


In this case, IE8 or IE7 would be replaced by Internet Explorer.

While we don't have much use for replacement sets, expansion sets have proven extremely valuable. Instead of providing Best Bets links via the Search Keywords administration tool, we can populate our search results with all synonyms immediately, which reduces the need for multiple clicks to get a specific result.

Updating Thesaurus Files


The update process is where I have found the least amount of information. Thesaurus files are stored in multiple locations on a SharePoint Server, and determining the correct location can be a real pain in the butt. We update our Thesaurus files in 2 locations in the /Program Files/Microsoft Office Servers/14.0/ folder.

First, we update the thesaurus files here: \Program Files\Microsoft Office Servers\14.0\Data\Config. Updates to the thesaurus files here will ensure that any new Search Service Applications that you create will have the updated thesaurus definitions when they spin up.

Second, we copy the thesaurus files to the: \Program Files\Microsoft Office Servers\14.0\Data\Applications\[GUID]\Config folder. The GUID will be the guid of your Search Service Application, which can be found using this bit of Powershell goodness: Get-SPServiceApplication. This powershell command will list all of your service applications and their corresponding IDs.

Once the files are copied to this location, you'll need to restart the search service for the changes to take affect. You can do this from Administrative Tools > Services > SharePoint Server Search 14, or by running these commands:

net stop osearch
net start osearch

Once the service is resarted, go to your search center and plug in one of the newly added expansion/replacement terms to see your results.

It is important to note that if you're running search on multiple servers, you will need to perform these steps on each server running search. If you already have multiple search applications running, you will also need to copy your thesaurus files to each config directory under the GUID folder for each search service application.

For safety sake, we also updated the tsneu.xml along with tsenu.xml. NEU is the language neutral thesaurus file, and ENU is the US English thesaurus file. Definitions for each language prefix can be found on technet.

1 comment:

  1. I liked your article, I will share your article to everyone!!



    ________________________________________________________________________
    WoW gold|Diablo 3 Gold|RS Gold|GW2 Gold

    ReplyDelete