On a project I’m currently working on, we needed to be able to quickly remove potentially many documents from the FAST Search index. Unfortunately, the FAST web administration only allows you to delete one document at a time, which was not suitable for our scenario. We had a couple of ideas on how to tackle the problem, and one of them was using the FAST Content API. Although we didn’t end up using this technique for the project, I still believe that the Content API together with PowerShell is a very useful and powerful combination. So today, I spent a little time working on a PowerShell cmdlet that can remove many items from the FAST index.
Visual Studio 2010 Project Setup
The first thing to do is to create a Class Library project in Visual Studio and add a reference to the Esp-Contentapi.dll from the ESP SDK. You’ll also want to add references to both System.Management.Automation.dll (found in C:\Windows\assembly\GAC_MSIL\System.Management.Automation\1.0.0.0__31bf3856ad364e35) and System.Configuration.Install.dll (in C:\Windows\Microsoft.NET\Framework\v2.0.50727).
After adding the three DLLs, add two class files to the project: a PowerShell snap-in class and a class for the cmdlet. In my project, the snap-in class is PointBridge.FAST.Cmdlets.PointBridgeFASTSnapIn and the cmdlet class is PointBridge.FAST.Cmdlets.Content.RemoveContentItem. An explanation of each class follows.
The snap-in class, PointBridgeFASTSnapIn, derives from PSSnapIn and is used to register all the cmdlets in the assembly. When deriving from PSSnapIn, you need to override the following three properties: Name, Description, and Vendor.
The class is also decorated with the RunInstaller attribute so that the assembly can be installed using installutil.exe.
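A minimal sketch of such a snap-in class is shown below. The class name and namespace come from this post, and PSSnapIn and RunInstaller are the standard .NET types; the strings returned by Name, Description, and Vendor are placeholders assumed for illustration, not the values from the actual project.

```csharp
using System.ComponentModel;
using System.Management.Automation;

namespace PointBridge.FAST.Cmdlets
{
    // RunInstaller(true) lets installutil.exe discover and register this snap-in.
    [RunInstaller(true)]
    public class PointBridgeFASTSnapIn : PSSnapIn
    {
        // The name used later with Add-PSSnapin (placeholder value, an assumption).
        public override string Name
        {
            get { return "PointBridgeFASTSnapIn"; }
        }

        // Free-text description shown by Get-PSSnapin (placeholder value).
        public override string Description
        {
            get { return "Cmdlets for working with FAST ESP content."; }
        }

        // Vendor string shown by Get-PSSnapin (placeholder value).
        public override string Vendor
        {
            get { return "PointBridge"; }
        }
    }
}
```

Because PSSnapIn registers every cmdlet class in the assembly, nothing else needs to be listed here when new cmdlets are added.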
The RemoveContentItem class, which derives from Cmdlet, is the main class that handles the processing. When building cmdlets, you decorate the class with a Cmdlet attribute. This attribute indicates the verb-noun pair used to invoke your cmdlet. In this instance, because of this attribute, my cmdlet is invoked as ‘Remove-ContentItem’ from the shell.
The RemoveContentItem class has three PowerShell parameters:
- ContentID – the ID of the content to delete from the FAST index.
- Collection – the name of the FAST collection that contains the item.
- ContentDistributor – the server/port of the FAST ContentDistributor.
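The attribute and the three parameters described above could be declared roughly as follows. This is only a sketch: the class, namespace, and parameter names come from this post, but which parameters are mandatory, the string types, and the placeholder ProcessRecord() body are assumptions.

```csharp
using System.Management.Automation;

namespace PointBridge.FAST.Cmdlets.Content
{
    // VerbsCommon.Remove plus the noun "ContentItem" gives the name Remove-ContentItem.
    [Cmdlet(VerbsCommon.Remove, "ContentItem")]
    public class RemoveContentItem : Cmdlet
    {
        // Bound once for each object arriving on the pipeline.
        [Parameter(Mandatory = true, ValueFromPipeline = true)]
        public string ContentID { get; set; }

        // Name of the FAST collection that holds the item (Mandatory is an assumption).
        [Parameter(Mandatory = true)]
        public string Collection { get; set; }

        // Server/port of the FAST ContentDistributor (Mandatory is an assumption).
        [Parameter(Mandatory = true)]
        public string ContentDistributor { get; set; }

        // Placeholder body: the real cmdlet submits the removal to FAST here.
        protected override void ProcessRecord()
        {
            WriteVerbose("Would remove " + ContentID + " from " + Collection);
        }
    }
}
```

Marking ContentID with ValueFromPipeline is what allows a list of IDs to be piped straight into the cmdlet.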
In the BeginProcessing() method (overridden from the Cmdlet base class), I set up an instance of an IDocumentFeeder object to be used later, when processing each record. The IDocumentFeeder is an interface that allows you to work with a FAST ESP collection for adding/removing/updating documents within that collection. You can get an instance of an IDocumentFeeder by calling the static CreateDocumentFeeder method of the Com.FastSearch.Esp.Content.Factory class.
In the ProcessRecord() method, I call the RemoveDocument() method of the IDocumentFeeder object to queue up the removal of the content item. The ProcessRecord() method is called for each ContentID passed into the cmdlet from the pipeline.
Finally, in the EndProcessing() method, I take care of reporting and cleanup. The call to IDocumentFeeder.WaitForCompletion() makes sure that all submitted deletes have completed (successfully or not) before continuing. After the deletes have been processed, I use the IDocumentFeederStatus object returned from IDocumentFeeder.GetStatusReport() to build a report of the deletes that failed or executed with warnings.
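To make the order of these three methods concrete, here is a small sketch of the cmdlet lifecycle. It is a hypothetical demo cmdlet (Remove-ContentItemSketch is an invented name) that never contacts FAST: the comments mark where the IDocumentFeeder calls described above would go, and a plain list stands in for the feeder's queue so that no Content API signatures have to be guessed.

```csharp
using System.Collections.Generic;
using System.Management.Automation;

// Hypothetical demo cmdlet: it only illustrates the call order, not the FAST calls.
[Cmdlet(VerbsCommon.Remove, "ContentItemSketch")]
public class RemoveContentItemSketch : Cmdlet
{
    // Stand-in for the feeder's queue of pending removals.
    private List<string> queued;

    [Parameter(Mandatory = true, ValueFromPipeline = true)]
    public string ContentID { get; set; }

    // Called once, before any pipeline input is processed.
    protected override void BeginProcessing()
    {
        // The real cmdlet creates its IDocumentFeeder here.
        queued = new List<string>();
    }

    // Called once for each object received from the pipeline.
    protected override void ProcessRecord()
    {
        // The real cmdlet calls RemoveDocument() on the feeder here.
        queued.Add(ContentID);
    }

    // Called once, after the last pipeline object has been processed.
    protected override void EndProcessing()
    {
        // The real cmdlet calls WaitForCompletion() and GetStatusReport() here.
        WriteObject(string.Format("{0} removal(s) queued", queued.Count));
    }
}
```

Piping three IDs into this sketch runs BeginProcessing() once, ProcessRecord() three times, and EndProcessing() once. This is why the expensive feeder setup and the final wait belong in the first and last methods rather than in ProcessRecord().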
Using the cmdlet
To use the cmdlet, open a new PowerShell window and use installutil.exe to install the snap-in:
PS> CD [location of assemblies]
PS> set-alias installutil $env:windir\Microsoft.NET\Framework64\v2.0.50727\installutil
PS> installutil PointBridge.FAST.Cmdlets.dll
You only need to run installutil once; after that, the snap-in can be added in any subsequent PowerShell session.
The following is an example of how to use the cmdlet:
Line 1 just adds the snap-in for use in my current session. Line 2 sets up an array of the content IDs I want to delete from my collection. This array (or set of records to process) can be read from a file, a database, or anywhere else; here, I just set it up directly as an example. The third ID in the example above is a fake ID that doesn’t actually exist in my collection. Lastly, I pipe the content ID array to my Remove-ContentItem cmdlet and send the results to an output file. Writing to a file isn’t necessary, but I prefer it to having everything dumped on the screen.
The results of running this cmdlet look like this:
The nice thing about wrapping this up in a cmdlet is that I can reuse it in my PowerShell scripts to easily remove any unwanted content from my collections.
So here is the shameless plug – If you want a copy of the Visual Studio solution, use the Tweet link below to tweet this post. Then send me an email (firstname.lastname@example.org) and I’ll send you a copy of the solution.