Project Description
IndexTank is a cloud-based, real-time indexing SaaS (software as a service) that lets you quickly and easily add custom, full-featured search functionality to any web site or application.

IndexTankDotNet provides convenient programmatic access to any IndexTank-compatible API from any .NET application.

IndexTankDotNet is available as a NuGet package.

IMPORTANT NOTE: On October 11, 2011, IndexTank was acquired by LinkedIn and the API was closed to the public. Since then, a number of other services have emerged, based on the same engine that originally powered IndexTank. Thus far, Searchify is the only one which I have found that faithfully implements the original IndexTank API, such that it can serve as a 100% compatible drop-in replacement for IndexTank. IndexTankDotNet v1.1 has been fully tested against the Searchify service.

Unfortunately, I can't yet guarantee that the library will work with any other IndexTank-like services. If you happen to give it a try, feel free to ping me and let me know how you make out.

Introduction
Overview
Basic Usage
Specifying a Search Timeout
Additional Fields
Document Variables
Scoring Functions
Query Variables and Geolocation
Faceting
Range Queries
Batch Indexing
Bulk Delete
Delete By Query
Index Management
Error Handling
Autocomplete
Additional Help
Acknowledgments

Introduction

IndexTank is a service that lets you easily add powerful search capabilities to any web site or web-enabled application. The service is used in hundreds of applications, from small personal blogs to huge global sites like Reddit, TwitVid, and TaskRabbit.

IndexTank has many advantages over current alternatives such as Lucene, Solr, Sphinx, or SQL Full Text indexing:
  • Cloud-based: no infrastructure to store or maintain
  • Scalable from a simple personal blog to hundreds of millions of documents (FREE for up to 100,000 documents)
  • Truly real-time: you can instantly update any document without re-indexing
  • Geo- and Social-aware: use location, votes, ratings, or comments to custom tailor search results relevance
  • Sort and score results using algorithms that you define
  • Fuzzy searching, Autocomplete, Facets for how users really search
  • Highlights and Snippets quickly show search results relevance

IndexTankDotNet is a client library that serves as a convenient front-end for IndexTank-compatible services that you can use in any .NET application, from ASP.NET MVC, WebForms, or Web Pages; to WinForms, WPF, Silverlight, WP7, or even console applications. Anything you can do with the IndexTank API directly, you can do with IndexTankDotNet.

The library is copiously documented, so it is likely that you will be able to catch on to how everything works merely by inspecting the documentation via Intellisense. The source code also includes a full suite of unit tests that demonstrate how to use the library. The following primer is provided to help you get started. If you require additional help, please don't hesitate to ask.

Overview

The following assumes that:
  • you have already obtained an account and have the private API URL for your account handy.
  • you have added IndexTankDotNet to your project via NuGet (PM> Install-Package IndexTankDotNet, or via the NuGet management tools in Visual Studio).

NOTE: The examples below omit basic error handling in the interest of clarity. See the section on Error Handling for a detailed explanation of IndexTankDotNet exceptions.

Basic Usage

If you have already created an index using your account dashboard, you'll need to use your private API URL to instantiate a client. You can then use the client to fetch an existing index by its name.

IndexTankClient client = new IndexTankClient("<YOUR API URL HERE>");

Index index = client.GetIndex("<YOUR INDEX NAME HERE>");

Once you have an instance of an Index, all you need is the content you want to index. You can add a document to your index by specifying a document identifier and adding its content to a single field called "text":

string documentId = "<A DOCUMENT IDENTIFIER>";
string documentText = "<TEXTUAL CONTENT OF DOCUMENT>";

Document document = new Document(documentId).AddField("text", documentText);

index.AddDocument(document);

That's it. You have indexed a document.

You can now search the index for any indexed document by simply providing a search query:

string queryText = "<TEXT TO SEARCH FOR>";

Query query = new Query(queryText);

SearchResult result = index.Search(query);

Console.WriteLine(string.Format("There were {0} matches found for '{1}'.", result.Matches, result.QueryText));
Console.WriteLine(string.Format("The search took {0} seconds.", result.SearchTime));

foreach (ResultDocument document in result.ResultDocuments)
{
   Console.WriteLine(string.Format("Document ID: {0}", document.DocumentId));
}

As you can see, the Search method returns a SearchResult object. The SearchResult contains a list of ResultDocument objects, each of which contains the document identifier you provided when indexing the document. In a real application, you would normally provide a URL or an ID from a database record as a document identifier. Then, to render the results page, you could use the identifier to build a link that points to a relevant page, or that retrieves the actual records from your database.

You will also note that SearchResult contains meta information about the search that was performed, such as Matches (the number of total matches found), SearchTime (how long the search took in seconds), and QueryText (the text that was supplied to the original query).

NOTE: For performance reasons, the maximum number of ResultDocument objects returned in any SearchResult is 5,000. Nevertheless, the Matches property will still correctly reflect the true number of matched documents in the index. So for example, if there are 8,267 documents that are matched by a query, result.Matches would be 8,267, but only the top 5,000 documents would be returned (result.ResultDocuments.Count would be 5,000).

You can get richer results by requesting fields and/or snippets when forming the query.

Query query = new Query(queryText)
   .WithSnippetFromFields("text")
   .WithFields("title", "timestamp");

SearchResult result = index.Search(query);

Console.WriteLine(string.Format("There were {0} matches found for '{1}'.", result.Matches, result.QueryText));
Console.WriteLine(string.Format("The search took {0} seconds.", result.SearchTime));

foreach (ResultDocument document in result.ResultDocuments)
{
   Console.WriteLine(string.Format("Document ID: {0}", document.DocumentId));
   Console.WriteLine(string.Format("Title: {0}", document.Fields["title"]));
   Console.WriteLine(string.Format("Timestamp: {0}", document.Fields["timestamp"]));
   Console.WriteLine(string.Format("Snippet: {0}", document.Snippets["text"]));
}

If you fetch fields, you get the entire content of the specified fields back in the Fields property of the ResultDocument objects. If you fetch snippets, the Snippets property of the ResultDocument objects will contain strings with the matched term(s) surrounded by <b></b> tags, plus the text immediately preceding and following the matched term(s).

You can implement server-side paging over the matched documents by calling the Skip and Take methods on the query. Even though only a limited set of documents is returned, the Matches property of the SearchResult will reflect the total number of matched documents.

Query query = new Query(queryText).Skip(10).Take(10);

SearchResult result = index.Search(query);

NOTE: For performance reasons, the underlying API limits the sum of the arguments passed to the Skip and Take methods to 5,000 or less. If the sum exceeds 5,000, an ArgumentOutOfRangeException will be thrown.
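
The Skip and Take arguments for a given results page can be worked out up front, which also makes it easy to check the 5,000 limit before issuing the query. The following is only a sketch: the Paging class and its members are hypothetical helpers, not part of IndexTankDotNet, and page numbers are assumed to start at 1.

```csharp
using System;

// Hypothetical helper; not part of IndexTankDotNet.
public static class Paging
{
   // Limit on Skip + Take described in the note above.
   public const int MaxWindow = 5000;

   // True when the requested page fits inside the Skip + Take limit.
   public static bool IsReachable(int pageNumber, int pageSize)
   {
      if (pageNumber < 1 || pageSize < 1)
      {
         return false;
      }

      // Skip + Take for a 1-based page equals pageNumber * pageSize.
      return (long)pageNumber * pageSize <= MaxWindow;
   }

   // Number of documents to skip for a 1-based page number.
   public static int SkipFor(int pageNumber, int pageSize)
   {
      if (!IsReachable(pageNumber, pageSize))
      {
         throw new ArgumentOutOfRangeException("pageNumber");
      }

      return (pageNumber - 1) * pageSize;
   }
}
```

With a helper like this, the third page of ten results would be requested with new Query(queryText).Skip(Paging.SkipFor(3, 10)).Take(10), and a UI could hide "next page" links once Paging.IsReachable returns false.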

Deleting a document from an index is also very easy:

index.DeleteDocument(documentId);

Specifying a Search Timeout

The IndexTank service is lightning-fast -- most searches complete in a matter of milliseconds. However, if you need to specify a timeout for searches, there is an overload of the Search method that lets you do that.

try
{
   var query = new Query("love");
   // set timeout for 2 seconds (2000 ms)
   SearchResult result = index.Search(query, 2000);
}
catch (TimeoutException)
{
   // Notify user that search timed out
}

You can specify the desired timeout in milliseconds. If the Search method does not return within the specified timeout, a TimeoutException is thrown. You can wrap the Search call in a try block, then handle the TimeoutException to notify the user that the search has timed out.

Additional Fields

Keep in mind that before adding a Document to an Index, you must define at least one field on it (failing to do so will throw an InvalidOperationException). (NOTE: The maximum size for the content of any field is 100 kbytes.)

With IndexTank, the “text” field is special, as it is the default field that all queries are performed against. Except in very special cases, you will almost always want to define a “text” field. You saw how to do that above. Because this is such a common task, the Document object constructor is overloaded to let you populate the text field directly.

Document document = new Document(documentId, documentText);

// This is the same as 
// Document document = new Document(documentId).AddField("text", documentText);

index.AddDocument(document);

When you index a document, you can define different fields by adding more elements to the document using a fluent interface.

string title = "<TITLE OF DOCUMENT>";
string author = "<AUTHOR'S NAME>";

Document document = new Document(documentId, documentText)
   .AddField("title", title)
   .AddField("author", author);
         
index.AddDocument(document);

By default, searches will only look at the "text" field, but you can use special query syntax that looks at other fields by prefixing a search term with the field name. The following example filters results to include only authors named “Dumond”.

Query query = new Query(queryText + " author:Dumond");

SearchResult result = index.Search(query);

A common use case is that you may want searches to look across multiple fields automatically. As an example, let's say you indexed the following document:

Document document = new Document("post_2", "The blueberry is the greatest fruit ever!")
   .AddField("title", "I Love Blueberries");

index.AddDocument(document);

In this case, a simple search for blueberries would not find this document, because that term does not appear in the text field. A search for title:blueberries would turn up this document, but a search for title:blueberry would not. To search across all fields, the user would have to specify a search like so: text:blueberries OR title:blueberries. This may be problematic, not only because of the complex syntax involved, but also because the user may not even be aware of what fields exist on your documents and what their names are.

To address this common scenario, the library contains an overload of the Search method that lets you specify which fields a search should target. You can specify as many fields as you want, and any simple search will look in all of them.

Query query = new Query("blueberries");
         
SearchResult result = index.Search(query, "text", "title");

In this case, a simple search for blueberries would retrieve the document.

There's also a special field named "timestamp" that is expected to contain the publication date of the content in seconds since the Unix epoch (1/1/1970 00:00 UTC). If none is provided, the time of indexing is used by default. While it is possible to specify the timestamp manually using AddField("timestamp", "<NUMBER OF SECONDS>"), it is far easier to use the AddTimestamp method, which lets you pass in a DateTime object.

Document document = new Document(documentId, documentText)
   .AddTimestamp(new DateTime(2011, 8, 1));

index.AddDocument(document);

NOTE: Because the underlying API stores the timestamp as a 32-bit integer, passing a date to the AddTimestamp method that results in a timestamp exceeding the capacity of an Int32 will cause an ArgumentOutOfRangeException to be thrown. This effectively limits valid dates to the range December 14, 1901 to January 19, 2038.
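
To see where the date limits in the note above come from, the conversion can be reproduced by hand. The sketch below is not the library's implementation: UnixTime and TryToSeconds are hypothetical names, and because this page does not say whether the library treats a DateTime as local or UTC, the sketch converts to UTC explicitly.

```csharp
using System;

// Hypothetical helper; not part of IndexTankDotNet.
public static class UnixTime
{
   private static readonly DateTime Epoch =
      new DateTime(1970, 1, 1, 0, 0, 0, DateTimeKind.Utc);

   // Converts a DateTime to whole seconds since the Unix epoch.
   // Returns false when the result does not fit in a 32-bit integer.
   public static bool TryToSeconds(DateTime value, out int seconds)
   {
      double total = Math.Floor((value.ToUniversalTime() - Epoch).TotalSeconds);

      if (total < int.MinValue || total > int.MaxValue)
      {
         seconds = 0;
         return false;
      }

      seconds = (int)total;
      return true;
   }
}
```

Checking a date this way before building a document lets you reject out-of-range input with your own message instead of relying on the ArgumentOutOfRangeException.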

Document Variables

When you index a document you can define special floating point fields that can be used in the results scoring. The following example can be used in an index that allows 3 variables:

float rating = 0.5f;         
float reputation = 1.5f;
float visits = 10.0f;

Document document = new Document(documentId, documentText)
   .AddVariable(0, rating)
   .AddVariable(1, reputation)
   .AddVariable(2, visits);

index.AddDocument(document);

NOTE: The number of variables you can define on a document is limited, and can vary by index. As of this writing, that number is 3 for indexes created on a free account. If you attempt to create more than is allowed, an exception will be thrown.

You can also update a document’s variables without having to resend the entire document. This is much faster and should always be used if no changes were made to the document itself.

float newRating = 0.7f;
float newVisits = 15.0f;

var variables = new Dictionary<int, float> {{0, newRating}, {2, newVisits}};

index.UpdateVariables(documentId, variables);

Scoring Functions

To use the variables in your searches you'll need to define scoring functions. These functions can be defined in your web dashboard by clicking "Manage" in your index details, or directly via the IndexTankDotNet library.

// FUNCTION 0: Sorts by most recent (this function is added to all newly created indexes automatically)
index.AddFunction(0, "-age");

// FUNCTION 1: Standard textual relevance
index.AddFunction(1, "relevance");

// FUNCTION 2: Sorts by rating
index.AddFunction(2, "doc.var[0]");

// FUNCTION 3: Sorts by reputation
index.AddFunction(3, "d[1]");

// FUNCTION 4: Advanced function
index.AddFunction(4, "log(d[0]) - age/50000");

Read the function definition syntax for help on how to write functions.

If you don't define any functions, you will get the default function 0, which sorts by age (most recent first). By default, searches use function 0. You can specify a different function when searching by using the following:

SearchResult result = index.Search(new Query(queryText).WithScoringFunction(2));

Query Variables and Geolocation

Besides the document variables, in the scoring functions you can refer to query variables. These variables are defined at query time and can be different for each query.

A common use case for query variables is geolocation, where you use two variables for latitude and longitude both in the documents and in the query, and use a distance function to sort by proximity to the user. For this example, we will assume that every document stores its position in variables 0 and 1, representing latitude and longitude respectively, expressed in degrees as floating point numbers.

Defining a proximity scoring function:

// FUNCTION 5: Negated distance in miles, so that closer documents score higher
index.AddFunction(5, "-mi(d[0], d[1], q[0], q[1])");

Searching from a user’s position:

float latitudeOfUser = -34.70549341022545f;
float longitudeOfUser = -58.359375f;

SearchResult result = index.Search(
   new Query(queryText)
      .WithScoringFunction(5)
      .WithQueryVariable(0, latitudeOfUser)
      .WithQueryVariable(1, longitudeOfUser));

Faceting

Besides being able to define numeric variables on a document, you can tag documents with category values. Each category is defined by a string, and its values are also defined by strings. So for instance, you can define a category named "articleType" and its values can be "camera", "laptop", etc... You can have another category called "priceRange" and its values can be "$0 to $49", "$50 to $100", etc...

A document can be tagged with a single value for each category when it is created, so if a document is in the "$0 to $49" priceRange it can't be in any other, and retagging the same category overwrites the previous value.

Document document = new Document(documentId, documentText)
   .AddCategory("articleType", "video game")
   .AddCategory("priceRange", "$0 to $49");

index.AddDocument(document);

You can tag several categories at once on an already-indexed document like this:

var categories = new Dictionary<string, string> {{"priceRange", "$0 to $299"}, {"articleType", "camera"}};

index.UpdateCategories(documentId, categories);

When searching, the returned SearchResult object will have a property called Facets. If any of the found documents is tagged with a category, Facets will contain a dictionary with categories for keys. For each category the value will be another dictionary, with category value as key and the number of occurrences in the search results as value. So for instance:

SearchResult result = index.Search(new Query(queryText));

foreach (var facet in result.Facets)
{
   Console.WriteLine(string.Format("Found results for {0}:", facet.Key));
   foreach (var category in facet.Value)
   {
      Console.WriteLine(string.Format("\t{0} for {1}", category.Value, category.Key));
   }
   Console.WriteLine();
}

// Found results for articleType:
//    5 for camera
//    3 for laptop
//
// Found results for priceRange:
//    4 for $0 to $299
//    4 for $300 to $599
//

This means that, of the matches, 5 are of the "camera" articleType and 3 are of the "laptop" articleType. Also, 4 of them are in the "$0 to $299" priceRange, and 4 are in the "$300 to $599" priceRange.

Of course, since these are simply dictionaries, you can obtain the occurrence counts directly if desired.

int occurrencesOfCamera = result.Facets["articleType"]["camera"];  // equals 5
int occurrencesOfLaptop = result.Facets["articleType"]["laptop"];  // equals 3

int occurrencesOf0to299 = result.Facets["priceRange"]["$0 to $299"]; // equals 4
int occurrencesOf300to599 = result.Facets["priceRange"]["$300 to $599"]; // equals 4
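
Indexing a dictionary directly throws a KeyNotFoundException when a category or value did not occur in the results, so a tolerant lookup can be handy when rendering facet counts. The sketch below is an assumption-laden illustration: FacetLookup and CountOf are hypothetical names, and the parameter type is a guess at the shape of the Facets property based on the description above, so adjust it to the actual type.

```csharp
using System.Collections.Generic;

// Hypothetical helper; not part of IndexTankDotNet.
public static class FacetLookup
{
   // Returns the count for a category value, or 0 when either key is absent.
   public static int CountOf(
      IDictionary<string, Dictionary<string, int>> facets,
      string category,
      string value)
   {
      Dictionary<string, int> counts;
      int count;

      if (facets != null
          && facets.TryGetValue(category, out counts)
          && counts.TryGetValue(value, out count))
      {
         return count;
      }

      return 0;
   }
}
```

A call such as FacetLookup.CountOf(facets, "articleType", "tablet") then yields 0 rather than an exception when no matched document carries that value.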

Then, you can also filter a query by restricting it to a particular set of category/values. For instance the following will only return results that are of the "camera" articleType and also are either in the "$0 to $299" or "$300 to $599" price range.

SearchResult result = index.Search(
   new Query(queryText)
      .WithCategoryFilter("articleType", "camera")
      .WithCategoryFilter("priceRange", "$0 to $299", "$300 to $599"));

Range Queries

Document variables and scoring functions can also be used to filter your query results. When performing a search, you can add variable and function filters so that the results include only documents whose variable values fall within a specific range (e.g., posts that have more than 10 votes but fewer than 100), or only documents for which a certain scoring function returns a value within a specific range.

You can specify more than one range for each variable or function filter (the value must be within at least ONE range), and you can use as many filters as you want in every search (all filters must be met):

/*
In this sample, the results will only include documents 
whose variable #0 value is between 5 and 10 OR between 15 and 25
AND variable #1 value is less than or equal to 3
*/

SearchResult result = index.Search(
   new Query(queryText)
      .WithDocumentVariableFilter(0, 5f, 10f)
      .WithDocumentVariableFilter(0, 15f, 25f)
      .WithDocumentVariableFilter(1, float.NegativeInfinity, 3));

// The same applies to functions
SearchResult result2 = index.Search(
   new Query(queryText)
      .WithFunctionFilter(0, 0.5d, double.PositiveInfinity));
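
The way filters combine (OR across the ranges of one variable, AND across different variables) can be modeled locally. The sketch below is not how the service evaluates filters; RangeFilterModel is a hypothetical name, and two behaviors are assumptions made only for illustration: range bounds are treated as inclusive, and a document that lacks a filtered variable is treated as not matching.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical local model; not part of IndexTankDotNet.
public static class RangeFilterModel
{
   // rangesByVariable maps a variable index to its list of (low, high) ranges.
   public static bool Passes(
      IDictionary<int, float> documentVariables,
      IDictionary<int, List<Tuple<float, float>>> rangesByVariable)
   {
      // Every filtered variable must be satisfied (AND) ...
      return rangesByVariable.All(filter =>
      {
         float value;
         if (!documentVariables.TryGetValue(filter.Key, out value))
         {
            return false;
         }

         // ... but matching any one of its ranges is enough (OR).
         return filter.Value.Any(r => r.Item1 <= value && value <= r.Item2);
      });
   }
}
```

Under this model, a document with variable 0 equal to 7 and variable 1 equal to 2 passes the filters from the sample above, while one with variable 0 equal to 12 does not, because 12 falls in neither range.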

Batch Indexing

When populating an index for the first time, or when a batch task for adding documents makes sense, you can use the AddDocuments batch indexing method.

When using batch indexing, you can add a large batch of documents to an index with just one call. The limit on a single call is not the number of documents but the total size of the resulting HTTP request, which should be less than 1 MB.

Making a batch indexing call reduces the number of requests needed (reducing the latency introduced by round-trips) and increases the maximum throughput, which can be very useful when initially loading a large index.

var documents = new List<Document>
                   {
                      new Document("post 1", "I love Angry Birds"),
                      new Document("post 2", "I like Words With Friends"),
                      new Document("post 3", "I hate Doodle Jump")
                   };

BatchIndexResultCollection indexResults = index.AddDocuments(documents);

Individual documents may fail to index, and your code should handle that by retrying them. (If there are formal errors in the request, the entire batch will be rejected with an exception.) The BatchIndexResultCollection object provides a way to retry the documents that failed:

var failedDocuments = indexResults.GetFailedDocuments();
index.AddDocuments(failedDocuments);
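
Because the limit applies to request size rather than document count, a large load usually has to be split into several calls. One possible approach is sketched below. Batching and SplitBySize are hypothetical names, and the size estimator is supplied by the caller: how the library serializes a request is not described on this page, so any estimate is an approximation and the budget should leave headroom below 1 MB.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper; not part of IndexTankDotNet.
public static class Batching
{
   // Splits items into consecutive batches whose estimated total size stays
   // at or below maxBytes. An item that alone exceeds maxBytes is placed in
   // a batch of its own rather than dropped.
   public static List<List<T>> SplitBySize<T>(
      IEnumerable<T> items, Func<T, long> estimateBytes, long maxBytes)
   {
      var batches = new List<List<T>>();
      var current = new List<T>();
      long currentBytes = 0;

      foreach (T item in items)
      {
         long size = estimateBytes(item);

         // Start a new batch when adding this item would exceed the budget.
         if (current.Count > 0 && currentBytes + size > maxBytes)
         {
            batches.Add(current);
            current = new List<T>();
            currentBytes = 0;
         }

         current.Add(item);
         currentBytes += size;
      }

      if (current.Count > 0)
      {
         batches.Add(current);
      }

      return batches;
   }
}
```

Each resulting batch can then be passed to AddDocuments in turn, with the failed documents from each call retried as shown above.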

Bulk Delete

With this method, you can delete a batch of documents (reducing the latency introduced by round-trips). The total size of the resulting HTTP request should be less than 1MB.

string[] docIdsToDelete = new[]{"post 1", "post 2", "post 3"};
         
BatchDeleteResultCollection deleteResults = index.DeleteDocuments(docIdsToDelete);

The deletion of individual documents may fail and your code should handle that and retry deleting them. (If there are formal errors in the request, the entire batch will be rejected with an exception.)

var failedDocIds = deleteResults.GetFailedDocIds();
index.DeleteDocuments(failedDocIds);

NOTE: There is no method to delete all the documents in an index at once. The fastest way to do this is to delete the entire index, then re-create it. See Index Management for details.

Delete By Query

There is an overload of the DeleteDocuments method that lets you delete a batch of documents that match a particular search query. You can use many of the same arguments applied to a normal search - Skip (which will preserve the results found before the skip value), WithScoringFunction, WithCategoryFilter, WithQueryVariable and WithDocumentVariableFilter.

index.DeleteDocuments(new Query(queryText));

Of course, fetching properties is not meaningful for delete queries, so calling WithFields, WithSnippetFromFields, WithCategories, or WithVariables on a Query passed to the DeleteDocuments method has no effect.

NOTE: As of this writing, the underlying API does not support the Take method for queries passed to the DeleteDocuments method. To protect against the possibility of inadvertently deleting documents that you did not intend to, calling Take on a Query that is passed to the DeleteDocuments method will throw a NotSupportedException. It is possible that the API may change in the near future to support this feature.

Index Management

Usually, you will want to manage your indexes via the IndexTank web dashboard. However, you can also manage indexes programmatically using the library.

The CreateIndex method will create and return a new index.

Index myIndex = client.CreateIndex("myIndex");

Keep in mind that it may take several seconds for the server to allocate the resources required by your new index. If you intend to use a newly created index immediately, it is good practice to ensure it is running first, by using the following pattern:

Index myIndex = client.CreateIndex("myIndex");

while (!myIndex.IsStarted)
{
   Thread.Sleep(300);
   myIndex = client.GetIndex("myIndex");
}
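
The loop above waits indefinitely if the index never starts. A bounded variant of the same polling pattern might look like the following sketch, in which Wait and Until are hypothetical names rather than library members, and the timeout and poll interval are left to the caller.

```csharp
using System;
using System.Diagnostics;
using System.Threading;

// Hypothetical helper; not part of IndexTankDotNet.
public static class Wait
{
   // Polls the condition until it returns true or the timeout elapses.
   // Returns true if the condition was met, false if the wait timed out.
   public static bool Until(Func<bool> condition, TimeSpan timeout, TimeSpan pollInterval)
   {
      Stopwatch stopwatch = Stopwatch.StartNew();

      while (!condition())
      {
         if (stopwatch.Elapsed >= timeout)
         {
            return false;
         }

         Thread.Sleep(pollInterval);
      }

      return true;
   }
}
```

Using the calls shown above, the condition could be () => client.GetIndex("myIndex").IsStarted, with a 300 ms poll interval and whatever overall timeout suits your application; a false return value tells you to report the failure instead of blocking forever.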

The DeleteIndex method completely removes the index with the supplied name.

client.DeleteIndex("myIndex");

Error Handling

As with any REST API, errors can and will occur. These usually fall into one of two categories:
  • connection failures, failed DNS lookups, authentication errors, and so on;
  • errors returned by the API itself, such as those that are a result of attempting an invalid operation, supplying an invalid argument, attempting to exceed some limit placed on your account, and so on.

In IndexTankDotNet, you will find an exception called IndexTankException, which is the base exception that represents both of these types of errors. There are two types that derive from IndexTankException, called IndexTankProtocolException and IndexTankApiException.

The library never throws an IndexTankException directly; rather, it will always throw one of the two more derived types. This allows you to catch the IndexTankException base type (which will catch both types) or either of the two subtypes.

The base type provides a protected method to its subtypes, called GetHttpStatusCode(), that will return the HTTP status code that was returned in the response. Both IndexTankProtocolException and IndexTankApiException will always have an associated HTTP status code.

When working with the library, it is considered good practice to wrap any calls that contact the service in a try/catch block, to catch IndexTankException, and to handle the exception by displaying a message to the user that the operation was not successful.
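
The catch ordering this implies can be sketched as follows. So that the example compiles on its own, the three exception classes are declared locally as stand-ins that mirror only the hierarchy described above; in a real project they come from IndexTankDotNet and must not be redeclared. SearchGuard, Describe, and the message strings are hypothetical, and the mapping of each subtype to a failure category assumes the two subtypes correspond to the two categories in the order listed.

```csharp
using System;

// Stand-ins mirroring the described hierarchy so this sketch is
// self-contained. In a real project, use the library's own types.
public class IndexTankException : Exception { }
public class IndexTankProtocolException : IndexTankException { }
public class IndexTankApiException : IndexTankException { }

// Hypothetical helper; not part of IndexTankDotNet.
public static class SearchGuard
{
   // Runs an operation and maps a failure to a user-facing message.
   // Returns null when the operation succeeds.
   public static string Describe(Action operation)
   {
      try
      {
         operation();
         return null;
      }
      catch (IndexTankApiException)
      {
         // Assumed: the service responded, but with an error.
         return "The search service rejected the request.";
      }
      catch (IndexTankProtocolException)
      {
         // Assumed: connection, DNS, or authentication failure.
         return "The search service could not be reached.";
      }
      catch (IndexTankException)
      {
         // Defensive fallback; the base type is never thrown directly.
         return "The search operation failed.";
      }
   }
}
```

If you do not need to distinguish the two cases, a single catch (IndexTankException) block is enough, since it catches both subtypes.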

Other errors

You may encounter other types of exceptions in development, such as ArgumentException, ArgumentNullException, InvalidOperationException, NotSupportedException, or FormatException. It is important to understand that these standard .NET exception types are thrown by the library itself -- never by the API. These are in place to ensure that a request is never placed if it can be determined in advance that doing so is guaranteed to return an error; thereby saving the unnecessary round trip.

For example, because the REST API relies on well-formed URLs in order to operate, there is a rule that an index name cannot contain forward slashes. If you attempted to create an index with one or more forward slashes in its name, the API would have trouble resolving the URL and would return an error with an HTTP status code of 502 (Bad Gateway). However, since it is known in advance that this would occur, the library does not submit the request, but rather throws a FormatException instead.

The standard .NET exceptions that can be thrown from the invocation of any public method in the library are listed in the Intellisense documentation. Fortunately, if you validate your inputs properly, you should never have to catch these types in production code. For example, if your code ensures that index names can only contain letters, digits, or the underscore (_) character, you would never encounter a FormatException from calling the CreateIndex method.

NOTE: The exception to this rule is the handling of TimeoutException when specifying a timeout for searches. See Specifying a Search Timeout.
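
The index-name check mentioned above (letters, digits, and underscores only) can be done with a short regular expression before calling CreateIndex. This is a sketch with hypothetical names (IndexNames, IsSafe); the rule it enforces is deliberately stricter than the library's own validation, which is described here only as rejecting forward slashes.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper; not part of IndexTankDotNet.
public static class IndexNames
{
   // \A and \z anchor to the true start and end of the string, so a
   // trailing newline is rejected (unlike with $).
   private static readonly Regex Safe = new Regex(@"\A[A-Za-z0-9_]+\z");

   // True when the name is non-empty and contains only ASCII letters,
   // digits, or underscores.
   public static bool IsSafe(string name)
   {
      return name != null && Safe.IsMatch(name);
   }
}
```

Rejecting a name such as "my/index" at input time means the FormatException path is never reached in production code.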

Autocomplete

Besides the private URL that is required to access the IndexTank API, your IndexTank account includes a public URL that can be used to access the API from the client side. When you enable public search for an index, you can exchange data with the IndexTank servers using JSON/JSONP via AJAX. There is an indextank-jquery plug-in that makes this quite easy, as well as a tutorial on the IndexTank web site.

Additional Help

If you need additional help or have any questions about IndexTankDotNet, or have an idea or suggestion to make IndexTankDotNet better, please do not hesitate to speak right up. I monitor the Discussions tab closely, so the best way is to post your questions there. If you find a bug, please feel free to create an issue under the Issue Tracker tab.

You can also follow me on Twitter @leedumond, or subscribe to my blog.

If you are interested in more information about the IndexTank service itself, you will find several tutorials on their documentation page.

Acknowledgments

IndexTankDotNet was built using John Sheehan's RestSharp and James Newton-King's Json.NET.

Last edited May 9, 2012 at 8:11 PM by LDumond, version 144