Getting Started with SharePoint 2013 Search | Learning Search-driven Application Development with SharePoint 2013

The search architecture

SharePoint 2013 Search introduces a new search architecture that includes significant changes and new additions compared to previous versions. Since Microsoft consolidated FAST and SharePoint Search, the new search architecture has inherited components from both products while maintaining high scalability and performance.

Let's have a look at the new search architecture and discuss its components; refer to the following diagram:

As we can see from the diagram, the search architecture can be divided into four component groups, as follows:

Content components
Query components
The index component
The analytics-processing component

Content components

The content components are in charge of getting content ready for indexing. Each component has a well-defined role, which we will discuss next.

Crawl component

The crawl component is responsible for crawling content sources. It is the first stop for data that is about to be indexed by the search engine. The crawl component invokes connectors (both out-of-the-box and custom ones) that interact with the content source in order to crawl it.

While crawling, the crawl component uses one or more crawl databases to temporarily store detailed tracking and historical information about the crawled items, such as the last time an item was crawled and the type of update during the last crawl.

Once an item is crawled, meaning both its data and its associated metadata are crawled, the crawl component delivers it to the content-processing component.
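The bookkeeping described above can be pictured with a small sketch. It is not SharePoint's crawl database schema: the CrawlRecord and CrawlLog names, the fields, the content hash, and the update-type labels are all assumptions made for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass
class CrawlRecord:
    """Hypothetical tracking row for one crawled item."""
    url: str
    content_hash: str
    last_crawled: datetime
    update_type: str  # "add", "update", or "unchanged" (illustrative labels)


class CrawlLog:
    """Toy in-memory stand-in for a crawl database."""

    def __init__(self):
        self._records = {}

    def record(self, url, content_hash, now=None):
        """Store when the item was crawled and what kind of update it was."""
        now = now or datetime.now(timezone.utc)
        previous = self._records.get(url)
        if previous is None:
            update_type = "add"          # first time this item is seen
        elif previous.content_hash != content_hash:
            update_type = "update"       # content changed since last crawl
        else:
            update_type = "unchanged"    # nothing new to process
        entry = CrawlRecord(url, content_hash, now, update_type)
        self._records[url] = entry
        return entry
```

Recording the same URL twice with the same hash yields "add" and then "unchanged", which is the kind of history a crawler can use to decide whether an item needs to be processed again.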

Content-processing component

The content-processing component's job is to analyze content it receives from the crawl component and feed it to the index component for indexing.

Content analysis is done by following a flow known as the Content Processing Flow, which is depicted in the following diagram:

The rectangular blocks in the diagram represent stages that we cannot interact with. We won't be discussing them as they are quite self-explanatory. The curved rectangular blocks, however, represent stages that we can interact with during the processing flow.

The Web service callout stage is similar to the pipeline extensibility stage of FAST for SharePoint 2010, and allows you to add a callout from the content-processing component to a web service of your own so you can manipulate the crawled content before it gets indexed by the index component.

Unlike FAST's pipeline-extensibility stage, where code had to be executed in a sandbox, the web service callout only needs a web service endpoint. This is much easier to set up and avoids the overhead of writing a console application to accompany the content-processing flow.

Calling a web service during the processing stage can be useful in two scenarios:

Creating new refiners by extracting data from unstructured text using our own logic
Calculating new refiners based on the data of managed properties

You can find a great example of using the web service callout in Kathrine Hammervold's post, Customize the SharePoint 2013 search experience with a Content Enrichment web service, located at http://blogs.msdn.com/b/sharepointdev/archive/2012/11/13/customize-the-sharepoint-2013-search-experience-with-a-content-enrichment-web-service.aspx.
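To make the two scenarios more concrete, here is a minimal sketch of the kind of logic such a web service might run. It shows only the enrichment logic and leaves out the web service contract that a real callout uses to receive and return values. The enrich function, the PRJ-1042 style project code, and the Body, Size, ProjectCode, and SizeBucket property names are hypothetical.

```python
import re

# Hypothetical pattern: project codes such as "PRJ-1042" embedded in free text.
PROJECT_CODE = re.compile(r"\bPRJ-\d{4}\b")


def enrich(properties):
    """Return new property values derived from an item's existing ones.

    `properties` is a plain dict standing in for the managed properties
    sent to the callout; the keys used here are illustrative.
    """
    enriched = {}

    # Scenario 1: extract a value from unstructured text.
    match = PROJECT_CODE.search(properties.get("Body", ""))
    if match:
        enriched["ProjectCode"] = match.group(0)

    # Scenario 2: calculate a value from an existing property.
    size = properties.get("Size")
    if size is not None:
        enriched["SizeBucket"] = "large" if size >= 1_000_000 else "small"

    return enriched
```

Each returned value would then be mapped to a managed property that can be used as a refiner.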

The next point of interaction is the word-breaking stage, which allows you to write your own custom word-breaking logic for the content processor. Please refer to the MSDN documentation on custom word breakers, located at http://msdn.microsoft.com/en-us/library/jj163981.aspx.

Query components

The query components are in charge of analyzing the search query and processing the results.

Web frontend

The web frontend is where the search process actually begins. A user can interact with the search service by either writing a search query in the search center (or a search box) or developing against the new public APIs: REST/OData services and the CSOM. Both the search center and public APIs are hosted on the frontend.
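As an illustration of the REST option, the following sketch builds a query URL for the search REST endpoint (/_api/search/query) without sending it. The build_search_url helper and the site URL in the usage note are hypothetical; querytext, rowlimit, and selectproperties are query parameters of the search REST service.

```python
from urllib.parse import quote


def build_search_url(site_url, query_text, row_limit=None, select_properties=None):
    """Build a GET URL for the search REST endpoint of a site."""
    # OData string literals are wrapped in single quotes; a quote inside
    # the text is escaped by doubling it.
    literal = "'" + query_text.replace("'", "''") + "'"
    params = ["querytext=" + quote(literal, safe="'")]
    if row_limit is not None:
        params.append("rowlimit=" + str(int(row_limit)))
    if select_properties:
        props = "'" + ",".join(select_properties) + "'"
        params.append("selectproperties=" + quote(props, safe="',"))
    return site_url.rstrip("/") + "/_api/search/query?" + "&".join(params)
```

For example, build_search_url("http://intranet/sites/search", "sharepoint search", row_limit=5) produces a URL that an authenticated client could request from the frontend with an HTTP GET.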

Once the user submits a query, it is sent to the query-processing component for analysis. The query-processing component analyzes the query and forwards it to the index component. The index component returns the matching results to the query-processing component for further analysis, and from there the results are forwarded to the web frontend to be displayed.

Query processing component

As mentioned previously, the query-processing component's job is to analyze and process both search queries and results.

When the query-processing component receives a search query from the frontend, it analyzes it in an attempt to optimize its precision and relevance. A site administrator can interact with a query using different techniques such as query rules or result sources. We will discuss these techniques in detail in the next chapter, but for now it is important to understand that these manipulations are handled within the query-processing component. As part of its query handling, the query-processing component performs linguistic processes on the query, such as word-breaking and stemming.

Once the query is optimized, it is sent to the index component, which will process the optimized query and return a result set back to the query-processing component and from there to the search frontend.
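The word-breaking and stemming steps mentioned earlier in this section can be illustrated with a deliberately naive sketch. It is not how SharePoint performs linguistic processing: the suffix list and the break_words, stem, and process_query functions are assumptions chosen only to show the idea of normalizing a query before it reaches the index.

```python
import re

# Illustrative English suffixes only; real stemmers are far more careful.
SUFFIXES = ("ing", "ed", "s")


def break_words(query):
    """Word-breaking: split the query into lowercase alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", query.lower())


def stem(word):
    """Naive stemming: strip one known suffix if at least 3 letters remain."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def process_query(query):
    """Apply word-breaking, then stemming, to every token in the query."""
    return [stem(word) for word in break_words(query)]
```

With this sketch, a query for "Crawling indexed documents" is reduced to the terms crawl, index, and document, so it can match items that use other forms of the same words.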

The index component

The index component is the heart of the search service, and without proper planning it can easily become the bottleneck of the service as well.

The index component has the following two roles:

Input: The index component is in charge of writing the optimized content it gets from the content-processing component to the index file
Output: The index component is in charge of returning results from the index file to the query-processing component, by request
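The two roles can be sketched with a toy inverted index. This is only an in-memory illustration under simplifying assumptions (whitespace tokenization, all query terms must match); the ToyIndex class is hypothetical and says nothing about how SharePoint stores its index file.

```python
from collections import defaultdict


class ToyIndex:
    """Minimal inverted index: term -> set of item ids."""

    def __init__(self):
        self._postings = defaultdict(set)

    def add(self, item_id, text):
        """Input role: write processed content into the index."""
        for term in text.lower().split():
            self._postings[term].add(item_id)

    def search(self, query):
        """Output role: return ids of items containing every query term."""
        terms = query.lower().split()
        if not terms:
            return set()
        # Start from the first term's items, then narrow with each next term.
        results = set(self._postings.get(terms[0], set()))
        for term in terms[1:]:
            results &= self._postings.get(term, set())
        return results
```

Here add plays the part of the content-processing component feeding the index, and search plays the part of the query-processing component requesting results from it.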

How the index component saves and manages this index file is out of the scope of this book, but you can read more about this in the TechNet article Manage the index component in SharePoint Server 2013, located at http://technet.microsoft.com/en-us/library/jj862355.aspx.

Analytics processing component

The analytics-processing component is a new addition to SharePoint Search. Its role is to analyze both content and user actions with the content in order to improve the search relevance for the user.

The analytics architecture consists of three main parts, as follows:

The analytics-processing component, which runs the analytics jobs.
The analytics-reporting database, which stores statistical information such as usage data.
The link database, which stores information about searches and crawled documents. In addition, the link database is shared with the content-processing component, which in turn stores links and anchors in it. The information that the content-processing component stores is later used by the analytics-processing component.

The analytics-processing component runs two types of analytics: search analytics and usage analytics. The search analytics analyzes content from the content-processing component for information such as links, information related to people, and recommendations. The usage analytics analyzes user actions on an item, such as the number of views it had or how many users clicked on it.

An important output of usage analytics is the set of recommendations. The recommendations analysis creates recommendations for an item based on how users have interacted with that item in the past. The analysis calculates an item-to-item relationship graph and updates it continuously based on search usage.
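One simple way to picture an item-to-item relationship graph is to count how often two items are used by the same user. The sketch below does exactly that; it is a co-occurrence count chosen for illustration, not the algorithm SharePoint uses, and the build_item_graph and recommend functions and the sample data shape are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def build_item_graph(sessions):
    """Count how often two items were used by the same user.

    `sessions` maps a user id to the items that user interacted with.
    """
    graph = defaultdict(lambda: defaultdict(int))
    for items in sessions.values():
        # Each unordered pair of distinct items counts once per user.
        for a, b in combinations(sorted(set(items)), 2):
            graph[a][b] += 1
            graph[b][a] += 1
    return graph


def recommend(graph, item, limit=3):
    """Return the items most often used together with `item`."""
    neighbours = graph.get(item, {})
    # Highest count first; ties are broken alphabetically for stable output.
    ranked = sorted(neighbours.items(), key=lambda pair: (-pair[1], pair[0]))
    return [name for name, _ in ranked[:limit]]
```

Rebuilding or incrementally updating the counts as new usage arrives mirrors the continuous updating described above.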

Keep in mind that the analytics-processing component is a "learning" component, which means it learns from usage. The more the search system is used, the better the analytics it provides.

Learning Search-driven Application Development with SharePoint 2013

By: Johnny Tordgeman
