SharePoint 2013 Search introduces a new search architecture that includes significant changes and new additions compared to previous versions. Since Microsoft consolidated FAST and SharePoint Search, the new search architecture has inherited components from both products while maintaining high scalability and performance.
The crawl component is responsible for crawling content sources. It is the first stop for data that is about to be indexed by the search engine. The crawl component invokes connectors (both out-of-the-box and custom ones) that interact with the content source in order to crawl it.
While indexing, the crawl component uses one (or more) crawl database to temporarily store detailed tracking and historical information about the crawled item, such as the last time the item was crawled and the type of update during the last crawl.
Once an item is crawled, meaning both its data and its associated metadata is crawled, the crawl component delivers it to the content-processing component.
The rectangular blocks in the diagram represent stages that we cannot interact with. We won't be discussing them as they are quite self-explanatory. The curved rectangular blocks, however, represent stages that we can interact with during the processing flow.
The Web service callout stage is similar to the pipeline extensibility stage of FAST for SharePoint 2010, and allows you to add a callout from the content-processing component to a web service of your own so you can manipulate the crawled content before it gets indexed by the index component.
Unlike FAST's pipeline-extensibility stage, where code had to be executed in a sandbox, the web service callout accepts a web service endpoint, which is much easier and reduces the overhead involved in writing a console application to accompany the content-flow process.
Calling a web service during the processing stage can be useful for two scenarios.
Creating new refiners by extracting data from unstructured text using our own logic
Calculating new refiners based on the data of managed properties
You can find a great example on using the web service callout in Kathrine Hammervold's post, Customize the SharePoint 2013 search experience with a Content Enrichment web service, located at http://blogs.msdn.com/b/sharepointdev/archive/2012/11/13/customize-the-sharepoint-2013-search-experience-with-a-content-enrichment-web-service.aspx.
The next point of interaction is the word-breaking stage, which allows you to write your own custom word-breaking logic for the content processor. Please refer to the MSDN documentation on custom word breakers, located at http://msdn.microsoft.com/en-us/library/jj163981.aspx.
The web frontend is where the search process actually begins. A user can interact with the search service by either writing a search query in the search center (or a search box) or developing against the new public APIs: REST/OData services and the CSOM. Both the search center and public APIs are hosted on the frontend.
Once the user creates a query, the query is sent to the query-processing component for analysis. The query-processing component analyzes the query and forwards it to the index component. The index component returns the matching results to the query-processing component for another analysis and from there the results are forwarded to the web frontend to be displayed.
When the query-processing component receives a search query from the frontend, it analyzes it in an attempt to optimize its precision and relevance. A site administrator can interact with a query using different techniques such as query rules or result source. We will discuss these techniques in detail in the next chapter, but for now it is important to understand that these manipulations are handled within the query-processing components. As part of its query handling, the query-processing component performs linguistic processes on the query, such as word-breaking and stemming.
Once the query is optimized, it is sent to the index component, which will process the optimized query and return a result set back to the query-processing component and from there to the search frontend.
The index component has the following two roles:
How the index component saves and manages this index file is out of the scope of this book, but you can read more about this in the TechNet article Manage the index component in SharePoint Server 2013, located at http://technet.microsoft.com/en-us/library/jj862355.aspx.
The analytics architecture consists of three main parts, as follows:
The analytics-reporting database, which stores statistical information such as usage data.
The link database, which stores information about searches and crawled documents. In addition, the link database is shared with the Content Processing Component, which in turn stores links and anchors in it. The information, the content-processing component stores is later used by the analytics-processing component.
The analytics-processing component runs two types of analytics: search analytics and usage analytics. The search analytics analyzes content from the content-processing component for information such as links, information related to people, and recommendations. The usage analytics analyzes user actions on an item, such as the number of views it had or how many users clicked on it.
An important output of usage analytics are the recommendations. The recommendations analysis creates recommendations on items based on how users have interacted with this specific item in the past. The analysis calculates an item-to-item relationship graph and updates it continuously based on search usage.
Keep in mind that the analytics-processing component is a "learning" component, which means it learns by usage. The more usage the search system will have, the better analytics it will provide.