Search Architecture in SharePoint 2013

Hi All,

In this post we will see how search works in SharePoint 2013, along with its architecture and search components.

Search in SharePoint 2013 has many improvements and new features compared with SharePoint 2010. The core SharePoint 2010 search product and FAST Search Server 2010 for SharePoint have been merged into a single product, which we can call SharePoint 2013 Search.

The SharePoint 2013 search architecture consists of the following components:

Crawl Component
Content processing Component
Index Component
Analytics Processing Component
Query processing Component
Search administration Component

Crawl component

The crawl component crawls the content sources, for example file shares, SharePoint sites, Lotus Notes, and SQL databases.
After retrieving the crawled items, both the actual content and the associated metadata, it delivers them to the content processing component.
To retrieve information, the crawl component connects to the content sources by invoking the appropriate indexing connector or protocol handler.
Multiple crawl components can be deployed simultaneously.
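
To make the connector idea concrete, here is a minimal Python sketch of how a crawler might choose a protocol handler from the scheme of a start address. This is only an illustration and not SharePoint code: the handler names, the HANDLERS table and the functions pick_handler and crawl are all hypothetical.

```python
from urllib.parse import urlparse

# Hypothetical scheme -> handler table. In SharePoint the real connectors
# and protocol handlers are registered with the search service, not here.
HANDLERS = {
    "http": "web_handler",
    "https": "web_handler",
    "file": "file_share_handler",
    "notes": "lotus_notes_handler",
}


def pick_handler(start_address: str) -> str:
    """Return the (hypothetical) handler name for the address scheme."""
    scheme = urlparse(start_address).scheme.lower()
    if scheme not in HANDLERS:
        raise ValueError(f"no protocol handler registered for {scheme!r}")
    return HANDLERS[scheme]


def crawl(start_address: str) -> dict:
    """Return a stub crawled item: content plus metadata, as the post describes."""
    handler = pick_handler(start_address)
    # A real handler would connect and fetch here; this sketch returns empty data.
    return {"url": start_address, "handler": handler, "content": "", "metadata": {}}
```

The item returned by crawl stands in for what the crawl component hands over to the content processing component.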

Crawl database

Contains information about crawled items, such as last crawl time, the last crawl ID, and the type of update during the last crawl.
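
As a rough picture of what one tracked item could look like, the sketch below models a crawl record with the three fields mentioned above. The class, field and function names are assumptions made for illustration; the real crawl database schema is internal to SharePoint and is not shown in this post.

```python
from dataclasses import dataclass
from datetime import datetime
from enum import Enum


class UpdateType(Enum):
    # Hypothetical values for "the type of update during the last crawl".
    ADD = "add"
    MODIFY = "modify"
    DELETE = "delete"


@dataclass
class CrawlRecord:
    """Illustrative shape of one crawled item's tracking data."""
    url: str
    last_crawl_time: datetime
    last_crawl_id: int
    update_type: UpdateType


def needs_recrawl(record: CrawlRecord, modified_at: datetime) -> bool:
    """An item changed after its last crawl is a candidate for the next crawl."""
    return modified_at > record.last_crawl_time
```

Keeping this kind of history is what lets a crawler skip items that have not changed since the last crawl.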

Content processing component

The content processing component processes crawled items and sends these items to the index component.
It performs operations such as document parsing and property mapping, transforming crawled items into artifacts that can be included in the index.
It also performs linguistics processing such as language detection and entity extraction.
The content processing component also writes information about links and URLs to the link database.
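
The steps above can be sketched as a tiny pipeline in Python. It is a toy under stated assumptions: the PROPERTY_MAP entries, the stopword-based language detector and the process_item function are all hypothetical and far simpler than the real content processing flow.

```python
import re

# Hypothetical crawled-property -> managed-property mapping.
PROPERTY_MAP = {"ows_Title": "Title", "ows_Author": "Author"}

LINK_RE = re.compile(r"https?://[^\s\"'<>]+")

# Toy stopword lists, only enough to tell two languages apart.
STOPWORDS = {"en": {"the", "and", "of"}, "de": {"der", "und", "die"}}


def detect_language(text: str) -> str:
    """Pick the language whose stopwords appear most often in the text."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    scores = {lang: len(words & stops) for lang, stops in STOPWORDS.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else "unknown"


def process_item(crawled: dict, link_db: list) -> dict:
    """Map properties, detect language, and record links in the link database."""
    text = crawled["content"]
    managed = {
        PROPERTY_MAP[name]: value
        for name, value in crawled["metadata"].items()
        if name in PROPERTY_MAP
    }
    for link in LINK_RE.findall(text):
        link_db.append((crawled["url"], link))
    return {
        "url": crawled["url"],
        "text": text,
        "properties": managed,
        "language": detect_language(text),
    }
```

The returned dictionary plays the role of the artifact sent to the index component, while link_db stands in for the link database.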

In SharePoint 2010, the crawl component was ultimately responsible for extracting metadata, links, and property mappings and used multiple plug-ins for this purpose.

Index component

The index component hosts the actual index. It receives the processed items from the content processing component and writes them to the search index.

This component also handles incoming queries, retrieves information from the search index and sends back the result set to the query processing component.
The index stores both crawled items and their associated properties. The index is more efficient now because it has been broken up into update groups. Each update group contains a unique portion of the index. This allows for partial updates: if a document changes, only the affected update group is updated in the index rather than the entire document.
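
Here is a small Python sketch of the partial update idea. The group names, the properties assigned to each group and the SearchIndex class are hypothetical; they only show how splitting properties into groups lets a change touch one group instead of the whole document.

```python
# Hypothetical update groups: each one owns a distinct set of properties.
UPDATE_GROUPS = {
    "default": {"title", "body", "author"},
    "analytics": {"click_count"},
}


class SearchIndex:
    """Toy index that stores each update group separately."""

    def __init__(self):
        self.groups = {name: {} for name in UPDATE_GROUPS}
        self.writes = []  # log of (doc_id, group) writes, for illustration

    def full_update(self, doc_id, props):
        """Write the document into every update group."""
        for name, fields in UPDATE_GROUPS.items():
            self.groups[name][doc_id] = {
                key: value for key, value in props.items() if key in fields
            }
            self.writes.append((doc_id, name))

    def partial_update(self, doc_id, changes):
        """Write only the groups that own one of the changed properties."""
        for name, fields in UPDATE_GROUPS.items():
            changed = {k: v for k, v in changes.items() if k in fields}
            if changed:
                self.groups[name].setdefault(doc_id, {}).update(changed)
                self.writes.append((doc_id, name))
```

In this sketch, changing only click_count produces a single write to the "analytics" group and leaves the "default" group untouched.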

In SharePoint 2010, we would need to update and re-index the entire document. Also, we no longer store the index on servers hosting a query component, which was the case in SharePoint 2010. The whole concept of propagating index items from the crawler to a query server hosting a query component no longer applies in SharePoint 2013.

Analytics Processing Component

The analytics processing component analyzes the search content and the way users interact with it. This analysis is done in two ways:

Search analytics

Search analytics extracts information from the link database, such as links, the number of times an item is clicked, anchor text, data related to people, and metadata. This information is important for relevance.

Usage analytics

It is about analyzing usage log information received from the front-end via the event store. Usage analytics generates usage and statistics reports.

The results from the analysis are added to the items in the search index so that search relevance improves automatically over time.
The analysis results are also used in reports that help search administrators take further steps to improve search.

The analytics architecture consists of the analytics processing component, link database, analytics reporting database, and event store.

Analytics processing component

Performs search analytics and usage analytics. Runs the Analytics jobs.

Link database

Stores unprocessed information that is extracted by the content processing component, as well as information about search clicks. The analytics processing component analyzes this information.

Analytics reporting database

Stores the results of usage analytics. SharePoint Server uses the information in this database to create Excel reports for the search administrators.

Event store

Stores the usage events that are captured on the front-end.
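
To tie the four pieces together, here is a minimal Python sketch of the data flow. The record shapes and the function names run_search_analytics and run_usage_analytics are assumptions for illustration, not the actual analytics jobs.

```python
from collections import Counter


def run_search_analytics(link_db: list, index: dict) -> Counter:
    """Aggregate raw click records and attach the counts to indexed items."""
    clicks = Counter(rec["url"] for rec in link_db if rec["kind"] == "click")
    for url, count in clicks.items():
        if url in index:
            # The count becomes a property of the item, so it can feed relevance.
            index[url]["click_count"] = count
    return clicks


def run_usage_analytics(event_store: list) -> Counter:
    """Summarize usage events per (url, event); stands in for the reporting data."""
    return Counter((event["url"], event["event"]) for event in event_store)
```

The first function mirrors the link database to index path, and the second mirrors the event store to analytics reporting database path.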

Query Processing Component

The query processing component analyzes and processes queries and results.
It performs linguistic processing such as word breaking and stemming. When the query processing component receives a query (keyword) from the search front-end, it analyzes and processes the query to optimize precision, recall and relevance.
The processed query is submitted to the index component. The index component returns a result set based on the processed query to the query processing component, which in turn processes that result set, before returning it to the search front-end.
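
The round trip described above can be sketched in Python with a toy inverted index. Everything here is an assumption made for illustration: the naive suffix-stripping stemmer, the build_index helper and the hit-count ranking are much simpler than what SharePoint actually does.

```python
import re
from collections import Counter


def break_words(text: str) -> list:
    """Word breaking: split text into lowercase tokens."""
    return re.findall(r"\w+", text.lower())


def stem(word: str) -> str:
    """Naive stemming: strip a few common suffixes."""
    for suffix in ("ing", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def build_index(docs: dict) -> dict:
    """Toy index component: stemmed term -> set of document ids."""
    index = {}
    for doc_id, text in docs.items():
        for term in map(stem, break_words(text)):
            index.setdefault(term, set()).add(doc_id)
    return index


def process_query(query: str, index: dict) -> list:
    """Process the query, look it up in the index, and order the result set."""
    terms = [stem(word) for word in break_words(query)]
    hits = Counter()
    for term in terms:
        for doc_id in index.get(term, ()):
            hits[doc_id] += 1
    # Post-process the result set: most matching terms first, then by id.
    return [doc_id for doc_id, _ in sorted(hits.items(), key=lambda kv: (-kv[1], kv[0]))]
```

Because both the documents and the query go through the same stemming, a query for "sales reports" can match a document that contains "sales report".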

Search administration Component

The search admin component manages and controls the entire search infrastructure.

It maps to a search administration database, and it can be made fault tolerant by adding additional search admin components, which is yet another improvement over SharePoint 2010 search.
The search administration component is responsible for the search topology and search provisioning. It coordinates with the content processing, query processing, index, and analytics processing components.
The search admin component governs topology changes and stores things like the following:

Topology

Crawl and Query Rules

Managed Property Mappings (Search Schema)

Content Sources

Crawl Schedules

Search administration database

Stores search configuration data.

Search Service Application Databases

Search_Service_Application_AnalyticsReportingStoreDB: Used by the analytics processing component to store the results of usage analytics and recommendation data.

Search_Service_Application_CrawlStoreDB: Used by the crawler (gatherer) to manage crawl operations and to store history, URL, delete, and error data. Each crawl database can have one or more crawlers associated with it.

Search_Service_Application_LinkStoreDB: The link database stores information extracted by the content processing component. In addition, it stores information about search clicks, that is, the number of times people click a search result on the search results page. This information is stored unprocessed, to be analyzed by the analytics processing component.

Search_Service_Application_DB: Used by the search services to store search core partition state, search configuration and feature data, and topology configurations. There is only one search admin database per Search Service Application (SSA).

Services:

SharePoint Search has two system services.

SharePoint Search Host Controller
SharePoint Server Search 15

SharePoint Search Host Controller

The SharePoint Search Host Controller service performs the deployment and management for SharePoint Search components on a host.

It provides functionality for installing, removing, starting and stopping search components (now referred to as nodes) within a host.

SharePoint Server Search 15

The SharePoint Server Search 15 service launches the MSSearch.exe process.

The MSSearch.exe process is responsible for crawling content from various repositories, such as SharePoint sites, HTTP sites, file shares, and Exchange Server.

When a request is issued to crawl a 'Content Source', the MSSearch.exe invokes a 'Filter Daemon' process called MssDmn.exe.

MssDmn.exe loads the required protocol handlers and filters necessary to connect, fetch and parse the content.

Thanks.

SharePoint Waves