Hi All,
In this post we will look at how search works in SharePoint 2013, its architecture, and its search components.
Search in SharePoint 2013 has a lot of improvements and new features compared with SharePoint 2010. The core SharePoint 2010 Search product and FAST Search for SharePoint 2010
have been merged into a single product, which we can call SharePoint 2013 Search.
The SharePoint 2013 search architecture consists of the following components:
- Crawl component
- Content processing component
- Index component
- Analytics processing component
- Query processing component
- Search administration component
Crawl component
- The crawl component crawls the content sources, for example file shares, SharePoint sites, Lotus Notes, and SQL databases.
- After retrieving the crawled items, both the actual content and the associated metadata, it delivers them to the content processing component.
- To retrieve information, the crawl component connects to the content sources by invoking the appropriate indexing connector or protocol handler.
- Multiple crawl components can be deployed simultaneously.
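To make the connector idea a bit more concrete, here is a minimal Python sketch of picking a handler by URL scheme. This is only an illustration of the dispatch idea, not how SharePoint implements it: the handler functions, the `HANDLERS` registry, and the example URL are all made up for this post.

```python
from urllib.parse import urlparse

# Hypothetical handlers: each one stands in for a real indexing connector
# and returns the item content together with its metadata.
def crawl_file_share(url):
    return {"url": url, "connector": "file", "content": "", "metadata": {}}

def crawl_http(url):
    return {"url": url, "connector": "http", "content": "", "metadata": {}}

# Illustrative registry keyed by URL scheme (an assumption of this sketch).
HANDLERS = {
    "file": crawl_file_share,
    "http": crawl_http,
    "https": crawl_http,
}

def crawl(url):
    """Pick a handler from the URL scheme and return the crawled item."""
    scheme = urlparse(url).scheme
    handler = HANDLERS.get(scheme)
    if handler is None:
        raise ValueError(f"no protocol handler registered for {scheme!r}")
    return handler(url)

item = crawl("https://intranet.example.com/sites/hr/policy.docx")
print(item["connector"])
```

A content source with an unregistered scheme raises an error here, which mirrors the point that the crawl component can only reach sources it has a connector for.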
Crawl database
Contains information about crawled items, such as last crawl time, the
last crawl ID, and the type of update during the last crawl.
Content processing component
- The content processing component processes crawled items and sends these items to the index component.
- It performs operations such as document parsing and property mapping, transforming crawled items into artifacts that can be included in the index.
- It also performs linguistic processing such as language detection and entity extraction.
- The content processing component also writes information about links and URLs to the link database.
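The steps above can be sketched roughly as follows. This is a simplified Python model under my own assumptions: the property names in `PROPERTY_MAP`, the shape of a crawled item, and the `link_database` list are invented for illustration and are not the real SharePoint data structures.

```python
# Illustrative mapping from crawled properties to managed properties.
PROPERTY_MAP = {"ows_Title": "Title", "ows_Author": "Author"}

link_database = []  # stand-in for the link database

def process_item(crawled_item):
    """Turn one crawled item into an index-ready record."""
    # Property mapping: keep only crawled properties that have a mapping.
    managed = {
        PROPERTY_MAP[name]: value
        for name, value in crawled_item["metadata"].items()
        if name in PROPERTY_MAP
    }
    # Document parsing: in this sketch the body is already plain text.
    managed["Body"] = crawled_item["content"]
    # Links are written to the link database, not into the index record.
    for target in crawled_item.get("links", []):
        link_database.append((crawled_item["url"], target))
    return {"url": crawled_item["url"], "properties": managed}

record = process_item({
    "url": "/hr/policy.docx",
    "content": "Vacation policy text",
    "metadata": {"ows_Title": "Vacation Policy", "ows_Internal": "x"},
    "links": ["/hr/forms.aspx"],
})
print(record["properties"])
```

Note that the unmapped `ows_Internal` property is dropped, while the link ends up in the separate link store for later analysis.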
In SharePoint 2010, the crawl component was ultimately responsible for
extracting metadata, links, and property mappings and used multiple plug-ins
for this purpose.
Index component
The index component hosts the actual index. It receives the
processed items from the content processing component and writes them to the
search index.
- This component also handles incoming queries, retrieves information from the search index and sends back the result set to the query processing component.
- The index stores both crawled items and their associated properties. The index is more efficient now because it is broken up into update groups, each containing a unique portion of the index. This allows partial updates: when a document changes, only the affected update group is updated in the index instead of the entire document.
In SharePoint 2010, we would need to update and re-index the entire
document. Also, the index is no longer stored on the servers hosting a query
component, which was the case in SharePoint 2010. The whole concept of
propagating index items from the crawler to a query server hosting a query component
no longer applies in SharePoint 2013.
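Here is a toy Python model of the partial update idea. It is a sketch under stated assumptions only: the `ToyIndex` class and the group names `default`, `links`, and `usage` are hypothetical and say nothing about how the real index is laid out on disk.

```python
class ToyIndex:
    """Toy index where each document is split into named update groups."""

    def __init__(self):
        self.items = {}   # doc_id -> {group name -> properties}
        self.writes = 0   # number of group writes performed

    def full_update(self, doc_id, groups):
        """Rewrite every update group of a document."""
        self.items[doc_id] = {}
        for name, props in groups.items():
            self.items[doc_id][name] = dict(props)
            self.writes += 1

    def partial_update(self, doc_id, group, props):
        """Rewrite only one update group; other groups stay untouched."""
        self.items[doc_id][group] = dict(props)
        self.writes += 1

index = ToyIndex()
index.full_update("doc1", {
    "default": {"Title": "Vacation Policy"},
    "links": {"anchor_text": "policy"},
    "usage": {"click_count": 0},
})
index.partial_update("doc1", "usage", {"click_count": 5})
print(index.writes)
```

The first write touches all three groups, while the later change to the click count costs a single group write and leaves the title and link data as they were.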
Analytics Processing Component
The analytics processing component analyzes the crawled content and the
way users interact with it. It performs two kinds of analysis:
Search analytics
Search analytics extracts information such as links, the number of times an item is clicked, anchor text, data related to people, and metadata from the link database. This information is important for relevance.
Usage analytics
Usage analytics analyzes usage log information received from the front-end via the event store. It generates usage and statistics reports.
- The results of the analysis are added to the items in the search index so that search relevance improves automatically over time.
- The results are also used in reports that help search administrators take further steps to improve search performance.
The analytics architecture consists of the analytics processing
component, analytics reporting database, and link database.
Analytics processing component
Performs search analytics and usage analytics. Runs the Analytics jobs.
Link database
Stores unprocessed information that is extracted by the content
processing component and information about search clicks. The analytics
processing component analyzes this information.
Analytics reporting database
Stores the results of usage analytics. SharePoint Server uses the
information in this database to create Excel reports for the search
administrators.
Event store
Stores the usage events that are captured on the front-end.
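As a rough illustration of how unprocessed click records could be turned into a relevance signal, here is a small Python sketch. The record layout, the `click_count` field, and the dictionary used as an index are assumptions made for this example, not the actual analytics job.

```python
from collections import Counter

# Unprocessed click records, as they might sit in the link database.
raw_clicks = [
    {"query": "vacation policy", "url": "/hr/policy.docx"},
    {"query": "holiday", "url": "/hr/policy.docx"},
    {"query": "expense form", "url": "/finance/expenses.xlsx"},
]

def count_clicks(records):
    """Aggregate raw click records into a click count per URL."""
    return Counter(record["url"] for record in records)

def apply_to_index(index, click_counts):
    """Attach the click counts to items that already exist in the index."""
    for url, count in click_counts.items():
        if url in index:
            index[url]["click_count"] = count

search_index = {
    "/hr/policy.docx": {"Title": "Vacation Policy"},
    "/finance/expenses.xlsx": {"Title": "Expense Form"},
}
apply_to_index(search_index, count_clicks(raw_clicks))
print(search_index["/hr/policy.docx"]["click_count"])
```

Once the counts sit next to the items in the index, a ranking step could take them into account, which is the sense in which relevance improves over time.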
Query Processing Component
- The query processing component analyzes and processes queries and results.
- It performs linguistic processing such as word breaking and stemming. When the query processing component receives a query (keywords) from the search front-end, it analyzes and processes the query to optimize precision, recall, and relevance.
- The processed query is submitted to the index component. The index component returns a result set based on the processed query to the query processing component, which in turn processes that result set, before returning it to the search front-end.
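The query flow described above can be sketched in a few lines of Python. Keep in mind this is a toy: the suffix-stripping `stem` function is a naive stand-in for real stemming, and the documents and the inverted index are invented for this example.

```python
import re

def word_break(text):
    """Split text into lowercase word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def stem(word):
    """Naive suffix stripping, used here only as a stand-in for stemming."""
    for suffix in ("ing", "es", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word

def process_query(text):
    """Word breaking followed by stemming."""
    return [stem(word) for word in word_break(text)]

def build_index(docs):
    """Build a toy inverted index: term -> set of document ids."""
    index = {}
    for doc_id, text in docs.items():
        for term in process_query(text):
            index.setdefault(term, set()).add(doc_id)
    return index

def search(index, query):
    """Return the ids of documents that contain every processed query term."""
    terms = process_query(query)
    if not terms:
        return []
    hits = set.intersection(*(index.get(term, set()) for term in terms))
    return sorted(hits)

docs = {
    "doc1": "Quarterly budget reports",
    "doc2": "Crawling file shares",
    "doc3": "Budget report archive",
}
index = build_index(docs)
print(search(index, "budget reports"))
```

Because the same word breaking and stemming are applied to both the documents and the query, a search for "reports" also finds the document that only says "report".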
Search administration Component
The search admin component manages and controls the entire search
infrastructure.
- It maps to a search administration database and can be made fault tolerant by adding additional search administration components, which is another improvement over SharePoint 2010 search.
- The search administration component is responsible for the search topology and search provisioning. It coordinates the other search components: content processing, query processing, index, and analytics.
- The search admin component governs topology changes and stores things like the following:
  - Topology
  - Crawl and query rules
  - Managed property mappings (search schema)
  - Content sources
  - Crawl schedules
Search administration database
- Stores search configuration data.
Search Service Application Databases
Search_Service_Application_AnalyticsReportingStoreDB: Used by the
analytics processing component to store the results of usage analytics and recommendation
data.
Search_Service_Application_CrawlStoreDB: Used by the crawler (gatherer) to manage crawl
operations and to store crawl history, URL, deletion, and error data. Each crawl database can have one or more
crawlers associated with it.
Search_Service_Application_LinkStoreDB: The link database stores
information extracted by the content processing component. In addition, it
stores information about search clicks, that is, the number of times people click on a
search result from the search result page. This information is stored
unprocessed, to be analyzed by the analytics processing component.
Search_Service_Application_DB: Used by the search service to store
search core partition state, search configuration, feature data, and topology
configuration. There is only one search administration database per Search service application (SSA).
Services:
SharePoint Search has two system services:
- SharePoint Search Host Controller
- SharePoint Server Search 15
SharePoint Search Host Controller
The SharePoint Search Host Controller service performs the deployment
and management for SharePoint Search components on a host.
It provides functionality for installing, removing, starting and
stopping search components (now referred to as nodes) within a host.
SharePoint Server Search 15
The SharePoint Server Search 15 service launches the MSSearch.exe process.
The MSSearch.exe process is responsible for crawling content from various repositories, such as
SharePoint sites, HTTP sites, file shares, and Exchange Server.
When a request is issued to crawl a 'Content Source', the MSSearch.exe
invokes a 'Filter Daemon' process called MssDmn.exe.
MssDmn.exe loads the required protocol handlers and filters necessary
to connect, fetch and parse the content.
Thanks.