Fair warning: search has a lot of new capabilities and features. Knowing the FAST technologies fairly well, it looks to me like some of them have made their way into the SharePoint 2010 search infrastructure. Here are some quick notes on the search internals for SharePoint 2010.
- View in browser option for all Office documents in search results.
- Boolean operators are supported, e.g. "SharePoint search" OR "Bing Search", or title:"a" OR title:"b"
- Federation to people search
- Pin it to Windows search, so Windows 7 and Vista search can federate to SharePoint search and the search experience stays tied to SharePoint.
People Search:
- Phonetic name matching and nickname matching for people search, e.g. “jon coughman” finds “Jonathan kaufman” and “shartam mikellsen” finds “kjartan mikkelsen”
- Query suggestions mined from search logs
- Self Search: drives people to contribute content. It is a vanity search, and you get richer options for improving your personal search profile
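
To make the "query suggestions mined from search logs" idea concrete, here is a minimal sketch in Python. It is not how SharePoint 2010 does it (these notes do not cover that); the function names, the `min_count` threshold, and the sample log are all made up for illustration.

```python
from collections import Counter


def build_suggestions(query_log, min_count=2):
    """Count normalized queries and keep those seen at least min_count times."""
    counts = Counter(q.strip().lower() for q in query_log if q.strip())
    return {q: n for q, n in counts.items() if n >= min_count}


def suggest(prefix, suggestions, limit=5):
    """Return the most frequent logged queries that start with the prefix."""
    prefix = prefix.strip().lower()
    matches = [(q, n) for q, n in suggestions.items() if q.startswith(prefix)]
    # Most frequent first; ties are broken alphabetically for stable output.
    matches.sort(key=lambda item: (-item[1], item[0]))
    return [q for q, _ in matches[:limit]]


# Hypothetical search log, not real data.
log = [
    "SharePoint search", "sharepoint search",
    "sharepoint 2010", "Sharepoint 2010", "sharepoint 2010",
    "bing search",
]
popular = build_suggestions(log)
print(suggest("share", popular))
```

Queries that show up only once ("bing search" here) are dropped, which is one simple way to keep noise out of the suggestion list.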
Scale-out (lots of improvements):
| Before | Now |
| 1 indexer | Multiple indexers |
| One table for search | Property store and crawler are now separate. |
| One point of failure | Indexers no longer hold the index; they propagate it to the query servers. Multiple indexers, stateless crawlers, crawl distribution. |
| | Query: query mirroring, index partitioning, decreased query latency, multiple property DBs. |
| | Admin database: manages the search infrastructure. |
Content Distribution:
Crawl Distribution:
- Built-in load balancer that distributes hosts across crawl databases
- You can override the host distribution rules
- Time stamp based incremental crawl, change log crawl, delete log crawl
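
A rough sketch of what host distribution with overrides could look like, in Python. The balancing policy (send each new host to the crawl database with the fewest hosts) is my assumption, as are the function name, the host names, and the database names; the actual SharePoint logic is not covered in these notes.

```python
def assign_hosts(hosts, databases, overrides=None):
    """Map each host to a crawl database.

    Hosts listed in overrides go to the database named there; every other
    host goes to whichever database currently has the fewest hosts.
    """
    overrides = overrides or {}
    load = {db: 0 for db in databases}
    assignment = {}
    for host in hosts:
        if host in overrides:
            db = overrides[host]  # explicit rule wins over balancing
        else:
            # Least-loaded database; ties go to the earliest one in the list.
            db = min(databases, key=lambda d: load[d])
        assignment[host] = db
        load[db] = load.get(db, 0) + 1
    return assignment


# Hypothetical hosts and crawl databases.
hosts = ["intranet", "teams", "mysites", "wiki"]
databases = ["CrawlDB1", "CrawlDB2"]
result = assign_hosts(hosts, databases, overrides={"mysites": "CrawlDB2"})
print(result)
```

The override is counted toward the load of its database, so later hosts are still balanced around it.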
Query:
- Distribution by hash of document ID. Each document has a unique ID, and the hash of that ID decides which query server the document is sent to, which keeps the partitions roughly equal in size.
- Crawlers partition the indexed data and propagate it to the query servers
- The query processor gets the search command, calls all the query servers, and returns an aggregated result. Since all query servers use the same relevance ranking, the aggregated results are still ranked by relevance.
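
Here is a small Python sketch of the two ideas above: picking a partition from a hash of the document ID, and a query processor that asks every partition and merges the hits. The hash function (SHA-1), the toy term-count scoring, and all names are assumptions for illustration, and the partitions are queried in a plain loop rather than in parallel.

```python
import hashlib
import heapq


def partition_for(doc_id, partition_count):
    """Pick a partition from a stable hash of the document ID."""
    digest = hashlib.sha1(str(doc_id).encode("utf-8")).hexdigest()
    return int(digest, 16) % partition_count


def build_partitions(documents, partition_count):
    """Spread documents over partitions; each document lands in exactly one."""
    partitions = [dict() for _ in range(partition_count)]
    for doc_id, text in documents.items():
        partitions[partition_for(doc_id, partition_count)][doc_id] = text
    return partitions


def search_partition(partition, term):
    """Toy ranking: score is the number of times the term occurs."""
    hits = []
    for doc_id, text in partition.items():
        score = text.lower().split().count(term.lower())
        if score > 0:
            hits.append((score, doc_id))
    return hits


def search(partitions, term, limit=10):
    """Query every partition and merge the hits by score, best first."""
    all_hits = []
    for partition in partitions:
        all_hits.extend(search_partition(partition, term))
    return [doc_id for _, doc_id in heapq.nlargest(limit, all_hits)]


# Hypothetical documents keyed by document ID.
docs = {1: "search search search", 2: "sharepoint search", 3: "bing", 4: "search search"}
parts = build_partitions(docs, 3)
print(search(parts, "search"))
```

Because every partition scores hits the same way, the scores are directly comparable and the merge step is just a top-N over the combined list.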
Resiliency:
- Mirroring still supported
- More crawl components
- Native support for SQL mirroring
Engine Enhancements:
- Support for regular expressions in crawl rules, e.g. to keep SSNs out of search.
- Native support for case-sensitive crawling
- Content source priority
- New Crawl Policy to define how crawler treats error conditions
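
To illustrate the regular-expression crawl rule idea, here is a minimal Python sketch of exclusion rules applied to URLs. It uses Python's `re` syntax, not SharePoint's rule syntax, and the patterns, URLs, and function name are invented for the example.

```python
import re

# Hypothetical exclusion rules: a URL matching any of them is skipped.
EXCLUDE_RULES = [
    re.compile(r"\d{3}-\d{2}-\d{4}"),          # SSN-like token anywhere in the URL
    re.compile(r"/_private/", re.IGNORECASE),  # a private folder, any casing
]


def should_crawl(url, rules=EXCLUDE_RULES):
    """Return True when no exclusion rule matches the URL."""
    return not any(rule.search(url) for rule in rules)


for url in [
    "http://intranet/docs/plan.docx",
    "http://intranet/hr/123-45-6789.pdf",
    "http://intranet/_Private/notes.txt",
]:
    print(url, should_crawl(url))
```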
Extensibility:
- All OOB web parts are public
- Public federation OM
- Connector framework: BDC data connector (OOB databases/ECF/.NET)
BDC Connector Enhancements:
- Item level security for BCS components
- Crawl through entity associations
Admin:
- All deployment is scriptable
- Configure topologies, content sources and everything else
- Many more analytics for queries, system resources, separation by content type, etc.
More to come.
Posted 10-20-2009 12:31 PM by Shikhar Thapa