Have you ever wanted to crawl a specific document library in a SharePoint site? Have you tried and instead you successfully crawled the entire portal or site collection? I've been told that this can be accomplished by crawling the library by its UNC path; however, that still doesn't work.
A customer I work with had a technical requirement to have multiple portal implementations across the US. Each portal implementation had a specific document library that contained documents and metadata. The main portal in this scenario was responsible for crawling the individual document libraries in each portal implementation; however, we always ran into issues with the crawler "jumping outside" of the intended crawl scope. Sure, we could add include/exclude paths until our eyes bled, but that process never really seems to work as one would expect. We opened a ticket with Microsoft and were presented with an approach that actually works.
1. Identify the underlying document library's "site" or "area" and use the crawl logs to find the URL that SharePoint used to crawl the content. This is a painful process; however, it can be made easier by searching the gatherer logs that are stored in the portal's underlying _Serv database. The URL you should look for will have the form: sts2://<servername>/webid=000/listid={listid}. To this day I have no clue where the web id comes from, since web ids are typically GUIDs. The list id, on the other hand, is a GUID that is easy to find by looking at querystrings on the portal site. Regardless, it is much easier to find this URL by querying the gatherer log tables in the _Serv database.
2. Once the exact URL is identified, you can add an Exchange Public Folder content source that points to the aforementioned URL. Configure the content source to crawl as desired and start the crawl. Assuming the crawl account you are using has access to the SharePoint site, you're in business.
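If you export the gatherer log rows to plain text, a small script can pull out the candidate URLs for step 1. This is only a sketch: it assumes the URLs appear verbatim in the exported text in the form shown above, and it says nothing about the actual table or column names in the _Serv database. The function name and the pattern are my own illustration.

```python
import re

# Matches the form described in step 1:
#   sts2://<servername>/webid=<id>/listid={<GUID>}
# Assumption: the list id is a standard 36-character GUID inside braces.
STS2_LIST_URL = re.compile(
    r"sts2://(?P<server>[^/\s]+)"
    r"/webid=(?P<webid>[^/\s]+)"
    r"/listid=\{(?P<listid>[0-9A-Fa-f-]{36})\}"
)


def find_list_urls(lines, list_id=None):
    """Return the distinct sts2 list URLs found in `lines`, in first-seen order.

    If `list_id` is given (with or without braces), only URLs for that
    list are returned. The comparison is case-insensitive.
    """
    wanted = list_id.strip("{}").lower() if list_id else None
    found = []
    for line in lines:
        for match in STS2_LIST_URL.finditer(line):
            if wanted and match.group("listid").lower() != wanted:
                continue
            url = match.group(0)
            if url not in found:
                found.append(url)
    return found
```

Pass in the list id you picked up from the portal's querystrings to narrow the results to the one library you care about; the URL that comes back is what goes into the content source in step 2.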
By following these two steps, you can crawl individual document libraries and/or lists in SharePoint. This is very powerful for content aggregation across an enterprise that has disparate stores for documents. Assuming the documents have like metadata, an advanced search scenario makes this even more interesting. Since each URL identified in step 1 becomes its own content source in step 2, you can create a scope that includes each content source. With some custom programming, a SharePoint developer can create an interface that lets business users choose which content source or sources they want to search and provide search inputs for finding documents by metadata in an advanced search.
An example of this scenario is as follows: Joe User wants to find all documents in Portal A's library, Portal C's library, but not Portal B's library with department = HR and document type = specification and a free text search for documents containing the word SharePoint.
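To make the selection logic in that example concrete, here is a minimal in-memory sketch. It is not the SharePoint search API; the `Document` class, the `search` function, and the field names are all hypothetical and only model how a chosen set of content sources, exact-match metadata filters, and a free-text term combine.

```python
from dataclasses import dataclass, field


@dataclass
class Document:
    """Hypothetical stand-in for a crawled document."""
    title: str
    content_source: str
    metadata: dict = field(default_factory=dict)
    text: str = ""


def search(documents, content_sources, metadata_filters, free_text=""):
    """Return documents that satisfy all three conditions:

    - the document belongs to one of the chosen content sources
    - every metadata filter matches exactly
    - the free-text term appears in the body (case-insensitive)
    """
    sources = set(content_sources)
    needle = free_text.lower()
    results = []
    for doc in documents:
        if doc.content_source not in sources:
            continue
        if any(doc.metadata.get(k) != v for k, v in metadata_filters.items()):
            continue
        if needle and needle not in doc.text.lower():
            continue
        results.append(doc)
    return results
```

Joe User's query would then be `search(docs, ["Portal A", "Portal C"], {"department": "HR", "document type": "specification"}, "SharePoint")`, with Portal B excluded simply by leaving it out of the content source list.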
IMO, this is a powerful customization.
Well... after several months or maybe even several years I am posting to my blog. I've got a couple ideas on what I can blog about during the next couple of weeks.
- Demystifying Ghosting and Unghosting -- IMO this will be a hot topic with the upcoming release of MOSS 2007. Raymond Mitchell and I made some interesting discoveries while working at a customer site this week. I'm sure there will be a number of posts that the IW Team will have in this area related to migration from SharePoint v2 (WSS and SPS) to MOSS 2007.
- Demystifying User Security, Profiles, Members, etc., etc. in SPS v2 -- SharePoint v2 can be quite confusing when looking at all the places a "User" record exists. This isn't very exciting, but it needs some clarification. Plus, I like to use the term "Demystifying" in the topic...
- Creating a Framework for Accessing SharePoint List Data via Web Services -- Who knows if it is even necessary, but I think it's cool. Plus, it's using a framework similar to one I've used on other non-SharePoint projects. It's basically using the Data Mapper approach to map data coming from a SharePoint web service to a Domain Layer.
- General Consulting Topics -- I've been in the consulting world for over 7 years now. I've accumulated some thoughts / opinions on what I believe makes a good -- no, *great -- consultant.
*Side Note: While driving home, I realized saying great in the item above may sound a little arrogant. I'm definitely not saying I'm a great consultant...
There it is. My first post. Jake - I wonder when you'll see this. You'll have to give me some feedback. Am I doing the link thing properly?