SharePoint Use Cases

02 Jul, 2007

Crawling external web sites

Posted by: Toni Frankola In: SharePoint

My company is in the process of upgrading our intranet site to SharePoint 2007. We are looking forward to the new search features. We want to make SharePoint the central place where our employees can search for all business information.

Therefore, we are going to crawl:

  • Old intranet site
  • Our CRM (MS Dynamics 3.0)
  • Custom LOB applications
  • Public folders
  • Active Directory (for user profiles)
  • Our web sites

The first thing I wanted to set up was the crawl of our web sites. I went to the Search Settings page, entered my company’s URLs, and started a full crawl. However, the crawler indexed only one page: the homepage. All other pages were rejected. The crawl log stated the following problem:

“The specified address was excluded from the index. The crawl rules may have to be modified to include this address”

The problem was that the URLs on our site contain question marks, and the crawler excluded them. All of our URLs follow the same pattern, e.g. http://www.contoso.com/default.aspx?sec=12.
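
If you want to check in advance which addresses of a site fall into this category, looking for the question mark is enough. The snippet below is only a minimal sketch in Python and is not part of SharePoint; the function name is_complex_url and the sample URL list are made up for illustration.

```python
from urllib.parse import urlsplit


def is_complex_url(url):
    """Return True if the URL contains a question mark.

    This mirrors the wording of the crawl rule option
    "URLs that contain a question mark (?)".
    """
    return "?" in url


# Hypothetical sample of addresses taken from a site
sample_urls = [
    "http://www.contoso.com/",
    "http://www.contoso.com/default.aspx?sec=12",
]

for url in sample_urls:
    kind = "complex" if is_complex_url(url) else "simple"
    # urlsplit exposes the part after the question mark as .query
    print(kind, url, urlsplit(url).query)
```

If most of the addresses come back as complex, as they did on our site, the default crawl will likely stop at the homepage.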

To force the crawler to index such pages, go to Shared Services Administration: SharedServices1 > Search Settings > Crawl Rules and create a new crawl rule:

  1. In the Path section, enter your site URL (e.g. http://www.contoso.com/*)
  2. In the Crawl Configuration section, select “Include all items in this path” and tick the option “Crawl complex URLs (URLs that contain a question mark (?))” below it
  3. Click OK
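
To make the effect of such a rule easier to picture, here is a toy model in Python. It is not SharePoint code and does not call any SharePoint API. The CrawlRule class, the should_crawl function and the "first matching rule wins" logic are assumptions made for this sketch, as is the default of skipping URLs with a question mark when no rule matches (which is the behaviour I saw in the crawl log).

```python
from dataclasses import dataclass
from fnmatch import fnmatchcase


@dataclass
class CrawlRule:
    """Hypothetical stand-in for one crawl rule."""
    path: str                         # wildcard pattern, e.g. "http://www.contoso.com/*"
    include: bool = True              # include or exclude items in this path
    crawl_complex_urls: bool = False  # allow URLs that contain "?"


def should_crawl(url, rules):
    """Decide whether a URL would be crawled in this toy model."""
    is_complex = "?" in url
    for rule in rules:
        # Compare case-insensitively; "*" in the rule path matches any text
        if fnmatchcase(url.lower(), rule.path.lower()):
            if not rule.include:
                return False
            return rule.crawl_complex_urls or not is_complex
    # Assumed default when no rule matches: skip complex URLs
    return not is_complex


page = "http://www.contoso.com/default.aspx?sec=12"
rule = CrawlRule("http://www.contoso.com/*", crawl_complex_urls=True)

print(should_crawl(page, []))      # no rule defined
print(should_crawl(page, [rule]))  # rule from the steps above
```

In this model the page is rejected without the rule and accepted once the rule with the complex URL option is in place, which is the same change I saw in the crawl log.
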
Once I did that, our web sites were added to the index. In the following posts I will explain how we are going to organize enterprise search in our company.




About

Real-life use cases and opinions about collaboration, CRM, web technologies and other stuff by Toni Frankola.

Toni Frankola - SharePoint MVP Profile

All postings on this blog are provided "AS IS" with no warranties, and confer no rights. All entries in this blog are my opinion and don't necessarily reflect the opinion of my employer.
