My company is in the process of upgrading our intranet site to SharePoint 2007, and we are looking forward to the new search features. We want to make SharePoint the central place where our employees can search for all business information.
Therefore, we are going to crawl:
- Old intranet site
- Our CRM (MS Dynamics 3.0)
- Custom LOB applications
- Public folders
- Active Directory (for user profiles)
- Our web sites
The first thing I wanted to set up was the crawl of our web sites. I went to the Search Settings page, entered my company's URLs and started a full crawl. However, the crawler indexed only one page, the homepage. All other pages were rejected, and the crawl log reported the following problem:
“The specified address was excluded from the index. The crawl rules may have to be modified to include this address”
The problem was that the URLs on our site contain question marks. They all follow the same pattern, e.g. http://www.contoso.com/default.aspx?sec=12.
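Before changing any settings, it can help to check how many of your URLs are affected. Below is a minimal Python sketch that splits a list of URLs into those with and without a question mark. The helper names are my own, and the second sample URL (sec=13) is made up for illustration; this is not part of SharePoint.

```python
def is_complex_url(url):
    """Return True if the URL contains a question mark."""
    return "?" in url


def split_by_complexity(urls):
    """Split URLs into (simple, complex) lists, preserving order."""
    simple = [u for u in urls if not is_complex_url(u)]
    complex_urls = [u for u in urls if is_complex_url(u)]
    return simple, complex_urls


# Sample data: the sec=13 address is hypothetical.
sample_urls = [
    "http://www.contoso.com/",
    "http://www.contoso.com/default.aspx?sec=12",
    "http://www.contoso.com/default.aspx?sec=13",
]

simple, complex_urls = split_by_complexity(sample_urls)
print("Simple URLs:", simple)
print("Complex URLs:", complex_urls)
```

If most of your pages end up in the second list, as ours did, the default crawl will miss most of the site.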
To make the crawler index such pages, go to Shared Services Administration: SharedServices1 > Search Settings > Crawl Rules and create a new crawl rule:
- 1. In the Path section, enter your site URL with a wildcard (e.g. http://www.contoso.com/*)
- 2. In the Crawl Configuration section, select “Include all items in this path” and then the option “Crawl complex URLs (URLs that contain a question mark (?))” below it
- 3. Click OK
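To illustrate why this rule fixes the problem, here is a rough Python model of the decision the crawler seems to make. This is only a sketch of the behaviour I observed, not SharePoint's actual implementation. The `CrawlRule` class and `should_crawl` function are hypothetical names, and I am assuming that the first matching rule wins and that, without a matching rule, URLs with a question mark are skipped.

```python
from dataclasses import dataclass
from fnmatch import fnmatchcase


@dataclass
class CrawlRule:
    path: str                         # wildcard path, e.g. http://www.contoso.com/*
    include: bool                     # True = include rule, False = exclude rule
    crawl_complex_urls: bool = False  # the "Crawl complex URLs" checkbox


def should_crawl(url, rules):
    """Decide whether a URL would be crawled under this simplified model."""
    has_question_mark = "?" in url
    for rule in rules:
        # Assumption: rules are checked in order and the first match decides.
        if fnmatchcase(url.lower(), rule.path.lower()):
            if not rule.include:
                return False
            return rule.crawl_complex_urls or not has_question_mark
    # Assumption: with no matching rule, only simple URLs are crawled.
    return not has_question_mark


url = "http://www.contoso.com/default.aspx?sec=12"
rule = CrawlRule("http://www.contoso.com/*", include=True, crawl_complex_urls=True)

print(should_crawl(url, []))      # no rule defined
print(should_crawl(url, [rule]))  # rule from the steps above
```

In this model the page is rejected when no rule exists and accepted once the include rule with the complex URL option is in place, which matches what I saw in the crawl log before and after adding the rule.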