Google Crawl

November 1st, 2007 by Erick Beck

This morning I started a preliminary indexing of campus websites with the Google Appliance. This is an exploratory crawl to see what is out there and to find the garbage that needs to be excluded. I feel confident that we will need to throw this index away and start over with more rigorous filters. I expect this to be an iterative process of finding and excluding problematic sites until we are indexing only worthwhile content.
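As a rough illustration of that find-and-exclude loop, here is a minimal Python sketch of filtering a list of crawled URLs against a growing set of exclusion patterns. It is not the appliance's own configuration format; the patterns, the `example.edu` hostnames, and the function names are all made up for the example.

```python
from fnmatch import fnmatchcase
from urllib.parse import urlparse

# Hypothetical exclusion patterns; a real list would grow after each crawl.
EXCLUDE_PATTERNS = [
    "*/calendar/*",          # auto-generated calendar pages
    "*/cgi-bin/*",           # script output
    "*.example.edu/test/*",  # test areas on any subdomain
]


def is_excluded(url, patterns=EXCLUDE_PATTERNS):
    """Return True if the URL's host and path match any exclusion pattern."""
    parsed = urlparse(url)
    target = parsed.netloc + parsed.path
    return any(fnmatchcase(target, pattern) for pattern in patterns)


def filter_urls(urls, patterns=EXCLUDE_PATTERNS):
    """Keep only the URLs that no exclusion pattern matches."""
    return [url for url in urls if not is_excluded(url, patterns)]
```

Each pass over the crawl results would add a few more patterns to the list, and the next crawl would start from the stricter set.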

In the meantime, you can go to and the system will serve out queries for the sites that it has already spidered. (Please be aware that the returns, appearance, and even availability of this site are subject to change at any time without notice until the service is ready for final release.)

Our next step is to start looking at keyword matches that will let us bias the results. For example, we want to make sure that the College of Engineering shows up as the first result of a search for “engineering” rather than being buried on the second page of returns.
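To make the idea concrete, here is a small Python sketch of keyword-based biasing: a lookup table maps a search term to a page that should be shown first. This only illustrates the concept and is not how the appliance implements or configures it; the table contents, the URL, and the function name are assumptions made up for the example.

```python
# Hypothetical keyword table: search term -> page to show first.
KEYWORD_MATCHES = {
    "engineering": "http://engineering.example.edu/",
}


def apply_keyword_matches(query, results, matches=KEYWORD_MATCHES):
    """Return the results with any keyword-matched page moved to the front."""
    promoted = matches.get(query.strip().lower())
    if promoted is None:
        return list(results)
    # Put the promoted page first and drop its duplicate further down.
    return [promoted] + [url for url in results if url != promoted]
```

A query with no entry in the table comes back in its original order, so only the handful of terms we choose to bias are affected.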

Thursday, November 1st, 2007 Ongoing Projects, Search
