Sunday, November 15, 2009

Readings- 11/17/09

Web Search Engines-
Part 1 of this two-part article addresses the basics of how search engines work: how they crawl through web sites and spread their search load among different servers. These search engines, referred to in the article by the acronym GYM (Google, Yahoo, Microsoft), have politeness filters built in so they do not overwhelm any one site. The second part talks about how the search algorithm indexes the web and finds the best match for one's search. Many of the issues addressed here came up in articles we read earlier about how Google was created and how it works. While informative, the article did not tell me anything I did not already know.
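
To make the politeness idea concrete for myself, here is a minimal sketch of how a crawler might space out requests to a single host. This is my own illustration, not code from the article: the class name, the two-second delay, and the example URLs are all assumptions, and a real crawler would also honor robots.txt and other rules.

```python
from urllib.parse import urlparse


class PolitenessScheduler:
    """Hypothetical helper: spaces out fetches to the same host."""

    def __init__(self, min_delay=2.0):
        # Assumed minimum gap (in seconds) between requests to one host.
        self.min_delay = min_delay
        # Maps host name -> earliest time the next fetch is allowed.
        self.next_allowed = {}

    def schedule(self, url, now):
        """Return the time at which `url` may be fetched."""
        host = urlparse(url).netloc
        fetch_at = max(now, self.next_allowed.get(host, now))
        # Reserve the slot so the next request to this host waits its turn.
        self.next_allowed[host] = fetch_at + self.min_delay
        return fetch_at


scheduler = PolitenessScheduler(min_delay=2.0)
first = scheduler.schedule("http://a.example/page1", now=0.0)
second = scheduler.schedule("http://a.example/page2", now=0.0)
other = scheduler.schedule("http://b.example/", now=0.0)
```

Two requests to the same host get pushed apart by the delay, while a request to a different host can go out right away, which is roughly how a crawler keeps busy without hammering any one site.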

Current developments and future trends for the OAI protocol for metadata harvesting-
This article takes a look at the OAI protocol for metadata harvesting: how it was developed, how it is being used now, and what the future holds for it. One of the most interesting things about OAI is that multiple contributors archive material for their own specific communities. So while the metadata requirements are the same for every item made available through the protocol, each community interprets them in a different way. It does not end up in complete chaos, but there seems to be some confusion within the communities about how strictly each group must adhere to the rules. It was great to see different groups coming together and contributing toward a common goal of archiving information, and to see that there is a central place for them to do just that.
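
For anyone curious what harvesting looks like in practice, here is a small sketch that builds a harvesting request URL. The `ListRecords` verb and the `metadataPrefix`, `set`, and `from` parameters are part of the protocol, and `oai_dc` is the Dublin Core format that repositories are expected to support. The function name, the repository address, and the date are placeholders I made up for illustration.

```python
from urllib.parse import urlencode


def list_records_url(base_url, metadata_prefix="oai_dc", set_spec=None, from_date=None):
    """Build a ListRecords request URL for a (hypothetical) repository."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec:
        # Optional: restrict the harvest to one set defined by the repository.
        params["set"] = set_spec
    if from_date:
        # Optional: only records created or changed on or after this date.
        params["from"] = from_date
    return base_url + "?" + urlencode(params)


# Placeholder base URL; a real harvester would use a repository's actual address.
url = list_records_url("https://repository.example.org/oai", from_date="2009-11-01")
```

A harvester would fetch that URL and parse the XML that comes back. Since every repository answers the same kind of request, the differences between communities show up in how they fill in the metadata rather than in how it is requested.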

The Deep Web: Surfacing Hidden Value-
This article delves into the deep web and the problem of finding information located that far down. There is a huge amount of information out there that we do not have access to because of how search engines work. They merely skim the surface instead of searching deep down. Many of the websites located in the deep web are public sites, and those that are not can be accessed by paying a fee, as with JSTOR. With such a wealth of information sitting below the surface of the web we have come to know, it is time to start being able to search it. The article clearly lays out just how much we are missing, and in a day and age when finding the most relevant information is imperative to many people, ignoring this resource can only be detrimental to our learning process in the end. The graphs using boats and fish were also highly amusing.

4 comments:

  1. I agreed with your opinion that the first article was repetitive. It is amazing though what Google and Microsoft have been able to accomplish.

  2. Does anyone know if there are open source non-commercial search engines out there?

  3. Many people depend on Google and other search engines. I think this is not a good way to find all the information that we need, because sometimes these search engines do not find everything, yet people think they get all they need by searching Google or Yahoo.

  4. I agree that the first article was fairly useless and repetitive. I'd be curious to know whether search engines can manage to start searching at least parts of the deeper web.
