LSCI100


LESSON 4: FINDING WEBSITES: USING SEARCH ENGINES


 Reading

 Tutorial: Using ANDs & ORs In Google Searches

 Lesson 4 Assignment
 


LEARNING OBJECTIVES:


 

*  To recognize the 3 basic parts of a URL.

*  To recognize domain names.

*  To understand the difference between subject directories and search engines and know the appropriate ways to use them in research.

*  To understand the importance of selective directories.

*  To know the difference between general search engines and site-specific search engines.


*  To be aware of the common features of web search engines.


LESSON 4 TABLE OF CONTENTS:

 

1.  Preface
2.  Websites, URL’s, and Domain Names
3.  General Web Surfing and Types of Web Pages & Web Documents
4.  Subject Directories and Selective Directories
5.  General Search Engines and Site-specific Search Engines
6.  Common Features of General Search Engines
7.  Key Points to Remember




1. PREFACE

Thus far in the course, we have concentrated primarily on finding books and periodical articles.  We have used online catalogs (to find books) and web databases (to find periodical articles).  In this lesson, we expand our search for information sources to include information found in websites.



2. WEBSITES, URL’s, AND DOMAIN NAMES

You may recall from Lesson 1 our definition of a website:

     DEFINITION: WEBSITE


A website is a coherent collection of Web pages that are linked together and reside on that part of the Internet known as the World Wide Web.  Millions of websites exist, offering vast amounts of information of varying credibility and worth.


Every website (and every Web page) has a unique address known as a URL (Uniform Resource Locator), which identifies where it is located on the Web.  For example, here is the URL for Skyline Library’s home page:

http://skylinecollege.edu/library/index.html


URLs have three basic parts: the protocol, the server name and the resource ID. These parts provide "clues" to where a Web page originates and who might be responsible for the information at that page or site. Let's look at each part:

·        PROTOCOL: appears at the start of the URL before the double slash and identifies the method (set of rules) by which the resource is transmitted. Web pages are transmitted using HyperText Transfer Protocol (HTTP) or its secure version, HTTPS. Thus, Web URLs begin with http:// or https://.

·        SERVER NAME: appears between the double slash (//) and the first single slash (/)
The server name for the Skyline Library URL is: skylinecollege.edu

The server name identifies the computer on which the resource is found. (Computers that store and "serve up" Web pages are called servers.) This part of the URL commonly identifies which organization or company is either directly responsible for the information or simply providing the computer space where the information is stored.

The server name ends with a dot and an extension, most often three or two letters long, called the domain name (sometimes called the domain type, and known more formally as the top-level domain). The domain is important because it usually identifies the type of organization that created or sponsored the resource. Sometimes it indicates the country where the server is located. The most common domain names are:

.com for company or commercial sites

.org for non-profit organization sites

.edu for educational sites (most commonly four-year universities)

.gov for government sites

.net for Internet service providers or other types of networks 

.mil for a military body

If the domain name is two letters, it identifies a country, e.g. .us for the United States, .uk for the United Kingdom, .au for Australia, .mx for Mexico or .ca for Canada.

·       RESOURCE ID: everything after the first single slash (/)
The resource ID for the Skyline Library URL is: library/index.html

The resource ID contains directories and subdirectories, thereby giving you the exact location of the document on the server.  Following the last slash (/), you are given the file name for the specific page. (The file name for the Skyline Library homepage is: index.html.) The file name ends with a three- or four-letter designation that specifies the file type (e.g., .htm or .html for a standard Web page, .jpg or .gif for common graphic files).
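To see how the three parts fit together, here is a minimal sketch in Python that splits the Skyline Library URL using the standard urllib.parse module. The function name describe_url and the DOMAIN_TYPES table are made up for this illustration; the table simply repeats the domain list from this lesson. The sketch treats only the path as the resource ID and ignores anything that may follow it.

```python
from urllib.parse import urlsplit

# Domain meanings as listed in this lesson (table name is hypothetical).
DOMAIN_TYPES = {
    "com": "company or commercial site",
    "org": "non-profit organization site",
    "edu": "educational site",
    "gov": "government site",
    "net": "Internet service provider or other network",
    "mil": "military body",
}


def describe_url(url):
    """Split a URL into the three basic parts described in this lesson."""
    parts = urlsplit(url)
    server_name = parts.hostname or ""
    # The domain is whatever follows the last dot in the server name.
    domain = server_name.rsplit(".", 1)[-1]
    # The resource ID is everything after the first single slash.
    resource_id = parts.path.lstrip("/")
    # The file name is whatever follows the last slash.
    file_name = resource_id.rsplit("/", 1)[-1]

    if domain in DOMAIN_TYPES:
        domain_type = DOMAIN_TYPES[domain]
    elif len(domain) == 2:
        domain_type = "country code"
    else:
        domain_type = "unknown"

    return {
        "protocol": parts.scheme,
        "server_name": server_name,
        "domain": "." + domain,
        "domain_type": domain_type,
        "resource_id": resource_id,
        "file_name": file_name,
    }


info = describe_url("http://skylinecollege.edu/library/index.html")
for label, value in info.items():
    print(f"{label}: {value}")
```

Running the sketch lists the protocol (http), the server name (skylinecollege.edu), the domain (.edu), the resource ID (library/index.html), and the file name (index.html), the same parts identified by hand above.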

 

3. GENERAL WEB SURFING AND TYPES OF WEB PAGES & WEB DOCUMENT FORMATS

 

At some point in your research -- usually after searching the Deep Web using Web databases and online catalogs -- you may want to look for information and opinion found on free websites within the Visible Web.  This is often referred to as general Web surfing.  Be cautious, however, when searching the Visible Web because no quality control is in effect here.  You may find highly accurate and reliable information at one website, and complete falsehoods at another.

 

When you do look for information from web pages or other documents on the Internet, it is very useful to be aware of the different types of pages and document formats that you may find.  Below is a description of some common types of web pages and web document formats:

 

Common types of Web pages:

 

- Web articles originally published in print publications

Articles that were originally published in print magazines, journals or newspapers are accessible through online databases; but they are also often accessible on websites and they can look much like regular web pages.
–Examples:
Article from a subscription database (originally published in a print newspaper)
Same article as above from the publication’s website
Article from a free online database (originally published in a print journal)
 

- Web articles written only for websites (never published in print publications)

Many articles are written only for websites that are organized as online magazines or journals. 
The basic distinction between online articles and web pages is that articles are published on a website that produces articles on a regular basis (daily, weekly, biweekly, monthly) with each article usually clearly dated.  In contrast, web pages are generally not published on a regularly-scheduled dated basis but are added to a website as the website producers build or update a site.
–Examples:
Article from a website (only published online, never in print) (Example 1)

Article from a website (only published online, never in print) (Example 2)

 

- Blog posts

A blog (short for "web log") is a type of web page that serves as a publicly accessible personal journal (or log) for an individual. Typically updated daily, blogs often reflect the personality of the author. Blog software usually has an archive of old blog postings. Many blogs can be searched for terms in the archive. Blogs have become a vibrant, fast-growing medium for communication in professional, political, news, trendy, and other specialized web communities. - Example of a weblog

 

- Wiki pages

A wiki (from a Hawaiian word meaning "quick") is a technology that gathers in one place a number of web pages focused on a theme, project, or collaboration. Wikis are generally used when users or group members are invited to develop, contribute, and update the content of the wiki. Wikis can be password-protected in various ways to control or allow contributions. The most famous wiki is Wikipedia. - Example of a wiki

 

- Group discussion threads

Discussion forums in which you can participate, share ideas, and form a community. Most are free and some are open to new members. Yahoo Groups and Google Groups are both popular. Google Groups includes the former Usenet Newsgroups. Blogs (see “Blog posts” above) are replacing some of the need for this type of community sharing and information exchange. – Example of a discussion thread

 

Common types of Web document formats:

 

Regular web pages are files that are created in a document format called HyperText Markup Language or HTML.  The file names for these files end with the file extension .html or .htm.  Numerous other types of files, which may be retrieved through general Web searching, are in document formats that are not HTML.  Some of the most common of these other types of files include:

 

- PDFs (.pdf file extension)

Abbreviation for Portable Document Format, a file format developed by Adobe Systems, which is used to display almost any kind of document with its original formatting preserved. Viewing a PDF file requires a PDF viewer such as Adobe Acrobat Reader; most browsers have a viewer built in, and Acrobat Reader can be downloaded free from Adobe.  -  Example of a .pdf document

 

- Word documents (.doc file extension)

Microsoft Word documents accessible on the Internet – Example of a Word document

 

- Excel documents (.xls file extension)

Microsoft Excel spreadsheet files accessible on the Internet - Example of an Excel document

 

- PowerPoint files (.ppt file extension)

Microsoft PowerPoint slideshow files accessible on the Internet – Example of a PowerPoint slideshow file
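Because the file extension at the end of a URL signals the document format, you can often tell what kind of file a link leads to before opening it. The short Python sketch below illustrates the idea. The function name document_format and the example.edu addresses are made up for this illustration, and the table covers only the formats described in this lesson.

```python
import posixpath
from urllib.parse import urlsplit

# File extensions and formats described in this lesson.
FORMATS = {
    ".html": "regular Web page (HTML)",
    ".htm": "regular Web page (HTML)",
    ".pdf": "PDF document",
    ".doc": "Microsoft Word document",
    ".xls": "Microsoft Excel spreadsheet",
    ".ppt": "Microsoft PowerPoint slideshow",
}


def document_format(url):
    """Guess a document's format from the file extension in its URL."""
    path = urlsplit(url).path
    # splitext returns (name, extension); lower() treats .PDF the same as .pdf.
    extension = posixpath.splitext(path)[1].lower()
    return FORMATS.get(extension, "unknown format")


# Hypothetical addresses used only to exercise the function.
for url in [
    "http://skylinecollege.edu/library/index.html",
    "http://example.edu/reports/annual.pdf",
    "http://example.edu/data/budget.XLS",
    "http://example.edu/",
]:
    print(url, "->", document_format(url))
```

An extension is only a clue: a page whose URL has no extension at all may still be a regular web page, which is why the sketch reports "unknown format" rather than guessing.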




4. SUBJECT DIRECTORIES AND SELECTIVE DIRECTORIES

Two types of Web search tools are available to help you find websites and/or web pages: subject directories and search engines. Let's examine each separately.

Web subject directories (such as InfoMine, Librarians’ Internet Index, Google Directory or Yahoo Directory) provide lists of websites arranged by subject category.  The websites included in a subject directory are chosen by people known as indexers.  Each site in the directory is listed under one or more subject categories, as determined by the directory's indexers.  A brief description of each site listed is usually included. 

Directories are often a good place to start when you’re looking for information on relatively general subjects or if you want an overview of what’s available on the Web on a given subject.

To find websites on general subjects using subject directories: 

*  browse through the directory’s list of subject categories, OR

*  do a keyword search using terms that describe the general subject you are researching  (click here for an example)

There is wide variation in the number and quality of sites included in different Web subject directories.  Some large directories try to be as comprehensive as possible, with very extensive listings. However, one disadvantage of these large directories is that they usually do little evaluation of the quality of the sites they list, thus making them somewhat less effective at finding the best sites in a particular subject area.

For that reason, you are wise to use a subject directory that only lists sites known to be high quality. These directories are known as selective directories. In addition to indexing credible websites, selective directories often provide links to leading sites in many subject areas, which in turn, provide links to more specific high-quality documents on a particular topic within the broader subject area.

Recommended selective directories:

InfoMine (http://infomine.ucr.edu/) -- academic resources

Internet Public Library (http://www.ipl.org/)

Librarians' Internet Index (http://www.lii.org) -- high-quality resources on a range of general subjects

AcademicInfo (http://www.academicinfo.net) -- scholarly sites on a wide range of subjects

Scout Report Archives (http://scout.cs.wisc.edu/archives/) -- academic resources



5. GENERAL SEARCH ENGINES AND SITE-SPECIFIC SEARCH ENGINES

Web search engines (such as Google, Yahoo search, and many others) allow you to search through millions of websites using your own keyword(s).  Websites gathered and indexed by search engines are not selected, organized or previewed by humans. Instead, their collection of websites is created entirely by computer programs called spiders (also known as robots) that continuously scan the Internet looking for sites to add to their index. 
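The way a spider builds an index can be sketched in a few lines of Python. This is a simplified illustration, not how any real search engine works: the "web" here is a small made-up dictionary (TOY_WEB) rather than the live Internet, and the page names and the crawl function are assumptions for this example only. The spider starts at one page, follows links to find new pages, and records which pages contain which words.

```python
from collections import deque

# A tiny made-up "web": each page has some text and links to other pages.
TOY_WEB = {
    "a.example/home": {
        "text": "immigration and the economy",
        "links": ["a.example/jobs", "b.example/news"],
    },
    "a.example/jobs": {
        "text": "jobs and the economy",
        "links": ["a.example/home"],
    },
    "b.example/news": {
        "text": "immigration news",
        "links": [],
    },
    "c.example/alone": {
        "text": "a page nobody links to",
        "links": [],
    },
}


def crawl(start_page, web):
    """Follow links from start_page and build a word -> pages index."""
    index = {}
    seen = {start_page}
    queue = deque([start_page])
    while queue:
        page = queue.popleft()
        # Record every word on this page in the index.
        for word in web[page]["text"].split():
            index.setdefault(word, set()).add(page)
        # Add newly discovered pages to the list of pages to visit.
        for link in web[page]["links"]:
            if link in web and link not in seen:
                seen.add(link)
                queue.append(link)
    return index


index = crawl("a.example/home", TOY_WEB)
print(sorted(index["economy"]))      # pages containing "economy"
print(sorted(index["immigration"]))  # pages containing "immigration"
print("nobody" in index)             # False: the spider never reached that page
```

In this toy example the page that no other page links to never makes it into the index, because the spider in the sketch can only discover pages by following links.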

Since the collections of websites indexed by search engines are huge (numbering in the millions) and often have no subject organization at all, it is very important to think carefully about what search words to use and be aware of the various search features available before performing a search. Always look for the "Search Help," "Search Tips," or other pages that explain the features of the search engine you're using. Remember that Web search engines, unlike library online catalogs, do not use a common set of subject headings. Therefore, to use search engines effectively, it is usually best to use very precise search words or phrases, or combine several search terms using Boolean logic (as discussed in Lesson 3).
 
Search engines should be used when you have a focused research question in mind or when you’re looking for a specific item of information, such as a known document (e.g. the U.S. Declaration of Independence), image, etc. or a specific web page. They're not recommended for finding sites on broad subjects, such as "astronomy" or "history." As discussed earlier, Web subject directories should be used to find sites on general subjects.

Finally, there is a special type of search engine you should be aware of.  Sometimes, websites offer their own internal search engine that allows you to search just that website’s collection of information.  These are known as site-specific search engines.  Click HERE to see an example of a website that contains a site-specific search engine.



6. COMMON FEATURES OF GENERAL SEARCH ENGINES 

Listed below are features common to many search engines, with a particular focus on Google because Google has one of the largest databases of web pages and because its PageRank™ system is considered to be among the most effective at identifying high quality pages.  Keep in mind that these features may not work the same -- or even be available -- on every search engine.  

Note: The search examples shown below (in bold and italics) are links to actual Google searches for those examples.  Click on any of the examples to see the search in Google.

AND: Many search engines use the + sign (often called the "require" sign) in front of words that must be included in the search results. For example, +immigration +economy may be used instead of immigration AND economy. Google and many other search engines that allow the use of AND and OR require that they be capitalized. (Thus, it's a good idea to always capitalize these connectors if you use them.) 

OR: the OR should be used between synonyms, equivalent terms, or variant spellings or endings that should be included in a search for the same idea.  ORs should usually be used in combination with ANDs or other methods of limiting your search.

Synonyms: Google will search for words with similar meanings using the ~ symbol.  For example, the search: ~food  would find web pages with the words: recipes, nutrition or cooking.  This feature does not always work effectively and should be used cautiously.

Phrase searching:  putting a phrase in quotation marks retrieves documents that contain that exact phrase.  For example: "illegal immigration" will retrieve documents containing those two words next to each other as a phrase. 

Truncation not available: Google (like most major search engines) does not provide truncation (a symbol--usually an asterisk--that allows you to search for all variations of a common root).  Instead of truncation, use the OR (capitalized) between words with variant endings.

Organizing precise searches: When doing a search in Google or any other Web search engine, be sure to use at least one search term from each of the concepts for your research question or topic.  You may add parentheses to make it easier to see & organize your concepts, but they are not necessary. For example: 

  
Research question: “How does illegal immigration affect the U.S. economy?”
   Google search:
("illegal immigration" OR "illegal aliens") AND (economy) AND ("United States" OR U.S.)
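The pattern in this example (ORs between equivalent terms within a concept, ANDs between concepts, quotation marks around phrases) can be expressed as a short Python sketch. The function names quote and build_query are made up for this illustration; the sketch only assembles the search string and does not send anything to Google.

```python
def quote(term):
    """Put multi-word terms in quotation marks so they are searched as a phrase."""
    return f'"{term}"' if " " in term else term


def build_query(concepts):
    """Join equivalent terms with OR inside each concept, then join concepts with AND."""
    groups = []
    for synonyms in concepts:
        group = " OR ".join(quote(term) for term in synonyms)
        groups.append(f"({group})")
    return " AND ".join(groups)


# One list of equivalent terms per concept in the research question.
concepts = [
    ["illegal immigration", "illegal aliens"],
    ["economy"],
    ["United States", "U.S."],
]
print(build_query(concepts))
```

The printed search string is the same as the Google search shown above. Listing the concepts first, and the equivalent terms for each, is a useful habit even when you type the search by hand.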

Relevance ranking: a programming method that attempts to rank search results in order to place those pages that are most relevant to your search and/or are the highest quality pages at or near the top of the results list.  Search engines' ranking systems are based on various factors. Documents returned from a search can be ranked on such factors as:

  • frequency of search words in document
  • words found in title or near beginning of document
  • search words found close to one another (proximity)
  • popularity - based on the number of links to a page and the importance of the pages that link to it
  • importance - amount of traffic to the page, quality of links
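To make the idea of ranking factors more concrete, here is a toy Python sketch that scores documents on the first three factors in the list: word frequency, words in the title, and proximity. It is an illustration only and is not the method used by Google or any other search engine. The function name relevance_score, the sample documents, and the weights are all assumptions made for this example.

```python
def relevance_score(search_words, title, body):
    """Score a document using three of the factors listed above."""
    search_words = [w.lower() for w in search_words]
    title_words = title.lower().split()
    body_words = body.lower().split()

    # Factor 1: how often the search words appear in the document.
    frequency = sum(body_words.count(w) for w in search_words)

    # Factor 2: how many of the search words appear in the title.
    in_title = sum(1 for w in search_words if w in title_words)

    # Factor 3: proximity, counted as two different search words side by side.
    adjacent = 0
    for first, second in zip(body_words, body_words[1:]):
        if first in search_words and second in search_words and first != second:
            adjacent += 1

    # The weights (1, 3, 2) are arbitrary choices for this illustration.
    return frequency * 1 + in_title * 3 + adjacent * 2


search = ["immigration", "economy"]
score_a = relevance_score(
    search,
    "Immigration and the Economy",
    "immigration economy trends show that immigration affects wages",
)
score_b = relevance_score(
    search,
    "Local Weather Report",
    "the economy was mentioned once in this weather report",
)
print(score_a, score_b)  # the first document scores higher, so it would be listed first
```

Real ranking systems combine many more factors, including the link-based popularity and importance measures listed above, which this sketch leaves out.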


A few general suggestions for using search engines:

·        Be as specific as possible. Use search terms for all of your concepts and use phrases (with " ") when appropriate.  
Note: Google limits queries to 32 words.

·        Do not type in your search as a question.  For example, do not enter a search such as: 
How do illegal immigrants affect the U.S. economy?

Instead, do a search that includes just the key words for all of the concepts in your topic.  For example:
"illegal immigration" OR "illegal aliens" AND economy AND "United States" OR U.S.
Use only words related to your topic that are most likely to be used in web pages on your topic.

·        Do not include words that describe the type or format of information you are searching for.
For example, do not use words such as: “listings of,” “articles about,” “discussion of,” “documentation on,” and “pages about.”



7. KEY POINTS TO REMEMBER

·        A website is a coherent collection of Web pages linked together.

·        URL’s have 3 basic parts: the protocol, the server name, and the resource ID.

·        The server name ends with a dot and an extension, most often 3 or 2 letters long, called the domain name (or domain type).  The domain name is important because it usually identifies the type of organization that created or sponsored the website.

·        Looking for information and opinion found on free websites within the Visible Web (as opposed to the Deep Web) is known as general Web surfing. Be cautious, however, because surfing can uncover highly credible sites as well as sites containing very questionable or false information.

·        Two types of Web search tools are available to help you find websites and/or web pages: subject directories and search engines.

·        Web subject directories provide lists of websites arranged by subject category.  The websites included in a subject directory are selected, organized, and previewed by human beings. They’re often a good place to start when you’re looking for information on relatively general subjects or if you want an overview of what is available on the Web on a given subject.

·        Selective directories, such as the Librarians' Internet Index, are a type of subject directory that lists only sites recognized to be high in academic quality.

·        Web search engines (such as Google, Yahoo! Search, and many others) allow you to search through millions of websites using your own keyword(s).  Computer programs known as spiders collect and index the websites found with a search engine. It is appropriate to use search engines when you have a focused research question in mind rather than a broad subject.

·        Sometimes, websites offer their own internal search engine that allows you to search just that website’s collection of information.  These are known as site-specific search engines.

 

Go to  Tutorial: Using ANDs & ORs In Google Searches

Optional:
Additional tutorials on using Web search engines:
- UC Berkeley Tutorial on "Recommended Search Engines";
- "Googling to the Max" exercises (.pdf)


last revised: 11-18-08 by Eric Brenner & Dennis Wolbers, Skyline College, San Bruno, CA
These materials may be used for educational purposes.  Please inform and credit the author and cite the source as: LSCI 100: Introduction to Information Research. All commercial rights are reserved. Send comments or suggestions to: Eric Brenner at: brenner@smccd.edu.