Saving Web Pages

Data is available on the Internet in many different formats, and some formats are easier to work with than others. Sites that provide a lot of data often offer more than one format to meet different users' needs. The most common formats for data available on the Internet are:

Excel Files
The most convenient format to work with, since the data is already in Excel format.
Lotus Files
Lotus files are now considered generic spreadsheet files, and all major spreadsheet programs can import them. Lotus file extensions vary, but they always begin with the letters wk (for example, *.wks, *.wk1, *.wk2). To open one of these files in Excel, set the "Files of Type" field to either "All Files (*.*)" or "Lotus 1-2-3 Files (*.wk?)". The file will open normally, but remember to change the file type to Excel when you save it, to avoid losing any formatting changes you make.
Delimited Files
These files are intended to be imported into spreadsheets or databases. The columns are separated by a character, usually a tab or a comma. Excel's Import Wizard will usually recognize these files and import them correctly, but it is best to check that the wizard has identified the delimiter correctly. See the section "How to Save as Plain Text" below if you are unsure how to save a delimited file in your web browser.
HTML Tables
If the data is presented in a table on a Web page, save the whole page as an HTML file and open this file in Excel. Tables are not always obvious, since they may not have borders. If the text appears in a fixed-width (typewriter-style) font with its columns lined up by spaces, the data is probably not in a table and you should save it as plain text. Otherwise, it almost certainly is a table and you will want to save the file as HTML. See the section "How to Save as HTML File" below if you are unsure how to save a file as HTML in your web browser.
Plain Text
Excel's Import Wizard will help you import plain text files into Excel. See the section "How to Save as Plain Text" below if you are unsure how to save a plain text file in your web browser.
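Excel's Import Wizard makes its own guess about which character separates the columns of a delimited file. To double-check a saved file independently, Python's standard csv module can make the same kind of guess. The following is a minimal sketch, assuming the file's text has already been read into a string; the function names are hypothetical, and the candidate delimiters (tab, comma, semicolon, pipe) are an assumption.

```python
import csv
import io


def guess_delimiter(sample: str) -> str:
    """Return the delimiter csv.Sniffer detects in a sample of the file.

    Candidates are limited to tab, comma, semicolon and pipe (an assumed
    list); csv.Error is raised if none of them is used consistently.
    """
    dialect = csv.Sniffer().sniff(sample, delimiters="\t,;|")
    return dialect.delimiter


def parse_delimited(text: str) -> list[list[str]]:
    """Split delimited text into rows of cells using the guessed delimiter."""
    delimiter = guess_delimiter(text[:4096])  # the first few KB is usually enough
    return list(csv.reader(io.StringIO(text), delimiter=delimiter))
```

If guess_delimiter raises csv.Error, none of the candidate characters appears consistently on every line, which suggests the file may be fixed-width plain text rather than delimited.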
 

Note:

Remember to change the file type to Excel after you have opened your data in Excel. It is best to do this early, rather than waiting until you have spent significant time reformatting the data and then discovering a problem. To change the file type to Excel, select Save As from the File menu and, in the "Save as type" field, select "Microsoft Excel Workbook" (the first option in the list).


How to Save as Plain Text

  1. If the site you are retrieving data from uses frames, click in the frame that contains the data.
  2. Pull down the File menu.
  3. Select Save As or Save Frame As.
  4. If the data on the Web page is plain text or delimited, set the "Save as type" field to Plain Text (*.txt).
 


Save the file on a disk or on your H: drive.
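Once a plain text file is saved, Excel's Import Wizard can split each line either at a delimiter or, for text laid out in fixed-width columns, at set character positions. The fixed-width idea can be sketched in Python as below. This is a minimal illustration, not how Excel itself is implemented; the function names are hypothetical, and the break positions are assumptions that would have to be read off the actual file, much like placing column break lines in the wizard.

```python
def split_fixed_width(line: str, breaks: list[int]) -> list[str]:
    """Cut one line at the given character positions and trim the padding.

    breaks lists the start position of every column after the first,
    in increasing order (hypothetical values chosen per file).
    """
    starts = [0] + breaks
    ends = breaks + [len(line)]
    # Slicing past the end of a short line simply yields an empty cell.
    return [line[s:e].strip() for s, e in zip(starts, ends)]


def parse_fixed_width(text: str, breaks: list[int]) -> list[list[str]]:
    """Split every non-blank line of fixed-width text into columns."""
    return [split_fixed_width(line, breaks)
            for line in text.splitlines() if line.strip()]
```

For example, with breaks of [6, 12], the line "2000  Ann   5" is split into the cells 2000, Ann and 5.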


How to Save as HTML File...

  1. If the site you are retrieving data from uses frames, click in the frame that contains the data.
  2. Pull down the File menu.
  3. Select Save As or Save Frame As.
  4. The next step depends on which browser you are using.
...in Microsoft Internet Explorer

If there are pictures on the web page, you may want to save the file as Web Page, complete, which saves the HTML file and all of its images.


If you do not care about saving images, then just save the file as Web Page, HTML only.


If the data is in an HTML table, open the HTML file in Excel, then change the file type by saving it as an Excel file.

...in Netscape Navigator


If the data is in an HTML table, set the "Save as type" field to HTML Files and open the HTML file in Excel. Once the file is open in Excel, change the file type by saving it as an Excel file.

Save the file on a disk or on your H: drive.
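Excel locates the tables itself when it opens a saved HTML file. Because a table may have no visible borders, it can help to check what the saved file actually contains. The following is a minimal sketch using Python's standard html.parser module that lists the text of each table row; the class and function names are hypothetical, and it assumes simple tables (nested tables, rowspan and colspan are not handled).

```python
from html.parser import HTMLParser
from typing import Optional


class TableCellParser(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped into rows.

    Rows from all tables on the page go into one list. Missing </td>
    and </tr> end tags, common in older pages, are tolerated.
    """

    def __init__(self) -> None:
        super().__init__()
        self.rows: list = []
        self._row: Optional[list] = None
        self._cell: Optional[list] = None

    def _close_cell(self) -> None:
        if self._cell is not None and self._row is not None:
            # Collapse runs of whitespace inside the cell to single spaces.
            self._row.append(" ".join("".join(self._cell).split()))
        self._cell = None

    def _close_row(self) -> None:
        self._close_cell()
        if self._row is not None:
            self.rows.append(self._row)
        self._row = None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._close_row()       # a new row implicitly ends the previous one
            self._row = []
        elif tag in ("td", "th"):
            self._close_cell()      # a new cell implicitly ends the previous one
            if self._row is not None:
                self._cell = []

    def handle_data(self, data):
        if self._cell is not None:  # ignore text outside table cells
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._close_cell()
        elif tag in ("tr", "table"):
            self._close_row()

    def close(self) -> None:
        super().close()
        self._close_row()           # flush a row left open at end of input


def extract_table_rows(html: str) -> list:
    """Return the table rows found in an HTML string as lists of cell text."""
    parser = TableCellParser()
    parser.feed(html)
    parser.close()
    return parser.rows
```

If extract_table_rows returns an empty list for a saved page, the data is probably not in an HTML table and saving it as plain text is likely the better choice.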

© Copyright 2007 Washington & Lee University
This website is provided by the Leyburn Library and University Computing
Website design and implementation by Jack Jeong, Class of 2007