Beginner’s guide to Web Scraping using Excel

Beginner Guide to Web Scraping Using Excel
Excel or not, web scraping is hugely important, isn’t it?

It is vital for your business whatever it may be.

Do you ever wonder how?

Well, to start with, web scraping is the process of extracting web data.

Why do you need web data?

You need web data because you base all your decisions related to business strategy on web data.

Whether it is price intelligence, sentiment analysis, or lead generation, you need data to arrive at your strategy.

With the e-commerce boom, businesses have gone online. Every moment, companies change their strategies and you need to keep a close watch on the market trends.

Guess what you need?

Data. Live. Current. Up-to-the-minute.

The question is how to get it?

Web scraping is the way out.

It can enable you to scrape web data in an automated fashion and allow you to save the same in a format of your choice. Isn’t it amazing?

It truly is.

There are various tools and techniques. But one of the great ways to scrape data in such a manner is to leverage excel for web scraping.

You might wonder how but no worries.

Here’s a complete, step-by-step tutorial to use excel to scrape data:

Excel for Web Scraping

Excel is amazing anyway. You are no stranger to Excel. You use it for various purposes.

However, you have perhaps not yet explored how to capitalize on its potential for web scraping.

As you know, Excel is a fantastic tool to deal with data in a structured format!

Why Excel is great is because it cannot only carry out basic data computations but also perform data collection as well as analysis.

It’s quite obvious how Excel can sort, apply filter, use chart, use group and outline data, use pivot table for data analysis.

There is always a huge amount of data that needs to be scraped. It is this data that needs to be analyzed for new insights and business strategy. Where Excel comes in handy is that we need to import into Excel in order to analyze it.

When you use Excel as a web scraper tool, what it does is that it saves a lot of time and energy to scrape web data.

Here’re the ways to use it to extract web data:

Ways to Extract Data Using Excel

1. Copy/Paste

Anyone who has used excel knows how to use the copy/past method. Anyway, it works in this way. All you need to do is to view or export the data to a new Excel, CSV or text file and then copy and paste data into the destination Excel worksheet. It should be quite easy for anyone conversant with Excel. There could be other methods more efficient for recurring use of workbooks.

2. Push

On the other hand, you can also use this method wherein you push the data from system into a specific worksheet in an existing workbook. Bear in mind that not all applications support this method. However, when they do, it will save you a step, in return some time and energy!

3. Pull

Likewise, you can pull data directly from the external data source with the help of external data feature of Excel.

This particular blog post focuses on this feature, and the relevant details related to the same.

How to Scrape Data from Website using Excel?

Excel web query is an excellent way to automate the routine task of accessing a web page and copying the data on an Excel sheet.

If you use web query, you can instruct Excel where to look (web page) and what to copy (tables of data). What this will accomplish is that Excel will automatically import the data onto a worksheet for you.

What is Excel Web Query?

It’s merely a feature in Excel. What it does is that it enables you to extract data stored on the Internet.

You would be amazed by how it can automatically find tables on the web page and let you select the particular one that you need data from.

This feature is quite useful because it creates HTML content in an Excel worksheet by passing along the necessary parameters required by the structure of the web page to display data in a workbook.

Web queries can also be fairly useful for a scenario wherein a standard ODBC (Open Database Connectivity) connect would not be easy to create or maintain for merely getting data from the web pages.

Getting Started

The best way to understand the way Excel can effectively scrape data is through an example. So we would try to retrieve Forecast data from timeanddate.com.

However, you got to remember that this sample tutorial is possible using Excel 2013 for Windows. You might not see the same kind of steps or efficiency if you use older/newer version of the program.

Step 1: Creating Excel Web Query

  • Let’s get started by creating a simple Excel Web Query.
  • The first step is to copy URL from which you wish to retrieve data.
  • For this tutorial, we will use this URL = https://www.timeanddate.com/weather/usa/new-york
  • Now, go to Excel and open a workbook with a blank worksheet.
  • Go to Data > From Web
    Data From Web
  • After you click on From Web, you would be returned to the New Web Query window as illustrated in screenshot given below.
    New Web Query
  • Enter the web page URL (which we copied from timeanddate.com) in the Address bar and hit the Go button.
    Paste URL Below

Step 2: Select Data

  • In the New Web Query dialog box, you will be able to see a yellow box with a black arrow right at the top left of every table on the page.
  • By positioning your mouse cursor over one of these arrows, the corresponding table becomes outlined in blue. If this is the table you wish to import into your spreadsheet, click on the arrow so that it changes to a green box with a checkmark and the table is highlighted in blue.
  • For this example, we will select table: Forecast for the next 48 hours as shown in screenshot given below.
    Select Table

Step 3: Store Data to Worksheet

  • Once you have completed the selection of tables to import, click on the Import button to store data into the worksheet.
    Import Data
  • Now, Excel would display the Import Data dialog box.
    Data Dialog Box
  • In the Import Data dialog box, instruct Excel where to place the imported web data.
  • Then, select the Existing Worksheet radio button to place the table data into the existing, open, empty worksheet. Alternatively, select the New Worksheet radio button to have Excel place the table data into a newly inserted blank sheet.
  • For this example, we will choose existing worksheet, and click OK button to store data in existing worksheet.
    Dialog Box OK
  • Now, Excel would place the table data into the specified location. Sometimes, grabbing the table data might take a few moments. Excel makes an effort to grab and arrange the table information.
  • Output would look somewhat like screenshot given below:
    Output Sheet

Advanced tactics

1. Customizing web query

  • Once you create a Web Query, you can customize it to suit your needs. To access Web query properties, right-click on a cell in the query results and choose Edit Query.
  • When the Web page you’re querying appears, click on the Options button in the upper-right corner of the window to open the dialog box shown in screenshot given below.
  • The options here allow you to change how the query interacts with the Web page itself.
    Web Query Option

2. Auto refreshing Data

  • Once the data is in your spreadsheet, you can manually refresh it by right-clicking any cell of the Web-queried data and selecting Refresh from the shortcut menu.
    Refresh All
  • Likewise, you can tell Excel to automatically refresh the Web query data at regular intervals by again right-clicking any cell of the Web-queried data and selecting Data Range Properties from the shortcut menu. This will open the External Data Range Properties dialog box.
  • Under the Refresh control options, you can specify that the data is to be refreshed automatically as often as once every 60 minutes.
  • If you want to overwrite existing data by new data (updated data), all you need to do is, choose radio button: overwrite existing cells with new data, clear unused cells.
    Data Range Properties

3. Data analysis

  • Data Analysis is a process of inspecting, cleaning, transforming and modeling data with the goal of discovering useful information, suggesting conclusions and supporting decision-making
  • Excel is useful because it provides commands, functions and tools that make your data analysis tasks easy. You can avoid many time consuming and/or complex calculations using Excel.
  • Here you can find more about data analysis with excel.

 

Another example of Web Scraping using excel:

Limitations with Excel

  • Not Scalable
    • While Excel is great for extracting HTML tables from the web, you would do well to remember that it is not a proper web scraping tool or solution. If you wish to scrape data for business insights, Excel is not hugely helpful because it cannot extract large quantities of data. In short, you cannot scale it up as you can with a proper web scraping tool.
  • Not able to scrape data that you want
    • When you use Excel, you can merely extract data which is in table format.
    • If you need to extract data which is not in table format, you need to find some other solution.
  • Other limitations
    • Moreover, you can’t get to use important features like Ajax pagination, infinite scrolling, deduplication etc. for web scraping when you use Excel for web scraping.

What’s the Solution???

  • You basically need a web scraping tool or service through which you can do web scraping in a scalable fashion and provide a feature like Ajax pagination, infinite scrolling, deduplication etc.
  • To sum it up, make sure you select the right kind of web scraping tool or service provider that provides good customer support and cloud-based infrastructure.