Web Scraping Project: Scraping Book Data with BeautifulSoup4

🌍 Overview

In this blog post, we will walk through building a real-world web scraping project using Python and BeautifulSoup4. Our goal is to scrape every book listed on Books to Scrape (books.toscrape.com), a sandbox site built for practicing scraping, extracting detailed information such as:

  • Title
  • Price
  • Rating (converted to a number)
  • Stock availability
  • Product description
  • Category
  • UPC, Tax info, and more

By the end of this tutorial, you will have a fully working project that scrapes all 1,000 books in the catalogue and saves the data into a CSV file for further analysis.


🔧 Prerequisites

Make sure you have Python installed, then install the necessary packages:

pip install requests beautifulsoup4

📂 Project Structure

book_scraper_pro/
├── main.py
├── scraper.py
└── books.csv

📃 scraper.py

This file contains all the logic for fetching, parsing, and extracting book data from the website.

import requests
from bs4 import BeautifulSoup
from urllib.parse import urljoin
import time

BASE_URL = "http://books.toscrape.com/"

def get_soup(url):
    """Fetch a page and return parsed HTML, or None if the request fails."""
    try:
        res = requests.get(url, timeout=10)
    except requests.RequestException:
        return None
    if res.status_code != 200:
        return None
    # Parse the raw bytes so BeautifulSoup can detect the page's UTF-8
    # encoding itself; res.text can mis-decode the £ symbol on this site.
    return BeautifulSoup(res.content, 'html.parser')

def get_rating_as_number(class_list):
    ratings = {
        'One': 1, 'Two': 2, 'Three': 3,
        'Four': 4, 'Five': 5
    }
    for cls in class_list:
        if cls in ratings:
            return ratings[cls]
    return 0
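
# For reference: a product card's rating markup looks like
#   <p class="star-rating Three">
# so book.p['class'] is ['star-rating', 'Three'] and
# get_rating_as_number(['star-rating', 'Three']) returns 3.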

def scrape_book_details(book_url):
    soup = get_soup(book_url)
    if not soup:
        return {}

    # The product blurb lives in the page's meta description tag.
    description = soup.find('meta', attrs={"name": "description"})
    description = description['content'].strip() if description else "No description."

    # The "Product Information" table maps labels (UPC, Tax, ...) to values.
    table = soup.find('table', class_='table table-striped')
    info = {}
    if table:
        info = {row.th.get_text(): row.td.get_text() for row in table.find_all('tr')}

    # The category is the third breadcrumb entry: Home > Books > <Category> > <Title>.
    category_link = soup.select_one('ul.breadcrumb li:nth-of-type(3) a')
    category = category_link.get_text() if category_link else None

    return {
        'description': description,
        'upc': info.get('UPC'),
        'product_type': info.get('Product Type'),
        'price_excl_tax': info.get('Price (excl. tax)'),
        'price_incl_tax': info.get('Price (incl. tax)'),
        'tax': info.get('Tax'),
        'availability': info.get('Availability'),
        'category': category
    }

def scrape_all_books():
    books = []
    next_page = "catalogue/page-1.html"

    while next_page:
        page_url = BASE_URL + next_page
        print(f"Scraping {page_url}")
        soup = get_soup(page_url)
        if not soup:
            break

        for book in soup.select('article.product_pod'):
            title = book.h3.a['title']
            # Price text looks like "£51.77"; drop the currency symbol.
            price = book.select_one('.price_color').get_text().lstrip('£')
            # The rating is encoded as a CSS class, e.g. "star-rating Three".
            rating = get_rating_as_number(book.p['class'])
            # Book hrefs are relative to the current page (often prefixed
            # with ../../../), so resolve them against the page URL rather
            # than string-replacing.
            book_url = urljoin(page_url, book.h3.a['href'])

            details = scrape_book_details(book_url)

            books.append({
                'title': title,
                'price': price,
                'rating': rating,
                'url': book_url,
                **details
            })

        # The "next" button's href (e.g. "page-2.html") is relative to /catalogue/.
        next_btn = soup.select_one('li.next a')
        next_page = 'catalogue/' + next_btn['href'] if next_btn else None
        time.sleep(1)  # Be polite: pause between page requests.

    return books

📃 main.py

This file runs the scraping logic and saves the results to a CSV file.

import csv
from scraper import scrape_all_books

def save_books_csv(books, filename='books.csv'):
    if not books:
        print("No books to save.")
        return
    # Collect every key that appears across all rows, since a failed
    # detail fetch can leave some rows with fewer fields.
    keys = list(dict.fromkeys(k for book in books for k in book))
    with open(filename, mode='w', newline='', encoding='utf-8') as f:
        writer = csv.DictWriter(f, fieldnames=keys, restval='')
        writer.writeheader()
        writer.writerows(books)

def main():
    print("Scraping all books from books.toscrape.com...")
    books = scrape_all_books()
    print(f"Total books scraped: {len(books)}")
    save_books_csv(books)
    print("Data saved to books.csv")

if __name__ == "__main__":
    main()
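
Run the project from its root directory:

python main.py

The crawl fetches 50 catalogue pages plus one detail page for each of the 1,000 books; with the one-second politeness delay between pages, expect it to take a few minutes.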

📊 Output Sample

The resulting books.csv will contain rows like:

| title | price | rating | category | availability | description |
|---|---|---|---|---|---|
| A Light in the Attic | 51.77 | 3 | Poetry | In stock | A book of poems and whimsical illustrations. |

🤖 Tips and Next Steps

Here are some fun ways you can extend this project:

  • Filter books by price or rating
  • Export to JSON or Excel
  • Analyze trends with pandas (see the sketch after this list)
  • Build a simple web interface with Flask
  • Store in a SQLite or MongoDB database
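
As a taste of the analysis step, here is a minimal pandas sketch. It assumes pandas is installed (pip install pandas) and that books.csv was produced by main.py above; it ranks categories by average price:

import pandas as pd

# Load the CSV produced by main.py.
df = pd.read_csv('books.csv')

# The scraper stored price as a plain number string, so cast it to float.
df['price'] = df['price'].astype(float)

# Average price and book count per category, most expensive first.
summary = (
    df.groupby('category')['price']
      .agg(['mean', 'count'])
      .sort_values('mean', ascending=False)
)
print(summary.head(10))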

🎓 Summary

In this tutorial, you built a complete web scraper using BeautifulSoup that extracts valuable book data from a real website. You structured your code cleanly, handled pagination, and saved the results into a CSV for analysis. This is an excellent foundation for more advanced data scraping and analysis projects.

Happy scraping! 🚀
