🥣 Getting Started with BeautifulSoup4, Part 2: Extracting Real Data from a Website

Welcome back! 👋 In the previous post, we learned the basics of BeautifulSoup4 and wrote a small program to extract links from a simple web page.

Now, let’s take it up a notch.

In this post, we’ll:

Download a real website’s HTML
Extract meaningful data (quotes and their authors)
Print it in a clean, readable format

This will give you practical web scraping skills you can build on. Let’s go! 🚀

🧰 What We’ll Be Scraping

For this demo, we’ll use a real, beginner-friendly site: https://quotes.toscrape.com

This website was built specifically for practicing web scraping, so you can experiment on it without worrying about permission.

We’ll extract:

Quote text
Author name

🧪 Full Code Example

Here’s the complete code, followed by a breakdown of what each part does:

import requests
from bs4 import BeautifulSoup

# Step 1: Download the webpage
url = "https://quotes.toscrape.com"
response = requests.get(url)

# Step 2: Parse the HTML
soup = BeautifulSoup(response.text, 'html.parser')

# Step 3: Find all quote containers
quote_blocks = soup.find_all('div', class_='quote')

# Step 4: Loop through each quote and extract text and author
for quote in quote_blocks:
    text = quote.find('span', class_='text').get_text()
    author = quote.find('small', class_='author').get_text()
    # The site's quote text already includes its own curly quotation marks
    print(f'{text} — {author}')

🧱 Code Breakdown

🧩 1. Import the Libraries

import requests
from bs4 import BeautifulSoup

We need:

requests to fetch the website’s HTML
BeautifulSoup to parse and search the HTML content

🌍 2. Fetch the Web Page

url = "https://quotes.toscrape.com"
response = requests.get(url)

We set the target URL
requests.get(url) downloads the page
response.text contains the raw HTML
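A request can fail or hang, and an error page parses just as happily as a real one. The sketch below shows one way to guard the download step. The helper name fetch_html and its session parameter are made up for this example; timeout and raise_for_status() are standard parts of requests.

```python
import requests


def fetch_html(url, session=None, timeout=10):
    """Return the page's HTML as a string, or raise if the request fails."""
    # `session` is optional (hypothetical parameter for this sketch):
    # pass a requests.Session to reuse connections across calls.
    http = session if session is not None else requests
    # Without a timeout, a stalled server could block the script forever.
    response = http.get(url, timeout=timeout)
    # Raises requests.HTTPError on 4xx/5xx responses instead of
    # silently handing an error page to the parser.
    response.raise_for_status()
    return response.text
```

Calling fetch_html("https://quotes.toscrape.com") would then stand in for the requests.get(url) line, returning the same string that response.text holds.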

🧹 3. Parse the HTML

soup = BeautifulSoup(response.text, 'html.parser')

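This line gives us a BeautifulSoup object (soup) to work with. Think of it as a structured, searchable version of the raw HTML. The second argument, 'html.parser', selects Python's built-in HTML parser, so nothing extra needs to be installed.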
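To get a feel for what "structured" means here, this small sketch parses a made-up HTML string (not the real page) and pokes at the result. It runs without a network connection.

```python
from bs4 import BeautifulSoup

# Made-up HTML for illustration; not copied from the real site.
sample_html = """
<html>
  <head><title>Sample Page</title></head>
  <body>
    <div class="quote"><span class="text">Hello, soup!</span></div>
  </body>
</html>
"""

soup = BeautifulSoup(sample_html, "html.parser")

# Tags can be reached by name, like attributes on the soup object.
page_title = soup.title.string

# .find() returns the first matching tag.
first_span = soup.find("span")

print(page_title)             # the text inside <title>
print(first_span.get_text())  # the text inside the <span>
print(first_span["class"])    # the class attribute, as a list
```

Note that the class attribute comes back as a list, because an HTML element can carry several classes at once.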

🔍 4. Find Quote Containers

quote_blocks = soup.find_all('div', class_='quote')

Each quote on the page is inside a <div> with the class quote. This line finds all such blocks.
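As a small illustration of how class matching behaves, the sketch below runs find_all on a made-up snippet (the HTML and the extra class name featured are invented for this example). It also shows select, BeautifulSoup's CSS-selector method, as another way to write the same search.

```python
from bs4 import BeautifulSoup

# Made-up HTML for illustration; not copied from the real site.
sample_html = """
<div class="quote">First quote</div>
<div class="quote featured">Second quote</div>
<div class="sidebar">Not a quote</div>
<p class="quote">Wrong tag, so it is skipped</p>
"""

soup = BeautifulSoup(sample_html, "html.parser")

# Matches <div> tags whose class list includes "quote",
# even when other classes are present on the same tag.
quote_blocks = soup.find_all("div", class_="quote")

# The same search written as a CSS selector.
selected_blocks = soup.select("div.quote")

for block in quote_blocks:
    print(block.get_text())
```

Both calls find the same two <div> elements here. Which style to use is mostly a matter of taste; find_all reads more like Python, while select is handy if you already know CSS selectors.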

🔧 5. Extract Quote and Author

for quote in quote_blocks:
    text = quote.find('span', class_='text').get_text()
    author = quote.find('small', class_='author').get_text()
    # The site's quote text already includes its own curly quotation marks
    print(f'{text} — {author}')

Let’s break this down:

Loop through each quote block
Use .find() to get the quote text and author
.get_text() extracts the actual text content
Finally, print the quote and author nicely formatted
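One thing to watch for: .find() returns None when nothing matches, and calling .get_text() on None raises an AttributeError. The sketch below is one possible way to handle that, collecting results into a list of dictionaries instead of printing them. The function name extract_quotes and the sample HTML are made up for this example.

```python
from bs4 import BeautifulSoup


def extract_quotes(html):
    """Return a list of {"text": ..., "author": ...} dicts found in html."""
    soup = BeautifulSoup(html, "html.parser")
    results = []
    for block in soup.find_all("div", class_="quote"):
        text_tag = block.find("span", class_="text")
        author_tag = block.find("small", class_="author")
        # Skip blocks that are missing either piece rather than crashing.
        if text_tag is None or author_tag is None:
            continue
        results.append({
            "text": text_tag.get_text(strip=True),
            "author": author_tag.get_text(strip=True),
        })
    return results


# Made-up HTML for illustration; the second block has no author.
sample_html = """
<div class="quote">
  <span class="text">Stay curious.</span>
  <small class="author">Example Author</small>
</div>
<div class="quote">
  <span class="text">This block is incomplete.</span>
</div>
"""

quotes = extract_quotes(sample_html)
print(quotes)
```

Returning data instead of printing it makes the results easier to reuse later, for example to sort them or save them to a file.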

📦 Example Output

When you run the script, you’ll see something like:

“The world as we have created it is a process of our thinking. It cannot be changed without changing our thinking.” — Albert Einstein
“It is our choices, Harry, that show what we truly are, far more than our abilities.” — J.K. Rowling
...

Beautiful, right? 😄

💡 Bonus Tip: Viewing the HTML Structure

To understand what to extract, always inspect the page using your browser’s Developer Tools (right-click → Inspect). Look at the HTML tags and class names.
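You can also look at the structure from inside Python. As a rough sketch, .prettify() returns a tag's HTML as an indented string, which makes nesting and class names easier to spot. The HTML below is a made-up stand-in for one quote block.

```python
from bs4 import BeautifulSoup

# Made-up, single-line HTML standing in for one quote block.
sample_html = (
    '<div class="quote"><span class="text">Hello</span>'
    '<small class="author">Someone</small></div>'
)

soup = BeautifulSoup(sample_html, "html.parser")
first_block = soup.find("div", class_="quote")

# prettify() returns the tag's HTML with one element per line, indented.
structure = first_block.prettify()
print(structure)
```

On a real page, printing just the first matching block this way is usually more readable than printing the whole document.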

⚠️ Friendly Reminder

Only scrape sites you have permission to scrape.
Be respectful: don’t overload servers with too many requests.
Use time.sleep() between requests if scraping multiple pages.
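Here is a minimal sketch of what a polite multi-page loop might look like. Several things are assumptions for illustration: the /page/N/ URL pattern, the function name scrape_pages, the fetch parameter (any function that takes a URL and returns HTML), and the one-second default delay.

```python
import time

from bs4 import BeautifulSoup

# Assumed pagination pattern for the practice site.
BASE_URL = "https://quotes.toscrape.com/page/{}/"


def scrape_pages(page_numbers, fetch, delay_seconds=1.0):
    """Count the quote blocks on each page, pausing between requests."""
    counts = {}
    for index, page in enumerate(page_numbers):
        # Pause before every request except the first one.
        if index > 0:
            time.sleep(delay_seconds)
        html = fetch(BASE_URL.format(page))
        soup = BeautifulSoup(html, "html.parser")
        counts[page] = len(soup.find_all("div", class_="quote"))
    return counts
```

With requests imported, a call could look like scrape_pages([1, 2, 3], fetch=lambda url: requests.get(url, timeout=10).text). Passing fetch in as a parameter also lets you try the loop offline with a stand-in function.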

✅ What’s Next?

You’ve now learned to:

Scrape a real website
Extract specific data
Print it in a readable format
