How to Scrape eBay Data Using Python

eBay is a well-known online marketplace where registered users buy and sell a wide range of products. In this guide, we’ll explain how to scrape data from eBay listings using Python. We’ll collect the details available on the search results page, then visit each product page in turn for finer details.

Requirements

To get started, make sure you have the following Python libraries installed:

  • Requests: For making HTTP requests.
  • lxml: For parsing HTML content.
  • Pandas: For saving data to a CSV file.

Install these libraries using:


pip install requests lxml pandas

Understanding the eBay URL Structure for Pagination

When searching for products on eBay, each page URL can be modified to navigate through paginated results. For example:

  • Page 1: https://www.ebay.com/sch/i.html?_nkw=laptop
  • Page 2: https://www.ebay.com/sch/i.html?_nkw=laptop&_pgn=2

The _pgn parameter is used to navigate through multiple pages of listings, enabling the retrieval of extensive data. Let's begin the scraping process.
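
As a rough sketch of how these URLs can be generated in code, the snippet below builds the search URL for any page number. The helper name build_search_url and the three-page range are illustrative assumptions; only the _nkw and _pgn parameters come from the URLs shown above.

```python
from urllib.parse import urlencode

BASE_URL = "https://www.ebay.com/sch/i.html"


def build_search_url(keyword, page=1):
    """Return the search URL for a keyword and a 1-based page number."""
    query = {"_nkw": keyword}
    # Page 1 needs no _pgn parameter, matching the example URLs above
    if page > 1:
        query["_pgn"] = page
    return f"{BASE_URL}?{urlencode(query)}"


# Build the URLs for the first three result pages (the range is arbitrary)
page_urls = [build_search_url("laptop", page) for page in range(1, 4)]
for page_url in page_urls:
    print(page_url)
```

Each URL in page_urls can then be requested and parsed in the same way as the single listing page in the steps that follow.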

Step 1: Sending requests to eBay

To begin, we’ll set up headers to mimic a real browser request, which helps avoid detection and potential blocking by eBay’s anti-bot measures. Then we’ll send a request to the listing page to gather the links for each product.


import requests
from lxml.html import fromstring

# Define headers to simulate a real browser
headers = {
    'accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.7',
    'accept-language': 'en-IN,en;q=0.9',
    'cache-control': 'no-cache',
    'dnt': '1',
    'pragma': 'no-cache',
    'priority': 'u=0, i',
    'sec-ch-ua': '"Google Chrome";v="129", "Not=A?Brand";v="8", "Chromium";v="129"',
    'sec-ch-ua-mobile': '?0',
    'sec-ch-ua-platform': '"Linux"',
    'sec-fetch-dest': 'document',
    'sec-fetch-mode': 'navigate',
    'sec-fetch-site': 'none',
    'sec-fetch-user': '?1',
    'upgrade-insecure-requests': '1',
    'user-agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/129.0.0.0 Safari/537.36',
}

# Request parameters for the search query
params = {
    '_nkw': 'laptop',
}

# Send a request to the eBay listing page
listing_page_response = requests.get('https://www.ebay.com/sch/i.html', params=params, headers=headers)
listing_parser = fromstring(listing_page_response.text)
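
The request above assumes eBay always returns a usable page. A minimal sketch of a more defensive fetch is shown here. The helper name fetch_html, the 10-second timeout, and the choice to return None on a non-200 status are our own assumptions. The HTTP function is passed in as an argument, so requests.get or the get method of a requests.Session can be used interchangeably.

```python
def fetch_html(get, url, *, headers=None, params=None, timeout=10):
    """Fetch a page and return its HTML text, or None on a non-200 response.

    `get` is any callable with the same keyword arguments as requests.get.
    """
    response = get(url, headers=headers, params=params, timeout=timeout)
    if response.status_code != 200:
        # A non-200 status often means the request was blocked or throttled
        print(f"Skipping {url}: HTTP {response.status_code}")
        return None
    return response.text


# Example usage with the headers and params defined above:
# html = fetch_html(requests.get, 'https://www.ebay.com/sch/i.html',
#                   headers=headers, params=params)
# if html is not None:
#     listing_parser = fromstring(html)
```

Setting a timeout keeps a stalled connection from hanging the whole script, and checking the status code avoids parsing an error page as if it were a listing.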

Step 2: Parsing the listing page

On the listing page, we’ll extract the URLs for individual products. This allows us to visit each product page to gather specific details, such as the product title, price, and more.


# Parse the listing page to extract product links
links = listing_parser.xpath('//div[@class="s-item__info clearfix"]/a[@_sp="p2351460.m1686.l7400"]/@href')

# Output a sample of the links found
print("Product Links:", links[:5])  # Display the first five product links
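
Links taken from a search page may carry tracking query strings, and the same product can appear more than once. The sketch below shows one possible way to normalize and deduplicate them before visiting each page. It assumes the product is fully identified by the URL path, which you should verify against the links you actually receive; the helper name clean_links and the item IDs in the example are hypothetical.

```python
from urllib.parse import urlsplit, urlunsplit


def clean_links(raw_links):
    """Drop query strings and fragments, keeping the first copy of each URL."""
    seen = set()
    cleaned = []
    for link in raw_links:
        parts = urlsplit(link)
        # Rebuild the URL from scheme, host, and path only
        bare = urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))
        if bare not in seen:
            seen.add(bare)
            cleaned.append(bare)
    return cleaned


# Hypothetical example links (the item IDs are made up)
sample = [
    "https://www.ebay.com/itm/123?hash=abc",
    "https://www.ebay.com/itm/123?hash=def",
    "https://www.ebay.com/itm/456",
]
print(clean_links(sample))
```

Running clean_links(links) on the list from the step above would reduce the number of product pages requested when duplicates are present.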

Step 3: Scraping product data

With the product URLs in hand, we’ll visit each product page and extract the following details:

  • Product title;
  • Price;
  • Shipping cost;
  • Product condition;
  • Available quantity;
  • Sold quantity;
  • Payment options;
  • Return policy.

Next, we’ll loop through each link and use XPath expressions to locate the required information on the product page.


product_data = []

for url in links:
    # Send a request to the product page
    product_page_response = requests.get(url, headers=headers)
    product_parser = fromstring(product_page_response.text)
    
    # Extract data using XPath
    try:
        product_title = product_parser.xpath('//h1[@class="x-item-title__mainTitle"]/span/text()')[0]
        price = product_parser.xpath('//div[@data-testid="x-price-primary"]/span/text()')[0]
        shipping_cost = product_parser.xpath('//div[@class="ux-labels-values col-12 ux-labels-values--shipping"]//div[@class="ux-labels-values__values-content"]/div/span/text()')[0]
        product_condition = product_parser.xpath('//div[@class="x-item-condition-text"]/div/span/span[2]/text()')[0]
        available_quantity = product_parser.xpath('//div[@class="x-quantity__availability"]/span/text()')[0]
        sold_quantity = product_parser.xpath('//div[@class="x-quantity__availability"]/span/text()')[1]
        payment_options = ', '.join(product_parser.xpath('//div[@class="ux-labels-values col-12 ux-labels-values__column-last-row ux-labels-values--payments"]/div[2]/div/div//span/@aria-label'))
        return_policy = product_parser.xpath('//div[@class="ux-layout-section ux-layout-section--returns"]//div[@class="ux-labels-values__values-content"]/div/span/text()')[0]
        
        # Store data in a dictionary
        product_info = {
            'Title': product_title,
            'Price': price,
            'Shipping Cost': shipping_cost,
            'Condition': product_condition,
            'Available Quantity': available_quantity,
            'Sold Quantity': sold_quantity,
            'Payment Options': payment_options,
            'Return Policy': return_policy,
        }
        product_data.append(product_info)
    
    except IndexError as e:
        print(f"An error occurred: {e}")
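
One drawback of the loop above is that a single missing field raises an IndexError, and the whole product is skipped. A possible alternative, sketched below, is a small helper that returns a default value instead of failing. The name first_text and the default of None are hypothetical choices, not part of lxml.

```python
def first_text(parser, xpath_expr, default=None):
    """Return the first XPath match as stripped text, or `default` if none."""
    results = parser.xpath(xpath_expr)
    if not results:
        return default
    return str(results[0]).strip()


# Example usage inside the loop, with product_parser from the step above:
# product_info = {
#     'Title': first_text(product_parser, '//h1[@class="x-item-title__mainTitle"]/span/text()'),
#     'Price': first_text(product_parser, '//div[@data-testid="x-price-primary"]/span/text()'),
# }
```

With this approach, a listing that lacks, say, a sold quantity still produces a row, with an empty cell for the missing field.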

Step 4: Saving data to a CSV file

After collecting the data, we can save it into a CSV file using Pandas.


import pandas as pd

# Convert data to DataFrame
df = pd.DataFrame(product_data)

# Save to CSV
df.to_csv('ebay_product_data.csv', index=False)
print("Data saved to ebay_product_data.csv")

Handling rate-limiting and bypassing detection on eBay

eBay employs rate-limiting to prevent excessive requests. Here are a few methods to avoid detection:

  • Use Proxies: Rotate between different IP addresses.
  • Adjust Request Intervals: Implement delays between requests.
  • Randomize User Agents: Vary the user-agent string to avoid detection.

By following these best practices, you can minimize the risk of getting blocked and continue scraping data efficiently.
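
The last two points can be sketched in a few lines. The helper names polite_headers and polite_pause and the 1 to 3 second delay range are assumptions chosen for illustration; suitable values depend on how eBay responds to your traffic.

```python
import random
import time

# User-agent strings to rotate through (same values as in the complete code)
USER_AGENTS = [
    'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/129.0.0.0 Safari/537.36',
    'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/128.0.0.0 Safari/537.36',
    'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/127.0.0.0 Safari/537.36',
]


def polite_headers(base_headers):
    """Return a copy of the headers with a randomly chosen user agent."""
    headers = dict(base_headers)
    headers['user-agent'] = random.choice(USER_AGENTS)
    return headers


def polite_pause(min_seconds=1.0, max_seconds=3.0):
    """Sleep for a random interval and return the delay that was used."""
    delay = random.uniform(min_seconds, max_seconds)
    time.sleep(delay)
    return delay


# Example usage inside the product loop:
# for url in links:
#     response = requests.get(url, headers=polite_headers(headers))
#     polite_pause()
```

Randomizing the delay, rather than sleeping for a fixed interval, makes the request pattern look less mechanical.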

Complete code

Here’s the full code for scraping eBay data and saving it to a CSV file:


import requests
import random
from lxml.html import fromstring
import pandas as pd

useragents = ['Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/129.0.0.0 Safari/537.36',
             'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/128.0.0.0 Safari/537.36',
             'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/127.0.0.0 Safari/537.36']

# Define headers for request
headers = {
    'accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.7',
    'accept-language': 'en-IN,en;q=0.9',
    'cache-control': 'no-cache',
    'dnt': '1',
    'pragma': 'no-cache',
    'priority': 'u=0, i',
    'sec-ch-ua': '"Google Chrome";v="129", "Not=A?Brand";v="8", "Chromium";v="129"',
    'sec-ch-ua-mobile': '?0',
    'sec-ch-ua-platform': '"Linux"',
    'sec-fetch-dest': 'document',
    'sec-fetch-mode': 'navigate',
    'sec-fetch-site': 'none',
    'sec-fetch-user': '?1',
    'upgrade-insecure-requests': '1',
    'user-agent': random.choice(useragents),
}

# Search query parameters
params = {'_nkw': 'laptop'}
# Placeholder proxy settings: replace IP:PORT with your proxy's address
proxies = {
    'http': 'http://IP:PORT',
    'https': 'http://IP:PORT'
}

# Fetch the listing page
listing_page_response = requests.get('https://www.ebay.com/sch/i.html', params=params, headers=headers, proxies=proxies)
listing_parser = fromstring(listing_page_response.text)
links = listing_parser.xpath('//div[@class="s-item__info clearfix"]/a[@_sp="p2351460.m1686.l7400"]/@href')

# Extract product data
product_data = []
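for url in links:
    # Pick a fresh user agent for each product request
    headers['user-agent'] = random.choice(useragents)
    product_page_response = requests.get(url, headers=headers, proxies=proxies)
    product_parser = fromstring(product_page_response.text)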
    try:
        product_info = {
            'Title': product_parser.xpath('//h1[@class="x-item-title__mainTitle"]/span/text()')[0],
            'Price': product_parser.xpath('//div[@data-testid="x-price-primary"]/span/text()')[0],
            'Shipping Cost': product_parser.xpath('//div[@class="ux-labels-values col-12 ux-labels-values--shipping"]//div[@class="ux-labels-values__values-content"]/div/span/text()')[0],
            'Condition': product_parser.xpath('//div[@class="x-item-condition-text"]/div/span/span[2]/text()')[0],
            'Available Quantity': product_parser.xpath('//div[@class="x-quantity__availability"]/span/text()')[0],
            'Sold Quantity': product_parser.xpath('//div[@class="x-quantity__availability"]/span/text()')[1],
            'Payment Options': ', '.join(product_parser.xpath('//div[@class="ux-labels-values col-12 ux-labels-values__column-last-row ux-labels-values--payments"]/div[2]/div/div//span/@aria-label')),
            'Return Policy': product_parser.xpath('//div[@class="ux-layout-section ux-layout-section--returns"]//div[@class="ux-labels-values__values-content"]/div/span/text()')[0]
        }
        product_data.append(product_info)
    except IndexError:
        continue

# Save to CSV
df = pd.DataFrame(product_data)
df.to_csv('ebay_product_data.csv', index=False)
print("Data saved to ebay_product_data.csv")

Scraping eBay with Python allows for efficient data collection on products, pricing, and trends. In this guide, we covered scraping listings, handling pagination, setting headers, and using proxies to avoid detection. Remember to respect eBay’s terms of service and keep your request intervals responsible. With these tools, you can now gather and analyze eBay data for market insights. Happy scraping!
