eBay is a well-known online platform that offers trading opportunities in a wide range of products among its registered users. In this guide, we’ll explain how to scrape data from eBay listing using Python. As such, we will be interested in details that are available from the listing itself as well as from going to each of the products in turn for more fine details.
To get started, make sure you have the following Python libraries installed:
Install these libraries using:
pip install requests lxml pandas
When searching for products on eBay, each page URL can be modified to navigate through paginated results. For example:
The _pgn parameter is used to navigate through multiple pages of listings, enabling the retrieval of extensive data. Let's begin the scraping process.
To begin, we’ll set up headers to mimic a real browser request, which helps avoid detection and potential blocking by eBay’s anti-bot measures. Then we’ll send a request to the listing page to gather the links for each product.
import requests
from lxml.html import fromstring
# Define headers to simulate a real browser
headers = {
'accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.7',
'accept-language': 'en-IN,en;q=0.9',
'cache-control': 'no-cache',
'dnt': '1',
'pragma': 'no-cache',
'priority': 'u=0, i',
'sec-ch-ua': '"Google Chrome";v="129", "Not=A?Brand";v="8", "Chromium";v="129"',
'sec-ch-ua-mobile': '?0',
'sec-ch-ua-platform': '"Linux"',
'sec-fetch-dest': 'document',
'sec-fetch-mode': 'navigate',
'sec-fetch-site': 'none',
'sec-fetch-user': '?1',
'upgrade-insecure-requests': '1',
'user-agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/129.0.0.0 Safari/537.36',
}
# Request parameters for the search query
params = {
'_nkw': 'laptop',
}
# Send a request to the eBay listing page
listing_page_response = requests.get('https link', params=params, headers=headers)
listing_parser = fromstring(listing_page_response.text)
On the listing page, we’ll extract the URLs for individual products. This allows us to visit each product page to gather specific details, such as the product title, price, and more.
# Parse the listing page to extract product links
links = listing_parser.xpath('//div[@class="s-item__info clearfix"]/a[@_sp="p2351460.m1686.l7400"]/@href')
# Output a sample of the links found
print("Product Links:", links[:5]) # Display the first five product links
With the product URLs in hand, we’ll visit each product page and extract the following details:
Next, we’ll loop through each link and use XPath expressions to locate the required information on the product page.
product_data = []
for url in links:
# Send a request to the product page
product_page_response = requests.get(url, headers=headers)
product_parser = fromstring(product_page_response.text)
# Extract data using XPath
try:
product_title = product_parser.xpath('//h1[@class="x-item-title__mainTitle"]/span/text()')[0]
price = product_parser.xpath('//div[@data-testid="x-price-primary"]/span/text()')[0]
shipping_cost = product_parser.xpath('//div[@class="ux-labels-values col-12 ux-labels-values--shipping"]//div[@class="ux-labels-values__values-content"]/div/span/text()')[0]
product_condition = product_parser.xpath('//div[@class="x-item-condition-text"]/div/span/span[2]/text()')[0]
available_quantity = product_parser.xpath('//div[@class="x-quantity__availability"]/span/text()')[0]
sold_quantity = product_parser.xpath('//div[@class="x-quantity__availability"]/span/text()')[1]
payment_options = ', '.join(product_parser.xpath('//div[@class="ux-labels-values col-12 ux-labels-values__column-last-row ux-labels-values--payments"]/div[2]/div/div//span/@aria-label'))
return_policy = product_parser.xpath('//div[@class="ux-layout-section ux-layout-section--returns"]//div[@class="ux-labels-values__values-content"]/div/span/text()')[0]
# Store data in a dictionary
product_info = {
'Title': product_title,
'Price': price,
'Shipping Cost': shipping_cost,
'Condition': product_condition,
'Available Quantity': available_quantity,
'Sold Quantity': sold_quantity,
'Payment Options': payment_options,
'Return Policy': return_policy,
}
product_data.append(product_info)
except IndexError as e:
print(f"An error occurred: {e}")
After collecting the data, we can save it into a CSV file using Pandas.
import pandas as pd
# Convert data to DataFrame
df = pd.DataFrame(product_data)
# Save to CSV
df.to_csv('ebay_product_data.csv', index=False)
print("Data saved to ebay_product_data.csv")
eBay employs rate-limiting to prevent excessive requests. Here are a few methods to avoid detection:
By following these best practices, you can minimize the risk of getting blocked and continue scraping data efficiently.
Here’s the full code for scraping eBay data and saving it to a CSV file:
import requests
import random
from lxml.html import fromstring
import pandas as pd
useragents = ['Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/129.0.0.0 Safari/537.36',
'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/128.0.0.0 Safari/537.36',
'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/127.0.0.0 Safari/537.36']
# Define headers for request
headers = {
'accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.7',
'accept-language': 'en-IN,en;q=0.9',
'cache-control': 'no-cache',
'dnt': '1',
'pragma': 'no-cache',
'priority': 'u=0, i',
'sec-ch-ua': '"Google Chrome";v="129", "Not=A?Brand";v="8", "Chromium";v="129"',
'sec-ch-ua-mobile': '?0',
'sec-ch-ua-platform': '"Linux"',
'sec-fetch-dest': 'document',
'sec-fetch-mode': 'navigate',
'sec-fetch-site': 'none',
'sec-fetch-user': '?1',
'upgrade-insecure-requests': '1',
'user-agent': random.choice(useragents),
}
# Search query parameters
params = {'_nkw': 'laptop'}
proxies = {
'http': 'IP:PORT',
'https': 'IP:PORT'
}
# Fetch the listing page
listing_page_response = requests.get('https://www.ebay.com/sch/i.html', params=params, headers=headers, proxies=proxies)
listing_parser = fromstring(listing_page_response.text)
links = listing_parser.xpath('//div[@class="s-item__info clearfix"]/a[@_sp="p2351460.m1686.l7400"]/@href')
# Extract product data
product_data = []
for url in links:
product_page_response = requests.get(url, headers=headers, proxies=proxies)
product_parser = fromstring(product_page_response.text)
try:
product_info = {
'Title': product_parser.xpath('//h1[@class="x-item-title__mainTitle"]/span/text()')[0],
'Price': product_parser.xpath('//div[@data-testid="x-price-primary"]/span/text()')[0],
'Shipping Cost': product_parser.xpath('//div[@class="ux-labels-values col-12 ux-labels-values--shipping"]//div[@class="ux-labels-values__values-content"]/div/span/text()')[0],
'Condition': product_parser.xpath('//div[@class="x-item-condition-text"]/div/span/span[2]/text()')[0],
'Available Quantity': product_parser.xpath('//div[@class="x-quantity__availability"]/span/text()')[0],
'Sold Quantity': product_parser.xpath('//div[@class="x-quantity__availability"]/span/text()')[1],
'Payment Options': ', '.join(product_parser.xpath('//div[@class="ux-labels-values col-12 ux-labels-values__column-last-row ux-labels-values--payments"]/div[2]/div/div//span/@aria-label')),
'Return Policy': product_parser.xpath('//div[@class="ux-layout-section ux-layout-section--returns"]//div[@class="ux-labels-values__values-content"]/div/span/text()')[0]
}
product_data.append(product_info)
except IndexError:
continue
# Save to CSV
df = pd.DataFrame(product_data)
df.to_csv('ebay_product_data.csv', index=False)
print("Data saved to ebay_product_data.csv")
Scraping eBay with Python allows for efficient data collection on products, pricing, and trends. In this guide, we covered scraping listings, handling pagination, setting headers, and using proxies to avoid detection. Remember to respect eBay’s terms of service by using responsible request intervals and proxy rotation. With these tools, you can now easily gather and analyze eBay data for market insights. Happy scraping!
Comments: 0