Using Python and Playwright to scrape data from Google Trends enables a detailed examination of keyword popularity and the monitoring of trend shifts over time. This approach delivers crucial insights for marketing analytics.
Before diving into the code, make sure you have Python 3.7 or later and the Playwright library installed.
You can install Playwright using pip:
pip install playwright
Playwright's asynchronous API relies on asyncio, which ships with the Python standard library, so there is nothing extra to install. The asyncio.run() call used in this tutorial requires Python 3.7 or later.
We'll use Playwright, a powerful browser automation tool, to navigate the Google Trends website and download CSV files containing trend data. This tutorial will guide you through the entire process.
Next, download the browser binaries that Playwright drives:
playwright install
If you don't want to install every browser, use this command to install only Chromium:
playwright install chromium
When scraping platforms like Google, which actively counter bot activity, using proxies is essential. Proxies enable IP rotation, helping to reduce the risk of getting blocked. In our script, we utilize private proxies to route our requests.
proxy = {
    "server": "IP:PORT",
    "username": "your_username",
    "password": "your_password"
}
Replace the placeholders IP, PORT, your_username, and your_password with the actual values for your proxy server.
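If you would rather not keep credentials in the script itself, one option is to read them from environment variables. The sketch below is only an illustration: the function name build_proxy_config and the variable names PROXY_SERVER, PROXY_USERNAME, and PROXY_PASSWORD are assumptions made for this example, not part of Playwright. It returns a dictionary in the same shape as the proxy setting shown above.

```python
import os
from typing import Dict, Mapping, Optional


def build_proxy_config(env: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Build a Playwright-style proxy dict from environment variables.

    PROXY_SERVER, PROXY_USERNAME and PROXY_PASSWORD are hypothetical
    variable names chosen for this sketch.
    """
    env = os.environ if env is None else env

    server = env.get("PROXY_SERVER")
    if not server:
        raise ValueError("PROXY_SERVER is not set (expected something like IP:PORT)")

    config = {"server": server}

    # Credentials are optional: add them only when both are present.
    username = env.get("PROXY_USERNAME")
    password = env.get("PROXY_PASSWORD")
    if username and password:
        config["username"] = username
        config["password"] = password

    return config
```

The resulting dictionary can be passed as the proxy argument when launching the browser, in place of the hard-coded values.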
In this example, the script first opens google.com and only then heads to the Google Trends page. Starting from the Google homepage mimics normal user behavior and lowers the chance of being flagged and blocked by Google.
The following code launches the browser through the proxy and performs this preliminary step:
import asyncio
from playwright.async_api import Playwright, async_playwright

async def run(playwright: Playwright) -> None:
    # Launching the browser with proxy settings
    browser = await playwright.chromium.launch(headless=False, proxy={
        "server": "IP:PORT",
        "username": "your_username",
        "password": "your_password"
    })
    # Creating a new browser context
    context = await browser.new_context()
    # Opening a new page
    page = await context.new_page()
    # Visiting Google to mimic normal browsing
    await page.goto("https://google.com")
Next, navigate to the Google Trends page that holds the required data. Google Trends lets you download the data directly in CSV format, which simplifies extraction. The script waits for the "Download" button to become available and clicks it, which starts the download of the CSV file with the trend data, with no manual intervention needed.
    # Navigating to Google Trends
    await page.goto("https://trends.google.com/trends/explore?q=%2Fg%2F11bc6c__s2&date=now%201-d&geo=US&hl=en-US")
    # Waiting for the download button and clicking it
    async with page.expect_download() as download_info:
        await page.get_by_role("button", name="file_download").first.click()
    # Handling the download
    download = await download_info.value
    print(download.suggested_filename)
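The explore URL in this step packs the search term, time window, region, and interface language into the q, date, geo, and hl query parameters. If you want to change those values without hand-encoding the string, a small helper along these lines may be useful. The function name build_trends_url and its defaults are assumptions for this sketch; the defaults simply mirror the values in the tutorial's URL.

```python
from urllib.parse import quote, urlencode

TRENDS_EXPLORE_URL = "https://trends.google.com/trends/explore"


def build_trends_url(query: str, date: str = "now 1-d",
                     geo: str = "US", hl: str = "en-US") -> str:
    """Return an explore URL with percent-encoded query parameters."""
    params = {"q": query, "date": date, "geo": geo, "hl": hl}
    # quote (instead of the default quote_plus) encodes spaces as %20
    # and slashes as %2F, matching the URL used in this tutorial.
    return f"{TRENDS_EXPLORE_URL}?{urlencode(params, quote_via=quote)}"
```

Calling build_trends_url("/g/11bc6c__s2") reproduces the URL used in the tutorial, and the result can be passed straight to page.goto().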
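Playwright first stores the download in a temporary location, which is cleaned up when the browser context closes. Calling save_as copies the CSV file to the directory you specify on your local device.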
    # Saving the downloaded file
    await download.save_as("/path/to/save/" + download.suggested_filename)
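Once the file is saved, you will probably want to load it for analysis. The sketch below uses only the standard csv module and rests on an assumption about the file layout: that the export may begin with a short preamble (for example a category line and a blank line) before the actual header row. The helper name parse_trends_csv is hypothetical, and the layout should be checked against your own downloaded file.

```python
import csv
from typing import Dict, Iterable, List


def parse_trends_csv(lines: Iterable[str]) -> List[Dict[str, str]]:
    """Parse CSV lines into dicts, skipping any single-column preamble."""
    # csv.reader yields an empty list for blank lines; drop those.
    rows = [row for row in csv.reader(lines) if row]

    # Assumption: the header is the first row with more than one column.
    for index, row in enumerate(rows):
        if len(row) > 1:
            header, data = row, rows[index + 1:]
            break
    else:
        return []  # no header row found

    # Keep only rows that match the header width.
    return [dict(zip(header, row)) for row in data if len(row) == len(header)]
```

To use it on a saved file, open the file with open(path, newline="", encoding="utf-8") and pass the file object to parse_trends_csv. Values are returned as strings, so convert them to numbers as needed.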
Here’s the complete code for downloading Google Trends data as a CSV file using Playwright:
import asyncio
import os
import re
from playwright.async_api import Playwright, async_playwright

async def run(playwright: Playwright) -> None:
    # Launch browser with proxy settings
    browser = await playwright.chromium.launch(headless=False, proxy={
        "server": "IP:PORT",
        "username": "your_username",
        "password": "your_password"
    })
    # Create a new browser context
    context = await browser.new_context()
    # Open a new page
    page = await context.new_page()
    # Visit Google to avoid detection
    await page.goto("https://google.com")
    # Navigate to Google Trends
    await page.goto("https://trends.google.com/trends/explore?q=%2Fg%2F11bc6c__s2&date=now%201-d&geo=US&hl=en-US")
    # Click the download button
    async with page.expect_download() as download_info:
        await page.get_by_role("button", name=re.compile(r"file_download")).first.click()
    # Save the downloaded file
    download = await download_info.value
    destination_path = os.path.join("path/to/save", download.suggested_filename)
    await download.save_as(destination_path)
    # Close the context and browser
    await context.close()
    await browser.close()

async def main() -> None:
    async with async_playwright() as playwright:
        await run(playwright)

asyncio.run(main())
Following this guide, you can download trend data as CSV, route your requests through a proxy, and reduce the chance of being blocked by bot protection. Reliable proxy servers are crucial for avoiding blocks. Residential proxies, which offer dynamic IP addresses and don't need rotation configuration, are highly recommended. Alternatively, static ISP proxies are also effective: purchase the required number of IPs and set up regular IP rotation in your script. Either choice lowers the risk of blocks and CAPTCHAs, making data scraping faster and smoother.
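For static ISP proxies, rotation can be as simple as cycling through the list of purchased IPs and using the next one for each browser launch. The following is a minimal sketch of that idea. The helper name proxy_rotation and the placeholder server values are assumptions for illustration, and a round-robin order is only one possible rotation strategy.

```python
import itertools
from typing import Dict, Iterator, List


def proxy_rotation(proxies: List[Dict[str, str]]) -> Iterator[Dict[str, str]]:
    """Return an endless round-robin iterator over proxy configs."""
    if not proxies:
        raise ValueError("at least one proxy is required")
    return itertools.cycle(proxies)


# Placeholder values: replace with the proxies you actually purchased.
PROXIES = [
    {"server": "IP_1:PORT", "username": "your_username", "password": "your_password"},
    {"server": "IP_2:PORT", "username": "your_username", "password": "your_password"},
]
```

Create the iterator once with rotation = proxy_rotation(PROXIES), then pass proxy=next(rotation) to playwright.chromium.launch() each time you start a new browser session.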