Guide to scraping Google Maps data with Python


Scraping Google Maps with Python lets you collect valuable information about locations, businesses, and services. This data is useful for market analysis, choosing locations for new venues, keeping directories up to date, competitor research, and gauging the popularity of places. This guide walks through extracting information from Google Maps with the Python libraries requests and lxml, covering how to make requests, handle responses, parse the structured data, and export it to a CSV file.

Setting up your environment

Ensure you have the following Python libraries installed:

  • requests;
  • lxml;
  • csv (standard library).

Install these libraries using pip if needed:


pip install requests
pip install lxml
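
If you want to confirm the setup before going further, a quick sanity check like the one below imports each library and prints the installed versions (csv ships with Python, so it only needs to import cleanly):


import csv  # standard library, no separate install needed

import requests
from lxml import etree

print("requests:", requests.__version__)
print("lxml:", etree.__version__)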

Step-by-step guide to scraping data from Google Maps

In the following sections, we walk through the process of scraping data from Google Maps step by step, with code examples for each stage.

Step 1. Define the target URL

Specify the URL you want to scrape. This guide uses a Google local search results page for restaurants:


url = "https://www.google.com/search?sca_esv=04f11db33f1535fb&sca_upv=1&tbs=lf:1,lf_ui:4&tbm=lcl&sxsrf=ADLYWIIFVlh6WQCV6I2gi1yj8ZyvZgLiRA:1722843868819&q=google+map+restaurants+near+me&rflfq=1&num=10&sa=X&ved=2ahUKEwjSs7fGrd2HAxWh1DgGHbLODasQjGp6BAgsEAE&biw=1920&bih=919&dpr=1"

Step 2. Define headers and proxies

Setting up appropriate headers is crucial for mimicking the activities of a genuine user, significantly reducing the chances of the scraper being flagged as a bot. Additionally, integrating proxy servers helps maintain continuous scraping activities by circumventing any blocks that might arise from exceeding the request limits associated with a single IP address.


headers = {
    'accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.7',
    'accept-language': 'en-IN,en;q=0.9',
    'cache-control': 'no-cache',
    'dnt': '1',
    'pragma': 'no-cache',
    'priority': 'u=0, i',
    'sec-ch-ua': '"Not)A;Brand";v="99", "Google Chrome";v="127", "Chromium";v="127"',
    'sec-ch-ua-arch': '"x86"',
    'sec-ch-ua-bitness': '"64"',
    'sec-ch-ua-full-version-list': '"Not)A;Brand";v="99.0.0.0", "Google Chrome";v="127.0.6533.72", "Chromium";v="127.0.6533.72"',
    'sec-ch-ua-mobile': '?0',
    'sec-ch-ua-model': '""',
    'sec-ch-ua-platform': '"Linux"',
    'sec-ch-ua-platform-version': '"6.5.0"',
    'sec-ch-ua-wow64': '?0',
    'sec-fetch-dest': 'document',
    'sec-fetch-mode': 'navigate',
    'sec-fetch-site': 'none',
    'sec-fetch-user': '?1',
    'upgrade-insecure-requests': '1',
    'user-agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/127.0.0.0 Safari/537.36',
}

proxies = {
    "http": "http://username:password@your_proxy_ip:port",
    "https": "https://username:password@your_proxy_ip:port",
}
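
Before running the scraper, it can help to verify the proxy works at all. A minimal check, assuming the credentials in the proxies dictionary above are filled in, is to request an IP-echo service such as httpbin.org through the proxy:


import requests

# Optional: ask an IP-echo service which address it sees through the proxy
check = requests.get("https://httpbin.org/ip", proxies=proxies, timeout=10)
print(check.json())  # should show the proxy's IP, not your own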

Step 3. Fetch the page content

Send a request to the Google Maps URL and get the page content:


import requests

response = requests.get(url, headers=headers, proxies=proxies)
if response.status_code == 200:
    page_content = response.content
else:
    print(f"Failed to retrieve the page. Status code: {response.status_code}")
    exit()

Step 4. Parse the HTML content

Use lxml to parse the HTML content:


from lxml import html

parser = html.fromstring(page_content)
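
While developing the XPath expressions covered next, it can save requests to cache the raw HTML once and re-parse the local copy instead of hitting Google repeatedly. A minimal sketch, reusing the page_content and html import from above:


# Optional: cache the raw HTML so XPaths can be tested without re-fetching
with open("google_maps_page.html", "wb") as f:
    f.write(page_content)

# Later, re-parse the cached copy
with open("google_maps_page.html", "rb") as f:
    parser = html.fromstring(f.read())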

Identifying the data XPaths

Understanding the structure of the HTML document is crucial for extracting data correctly. You need to identify the XPath expressions for the data points you want to scrape. Here’s how you can do it:

  1. Inspect the Web Page: Open the Google Maps page in a web browser and use the browser’s developer tools (right-click > Inspect) to examine the HTML structure.
  2. Find the Relevant Elements: Look for the HTML elements that contain the data you want to scrape (e.g., restaurant names, addresses).
  3. Write XPaths: Based on the HTML structure, write XPath expressions to extract the data. For this guide, the XPaths are:

Restaurant Name:


//div[@jscontroller="AtSb"]/div/div/div/a/div/div/div/span[@class="OSrXXb"]/text()

Address:


//div[@jscontroller="AtSb"]/div/div/div/a/div/div/div[2]/text()

Options:


//div[@jscontroller="AtSb"]/div/div/div/a/div/div/div[4]/div/span/span[1]//text()

Geo Latitude:


//div[@jscontroller="AtSb"]/div/@data-lat

Geo Longitude:


//div[@jscontroller="AtSb"]/div/@data-lng
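
A practical way to validate these expressions before writing the full extraction loop is to run them against the parser object from Step 4 and check how many nodes each one matches; empty results usually mean Google served a different page layout than expected:


# Quick check: how many result blocks and names does each expression match?
blocks = parser.xpath('//div[@jscontroller="AtSb"]')
names = parser.xpath('//div[@jscontroller="AtSb"]/div/div/div/a/div/div/div/span[@class="OSrXXb"]/text()')
print(f"result blocks: {len(blocks)}, names matched: {len(names)}")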

Step 5. Extract the data

Extract the data using the identified XPaths:


results = parser.xpath('//div[@jscontroller="AtSb"]')
data = []

for result in results:
    restaurant_name = result.xpath('.//div/div/div/a/div/div/div/span[@class="OSrXXb"]/text()')[0]
    address = result.xpath('.//div/div/div/a/div/div/div[2]/text()')[0]
    options = ', '.join(result.xpath('.//div/div/div/a/div/div/div[4]/div/span/span[1]//text()'))
    geo_latitude = result.xpath('.//div/@data-lat')[0]
    geo_longitude = result.xpath('.//div/@data-lng')[0]

    # Append to data list
    data.append({
        "restaurant_name": restaurant_name,
        "address": address,
        "options": options,
        "geo_latitude": geo_latitude,
        "geo_longitude": geo_longitude
    })
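
The loop above assumes every result block contains each field. If Google omits one (for example, a listing with no options), indexing [0] into an empty match list raises an IndexError. A defensive variant of the loop, using a small hypothetical helper named first_or_empty, might look like this:


def first_or_empty(node, xpath_expr):
    # Return the first XPath match, or an empty string if there is none
    matches = node.xpath(xpath_expr)
    return matches[0] if matches else ""

data = []
for result in results:
    data.append({
        "restaurant_name": first_or_empty(result, './/div/div/div/a/div/div/div/span[@class="OSrXXb"]/text()'),
        "address": first_or_empty(result, './/div/div/div/a/div/div/div[2]/text()'),
        "options": ', '.join(result.xpath('.//div/div/div/a/div/div/div[4]/div/span/span[1]//text()')),
        "geo_latitude": first_or_empty(result, './/div/@data-lat'),
        "geo_longitude": first_or_empty(result, './/div/@data-lng'),
    })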

Step 6. Save data to CSV

Save the extracted data to a CSV file:


import csv

with open("google_maps_data.csv", "w", newline='', encoding='utf-8') as csv_file:
    writer = csv.DictWriter(csv_file, fieldnames=["restaurant_name", "address", "options", "geo_latitude", "geo_longitude"])
    writer.writeheader()
    for entry in data:
        writer.writerow(entry)
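
To confirm the export worked, you can read the file back with the standard library's csv.DictReader:


import csv

# Read the file back and print a couple of fields from each row
with open("google_maps_data.csv", newline='', encoding='utf-8') as csv_file:
    for row in csv.DictReader(csv_file):
        print(row["restaurant_name"], "-", row["address"])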

Complete code

Here is the complete code for scraping Google Maps data:


import requests
from lxml import html
import csv

# Define the target URL and headers
url = "https://www.google.com/search?sca_esv=04f11db33f1535fb&sca_upv=1&tbs=lf:1,lf_ui:4&tbm=lcl&sxsrf=ADLYWIIFVlh6WQCV6I2gi1yj8ZyvZgLiRA:1722843868819&q=google+map+restaurants+near+me&rflfq=1&num=10&sa=X&ved=2ahUKEwjSs7fGrd2HAxWh1DgGHbLODasQjGp6BAgsEAE&biw=1920&bih=919&dpr=1"
headers = {
    'accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.7',
    'accept-language': 'en-IN,en;q=0.9',
    'cache-control': 'no-cache',
    'dnt': '1',
    'pragma': 'no-cache',
    'priority': 'u=0, i',
    'sec-ch-ua': '"Not)A;Brand";v="99", "Google Chrome";v="127", "Chromium";v="127"',
    'sec-ch-ua-arch': '"x86"',
    'sec-ch-ua-bitness': '"64"',
    'sec-ch-ua-full-version-list': '"Not)A;Brand";v="99.0.0.0", "Google Chrome";v="127.0.6533.72", "Chromium";v="127.0.6533.72"',
    'sec-ch-ua-mobile': '?0',
    'sec-ch-ua-model': '""',
    'sec-ch-ua-platform': '"Linux"',
    'sec-ch-ua-platform-version': '"6.5.0"',
    'sec-ch-ua-wow64': '?0',
    'sec-fetch-dest': 'document',
    'sec-fetch-mode': 'navigate',
    'sec-fetch-site': 'none',
    'sec-fetch-user': '?1',
    'upgrade-insecure-requests': '1',
    'user-agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/127.0.0.0 Safari/537.36',
}
proxies = {
    "http": "http://username:password@your_proxy_ip:port",
    "https": "https://username:password@your_proxy_ip:port",
}

# Fetch the page content
response = requests.get(url, headers=headers, proxies=proxies)
if response.status_code == 200:
    page_content = response.content
else:
    print(f"Failed to retrieve the page. Status code: {response.status_code}")
    exit()

# Parse the HTML content
parser = html.fromstring(page_content)

# Extract data using XPath
results = parser.xpath('//div[@jscontroller="AtSb"]')
data = []

for result in results:
    restaurant_name = result.xpath('.//div/div/div/a/div/div/div/span[@class="OSrXXb"]/text()')[0]
    address = result.xpath('.//div/div/div/a/div/div/div[2]/text()')[0]
    options = ', '.join(result.xpath('.//div/div/div/a/div/div/div[4]/div/span/span[1]//text()'))
    geo_latitude = result.xpath('.//div/@data-lat')[0]
    geo_longitude = result.xpath('.//div/@data-lng')[0]

    # Append to data list
    data.append({
        "restaurant_name": restaurant_name,
        "address": address,
        "options": options,
        "geo_latitude": geo_latitude,
        "geo_longitude": geo_longitude
    })

# Save data to CSV
with open("google_maps_data.csv", "w", newline='', encoding='utf-8') as csv_file:
    writer = csv.DictWriter(csv_file, fieldnames=["restaurant_name", "address", "options", "geo_latitude", "geo_longitude"])
    writer.writeheader()
    for entry in data:
        writer.writerow(entry)

print("Data has been successfully scraped and saved to google_maps_data.csv.")

For effective web scraping, it's crucial to use the right request headers and proxies. Datacenter or ISP proxies are optimal choices, offering high speeds and low latency; however, because they are static, you need to implement IP rotation to avoid blocks. A more user-friendly alternative is residential proxies: these rotate dynamically, carry a higher trust factor, and are therefore more effective at circumventing blocks.
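
With static datacenter or ISP proxies, a simple way to rotate is to cycle through a pool on each request. The sketch below is a minimal illustration, assuming you have a list of proxy URLs from your provider (the pool entries are placeholders):


from itertools import cycle

import requests

# Hypothetical pool of static proxy endpoints from your provider
proxy_pool = cycle([
    "http://username:password@proxy_ip_1:port",
    "http://username:password@proxy_ip_2:port",
    "http://username:password@proxy_ip_3:port",
])

def get_with_rotation(url, headers):
    proxy = next(proxy_pool)  # advance to the next proxy for each request
    return requests.get(url, headers=headers,
                        proxies={"http": proxy, "https": proxy}, timeout=30)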
