Scraping data from Google Maps using Python allows for the collection of valuable information about locations, businesses, and services, which is beneficial for market analysis, identifying optimal new venue locations, maintaining current directories, competitor analysis, and gauging the popularity of places. This guide provides a comprehensive walkthrough on how to extract information from Google Maps utilizing the Python libraries requests and lxml. It includes detailed instructions on making requests, handling responses, parsing structured data, and exporting it to a CSV file.
Ensure you have the following Python libraries installed:
Install these libraries using pip if needed:
pip install requests
pip install lxml
Below, we'll present a step-by-step process of scraping, complete with examples.
In the following sections, we will walk through a detailed step-by-step process for scraping data from Google Maps, complete with visual examples to guide you through each stage.
Specify the URL from which you want to scrape data.
url = "https link"
Setting up appropriate headers is crucial for mimicking the activities of a genuine user, significantly reducing the chances of the scraper being flagged as a bot. Additionally, integrating proxy servers helps maintain continuous scraping activities by circumventing any blocks that might arise from exceeding the request limits associated with a single IP address.
headers = {
'accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.7',
'accept-language': 'en-IN,en;q=0.9',
'cache-control': 'no-cache',
'dnt': '1',
'pragma': 'no-cache',
'priority': 'u=0, i',
'sec-ch-ua': '"Not)A;Brand";v="99", "Google Chrome";v="127", "Chromium";v="127"',
'sec-ch-ua-arch': '"x86"',
'sec-ch-ua-bitness': '"64"',
'sec-ch-ua-full-version-list': '"Not)A;Brand";v="99.0.0.0", "Google Chrome";v="127.0.6533.72", "Chromium";v="127.0.6533.72"',
'sec-ch-ua-mobile': '?0',
'sec-ch-ua-model': '""',
'sec-ch-ua-platform': '"Linux"',
'sec-ch-ua-platform-version': '"6.5.0"',
'sec-ch-ua-wow64': '?0',
'sec-fetch-dest': 'document',
'sec-fetch-mode': 'navigate',
'sec-fetch-site': 'none',
'sec-fetch-user': '?1',
'upgrade-insecure-requests': '1',
'user-agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/127.0.0.0 Safari/537.36',
}
proxies = {
"http": "http://username:password@your_proxy_ip:port",
"https": "https://username:password@your_proxy_ip:port",
}
Send a request to the Google Maps URL and get the page content:
import requests
response = requests.get(url, headers=headers, proxies=proxies)
if response.status_code == 200:
page_content = response.content
else:
print(f"Failed to retrieve the page. Status code: {response.status_code}")
Use lxml to parse the HTML content:
from lxml import html
parser = html.fromstring(page_content)
Understanding the structure of the HTML document is crucial for extracting data correctly. You need to identify the XPath expressions for the data points you want to scrape. Here’s how you can do it:
Restaurant Name:
//div[@jscontroller="AtSb"]/div/div/div/a/div/div/div/span[@class="OSrXXb"]/text()
Address:
//div[@jscontroller="AtSb"]/div/div/div/a/div/div/div[2]/text()
Options:
= ', '.join(result.xpath('.//div[@jscontroller="AtSb"]/div/div/div/a/div/div/div[4]/div/span/span[1]//text()'))
Geo Latitude:
//div[@jscontroller="AtSb"]/div/@data-lat
Geo Longitude:
//div[@jscontroller="AtSb"]/div/@data-lng
Extract the data using the identified XPaths:
results = parser.xpath('//div[@jscontroller="AtSb"]')
data = []
for result in results:
restaurant_name = result.xpath('.//div/div/div/a/div/div/div/span[@class="OSrXXb"]/text()')[0]
address = result.xpath('.//div/div/div/a/div/div/div[2]/text()')[0]
options = ', '.join(result.xpath('.//div/div/div/a/div/div/div[4]/div/span/span[1]//text()'))
geo_latitude = result.xpath('.//div/@data-lat')[0]
geo_longitude = result.xpath('.//div/@data-lng')[0]
# Append to data list
data.append({
"restaurant_name": restaurant_name,
"address": address,
"options": options,
"geo_latitude": geo_latitude,
"geo_longitude": geo_longitude
})
Save the extracted data to a CSV file:
import csv
with open("google_maps_data.csv", "w", newline='', encoding='utf-8') as csv_file:
writer = csv.DictWriter(csv_file, fieldnames=["restaurant_name", "address", "options", "geo_latitude", "geo_longitude"])
writer.writeheader()
for entry in data:
writer.writerow(entry)
Here is the complete code for scraping Google Maps data:
import requests
from lxml import html
import csv
# Define the target URL and headers
url = "https://www.google.com/search?sca_esv=04f11db33f1535fb&sca_upv=1&tbs=lf:1,lf_ui:4&tbm=lcl&sxsrf=ADLYWIIFVlh6WQCV6I2gi1yj8ZyvZgLiRA:1722843868819&q=google+map+restaurants+near+me&rflfq=1&num=10&sa=X&ved=2ahUKEwjSs7fGrd2HAxWh1DgGHbLODasQjGp6BAgsEAE&biw=1920&bih=919&dpr=1"
headers = {
'accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.7',
'accept-language': 'en-IN,en;q=0.9',
'cache-control': 'no-cache',
'dnt': '1',
'pragma': 'no-cache',
'priority': 'u=0, i',
'sec-ch-ua': '"Not)A;Brand";v="99", "Google Chrome";v="127", "Chromium";v="127"',
'sec-ch-ua-arch': '"x86"',
'sec-ch-ua-bitness': '"64"',
'sec-ch-ua-full-version-list': '"Not)A;Brand";v="99.0.0.0", "Google Chrome";v="127.0.6533.72", "Chromium";v="127.0.6533.72"',
'sec-ch-ua-mobile': '?0',
'sec-ch-ua-model': '""',
'sec-ch-ua-platform': '"Linux"',
'sec-ch-ua-platform-version': '"6.5.0"',
'sec-ch-ua-wow64': '?0',
'sec-fetch-dest': 'document',
'sec-fetch-mode': 'navigate',
'sec-fetch-site': 'none',
'sec-fetch-user': '?1',
'upgrade-insecure-requests': '1',
'user-agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/127.0.0.0 Safari/537.36',
}
proxies = {
"http": "http://username:password@your_proxy_ip:port",
"https": "https://username:password@your_proxy_ip:port",
}
# Fetch the page content
response = requests.get(url, headers=headers, proxies=proxies)
if response.status_code == 200:
page_content = response.content
else:
print(f"Failed to retrieve the page. Status code: {response.status_code}")
exit()
# Parse the HTML content
parser = html.fromstring(page_content)
# Extract data using XPath
results = parser.xpath('//div[@jscontroller="AtSb"]')
data = []
for result in results:
restaurant_name = result.xpath('.//div/div/div/a/div/div/div/span[@class="OSrXXb"]/text()')[0]
address = result.xpath('.//div/div/div/a/div/div/div[2]/text()')[0]
options = ', '.join(result.xpath('.//div/div/div/a/div/div/div[4]/div/span/span[1]//text()'))
geo_latitude = result.xpath('.//div/@data-lat')[0]
geo_longitude = result.xpath('.//div/@data-lng')[0]
# Append to data list
data.append({
"restaurant_name": restaurant_name,
"address": address,
"options": options,
"geo_latitude": geo_latitude,
"geo_longitude": geo_longitude
})
# Save data to CSV
with open("google_maps_data.csv", "w", newline='', encoding='utf-8') as csv_file:
writer = csv.DictWriter(csv_file, fieldnames=["restaurant_name", "address", "options", "geo_latitude", "geo_longitude"])
writer.writeheader()
for entry in data:
writer.writerow(entry)
print("Data has been successfully scraped and saved to google_maps_data.csv.")
For effective web scraping, it's crucial to use the right request headers and proxies. The optimal proxy choices are data center or ISP proxies, which offer high speeds and low latency. However, since these are static proxies, implementing IP rotation is necessary to prevent blocking effectively. An alternative and more user-friendly option is to use residential proxies. These dynamic proxies simplify the rotation process and have a higher trust factor, making them more effective at circumventing blocks.
Comments: 0