Scraping Amazon reviews with Python can be useful, whether you are analyzing competitors, checking what real customers are saying, or digging into market trends. If you are wondering how to scrape Amazon reviews using Python, this short tutorial walks you through the practical process, using the Requests package and BeautifulSoup to fetch review content programmatically.
Before anything else, you will need to install a couple of libraries. The two core dependencies, Requests for network calls and BeautifulSoup for walking the HTML tree, can each be installed with a single line in the terminal:
pip install requests
pip install beautifulsoup4
We focus on Amazon reviews using Python and examine each stage of the scraping process step by step.
Understanding the site's HTML structure is essential for identifying review elements. The reviews section contains fields such as the reviewer's name, star rating, and written comment; these must be located using the browser's inspection tools.
Product title and URL:
Overall rating:
Reviews section:
Author name:
Rating:
Comment:
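Before wiring these selectors into live requests, they can be sanity-checked against a small HTML fragment. The fragment below is a hand-written stand-in that mirrors the markup this tutorial targets, not real page source:

```python
from bs4 import BeautifulSoup

# Hand-written fragment mimicking the review markup used later in this tutorial
sample_html = """
<div data-hook="review">
  <span class="a-profile-name">Jane D.</span>
  <i class="review-rating"><span>5.0 out of 5 stars</span></i>
  <span data-hook="review-body" class="review-text"><span>Great keyboard!</span></span>
</div>
"""

soup = BeautifulSoup(sample_html, 'html.parser')
review = soup.find('div', {'data-hook': 'review'})
author = review.find('span', class_='a-profile-name').get_text(strip=True)
rating = review.find('i', class_='review-rating').get_text(strip=True)
comment = review.find('span', class_='review-text').get_text(strip=True)
print(author, rating, comment)
```

If the selectors stop matching after a markup change on the live site, a quick test like this makes the breakage obvious before any network traffic is involved.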
Headers play an important role. Setting a realistic User-Agent string and other browser-like headers makes your requests resemble ordinary browser traffic and reduces the chance of being noticed. Combined with proxies, this keeps your requests steady and less conspicuous.
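Varying the User-Agent between requests helps too. A minimal sketch, using a couple of illustrative desktop User-Agent strings (not an exhaustive or authoritative list):

```python
import random

# A few example desktop User-Agent strings; substitute current, real ones
user_agents = [
    'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36',
    'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/14.1 Safari/605.1.15',
]

def pick_user_agent():
    """Choose a User-Agent at random so consecutive requests vary."""
    return random.choice(user_agents)
```

The chosen string would be placed in the 'user-agent' entry of the headers dictionary shown below.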
Proxies allow IP rotation, which reduces the risk of blocks and rate limits. They are especially important for large-scale scraping.
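Rotation itself can be as simple as cycling through a pool of endpoints. A minimal round-robin sketch, assuming a hypothetical list of proxy addresses (the hostnames below are placeholders for whatever your provider gives you):

```python
import itertools

# Placeholder proxy endpoints; substitute the ones from your proxy provider
proxy_pool = [
    'http://proxy1.example.com:8000',
    'http://proxy2.example.com:8000',
    'http://proxy3.example.com:8000',
]

# Round-robin iterator: each call picks the next proxy in the pool
rotation = itertools.cycle(proxy_pool)

def next_proxy():
    """Return a requests-style proxies dict using the next pool entry."""
    endpoint = next(rotation)
    return {'http': endpoint, 'https': endpoint}
```

Each call to requests.get would then receive `proxies=next_proxy()`, so consecutive requests leave from different IPs.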
Including additional headers such as Accept-Encoding, Accept-Language, Referer, Connection, and Upgrade-Insecure-Requests makes a request resemble one from a legitimate browser, which lowers the chance of being flagged as a bot.
import requests
url = "https://www.amazon.com/Portable-Mechanical-Keyboard-MageGee-Backlit/product-reviews/B098LG3N6R/ref=cm_cr_dp_d_show_all_btm?ie=UTF8&reviewerType=all_reviews"
# Example of a proxy provided by the proxy service
proxy = {
    'http': 'http://your_proxy_ip:your_proxy_port',
    'https': 'https://your_proxy_ip:your_proxy_port'
}
headers = {
    'accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.7',
    'accept-language': 'en-US,en;q=0.9',
    'cache-control': 'no-cache',
    'dnt': '1',
    'pragma': 'no-cache',
    'sec-ch-ua': '"Not/A)Brand";v="99", "Google Chrome";v="91", "Chromium";v="91"',
    'sec-ch-ua-mobile': '?0',
    'sec-fetch-dest': 'document',
    'sec-fetch-mode': 'navigate',
    'sec-fetch-site': 'same-origin',
    'sec-fetch-user': '?1',
    'upgrade-insecure-requests': '1',
    'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36',
}
# Send HTTP GET request to the URL with headers and proxy
try:
    response = requests.get(url, headers=headers, proxies=proxy, timeout=10)
    response.raise_for_status()  # Raise an error if the request failed
except requests.exceptions.RequestException as e:
    print(f"Error: {e}")
    raise SystemExit(1)  # Stop here; the parsing steps below need a valid response
Once the page has loaded, BeautifulSoup turns the raw HTML into a searchable tree. From that structure, the scraper picks up the canonical product links, page titles, and any visible rating aggregates.
from bs4 import BeautifulSoup
soup = BeautifulSoup(response.content, 'html.parser')
# Extracting common product details
product_url = soup.find('a', {'data-hook': 'product-link'}).get('href', '')
product_title = soup.find('a', {'data-hook': 'product-link'}).get_text(strip=True)
total_rating = soup.find('span', {'data-hook': 'rating-out-of-text'}).get_text(strip=True)
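Amazon's markup changes often, so any of these `find()` calls can return `None` and crash the chained `get_text()` with an `AttributeError`. A small guard helper, a defensive pattern rather than part of the original script, keeps the scraper running when an element is missing:

```python
from bs4 import BeautifulSoup

def safe_text(node, default=''):
    """Return the node's stripped text if it was found, else a default."""
    return node.get_text(strip=True) if node is not None else default

# Demonstration on a fragment that has an author name but no review text
soup = BeautifulSoup('<div><span class="a-profile-name">Sam</span></div>', 'html.parser')
print(safe_text(soup.find('span', class_='a-profile-name')))       # Sam
print(safe_text(soup.find('span', class_='review-text'), 'N/A'))   # N/A
```

Wrapping each extraction in `safe_text()` trades a hard crash for an empty field, which is usually the better failure mode in a long scraping run.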
We return to the same HTML structure, this time collecting reviewer names, star ratings, and written comments, all using the selectors identified earlier.
reviews = []
review_elements = soup.find_all('div', {'data-hook': 'review'})
for review in review_elements:
    author_name = review.find('span', class_='a-profile-name').get_text(strip=True)
    rating_given = review.find('i', class_='review-rating').get_text(strip=True)
    comment = review.find('span', class_='review-text').get_text(strip=True)
    reviews.append({
        'Product URL': product_url,
        'Product Title': product_title,
        'Total Rating': total_rating,
        'Author': author_name,
        'Rating': rating_given,
        'Comment': comment,
    })
Python's built-in csv module can save the collected review data to a .csv file for later analysis.
import csv
# Define CSV file path
csv_file = 'amazon_reviews.csv'
# Define CSV fieldnames
fieldnames = ['Product URL', 'Product Title', 'Total Rating', 'Author', 'Rating', 'Comment']
# Writing data to CSV file
with open(csv_file, mode='w', newline='', encoding='utf-8') as file:
    writer = csv.DictWriter(file, fieldnames=fieldnames)
    writer.writeheader()
    for review in reviews:
        writer.writerow(review)

print(f"Data saved to {csv_file}")
The following code block ties together the request-building, parsing, and file-output steps, covering the entire scraping process in a single runnable script:
import requests
from bs4 import BeautifulSoup
import csv
import urllib3
urllib3.disable_warnings()
# URL of the Amazon product reviews page
url = "https://www.amazon.com/Portable-Mechanical-Keyboard-MageGee-Backlit/product-reviews/B098LG3N6R/ref=cm_cr_dp_d_show_all_btm?ie=UTF8&reviewerType=all_reviews"
# Proxy provided by the proxy service with IP-authorization
path_proxy = 'your_proxy_ip:your_proxy_port'
proxy = {
    'http': f'http://{path_proxy}',
    'https': f'https://{path_proxy}'
}
# Headers for the HTTP request
headers = {
    'accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.7',
    'accept-language': 'en-US,en;q=0.9',
    'cache-control': 'no-cache',
    'dnt': '1',
    'pragma': 'no-cache',
    'sec-ch-ua': '"Not/A)Brand";v="99", "Google Chrome";v="91", "Chromium";v="91"',
    'sec-ch-ua-mobile': '?0',
    'sec-fetch-dest': 'document',
    'sec-fetch-mode': 'navigate',
    'sec-fetch-site': 'same-origin',
    'sec-fetch-user': '?1',
    'upgrade-insecure-requests': '1',
    'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36',
}
# Send HTTP GET request to the URL with headers and handle exceptions
try:
    response = requests.get(url, headers=headers, timeout=10, proxies=proxy, verify=False)
    response.raise_for_status()  # Raise an error if the request failed
except requests.exceptions.RequestException as e:
    print(f"Error: {e}")
    raise SystemExit(1)  # Stop here; the parsing steps below need a valid response
# Use BeautifulSoup to parse the HTML and grab the data you need
soup = BeautifulSoup(response.content, 'html.parser')
# Extracting common product details
product_url = soup.find('a', {'data-hook': 'product-link'}).get('href', '') # Extract product URL
product_title = soup.find('a', {'data-hook': 'product-link'}).get_text(strip=True) # Extract product title
total_rating = soup.find('span', {'data-hook': 'rating-out-of-text'}).get_text(strip=True) # Extract total rating
# Extracting individual reviews
reviews = []
review_elements = soup.find_all('div', {'data-hook': 'review'})
for review in review_elements:
    author_name = review.find('span', class_='a-profile-name').get_text(strip=True)  # Extract author name
    rating_given = review.find('i', class_='review-rating').get_text(strip=True)  # Extract rating given
    comment = review.find('span', class_='review-text').get_text(strip=True)  # Extract review comment
    # Store each review in a dictionary
    reviews.append({
        'Product URL': product_url,
        'Product Title': product_title,
        'Total Rating': total_rating,
        'Author': author_name,
        'Rating': rating_given,
        'Comment': comment,
    })
# Define CSV file path
csv_file = 'amazon_reviews.csv'
# Define CSV fieldnames
fieldnames = ['Product URL', 'Product Title', 'Total Rating', 'Author', 'Rating', 'Comment']
# Writing data to CSV file
with open(csv_file, mode='w', newline='', encoding='utf-8') as file:
    writer = csv.DictWriter(file, fieldnames=fieldnames)
    writer.writeheader()
    for review in reviews:
        writer.writerow(review)
# Print confirmation message
print(f"Data saved to {csv_file}")
Reliable proxies improve the chances of bypassing blocks and help reduce detection by anti-bot filters. For scraping, residential proxies are often preferred for their trust scores, while static ISP proxies provide speed and stability.
Scraping Amazon product reviews with Python is entirely feasible, and Python provides the tools needed to do it. With a couple of libraries and some sensible inspection of the page, you can extract all kinds of useful information: from what customers really think to where your competitors fall short.
Of course, there are obstacles: Amazon does not like scraping. So if you want to scrape Amazon product reviews at scale using Python, you will need proxies to stay under the radar. The most reliable options are residential proxies (good trust scores, rotating IPs) or static ISP proxies (fast and stable).
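Scale also means more than one page of reviews. Amazon's own pagination links append a pageNumber query parameter to the review URL, so per-page URLs can be built with a small helper (a sketch that assumes that parameter keeps behaving as it does today):

```python
# Review-page URL from this tutorial, split for readability
base_url = (
    "https://www.amazon.com/Portable-Mechanical-Keyboard-MageGee-Backlit/"
    "product-reviews/B098LG3N6R/ref=cm_cr_dp_d_show_all_btm"
    "?ie=UTF8&reviewerType=all_reviews"
)

def page_url(page):
    """Append the pageNumber parameter that Amazon's pagination links use."""
    return f"{base_url}&pageNumber={page}"

# Build URLs for the first three pages of reviews
urls = [page_url(n) for n in range(1, 4)]
```

Each URL would then be fetched and parsed with the same request-and-parse loop shown above, ideally with a pause and a proxy rotation between pages.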