A guide to scraping Amazon reviews using Python


Scraping Amazon reviews with Python can be useful, whether you are analysing competitors, checking what real customers are saying, or digging into market trends. If you are wondering how to scrape Amazon reviews using Python, this short tutorial walks you through a practical process that uses the Requests and BeautifulSoup packages to fetch review content programmatically.

Step 1. Install the required libraries

Before doing anything else, you will need to install a couple of libraries. The two core dependencies, Requests for network calls and BeautifulSoup for traversing the HTML tree, can each be installed with a single line in the terminal:

pip install requests
pip install beautifulsoup4

Step 2. Configure the scraping process

We will focus on Amazon reviews using Python and go through each stage of the scraping process step by step.

Understanding the site structure

Understanding the site's HTML structure is essential for identifying review elements. The reviews section contains fields such as the reviewer's name, star rating, and written comment; these must be located with the browser's inspection tools.

The elements to locate with the inspector (the original screenshots are not reproduced here): the product title and URL, the overall rating, the reviews section, the author name, the rating, and the comment text.
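Since the screenshots are omitted, the sketch below parses a minimal, hypothetical fragment shaped like Amazon's review markup; it only illustrates how the data-hook attributes and class names targeted later in this guide map onto the fields we want (the real markup is far more deeply nested):

```python
from bs4 import BeautifulSoup

# Hypothetical, simplified fragment mirroring the attributes the scraper targets.
html = """
<div data-hook="review">
  <span class="a-profile-name">Jane D.</span>
  <i class="review-rating"><span>5.0 out of 5 stars</span></i>
  <span data-hook="review-body" class="review-text">Great keyboard for the price.</span>
</div>
"""

soup = BeautifulSoup(html, "html.parser")
review = soup.find("div", {"data-hook": "review"})
author = review.find("span", class_="a-profile-name").get_text(strip=True)
rating = review.find("i", class_="review-rating").get_text(strip=True)
comment = review.find("span", class_="review-text").get_text(strip=True)
```

The same find() calls work unchanged against the live page once you have confirmed the attribute names in the inspector.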

Sending HTTP requests

Headers play an important role. The User-Agent string and other headers are set so that your requests look like those of a regular browser, reducing the chance of being noticed. Doing this properly means setting these headers carefully and pairing them with proxies to keep your requests steady and less conspicuous.

Proxies

Proxies allow IP rotation, reducing the risk of blocks and rate limits. They are especially important for large-scale scraping.
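As an illustration, rotation can be as simple as cycling through a pool of endpoints; the endpoints below are hypothetical placeholders for the ones your provider issues:

```python
import itertools

# Hypothetical proxy endpoints; substitute the ones issued by your provider.
PROXIES = [
    "http://proxy1.example.com:8000",
    "http://proxy2.example.com:8000",
    "http://proxy3.example.com:8000",
]

_proxy_cycle = itertools.cycle(PROXIES)

def next_proxies():
    """Build a requests-style proxies mapping from the next endpoint in the cycle."""
    endpoint = next(_proxy_cycle)
    return {"http": endpoint, "https": endpoint}

# Pass the result to requests.get(url, proxies=next_proxies(), ...) so that
# successive requests leave through different IP addresses.
```

Each call returns a mapping built from the next endpoint, wrapping back to the first after the last one is used.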

Full request headers

Including various headers such as Accept-Encoding, Accept-Language, Referer, Connection, and Upgrade-Insecure-Requests makes a request resemble one from a legitimate browser, which reduces the chance of being flagged as a bot.

import requests

url = "https://www.amazon.com/Portable-Mechanical-Keyboard-MageGee-Backlit/product-reviews/B098LG3N6R/ref=cm_cr_dp_d_show_all_btm?ie=UTF8&reviewerType=all_reviews"

# Example of a proxy provided by the proxy service
proxy = {
    'http': 'http://your_proxy_ip:your_proxy_port',
    'https': 'https://your_proxy_ip:your_proxy_port'
}

headers = {
    'accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.7',
    'accept-language': 'en-US,en;q=0.9',
    'cache-control': 'no-cache',
    'dnt': '1',
    'pragma': 'no-cache',
    'sec-ch-ua': '"Not/A)Brand";v="99", "Google Chrome";v="91", "Chromium";v="91"',
    'sec-ch-ua-mobile': '?0',
    'sec-fetch-dest': 'document',
    'sec-fetch-mode': 'navigate',
    'sec-fetch-site': 'same-origin',
    'sec-fetch-user': '?1',
    'upgrade-insecure-requests': '1',
    'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36',
}

# Send HTTP GET request to the URL with headers and proxy
try:
    response = requests.get(url, headers=headers, proxies=proxy, timeout=10)
    response.raise_for_status()  # Raise an error if the request failed

except requests.exceptions.RequestException as e:
    print(f"Error: {e}")
    raise SystemExit(1)  # Stop here; there is no response to parse

Step 3. Extracting product information using BeautifulSoup

Once the page has loaded, BeautifulSoup turns the raw HTML into a searchable tree. From that structure, the scraper picks up the canonical product links, page titles, and any visible rating aggregates.

from bs4 import BeautifulSoup

soup = BeautifulSoup(response.content, 'html.parser')

# Extracting common product details
product_url = soup.find('a', {'data-hook': 'product-link'}).get('href', '')
product_title = soup.find('a', {'data-hook': 'product-link'}).get_text(strip=True)
total_rating = soup.find('span', {'data-hook': 'rating-out-of-text'}).get_text(strip=True)
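One caveat: find() returns None when a selector misses (Amazon serves several page layouts), and calling get_text on None raises AttributeError. A minimal guard, sketched here as a hypothetical helper, keeps the script from crashing:

```python
def safe_text(node, default=""):
    """Return stripped text from a parsed element, or a default when find() found nothing."""
    if node is None:
        return default
    return node.get_text(strip=True)

# Used with the selectors above, e.g.:
# product_title = safe_text(soup.find('a', {'data-hook': 'product-link'}))
```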

Step 4. Extracting review data using BeautifulSoup

We return to the same HTML structure, but this time we focus on collecting reviewer names, star ratings, and written comments; everything here is done in Python to scrape Amazon reviews efficiently using predefined selectors.

reviews = []
review_elements = soup.find_all('div', {'data-hook': 'review'})
for review in review_elements:
    author_name = review.find('span', class_='a-profile-name').get_text(strip=True)
    rating_given = review.find('i', class_='review-rating').get_text(strip=True)
    comment = review.find('span', class_='review-text').get_text(strip=True)

    reviews.append({
        'Product URL': product_url,
        'Product Title': product_title,
        'Total Rating': total_rating,
        'Author': author_name,
        'Rating': rating_given,
        'Comment': comment,
    })
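Amazon spreads reviews across multiple pages. Assuming the usual pageNumber query parameter (verify this against the pagination links on the live page), the per-page URLs can be built with the standard library:

```python
from urllib.parse import urlencode, urlparse, parse_qs, urlunparse

def page_url(base_url, page):
    """Return base_url with its pageNumber query parameter set to the given page."""
    parts = urlparse(base_url)
    query = parse_qs(parts.query)
    query["pageNumber"] = [str(page)]
    return urlunparse(parts._replace(query=urlencode(query, doseq=True)))

# Looping over pages then repeats the fetch-and-parse steps above for each URL.
```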

Step 5. Saving data to CSV

Python's built-in csv module can save the collected review data to a .csv file for later analysis; here we use csv.DictWriter so each review dictionary maps directly onto a row.

import csv

# Define CSV file path
csv_file = 'amazon_reviews.csv'

# Define CSV fieldnames
fieldnames = ['Product URL', 'Product Title', 'Total Rating', 'Author', 'Rating', 'Comment']

# Writing data to CSV file
with open(csv_file, mode='w', newline='', encoding='utf-8') as file:
    writer = csv.DictWriter(file, fieldnames=fieldnames)
    writer.writeheader()
    for review in reviews:
        writer.writerow(review)

print(f"Data saved to {csv_file}")

Complete code

The block below ties together the request-building, parsing, and file-output steps, covering the whole scraping process in a single runnable script:

import requests
from bs4 import BeautifulSoup
import csv
import urllib3

urllib3.disable_warnings()

# URL of the Amazon product reviews page
url = "https://www.amazon.com/Portable-Mechanical-Keyboard-MageGee-Backlit/product-reviews/B098LG3N6R/ref=cm_cr_dp_d_show_all_btm?ie=UTF8&reviewerType=all_reviews"

# Proxy provided by the proxy service with IP-authorization
path_proxy = 'your_proxy_ip:your_proxy_port'
proxy = {
   'http': f'http://{path_proxy}',
   'https': f'https://{path_proxy}'
}

# Headers for the HTTP request
headers = {
   'accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.7',
   'accept-language': 'en-US,en;q=0.9',
   'cache-control': 'no-cache',
   'dnt': '1',
   'pragma': 'no-cache',
   'sec-ch-ua': '"Not/A)Brand";v="99", "Google Chrome";v="91", "Chromium";v="91"',
   'sec-ch-ua-mobile': '?0',
   'sec-fetch-dest': 'document',
   'sec-fetch-mode': 'navigate',
   'sec-fetch-site': 'same-origin',
   'sec-fetch-user': '?1',
   'upgrade-insecure-requests': '1',
   'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36',
}

# Send HTTP GET request to the URL with headers and handle exceptions
try:
   response = requests.get(url, headers=headers, timeout=10, proxies=proxy, verify=False)
   response.raise_for_status()  # Raise an error if the request failed

except requests.exceptions.RequestException as e:
   print(f"Error: {e}")
   raise SystemExit(1)  # Stop here; there is no response to parse

# Use BeautifulSoup to parse the HTML and grab the data you need
soup = BeautifulSoup(response.content, 'html.parser')

# Extracting common product details
product_url = soup.find('a', {'data-hook': 'product-link'}).get('href', '')  # Extract product URL
product_title = soup.find('a', {'data-hook': 'product-link'}).get_text(strip=True)  # Extract product title
total_rating = soup.find('span', {'data-hook': 'rating-out-of-text'}).get_text(strip=True)  # Extract total rating

# Extracting individual reviews
reviews = []
review_elements = soup.find_all('div', {'data-hook': 'review'})
for review in review_elements:
   author_name = review.find('span', class_='a-profile-name').get_text(strip=True)  # Extract author name
   rating_given = review.find('i', class_='review-rating').get_text(strip=True)  # Extract rating given
   comment = review.find('span', class_='review-text').get_text(strip=True)  # Extract review comment

   # Store each review in a dictionary
   reviews.append({
       'Product URL': product_url,
       'Product Title': product_title,
       'Total Rating': total_rating,
       'Author': author_name,
       'Rating': rating_given,
       'Comment': comment,
   })

# Define CSV file path
csv_file = 'amazon_reviews.csv'

# Define CSV fieldnames
fieldnames = ['Product URL', 'Product Title', 'Total Rating', 'Author', 'Rating', 'Comment']

# Writing data to CSV file
with open(csv_file, mode='w', newline='', encoding='utf-8') as file:
   writer = csv.DictWriter(file, fieldnames=fieldnames)
   writer.writeheader()
   for review in reviews:
       writer.writerow(review)

# Print confirmation message
print(f"Data saved to {csv_file}")

Reliable proxies improve your chances of getting past blocks and help reduce detection by anti-bot filters. For scraping, residential proxies are often preferred for their high trust scores, while static ISP proxies offer speed and stability.

Conclusion

Scraping Amazon product reviews using Python is entirely feasible, and Python provides the tools needed to do it. With a couple of libraries and some sensible inspection of the page, you can extract all kinds of useful information: from what customers actually think to where your competitors fall short.

Of course, there are some obstacles: Amazon does not like scraping. So if you want to scrape Amazon product reviews at scale using Python, you will need proxies to stay under the radar. The most reliable options are residential proxies (good trust scores, rotating IPs) or static ISP proxies (fast and stable).
