Extracting real estate information from Zillow can offer useful insight into the market and into investments. This post covers scraping Zillow property listings with Python, walking through the essential steps and guidelines. The guide shows how to scrape information from the Zillow website using libraries such as requests and lxml.
Before we start, make sure Python is installed on your system. You will also need to install the following libraries:
pip install requests
pip install lxml
To extract data from Zillow, you need to understand the structure of the web page. Open a property listing page on Zillow and inspect the elements you want to scrape (e.g., the property title, rent estimate price, and assessment price).
Title:
Price details:
Now let's send HTTP requests. First, we need to fetch the HTML content of the Zillow page. We will use the requests library to send an HTTP GET request to the target URL. We will also set request headers to mimic a real browser request and use proxies to avoid IP blocking.
import requests
# Define the target URL for the Zillow property listing
url = "https://www.zillow.com/homedetails/1234-Main-St-Some-City-CA-90210/12345678_zpid/"
# Set up the request headers to mimic a browser request
headers = {
    'accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.7',
    'accept-language': 'en-US,en;q=0.9',
    'cache-control': 'no-cache',
    'dnt': '1',
    'pragma': 'no-cache',
    'sec-ch-ua': '"Not/A)Brand";v="99", "Google Chrome";v="91", "Chromium";v="91"',
    'sec-ch-ua-mobile': '?0',
    'sec-fetch-dest': 'document',
    'sec-fetch-mode': 'navigate',
    'sec-fetch-site': 'same-origin',
    'sec-fetch-user': '?1',
    'upgrade-insecure-requests': '1',
    'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36',
}
# Optionally, set up proxies to avoid IP blocking
proxies = {
    'http': 'http://username:password@your_proxy_address',
    'https': 'https://username:password@your_proxy_address',
}
# Send the HTTP GET request with headers and proxies
response = requests.get(url, headers=headers, proxies=proxies)
response.raise_for_status()  # Ensure we got a valid response
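Proxy credentials that contain characters such as `@` or `:` can break the proxy URL unless they are percent-encoded. The following is a minimal sketch of one way to build the `proxies` mapping from separate parts. The helper name `make_proxies`, the host `proxy.example.com`, and the port are hypothetical, and the sketch assumes the same `http://` proxy endpoint is used for both plain and TLS traffic, which may not match your provider's setup.

```python
from urllib.parse import quote


def make_proxies(username, password, host, port):
    """Build a requests-style proxies mapping from separate credential parts."""
    # Percent-encode credentials so '@' or ':' inside them cannot break the URL.
    user = quote(username, safe="")
    pwd = quote(password, safe="")
    proxy_url = f"http://{user}:{pwd}@{host}:{port}"
    # Assumption: one proxy endpoint handles both http and https traffic.
    return {"http": proxy_url, "https": proxy_url}


# Placeholder credentials and host, for illustration only.
proxies = make_proxies("user", "p@ss:word", "proxy.example.com", 8080)
print(proxies["https"])
```

The resulting dictionary can be passed to `requests.get` through the `proxies` argument in the same way as the hand-written one.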
Next, we need to parse the HTML content using lxml. We will use the fromstring function from the lxml.html module to parse the page's HTML content into an Element object.
from lxml.html import fromstring
# Parse the HTML content using lxml
parser = fromstring(response.text)
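To see what `fromstring` and an XPath query return without making a network request, it can help to try them on a small literal string first. The markup and class names below are invented for illustration. The real Zillow page is far larger and uses different, generated class names.

```python
from lxml.html import fromstring

# Hypothetical, simplified markup; not the actual Zillow page structure.
sample_html = """
<html><body>
  <h1 class="listing-title">1234 Main St, Some City, CA 90210</h1>
  <span class="price">$2,500/mo</span>
  <span class="price">$450,000</span>
</body></html>
"""

tree = fromstring(sample_html)

# xpath() returns a list of matching text nodes, in document order.
title = " ".join(tree.xpath('//h1[@class="listing-title"]/text()'))
prices = tree.xpath('//span[@class="price"]//text()')

print(title)
print(prices)
```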
Now we will extract specific data points, such as the property title, rent estimate price, and assessment price, by running XPath queries against the parsed HTML content.
# Extract the property title using XPath
title = ' '.join(parser.xpath('//h1[@class="Text-c11n-8-99-3__sc-aiai24-0 dFxMdJ"]/text()'))
# Extract the property's rent estimate price using XPath
rent_estimate_price = parser.xpath('//span[@class="Text-c11n-8-99-3__sc-aiai24-0 dFhjAe"]//text()')[-2]
# Extract the property's assessment price using XPath
assessment_price = parser.xpath('//span[@class="Text-c11n-8-99-3__sc-aiai24-0 dFhjAe"]//text()')[-1]
# Store the extracted data in a dictionary
property_data = {
    'title': title,
    'Rent estimate price': rent_estimate_price,
    'Assessment price': assessment_price
}
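Indexing an XPath result with `[-2]` or `[-1]` raises an `IndexError` when the page returns fewer matches than expected, for example if the class names change. As a rough sketch, the two hypothetical helpers below (`pick` and `parse_price`, neither of which is part of the original code) return `None` in that case and convert a price string to an integer. The sample values are made up.

```python
import re


def pick(values, index):
    """Return values[index], or None when the XPath result is too short."""
    try:
        return values[index]
    except IndexError:
        return None


def parse_price(text):
    """Convert a string such as '$2,500/mo' to an int; None if no digits found."""
    if text is None:
        return None
    match = re.search(r"\d[\d,]*", text)
    if match is None:
        return None
    return int(match.group().replace(",", ""))


# Stand-in for the list returned by parser.xpath(...); values are made up.
texts = ["$2,500/mo", "$450,000"]
print(parse_price(pick(texts, -2)))
print(parse_price(pick(texts, -1)))
print(parse_price(pick([], -1)))
```

Storing numeric values rather than raw strings can make later filtering and sorting simpler, though whether that is worth doing depends on how the data will be used.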
Finally, we will save the extracted data to a JSON file for further processing.
import json
# Define the output JSON file name
output_file = 'zillow_properties.json'
# Open the file in write mode and dump the data
# (property_data is the dictionary built in the previous step; the complete
# script at the end dumps the all_properties list instead)
with open(output_file, 'w') as f:
    json.dump(property_data, f, indent=4)
print(f"Scraped data saved to {output_file}")
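Addresses and other listing text may contain non-ASCII characters, so it can be worth writing the file with an explicit encoding and reading it back to confirm the round trip. This is a small sketch using a made-up record in the same shape as `property_data`. The temporary directory is only there so the example does not overwrite a real output file.

```python
import json
import os
import tempfile

# Made-up record in the same shape as property_data.
records = [
    {
        "title": "1234 Main St",
        "Rent estimate price": "$2,500/mo",
        "Assessment price": "$450,000",
    }
]

path = os.path.join(tempfile.mkdtemp(), "zillow_properties.json")

# An explicit encoding plus ensure_ascii=False keeps non-ASCII text readable.
with open(path, "w", encoding="utf-8") as f:
    json.dump(records, f, indent=4, ensure_ascii=False)

# Read the file back to confirm it contains what was written.
with open(path, encoding="utf-8") as f:
    loaded = json.load(f)

print(loaded == records)
```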
To scrape multiple property listings, we will iterate over a list of URLs and repeat the data extraction process for each one.
# List of URLs to scrape
urls = [
    "https://www.zillow.com/homedetails/1234-Main-St-Some-City-CA-90210/12345678_zpid/",
    "https://www.zillow.com/homedetails/5678-Another-St-Some-City-CA-90210/87654321_zpid/"
]
# List to store the data for every property
all_properties = []
for url in urls:
    # Send the HTTP GET request with headers and proxies
    response = requests.get(url, headers=headers, proxies=proxies)
    response.raise_for_status()  # Ensure we got a valid response
    # Parse the HTML content using lxml
    parser = fromstring(response.text)
    # Extract the data using XPath
    title = ' '.join(parser.xpath('//h1[@class="Text-c11n-8-99-3__sc-aiai24-0 dFxMdJ"]/text()'))
    rent_estimate_price = parser.xpath('//span[@class="Text-c11n-8-99-3__sc-aiai24-0 dFhjAe"]//text()')[-2]
    assessment_price = parser.xpath('//span[@class="Text-c11n-8-99-3__sc-aiai24-0 dFhjAe"]//text()')[-1]
    # Store the extracted data in a dictionary
    property_data = {
        'title': title,
        'Rent estimate price': rent_estimate_price,
        'Assessment price': assessment_price
    }
    # Append the property data to the list
    all_properties.append(property_data)
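When looping over many URLs, it is common to pause between requests and to let one failed listing be skipped rather than stop the whole run. The sketch below shows that pattern in isolation. `scrape_all`, `fake_fetch`, and the two-second default delay are assumptions made for illustration, and `fake_fetch` stands in for the request-and-parse step so the example runs without network access.

```python
import time


def scrape_all(urls, fetch_one, delay_seconds=2.0, sleep=time.sleep):
    """Call fetch_one(url) for each URL, pausing between calls and skipping failures."""
    results, failed = [], []
    for position, url in enumerate(urls):
        if position > 0:
            sleep(delay_seconds)  # pause before every request except the first
        try:
            results.append(fetch_one(url))
        except Exception as exc:  # deliberately broad in this sketch
            failed.append((url, str(exc)))
    return results, failed


def fake_fetch(url):
    """Stand-in for the request-and-parse step; fails on one made-up URL."""
    if "bad" in url:
        raise ValueError("no data")
    return {"url": url}


results, failed = scrape_all(
    ["u1", "bad-url", "u2"], fake_fetch, delay_seconds=0, sleep=lambda s: None
)
print(results)
print(failed)
```

In a real run, `fetch_one` would wrap the `requests.get`, `fromstring`, and XPath steps, and the `failed` list gives a record of which URLs to retry later.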
Here is the complete code to scrape Zillow property data and save it to a JSON file:
import requests
from lxml.html import fromstring
import json
# Define the target URLs for the Zillow property listings
urls = [
    "https://www.zillow.com/homedetails/1234-Main-St-Some-City-CA-90210/12345678_zpid/",
    "https://www.zillow.com/homedetails/5678-Another-St-Some-City-CA-90210/87654321_zpid/"
]
# Set up the request headers to mimic a browser request
headers = {
    'accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.7',
    'accept-language': 'en-US,en;q=0.9',
    'cache-control': 'no-cache',
    'dnt': '1',
    'pragma': 'no-cache',
    'sec-ch-ua': '"Not/A)Brand";v="99", "Google Chrome";v="91", "Chromium";v="91"',
    'sec-ch-ua-mobile': '?0',
    'sec-fetch-dest': 'document',
    'sec-fetch-mode': 'navigate',
    'sec-fetch-site': 'same-origin',
    'sec-fetch-user': '?1',
    'upgrade-insecure-requests': '1',
    'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36',
}
# Optionally, set up proxies to avoid IP blocking
proxies = {
    'http': 'http://username:password@your_proxy_address',
    'https': 'https://username:password@your_proxy_address',
}
# List to store the data for every property
all_properties = []
for url in urls:
    try:
        # Send the HTTP GET request with headers and proxies
        response = requests.get(url, headers=headers, proxies=proxies)
        response.raise_for_status()  # Ensure we got a valid response
        # Parse the HTML content using lxml
        parser = fromstring(response.text)
        # Extract the data using XPath
        title = ' '.join(parser.xpath('//h1[@class="Text-c11n-8-99-3__sc-aiai24-0 dFxMdJ"]/text()'))
        rent_estimate_price = parser.xpath('//span[@class="Text-c11n-8-99-3__sc-aiai24-0 dFhjAe"]//text()')[-2]
        assessment_price = parser.xpath('//span[@class="Text-c11n-8-99-3__sc-aiai24-0 dFhjAe"]//text()')[-1]
        # Store the extracted data in a dictionary
        property_data = {
            'title': title,
            'Rent estimate price': rent_estimate_price,
            'Assessment price': assessment_price
        }
        # Append the property data to the list
        all_properties.append(property_data)
    except requests.exceptions.HTTPError as e:
        print(f"HTTP error occurred: {e}")
    except requests.exceptions.RequestException as e:
        print(f"An error occurred: {e}")
# Define the output JSON file name
output_file = 'zillow_properties.json'
# Open the file in write mode and dump the data
with open(output_file, 'w') as f:
    json.dump(all_properties, f, indent=4)
print(f"Scraped data saved to {output_file}")
By understanding the structure of the HTML pages and leveraging libraries such as requests and lxml, you can extract property data efficiently. Using proxies and rotating user agents lets you send a large volume of requests to sites like Zillow with a lower risk of being blocked. For this kind of work, static ISP proxies or rotating residential proxies are generally considered the best options.
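The code in this post sends a single fixed user agent, so here is one possible way to rotate through several. This is only a sketch: the `next_headers` helper and the second user-agent string are illustrative additions, and a real setup would likely also keep related headers such as `sec-ch-ua` consistent with whichever browser string is sent.

```python
from itertools import cycle

# Example user-agent strings; any current browser strings could be used instead.
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/14.1.1 Safari/605.1.15",
]

# cycle() yields the strings in order and starts over after the last one.
_agents = cycle(USER_AGENTS)


def next_headers(base_headers):
    """Return a copy of base_headers with the next user agent in the rotation."""
    headers = dict(base_headers)
    headers["user-agent"] = next(_agents)
    return headers


base = {"accept-language": "en-US,en;q=0.9"}
first = next_headers(base)
second = next_headers(base)
third = next_headers(base)
print(first["user-agent"] != second["user-agent"])
print(first["user-agent"] == third["user-agent"])
```

Each call returns a fresh dictionary, so the base headers stay unchanged and the result can be passed straight to `requests.get(url, headers=...)`.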