Web analytics is not only about Google. Bing offers an alternative view of the SERP that is useful for SEO research, link prospecting, brand monitoring, competitive analysis, and content research. Python is a great tool for automating this: a mature ecosystem, simple syntax, and strong libraries for HTML parsing and working with JSON let you scrape Bing search results faster and more conveniently.
Bing uses its own ranking guidelines and quality signals, so its results often differ from Google's. That is valuable for finding additional opportunities in organic search and in long-tail queries. In its webmaster recommendations, Bing emphasizes relevance, quality/credibility, user engagement, freshness, geographic factors, and page speed: a mix of signals that differs from Google's. That is why some pages rank higher specifically on Bing.
Practical use cases for scraping Bing search results:
From a "classic" SERP you can reliably extract:
Important: Bing's markup changes from time to time, so you may need to adjust the selectors in the code below.
Install the basics:
pip install requests beautifulsoup4 lxml fake-useragent selenium
We will use this as a baseline to illustrate the workflow: sending GET requests, setting a User-Agent, parsing result cards, and collecting the title, URL, snippet, and position.
import random
from typing import List, Dict

import requests
from bs4 import BeautifulSoup

BING_URL = "https://www.bing.com/search"

HEADERS_POOL = [
    # You can add more, or use fake-useragent
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/124.0.0.0 Safari/537.36",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 13_5) AppleWebKit/605.1.15 "
    "(KHTML, like Gecko) Version/17.0 Safari/605.1.15",
]


def fetch_serp(query: str, count: int = 10, first: int = 1,
               proxy: str | None = None) -> List[Dict]:
    """
    Returns a list of results: title, url, snippet, position.
    `first` is the starting position (pagination), `count` is how many records to fetch.
    """
    params = {"q": query, "count": count, "first": first}
    headers = {"User-Agent": random.choice(HEADERS_POOL)}
    proxies = {"http": proxy, "https": proxy} if proxy else None

    resp = requests.get(BING_URL, params=params, headers=headers,
                        proxies=proxies, timeout=15)
    resp.raise_for_status()

    soup = BeautifulSoup(resp.text, "lxml")
    # Typical Bing markup: <li class="b_algo"> ... <h2><a href="">Title</a></h2>
    items = []
    for idx, li in enumerate(soup.select("li.b_algo"), start=first):
        a = li.select_one("h2 a")
        if not a:
            continue
        title = a.get_text(strip=True)
        url = a.get("href")
        # Snippet is often in .b_caption p or simply the first <p>
        sn_el = li.select_one(".b_caption p") or li.select_one("p")
        snippet = sn_el.get_text(" ", strip=True) if sn_el else ""
        items.append({
            "position": idx,
            "title": title,
            "url": url,
            "snippet": snippet,
        })
    return items


if __name__ == "__main__":
    data = fetch_serp("python web scraping tutorial", count=10)
    for row in data:
        print(f"{row['position']:>2}. {row['title']} -- {row['url']}")
        print(f"    {row['snippet']}\n")
Explanation: the script picks a random User-Agent from a small pool, sends a GET request to bing.com/search with the `q`, `count`, and `first` parameters, parses the `li.b_algo` result cards, and returns the title, URL, snippet, and position for each one.
Microsoft retired Bing's public Search APIs in August 2025 and recommends migrating to Grounding with Bing Search within Azure AI Agents.
What this means in practice:
Use third-party SERP APIs/platforms (e.g. the Apify Bing Search Scraper) that return structured results: title, URL, snippet, position, and so on.
A minimal Apify request example:
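Because Bing's markup shifts over time, a defensive variant can try several candidate selectors in order and use the first one that matches. The selector list below is an assumption to verify against the live page, not an exhaustive inventory of Bing's layouts:

```python
from bs4 import BeautifulSoup

# Candidate selectors for organic result cards, ordered by likelihood.
# These are assumptions based on common Bing markup; verify before relying on them.
CARD_SELECTORS = ["li.b_algo", "div.b_algo", "#b_results > li"]


def select_result_cards(html: str) -> list:
    """Return result cards using the first selector that yields matches."""
    soup = BeautifulSoup(html, "html.parser")
    for sel in CARD_SELECTORS:
        cards = soup.select(sel)
        if cards:
            return cards
    return []  # nothing matched: the markup probably changed again
```

If this function starts returning an empty list for queries that clearly have results, that is your signal to inspect the page and extend `CARD_SELECTORS`.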
import requests

API_TOKEN = "apify_xxx"  # store in ENV
# In Apify REST paths the "/" in an actor ID is replaced with "~"
actor = "tri_angle~bing-search-scraper"

payload = {
    "queries": ["python web scraping tutorial"],
    "countryCode": "US",
    "includeUnfilteredResults": False,
}

r = requests.post(
    f"https://api.apify.com/v2/acts/{actor}/runs?token={API_TOKEN}",
    json=payload, timeout=30,
)
run = r.json()
# Retrieve dataset items using run['data']['defaultDatasetId']
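To complete that last step, the run usually has to finish before its dataset is populated. A minimal polling sketch follows; the endpoint paths are based on Apify's REST API, and `API_TOKEN` is the same illustrative placeholder as above:

```python
import time
import requests

API_TOKEN = "apify_xxx"  # store in ENV in real code


def wait_and_fetch_items(run: dict, poll_secs: float = 5.0, max_polls: int = 60) -> list:
    """Poll an Apify actor run until it terminates, then download its dataset items."""
    run_id = run["data"]["id"]
    dataset_id = run["data"]["defaultDatasetId"]
    for _ in range(max_polls):
        status = requests.get(
            f"https://api.apify.com/v2/actor-runs/{run_id}",
            params={"token": API_TOKEN}, timeout=30,
        ).json()["data"]["status"]
        if status in ("SUCCEEDED", "FAILED", "ABORTED", "TIMED-OUT"):
            break
        time.sleep(poll_secs)
    items = requests.get(
        f"https://api.apify.com/v2/datasets/{dataset_id}/items",
        params={"token": API_TOKEN, "format": "json"}, timeout=60,
    )
    items.raise_for_status()
    return items.json()
```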
According to Apify's documentation, the scraper supports organic results, People Also Ask, related queries, and more. Make sure your use case complies with the platform's rules and the laws of your jurisdiction.
Tip: if you work in the Azure AI Agents stack and only need grounded references for an LLM (rather than raw JSON), read the guide on Grounding with Bing Search.
When the SERP contains carousels, interactive blocks, or JavaScript-rendered content, switch to Selenium (headless Chrome/Firefox).
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.chrome.options import Options


def selenium_bing(query: str, headless: bool = True):
    opts = Options()
    if headless:
        opts.add_argument("--headless=new")
        opts.add_argument("--disable-gpu")
        opts.add_argument("--no-sandbox")

    with webdriver.Chrome(options=opts) as driver:
        driver.get("https://www.bing.com/")
        box = driver.find_element(By.NAME, "q")
        box.send_keys(query)
        box.submit()
        # Consider adding explicit waits via WebDriverWait
        cards = driver.find_elements(By.CSS_SELECTOR, "li.b_algo h2 a")
        results = []
        for i, a in enumerate(cards, start=1):
            results.append({"position": i, "title": a.text,
                            "url": a.get_attribute("href")})
    return results


if __name__ == "__main__":
    print(selenium_bing("site:docs.python.org requests headers"))
See the official Selenium documentation for driver installation and WebDriverWait examples.
For the final implementation, we will scrape Bing directly from the HTML:
This way you do not need Microsoft accounts and are not tied to paid third-party APIs. To select results, we target the result-card container li.b_algo, which Bing commonly uses for organic blocks.
from __future__ import annotations

import argparse
import csv
import dataclasses
import pathlib
import random
import sys
import time
from typing import List, Optional, Tuple

import requests
from bs4 import BeautifulSoup, FeatureNotFound

BING_URL = "https://www.bing.com/search"

# Pool of user agents
UA_POOL = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/127.0.0.0 Safari/537.36",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 13_6) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/17.5 Safari/605.1.15",
    "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/126.0.0.0 Safari/537.36",
]


@dataclasses.dataclass
class SerpItem:
    position: int
    title: str
    url: str
    snippet: str


def build_session(proxy: Optional[str] = None) -> requests.Session:
    """Create a session with baseline headers and an optional proxy."""
    s = requests.Session()
    s.headers.update(
        {
            "User-Agent": random.choice(UA_POOL),
            "Accept-Language": "uk-UA,uk;q=0.9,en;q=0.8",
            "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
        }
    )
    if proxy:
        # Requests proxy dict format: {'http': 'http://host:port', 'https': 'http://host:port'}
        s.proxies.update({"http": proxy, "https": proxy})
    return s


def _soup_with_fallback(html: str) -> BeautifulSoup:
    """Parse HTML with a forgiving fallback chain: lxml -> html.parser -> html5lib (if available)."""
    for parser in ("lxml", "html.parser", "html5lib"):
        try:
            return BeautifulSoup(html, parser)
        except FeatureNotFound:
            continue
    # If none are installed, bs4 will raise; let it propagate
    return BeautifulSoup(html, "html.parser")


def parse_serp_html(html: str, start_pos: int) -> List[SerpItem]:
    """Extract organic results from Bing SERP HTML."""
    soup = _soup_with_fallback(html)
    items: List[SerpItem] = []
    # Organic blocks typically look like <li class="b_algo"> with h2>a and a snippet under .b_caption p or the first <p>.
    for i, li in enumerate(soup.select("li.b_algo"), start=start_pos):
        a = li.select_one("h2 > a")
        if not a:
            continue
        title = (a.get_text(strip=True) or "").strip()
        url = a.get("href") or ""
        p = li.select_one(".b_caption p") or li.select_one("p")
        snippet = (p.get_text(" ", strip=True) if p else "").strip()
        items.append(SerpItem(position=i, title=title, url=url, snippet=snippet))
    return items


def fetch_bing_page(
    session: requests.Session,
    query: str,
    first: int = 1,
    count: int = 10,
    cc: str = "UA",
    setlang: str = "uk",
    timeout: int = 20,
) -> List[SerpItem]:
    """Download one results page and return parsed items."""
    params = {
        "q": query,
        "count": count,      # 10, 15, 20...
        "first": first,      # 1, 11, 21...
        "cc": cc,            # country code for results
        "setlang": setlang,  # interface/snippet language
    }
    r = session.get(BING_URL, params=params, timeout=timeout)
    r.raise_for_status()
    return parse_serp_html(r.text, start_pos=first)


def search_bing(
    query: str,
    pages: int = 1,
    count: int = 10,
    pause_range: Tuple[float, float] = (1.2, 2.7),
    proxy: Optional[str] = None,
    cc: str = "UA",
    setlang: str = "uk",
    timeout: int = 20,
) -> List[SerpItem]:
    """Iterate over pages and return an aggregated list of results."""
    session = build_session(proxy=proxy)
    all_items: List[SerpItem] = []
    first = 1
    for _ in range(pages):
        items = fetch_bing_page(
            session, query, first=first, count=count, cc=cc, setlang=setlang, timeout=timeout
        )
        all_items.extend(items)
        time.sleep(random.uniform(*pause_range))  # polite delay
        first += count
    return all_items


def _normalize_cell(s: str) -> str:
    """Optional: collapse internal whitespace so simple viewers show one-line cells."""
    # Convert tabs/newlines/multiple spaces to a single space
    return " ".join((s or "").split())


def save_csv(
    items: List[SerpItem],
    path: str,
    excel_friendly: bool = False,
    normalize: bool = False,
    delimiter: str = ",",
) -> int:
    """
    Write results to CSV.
    - excel_friendly=True -> write UTF-8 with BOM (utf-8-sig) so Excel auto-detects Unicode.
    - normalize=True -> collapse whitespace inside string fields.
    - delimiter -> change if your consumer expects ';', etc.
    Returns the number of rows written (excluding header).
    """
    p = pathlib.Path(path)
    p.parent.mkdir(parents=True, exist_ok=True)
    encoding = "utf-8-sig" if excel_friendly else "utf-8"
    # newline='' is required so Python's csv handles line endings correctly on all platforms
    with p.open("w", newline="", encoding=encoding) as f:
        writer = csv.DictWriter(
            f,
            fieldnames=["position", "title", "url", "snippet"],
            delimiter=delimiter,
            quoting=csv.QUOTE_MINIMAL,
        )
        writer.writeheader()
        for it in items:
            row = dataclasses.asdict(it)
            if normalize:
                row = {k: _normalize_cell(v) if isinstance(v, str) else v for k, v in row.items()}
            writer.writerow(row)
    return len(items)


def main() -> int:
    ap = argparse.ArgumentParser(description="Bing SERP scraper (Requests + BS4)")
    ap.add_argument("-q", "--query", required=True, help="Search query")
    ap.add_argument("--pages", type=int, default=1, help="Number of pages (x count)")
    ap.add_argument("--count", type=int, default=10, help="Results per page")
    ap.add_argument("--cc", default="UA", help="Country code for results (cc)")
    ap.add_argument("--setlang", default="uk", help="Interface/snippet language (setlang)")
    ap.add_argument("--proxy", help="Proxy, e.g. http://user:pass@host:port")
    ap.add_argument("--csv", help="Path to CSV to save results")
    ap.add_argument(
        "--excel-friendly",
        action="store_true",
        help="Add BOM (UTF-8-SIG) so Excel opens the file correctly",
    )
    ap.add_argument(
        "--normalize-cells",
        action="store_true",
        help="Remove line breaks and extra spaces in cells",
    )
    ap.add_argument(
        "--delimiter",
        default=",",
        help="CSV delimiter (default ','); e.g.: ';'",
    )
    args = ap.parse_args()

    try:
        items = search_bing(
            args.query,
            pages=args.pages,
            count=args.count,
            proxy=args.proxy,
            cc=args.cc,
            setlang=args.setlang,
        )
    except requests.HTTPError as e:
        print(f"[ERROR] HTTP error: {e}", file=sys.stderr)
        return 2
    except requests.RequestException as e:
        print(f"[ERROR] Network error: {e}", file=sys.stderr)
        return 2

    if args.csv:
        try:
            n = save_csv(
                items,
                args.csv,
                excel_friendly=args.excel_friendly,
                normalize=args.normalize_cells,
                delimiter=args.delimiter,
            )
            print(f"Saved {n} rows to {args.csv}")
        except OSError as e:
            print(f"[ERROR] Could not write CSV to {args.csv}: {e}", file=sys.stderr)
            return 3
    else:
        for it in items:
            print(f"{it.position:>2}. {it.title} -- {it.url}")
            if it.snippet:
                print("   ", it.snippet[:180])
    return 0


if __name__ == "__main__":
    sys.exit(main())
Example usage with extra parameters and a proxy:
python bing_scraper.py -q "Python web scraping" --pages 3 --csv out.csv \
--proxy "http://username:password@proxy:port"
What the script does: builds a requests.Session with baseline headers and an optional proxy, paginates through the SERP via the `first` and `count` parameters, parses organic `li.b_algo` results into dataclasses, and either prints them or saves them to CSV (with optional Excel-friendly BOM and cell normalization).
Stability tips:
Where to read more about the tools
Tip: if you need proxy infrastructure for more stable data collection, check out the best proxies for Bing.
Important principles so your scraper does not die in its first run:
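Those principles can be sketched as a small retry wrapper: rotate the User-Agent per attempt, back off exponentially with jitter between attempts, and treat 429/5xx responses as retryable. The names here (`fetch_with_retries`, the UA strings) are illustrative, not part of any library:

```python
import random
import time

import requests

# Illustrative pool; in practice keep this list fresh
UA_POOL = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/127.0.0.0 Safari/537.36",
    "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/126.0.0.0 Safari/537.36",
]
RETRYABLE = {429, 500, 502, 503, 504}


def fetch_with_retries(url: str, params: dict,
                       max_tries: int = 4, base_delay: float = 1.5) -> requests.Response:
    """GET with per-attempt UA rotation and exponential backoff with jitter."""
    last_exc = None
    for attempt in range(max_tries):
        try:
            resp = requests.get(url, params=params,
                                headers={"User-Agent": random.choice(UA_POOL)},
                                timeout=15)
            if resp.status_code not in RETRYABLE:
                resp.raise_for_status()  # non-retryable 4xx raises immediately
                return resp
        except requests.RequestException as exc:
            last_exc = exc  # network error: remember and retry
        # Exponential backoff with jitter: ~1.5s, 3s, 6s... plus noise
        time.sleep(base_delay * (2 ** attempt) + random.uniform(0, 0.5))
    raise RuntimeError(f"Giving up on {url} after {max_tries} attempts") from last_exc
```

Combined with modest request rates and session reuse, this keeps transient blocks from killing an entire crawl.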
Scraping Bing is useful when you want to extend research beyond Google, collect additional donor domains, track alternative SERP features, and get an independent view of the landscape. For a stable, "official" integration, Microsoft recommends Grounding with Bing Search in Azure AI Agents; it is safer from a terms-of-use standpoint but does not return raw SERP JSON. If your task is focused on extracting structured results, choose direct HTML parsing via Requests/BS4 or Selenium, or use a specialized SERP API. Pick the right tool for the job: fast HTML parsing for prototypes, agents for LLM grounding, and SERP APIs for larger-scale collection.