How to Scrape Bing Search Results with Python

Search analysis isn’t limited to Google. Bing offers an alternative view of the SERP that’s useful for SEO research, link prospecting, brand monitoring, competitive analysis, and content research. Python is an ideal tool for this kind of automation: a mature ecosystem, straightforward syntax, and robust libraries for HTML parsing and JSON handling let you scrape Bing search results quickly and conveniently.

Why Focus on Bing Rather Than Google?

Bing uses its own ranking guidelines and quality signals, so its results often differ from Google’s. That’s valuable for uncovering additional opportunities in organic search and long-tail queries. In its webmaster guidelines, Bing emphasizes relevance, quality and trust, user engagement, freshness, geo factors, and page speed: a different balance of signals than Google’s. That’s why some pages rank higher specifically on Bing.

Practical use cases for scraping Bing search results:

  • Expanding your link-building donor list: Bing sometimes elevates sites that don’t appear in Google’s top 10.
  • Tracking PAA (“People also ask”) and Bing’s universal SERP elements (video, carousels) to adjust your content strategy.

What Data Can You Extract From Bing Search?

From a “classic” SERP you can reliably extract:

  • Title;
  • URL (document link);
  • Snippet (description);
  • Position in the results (ordinal index);
  • Some universal results: “Related/People also ask”, embedded image/video results (when included directly in the main SERP).
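
To make that concrete, a single organic result in the scripts below is represented as a small dictionary with exactly these fields (the values here are made up for illustration):

# Shape of one extracted result (values are illustrative; keys match the scripts below)
example_item = {
    "position": 1,                                    # ordinal index in the SERP
    "title": "Python Web Scraping Tutorial",          # link text from <h2><a>
    "url": "https://example.com/tutorial",            # the href of the result
    "snippet": "Learn how to scrape pages with ...",  # description text
}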

Important: Bing’s markup changes periodically, so the selectors in the code below may need tweaks.

Legal and Ethical Considerations When Scraping Bing Search

  • Follow Microsoft’s Terms of Use: for “official” access to web data, Microsoft now offers Grounding with Bing Search as part of Azure AI Agents. The public Bing Search APIs were fully sunset on August 11, 2025.
  • Grounding with Bing Search has its own TOU and constraints: it’s used through Azure agents, and results come back in the agent’s responses rather than as “raw” JSON SERP data.
  • Respect robots.txt and avoid overloading hosts—adhering to robots is baseline scraping ethics.
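
As a minimal illustration of the robots.txt point, Python’s built-in urllib.robotparser can tell you whether a given user agent may fetch a path (the bot name below is just a placeholder):

from urllib import robotparser

# Minimal robots.txt check using the standard library
rp = robotparser.RobotFileParser()
rp.set_url("https://www.bing.com/robots.txt")
rp.read()

# can_fetch() returns True if the given user agent is allowed to request the URL
print(rp.can_fetch("MyResearchBot/1.0", "https://www.bing.com/search?q=python"))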

Setting up Your Python Environment for Scraping

Install the basics:

pip install requests beautifulsoup4 lxml fake-useragent selenium
  • requests — HTTP client (lets you set headers such as User-Agent);
  • beautifulsoup4 + lxml — HTML parsing;
  • fake-useragent — random UA generation, or build your own list (see the quick sketch after this list);
  • selenium — render dynamic blocks when needed.
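
As a quick sketch of the fake-useragent option mentioned above, generating a random User-Agent per request looks like this:

from fake_useragent import UserAgent

ua = UserAgent()
headers = {"User-Agent": ua.random}  # a different browser-like UA on each access
print(headers)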

Method 1 – Scraping Bing via Requests and BeautifulSoup

We’ll use this as the baseline to demonstrate the workflow: issue GET requests, set a User-Agent, parse result cards, and collect title, URL, snippet, and position.

import time
import random
from typing import List, Dict
import requests
from bs4 import BeautifulSoup

BING_URL = "https://www.bing.com/search"

HEADERS_POOL = [
    # You can add more — or use fake-useragent
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/124.0.0.0 Safari/537.36",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 13_5) AppleWebKit/605.1.15 "
    "(KHTML, like Gecko) Version/17.0 Safari/605.1.15",
]

def fetch_serp(query: str, count: int = 10, first: int = 1,
               proxy: str | None = None) -> List[Dict]:
    """
Returns a list of results: title, url, snippet, position.
`first` — starting position (pagination), `count` — how many records to fetch.

    """
    params = {"q": query, "count": count, "first": first}
    headers = {"User-Agent": random.choice(HEADERS_POOL)}
    proxies = {"http": proxy, "https": proxy} if proxy else None

    resp = requests.get(BING_URL, params=params, headers=headers,
                        proxies=proxies, timeout=15)
    resp.raise_for_status()
    soup = BeautifulSoup(resp.text, "lxml")

    # Typical Bing markup: <li class="b_algo"> ... <h2><a href="">Title</a></h2>
    items = []
    for idx, li in enumerate(soup.select("li.b_algo"), start=first):
        a = li.select_one("h2 a")
        if not a:
            continue
        title = a.get_text(strip=True)
        url = a.get("href")
        # Snippet is often in .b_caption p or simply the first <p>
        sn_el = li.select_one(".b_caption p") or li.select_one("p")
        snippet = sn_el.get_text(" ", strip=True) if sn_el else ""
        items.append({
            "position": idx,
            "title": title,
            "url": url,
            "snippet": snippet
        })
    return items

if __name__ == "__main__":
    data = fetch_serp("python web scraping tutorial", count=10)
    for row in data:
        print(f"{row['position']:>2}. {row['title']} -- {row['url']}")
        print(f"   {row['snippet']}\n")

Explanation:

  • Use the count/first parameters for pagination (see the short sketch after this list).
  • Selectors li.b_algo h2 a and .b_caption p are baseline; the layout can change (inspect in DevTools).
  • Add a proxy when needed and regulate pauses between requests.
  • We’ll enhance this example a bit further below, since it’s the most effective approach for our purposes under current conditions.
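
For example, paginating with fetch_serp from above is just a matter of stepping first by count and pausing between requests (a small usage sketch):

# Fetch the first three pages (positions 1-30) with randomized pauses
all_rows = []
for first in (1, 11, 21):
    all_rows.extend(fetch_serp("python web scraping tutorial", count=10, first=first))
    time.sleep(random.uniform(1.5, 3.0))  # polite delay between pages
print(f"Collected {len(all_rows)} results")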

Method 2 – Scraping Bing Search Results via an API (Status in 2025)

Microsoft’s public Bing Search APIs were retired in August 2025. Microsoft recommends migrating to Grounding with Bing Search within Azure AI Agents.

What this means in practice

  • The classic REST endpoint with “raw” JSON SERP data is no longer available to most developers.
  • Grounding with Bing Search is connected as a tool inside an Azure agent; the agent can “look up” the web and return a synthesized answer. The service has its own TOU and specifics: it isn’t designed for bulk extraction of raw SERP results.

Alternative for raw SERP in JSON

Use third‑party SERP APIs/platforms (e.g., Apify Bing Search Scraper) that return structured results: title, URL, snippet, position, etc.

Minimal Apify request example:

import requests

API_TOKEN = "apify_xxx"  # store in ENV
actor = "tri_angle/bing-search-scraper"
payload = {
    "queries": ["python web scraping tutorial"],
    "countryCode": "US",
    "includeUnfilteredResults": False
}

r = requests.post(
    f"https://api.apify.com/v2/acts/{actor}/runs?token={API_TOKEN}",
    json=payload, timeout=30
)
run = r.json()
# Retrieve dataset items using run['data']['defaultDatasetId']
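
Once the run finishes, the structured results live in the run’s default dataset. A minimal follow-up request might look like the sketch below (it assumes the run has already completed; the item field names depend on the actor’s output schema):

# Fetch the run's default dataset once the actor has finished
dataset_id = run["data"]["defaultDatasetId"]
items = requests.get(
    f"https://api.apify.com/v2/datasets/{dataset_id}/items?token={API_TOKEN}",
    timeout=30,
).json()
for item in items:
    print(item.get("title"), item.get("url"))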

Apify documents support for organic results, PAA, related queries, and more. Make sure your use case complies with platform rules and with the laws of your jurisdiction.

Tip: If you work in the Azure AI Agents stack and only need grounded references for an LLM (rather than raw JSON), read the guide on Grounding with Bing Search.

Method 3 – Parsing Dynamic Content with Selenium

When the SERP includes carousels, interactive blocks, or content rendered by JavaScript, switch to Selenium (Headless Chrome/Firefox).

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.chrome.options import Options

def selenium_bing(query: str, headless: bool = True):
    opts = Options()
    if headless:
        opts.add_argument("--headless=new")
    opts.add_argument("--disable-gpu")
    opts.add_argument("--no-sandbox")
    with webdriver.Chrome(options=opts) as driver:
        driver.get("https://www.bing.com/")
        box = driver.find_element(By.NAME, "q")
        box.send_keys(query)
        box.submit()

        # Consider adding explicit waits via WebDriverWait
        cards = driver.find_elements(By.CSS_SELECTOR, "li.b_algo h2 a")
        results = []
        for i, a in enumerate(cards, start=1):
            results.append({"position": i, "title": a.text, "url": a.get_attribute("href")})
        return results

if __name__ == "__main__":
    print(selenium_bing("site:docs.python.org requests headers"))

Refer to the official Selenium docs for driver installation and WebDriverWait examples.
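
For instance, the find_elements call above can be wrapped in an explicit wait so the script doesn’t race the page render; here is a small sketch with WebDriverWait and expected_conditions (meant to sit inside selenium_bing, where driver is in scope):

from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

# Wait up to 10 seconds for at least one organic result link to appear
wait = WebDriverWait(driver, 10)
cards = wait.until(
    EC.presence_of_all_elements_located((By.CSS_SELECTOR, "li.b_algo h2 a"))
)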

Practical Solution: Parsing Strategy and Example Code

For the final implementation, we’ll perform Bing scraping directly from HTML:

  1. Send HTTP requests to https://www.bing.com/search.
  2. Set a User-Agent.
  3. Parse HTML via BeautifulSoup + lxml to extract titles, URLs, and snippets.

This way you don’t need Microsoft accounts and you’re not tied to third‑party paid APIs. For result selection we use the result‑card container li.b_algo, which is commonly used for Bing’s organic blocks.

Working Example (pagination, delays, optional proxy)

from __future__ import annotations

import argparse
import csv
import dataclasses
import pathlib
import random
import sys
import time
from typing import List, Optional, Tuple

import requests
from bs4 import BeautifulSoup, FeatureNotFound

BING_URL = "https://www.bing.com/search"

# Pool of user agents
UA_POOL = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/127.0.0.0 Safari/537.36",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 13_6) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/17.5 Safari/605.1.15",
    "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/126.0.0.0 Safari/537.36",
]

@dataclasses.dataclass
class SerpItem:
    position: int
    title: str
    url: str
    snippet: str


def build_session(proxy: Optional[str] = None) -> requests.Session:
    """Create a session with baseline headers and an optional proxy."""
    s = requests.Session()
    s.headers.update(
        {
            "User-Agent": random.choice(UA_POOL),
            "Accept-Language": "uk-UA,uk;q=0.9,en;q=0.8",
            "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
        }
    )
    if proxy:
        # Requests proxy dict format: {'http': 'http://host:port', 'https': 'http://host:port'}
        s.proxies.update({"http": proxy, "https": proxy})
    return s


def _soup_with_fallback(html: str) -> BeautifulSoup:
    """Parse HTML with a forgiving fallback chain: lxml -> html.parser -> html5lib (if available)."""
    for parser in ("lxml", "html.parser", "html5lib"):
        try:
            return BeautifulSoup(html, parser)
        except FeatureNotFound:
            continue
    # If none are installed, bs4 will raise; let it propagate
    return BeautifulSoup(html, "html.parser")


def parse_serp_html(html: str, start_pos: int) -> List[SerpItem]:
    """Extract organic results from Bing SERP HTML."""
    soup = _soup_with_fallback(html)
    items: List[SerpItem] = []

    # Organic blocks typically look like <li class="b_algo"> with h2>a and a snippet under .b_caption p or the first <p>.
    for i, li in enumerate(soup.select("li.b_algo"), start=start_pos):
        a = li.select_one("h2 > a")
        if not a:
            continue
        title = (a.get_text(strip=True) or "").strip()
        url = a.get("href") or ""
        p = li.select_one(".b_caption p") or li.select_one("p")
        snippet = (p.get_text(" ", strip=True) if p else "").strip()
        items.append(SerpItem(position=i, title=title, url=url, snippet=snippet))

    return items


def fetch_bing_page(
    session: requests.Session,
    query: str,
    first: int = 1,
    count: int = 10,
    cc: str = "UA",
    setlang: str = "uk",
    timeout: int = 20,
) -> List[SerpItem]:
    """Download one results page and return parsed items."""
    params = {
        "q": query,
        "count": count,   # 10, 15, 20...
        "first": first,   # 1, 11, 21...
        "cc": cc,         # country code for results
        "setlang": setlang,  # interface/snippet language
    }
    r = session.get(BING_URL, params=params, timeout=timeout)
    r.raise_for_status()
    return parse_serp_html(r.text, start_pos=first)


def search_bing(
    query: str,
    pages: int = 1,
    count: int = 10,
    pause_range: Tuple[float, float] = (1.2, 2.7),
    proxy: Optional[str] = None,
    cc: str = "UA",
    setlang: str = "uk",
    timeout: int = 20,
) -> List[SerpItem]:
    """Iterate over pages and return an aggregated list of results."""
    session = build_session(proxy=proxy)
    all_items: List[SerpItem] = []
    first = 1
    for page in range(pages):
        items = fetch_bing_page(
            session, query, first=first, count=count, cc=cc, setlang=setlang, timeout=timeout
        )
        all_items.extend(items)
        if not items:
            break  # empty page: end of results or a possible block
        first += count
        if page < pages - 1:
            time.sleep(random.uniform(*pause_range))  # polite delay between pages
    return all_items


def _normalize_cell(s: str) -> str:
    """Optional: collapse internal whitespace so simple viewers show one‑line cells."""
    # Convert tabs/newlines/multiple spaces to a single space
    return " ".join((s or "").split())


def save_csv(
    items: List[SerpItem],
    path: str,
    excel_friendly: bool = False,
    normalize: bool = False,
    delimiter: str = ",",
) -> int:
    """
Write results to CSV.
— excel_friendly=True -> write UTF‑8 with BOM (utf‑8‑sig) so Excel auto‑detects Unicode.
— normalize=True -> collapse whitespace inside string fields.
— delimiter -> change if your consumer expects ';', etc.
Returns the number of rows written (excluding header).

    """
    p = pathlib.Path(path)
    p.parent.mkdir(parents=True, exist_ok=True)

    encoding = "utf-8-sig" if excel_friendly else "utf-8"

    # newline='' is required so Python's csv handles line endings correctly on all platforms
    with p.open("w", newline="", encoding=encoding) as f:
        writer = csv.DictWriter(
            f,
            fieldnames=["position", "title", "url", "snippet"],
            delimiter=delimiter,
            quoting=csv.QUOTE_MINIMAL,
        )
        writer.writeheader()
        for it in items:
            row = dataclasses.asdict(it)
            if normalize:
                row = {k: _normalize_cell(v) if isinstance(v, str) else v for k, v in row.items()}
            writer.writerow(row)
    return len(items)


def main() -> int:
    ap = argparse.ArgumentParser(description="Bing SERP scraper (Requests + BS4)")
    ap.add_argument("-q", "--query", required=True, help="Search query")
    ap.add_argument("--pages", type=int, default=1, help="Number of pages (x count)")
    ap.add_argument("--count", type=int, default=10, help="Results per page")
    ap.add_argument("--cc", default="UA", help="Country code for results (cc)")
    ap.add_argument("--setlang", default="uk", help="Interface/snippet language (setlang)")
    ap.add_argument("--proxy", help="Proxy, e.g. http://user:pass@host:port")
    ap.add_argument("--csv", help="Path to CSV to save results")
    ap.add_argument(
        "--excel-friendly",
        action="store_true",
        help="Add BOM (UTF‑8‑SIG) so Excel opens the file correctly",
    )
    ap.add_argument(
        "--normalize-cells",
        action="store_true",
        help="Remove line breaks and extra spaces in cells",
    )
    ap.add_argument(
        "--delimiter",
        default=",",
        help="CSV delimiter (default ','); e.g.: ';'",
    )
    args = ap.parse_args()

    try:
        items = search_bing(
            args.query,
            pages=args.pages,
            count=args.count,
            proxy=args.proxy,
            cc=args.cc,
            setlang=args.setlang,
        )
    except requests.HTTPError as e:
        print(f"[ERROR] HTTP error: {e}", file=sys.stderr)
        return 2
    except requests.RequestException as e:
        print(f"[ERROR] Network error: {e}", file=sys.stderr)
        return 2

    if args.csv:
        try:
            n = save_csv(
                items,
                args.csv,
                excel_friendly=args.excel_friendly,
                normalize=args.normalize_cells,
                delimiter=args.delimiter,
            )
            print(f"Saved {n} rows to {args.csv}")
        except OSError as e:
            print(f"[ERROR] Could not write CSV to {args.csv}: {e}", file=sys.stderr)
            return 3
    else:
        for it in items:
            print(f"{it.position:>2}. {it.title} -- {it.url}")
            if it.snippet:
                print("   ", it.snippet[:180])

    return 0


if __name__ == "__main__":
    sys.exit(main())

Example usage with extra parameters and a proxy:

python bing_scraper.py -q "Python web scraping" --pages 3 --csv out.csv \
  --proxy "http://username:password@proxy:port"

What the script does:

  1. Sends GET requests to Bing with controlled parameters (q, count, first) and locale settings (cc, setlang).
  2. Overrides User-Agent and adds Accept-Language for more stable snippets.
  3. Parses HTML via BeautifulSoup (lxml, falling back to html.parser or html5lib), locates result cards li.b_algo, and extracts title, url, and snippet. The .select() CSS selectors in BS4 are a standard, flexible approach.
  4. Supports an optional proxy. For Requests, the correct proxy format is a protocol→URL mapping.

Where to Read More About the Tools

Tip: If you need proxy infrastructure for more stable data collection, check out the best proxies for Bing.

How to Avoid Blocks When Scraping Bing

Key principles to ensure your scraper doesn’t “die” during its first cycle:

  • Add delays (randomize intervals between requests).
  • Rotate your User-Agent (dynamically or from your own list); the correct way to set headers in requests is described in the documentation — we use the same approach in our working example.
  • Use proxies or IP rotation (respecting the service’s terms of use).
  • Limit the overall number of requests and monitor responses for CAPTCHA prompts (a simple check is sketched after this list).
  • For complex tasks, consider managed SERP APIs (Apify, etc.) with built-in antibot infrastructure.
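
A crude but useful safeguard for the CAPTCHA point is to inspect the response body before parsing; the marker strings below are illustrative guesses rather than an official list:

def looks_blocked(html: str) -> bool:
    """Heuristic check for a CAPTCHA/interstitial page (marker strings are assumptions)."""
    lowered = html.lower()
    return any(marker in lowered for marker in ("captcha", "unusual traffic", "verify you are a human"))

# Usage sketch: back off or rotate the proxy/User-Agent if a page looks blocked
# if looks_blocked(resp.text):
#     time.sleep(60)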

Conclusion

Scraping Bing is helpful when you want to expand research beyond Google, gather additional donor domains, track alternative SERP features, and obtain an independent view of the landscape. For stable and “official” integration, Microsoft promotes Grounding with Bing Search in Azure AI Agents; it’s safer from a terms‑of‑service standpoint but doesn’t return raw JSON SERP data. If your task is to extract structured results, choose direct HTML parsing via Requests/BS4 or Selenium, or use a specialized SERP API. Pick the tool for the job: quick HTML parsing for prototypes, agents for LLM‑grounded answers, and SERP APIs for larger‑scale collection.
