How to Implement Request Retries in Python

Web scraping is an effective method for extracting data from the web. Many developers prefer the Python requests library for web scraping projects because it's simple and effective. Great as it is, though, the requests library has its limitations. One typical problem in web scraping is failed requests, which often lead to unstable data extraction. In this article, we will walk through implementing request retries in Python so you can handle HTTP errors and keep your web scraping scripts stable and reliable.

Understanding Python Requests Retry Logic

Retrying failed requests means automatically trying an HTTP request again when it fails due to temporary problems. This prevents data loss or interruptions in your programs caused by brief network glitches or server issues.

Errors and When to Retry

Not all errors deserve retries. Retrying requests in Python mainly pays off for temporary errors, as the sketch after this list illustrates.

  • Retryable Errors: Timeouts, connection resets, or server-side errors (HTTP 500, 502, 503, 504). These issues often resolve on their own after a short wait.
  • Non-Retryable Errors: Avoid retrying non-retryable errors such as 404 (Not Found) or authentication failures (401, 403). Repeating requests with these errors wastes resources and risks your IP being blocked for suspicious behavior.
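
As a quick illustration, here is a minimal helper that encodes this distinction. The function name and the exact status-code set are our own choices for this sketch, not part of any library:

def is_retryable(status_code):
    # Temporary server-side failures that often resolve after a short wait
    return status_code in {500, 502, 503, 504}

print(is_retryable(503))  # True - worth retrying
print(is_retryable(404))  # False - the resource simply doesn't exist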

Exponential Backoff Delay

To manage retry timing, Python requests retry logic typically uses exponential backoff. This means waiting longer between each retry to reduce load on the server and avoid rapid repeated requests.

The backoff formula is:

backoff_factor * (2 ** (retry_count - 1))

Here are practical examples using different backoff factors:

  • backoff factor 2: retry delays 2, 4, 8 seconds
  • backoff factor 3: retry delays 3, 6, 12 seconds
  • backoff factor 10: retry delays 10, 20, 40 seconds

This gradual increase protects the server and improves your success chances.
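
To see the formula in action, the standalone snippet below (a sketch for illustration, not library code) prints the delay schedule for a given backoff factor:

def backoff_delays(backoff_factor, retries):
    # Delay before retry n is backoff_factor * (2 ** (n - 1))
    return [backoff_factor * (2 ** (n - 1)) for n in range(1, retries + 1)]

print(backoff_delays(2, 3))   # [2, 4, 8]
print(backoff_delays(10, 3))  # [10, 20, 40]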

Practical advice for retrying requests in Python:

  • limit retries between 3 and 5 attempts;
  • use exponential backoff delays to avoid hitting servers too rapidly;
  • avoid retrying permanent errors to reduce block risks;
  • monitor retry outcomes to adjust your strategy.

Getting Started with the Requests Library

Let’s set up our environment first. Make sure you have Python installed and any IDE of your choice. Then install the requests library if you don’t have it already.

pip install requests

Once installed, let's send a request to example.com using Python's requests module. Here's a simple function that does just that:

import requests

def send_request(url):
    """
    Sends an HTTP GET request to the specified URL and prints the response status code.
    
    Parameters:
        url (str): The URL to send the request to.
    """
    response = requests.get(url)
    print('Response Status Code: ', response.status_code)

send_request('https://example.com')

The code output is shown below:

Response Status Code:  200

Let's take a closer look at HTTP status codes to understand them better.

Understanding HTTP Status Codes

The server responds to an HTTP request with a status code indicating the request's outcome. Here's a quick rundown:

  • 1xx (Informational): The request was received and continues to be processed.
  • 2xx (Success): The request was received, understood, and accepted.
    • 200 OK: The request was successful. This is the green light of HTTP status codes.
  • 3xx (Redirection): Further action is needed to complete the request.
  • 4xx (Client Error): There was an error with the request, often due to something on the client-side.
  • 5xx (Server Error): The server failed to fulfill a valid request due to an error on its end.
    • 500 Internal Server Error: The server was unable to complete the request. This indicates that the server encountered an unexpected condition that prevented it from fulfilling the request. This is the HTTP status code equivalent of the red traffic light.
  • 504 Gateway Timeout: The server didn’t receive a response from the upstream server in time. This is the HTTP equivalent of being left in the waiting room too long.

In our example, the status code 200 means the request to https://example.com succeeded. It's the server's way of saying, "Everything's good here, your request was a success".

These status codes can also play a role in bot detection and indicate when access is restricted due to bot-like behavior.

HTTP Error Codes

Below is a quick rundown of HTTP error codes that mainly occur due to bot detection and authentication issues.

  • 429 Too Many Requests: This status code indicates that the user has sent too many requests in a given time (“rate limiting”). It’s a common response when bots exceed predefined request limits (see the sketch after this list).
  • 403 Forbidden: Returned when the server refuses to fulfill the request. This can occur if the server suspects the request is coming from a bot, based on the User-Agent or other criteria.
  • 401 Unauthorized: Used when access requires authentication that the bot does not have.
  • 503 Service Unavailable: Sometimes used to indicate that the server is temporarily unable to handle the request, which might happen during automated traffic spikes.
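
For 429 responses in particular, servers often include a Retry-After header telling you how long to wait. Here is a minimal sketch of honoring it; note that Retry-After can also be an HTTP date, while this example assumes a number of seconds and falls back to an arbitrary 5-second wait if the header is absent:

import time
import requests

response = requests.get('https://httpbin.org/status/429')
if response.status_code == 429:
    # Respect the server's requested wait time if it provides one
    wait = int(response.headers.get('Retry-After', 5))
    print(f"Rate limited. Waiting {wait} seconds before retrying...")
    time.sleep(wait)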

Implementing Retry Mechanism in Python

Let’s now write a simple retry mechanism in Python that makes HTTP GET requests with the requests library. Network requests sometimes fail because of a network problem or server overload, so when a request fails, we should retry it.

Basic Retry Mechanism

The function send_request_with_basic_retry_mechanism makes HTTP GET requests to a given URL with a basic retry mechanism that only retries when a network or request exception, such as a connection error, is encountered. It retries the request at most max_retries times. If all attempts fail with such an exception, it re-raises the last one encountered.

import requests
import time

def send_request_with_basic_retry_mechanism(url, max_retries=2, delay=1):
    """
    Sends an HTTP GET request to a URL with a basic retry mechanism.

    Parameters:
        url (str): The URL to send the request to.
        max_retries (int): The maximum number of times to retry the request.
        delay (int): The delay (in seconds) between retries.

    Raises:
        requests.RequestException: Raises the last exception if all retries fail.
    """
    for attempt in range(max_retries):
        try:
            response = requests.get(url)
            print('Response status: ', response.status_code)
            break  # Exit loop if request successful
        except requests.RequestException as error:
            print(f"Attempt {attempt+1} failed:", error)
            if attempt < max_retries - 1:
                print("Retrying...")
                time.sleep(delay)  # Wait before retrying
            else:
                print("Max retries exceeded.")
                # Re-raise the last exception if max retries reached
                raise

send_request_with_basic_retry_mechanism('https://example.com')

Advanced Retry Mechanism

Let’s now adapt the basic retry mechanism to handle scenarios where the website we’re trying to scrape uses bot detection that may result in blocking. In such cases, we should retry the request several times, since failures may be caused not only by bot detection blocks but also by network or server problems.

Code Implementation

The following function, send_request_with_advance_retry_mechanism, sends an HTTP GET request to the provided URL with optional retry attempts and retry delay.

import requests
import time

def send_request_with_advance_retry_mechanism(url, max_retries=3, delay=1):
    """
    Sends an HTTP GET request to the specified URL with an advanced retry mechanism.
    
    Parameters:
        url (str): The URL to send the request to.
        max_retries (int): The maximum number of times to retry the request. Default is 3.
        delay (int): The delay (in seconds) between retries. Default is 1.

    Raises:
        requests.RequestException: Raises the last exception if all retries fail.
    """
    for attempt in range(max_retries):
        try:
            response = requests.get(url)
            # Raise an exception for 4xx or 5xx status codes
            response.raise_for_status()
            print('Response Status Code:', response.status_code)
            break  # Exit loop if request successful
        except requests.RequestException as e:
            # Print error message and attempt number if the request fails
            print(f"Attempt {attempt+1} failed:", e)
            if attempt < max_retries - 1:
                # Print the retry message and wait before retrying
                print(f"Retrying in {delay} seconds...")
                time.sleep(delay)
            else:
                # If max retries exceeded, print message and re-raise exception
                print("Max retries exceeded.")
                raise

# Example usage
send_request_with_advance_retry_mechanism('https://httpbin.org/status/404')

Parameter Discussion

The function implements the retry logic as follows:

  • It tries to send the request multiple times for the specified number of attempts (max_retries).
  • It prints the response status code if the request successfully gets the response.
  • If it encounters a requests.RequestException (including connection errors or HTTP 4xx/5xx status codes), it prints the error message and retries it.
  • If the request fails even after the specified number of retry attempts, it raises the last encountered exception.

The delay parameter is important because it avoids bombarding the server with multiple requests at close intervals. Instead, it gives the server enough time to process each request, making the traffic look more like a human than a bot. Retries should therefore be delayed to avoid server overload or slow server responses, which may trigger anti-bot mechanisms.
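
One common refinement, suggested here as an optional sketch rather than part of the implementation above, is to add random jitter to the delay so that multiple clients don't retry in lockstep:

import random
import time

def sleep_with_jitter(delay):
    # Sleep for the base delay plus up to 50% random jitter,
    # so simultaneous clients don't all retry at the same moment
    time.sleep(delay + random.uniform(0, delay * 0.5))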

Drawbacks of this implementation

  • All status codes belonging to the 4xx and 5xx ranges are retried. However, requests resulting in a 404 (Not Found) status code do not need to be retried.
  • Some bot detection services may respond with a status code of 200 (OK), but the response content may differ. This situation is not handled in the current implementation. Implementing content length validation could address this issue.

Here's the corrected code:

import requests
import time

def send_request_with_advance_retry_mechanism(url, max_retries=3, delay=1, min_content_length=10):
    """
    Sends an HTTP GET request to the specified URL with an advanced retry mechanism.

    Parameters:
        url (str): The URL to send the request to.
        max_retries (int): The maximum number of times to retry the request. The default is 3.
        delay (int): The delay (in seconds) between retries. Default is 1.
        min_content_length (int): The minimum length of response content to consider valid. The default is 10.

    Raises:
        requests.RequestException: Raises the last exception if all retries fail.
    """
    for attempt in range(max_retries):
        try:
            response = requests.get(url)

            # Handle 404 before raise_for_status(), which would otherwise
            # turn it into an exception and trigger a retry
            if response.status_code == 404:
                print("404 Error: Not Found")
                break  # Exit loop for 404 errors; retrying won't help

            # Raise an exception for other 4xx or 5xx status codes
            response.raise_for_status()

            # Check if the response text is shorter than the specified minimum content length
            if len(response.text) < min_content_length:
                print("Response text length is less than specified minimum. Retrying...")
                time.sleep(delay)
                continue  # Retry the request

            print('Response Status Code:', response.status_code)
            # If all checks pass, break out of the loop
            break
            
        except requests.RequestException as e:
            print(f"Attempt {attempt+1} failed:", e)
            if attempt < max_retries - 1:
                print(f"Retrying in {delay} seconds...")
                time.sleep(delay)
            else:
                print("Max retries exceeded.")
                # Re-raise the last exception if max retries reached
                raise

# Example usage
send_request_with_advance_retry_mechanism('https://httpbin.org/status/404')

Using HTTPAdapter and urllib3 Retry with Requests

In this section, you’ll implement retries using the requests library combined with urllib3’s Retry class and HTTPAdapter. This method gives you a clean, efficient retry strategy.

Implementation steps:

  1. Define a Retry strategy object specifying:
    • Total number of retries (e.g., total=5).
    • HTTP status codes to retry (status_forcelist), such as 500, 502, 503, 504.
    • Backoff factor for exponential delays (e.g., backoff_factor=1).
  2. Create an HTTPAdapter instance with max_retries set to the retry strategy.
  3. Mount this adapter to a requests.Session for both 'http://' and 'https://' URLs.

Example code snippet:

import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry

# Define the retry strategy: 5 retries with exponential backoff
retry_strategy = Retry(
    total=5,
    status_forcelist=[500, 502, 503, 504],
    backoff_factor=1
)

# Mount the adapter on a session for both HTTP and HTTPS
adapter = HTTPAdapter(max_retries=retry_strategy)
session = requests.Session()
session.mount("http://", adapter)
session.mount("https://", adapter)

response = session.get("https://example.com")
print(response.status_code)

Benefits of this approach:

  • reuse sessions with built-in retry handling;
  • moderate complexity, easy to integrate;
  • automatically delays retries using exponential backoff;
  • handles connection errors and server responses smoothly.

This makes it a reliable solution for managing unstable network conditions or flaky servers.

Custom Retry Wrapper Function

You’ll build your own retry function wrapping requests.get to control retry behavior manually. This gives you full flexibility over retries and error handling.

Here’s what your function will do:

  • Accept URL, retry limit, status codes to retry, backoff factor, plus extra requests parameters.
  • Loop through attempts, making requests.
  • Check responses for retryable status codes.
  • On failure, wait using exponential backoff: backoff_factor * (2 ** (retry_count - 1)).
  • Catch exceptions like ConnectionError and retry gracefully.

Example code:

import time
import requests


def retry_requests_python(url, total_retries=3, status_forcelist=None, backoff_factor=1, **kwargs):
    """
    Sends an HTTP GET request with manual retries and exponential backoff.
    Any extra keyword arguments are passed straight through to requests.get.
    """
    if status_forcelist is None:
        status_forcelist = [500, 502, 503, 504]

    for attempt in range(1, total_retries + 1):
        try:
            response = requests.get(url, **kwargs)
            if response.status_code not in status_forcelist:
                return response  # Success or a non-retryable status
            if attempt == total_retries:
                return response  # Out of retries; return the last response
            # Exponential backoff: backoff_factor * (2 ** (attempt - 1))
            time.sleep(backoff_factor * (2 ** (attempt - 1)))
        except requests.ConnectionError:
            if attempt == total_retries:
                raise  # Out of retries; re-raise the exception
            time.sleep(backoff_factor * (2 ** (attempt - 1)))

Using this custom function, you control every aspect of Python requests retry logic. You can tune delays, retry codes, and handle exceptions inline.
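
For example, a call against a flaky endpoint might look like this (httpbin's 503 endpoint is used here purely for demonstration; timeout is passed through to requests.get via **kwargs):

# Retries up to 3 times with 1, 2, and 4 second delays, then returns the last response
response = retry_requests_python('https://httpbin.org/status/503', total_retries=3, backoff_factor=1, timeout=10)
print('Final status:', response.status_code)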

Enhancing Custom Retries with Proxies

To further improve retry success, integrate Proxy-Seller proxies.

  • Reliability: Proxy-Seller offers fast, reliable SOCKS5 and HTTPS proxies to reduce connection errors and blocks.
  • Diversity: Their large proxy pool spans 220+ countries and over 400 networks, enabling diverse IP rotation.
  • Performance: High-speed connections up to 1 Gbps support rapid retries without bottlenecks.
  • Integration: Configure your retry function to route requests through Proxy-Seller proxies for stronger reliability – ideal for web scraping or automation.
  • Support: 24/7 Proxy-Seller support and API access enable seamless proxy management within your custom retry setup.

This combo prevents IP bans, increases scraping efficiency, and ensures smoother retry handling. By combining your manual retry wrapper with Proxy-Seller’s proxy services, you build a robust, flexible Python requests retry solution ready for challenging network environments.
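
As a sketch, routing the wrapper from the previous section through a proxy takes nothing more than the standard proxies argument of requests; the USER:PASS@HOST:PORT credentials below are placeholders to replace with your own:

# Placeholder credentials - substitute your actual proxy details
proxies = {
    'http': 'http://USER:PASS@HOST:PORT',
    'https': 'https://USER:PASS@HOST:PORT'
}

# Extra keyword arguments of retry_requests_python pass straight to requests.get
response = retry_requests_python('https://example.com', total_retries=3, proxies=proxies)
print(response.status_code)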

Handling Specific HTTP Errors with Proxies

For certain errors like 429 Too Many Requests, using rotating proxies can help distribute your requests and avoid rate limiting.

The code below implements the advanced retry strategy together with proxies, giving us a complete Python requests retry mechanism. Using high-quality web scraping proxies is also important: they should offer a good proxy rotation algorithm and a reliable pool.

import requests
import time

def send_request_with_advance_retry_mechanism(url, max_retries=3, delay=1, min_content_length=10):
    """
    Sends an HTTP GET request through a proxy with an advanced retry mechanism.

    Parameters:
        url (str): The URL to send the request to.
        max_retries (int): The maximum number of times to retry the request. Default is 3.
        delay (int): The delay (in seconds) between retries. Default is 1.
        min_content_length (int): The minimum length of response content to consider valid. Default is 10.

    Raises:
        requests.RequestException: Raises the last exception if all retries fail.
    """
    # Replace the placeholders with your actual proxy credentials
    proxies = {
        "http": "http://USER:PASS@HOST:PORT",
        "https": "https://USER:PASS@HOST:PORT"
    }

    for attempt in range(max_retries):
        try:
            # Note: verify=False disables TLS certificate verification;
            # only use it if your proxy setup requires it
            response = requests.get(url, proxies=proxies, verify=False)

            # Handle 404 before raise_for_status(), which would otherwise
            # turn it into an exception and trigger a retry
            if response.status_code == 404:
                print("404 Error: Not Found")
                break  # Exit loop for 404 errors; retrying won't help

            # Raise an exception for other 4xx or 5xx status codes
            response.raise_for_status()

            # Check if the response text is shorter than the specified minimum content length
            if len(response.text) < min_content_length:
                print("Response text length is less than specified minimum. Retrying...")
                time.sleep(delay)
                continue  # Retry the request

            print('Response Status Code:', response.status_code)
            # If all checks pass, break out of the loop
            break

        except requests.RequestException as e:
            print(f"Attempt {attempt+1} failed:", e)
            if attempt < max_retries - 1:
                print(f"Retrying in {delay} seconds...")
                time.sleep(delay)
            else:
                print("Max retries exceeded.")
                # Re-raise the last exception if max retries reached
                raise

send_request_with_advance_retry_mechanism('https://httpbin.org/status/404')

Wrapping Up

Request retries in Python are crucial for effective web scraping. The methods we've discussed to manage retries can help prevent blocking and enhance the efficiency and reliability of data collection. Implementing these techniques will make your web scraping scripts more robust and less susceptible to detection by bot protection systems.
