In a nutshell, a proxy is a server that sits between a client and the network services it connects to. It acts as the middleman between a user's computer and the site the user wants to extract information from. Instead of going directly to the target site, the user's request is first sent to the proxy, which then forwards it to the end server. As a result, the site records the proxy server's IP address rather than the user's.
So, why use a proxy?
One reason is the need to gather information from a site without running into restrictions or blocks. A proxy masks the user’s real IP address, so the site sees a different digital identity.
Another reason is geography: in certain countries, access to some programs and websites is limited or blocked, and a proxy located in a country without those restrictions can be used to reach them.
Finally, if a large number of requests are sent to a particular site from one address, the site may come under noticeable load, and there is a high risk of that address being blocked. Using multiple proxies at the same time lets users distribute the requests evenly across several addresses, which lowers that risk.
Requests is a Python library for sending HTTP requests. It makes GET and POST requests simple. Without a proxy, Requests sends HTTP requests directly from your machine, so it provides no anonymity and cannot bypass restrictions.
To install requests, simply enter the command below in the terminal:
pip install requests
To verify that the library is installed correctly, open a Python interpreter and run the following:
import requests
print(requests.__version__)
If everything was done properly, the script will print the version number.
Once the requests library is installed, we can start sending HTTP requests. To route them through a proxy, however, some configuration is needed on the Python side.
Let us go through the steps needed to configure a proxy correctly for requests in Python. SOCKS proxy configuration is covered separately.
Setting up a proxy with the requests library is straightforward: put the proxy server address in a dictionary and pass that dictionary when making HTTP requests.
import requests

# Route both HTTP and HTTPS traffic through the same proxy
proxies = {
    "http": "http://your-proxy-ip:port",
    "https": "http://your-proxy-ip:port",
}
response = requests.get("http://example.com", proxies=proxies)
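Proxies tend to fail more often than direct connections, so in practice a proxied request usually gets a timeout and some error handling. The sketch below shows one way to do that. The helper names build_proxies and fetch_via_proxy and the 10-second timeout are illustrative assumptions, not part of the requests API; the exception classes are the ones requests itself provides.

```python
import requests


def build_proxies(proxy_url):
    """Return a proxies dict that routes HTTP and HTTPS through one proxy."""
    return {"http": proxy_url, "https": proxy_url}


def fetch_via_proxy(url, proxy_url, timeout=10):
    """Fetch url through proxy_url; return the response, or None on failure."""
    try:
        response = requests.get(
            url, proxies=build_proxies(proxy_url), timeout=timeout
        )
        response.raise_for_status()  # treat 4xx/5xx answers as failures
        return response
    except requests.exceptions.ProxyError as exc:
        print(f"Could not connect through the proxy: {exc}")
    except requests.exceptions.Timeout as exc:
        print(f"Request timed out: {exc}")
    except requests.exceptions.RequestException as exc:
        print(f"Request failed: {exc}")
    return None
```

Calling fetch_via_proxy("https://httpbin.org/ip", "http://your-proxy-ip:port") against an IP echo service such as httpbin.org is a quick way to check that the site sees the proxy's address rather than yours.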
When using Python Requests, proxy authentication is easy to set up. Let's take a closer look at the details.
proxies = {
    "http": "http://username:password@your-proxy-ip:port",
    "https": "http://username:password@your-proxy-ip:port",
}
response = requests.get("http://example.com", proxies=proxies)
The two new fields that need to be filled in are the username and the password, which go before the proxy address and are separated from it by @.
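Because the credentials are embedded in a URL, characters such as @ or : inside a username or password need to be percent-encoded, otherwise the URL is parsed incorrectly. The following is a minimal sketch using the standard library; build_proxy_url is a hypothetical helper and the address and credentials are made-up placeholders.

```python
from urllib.parse import quote


def build_proxy_url(host, port, username=None, password=None, scheme="http"):
    """Assemble a proxy URL, percent-encoding the credentials if present."""
    if username is None:
        return f"{scheme}://{host}:{port}"
    user = quote(username, safe="")
    pwd = quote(password or "", safe="")
    return f"{scheme}://{user}:{pwd}@{host}:{port}"


# Placeholder values for illustration only
proxy_url = build_proxy_url("203.0.113.10", 8080, "user", "p@ss:word")
proxies = {"http": proxy_url, "https": proxy_url}
print(proxy_url)  # http://user:p%40ss%3Aword@203.0.113.10:8080
```

The resulting proxies dictionary can be passed to requests.get exactly as in the example above.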
If you need a higher level of anonymity or are working with heavily restricted sites, standard HTTP proxies might not be enough. In this case, SOCKS proxies may be a better fit.
To enable SOCKS proxy support, install the optional SOCKS extra with the command below (the quotes keep shells such as zsh from interpreting the square brackets):
pip install "requests[socks]"
Once it is installed, Python Requests can use a SOCKS proxy as shown in the example below.
import requests

proxies = {
    "http": "socks5h://your-proxy-ip:port",
    "https": "socks5h://your-proxy-ip:port",
}
response = requests.get("http://example.com", proxies=proxies)
If the proxy server requires authentication, include the credentials as shown below.
proxies = {
    "http": "socks5h://username:password@your-proxy-ip:port",
    "https": "socks5h://username:password@your-proxy-ip:port",
}
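The h in socks5h is significant: with the socks5h scheme the target hostname is resolved by the proxy server, while with plain socks5 it is resolved on your own machine, which can reveal the sites you visit through local DNS lookups. The sketch below makes that choice explicit; build_socks_proxies and its remote_dns parameter are hypothetical names used only for illustration.

```python
def build_socks_proxies(host, port, remote_dns=True):
    """Return a proxies dict for a SOCKS5 proxy.

    remote_dns=True  -> socks5h: hostnames are resolved by the proxy
    remote_dns=False -> socks5:  hostnames are resolved locally
    """
    scheme = "socks5h" if remote_dns else "socks5"
    proxy_url = f"{scheme}://{host}:{port}"
    return {"http": proxy_url, "https": proxy_url}


# Placeholder address for illustration only
proxies = build_socks_proxies("203.0.113.10", 1080)
print(proxies["https"])  # socks5h://203.0.113.10:1080
```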
When making many requests through a single proxy, it is more efficient to configure the proxy once on a Python Requests session than to pass it with each request.
This approach retains the proxy settings across all requests made through the session. It also simplifies the code and can improve performance, since a session reuses its underlying connections.
Creating a session is easy: just call requests.Session().
Consider the following example:
import requests

# Creating a session
session = requests.Session()
session.proxies = {
    "http": "http://username:password@your-proxy-ip:port",
    "https": "http://username:password@your-proxy-ip:port",
}
# Request through the session
response = session.get("http://example.com")
All requests made through this session will use the configured proxy by default, without any additional configuration.
When you are actively interacting with a website, whether for scraping or for automation, using the same proxy over and over can get its IP address blocked. This can be managed by rotating proxies, provided you have several proxy servers available.
The following example shows a loop that picks a random proxy for each request to the web page:
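One detail worth knowing is that a session also reads proxy settings from environment variables such as HTTP_PROXY and HTTPS_PROXY by default, and these can take precedence over session.proxies. If the session should use only the proxy set in code, one option is to turn off trust_env, as in the sketch below. The helper name make_proxy_session is an assumption for illustration; trust_env and proxies are real session attributes.

```python
import requests


def make_proxy_session(proxy_url, ignore_env=True):
    """Create a session that sends all traffic through proxy_url."""
    session = requests.Session()
    session.proxies.update({"http": proxy_url, "https": proxy_url})
    if ignore_env:
        # Ignore proxy settings from the environment. Note that this also
        # disables other environment-based settings such as .netrc credentials.
        session.trust_env = False
    return session


session = make_proxy_session("http://your-proxy-ip:port")
print(session.proxies)
```

A single call can still override the session's proxy by passing its own proxies argument, for example session.get(url, proxies=other_proxies).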
import requests
import random

proxies_list = [
    "http://username:password@your-proxy-ip[1]:port",
    "http://username:password@your-proxy-ip[2]:port",
    "http://username:password@your-proxy-ip[3]:port",
]
session = requests.Session()
for _ in range(5):
    proxy = random.choice(proxies_list)  # Randomly choose a proxy
    session.proxies = {"http": proxy, "https": proxy}
    response = session.get("http://example.com")
    print(f"Used proxy: {proxy}")
    print(response.status_code)
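Random selection can pick the same proxy several times in a row. If the goal is to spread requests evenly, a round-robin pool is an alternative, and it combines naturally with moving on to the next proxy when one fails. The following is a rough sketch of that idea using itertools.cycle; fetch_with_rotation, max_attempts, and the placeholder addresses are assumptions made for this example.

```python
import itertools

import requests


def fetch_with_rotation(url, proxy_pool, max_attempts=3, timeout=10):
    """Try url through successive proxies from proxy_pool until one works.

    proxy_pool is an iterator (for example itertools.cycle(...)) so that
    consecutive calls continue from where the previous call stopped.
    Returns the response, or None if every attempt failed.
    """
    for _ in range(max_attempts):
        proxy = next(proxy_pool)
        try:
            response = requests.get(
                url, proxies={"http": proxy, "https": proxy}, timeout=timeout
            )
            response.raise_for_status()
            return response
        except requests.exceptions.RequestException as exc:
            print(f"Proxy {proxy} failed: {exc}")
    return None


# Placeholder proxy addresses for illustration only
proxy_pool = itertools.cycle([
    "http://203.0.113.10:8080",
    "http://203.0.113.11:8080",
    "http://203.0.113.12:8080",
])
```

Sharing one proxy_pool across calls, as in fetch_with_rotation("http://example.com", proxy_pool), keeps the load distributed in order across all three proxies.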
To sum up, here are some suggestions that might help you manage proxies better:
In this article, we’ve covered what proxy servers are, how to use a proxy with Python Requests, and how to manage proxies through sessions to simplify the code. The examples also show how to use both SOCKS and regular HTTP proxies, how authentication is handled, and how to rotate proxies.