How to set up and use a proxy in Puppeteer


Puppeteer is a Node.js library that lets JavaScript code control Chromium-based browsers such as Google Chrome, Microsoft Edge, Opera, and Brave. It is particularly useful for automating browser tasks such as navigating pages, interacting with interface elements, generating PDF files, taking screenshots, and running service tests. One of Puppeteer's key features is its support for headless mode, where the browser operates without a graphical interface. This mode suits web scraping because the browser does not have to draw a visible window, which typically reduces resource use and speeds up data collection.

Next, we look at how to set up and use proxies in Puppeteer, an important step toward getting the most out of this library. Proxies are beneficial for several reasons:

  • Emulating user behavior: by simulating actions from different devices and IP addresses, it becomes possible to mimic a more natural browsing experience;
  • Bypassing anti-fraud measures and CAPTCHAs: many requests from a single IP address over a short period can trigger security measures such as CAPTCHAs; spreading those requests across proxies helps avoid detection;
  • Load balancing: distributing requests across multiple servers can increase scraping speed and efficiency;
  • Overcoming geographical restrictions: proxies enable access to region-specific content by bypassing geographical blocks, allowing for the collection of localized data.

These advantages underscore the importance of integrating proxy management within Puppeteer setups to ensure successful and efficient web scraping and automation tasks.
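The load-balancing point above can be illustrated with a small round-robin rotator. This is a minimal sketch, not part of Puppeteer: the `PROXIES` list and the `createProxyRotator` helper are hypothetical names introduced here, and the addresses are placeholders.

```javascript
// Hypothetical proxy pool; replace the placeholders with your own proxy addresses.
const PROXIES = ['111.111.11.11:2020', '111.111.11.12:2020', '111.111.11.13:2020'];

// Returns a function that hands out the given proxies in round-robin order.
function createProxyRotator(proxies) {
  if (!Array.isArray(proxies) || proxies.length === 0) {
    throw new Error('At least one proxy is required');
  }
  let index = 0;
  return function nextProxy() {
    const proxy = proxies[index];
    index = (index + 1) % proxies.length; // wrap around at the end of the list
    return proxy;
  };
}

const nextProxy = createProxyRotator(PROXIES);
// Each new browser launch could then pass: args: [`--proxy-server=${nextProxy()}`]
```

Because Chromium reads the proxy flag at launch, a rotation like this would apply per browser instance rather than per request.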


Step-by-step proxy setup in Puppeteer using JavaScript

To add a proxy to Puppeteer and configure it for use, follow these streamlined steps:

  1. Open your development environment, such as Microsoft Visual Studio, and create or open a JavaScript (Node.js) project with Puppeteer installed (for example, via npm install puppeteer).
  2. Use the following code:
    
    
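    const puppeteer = require('puppeteer');

    async function run() {
      // Launch a visible browser that routes its traffic through the proxy
      const browser = await puppeteer.launch({
        headless: false,
        args: ['--proxy-server=PROXY_IP:PROXY_PORT']
      });
      const page = await browser.newPage();

      const pageUrl = 'https://example.com/';

      // Adding proxy authentication (set the credentials before navigating)
      await page.authenticate({ username: 'PROXY_USERNAME', password: 'PROXY_PASSWORD' });
      await page.goto(pageUrl);
    }

    // Log launch, proxy, or navigation errors instead of leaving the rejection unhandled
    run().catch(console.error);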
     
  3. The --proxy-server=PROXY_IP:PROXY_PORT argument configures the browser to use the specified proxy.
    • --proxy-server: a Chromium command-line flag that tells the browser to route its network requests through a proxy server.
    • PROXY_IP: replace this with the actual IP address of the proxy server you intend to use.
    • PROXY_PORT: substitute this with the port number on which your proxy server is configured to receive connections.

    For example, if your proxy is at IP 111.111.11.11 and port 2020, then the code will look like:

    
    args: ['--proxy-server=111.111.11.11:2020']
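A stray space or missing quote in this argument is easy to introduce by hand, so it can help to build the string in code. The following is a minimal sketch; `buildProxyArg` is a hypothetical helper introduced here for illustration and is not part of Puppeteer.

```javascript
// Builds the --proxy-server launch argument from a host and a port.
// buildProxyArg is a hypothetical helper, not a Puppeteer API.
function buildProxyArg(host, port) {
  const cleanHost = String(host).trim();
  const portNumber = Number(port);
  if (cleanHost === '' || /\s/.test(cleanHost)) {
    throw new Error(`Invalid proxy host: "${host}"`);
  }
  if (!Number.isInteger(portNumber) || portNumber < 1 || portNumber > 65535) {
    throw new Error(`Invalid proxy port: "${port}"`);
  }
  // No spaces around the colon: the flag expects host:port
  return `--proxy-server=${cleanHost}:${portNumber}`;
}

// Options object in the shape passed to puppeteer.launch() in step 2.
const launchOptions = {
  headless: false,
  args: [buildProxyArg('111.111.11.11', 2020)],
};
```

The resulting `launchOptions` object could be passed to `puppeteer.launch()` in place of the inline object from step 2.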
    
  4. To use a private proxy server that requires authentication, pass your login credentials to the page.authenticate method before calling page.goto. For instance, if your username is myUser and your password is myPass, update the code as follows:
    await page.authenticate({ username: 'myUser', password: 'myPass' });
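Proxy providers often hand out connection details as a single URL such as http://myUser:myPass@111.111.11.11:2020. That format is an assumption here, not something this guide requires. The sketch below splits such a URL into the launch argument and the credentials object used in the steps above; `parseProxyUrl` is a hypothetical helper built on Node's standard `URL` class.

```javascript
// Splits a proxy URL into the pieces Puppeteer needs.
// parseProxyUrl is a hypothetical helper, not a Puppeteer API.
function parseProxyUrl(proxyUrl) {
  const url = new URL(proxyUrl); // throws a TypeError on malformed input
  return {
    // Credentials are deliberately left out of the flag and passed separately.
    serverArg: `--proxy-server=${url.protocol}//${url.host}`,
    // URL keeps credentials percent-encoded, so decode them before use.
    credentials: url.username
      ? {
          username: decodeURIComponent(url.username),
          password: decodeURIComponent(url.password),
        }
      : null,
  };
}

const proxy = parseProxyUrl('http://myUser:myPass@111.111.11.11:2020');
// proxy.serverArg   -> goes into the args array of puppeteer.launch()
// proxy.credentials -> goes into page.authenticate() when it is not null
```

Keeping the credentials separate from the flag matches the flow above, where the username and password are supplied through page.authenticate.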
  5. To specify a start page for the browser, modify the pageUrl variable. Replace the default URL with the one you need. For example, to set the start page to https://example.com/, the code would be:
    const pageUrl = 'https://example.com/';
    await page.goto(pageUrl);

Using a proxy in Puppeteer to route all browser traffic through a specified server can be extremely useful. It allows you to bypass geographical restrictions, enhance anonymity online, and balance the load during web scraping activities.
