Overview of the Scrapoxy proxy aggregator

Scrapoxy is a proxy management tool, or proxy aggregator, that improves the efficiency and security of web scraping. It is neither a scraper nor a proxy provider itself. Instead, it manages proxy servers and distributes requests across them to optimize data collection.

image19.png

Scraping with Scrapoxy involves three steps:

  1. Configuring the aggregator by setting the parameters of the proxy servers that will be used for data collection;
  2. Connecting the scraper to Scrapoxy through the scraper's configuration files or connection parameters;
  3. Starting the scraping process, during which Scrapoxy automatically distributes requests across its proxy servers.
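Step 2 can be sketched as a helper that builds the proxy settings a Python scraper would use. This sketch makes several assumptions: Scrapoxy runs locally and accepts proxy traffic on port 8888 (check your own installation), and the username and password are the project credentials from the Settings tab. The function names are hypothetical.

```python
from urllib.parse import quote

# Assumptions: Scrapoxy runs on this host and listens for proxy traffic on
# this port; adjust both to match your installation.
SCRAPOXY_HOST = "localhost"
SCRAPOXY_PORT = 8888


def scrapoxy_proxy_url(username: str, password: str,
                       host: str = SCRAPOXY_HOST,
                       port: int = SCRAPOXY_PORT) -> str:
    """Build an HTTP proxy URL that embeds the project credentials."""
    # Percent-encode the credentials so characters such as '@' or ':'
    # do not break the URL structure.
    user = quote(username, safe="")
    pwd = quote(password, safe="")
    return f"http://{user}:{pwd}@{host}:{port}"


def proxies_for(username: str, password: str) -> dict[str, str]:
    """Send both HTTP and HTTPS traffic to Scrapoxy (a common client format)."""
    url = scrapoxy_proxy_url(username, password)
    return {"http": url, "https": url}
```

A mapping like this is the shape that many Python HTTP clients accept as their proxy configuration.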

Scrapoxy works with a range of scraping frameworks and libraries, for example:

  • BeautifulSoup, a Python library for extracting data from HTML and XML documents. It parses pages but does not download them, so it is paired with an HTTP client that sends its requests through Scrapoxy;
  • Scrapy, a flexible and efficient web scraping framework for Python;
  • Puppeteer, a Node.js library that provides an API for controlling Chrome or Chromium and is widely used for web scraping and automation.
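BeautifulSoup only parses documents, so a Python scraper needs a separate HTTP client to route its requests through Scrapoxy. The sketch below uses the standard library's `urllib.request` with a `ProxyHandler` for that purpose.

Some details are assumptions. The proxy URL format follows the sketch above. The option to disable TLS verification is there because, when Scrapoxy intercepts HTTPS requests, clients may see a certificate that is not from a public CA. In that case, either trust Scrapoxy's certificate or turn verification off, for testing only. The function name is hypothetical.

```python
import ssl
import urllib.request


def build_scrapoxy_opener(proxy_url: str,
                          verify_tls: bool = True) -> urllib.request.OpenerDirector:
    """Return a urllib opener that sends HTTP and HTTPS traffic via proxy_url."""
    # Route both schemes through the same proxy endpoint; credentials embedded
    # in proxy_url are sent by urllib as a Proxy-Authorization header.
    proxy_handler = urllib.request.ProxyHandler({"http": proxy_url,
                                                 "https": proxy_url})
    context = ssl.create_default_context()
    if not verify_tls:
        # Testing only: accept certificates not signed by a trusted CA,
        # e.g. when the proxy intercepts HTTPS traffic.
        context.check_hostname = False
        context.verify_mode = ssl.CERT_NONE
    https_handler = urllib.request.HTTPSHandler(context=context)
    return urllib.request.build_opener(proxy_handler, https_handler)
```

To scrape a page, call `opener.open(url).read()` and pass the returned HTML to BeautifulSoup, for example `BeautifulSoup(html, "html.parser")`.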

Next, we will delve deeper into how Scrapoxy functions and explore the features it offers.

Features of Scrapoxy

Scrapoxy extends scraping software with centralized proxy management, making data collection more efficient and secure. As a proxy aggregator, it offers the following main features:

Support for all proxy types

Scrapoxy supports proxies with both static and dynamic IP addresses and can be configured with several proxy types, including:

  • Datacenter IPv4/IPv6 proxies;
  • ISP proxies;
  • Residential proxies;
  • Mobile proxies.

This range of proxy types makes Scrapoxy suitable for many web scraping and traffic management tasks. It also supports HTTP, HTTPS, and SOCKS proxies, so you can choose the protocol that fits your project.
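The article does not describe how Scrapoxy stores proxy definitions internally. As an illustration only, the sketch below validates proxy URLs from the protocol families mentioned above (HTTP/HTTPS and SOCKS) and accepts both IPv4 and IPv6 hosts. The scheme list and function name are assumptions.

```python
from urllib.parse import urlsplit

# Hypothetical allow-list covering the HTTP(S) and SOCKS families.
SUPPORTED_SCHEMES = {"http", "https", "socks4", "socks5"}


def parse_proxy(url: str) -> tuple[str, str, int]:
    """Split a proxy URL into (scheme, host, port), rejecting other schemes."""
    parts = urlsplit(url)
    scheme = parts.scheme.lower()
    if scheme not in SUPPORTED_SCHEMES:
        raise ValueError(f"unsupported proxy scheme: {scheme!r}")
    # .hostname strips the brackets from IPv6 literals; .port is None if absent.
    if parts.hostname is None or parts.port is None:
        raise ValueError(f"proxy URL needs a host and a port: {url!r}")
    return scheme, parts.hostname, parts.port
```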

Automatic proxy rotation

Scrapoxy rotates proxies automatically, which improves anonymity and reduces the risk of blocks during web scraping. Proxy rotation means regularly changing the proxies in use and distributing requests across different IP addresses. This makes target websites less likely to detect and restrict the scraper.

Rotation makes traffic harder to track and less likely to be blocked, and it also spreads the load evenly across proxies. Because Scrapoxy rotates proxies without manual intervention, this is especially useful when managing a large pool of IP addresses.
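The article does not say which rotation algorithm Scrapoxy uses. Round robin is a common stand-in that illustrates the idea: each request goes to the next proxy in the pool, so requests spread evenly across IP addresses. The class and function names below are hypothetical.

```python
from collections import Counter
from itertools import cycle


class RoundRobinRotator:
    """Hand out proxies in a fixed circular order (a simple rotation policy)."""

    def __init__(self, proxies: list[str]) -> None:
        if not proxies:
            raise ValueError("the proxy pool must not be empty")
        self._cycle = cycle(list(proxies))

    def next_proxy(self) -> str:
        return next(self._cycle)


def distribute(rotator: RoundRobinRotator, n_requests: int) -> Counter:
    """Count how many of n_requests each proxy would receive."""
    return Counter(rotator.next_proxy() for _ in range(n_requests))
```

With three proxies and nine requests, each proxy handles three. Production rotators often add weighting or randomness, which this sketch leaves out.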

Traffic monitoring and management

Scrapoxy monitors incoming and outgoing traffic during web scraping tasks and gives a detailed overview of the session. You can track several key metrics:

  • Number of requests made during the session;
  • Number of active proxies being utilized;
  • The average number of requests handled by each proxy;
  • The current rate of data acquisition;
  • The total amount of data received and sent through the proxy servers.

These figures are updated continuously and recorded in Scrapoxy's Metrics section. They let you assess how efficiently a scraping project performs with specific proxy servers and keep the data organized for later analysis.
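To make these metrics concrete, the sketch below derives the session figures listed above from per-proxy counters. Two things are assumptions for illustration: the data structure, and the definition of an "active" proxy (one that served at least one request). Scrapoxy computes its own metrics, and this code does not reproduce its implementation.

```python
from dataclasses import dataclass


@dataclass
class ProxyStats:
    """Per-proxy counters for one session (hypothetical structure)."""
    requests: int
    bytes_received: int
    bytes_sent: int


def session_summary(stats: dict[str, ProxyStats],
                    elapsed_s: float) -> dict[str, float]:
    """Aggregate per-proxy counters into session-level metrics."""
    total_requests = sum(s.requests for s in stats.values())
    received = sum(s.bytes_received for s in stats.values())
    sent = sum(s.bytes_sent for s in stats.values())
    # Assumption: a proxy counts as active if it served at least one request.
    active = sum(1 for s in stats.values() if s.requests > 0)
    return {
        "requests": total_requests,
        "active_proxies": active,
        "avg_requests_per_proxy": total_requests / active if active else 0.0,
        "receive_rate_bytes_per_s": received / elapsed_s if elapsed_s > 0 else 0.0,
        "bytes_received": received,
        "bytes_sent": sent,
    }
```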

Management of blocked proxies

Scrapoxy includes a feature to monitor and automatically detect blocked proxy servers. If a proxy becomes unavailable or malfunctions, Scrapoxy will mark it as blocked. This prevents the proxy from being used again for scraping, ensuring uninterrupted data collection.

You can manage blocked proxies from the Scrapoxy web interface or through the API. The web interface lists the proxy servers with their current statuses and lets you mark a proxy as blocked manually. The API lets you automate the same process.
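The two paths described above, automatic detection and manual marking, can be sketched as a pool that skips blocked proxies when choosing the next one. This is not Scrapoxy's code. The rule "block after `max_failures` consecutive failures" is a hypothetical detection policy, and the class and method names are illustrative.

```python
class ProxyPool:
    """Rotate over proxies, skipping any that have been marked as blocked."""

    def __init__(self, proxies: list[str], max_failures: int = 3) -> None:
        self._order = list(proxies)
        self._failures = {p: 0 for p in self._order}
        self._blocked: set[str] = set()
        self._max_failures = max_failures  # hypothetical detection threshold
        self._turn = 0

    def mark_blocked(self, proxy: str) -> None:
        """Manual path, comparable to marking a proxy in the web interface."""
        self._blocked.add(proxy)

    def report(self, proxy: str, ok: bool) -> None:
        """Automatic path: block a proxy after repeated consecutive failures."""
        if ok:
            self._failures[proxy] = 0
            return
        self._failures[proxy] += 1
        if self._failures[proxy] >= self._max_failures:
            self._blocked.add(proxy)

    def next_proxy(self) -> str:
        available = [p for p in self._order if p not in self._blocked]
        if not available:
            raise RuntimeError("every proxy in the pool is blocked")
        proxy = available[self._turn % len(available)]
        self._turn += 1
        return proxy
```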

Scrapoxy application interface

Scrapoxy provides a user-friendly visual web interface to manage its main functions. To access this interface, you first need to install Scrapoxy using either Docker or Node.js.

image9.png

Projects

This tab lists all projects that have been created. If no projects exist yet, you can create one from this tab and configure it in the Settings tab. Each entry shows basic information about the project and lets you open a detailed view or change its configuration.

image5.png

A project in this list has one of the following statuses, each indicating a different operational state:

  • OFF: the project is stopped, and the proxies that were used for it have been deleted.
  • CALM: the project is in a “sleep” state, maintaining only the minimum number of proxies specified in the project settings.
  • HOT: the project is active, with proxies currently running and operational.

    image11.png
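One way to read the three project statuses is as targets for how many proxies should be running. The mapping below is a hypothetical sketch. OFF keeps no proxies, CALM keeps the configured minimum, and HOT runs the full count. `max_proxies` is an assumed parameter standing for the proxy counts configured on the connectors, not a documented Scrapoxy setting.

```python
from enum import Enum


class ProjectStatus(Enum):
    OFF = "OFF"
    CALM = "CALM"
    HOT = "HOT"


def target_proxy_count(status: ProjectStatus, min_proxies: int,
                       max_proxies: int) -> int:
    """How many proxies a project should keep running in each status."""
    if status is ProjectStatus.OFF:
        return 0              # stopped: its proxies have been deleted
    if status is ProjectStatus.CALM:
        return min_proxies    # asleep: keep only the configured minimum
    return max_proxies        # active: run the full pool
```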

Credentials

Once the project is set up, you create an account (credentials) with details such as the vendor, a title, and a token. Credentials hold the information needed for authentication and authorization when Scrapoxy connects to cloud providers. When you enter these details, Scrapoxy checks that they are valid. After successful verification, the settings are saved and the credentials appear in this tab. The tab shows the project name, the cloud provider, and a button that opens more detailed account settings.

NEW1.png

Connectors

This tab displays a list of all connectors, which are modules that enable Scrapoxy to interact with various cloud providers to create and manage proxy servers.

When setting up a connector, you specify:

  • The credentials described in the previous section;
  • A unique name for the connector;
  • The number of proxies to use;
  • The proxy timeout: the time after which an inactive proxy is considered non-operational.

All connectors that have been added are shown in the “Connectors” section. In the central window, the following information about each connector is displayed:

  • Status;
  • Name and type;
  • Number of proxies;
  • Controls for adjusting the number of proxies;
  • Option to set as the default connector;
  • Additional settings.

    NEW2.png

A connector has one of three statuses: “ON”, “OFF”, or “ERROR”. You can edit a connector at any time to update its data, and Scrapoxy checks the new data for validity.
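The connector settings and statuses described above can be modeled as a small configuration object with validation. The field names, validation rules, and the use of ERROR for failed validation are assumptions made for this sketch. They do not reproduce Scrapoxy's schema.

```python
from dataclasses import dataclass
from enum import Enum


class ConnectorStatus(Enum):
    ON = "ON"
    OFF = "OFF"
    ERROR = "ERROR"


@dataclass
class ConnectorConfig:
    """Connector settings from the list above; field names are hypothetical."""
    name: str
    credential: str        # name of previously created credentials
    proxies_count: int
    proxy_timeout_s: float

    def validate(self, existing_names: set[str]) -> list[str]:
        errors = []
        if not self.name.strip():
            errors.append("name is required")
        elif self.name in existing_names:
            errors.append("name must be unique")
        if not self.credential:
            errors.append("credentials are required")
        if self.proxies_count < 0:
            errors.append("number of proxies cannot be negative")
        if self.proxy_timeout_s <= 0:
            errors.append("proxy timeout must be positive")
        return errors


def connector_status(errors: list[str], active: bool) -> ConnectorStatus:
    """In this sketch, ERROR means failed validation; otherwise ON or OFF."""
    if errors:
        return ConnectorStatus.ERROR
    return ConnectorStatus.ON if active else ConnectorStatus.OFF
```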

Proxies

This tab lists the proxy servers with basic information such as name, IP address, and status. You can also manage proxies from this page, for example by deleting or disabling them.

image18.png

In the status column, icons indicate the current state of each proxy server:

  • Starting;
  • Started;
  • Stopping;
  • Stopped;
  • Error (not working).

Adjacent to this, there is an icon that represents the connection status of each proxy, showing whether it is online, offline, or has a connection error.
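The proxy statuses above form a lifecycle, which can be sketched as a small state machine. The transition table is an assumption inferred from the status names. The article does not say which changes Scrapoxy allows, so treat this as an illustration of the pattern only.

```python
from enum import Enum


class ProxyStatus(Enum):
    STARTING = "starting"
    STARTED = "started"
    STOPPING = "stopping"
    STOPPED = "stopped"
    ERROR = "error"


# Hypothetical transition table: which status may follow which.
ALLOWED = {
    ProxyStatus.STARTING: {ProxyStatus.STARTED, ProxyStatus.ERROR},
    ProxyStatus.STARTED: {ProxyStatus.STOPPING, ProxyStatus.ERROR},
    ProxyStatus.STOPPING: {ProxyStatus.STOPPED, ProxyStatus.ERROR},
    ProxyStatus.ERROR: {ProxyStatus.STOPPING},
    ProxyStatus.STOPPED: set(),  # terminal in this sketch
}


def transition(current: ProxyStatus, new: ProxyStatus) -> ProxyStatus:
    """Return the new status, or raise if the change is not allowed."""
    if new not in ALLOWED[current]:
        raise ValueError(f"cannot go from {current.name} to {new.name}")
    return new
```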

Coverage

When you add proxy servers to Scrapoxy and use them at least once, Scrapoxy automatically analyzes their geolocation and builds a coverage map, shown in this section. Along with the map, it provides a statistical summary that includes:

  • The names of the cities along with the count of proxies located in each;
  • The countries and the number of proxies found in each one;
  • The names of the networks each proxy belongs to and their respective counts.

Checking where your proxies are located, and whether they cover the regions you need on the world map, helps you optimize the web scraping process.
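The statistical summary amounts to counting proxies per city, per country, and per network. Scrapoxy performs the geolocation lookup itself, and this sketch does not reproduce that step. It starts from already-resolved locations. The data structure and the sample values are hypothetical; the AS numbers come from the range reserved for documentation.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class ProxyLocation:
    city: str
    country: str
    network: str  # network (autonomous system) the proxy's IP belongs to


def coverage_summary(locations: list[ProxyLocation]) -> dict[str, Counter]:
    """Count proxies per city, per country and per network."""
    return {
        "cities": Counter(loc.city for loc in locations),
        "countries": Counter(loc.country for loc in locations),
        "networks": Counter(loc.network for loc in locations),
    }
```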

image1.png

Metrics

This tab is a dashboard for monitoring the project. The central panel is divided into sections that show basic project statistics, and the top panel lets you choose the time period for which Scrapoxy displays analytics. Below that, the tab shows the following figures for the proxy servers used in the project:

  • Received and Sent: the total number of bytes received and sent by all proxies.
  • Requests: the number of requests made.
  • Stops: the number of deletion requests.
  • Received and Sent Rates: the speed at which data is received and sent.
  • Valid and Invalid Requests: the number of valid and invalid requests.
  • Proxies Created and Removed: the number of proxies created and removed.

    image14.png

Additional information is provided for analyzing proxy servers that have been removed from the pool:

  • The average number of requests made through each proxy;
  • The average operating time of each proxy.

    image4.png
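The two averages for removed proxies can be computed from each proxy's request count and its creation and removal times. The record structure and the use of seconds are assumptions for this sketch.

```python
from dataclasses import dataclass


@dataclass
class RemovedProxy:
    """A proxy that has left the pool (hypothetical record)."""
    requests: int
    created_at: float  # seconds, e.g. Unix time
    removed_at: float


def removed_proxy_averages(removed: list[RemovedProxy]) -> tuple[float, float]:
    """Return (average requests per proxy, average operating time in seconds)."""
    if not removed:
        return 0.0, 0.0
    n = len(removed)
    avg_requests = sum(p.requests for p in removed) / n
    avg_uptime = sum(p.removed_at - p.created_at for p in removed) / n
    return avg_requests, avg_uptime
```

For example, a proxy that served 100 requests over 600 s and another that served 50 over 300 s average 75 requests and 450 s of operation.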

Further down, the tab features graphs displaying the volume of data sent and received, the number of requests made, and stop orders received over the selected period.

image16.png

Tasks

This tab displays all tasks that have been initiated using Scrapoxy. For each task, the following information is presented:

  • Task name;
  • Start date and time;
  • Completion date and time;
  • Task progress: the number of completed steps;
  • Detail view button.

    image17.png

When you open a task, you gain access to more comprehensive details, including a description of the task and the schedule for any retry attempts. Additionally, there is an option available to stop the task if necessary.

image3.png
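The task fields described above, including progress, a stop action, and a retry schedule, can be sketched as follows. The article does not say how Scrapoxy spaces its retries. The exponential backoff used here is an assumed policy, and the class, function, and task names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Task:
    """Minimal task record mirroring the columns listed above (hypothetical)."""
    name: str
    steps_total: int
    steps_done: int = 0
    stopped: bool = False

    def progress(self) -> str:
        return f"{self.steps_done}/{self.steps_total}"

    def stop(self) -> None:
        self.stopped = True


def retry_schedule(start_s: float, attempts: int,
                   base_delay_s: float = 1.0, factor: float = 2.0) -> list[float]:
    """Times of retry attempts under exponential backoff (an assumed policy)."""
    times, t = [], start_s
    for i in range(attempts):
        t += base_delay_s * factor ** i  # delays of 1, 2, 4, ... seconds
        times.append(t)
    return times
```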

Users

This tab lists all users who have access to the project, with each user's name and email address. From here, you can add new users or remove existing ones. Users cannot remove themselves from a project; another user with the appropriate permissions must do it. You can only add users who have already logged into Scrapoxy.
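The two membership rules above can be expressed as checks. First, nobody can remove themselves. Second, only users who have already logged in can be added. The class is a hypothetical sketch, and it assumes that any current project member has the permissions needed to add or remove others.

```python
class ProjectUsers:
    """Membership rules from the Users tab (names are hypothetical)."""

    def __init__(self, registered: set[str], members: set[str]) -> None:
        self._registered = set(registered)  # users who have logged into Scrapoxy
        self._members = set(members)        # users with access to this project

    def add(self, actor: str, email: str) -> None:
        if actor not in self._members:
            raise PermissionError(f"{actor} has no access to this project")
        if email not in self._registered:
            raise ValueError(f"{email} must log into Scrapoxy before being added")
        self._members.add(email)

    def remove(self, actor: str, email: str) -> None:
        if actor not in self._members:
            raise PermissionError(f"{actor} has no access to this project")
        if actor == email:
            raise PermissionError("users cannot remove themselves")
        self._members.discard(email)

    @property
    def members(self) -> set[str]:
        return set(self._members)
```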

image15.png

Settings

When you first connect to Scrapoxy, this tab opens so that you can configure the project. It contains settings such as:

  • Name of the project;
  • Proxy authentication credentials (username and password) that scrapers send with their requests;
  • Proxy settings such as rotation and the minimum number of proxies;
  • Additional functions such as changing the User-Agent when the proxy changes, switching project statuses, intercepting HTTPS requests, sticky cookies, and others.

After making and saving all the settings, you can create an account for the project.

image20.png

How to integrate a proxy server into Scrapoxy

To set up a proxy in Scrapoxy using Proxy-Seller, follow these steps:

  1. Log into your account on the Proxy-Seller site and navigate to the “API” section.

    image7.png

  2. Copy the API token and save it for future use.

    image10.png

  3. Open the Scrapoxy web interface and go to the “Marketplace”. Use the manual search function to find Proxy-Seller by name or type.

    image2.png

  4. Select the type of proxy you wish to use, either static or dynamic, and click “Create” to set up a new account.

    image12.png

  5. Enter a name and the API token you saved in step 2, then confirm by clicking the “Create” button.

    image13.png

  6. Proceed to create a new connector, choosing Proxy-Seller as the provider. Once created, the connector will appear in the main list, and you can activate it from there.

    image8.png

The setup is now complete: Scrapoxy will route your scraping requests through the connected Proxy-Seller proxies.

In conclusion, Scrapoxy is a valuable proxy management tool that scales and manages proxy servers for web scraping. It improves request anonymity through proxy rotation and automates proxy handling so that data collection runs efficiently. Scrapoxy suits both individual and team use, works with a wide range of proxy providers, and is free to use.
