A Guide to Building a Simple Cryptocurrency Price Tracker

20.09.2024

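Contents:

Step 1: Import the libraries
Step 2: Set up a proxy
Step 3: Rotate proxies
Step 4: Fetch and parse the crypto data
Step 5: Understand the site structure

Extracting the coin name
Extracting the ticker
Extracting the price
Extracting the percentage change

Step 6: Export the data to CSV
Step 7: Run the tracker
Full code
Results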

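Because cryptocurrencies are highly volatile, tracking price data for popular coins can be challenging. When trading crypto, it is essential to research thoroughly and be ready to act on profit opportunities. Accurate pricing data can be hard to obtain: APIs are the usual source, but free tiers tend to come with limitations.

We will look at how to use Python to periodically scrape the current prices of the top 150 cryptocurrencies. Our cryptocurrency price tracker will collect the following data: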

Coin name;
Ticker;
Price;
24-hour price change percentage.

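Step 1: Import the libraries

The first step in our Python script is to import the necessary libraries. We will use the `requests` library to send requests and the `BeautifulSoup` library to extract data from the HTML.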

import requests
from bs4 import BeautifulSoup
import csv
import time
import random

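We will also use `csv` for CSV file operations, `time` to control how often prices are updated, and `random` to rotate proxies.

Step 2: Set up a proxy

Sending requests without a premium proxy may result in "Access Denied" responses.

You can set up a proxy like this: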

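url = "https://crypto.com/price"
# requests picks the proxy by the scheme of the target URL, so set both keys
proxy = {
    "http": "http://Your_proxy_IP_Address:Your_proxy_port",
    "https": "http://Your_proxy_IP_Address:Your_proxy_port",
}
html = requests.get(url, proxies=proxy)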

For proxies that require authentication, use the following format:

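url = "https://crypto.com/price"
# The credentials are embedded in the proxy URL
proxy = {
    "http": "http://username:password@Your_proxy_IP_Address:Your_proxy_port",
    "https": "http://username:password@Your_proxy_IP_Address:Your_proxy_port",
}
html = requests.get(url, proxies=proxy)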

Remember to replace "Your_proxy_IP_Address" and "Your_proxy_port" with your actual proxy address and port, and "username" and "password" with your credentials.
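
The dictionary format above can be wrapped in a small helper so the address and credentials are filled in from variables. This is only a sketch: the function name `build_proxy` and the example address are hypothetical and not part of the tracker, and percent-encoding the credentials is an added precaution for passwords that contain characters such as "@" or ":".

```python
from urllib.parse import quote


def build_proxy(host, port, username=None, password=None):
    """Return a proxies dict in the shape that requests expects."""
    if username and password:
        # Percent-encode credentials so characters like "@" or ":" do not break the URL
        auth = f"{quote(username, safe='')}:{quote(password, safe='')}@"
    else:
        auth = ""
    proxy_url = f"http://{auth}{host}:{port}"
    # Use the same proxy for both plain HTTP and HTTPS targets
    return {"http": proxy_url, "https": proxy_url}


# 203.0.113.10 is a documentation-only address used here as a placeholder
print(build_proxy("203.0.113.10", 8080, "user", "p@ss"))
```

The returned dictionary can be passed directly as the `proxies` argument of requests.get().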

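Step 3: Rotate proxies

Rotating proxies is an important technique for scraping modern websites, because sites often block or throttle bots and scrapers when they detect many requests from the same IP address. Proxy rotation relies on the `random` library, which we imported in Step 1.

Create a list of proxies to rotate through:

# List of proxies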
proxies = [ 
 "username:password@Your_proxy_IP_Address:Your_proxy_port1",
 "username:password@Your_proxy_IP_Address:Your_proxy_port2",
 "username:password@Your_proxy_IP_Address:Your_proxy_port3",
 "username:password@Your_proxy_IP_Address:Your_proxy_port4",
 "username:password@Your_proxy_IP_Address:Your_proxy_port5",
]

Next, we define a get_proxy() function that randomly picks a proxy from the list for each request.

# Helper that rotates proxies
def get_proxy():
    # Pick a random proxy from the list
    proxy = random.choice(proxies)
    return {
        "http": f"http://{proxy}",
        "https": f"http://{proxy}",
    }

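The function returns a dictionary that maps both the `http` and `https` keys to the selected proxy, so requests to either kind of URL go through it. This setup makes our traffic look as if it comes from several organic users, which improves the chances of getting past anti-scraping measures.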
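
A random pick can still land on a proxy that is blocked or offline. One possible extension, which is not part of the original tracker, is to retry the request through a different proxy when it fails. In the sketch below, `fetch_with_rotation`, the `PROXIES` placeholders, and the `fetch` callable are all hypothetical names; `fetch` stands in for a call such as requests.get() so the retry logic can be read on its own.

```python
import random

# Placeholder proxies (documentation-only addresses); replace with real ones
PROXIES = [
    "username:password@203.0.113.10:8001",
    "username:password@203.0.113.10:8002",
    "username:password@203.0.113.10:8003",
]


def fetch_with_rotation(url, fetch, proxies=PROXIES, max_attempts=3):
    """Try up to max_attempts distinct proxies and return the first successful result.

    `fetch` is any callable that takes (url, proxies_dict) and raises on failure.
    """
    # random.sample picks distinct proxies, so a failed one is not retried
    candidates = random.sample(proxies, k=min(max_attempts, len(proxies)))
    last_error = None
    for proxy in candidates:
        proxy_dict = {"http": f"http://{proxy}", "https": f"http://{proxy}"}
        try:
            return fetch(url, proxy_dict)
        except Exception as error:  # with requests, narrow this to requests.RequestException
            last_error = error
    raise RuntimeError(f"all {len(candidates)} proxies failed") from last_error
```

With requests, the callable could be `lambda u, p: requests.get(u, proxies=p, timeout=10)`.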

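Step 4: Fetch and parse the crypto data

The get_crypto_prices() function fetches cryptocurrency prices from crypto.com. It sends a GET request to the site with requests.get(), passing a rotating proxy as the `proxies` argument. We then pass the response text and the parser name "html.parser" to the BeautifulSoup constructor.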

def get_crypto_prices():
    url = "https://crypto.com/price"
    html = requests.get(url, proxies=get_proxy())
    soup = BeautifulSoup(html.text, "html.parser")

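Step 5: Understand the site structure

Before extracting any data, we need to understand how the site is structured. We can inspect the page's HTML with the browser's developer tools, which open when you right-click the page and choose "Inspect".

We then use BeautifulSoup's find_all() function with the tag name "tr" and the filter class_='css-1cxc880' to find every price row on the page, and extract the coin name, ticker, price, and 24-hour percentage change from each row. Each result is stored in a dictionary and appended to the prices list.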
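
To see how find_all() and find() behave before pointing them at the live page, they can be tried on a small hand-written snippet. The markup below is hypothetical: it only mimics the structure described above and reuses the class names from this article, which look auto-generated and may differ on the real page.

```python
from bs4 import BeautifulSoup

# Hypothetical markup imitating one price row plus one unrelated row
SAMPLE_HTML = """
<table>
  <tr class="css-1cxc880">
    <td><p class="css-rkws3">Bitcoin</p><span class="css-1jj7b1a">BTC</span></td>
    <td><div class="css-b1ilzc"> $63,000.12 </div></td>
    <td><p class="css-yyku61">+1.25%</p></td>
  </tr>
  <tr class="other-row"><td>ignored</td></tr>
</table>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# Only rows carrying the given class are returned
rows = soup.find_all("tr", class_="css-1cxc880")

for row in rows:
    # find() returns the first matching descendant, or None if there is none
    name = row.find("p", class_="css-rkws3").get_text()
    price = row.find("div", class_="css-b1ilzc").text.strip()
    print(name, price)
```

If find_all() returns an empty list on the real page, the class names have probably changed and should be looked up again in the developer tools.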

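Extracting the coin name

Here we use row.find('p', class_='css-rkws3') to find the "p" element with the class "css-rkws3". We then extract its text and store it in the name variable, falling back to a placeholder if the element is missing.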

coin_name_tag = row.find('p', class_='css-rkws3')
name = coin_name_tag.get_text() if coin_name_tag else "no name entry"

Extracting the ticker

Similarly, we use row.find('span', class_='css-1jj7b1a') to find the span element with the class "css-1jj7b1a". The get_text() method extracts its text content, which gives us the ticker.

coin_ticker_tag = row.find('span', class_='css-1jj7b1a')
ticker = coin_ticker_tag.get_text() if coin_ticker_tag else "no ticker entry"

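Extracting the price

We locate the "div" element with the class "css-b1ilzc", strip the whitespace from its text content, and assign the result to the price variable. A conditional expression handles the case where the element is missing.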

coin_price_tag = row.find('div', class_='css-b1ilzc')
price = coin_price_tag.text.strip() if coin_price_tag else "no price entry"

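Extracting the percentage change

In the same way, we find the "p" element with the class "css-yyku61" to extract the percentage change. The text content is stripped, and a conditional expression handles a missing element.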

coin_percentage_tag = row.find('p', class_='css-yyku61')
percentage = coin_percentage_tag.text.strip() if coin_percentage_tag else "no percentage entry"

Putting it all together, we get the following for loop:

    # Continues inside get_crypto_prices() from Step 4
    price_rows = soup.find_all('tr', class_='css-1cxc880')

    prices = []
    for row in price_rows:
        # Each lookup falls back to a placeholder if the element is missing
        coin_name_tag = row.find('p', class_='css-rkws3')
        name = coin_name_tag.get_text() if coin_name_tag else "no name entry"

        coin_ticker_tag = row.find('span', class_='css-1jj7b1a')
        ticker = coin_ticker_tag.get_text() if coin_ticker_tag else "no ticker entry"

        coin_price_tag = row.find('div', class_='css-b1ilzc')
        price = coin_price_tag.text.strip() if coin_price_tag else "no price entry"

        coin_percentage_tag = row.find('p', class_='css-yyku61')
        percentage = coin_percentage_tag.text.strip() if coin_percentage_tag else "no percentage entry"

        prices.append({
            "Coin": name,
            "Ticker": ticker,
            "Price": price,
            "24hr-Percentage": percentage
        })

    return prices
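
The loop stores the price and percentage as text. If the values are needed for sorting or calculations, they can be converted to numbers first. The helpers below are a sketch and not part of the original tracker: the names `parse_price` and `parse_percentage` are hypothetical, and they assume the page formats values like "$63,000.12" and "+1.25%".

```python
def parse_price(text):
    """Convert a string such as "$63,000.12" to a float, or return None."""
    cleaned = text.replace("$", "").replace(",", "").strip()
    try:
        return float(cleaned)
    except ValueError:
        # Covers placeholders such as "no price entry"
        return None


def parse_percentage(text):
    """Convert a string such as "+1.25%" or "-0.80%" to a float, or return None."""
    # Some pages use the Unicode minus sign (U+2212) instead of a hyphen
    cleaned = text.replace("%", "").replace("+", "").replace("\u2212", "-").strip()
    try:
        return float(cleaned)
    except ValueError:
        return None


print(parse_price("$63,000.12"))
print(parse_percentage("-0.80%"))
print(parse_price("no price entry"))
```

Returning None for the placeholder strings keeps missing entries distinguishable from a real price of zero.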

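Step 6: Export the data to CSV

The export_to_csv() function exports the scraped data to a CSV file. We use the csv library to write the data from the prices list to the specified CSV file.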

def export_to_csv(prices, filename="proxy_crypto_prices.csv"):
    with open(filename, "w", newline="") as file:
        fieldnames = ["Coin", "Ticker", "Price", "24hr-Percentage"]
        writer = csv.DictWriter(file, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(prices)
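
Because export_to_csv() opens the file in "w" mode, each update overwrites the previous one. If a price history is wanted instead, one option is to append rows with a timestamp. The function below is a sketch of that variation, not part of the original tracker; the name `append_to_csv`, the "Timestamp" column, and the default file name are assumptions.

```python
import csv
import os
from datetime import datetime, timezone


def append_to_csv(prices, filename="crypto_price_history.csv"):
    """Append the scraped rows to a CSV file, tagging each with a UTC timestamp."""
    fieldnames = ["Timestamp", "Coin", "Ticker", "Price", "24hr-Percentage"]
    # Write the header only when the file is new or empty
    write_header = not os.path.exists(filename) or os.path.getsize(filename) == 0
    timestamp = datetime.now(timezone.utc).isoformat(timespec="seconds")
    with open(filename, "a", newline="", encoding="utf-8") as file:
        writer = csv.DictWriter(file, fieldnames=fieldnames)
        if write_header:
            writer.writeheader()
        for entry in prices:
            # Merge the timestamp with the scraped fields of one coin
            writer.writerow({"Timestamp": timestamp, **entry})
```

It takes the same prices list as export_to_csv(), so it could be swapped in without changing the rest of the script.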

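Step 7: Run the tracker

In the main part of the script, we call get_crypto_prices() to fetch the prices and export_to_csv() to write them to a CSV file. We then wait 5 minutes (300 seconds) before updating the prices again. This runs in an infinite loop, so the prices are refreshed every 5 minutes until the program is stopped.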

if __name__ == "__main__":
    while True:
        prices = get_crypto_prices()
        export_to_csv(prices)
        print("Prices updated. Waiting for the next update...")
        time.sleep(300)  # Update the prices every 5 minutes
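
In the loop above, a single failed request (for example, a dead proxy) raises an exception and stops the tracker. A more forgiving loop might log the error and try again on the next cycle. The sketch below shows one way to do that; `run_tracker` and its parameters are hypothetical, and the fetch and export steps are passed in as callables so the loop can be tried without network access.

```python
import time


def run_tracker(fetch, export, interval_seconds=300, max_runs=None, sleep=time.sleep):
    """Call fetch() then export() repeatedly; return the number of successful updates.

    max_runs=None loops until interrupted; a number limits the cycles (useful for testing).
    """
    successes = 0
    runs = 0
    try:
        while max_runs is None or runs < max_runs:
            runs += 1
            try:
                export(fetch())
                successes += 1
                print("Prices updated. Waiting for the next update...")
            except Exception as error:
                # Keep the tracker alive; the next cycle may use a working proxy
                print(f"Update failed, retrying on the next cycle: {error}")
            # Skip the final wait when a run limit has been reached
            if max_runs is None or runs < max_runs:
                sleep(interval_seconds)
    except KeyboardInterrupt:
        print("Tracker stopped by user.")
    return successes
```

With the functions from this guide, the call would be run_tracker(get_crypto_prices, export_to_csv).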

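Full code

Below is the complete code. It combines all the techniques and steps covered above into a single script for the cryptocurrency price tracker built in this project.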

import requests
from bs4 import BeautifulSoup
import csv
import time
import random

# List of proxies
proxies = [
     "username:password@Your_proxy_IP_Address:Your_proxy_port1",
     "username:password@Your_proxy_IP_Address:Your_proxy_port2",
     "username:password@Your_proxy_IP_Address:Your_proxy_port3",
     "username:password@Your_proxy_IP_Address:Your_proxy_port4",
     "username:password@Your_proxy_IP_Address:Your_proxy_port5",
]

# Helper that rotates proxies
def get_proxy():
    # Pick a random proxy from the list
    proxy = random.choice(proxies)
    # Return a dict that routes both HTTP and HTTPS requests through the proxy
    return {
        "http": f"http://{proxy}",
        "https": f"http://{proxy}",
    }


def get_crypto_prices():
    url = "https://crypto.com/price"
    html = requests.get(url, proxies=get_proxy())
    print(html.status_code)
    soup = BeautifulSoup(html.content, "html.parser")

    price_rows = soup.find_all('tr', class_='css-1cxc880')

    prices = []
    for row in price_rows:
        coin_name_tag = row.find('p', class_='css-rkws3')
        name = coin_name_tag.get_text() if coin_name_tag else "no name entry"

        coin_ticker_tag = row.find('span', class_='css-1jj7b1a')
        ticker = coin_ticker_tag.get_text() if coin_ticker_tag else "no ticker entry"
        
        coin_price_tag = row.find('div', class_='css-b1ilzc')
        price = coin_price_tag.text.strip() if coin_price_tag else "no price entry"

        coin_percentage_tag = row.find('p', class_='css-yyku61')
        percentage = coin_percentage_tag.text.strip() if coin_percentage_tag else "no percentage entry"
        
        prices.append({
            "Coin": name,
            "Ticker": ticker,
            "Price": price,
            "24hr-Percentage": percentage
        })
    
    return prices



def export_to_csv(prices, filename="proxy_crypto_prices.csv"):
    with open(filename, "w", newline="") as file:
        fieldnames = ["Coin", "Ticker", "Price", "24hr-Percentage"]
        writer = csv.DictWriter(file, fieldnames=fieldnames)

        writer.writeheader()
        writer.writerows(prices)


if __name__ == "__main__":
    while True:
        prices = get_crypto_prices()
        export_to_csv(prices)
        print("Prices updated. Waiting for the next update...")
        time.sleep(300)  # Update the prices every 5 minutes (adjust as needed)

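Results

Our cryptocurrency price tracker saves the results to a CSV file named "proxy_crypto_prices.csv", with one row per coin and the columns Coin, Ticker, Price, and 24hr-Percentage.

Python's simple, readable syntax makes it a good choice for building an automated cryptocurrency price tracker, and it is easy to add new features and extend the tracker later. The example shows how to create a basic scraper that refreshes cryptocurrency prices at a set interval, collects the data through proxies, and saves it in a user-friendly format.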
