scrapy-zyte-smartproxy 2.3 documentation

scrapy-zyte-smartproxy is a Scrapy downloader middleware to use one of Zyte’s proxy services: either the proxy mode of Zyte API or Zyte Smart Proxy Manager (formerly Crawlera).

Configuration

  1. Add the downloader middleware to your DOWNLOADER_MIDDLEWARES Scrapy setting:

    settings.py
    DOWNLOADER_MIDDLEWARES = {
        ...
        'scrapy_zyte_smartproxy.ZyteSmartProxyMiddleware': 610
    }
    
  2. Enable the middleware and configure your API key, either through Scrapy settings:

    settings.py
    ZYTE_SMARTPROXY_ENABLED = True
    ZYTE_SMARTPROXY_APIKEY = 'apikey'
    

    Or through spider attributes:

    import scrapy

    class MySpider(scrapy.Spider):
        zyte_smartproxy_enabled = True
        zyte_smartproxy_apikey = 'apikey'
    
  3. Set the ZYTE_SMARTPROXY_URL Scrapy setting as needed:

    • To use the proxy mode of Zyte API, set it to http://api.zyte.com:8011:

      settings.py
      ZYTE_SMARTPROXY_URL = "http://api.zyte.com:8011"
      
    • To use the default Zyte Smart Proxy Manager endpoint, leave it unset.

    • To use a custom Zyte Smart Proxy Manager endpoint, in case you have a dedicated or private instance, set it to your custom endpoint. For example:

      settings.py
      ZYTE_SMARTPROXY_URL = "http://myinstance.zyte.com:8011"
      
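Taken together, the three steps above could produce a settings.py like the following minimal sketch for the proxy mode of Zyte API. It assumes a project with no other downloader middlewares, and it reads the API key from an environment variable instead of hard-coding it; the variable name ZYTE_SMARTPROXY_APIKEY used for the environment lookup is a hypothetical choice, not something the plugin requires.

```python
# settings.py (sketch): proxy mode of Zyte API, API key taken from the
# environment. The environment variable name below is hypothetical.
import os

DOWNLOADER_MIDDLEWARES = {
    "scrapy_zyte_smartproxy.ZyteSmartProxyMiddleware": 610,
}

ZYTE_SMARTPROXY_ENABLED = True
# Empty string if the (hypothetical) variable is not set.
ZYTE_SMARTPROXY_APIKEY = os.environ.get("ZYTE_SMARTPROXY_APIKEY", "")
ZYTE_SMARTPROXY_URL = "http://api.zyte.com:8011"
```

To use the default Zyte Smart Proxy Manager endpoint instead, drop the ZYTE_SMARTPROXY_URL line.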

Usage

Once the downloader middleware is properly configured, every request goes through the configured Zyte proxy service.

Although the plugin configuration only allows defining a single proxy endpoint and API key, it is possible to override them for specific requests, so that you can use different combinations for different requests within the same spider.

To override which combination of endpoint and API key is used for a given request, set the proxy key of the request metadata to a URL that specifies both the target endpoint and, as the user name, the API key to use. For example:

scrapy.Request(
    "https://topscrape.com",
    meta={
        "proxy": "http://YOUR_API_KEY@api.zyte.com:8011",
        ...
    },
)
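When the per-request API key is only known at run time, the proxy URL can be assembled from an endpoint and a key. The helper below, proxy_url_with_key, is a hypothetical sketch (not part of scrapy-zyte-smartproxy) that puts the key in the user-name position of the URL, matching the http://YOUR_API_KEY@api.zyte.com:8011 form shown above.

```python
from urllib.parse import quote, urlsplit, urlunsplit


def proxy_url_with_key(endpoint: str, api_key: str) -> str:
    """Return endpoint with api_key embedded as the URL user name.

    Hypothetical helper for building the "proxy" request meta value.
    """
    parts = urlsplit(endpoint)
    # Keep host[:port] exactly as written, dropping any existing credentials.
    host_port = parts.netloc.rpartition("@")[2]
    # Percent-encode the key so characters such as "@" or ":" cannot break
    # the URL structure; plain alphanumeric keys pass through unchanged.
    netloc = f"{quote(api_key, safe='')}@{host_port}"
    return urlunsplit((parts.scheme, netloc, parts.path, parts.query, parts.fragment))


print(proxy_url_with_key("http://api.zyte.com:8011", "YOUR_API_KEY"))
# prints http://YOUR_API_KEY@api.zyte.com:8011
```

The result can then be passed as meta={"proxy": proxy_url_with_key(endpoint, key)} when creating a request.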

To disable proxying altogether for a given request, set the dont_proxy key of the request metadata to True:

scrapy.Request(
    "https://topscrape.com",
    meta={
        "dont_proxy": True,
        ...
    },
)
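In a spider that mixes proxied and direct requests, the choice can be centralized in one function that builds the request metadata. In the sketch below, both request_meta and the DIRECT_HOSTS set are hypothetical names chosen for illustration; only the dont_proxy meta key comes from the plugin.

```python
from urllib.parse import urlsplit

# Hypothetical set of hosts that should be fetched directly, without proxy.
DIRECT_HOSTS = {"localhost", "127.0.0.1"}


def request_meta(url: str) -> dict:
    """Return request metadata that disables proxying for DIRECT_HOSTS."""
    host = urlsplit(url).hostname or ""
    if host in DIRECT_HOSTS:
        return {"dont_proxy": True}
    # No override: the request goes through the configured proxy service.
    return {}
```

A spider would then create requests as scrapy.Request(url, meta=request_meta(url)).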

You can set Zyte API proxy headers or Zyte Smart Proxy Manager headers as regular Scrapy headers, e.g. using the headers parameter of Request or using the DEFAULT_REQUEST_HEADERS setting. For example:

scrapy.Request(
    "https://topscrape.com",
    headers={
        "Zyte-Geolocation": "FR",
        ...
    },
)
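To send a proxy header with every request, the DEFAULT_REQUEST_HEADERS setting can be used instead of per-request headers. Defining this setting replaces Scrapy's default value rather than merging with it, so this sketch repeats the Accept and Accept-Language defaults of recent Scrapy releases; check the defaults of the Scrapy version you use.

```python
# settings.py (sketch): add a Zyte API proxy header to every request.
# The Accept and Accept-Language values mirror Scrapy's defaults so that
# overriding the setting does not silently drop them.
DEFAULT_REQUEST_HEADERS = {
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
    "Accept-Language": "en",
    "Zyte-Geolocation": "FR",
}
```

Headers passed through the headers parameter of a Request take precedence over these defaults for that request.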

For information about proxy-specific header processing, see Headers.

See also Settings for the complete list of settings that this downloader middleware supports.