scrapy-zyte-smartproxy 2.4 documentation
scrapy-zyte-smartproxy is a Scrapy downloader middleware to use one of Zyte’s proxy services: either the proxy mode of Zyte API or Zyte Smart Proxy Manager (formerly Crawlera).
Configuration
Add the downloader middleware to your DOWNLOADER_MIDDLEWARES Scrapy setting:

settings.py

    DOWNLOADER_MIDDLEWARES = {
        ...
        'scrapy_zyte_smartproxy.ZyteSmartProxyMiddleware': 610
    }
Enable the middleware and configure your API key, either through Scrapy settings:
settings.py

    ZYTE_SMARTPROXY_ENABLED = True
    ZYTE_SMARTPROXY_APIKEY = 'apikey'
Or through spider attributes:
    class MySpider(scrapy.Spider):
        zyte_smartproxy_enabled = True
        zyte_smartproxy_apikey = 'apikey'
Set the ZYTE_SMARTPROXY_URL Scrapy setting as needed:

To use the proxy mode of Zyte API, set it to http://api.zyte.com:8011:

settings.py

    ZYTE_SMARTPROXY_URL = "http://api.zyte.com:8011"
Tip
This URL is logged, so that you can tell which value was used from crawl logs.
To use the default Zyte Smart Proxy Manager endpoint, leave it unset.
To use a custom Zyte Smart Proxy Manager endpoint, in case you have a dedicated or private instance, set it to your custom endpoint. For example:
settings.py

    ZYTE_SMARTPROXY_URL = "http://myinstance.zyte.com:8011"
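The three configuration steps above can be combined into a single settings module. The following is a minimal sketch, not a definitive setup: it assumes you want the proxy mode of Zyte API, and it reads the API key from an environment variable named ZYTE_KEY, which is a hypothetical name chosen for this example and not something the middleware looks up on its own.

```python
# settings.py (sketch combining the configuration steps)
import os

DOWNLOADER_MIDDLEWARES = {
    # Priority 610 is the value used in the configuration step above.
    "scrapy_zyte_smartproxy.ZyteSmartProxyMiddleware": 610,
}

# Enable the middleware for all spiders in the project.
ZYTE_SMARTPROXY_ENABLED = True

# ZYTE_KEY is a hypothetical environment variable name (an assumption of
# this sketch); it keeps the API key out of version control.
ZYTE_SMARTPROXY_APIKEY = os.environ.get("ZYTE_KEY", "")

# Proxy mode of Zyte API. Leave this setting out entirely to use the
# default Zyte Smart Proxy Manager endpoint instead.
ZYTE_SMARTPROXY_URL = "http://api.zyte.com:8011"
```

Because the configured URL is logged, the crawl log is a quick way to confirm which endpoint a run actually used.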
Usage
Once the downloader middleware is properly configured, every request goes through the configured Zyte proxy service.
Although the plugin configuration only allows defining a single proxy endpoint and API key, it is possible to override them for specific requests, so that you can use different combinations for different requests within the same spider.
To override which combination of endpoint and API key is used for a given request, set proxy in the request metadata to a URL indicating both the target endpoint and the API key to use. For example:
    scrapy.Request(
        "https://topscrape.com",
        meta={
            "proxy": "http://YOUR_API_KEY@api.zyte.com:8011",
            ...
        },
    )
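If several requests need different endpoint and key combinations, building the proxy URL in one place avoids repeating string formatting across the spider. The helper below is a sketch under stated assumptions: proxy_meta is a hypothetical name, not part of scrapy-zyte-smartproxy, and it only assembles the URL shape shown in the example above using the standard library.

```python
from urllib.parse import urlsplit, urlunsplit


def proxy_meta(api_key, endpoint="http://api.zyte.com:8011"):
    """Return a request meta dict with the API key embedded in the proxy URL.

    Hypothetical helper for illustration; not provided by the middleware.
    """
    if not api_key:
        raise ValueError("api_key must not be empty")
    parts = urlsplit(endpoint)
    # Drop any credentials already present in the endpoint, keep host:port.
    host = parts.netloc.rpartition("@")[2]
    proxy_url = urlunsplit((parts.scheme, f"{api_key}@{host}", parts.path, "", ""))
    return {"proxy": proxy_url}
```

A request would then be created as scrapy.Request("https://topscrape.com", meta=proxy_meta("YOUR_API_KEY")), or with a second argument such as "http://myinstance.zyte.com:8011" to target a custom endpoint.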
To disable proxying altogether for a given request, set dont_proxy to True in the request metadata:
    scrapy.Request(
        "https://topscrape.com",
        meta={
            "dont_proxy": True,
            ...
        },
    )
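One possible use of dont_proxy is to bypass the proxy for hosts that should never go through it, such as a local test server. The sketch below assumes a hypothetical request_meta helper and a hypothetical UNPROXIED_HOSTS set; neither exists in the middleware, and the host list is only an illustration.

```python
from urllib.parse import urlsplit

# Hypothetical list of hosts that should be fetched directly.
UNPROXIED_HOSTS = frozenset({"localhost", "127.0.0.1"})


def request_meta(url, unproxied_hosts=UNPROXIED_HOSTS):
    """Return {"dont_proxy": True} for hosts that should skip the proxy.

    Hypothetical helper for illustration; not provided by the middleware.
    """
    host = urlsplit(url).hostname
    if host in unproxied_hosts:
        return {"dont_proxy": True}
    # An empty meta dict leaves the request on the configured proxy.
    return {}
```

The result can be passed directly as the meta argument, for example scrapy.Request(url, meta=request_meta(url)).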
You can set Zyte API proxy headers or Zyte Smart Proxy Manager headers as regular Scrapy headers, e.g. using the headers parameter of Request or using the DEFAULT_REQUEST_HEADERS setting. For example:
    scrapy.Request(
        "https://topscrape.com",
        headers={
            "Zyte-Geolocation": "FR",
            ...
        },
    )
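The same header can be applied to every request through the DEFAULT_REQUEST_HEADERS setting instead of per request. This is a minimal sketch that reuses the Zyte-Geolocation value from the example above; the Accept and Accept-Language entries are an assumption of this sketch, included because assigning DEFAULT_REQUEST_HEADERS in settings.py replaces Scrapy's built-in defaults rather than merging with them.

```python
# settings.py (sketch)
DEFAULT_REQUEST_HEADERS = {
    # Restated here because this assignment replaces Scrapy's default
    # dictionary; adjust or drop these to match your project.
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
    "Accept-Language": "en",
    # Proxy header sent with every request unless a request overrides it.
    "Zyte-Geolocation": "FR",
}
```

Headers passed through the headers parameter of an individual Request take precedence over these defaults for that request.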
For information about proxy-specific header processing, see Headers.
See also Settings for the complete list of settings that this downloader middleware supports.