Changes¶
v2.3.3 (2024-02-22)¶
Fix response handling for Zyte API proxy mode. Before, a single connection issue during a request would add a 90 second delay between requests until the end of the crawl, instead of removing the delay after the first successful response.
v2.3.2 (2024-02-14)¶
Detect scenarios where the proxy
Request.meta
key has probably been
accidentally copied from an earlier response, warn about it, and fix the value.
The Zyte-Client
header is again sent when using Zyte API proxy mode,
now that Zyte API supports it.
v2.3.1 (2023-11-20)¶
Fixed Zyte API proxy mode support by removing the mapping of unsupported
headers Zyte-Client
and Zyte-No-Bancheck
.
v2.3.0 (2023-10-20)¶
Added support for the upcoming proxy mode of Zyte API.
Added a BSD-3-Clause license file.
v2.2.0 (2022-08-05)¶
Added support for Scrapy 2.6.2 and later.
Scrapy 1.4 became the minimum supported Scrapy version.
v2.1.0 (2021-06-16)¶
Use a custom logger instead of the root one
v2.0.0 (2021-05-12)¶
Following the upstream rebranding of Crawlera as Zyte Smart Proxy Manager,
scrapy-crawlera
has been renamed as scrapy-zyte-smartproxy
, with the
following backward-incompatible changes:
The repository name and Python Package Index (PyPI) name are now
scrapy-zyte-smartproxy
.Setting prefixes have switched from
CRAWLERA_
toZYTE_SMARTPROXY_
.Spider attribute prefixes and request meta key prefixes have switched from
crawlera_
tozyte_smartproxy_
.scrapy_crawlera
is nowscrapy_zyte_smartproxy
.CrawleraMiddleware
is nowZyteSmartProxyMiddleware
, and its defaulturl
is nowhttp://proxy.zyte.com:8011
.Stat prefixes have switched from
crawlera/
tozyte_smartproxy/
.The online documentation is moving to https://scrapy-zyte-smartproxy.readthedocs.io/
Note
Zyte Smart Proxy Manager headers continue to use the X-Crawlera-
prefix.
In addition to that, the
X-Crawlera-Client
header is now automatically included in all requests.
v1.7.2 (2020-12-01)¶
Use request.meta than response.meta in the middleware
v1.7.1 (2020-10-22)¶
Consider Crawlera response if contains X-Crawlera-Version header
Build the documentation in Travis CI and fail on documentation issues
Update matrix of tests
v1.7.0 (2020-04-01)¶
Added more stats to better understanding the internal states.
Log warning when using https:// protocol.
Add default http:// protocol in case of none provided, and log warning about it.
Fix duplicated request when the response is not from crawlera, this was causing an infinite loop of retries when dont_filter=True.
v1.6.0 (2019-05-27)¶
Enable crawlera on demand by setting
CRAWLERA_FORCE_ENABLE_ON_HTTP_CODES
v1.5.1 (2019-05-21)¶
Remove username and password from settings since it’s removed from crawlera.
Include affected spider in logs.
Handle situations when crawlera is restarted and reply with 407’s for a few minutes by retrying the requests with a exponential backoff system.
v1.5.0 (2019-01-23)¶
Correctly check for bans in crawlera (Jobs will not get banned on non ban 503’s).
Exponential backoff when crawlera doesn’t have proxies available.
Fix
dont_proxy=False
header disabling crawlera when it is enabled.
v1.4.0 (2018-09-20)¶
Remove X-Crawlera-* headers when Crawlera is disabled.
Introduction of DEFAULT_CRAWLERA_HEADERS settings.
v1.3.0 (2018-01-10)¶
Use CONNECT method to contact Crawlera proxy.
v1.2.4 (2017-07-04)¶
Trigger PYPI deployments after changes made to TOXENV in v1.2.3
v1.2.3 (2017-06-29)¶
Multiple documentation fixes
Test scrapy-crawlera on combinations of software used by scrapinghub stacks
v1.2.2 (2017-01-19)¶
Fix Crawlera error stats key in Python 3.
Add support for Python 3.6.
v1.2.1 (2016-10-17)¶
Fix release date in README.
v1.2.0 (2016-10-17)¶
Recommend middleware order to be
610
to run beforeRedirectMiddleware
.Change default download timeout to 190s or 3 minutes 10 seconds (instead of 1800s or 30 minutes).
Test and advertize Python 3 compatiblity.
New
crawlera/request
andcrawlera/request/method/*
stats counts.Clear Scrapy DNS cache for proxy URL in case of connection errors.
Distribute plugin as universal wheel.