Thanks to everyone who contributed to this release!
List of contributors sorted by number of commits:
69 Daniel Graña <dangra@...>
37 Pablo Hoffman <pablo@...>
13 Mikhail Korobov <kmike84@...>
9 Alex Cepoi <alex.cepoi@...>
9 alexanderlukanin13 <alexander.lukanin.13@...>
8 Rolando Espinoza La fuente <darkrho@...>
8 Lukasz Biedrycki <lukasz.biedrycki@...>
6 Nicolas Ramirez <nramirez.uy@...>
3 Paul Tremberth <paul.tremberth@...>
2 Martin Olveyra <molveyra@...>
2 Stefan <misc@...>
2 Rolando Espinoza <darkrho@...>
2 Loren Davie <loren@...>
2 irgmedeiros <irgmedeiros@...>
1 Stefan Koch <taikano@...>
1 Stefan <cct@...>
1 scraperdragon <dragon@...>
1 Kumara Tharmalingam <ktharmal@...>
1 Francesco Piccinno <stack.box@...>
1 Marcos Campal <duendex@...>
1 Dragon Dave <dragon@...>
1 Capi Etheriel <barraponto@...>
1 cacovsky <amarquesferraz@...>
1 Berend Iwema <berend@...>
Thanks to everyone who contributed to this release. Here is a list of contributors sorted by number of commits:
130 Pablo Hoffman <pablo@...>
97 Daniel Graña <dangra@...>
20 Nicolás Ramírez <nramirez.uy@...>
13 Mikhail Korobov <kmike84@...>
12 Pedro Faustino <pedrobandim@...>
11 Steven Almeroth <sroth77@...>
5 Rolando Espinoza La fuente <darkrho@...>
4 Michal Danilak <mimino.coder@...>
4 Alex Cepoi <alex.cepoi@...>
4 Alexandr N Zamaraev (aka tonal) <tonal@...>
3 paul <paul.tremberth@...>
3 Martin Olveyra <molveyra@...>
3 Jordi Llonch <llonchj@...>
3 arijitchakraborty <myself.arijit@...>
2 Shane Evans <shane.evans@...>
2 joehillen <joehillen@...>
2 Hart <HartSimha@...>
2 Dan <ellisd23@...>
1 Zuhao Wan <wanzuhao@...>
1 whodatninja <blake@...>
1 vkrest <v.krestiannykov@...>
1 tpeng <pengtaoo@...>
1 Tom Mortimer-Jones <tom@...>
1 Rocio Aramberri <roschegel@...>
1 Pedro <pedro@...>
1 notsobad <wangxiaohugg@...>
1 Natan L <kuyanatan.nlao@...>
1 Mark Grey <mark.grey@...>
1 Luan <luanpab@...>
1 Libor Nenadál <libor.nenadal@...>
1 Juan M Uys <opyate@...>
1 Jonas Brunsgaard <jonas.brunsgaard@...>
1 Ilya Baryshev <baryshev@...>
1 Hasnain Lakhani <m.hasnain.lakhani@...>
1 Emanuel Schorsch <emschorsch@...>
1 Chris Tilden <chris.tilden@...>
1 Capi Etheriel <barraponto@...>
1 cacovsky <amarquesferraz@...>
1 Berend Iwema <berend@...>
Scrapy changes:
Support for AJAX crawlable URLs
New persistent scheduler that stores requests on disk, allowing crawls to be suspended and resumed (r2737)
Added -o option to scrapy crawl, a shortcut for dumping scraped items into a file (or to standard output, using -)
Added support for passing custom settings to Scrapyd schedule.json api (r2779, r2783)
New ChunkedTransferMiddleware (enabled by default) to support chunked transfer encoding (r2769)
Added boto 2.0 support for S3 downloader handler (r2763)
In request errbacks, the offending request is now received in the failure.request attribute (r2738)
Check the documentation for more details.
Added builtin caching DNS resolver (r2728)
Moved Amazon AWS-related components/extensions (SQS spider queue, SimpleDB stats collector) to a separate project: [scaws](https://github.com/scrapinghub/scaws) (r2706, r2714)
Moved spider queues to scrapyd: scrapy.spiderqueue -> scrapyd.spiderqueue (r2708)
Moved sqlite utils to scrapyd: scrapy.utils.sqlite -> scrapyd.sqlite (r2781)
Real support for returning iterators from the start_requests() method. The iterator is now consumed during the crawl, whenever the spider becomes idle (r2704)
Added REDIRECT_ENABLED setting to quickly enable/disable the redirect middleware (r2697)
Added RETRY_ENABLED setting to quickly enable/disable the retry middleware (r2694)
Added CloseSpider exception to manually close spiders (r2691)
Improved encoding detection by adding support for HTML5 meta charset declaration (r2690)
Refactored close spider behavior to wait for all downloads to finish and be processed by spiders before closing the spider (r2688)
Added SitemapSpider (see documentation in Spiders page) (r2658)
Added LogStats extension for periodically logging basic stats (like crawled pages and scraped items) (r2657)
Made handling of gzipped responses more robust (#319, r2643). Scrapy now tries to decompress as much as possible from a gzipped response instead of failing with an IOError.
Simplified MemoryDebugger extension to use stats for dumping memory debugging info (r2639)
Added new command to edit spiders: scrapy edit (r2636) and -e flag to genspider command that uses it (r2653)
Changed default representation of items to pretty-printed dicts (r2631). This makes the default log more readable, for both Scraped and Dropped lines.
Added spider_error signal (r2628)
Added COOKIES_ENABLED setting (r2625)
Stats are now dumped to Scrapy log (default value of STATS_DUMP setting has been changed to True). This is to make Scrapy users more aware of Scrapy stats and the data that is collected there.
Added support for dynamically adjusting download delay and maximum concurrent requests (r2599)
Added new DBM HTTP cache storage backend (r2576)
Added listjobs.json API to Scrapyd (r2571)
CsvItemExporter: added join_multivalued parameter (r2578)
Added namespace support to xmliter_lxml (r2552)
Improved cookies middleware by making COOKIES_DEBUG nicer and documenting it (r2579)
Several improvements to Scrapyd and Link extractors
Removed unused function: scrapy.utils.request.request_info() (r2577)
Removed the googledir example project (examples/googledir). There’s now a new example project called dirbot available on GitHub: https://github.com/scrapy/dirbot
Removed support for default field values in Scrapy items (r2616)
Removed experimental crawlspider v2 (r2632)
Removed scheduler middleware to simplify architecture. Duplicate filtering is now done in the scheduler itself, using the same dupe filtering class as before (DUPEFILTER_CLASS setting) (r2640)
Removed support for passing urls to scrapy crawl command (use scrapy parse instead) (r2704)
Removed deprecated Execution Queue (r2704)
Removed (undocumented) spider context extension (from scrapy.contrib.spidercontext) (r2780)
Removed CONCURRENT_SPIDERS setting (use scrapyd maxproc instead) (r2789)
Renamed attributes of core components: downloader.sites -> downloader.slots, scraper.sites -> scraper.slots (r2717, r2718)
Renamed setting CLOSESPIDER_ITEMPASSED to CLOSESPIDER_ITEMCOUNT (r2655). Backwards compatibility kept.
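The persistent scheduler entry (r2737) keeps pending requests on disk so a crawl can be stopped and later resumed. The sketch below models only that idea with a hypothetical `DiskFifoQueue` that stores request dicts as JSON lines; it is not Scrapy's scheduler or queue API, and it persists nothing else (for example, which requests were already seen).

```python
import json
import os
import tempfile


class DiskFifoQueue:
    """Hypothetical FIFO queue whose pending requests survive a restart.

    Requests are plain dicts here (an assumption); they are written to a
    JSON-lines file on close() and read back when the queue is reopened.
    """

    def __init__(self, path):
        self.path = path
        self.pending = []
        if os.path.exists(path):
            # Resume: reload whatever the previous run left pending.
            with open(path, encoding="utf-8") as f:
                self.pending = [json.loads(line) for line in f if line.strip()]

    def push(self, request):
        self.pending.append(request)

    def pop(self):
        # Return the oldest pending request, or None when the queue is empty.
        return self.pending.pop(0) if self.pending else None

    def close(self):
        # Suspend: persist the remaining requests so the next run can resume.
        with open(self.path, "w", encoding="utf-8") as f:
            for request in self.pending:
                f.write(json.dumps(request) + "\n")

    def __len__(self):
        return len(self.pending)


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "requests.jsonl")
    queue = DiskFifoQueue(path)
    queue.push({"url": "http://example.com/1"})
    queue.push({"url": "http://example.com/2"})
    queue.pop()      # one request is processed before the crawl is stopped
    queue.close()
    print(len(DiskFifoQueue(path)))  # prints 1: the unprocessed request is resumed
```

A production queue would more likely write each request as it is pushed, so that a crash (not only a clean stop) loses nothing; writing on close() keeps the sketch short.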
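The start_requests() entry (r2704) says the returned iterator is consumed during the crawl, when the spider becomes idle, rather than being expanded up front. The toy loop below illustrates that scheduling rule with hypothetical names (`ToyEngine`, `schedule`, `step`); it downloads nothing and is not Scrapy's engine.

```python
from collections import deque


class ToyEngine:
    """Hypothetical engine loop showing lazy consumption of start requests."""

    def __init__(self, start_requests):
        self.start_requests = iter(start_requests)
        self.scheduled = deque()   # e.g. requests produced by callbacks
        self.processed = []

    def schedule(self, request):
        self.scheduled.append(request)

    def step(self):
        """Process one request; return False when there is nothing left."""
        if not self.scheduled:
            # Idle: only now pull the next request from the start iterator.
            try:
                self.scheduled.append(next(self.start_requests))
            except StopIteration:
                return False
        self.processed.append(self.scheduled.popleft())
        return True
```

Because the iterator is pulled one item at a time, a very long (even unbounded) start_requests() generator never has to be materialized in memory.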
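Several entries above are controlled from project settings. A minimal settings.py fragment using setting names taken from these notes; the values are example choices, not defaults (the notes only state that STATS_DUMP now defaults to True).

```python
# settings.py (fragment): setting names come from the release notes above;
# the values are illustrative choices.
REDIRECT_ENABLED = False   # disable the redirect middleware
RETRY_ENABLED = False      # disable the retry middleware
COOKIES_ENABLED = True     # keep the cookies middleware enabled
COOKIES_DEBUG = True       # log cookies sent in requests and received in responses
STATS_DUMP = True          # dump collected stats to the Scrapy log
```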
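The encoding detection entry (r2690) adds support for the HTML5 `<meta charset="...">` declaration. A rough regex-based sketch of reading a declared charset from the start of a response body; the function name, the 4096-byte scan window and the regex are assumptions, and this is not Scrapy's implementation.

```python
import re

# Matches both the HTML5 form <meta charset="utf-8"> and the older
# <meta http-equiv="Content-Type" content="text/html; charset=utf-8"> form.
META_CHARSET_RE = re.compile(
    rb"""<meta[^>]+charset\s*=\s*["']?\s*([-\w.:]+)""",
    re.IGNORECASE,
)


def declared_charset(body, scan_bytes=4096):
    """Return the charset declared in a <meta> tag, lowercased, or None."""
    match = META_CHARSET_RE.search(body[:scan_bytes])
    if match is None:
        return None
    return match.group(1).decode("ascii").lower()
```

For example, `declared_charset(b'<meta charset="UTF-8">')` returns `"utf-8"`. Scanning only the first few kilobytes reflects the fact that such declarations are expected near the top of the document.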
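The gzip entry (#319, r2643) describes recovering as much data as possible from a damaged gzipped response instead of failing. A best-effort sketch using the standard library's zlib in streaming mode; the function name and chunk size are assumptions, and this is not Scrapy's own helper.

```python
import zlib


def gunzip_best_effort(data, chunk_size=8192):
    """Decompress gzip data, returning whatever could be decoded.

    Truncated input yields the prefix decoded so far. On corrupt input,
    decoding stops at the failing chunk; output from that chunk is lost,
    so smaller chunks lose less.
    """
    # wbits = 16 + MAX_WBITS makes zlib expect a gzip header and trailer.
    decompressor = zlib.decompressobj(16 + zlib.MAX_WBITS)
    output = []
    try:
        for start in range(0, len(data), chunk_size):
            output.append(decompressor.decompress(data[start:start + chunk_size]))
        output.append(decompressor.flush())
    except zlib.error:
        pass  # keep the output recovered before the corruption
    return b"".join(output)
```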
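The CsvItemExporter entry (r2578) adds a join_multivalued parameter for fields that hold several values. A standard-library sketch of what such a parameter does: list or tuple values are joined with the given separator before the row is written. `export_items_csv` is a hypothetical helper, not the exporter's API; only the parameter name is taken from the notes.

```python
import csv
import io


def export_items_csv(items, fields, join_multivalued=","):
    """Write dict items as CSV, joining multi-valued fields into one cell."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(fields)
    for item in items:
        row = []
        for field in fields:
            value = item.get(field, "")
            if isinstance(value, (list, tuple)):
                value = join_multivalued.join(str(v) for v in value)
            row.append(value)
        writer.writerow(row)
    return buf.getvalue()
```

With a comma separator, the csv module quotes the joined cell because it contains the delimiter, so `["x", "y"]` is written as `"x,y"`.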
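The xmliter_lxml entry (r2552) adds namespace support when iterating over XML nodes: in a document with a default namespace, elements only match when their name is qualified with the namespace URI. A standard-library sketch using `xml.etree.ElementTree.iterparse` on a sitemap-style document; `iter_nodes` and its parameters are hypothetical and do not describe the signature of xmliter_lxml.

```python
import io
import xml.etree.ElementTree as ET


def iter_nodes(xml_bytes, nodename, namespace=None):
    """Yield every element named `nodename`, optionally within `namespace`."""
    # ElementTree spells namespaced tags as "{namespace-uri}localname".
    tag = "{%s}%s" % (namespace, nodename) if namespace else nodename
    for _event, element in ET.iterparse(io.BytesIO(xml_bytes), events=("end",)):
        if element.tag == tag:
            yield element


SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
SITEMAP = b"""<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>http://example.com/a</loc></url>
  <url><loc>http://example.com/b</loc></url>
</urlset>"""

locs = [node.findtext("{%s}loc" % SITEMAP_NS)
        for node in iter_nodes(SITEMAP, "url", namespace=SITEMAP_NS)]
print(locs)  # ['http://example.com/a', 'http://example.com/b']
```

Calling `iter_nodes(SITEMAP, "url")` without the namespace yields nothing, which is the failure mode namespace support addresses.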
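The scheduler entry (r2640) moves duplicate filtering into the scheduler, still driven by the class set in DUPEFILTER_CLASS. The sketch below shows the general technique of fingerprint-based deduplication: each request is reduced to a hash of its method, a canonicalized URL and its body, and a request whose fingerprint was already seen is dropped. The `Request` dataclass, the canonicalization rules and `DedupScheduler` are simplifying assumptions, not Scrapy's classes.

```python
import hashlib
from dataclasses import dataclass
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


@dataclass(frozen=True)
class Request:
    """Hypothetical stand-in for a request object."""
    url: str
    method: str = "GET"
    body: bytes = b""
    dont_filter: bool = False


def canonicalize_url(url):
    """Lowercase the network location, sort query arguments, drop the fragment."""
    parts = urlsplit(url)
    query = urlencode(sorted(parse_qsl(parts.query, keep_blank_values=True)))
    return urlunsplit((parts.scheme, parts.netloc.lower(), parts.path or "/", query, ""))


def request_fingerprint(request):
    digest = hashlib.sha1()
    digest.update(request.method.encode("ascii"))
    digest.update(canonicalize_url(request.url).encode("utf-8"))
    digest.update(request.body)
    return digest.hexdigest()


class DedupScheduler:
    """Hypothetical scheduler that filters duplicate requests itself."""

    def __init__(self):
        self.seen = set()
        self.queue = []

    def enqueue_request(self, request):
        """Queue the request and return True, or return False for a duplicate."""
        if not request.dont_filter:
            fingerprint = request_fingerprint(request)
            if fingerprint in self.seen:
                return False
            self.seen.add(fingerprint)
        self.queue.append(request)
        return True
```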
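The CLOSESPIDER_ITEMPASSED rename (r2655) keeps backwards compatibility. One common way to do that, sketched here with a hypothetical `get_renamed_setting` helper over a plain dict, is to prefer the new name, fall back to the old one and emit a deprecation warning; the notes do not say whether Scrapy itself warns.

```python
import warnings


def get_renamed_setting(settings, new_name, old_name, default=None):
    """Read a setting by its new name, falling back to its deprecated name."""
    if new_name in settings:
        return settings[new_name]
    if old_name in settings:
        warnings.warn(
            "%s is deprecated, use %s instead" % (old_name, new_name),
            DeprecationWarning,
            stacklevel=2,
        )
        return settings[old_name]
    return default
```

For example, `get_renamed_setting({"CLOSESPIDER_ITEMPASSED": 100}, "CLOSESPIDER_ITEMCOUNT", "CLOSESPIDER_ITEMPASSED")` returns 100 while warning that the old name is deprecated.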
The numbers like #NNN reference tickets in the old issue tracker (Trac), which is no longer available.
url and body attributes of Request objects are now read-only (#230)
Request.copy() and Request.replace() now also copy their callback and errback attributes (#231)
Removed UrlFilterMiddleware from scrapy.contrib (already disabled by default)
Offsite middleware doesn’t filter out any request coming from a spider that doesn’t have an allowed_domains attribute (#225)
Removed Spider Manager load() method. Now spiders are loaded in the constructor itself.
Moved module: scrapy.contrib.spidermanager to scrapy.spidermanager
Spider Manager singleton moved from scrapy.spider.spiders to the spiders attribute of the scrapy.project.crawler singleton.
Default per-command settings are now specified in the default_settings attribute of the command object class (#201)
Added handles_request() class method to BaseSpider
Dropped scrapy.log.exc() function (use scrapy.log.err() instead)
Dropped component argument of scrapy.log.msg() function
Dropped scrapy.log.log_level attribute
Added from_settings() class methods to Spider Manager and Item Pipeline Manager
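Two entries above concern request objects: url and body became read-only (#230), and copy() and replace() now carry over callback and errback (#231). A hypothetical minimal class illustrating that design: read-only properties force changes to go through replace(), which builds a new object and copies every attribute not explicitly overridden. This is not Scrapy's Request class.

```python
class Request:
    """Hypothetical request with read-only url/body and a copying replace()."""

    def __init__(self, url, body=b"", callback=None, errback=None):
        self._url = url
        self._body = body
        self.callback = callback
        self.errback = errback

    @property
    def url(self):
        return self._url  # no setter: assigning to request.url raises AttributeError

    @property
    def body(self):
        return self._body

    def replace(self, **changes):
        """Return a new request, copying any attribute not given in `changes`."""
        for attr in ("url", "body", "callback", "errback"):
            changes.setdefault(attr, getattr(self, attr))
        return type(self)(**changes)

    def copy(self):
        return self.replace()
```

For example, `req.replace(url="http://example.com/b")` returns a new request that keeps the original callback and errback, while `req` itself is left unchanged.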
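The Offsite middleware entry (#225) means a spider without an allowed_domains attribute has none of its requests filtered. A sketch of that rule as a plain function that receives the attribute value (for instance via `getattr(spider, "allowed_domains", None)`); `is_request_allowed` is a hypothetical name, and the exact-or-subdomain matching is an assumption about how domains are compared.

```python
from urllib.parse import urlsplit


def is_request_allowed(url, allowed_domains=None):
    """Return True if `url` should pass an offsite-style filter."""
    if allowed_domains is None:
        # Spider has no allowed_domains attribute: filter nothing.
        return True
    host = (urlsplit(url).hostname or "").lower()
    # Accept the domain itself or any of its subdomains.
    return any(host == domain.lower() or host.endswith("." + domain.lower())
               for domain in allowed_domains)
```

The leading dot in the suffix check keeps `notexample.com` from matching `example.com`.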
The numbers like #NNN reference tickets in the old issue tracker (Trac), which is no longer available.
Changed scrapy.utils.response.get_meta_refresh() signature (r1804)
Removed deprecated scrapy.item.ScrapedItem class - use scrapy.item.Item instead (r1838)
Removed deprecated scrapy.xpath module - use scrapy.selector instead. (r1836)
Removed deprecated core.signals.domain_open signal - use core.signals.domain_opened instead (r1822)
Changed core signals domain_opened, domain_closed, domain_idle
Removed deprecated SCRAPYSETTINGS_MODULE environment variable - use SCRAPY_SETTINGS_MODULE instead (r1840)
Renamed setting: REQUESTS_PER_DOMAIN to CONCURRENT_REQUESTS_PER_SPIDER (r1830, r1844)
Renamed setting: CONCURRENT_DOMAINS to CONCURRENT_SPIDERS (r1830)
Heavily refactored the HTTP Cache middleware, retaining the same functionality except for domain sectorization, which was removed (r1843)
Renamed exception: DontCloseDomain to DontCloseSpider (r1859 | #120)
Renamed extension: DelayedCloseDomain to SpiderCloseDelay (r1861 | #121)
Removed obsolete scrapy.utils.markup.remove_escape_chars function - use scrapy.utils.markup.replace_escape_chars instead (r1865)
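For projects upgrading across the setting renames above (REQUESTS_PER_DOMAIN and CONCURRENT_DOMAINS), a settings.py fragment using the new names; the values are arbitrary examples. Note that CONCURRENT_SPIDERS was itself removed in a later release (use scrapyd maxproc instead).

```python
# settings.py (fragment) using the renamed settings; values are examples.
CONCURRENT_REQUESTS_PER_SPIDER = 8   # formerly REQUESTS_PER_DOMAIN
CONCURRENT_SPIDERS = 8               # formerly CONCURRENT_DOMAINS
```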
First release of Scrapy.