Fixed scheduler process_spider_output() to yield requests#254
sibiryakov merged 3 commits into scrapinghub:master
Codecov Report
@@ Coverage Diff @@
## master #254 +/- ##
=======================================
Coverage 70.15% 70.15%
=======================================
Files 68 68
Lines 4715 4715
Branches 632 632
=======================================
Hits 3308 3308
Misses 1267 1267
Partials 140 140
Hey @voith, sorry for the long absence. It looks absolutely fine. I hope your complex tests work. Ready for merge?
Hi @sibiryakov, this is ready for merge. You can see that the test works by looking at the builds.
Thank you! 🍺
This PR broke Frontera behaviour. Now every yielded request ends up as a call to
Ping @sibiryakov
@isra17 I too had noticed that. Well, Frontera should have had a test case for this.
I've opened a PR to revert this change: #273
Don't worry about that, this is not the kind of issue that breaks vanilla Frontera in an obvious manner. I didn't see it until I had a middleware with some logic specific to links_extracted.
This PR probably fixes the problem: #261
There is no need to revert this change, it's a step in the right direction: we should allow other middlewares to operate on objects passed by Frontera middlewares.
Unless I'm missing something, won't #261 end up scheduling the requests twice?
@isra17 This will happen only if you have some middleware in Scrapy yielding all requests it gets. Normally, this shouldn't happen.
@isra17 I spent more time looking into this, and I think you're right: we will get requests in two places, the temporary queue and the frontier. I'll release the fix soon.
See #276
Fixes #253

Here's a screenshot using the same code discussed here.
Nothing seems to break when testing this change manually. The only test that was failing was wrong IMO because it passed a list of requests and items and was only expecting items in return. I have modified that test to make it compatible with this patch.
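To illustrate the difference in expectations, here is a minimal sketch that does not use the real Scrapy or Frontera classes. `FakeRequest`, `FakeItem`, `output_items_only` and `output_with_requests` are hypothetical names introduced only for this illustration; the actual middleware does more than this.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FakeRequest:
    """Hypothetical stand-in for a request object (not the Scrapy class)."""
    url: str


@dataclass(frozen=True)
class FakeItem:
    """Hypothetical stand-in for a scraped item."""
    name: str


def output_items_only(result):
    """What the old test expected: requests are dropped, only items come back."""
    for element in result:
        if not isinstance(element, FakeRequest):
            yield element


def output_with_requests(result):
    """What the modified test expects: requests are yielded along with items."""
    for element in result:
        yield element


# A spider callback result that mixes requests and items.
spider_output = [
    FakeRequest("http://example.com/1"),
    FakeItem("a"),
    FakeRequest("http://example.com/2"),
]

old_result = list(output_items_only(spider_output))
new_result = list(output_with_requests(spider_output))
```

With the mixed input above, `old_result` contains only the item, while `new_result` keeps the requests as well, in their original order.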
I've split this PR into three commits:
A note about the tests added:
The tests might be a little difficult to understand at first sight. I would recommend reading the following code in order to understand the tests:
I have simulated the code discussed above in order to write the test.