
Added: A DeltaFetchPseudoItem for storing requests with no items yielded #19

Open · wants to merge 1 commit into master
Conversation

starrify

Sometimes one may want to store a request key for future skipping even when no item is generated.

Currently I handle such cases by yielding a pseudo item from such responses, and then either:

  • dropping that item in another middleware placed after deltafetch, or
  • dropping that item in an item pipeline.

It would be nice to support this feature inside deltafetch.
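The workaround described above can be sketched roughly as follows. All names here (the `_deltafetch_pseudo` flag, the helper, the pipeline class) are illustrative, not the actual API of this PR, and `DropItem` is defined locally as a stand-in for `scrapy.exceptions.DropItem` so the snippet is self-contained:

```python
# Sketch of the pseudo-item workaround; names are hypothetical.
class DropItem(Exception):
    """Stand-in for scrapy.exceptions.DropItem."""


def make_pseudo_item(response_url):
    """Yield this from a callback that produced no real items, so that
    deltafetch still records the request key for future skipping."""
    return {"_deltafetch_pseudo": True, "url": response_url}


class DropPseudoItemPipeline:
    """Item pipeline that discards pseudo items before they reach storage."""

    def process_item(self, item, spider=None):
        if item.get("_deltafetch_pseudo"):
            raise DropItem("pseudo item, only used to mark the request as seen")
        return item
```

Because deltafetch sees the pseudo item before the pipeline drops it, the request key is stored even though nothing real was scraped.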

@codecov-io commented May 16, 2017

Codecov Report

Merging #19 into master will increase coverage by 0.69%.
The diff coverage is 100%.


@@           Coverage Diff           @@
##           master   #19      +/-   ##
=======================================
+ Coverage    91.3%   92%   +0.69%     
=======================================
  Files           2     2              
  Lines          69    75       +6     
  Branches        9    11       +2     
=======================================
+ Hits           63    69       +6     
  Misses          3     3              
  Partials        3     3
Impacted Files                     Coverage Δ
scrapy_deltafetch/middleware.py    91.78% <100%> (+0.73%) ⬆️

Continue to review full report at Codecov.

Legend
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update aea3c34...d18b9fb. Read the comment docs.

@kmike (Member) commented May 16, 2017

A shameless plug: https://github.com/TeamHG-Memex/scrapy-crawl-once is a similar package, but the storage decision is not based on items: an explicit meta key is used (users can still set it based on scraped items if they want). So instead of creating fake items and dropping them in a middleware, one can just set request.meta['crawl_once'] = False. It also shouldn't have issues like #18 because it uses SQLite. It is harder to use, though, if the decision should be based on whether items are scraped or not.
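For illustration, the meta-key approach could look roughly like this. This is a sketch based on the scrapy-crawl-once description above, not code from either package; the helper only builds a plain meta dict, so no Scrapy install is assumed:

```python
def tag_request_meta(meta, store=True):
    """Return a copy of a request's meta dict with the crawl_once flag set.

    With scrapy-crawl-once, a request is recorded in its SQLite database
    (and skipped on later runs) only when meta['crawl_once'] is truthy;
    setting it to False leaves the request re-crawlable.
    """
    new_meta = dict(meta)
    new_meta["crawl_once"] = store
    return new_meta
```

A spider callback would pass such a dict as `meta=` when building a `scrapy.Request`, flipping the flag per request instead of yielding and dropping fake items.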
