fix deprecated calls to scrapy.utils.request.request_fingerprint #50

Open · wants to merge 2 commits into base: master
6 changes: 3 additions & 3 deletions .github/workflows/main.yml
@@ -12,7 +12,7 @@ jobs:
runs-on: ubuntu-latest
strategy:
matrix:
-      python-version: [3.5, 3.6, 3.7, 3.8, 3.9]
+      python-version: ["3.7", "3.8", "3.9", "3.10", "3.11", "3.12", "3.13"]
Author:
3.5 and 3.6 don't seem to work anymore? at least in my fork


steps:
- uses: actions/checkout@v2
@@ -21,7 +21,7 @@ jobs:
sudo apt-get install libdb-dev

- name: Set up Python ${{ matrix.python-version }}
-      uses: actions/setup-python@v2
+      uses: actions/setup-python@v5
with:
python-version: ${{ matrix.python-version }}
- name: Cache pip
@@ -38,4 +38,4 @@ jobs:
pip install -r tests/requirements-test.txt
- name: Test with pytest
run: |
-        pytest
+        pytest
4 changes: 2 additions & 2 deletions scrapy_deltafetch/middleware.py
@@ -5,7 +5,7 @@

from scrapy.http import Request
from scrapy.item import Item
-from scrapy.utils.request import request_fingerprint
+from scrapy.utils.request import fingerprint

Review comment:
On Scrapy 2.7+, the right approach is using Crawler.fingerprinter.fingerprint().

Author:
@Gallaecio wouldn't using this require storing (and instantiating) a Crawler object as an instance variable?

From the deprecation messages in Scrapy 2.11.2 (https://github.com/scrapy/scrapy/blob/e8cb5a03b382b98f2c8945355076390f708b918d/scrapy/utils/request.py#L86-L136), it seems to suggest getting the crawler during instantiation with DeltaFetch.from_crawler, if I am reading it right.

But what if the DeltaFetch object isn't instantiated that way? What should happen in DeltaFetch._get_key?

@Gallaecio (Nov 25, 2024):
DeltaFetch is a spider middleware, instantiated by Scrapy.

Most Scrapy components are instantiated with the create_instance (Scrapy 2.11-) / build_from_crawler (Scrapy 2.12+) functions, which call from_crawler if defined. Spider middlewares are definitely one of those components. So __init__ should never be called without first calling from_crawler.

It does happen in the tests, which use self.mwcls(); they would need to change to either use from_crawler or, better yet, create_instance / build_from_crawler.
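To make the suggested pattern concrete, here is a minimal, hypothetical sketch (not the actual PR code) of a middleware that receives the crawler's request fingerprinter through from_crawler, as Gallaecio describes for Scrapy 2.7+. The class and helper names are illustrative; only crawler.request_fingerprinter and its fingerprint() method are the real Scrapy API being assumed.

```python
class DeltaFetchSketch:
    """Hypothetical sketch of a spider middleware that takes the
    crawler's request fingerprinter at construction time (Scrapy 2.7+)."""

    def __init__(self, fingerprinter):
        # fingerprinter exposes .fingerprint(request) -> bytes
        self.fingerprinter = fingerprinter

    @classmethod
    def from_crawler(cls, crawler):
        # Scrapy builds spider middlewares through from_crawler, so the
        # configured fingerprinter (REQUEST_FINGERPRINTER_CLASS) is
        # available here without instantiating a Crawler ourselves.
        return cls(crawler.request_fingerprinter)

    def _get_key(self, request):
        # deltafetch_key in request.meta still takes precedence.
        key = request.meta.get("deltafetch_key") or self.fingerprinter.fingerprint(request)
        # fingerprint() already returns bytes; a deltafetch_key may be str.
        return key if isinstance(key, bytes) else key.encode()
```

Constructing the middleware directly (as the current tests do with self.mwcls()) would bypass from_crawler, which is why the tests would need to build the instance through the crawler instead.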

from scrapy.utils.project import data_path
from scrapy.utils.python import to_bytes
from scrapy.exceptions import NotConfigured
@@ -79,7 +79,7 @@ def process_spider_output(self, response, result, spider):
yield r

def _get_key(self, request):
-        key = request.meta.get('deltafetch_key') or request_fingerprint(request)
+        key = request.meta.get('deltafetch_key') or fingerprint(request)
return to_bytes(key)

def _is_enabled_for_request(self, request):
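One side effect worth flagging (an observation, not stated in the diff): the deprecated request_fingerprint() returned a hexadecimal string, while the new fingerprint() returns the raw SHA1 digest as bytes. to_bytes() accepts both, but keys already stored by an existing deltafetch database in hex form will not match the new byte-form keys. A rough illustration with stand-in hashing (not Scrapy's actual canonicalization):

```python
import hashlib

# Illustrative stand-ins only: the point is the return-type difference
# between the old and new fingerprint helpers.
digest = hashlib.sha1(b"GET http://url1").digest()  # fingerprint() -> raw bytes
hex_form = digest.hex()                             # request_fingerprint() -> hex str

# A SHA1 digest is 20 raw bytes; its hex encoding is a 40-char string,
# so the two key formats can never collide in an existing database.
assert isinstance(digest, bytes) and len(digest) == 20
assert isinstance(hex_form, str) and len(hex_form) == 40
```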
4 changes: 2 additions & 2 deletions tests/test_deltafetch.py
@@ -9,7 +9,7 @@
from scrapy.spiders import Spider
from scrapy.settings import Settings
from scrapy.exceptions import NotConfigured
-from scrapy.utils.request import request_fingerprint
+from scrapy.utils.request import fingerprint
from scrapy.utils.python import to_bytes
from scrapy.statscollectors import StatsCollector
from scrapy.utils.test import get_crawler
@@ -319,7 +319,7 @@ def test_get_key(self):
mw = self.mwcls(self.temp_dir, reset=True)
test_req1 = Request('http://url1')
self.assertEqual(mw._get_key(test_req1),
-                         to_bytes(request_fingerprint(test_req1)))
+                         to_bytes(fingerprint(test_req1)))
test_req2 = Request('http://url2', meta={'deltafetch_key': b'dfkey1'})
self.assertEqual(mw._get_key(test_req2), b'dfkey1')
