Commit

Release/v0.1.9 (#11)
* Removed explicit job healthchecks
* Added scraper job timeout
* Moved lib to root directory
* Moving runner pings to UTC timezone
* Fixing scheduler and worker bugs
* Added and updated tests
* Bumping version. Updating dependencies
flulemon authored Jun 1, 2023
1 parent b0145c0 commit f43d2ce
Showing 56 changed files with 1,357 additions and 1,021 deletions.
4 changes: 2 additions & 2 deletions README.md
@@ -97,8 +97,8 @@ do so let's configure **SneakpeekServer**:
```python3
# file: main.py

-from sneakpeek.lib.models import Scraper, ScraperJobPriority, ScraperSchedule
-from sneakpeek.lib.storage.in_memory_storage import (
+from sneakpeek.models import Scraper, ScraperJobPriority, ScraperSchedule
+from sneakpeek.storage.in_memory_storage import (
InMemoryLeaseStorage,
InMemoryScraperJobsStorage,
InMemoryScrapersStorage,
12 changes: 6 additions & 6 deletions docs/api.rst
@@ -6,18 +6,18 @@ API
.. automodule:: sneakpeek.scraper_config
.. automodule:: sneakpeek.scraper_context
.. automodule:: sneakpeek.scraper_handler
-.. automodule:: sneakpeek.lib.models
+.. automodule:: sneakpeek.models
.. automodule:: sneakpeek.scheduler
.. automodule:: sneakpeek.worker
.. automodule:: sneakpeek.runner
.. automodule:: sneakpeek.api
.. automodule:: sneakpeek.metrics
.. automodule:: sneakpeek.logging
-.. automodule:: sneakpeek.lib.errors
-.. automodule:: sneakpeek.lib.queue
-.. automodule:: sneakpeek.lib.storage.base
-.. automodule:: sneakpeek.lib.storage.in_memory_storage
-.. automodule:: sneakpeek.lib.storage.redis_storage
+.. automodule:: sneakpeek.errors
+.. automodule:: sneakpeek.queue
+.. automodule:: sneakpeek.storage.base
+.. automodule:: sneakpeek.storage.in_memory_storage
+.. automodule:: sneakpeek.storage.redis_storage
.. automodule:: sneakpeek.plugins.proxy_plugin
.. automodule:: sneakpeek.plugins.rate_limiter_plugin
.. automodule:: sneakpeek.plugins.requests_logging_plugin
2 changes: 1 addition & 1 deletion docs/conf.py
@@ -23,7 +23,7 @@
copyright = "2023, Dan Yazovsky"
author = "Dan Yazovsky"
version = "0.1"
-release = "0.1.4"
+release = "0.1.9"
extensions = ["sphinx.ext.autodoc", "sphinx.ext.coverage", "sphinx.ext.napoleon"]
templates_path = ["_templates"]
language = "en"
52 changes: 26 additions & 26 deletions docs/design.rst
@@ -19,45 +19,45 @@ All of the components are run by the :py:class:`SneakpeekServer <sneakpeek.serve
Scrapers Storage
================

-Storage must implement this abstract class :py:class:`sneakpeek.lib.storage.base.ScrapersStorage`.
+Storage must implement this abstract class :py:class:`sneakpeek.storage.base.ScrapersStorage`.
Following methods are mandatory to implement:

-* :py:meth:`get_scrapers <sneakpeek.lib.storage.base.ScrapersStorage.get_scrapers>` - get list of all scrapers
-* :py:meth:`get_scraper <sneakpeek.lib.storage.base.ScrapersStorage.get_scraper>` - get scraper by ID
-* :py:meth:`is_read_only <sneakpeek.lib.storage.base.ScrapersStorage.is_read_only>` - whether the storage allows modifications of the scrapers list and its metadata
+* :py:meth:`get_scrapers <sneakpeek.storage.base.ScrapersStorage.get_scrapers>` - get list of all scrapers
+* :py:meth:`get_scraper <sneakpeek.storage.base.ScrapersStorage.get_scraper>` - get scraper by ID
+* :py:meth:`is_read_only <sneakpeek.storage.base.ScrapersStorage.is_read_only>` - whether the storage allows modifications of the scrapers list and its metadata

Following methods are optional to implement:

-* :py:meth:`create_scraper <sneakpeek.lib.storage.base.ScrapersStorage.create_scraper>` - create a new scraper
-* :py:meth:`delete_scraper <sneakpeek.lib.storage.base.ScrapersStorage.delete_scraper>` - delete scraper by ID
-* :py:meth:`update_scraper <sneakpeek.lib.storage.base.ScrapersStorage.update_scraper>` - update existing scraper
-* :py:meth:`maybe_get_scraper <sneakpeek.lib.storage.base.ScrapersStorage.maybe_get_scraper>` - get scraper by ID if it exists
-* :py:meth:`search_scrapers <sneakpeek.lib.storage.base.ScrapersStorage.search_scrapers>` - search scrapers using given filters
+* :py:meth:`create_scraper <sneakpeek.storage.base.ScrapersStorage.create_scraper>` - create a new scraper
+* :py:meth:`delete_scraper <sneakpeek.storage.base.ScrapersStorage.delete_scraper>` - delete scraper by ID
+* :py:meth:`update_scraper <sneakpeek.storage.base.ScrapersStorage.update_scraper>` - update existing scraper
+* :py:meth:`maybe_get_scraper <sneakpeek.storage.base.ScrapersStorage.maybe_get_scraper>` - get scraper by ID if it exists
+* :py:meth:`search_scrapers <sneakpeek.storage.base.ScrapersStorage.search_scrapers>` - search scrapers using given filters

Currently there 2 storage implementations:

-* :py:class:`InMemoryScrapersStorage <sneakpeek.lib.storage.in_memory_storage.InMemoryScrapersStorage>` - in-memory storage. Should either be used in **development** environment or if the list of scrapers is static and wouldn't be changed.
-* :py:class:`RedisScrapersStorage <sneakpeek.lib.storage.in_memory_storage.RedisScrapersStorage>` - redis storage.
+* :py:class:`InMemoryScrapersStorage <sneakpeek.storage.in_memory_storage.InMemoryScrapersStorage>` - in-memory storage. Should either be used in **development** environment or if the list of scrapers is static and wouldn't be changed.
+* :py:class:`RedisScrapersStorage <sneakpeek.storage.in_memory_storage.RedisScrapersStorage>` - redis storage.

================
Jobs queue
================

-Jobs queue must implement this abstract class :py:class:`sneakpeek.lib.storage.base.ScraperJobsStorage`.
+Jobs queue must implement this abstract class :py:class:`sneakpeek.storage.base.ScraperJobsStorage`.
Following methods must be implemented:

-* :py:meth:`get_scraper_jobs <sneakpeek.lib.storage.base.ScraperJobsStorage.get_scraper_jobs>` - get scraper jobs by scraper ID
-* :py:meth:`add_scraper_job <sneakpeek.lib.storage.base.ScraperJobsStorage.add_scraper_job>` - add new scraper job
-* :py:meth:`update_scraper_job <sneakpeek.lib.storage.base.ScraperJobsStorage.update_scraper_job>` - update existing scraper job
-* :py:meth:`get_scraper_job <sneakpeek.lib.storage.base.ScraperJobsStorage.get_scraper_job>` - get existing scraper job by scraper ID and scraper job ID
-* :py:meth:`dequeue_scraper_job <sneakpeek.lib.storage.base.ScraperJobsStorage.dequeue_scraper_job>` - dequeue scraper job from queue with given priority
-* :py:meth:`delete_old_scraper_jobs <sneakpeek.lib.storage.base.ScraperJobsStorage.delete_old_scraper_jobs>` - delete old historical scraper jobs
-* :py:meth:`get_queue_len <sneakpeek.lib.storage.base.ScraperJobsStorage.get_queue_len>` - get number of pending scraper jobs in the queue with given priority
+* :py:meth:`get_scraper_jobs <sneakpeek.storage.base.ScraperJobsStorage.get_scraper_jobs>` - get scraper jobs by scraper ID
+* :py:meth:`add_scraper_job <sneakpeek.storage.base.ScraperJobsStorage.add_scraper_job>` - add new scraper job
+* :py:meth:`update_scraper_job <sneakpeek.storage.base.ScraperJobsStorage.update_scraper_job>` - update existing scraper job
+* :py:meth:`get_scraper_job <sneakpeek.storage.base.ScraperJobsStorage.get_scraper_job>` - get existing scraper job by scraper ID and scraper job ID
+* :py:meth:`dequeue_scraper_job <sneakpeek.storage.base.ScraperJobsStorage.dequeue_scraper_job>` - dequeue scraper job from queue with given priority
+* :py:meth:`delete_old_scraper_jobs <sneakpeek.storage.base.ScraperJobsStorage.delete_old_scraper_jobs>` - delete old historical scraper jobs
+* :py:meth:`get_queue_len <sneakpeek.storage.base.ScraperJobsStorage.get_queue_len>` - get number of pending scraper jobs in the queue with given priority

Currently there 2 storage implementations:

-* :py:class:`InMemoryScraperJobsStorage <sneakpeek.lib.storage.in_memory_storage.InMemoryScraperJobsStorage>` - in-memory storage. Should only be used in **development** environment.
-* :py:class:`RedisScraperJobsStorage <sneakpeek.lib.storage.in_memory_storage.RedisScraperJobsStorage>` - redis storage.
+* :py:class:`InMemoryScraperJobsStorage <sneakpeek.storage.in_memory_storage.InMemoryScraperJobsStorage>` - in-memory storage. Should only be used in **development** environment.
+* :py:class:`RedisScraperJobsStorage <sneakpeek.storage.in_memory_storage.RedisScraperJobsStorage>` - redis storage.

================
Lease storage
@@ -67,16 +67,16 @@ Lease storage is used by scheduler to ensure that at any point of time there's n
than 1 active scheduler instance which can enqueue scraper jobs. This disallows concurrent
execution of the scraper.

-Lease storage must implement this abstract class :py:class:`sneakpeek.lib.storage.base.LeaseStorage`.
+Lease storage must implement this abstract class :py:class:`sneakpeek.storage.base.LeaseStorage`.
Following methods must be implemented:

-* :py:meth:`maybe_acquire_lease <sneakpeek.lib.storage.base.LeaseStorage.maybe_acquire_lease>` - try to acquire lease (or global lock)
-* :py:meth:`release_lease <sneakpeek.lib.storage.base.LeaseStorage.release_lease>` - release acquired lease
+* :py:meth:`maybe_acquire_lease <sneakpeek.storage.base.LeaseStorage.maybe_acquire_lease>` - try to acquire lease (or global lock)
+* :py:meth:`release_lease <sneakpeek.storage.base.LeaseStorage.release_lease>` - release acquired lease

Currently there 2 storage implementations:

-* :py:class:`InMemoryLeaseStorage <sneakpeek.lib.storage.in_memory_storage.InMemoryLeaseStorage>` - in-memory storage. Should only be used in **development** environment.
-* :py:class:`RedisLeaseStorage <sneakpeek.lib.storage.in_memory_storage.RedisLeaseStorage>` - redis storage.
+* :py:class:`InMemoryLeaseStorage <sneakpeek.storage.in_memory_storage.InMemoryLeaseStorage>` - in-memory storage. Should only be used in **development** environment.
+* :py:class:`RedisLeaseStorage <sneakpeek.storage.in_memory_storage.RedisLeaseStorage>` - redis storage.

================
Scheduler
4 changes: 2 additions & 2 deletions docs/quick_start.rst
@@ -75,8 +75,8 @@ To do so let's configure **SneakpeekServer**:
# file: main.py
-from sneakpeek.lib.models import Scraper, ScraperJobPriority, ScraperSchedule
-from sneakpeek.lib.storage.in_memory_storage import (
+from sneakpeek.models import Scraper, ScraperJobPriority, ScraperSchedule
+from sneakpeek.storage.in_memory_storage import (
InMemoryLeaseStorage,
InMemoryScraperJobsStorage,
InMemoryScrapersStorage,
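
Because this commit moves modules from `sneakpeek.lib.*` to `sneakpeek.*`, code that has to run against releases on both sides of the change may need to try more than one import path. The following is only a minimal sketch of that idea using the standard library; the helper name `import_first_available` is hypothetical and is not part of sneakpeek.

```python
import importlib
from types import ModuleType


def import_first_available(*module_names: str) -> ModuleType:
    """Return the first module in ``module_names`` that can be imported.

    Hypothetical helper (not part of sneakpeek): candidates are tried in
    the order given, so list the preferred path first.
    """
    last_error = None
    for name in module_names:
        try:
            return importlib.import_module(name)
        except ImportError as error:
            # Remember the failure and move on to the next candidate.
            last_error = error
    raise ImportError(f"none of {module_names} could be imported") from last_error


# Assumed usage: prefer the layout introduced by this commit and fall
# back to the older ``sneakpeek.lib`` layout.
# models = import_first_available("sneakpeek.models", "sneakpeek.lib.models")
```

Listing the new path first means the fallback is only attempted on installations that predate the move.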