v0.2.1
This release is about bug fixing and quality maintenance for our parser. In addition:
- we added two new publishers to `DE` (Business Insider, Braunschweiger Zeitung)
- we had to disable `WorldTruth` until we get rid of the batch logic with #357
- @addie9800 added a new attribute to `Article` called `free_access`, indicating if an article is available for free
- we added a new workflow to automatically publish releases on `TestPyPi` and `PyPi`
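As a rough illustration of how a boolean attribute like `free_access` can be used downstream, here is a minimal sketch that filters a list of articles by that flag. `ArticleStub` and `only_free` are hypothetical names invented for this example. The stub only mimics the one attribute discussed here and is not the library's `Article` class.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class ArticleStub:
    """Hypothetical stand-in for an article; carries only the fields used here."""

    title: str
    free_access: bool


def only_free(articles: Iterable[ArticleStub]) -> List[ArticleStub]:
    """Keep the articles whose free_access flag is set."""
    return [article for article in articles if article.free_access]


sample = [
    ArticleStub("Open report", free_access=True),
    ArticleStub("Subscriber story", free_access=False),
]
free_titles = [article.title for article in only_free(sample)]
print(free_titles)
```

With real articles, the same list comprehension should work on whatever iterable of articles you have collected, assuming each object exposes `free_access` as a boolean.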
What's new?
- Add Business Insider by @MaxDall in #341
- Addition of Braunschweiger Zeitung by @addie9800 in #340
- Add free access attribute by @addie9800 in #362
- Add release workflow to automatically publish on PyPi by @MaxDall in #358
Bug fixing
- Two major bug fixes by @MaxDall in #346
- Fix `LD` parsing by @MaxDall in #334
- Fixes broken tutorial links by @MaxDall in #348
- Various bug fixes by @MaxDall in #335
- Exclude `validators` versions `0.23.x`, log `URLSource` with invalid URL as init parameter as error instead of raising by @MaxDall in #394
- Comment out `WorldTruth` from the collection by @MaxDall in #395
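The "log as error instead of raising" behaviour mentioned for `URLSource` follows a common pattern: validate the constructor argument, and on failure record an error and mark the object as unusable rather than aborting the whole run. The sketch below shows that pattern only. `LenientSource`, `is_valid_url`, and the `usable` property are hypothetical names, and the validation uses the standard library's `urllib.parse` rather than `validators`, so this is not the library's actual implementation.

```python
import logging
from typing import Optional
from urllib.parse import urlparse

logger = logging.getLogger("url_source_sketch")


def is_valid_url(url: str) -> bool:
    """Very small check: an http(s) scheme and a non-empty host."""
    parts = urlparse(url)
    return parts.scheme in ("http", "https") and bool(parts.netloc)


class LenientSource:
    """Hypothetical source that logs an invalid URL instead of raising."""

    def __init__(self, url: str) -> None:
        # Store the URL only if it passes validation; otherwise keep None.
        self.url: Optional[str] = url if is_valid_url(url) else None
        if self.url is None:
            logger.error("Skipping source with invalid URL: %r", url)

    @property
    def usable(self) -> bool:
        """True if the source was created with a valid URL."""
        return self.url is not None


good = LenientSource("https://example.com/sitemap.xml")
bad = LenientSource("not a url")
```

The design trade-off is that one malformed URL no longer stops every other source from being processed, at the cost of callers having to check whether a source is usable.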
Refactors
- Replace `asyncio` with thread-based solution for WARC-path download by @MaxDall in #347
- Refactor `ExtractionFilter` and `Requires` by @MaxDall in #360
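For readers unfamiliar with swapping `asyncio` for threads, the general shape of a thread-based download looks roughly like the following. This is a generic sketch using `concurrent.futures` from the standard library, not the code from #347. `download_all`, `fake_fetch`, and the example paths are made up for illustration, and the fetch function is a stand-in so the example runs without network access.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, List


def download_all(
    urls: Iterable[str],
    fetch: Callable[[str], bytes],
    max_workers: int = 4,
) -> List[bytes]:
    """Run fetch for every URL on a thread pool and return results in input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Executor.map yields results in the order of the inputs.
        return list(pool.map(fetch, urls))


def fake_fetch(url: str) -> bytes:
    """Stand-in for a blocking HTTP request; echoes the URL as bytes."""
    return url.encode("utf-8")


results = download_all(["a/warc.paths", "b/warc.paths"], fake_fetch)
print(results)
```

Threads suit this kind of I/O-bound work because blocking calls can be used directly, without requiring an event loop in the calling code.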
QoL
- Loosens version restrictions for dependencies by @MaxDall in #350
- Add `py.typed` by @dobbersc in #387
- Add MANIFEST.in by @dobbersc in #389
for DEVs
- Add utility to retrieve test articles by @MaxDall in #355
- Add URL parameter to test generation script by @MaxDall in #364
Text in `Article.body` is now also normalized
Publisher quality maintenance
- Update Bild Sources, Update SZ Parser by @addie9800 in #352
- Update DW Version by @MaxDall in #342
- The Namibian Parser Update by @addie9800 in #363
- Fix SZ selector style and type hint by @MaxDall in #369
- Adjust paragraph selector for CNBC by @MaxDall in #366
- Adjust paragraph selector for Fox News parser by @MaxDall in #368
- Adjust paragraph selector for gateway pundit by @MaxDall in #367
- Fix Zeit paragraph selector by @addie9800 in #371
- Update NDR paragraph selector by @addie9800 in #373
- LeMonde: add summary and subheadline selector by @addie9800 in #374
- Update Berliner Zeitung - paragraph selector by @addie9800 in #372
- TheTelegraph: add subheadline selector by @addie9800 in #377
- Fix `_summary_selector` for `TheNewYorker` by @MaxDall in #379
- Add `_summary_selector` to `FreeBeacon` by @MaxDall in #380
- Fix sitemap filters for `OccupyDemocrats` by @MaxDall in #381
- Fix `iNews` parser by @MaxDall in #333
- Updated MDR Parser by @addie9800 in #370
- Bump `TheIntercept` version to `V1_1` by @MaxDall in #384
- Bump up `Reuters` parser to version V1_1 by @MaxDall in #386
- Fix malformed HTML for `TheNation` by @MaxDall in #385
- Add `subheadline` selector to `LATimes` by @MaxDall in #378
Full Changelog: v0.2.0...v0.2.1