added derstandard.at #431

Merged on May 6, 2024 (4 commits)

felixvonberlin

Hi,
I've added »Der Standard«, an Austrian newspaper, to Fundus.

I think it looks fine, but I could not test it with pytest.
Even testing existing news sources failed on my platform.

Best,
Felix

@MaxDall
Collaborator

MaxDall commented Apr 20, 2024

@felixvonberlin Thanks for adding this :) Could you give some more context as to why you could not run pytest? Did you get any error messages?

Furthermore, before reviewing, could you merge the latest master branch into your branch and run the following:

python -m scripts.generate_parser_test_files -p DerStandard

@felixvonberlin
Author

felixvonberlin commented Apr 20, 2024

Oh, yes, I missed it yesterday; I'm sorry!

Running pytest fails like this:

oertzenf@tarvos ~/git/fundus$ source bin/activate                                                                                    ✭master 
(fundus) oertzenf@tarvos ~/git/fundus$ export PYTHONPATH=$(pwd)/src                                                                  ✭master 
(fundus) oertzenf@tarvos ~/git/fundus$ python3 -m scripts.generate_parser_test_files -p DerStandard                                  ✭master 
DerStandard:   0%|                                                                                                      | 0/1 [00:00<?, ?it/s]
2024-04-20 13:04:23,144 - basic_logger - ERROR - Couldn't get article for DerStandard. Skipping
DerStandard:   0%|                                                                                                      | 0/1 [02:50<?, ?it/s]
(fundus) oertzenf@tarvos ~/git/fundus$ python3 -m pytest                                                                             ✭master 
============================================================ test session starts =============================================================
platform linux -- Python 3.11.2, pytest-8.1.1, pluggy-1.4.0
rootdir: /home/oertzenf/git/fundus
configfile: pyproject.toml
collected 418 items / 13 errors                                                                                                              

=================================================================== ERRORS ===================================================================
___________________________________ ERROR collecting lib/python3.11/site-packages/dill/tests/test_diff.py ____________________________________
lib/python3.11/site-packages/dill/tests/test_diff.py:9: in <module>
    from dill import __diff as diff
lib/python3.11/site-packages/dill/__diff.py:233: in <module>
    memorise(mod)
lib/python3.11/site-packages/dill/__diff.py:111: in memorise
    [mem(value) for key, value in g.items()]
lib/python3.11/site-packages/dill/__diff.py:111: in <listcomp>
    [mem(value) for key, value in g.items()]
lib/python3.11/site-packages/dill/__diff.py:119: in memorise
    [mem(item) for item in s]
lib/python3.11/site-packages/dill/__diff.py:119: in <listcomp>
    [mem(item) for item in s]
lib/python3.11/site-packages/dill/__diff.py:111: in memorise
    [mem(value) for key, value in g.items()]
lib/python3.11/site-packages/dill/__diff.py:111: in <listcomp>
    [mem(value) for key, value in g.items()]
lib/python3.11/site-packages/dill/__diff.py:111: in memorise
    [mem(value) for key, value in g.items()]
lib/python3.11/site-packages/dill/__diff.py:111: in <listcomp>
    [mem(value) for key, value in g.items()]
lib/python3.11/site-packages/dill/__diff.py:111: in memorise
    [mem(value) for key, value in g.items()]
lib/python3.11/site-packages/dill/__diff.py:111: in <listcomp>
    [mem(value) for key, value in g.items()]
lib/python3.11/site-packages/dill/__diff.py:115: in memorise
    [(mem(key), mem(item))
lib/python3.11/site-packages/dill/__diff.py:115: in <listcomp>
    [(mem(key), mem(item))
lib/python3.11/site-packages/dill/__diff.py:111: in memorise
    [mem(value) for key, value in g.items()]
lib/python3.11/site-packages/dill/__diff.py:111: in <listcomp>
    [mem(value) for key, value in g.items()]
lib/python3.11/site-packages/dill/__diff.py:111: in memorise
    [mem(value) for key, value in g.items()]
lib/python3.11/site-packages/dill/__diff.py:111: in <listcomp>
    [mem(value) for key, value in g.items()]
lib/python3.11/site-packages/dill/__diff.py:111: in memorise
    [mem(value) for key, value in g.items()]
lib/python3.11/site-packages/dill/__diff.py:111: in <listcomp>
    [mem(value) for key, value in g.items()]
lib/python3.11/site-packages/dill/__diff.py:111: in memorise
    [mem(value) for key, value in g.items()]
lib/python3.11/site-packages/dill/__diff.py:111: in <listcomp>
    [mem(value) for key, value in g.items()]
lib/python3.11/site-packages/dill/__diff.py:111: in memorise
    [mem(value) for key, value in g.items()]
lib/python3.11/site-packages/dill/__diff.py:111: in <listcomp>
    [mem(value) for key, value in g.items()]
lib/python3.11/site-packages/dill/__diff.py:111: in memorise
    [mem(value) for key, value in g.items()]
lib/python3.11/site-packages/dill/__diff.py:111: in <listcomp>
    [mem(value) for key, value in g.items()]
lib/python3.11/site-packages/dill/__diff.py:101: in memorise
    seq_id = dict((id_(key),id_(value)) for key, value in s.items())
lib/python3.11/site-packages/dill/__diff.py:101: in <genexpr>
    seq_id = dict((id_(key),id_(value)) for key, value in s.items())
E   ValueError: too many values to unpack (expected 2)
__________________________________ ERROR collecting lib/python3.11/site-packages/dill/tests/test_module.py ___________________________________
ImportError while importing test module '/home/oertzenf/git/fundus/lib/python3.11/site-packages/dill/tests/test_module.py'.
Hint: make sure your test modules/packages have valid Python names.
Traceback:
/usr/lib/python3.11/importlib/__init__.py:126: in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
lib/python3.11/site-packages/dill/tests/test_module.py:11: in <module>
    import test_mixins as module
lib/python3.11/site-packages/dill/__diff.py:220: in _imp
    mod = __import__(*args, **kwds)
E   ModuleNotFoundError: No module named 'test_mixins'
__________________________________ ERROR collecting lib/python3.11/site-packages/dill/tests/test_objects.py __________________________________
lib/python3.11/site-packages/dill/tests/test_objects.py:20: in <module>
    load_types(pickleable=True,unpickleable=False)
lib/python3.11/site-packages/dill/__init__.py:71: in load_types
    from . import _objects
lib/python3.11/site-packages/dill/__diff.py:223: in _imp
    memorise(sys.modules[m])
lib/python3.11/site-packages/dill/__diff.py:111: in memorise
    [mem(value) for key, value in g.items()]
lib/python3.11/site-packages/dill/__diff.py:111: in <listcomp>
    [mem(value) for key, value in g.items()]
lib/python3.11/site-packages/dill/__diff.py:115: in memorise
    [(mem(key), mem(item))
lib/python3.11/site-packages/dill/__diff.py:115: in <listcomp>
    [(mem(key), mem(item))
lib/python3.11/site-packages/dill/__diff.py:91: in memorise
    g = get_attrs(obj)
lib/python3.11/site-packages/dill/__diff.py:44: in get_attrs
    return getattr(obj, '__dict__', None)
E   ReferenceError: weakly-referenced object no longer exists
________________________________ ERROR collecting lib/python3.11/site-packages/dill/tests/test_registered.py _________________________________
lib/python3.11/site-packages/dill/tests/test_registered.py:45: in <module>
    raise e from None
lib/python3.11/site-packages/dill/tests/test_registered.py:42: in <module>
    assert not bool(success)
E   AssertionError: assert not True
E    +  where True = bool(['PrettyPrinterType', 'StreamHandlerType'])
-------------------------------------------------------------- Captured stdout ---------------------------------------------------------------
SUCCESS: ['PrettyPrinterType', 'StreamHandlerType']
_________________________________ ERROR collecting lib/python3.11/site-packages/dill/tests/test_selected.py __________________________________
lib/python3.11/site-packages/dill/tests/test_selected.py:46: in <module>
    objects['TemporaryFileType'].close()
E   OSError: [Errno 9] Bad file descriptor
__________________________________ ERROR collecting lib/python3.11/site-packages/dill/tests/test_session.py __________________________________
ImportError while importing test module '/home/oertzenf/git/fundus/lib/python3.11/site-packages/dill/tests/test_session.py'.
Hint: make sure your test modules/packages have valid Python names.
Traceback:
/usr/lib/python3.11/importlib/__init__.py:126: in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
lib/python3.11/site-packages/dill/tests/test_session.py:72: in <module>
    import test_dictviews as local_mod                  # non-builtin top-level module
lib/python3.11/site-packages/dill/__diff.py:220: in _imp
    mod = __import__(*args, **kwds)
E   ModuleNotFoundError: No module named 'test_dictviews'
______________________________________ ERROR collecting lib/python3.11/site-packages/tests/test_repr.py ______________________________________
ImportError while importing test module '/home/oertzenf/git/fundus/lib/python3.11/site-packages/tests/test_repr.py'.
Hint: make sure your test modules/packages have valid Python names.
Traceback:
/usr/lib/python3.11/importlib/__init__.py:126: in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
E   ModuleNotFoundError: No module named 'tests.test_repr'
__________________________________ ERROR collecting lib64/python3.11/site-packages/dill/tests/test_diff.py ___________________________________
/usr/lib/python3.11/importlib/__init__.py:126: in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
<frozen importlib._bootstrap>:1206: in _gcd_import
    ???
<frozen importlib._bootstrap>:1178: in _find_and_load
    ???
<frozen importlib._bootstrap>:1149: in _find_and_load_unlocked
    ???
<frozen importlib._bootstrap>:690: in _load_unlocked
    ???
lib/python3.11/site-packages/_pytest/assertion/rewrite.py:178: in exec_module
    exec(co, module.__dict__)
lib/python3.11/site-packages/dill/tests/test_diff.py:9: in <module>
    from dill import __diff as diff
lib/python3.11/site-packages/dill/__diff.py:220: in _imp
    mod = __import__(*args, **kwds)
lib/python3.11/site-packages/dill/__diff.py:233: in <module>
    memorise(mod)
lib/python3.11/site-packages/dill/__diff.py:111: in memorise
    [mem(value) for key, value in g.items()]
lib/python3.11/site-packages/dill/__diff.py:111: in <listcomp>
    [mem(value) for key, value in g.items()]
lib/python3.11/site-packages/dill/__diff.py:119: in memorise
    [mem(item) for item in s]
lib/python3.11/site-packages/dill/__diff.py:119: in <listcomp>
    [mem(item) for item in s]
lib/python3.11/site-packages/dill/__diff.py:111: in memorise
    [mem(value) for key, value in g.items()]
lib/python3.11/site-packages/dill/__diff.py:111: in <listcomp>
    [mem(value) for key, value in g.items()]
lib/python3.11/site-packages/dill/__diff.py:111: in memorise
    [mem(value) for key, value in g.items()]
lib/python3.11/site-packages/dill/__diff.py:111: in <listcomp>
    [mem(value) for key, value in g.items()]
lib/python3.11/site-packages/dill/__diff.py:111: in memorise
    [mem(value) for key, value in g.items()]
lib/python3.11/site-packages/dill/__diff.py:111: in <listcomp>
    [mem(value) for key, value in g.items()]
lib/python3.11/site-packages/dill/__diff.py:115: in memorise
    [(mem(key), mem(item))
lib/python3.11/site-packages/dill/__diff.py:115: in <listcomp>
    [(mem(key), mem(item))
lib/python3.11/site-packages/dill/__diff.py:111: in memorise
    [mem(value) for key, value in g.items()]
lib/python3.11/site-packages/dill/__diff.py:111: in <listcomp>
    [mem(value) for key, value in g.items()]
lib/python3.11/site-packages/dill/__diff.py:111: in memorise
    [mem(value) for key, value in g.items()]
lib/python3.11/site-packages/dill/__diff.py:111: in <listcomp>
    [mem(value) for key, value in g.items()]
lib/python3.11/site-packages/dill/__diff.py:111: in memorise
    [mem(value) for key, value in g.items()]
lib/python3.11/site-packages/dill/__diff.py:111: in <listcomp>
    [mem(value) for key, value in g.items()]
lib/python3.11/site-packages/dill/__diff.py:111: in memorise
    [mem(value) for key, value in g.items()]
lib/python3.11/site-packages/dill/__diff.py:111: in <listcomp>
    [mem(value) for key, value in g.items()]
lib/python3.11/site-packages/dill/__diff.py:111: in memorise
    [mem(value) for key, value in g.items()]
lib/python3.11/site-packages/dill/__diff.py:111: in <listcomp>
    [mem(value) for key, value in g.items()]
lib/python3.11/site-packages/dill/__diff.py:111: in memorise
    [mem(value) for key, value in g.items()]
lib/python3.11/site-packages/dill/__diff.py:111: in <listcomp>
    [mem(value) for key, value in g.items()]
lib/python3.11/site-packages/dill/__diff.py:101: in memorise
    seq_id = dict((id_(key),id_(value)) for key, value in s.items())
lib/python3.11/site-packages/dill/__diff.py:101: in <genexpr>
    seq_id = dict((id_(key),id_(value)) for key, value in s.items())
E   ValueError: too many values to unpack (expected 2)
_________________________________ ERROR collecting lib64/python3.11/site-packages/dill/tests/test_module.py __________________________________
ImportError while importing test module '/home/oertzenf/git/fundus/lib64/python3.11/site-packages/dill/tests/test_module.py'.
Hint: make sure your test modules/packages have valid Python names.
Traceback:
/usr/lib/python3.11/importlib/__init__.py:126: in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
lib/python3.11/site-packages/dill/tests/test_module.py:11: in <module>
    import test_mixins as module
lib/python3.11/site-packages/dill/__diff.py:220: in _imp
    mod = __import__(*args, **kwds)
lib/python3.11/site-packages/dill/__diff.py:220: in _imp
    mod = __import__(*args, **kwds)
E   ModuleNotFoundError: No module named 'test_mixins'
_______________________________ ERROR collecting lib64/python3.11/site-packages/dill/tests/test_registered.py ________________________________
/usr/lib/python3.11/importlib/__init__.py:126: in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
<frozen importlib._bootstrap>:1206: in _gcd_import
    ???
<frozen importlib._bootstrap>:1178: in _find_and_load
    ???
<frozen importlib._bootstrap>:1149: in _find_and_load_unlocked
    ???
<frozen importlib._bootstrap>:690: in _load_unlocked
    ???
lib/python3.11/site-packages/_pytest/assertion/rewrite.py:178: in exec_module
    exec(co, module.__dict__)
lib/python3.11/site-packages/dill/tests/test_registered.py:45: in <module>
    raise e from None
lib/python3.11/site-packages/dill/tests/test_registered.py:42: in <module>
    assert not bool(success)
E   AssertionError: assert not True
E    +  where True = bool(['PrettyPrinterType', 'TemporaryFileType', 'StreamHandlerType'])
-------------------------------------------------------------- Captured stdout ---------------------------------------------------------------
SUCCESS: ['PrettyPrinterType', 'TemporaryFileType', 'StreamHandlerType']
_________________________________ ERROR collecting lib64/python3.11/site-packages/dill/tests/test_session.py _________________________________
ImportError while importing test module '/home/oertzenf/git/fundus/lib64/python3.11/site-packages/dill/tests/test_session.py'.
Hint: make sure your test modules/packages have valid Python names.
Traceback:
/usr/lib/python3.11/importlib/__init__.py:126: in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
lib/python3.11/site-packages/dill/tests/test_session.py:72: in <module>
    import test_dictviews as local_mod                  # non-builtin top-level module
lib/python3.11/site-packages/dill/__diff.py:220: in _imp
    mod = __import__(*args, **kwds)
lib/python3.11/site-packages/dill/__diff.py:220: in _imp
    mod = __import__(*args, **kwds)
E   ModuleNotFoundError: No module named 'test_dictviews'
_____________________________________ ERROR collecting lib64/python3.11/site-packages/tests/test_repr.py _____________________________________
ImportError while importing test module '/home/oertzenf/git/fundus/lib64/python3.11/site-packages/tests/test_repr.py'.
Hint: make sure your test modules/packages have valid Python names.
Traceback:
/usr/lib/python3.11/importlib/__init__.py:126: in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
E   ModuleNotFoundError: No module named 'tests.test_repr'
___________________________________________________ ERROR collecting tests/test_parser.py ____________________________________________________
tests/test_parser.py:18: in <module>
    from tests.utility import (
lib/python3.11/site-packages/dill/__diff.py:220: in _imp
    mod = __import__(*args, **kwds)
lib/python3.11/site-packages/dill/__diff.py:220: in _imp
    mod = __import__(*args, **kwds)
tests/utility.py:15: in <module>
    from scripts.generate_tables import supported_publishers_markdown_path
lib/python3.11/site-packages/dill/__diff.py:220: in _imp
    mod = __import__(*args, **kwds)
lib/python3.11/site-packages/dill/__diff.py:220: in _imp
    mod = __import__(*args, **kwds)
scripts/generate_tables.py:9: in <module>
    from lxml.html.builder import CLASS, CODE, DIV, SPAN, TABLE, TBODY, TD, TH, THEAD, TR, A
lib/python3.11/site-packages/dill/__diff.py:220: in _imp
    mod = __import__(*args, **kwds)
lib/python3.11/site-packages/dill/__diff.py:220: in _imp
    mod = __import__(*args, **kwds)
lib/python3.11/site-packages/lxml/html/builder.py:32: in <module>
    from lxml.builder import ElementMaker
lib/python3.11/site-packages/dill/__diff.py:220: in _imp
    mod = __import__(*args, **kwds)
lib/python3.11/site-packages/dill/__diff.py:223: in _imp
    memorise(sys.modules[m])
lib/python3.11/site-packages/dill/__diff.py:111: in memorise
    [mem(value) for key, value in g.items()]
lib/python3.11/site-packages/dill/__diff.py:111: in <listcomp>
    [mem(value) for key, value in g.items()]
lib/python3.11/site-packages/dill/__diff.py:95: in memorise
    attrs_id = dict((key,id_(value)) for key, value in g.items())
E   AttributeError: 'functools.partial' object has no attribute 'items'
========================================================== short test summary info ===========================================================
ERROR lib/python3.11/site-packages/dill/tests/test_diff.py - ValueError: too many values to unpack (expected 2)
ERROR lib/python3.11/site-packages/dill/tests/test_module.py
ERROR lib/python3.11/site-packages/dill/tests/test_objects.py - ReferenceError: weakly-referenced object no longer exists
ERROR lib/python3.11/site-packages/dill/tests/test_registered.py - AssertionError: assert not True
ERROR lib/python3.11/site-packages/dill/tests/test_selected.py - OSError: [Errno 9] Bad file descriptor
ERROR lib/python3.11/site-packages/dill/tests/test_session.py
ERROR lib/python3.11/site-packages/tests/test_repr.py
ERROR lib64/python3.11/site-packages/dill/tests/test_diff.py - ValueError: too many values to unpack (expected 2)
ERROR lib64/python3.11/site-packages/dill/tests/test_module.py
ERROR lib64/python3.11/site-packages/dill/tests/test_registered.py - AssertionError: assert not True
ERROR lib64/python3.11/site-packages/dill/tests/test_session.py
ERROR lib64/python3.11/site-packages/tests/test_repr.py
ERROR tests/test_parser.py - AttributeError: 'functools.partial' object has no attribute 'items'
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! Interrupted: 13 errors during collection !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
============================================================= 13 errors in 1.03s =============================================================
(fundus) oertzenf@tarvos ~/git/fundus$                                                                                          2 ↵  ✭master

I think I've installed all dependencies. Running pytest also fails on my machine for existing publishers such as ORF.

@MaxDall
Collaborator

MaxDall commented Apr 21, 2024

Hey, it seems that pytest is running tests unrelated to Fundus (in this case dill's), and to be honest I've never seen anything like this before.

Do you get the same results if you run `pytest src` and don't export `PYTHONPATH`? Is it a completely fresh venv with only Fundus and its dependencies installed? Did you install Fundus by running `pip install -e .[dev]`?
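
A note on what the log above suggests, offered as an assumption rather than a confirmed diagnosis: the session activates `bin/activate` straight from the repository root, and the failing paths (`lib/python3.11/site-packages/...`) are relative to pytest's `rootdir`. That is the pattern one would expect if the virtual environment was created in the checkout directory itself, so that `site-packages` sits beside `tests/` and a bare `pytest` walks into it. The sketch below checks for that layout by looking for `pyvenv.cfg`, the marker file that `python -m venv` writes at the root of an environment. The function name `venv_roots_inside` is hypothetical and not part of Fundus or pytest.

```python
from pathlib import Path


def venv_roots_inside(project_root: Path) -> list[Path]:
    """Return project_root and/or its direct subdirectories that are venv roots.

    A directory counts as a venv root when it contains a pyvenv.cfg file.
    """
    candidates = [project_root] + sorted(
        child for child in project_root.iterdir() if child.is_dir()
    )
    return [d for d in candidates if (d / "pyvenv.cfg").is_file()]


if __name__ == "__main__":
    root = Path.cwd()
    for venv in venv_roots_inside(root):
        if venv == root:
            # The checkout and the environment share one directory, so
            # site-packages lives under the directory pytest starts from.
            print(f"{root} is itself a virtual environment")
        else:
            print(f"virtual environment found in subdirectory: {venv}")
```

If the first message appears, two things are likely to help: creating the environment in its own subdirectory (for example `python3 -m venv .venv`), or limiting collection explicitly with `python -m pytest tests`. Under zsh, the extras spec also needs quoting, as in `pip install -e '.[dev]'`, because an unquoted `[dev]` is treated as a glob pattern.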

@felixvonberlin
Author

felixvonberlin commented Apr 22, 2024

I tried in a new environment with the ORF; it still produces a lot of errors:

```
oertzenf@tarvos /tmp$ echo $PYTHONPATH

oertzenf@tarvos /tmp$ git clone git@github.com:flairNLP/fundus.git
Cloning into 'fundus'...
remote: Enumerating objects: 10775, done.
remote: Counting objects: 100% (5930/5930), done.
remote: Compressing objects: 100% (1826/1826), done.
remote: Total 10775 (delta 4193), reused 5074 (delta 3757), pack-reused 4845
Receiving objects: 100% (10775/10775), 15.08 MiB | 1.17 MiB/s, done.
Resolving deltas: 100% (6946/6946), done.
oertzenf@tarvos /tmp$ python3 -m venv fundus
oertzenf@tarvos /tmp$ cd fundus/
oertzenf@tarvos /tmp/fundus$ source bin/activate ✭master
(fundus) oertzenf@tarvos /tmp/fundus$ pip3 install -e .[dev] 130 ↵ ✭master
zsh: no matches found: .[dev]
(fundus) oertzenf@tarvos /tmp/fundus$ pip3 install -e . [dev] 1 ↵ ✭master
zsh: no matches found: [dev]
(fundus) oertzenf@tarvos /tmp/fundus$ pip3 install -e . 1 ↵ ✭master
Obtaining file:///tmp/fundus
Installing build dependencies ... done
Checking if build backend supports build_editable ... done
Getting requirements to build editable ... done
Preparing editable metadata (pyproject.toml) ... done
Collecting python-dateutil<3,>=2.8
Using cached python_dateutil-2.9.0.post0-py2.py3-none-any.whl (229 kB)
Collecting lxml<5,>=4.9
Using cached lxml-4.9.4-cp311-cp311-manylinux_2_28_x86_64.whl (7.9 MB)
Collecting more-itertools<10,>=9.1
Using cached more_itertools-9.1.0-py3-none-any.whl (54 kB)
Collecting cssselect<2,>=1.1
Using cached cssselect-1.2.0-py2.py3-none-any.whl (18 kB)
Collecting feedparser<7,>=6.0
Using cached feedparser-6.0.11-py3-none-any.whl (81 kB)
Collecting colorama<1,>=0.4
Using cached colorama-0.4.6-py2.py3-none-any.whl (25 kB)
Collecting typing-extensions<5,>=4.6
Using cached typing_extensions-4.11.0-py3-none-any.whl (34 kB)
Collecting langdetect<2,>=1.0
Using cached langdetect-1.0.9.tar.gz (981 kB)
Preparing metadata (setup.py) ... done
Collecting validators!=0.23,<1,>=0.20
Using cached validators-0.28.1-py3-none-any.whl (39 kB)
Collecting requests<3,>=2.28
Using cached requests-2.31.0-py3-none-any.whl (62 kB)
Collecting tqdm<5,>=4.66
Using cached tqdm-4.66.2-py3-none-any.whl (78 kB)
Collecting fastwarc<1,>=0.14
Using cached FastWARC-0.14.6-cp311-cp311-manylinux_2_28_x86_64.whl (2.4 MB)
Collecting chardet<6,>=5.2
Using cached chardet-5.2.0-py3-none-any.whl (199 kB)
Collecting dill<1,>=0.3
Using cached dill-0.3.8-py3-none-any.whl (116 kB)
Collecting brotli
Using cached Brotli-1.1.0-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (2.9 MB)
Collecting click
Using cached click-8.1.7-py3-none-any.whl (97 kB)
Collecting sgmllib3k
Using cached sgmllib3k-1.0.0.tar.gz (5.8 kB)
Preparing metadata (setup.py) ... done
Collecting six
Using cached six-1.16.0-py2.py3-none-any.whl (11 kB)
Collecting charset-normalizer<4,>=2
Using cached charset_normalizer-3.3.2-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (140 kB)
Collecting idna<4,>=2.5
Using cached idna-3.7-py3-none-any.whl (66 kB)
Collecting urllib3<3,>=1.21.1
Using cached urllib3-2.2.1-py3-none-any.whl (121 kB)
Collecting certifi>=2017.4.17
Using cached certifi-2024.2.2-py3-none-any.whl (163 kB)
Building wheels for collected packages: fundus
Building editable for fundus (pyproject.toml) ... done
Created wheel for fundus: filename=fundus-0.3.0-0.editable-py3-none-any.whl size=5484 sha256=727cbad8c4ad9c672e001764b385d64d59d4904537f2ee91340a53fea38c680e
Stored in directory: /tmp/pip-ephem-wheel-cache-8nqyhoo0/wheels/f5/bb/42/425468f5bb73ef0111f285222de9e224ed77be931ea1b5b2d7
Successfully built fundus
Installing collected packages: sgmllib3k, brotli, validators, urllib3, typing-extensions, tqdm, six, more-itertools, lxml, idna, feedparser, dill, cssselect, colorama, click, charset-normalizer, chardet, certifi, requests, python-dateutil, langdetect, fastwarc, fundus
DEPRECATION: sgmllib3k is being installed using the legacy 'setup.py install' method, because it does not have a 'pyproject.toml' and the 'wheel' package is not installed. pip 23.1 will enforce this behaviour change. A possible replacement is to enable the '--use-pep517' option. Discussion can be found at pypa/pip#8559
Running setup.py install for sgmllib3k ... done
DEPRECATION: langdetect is being installed using the legacy 'setup.py install' method, because it does not have a 'pyproject.toml' and the 'wheel' package is not installed. pip 23.1 will enforce this behaviour change. A possible replacement is to enable the '--use-pep517' option. Discussion can be found at pypa/pip#8559
Running setup.py install for langdetect ... done
Successfully installed brotli-1.1.0 certifi-2024.2.2 chardet-5.2.0 charset-normalizer-3.3.2 click-8.1.7 colorama-0.4.6 cssselect-1.2.0 dill-0.3.8 fastwarc-0.14.6 feedparser-6.0.11 fundus-0.3.0 idna-3.7 langdetect-1.0.9 lxml-4.9.4 more-itertools-9.1.0 python-dateutil-2.9.0.post0 requests-2.31.0 sgmllib3k-1.0.0 six-1.16.0 tqdm-4.66.2 typing-extensions-4.11.0 urllib3-2.2.1 validators-0.28.1
(fundus) oertzenf@tarvos /tmp/fundus$ pip3 install pytest ✭master
Collecting pytest
Using cached pytest-8.1.1-py3-none-any.whl (337 kB)
Collecting iniconfig
Using cached iniconfig-2.0.0-py3-none-any.whl (5.9 kB)
Collecting packaging
Using cached packaging-24.0-py3-none-any.whl (53 kB)
Collecting pluggy<2.0,>=1.4
Using cached pluggy-1.5.0-py3-none-any.whl (20 kB)
Installing collected packages: pluggy, packaging, iniconfig, pytest
Successfully installed iniconfig-2.0.0 packaging-24.0 pluggy-1.5.0 pytest-8.1.1
(fundus) oertzenf@tarvos /tmp/fundus$ python3 -m scripts.generate_parser_test_files -p ORF 1 ↵ ✭master
ORF: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00, 50.06it/s]
(fundus) oertzenf@tarvos /tmp/fundus$ python3 -m pytest src ✭master
============================================================ test session starts =============================================================
platform linux -- Python 3.11.2, pytest-8.1.1, pluggy-1.5.0
rootdir: /tmp/fundus
configfile: pyproject.toml
collected 0 items

=========================================================== no tests ran in 0.01s ============================================================
(fundus) oertzenf@tarvos /tmp/fundus$ python3 -m pytest 5 ↵ ✭master
============================================================ test session starts =============================================================
platform linux -- Python 3.11.2, pytest-8.1.1, pluggy-1.5.0
rootdir: /tmp/fundus
configfile: pyproject.toml
collected 420 items / 11 errors

=================================================================== ERRORS ===================================================================
___________________________________ ERROR collecting lib/python3.11/site-packages/dill/tests/test_diff.py ____________________________________
lib/python3.11/site-packages/dill/tests/test_diff.py:9: in <module>
    from dill import __diff as diff
lib/python3.11/site-packages/dill/__diff.py:233: in <module>
    memorise(mod)
lib/python3.11/site-packages/dill/__diff.py:111: in memorise
    [mem(value) for key, value in g.items()]
lib/python3.11/site-packages/dill/__diff.py:111: in <listcomp>
    [mem(value) for key, value in g.items()]
lib/python3.11/site-packages/dill/__diff.py:119: in memorise
    [mem(item) for item in s]
lib/python3.11/site-packages/dill/__diff.py:119: in <listcomp>
    [mem(item) for item in s]
lib/python3.11/site-packages/dill/__diff.py:111: in memorise
    [mem(value) for key, value in g.items()]
lib/python3.11/site-packages/dill/__diff.py:111: in <listcomp>
    [mem(value) for key, value in g.items()]
lib/python3.11/site-packages/dill/__diff.py:111: in memorise
    [mem(value) for key, value in g.items()]
lib/python3.11/site-packages/dill/__diff.py:111: in <listcomp>
    [mem(value) for key, value in g.items()]
lib/python3.11/site-packages/dill/__diff.py:111: in memorise
    [mem(value) for key, value in g.items()]
lib/python3.11/site-packages/dill/__diff.py:111: in <listcomp>
    [mem(value) for key, value in g.items()]
lib/python3.11/site-packages/dill/__diff.py:115: in memorise
    [(mem(key), mem(item))
lib/python3.11/site-packages/dill/__diff.py:115: in <listcomp>
    [(mem(key), mem(item))
lib/python3.11/site-packages/dill/__diff.py:111: in memorise
    [mem(value) for key, value in g.items()]
lib/python3.11/site-packages/dill/__diff.py:111: in <listcomp>
    [mem(value) for key, value in g.items()]
lib/python3.11/site-packages/dill/__diff.py:111: in memorise
    [mem(value) for key, value in g.items()]
lib/python3.11/site-packages/dill/__diff.py:111: in <listcomp>
    [mem(value) for key, value in g.items()]
lib/python3.11/site-packages/dill/__diff.py:111: in memorise
    [mem(value) for key, value in g.items()]
lib/python3.11/site-packages/dill/__diff.py:111: in <listcomp>
    [mem(value) for key, value in g.items()]
lib/python3.11/site-packages/dill/__diff.py:111: in memorise
    [mem(value) for key, value in g.items()]
lib/python3.11/site-packages/dill/__diff.py:111: in <listcomp>
    [mem(value) for key, value in g.items()]
lib/python3.11/site-packages/dill/__diff.py:111: in memorise
    [mem(value) for key, value in g.items()]
lib/python3.11/site-packages/dill/__diff.py:111: in <listcomp>
    [mem(value) for key, value in g.items()]
lib/python3.11/site-packages/dill/__diff.py:111: in memorise
    [mem(value) for key, value in g.items()]
lib/python3.11/site-packages/dill/__diff.py:111: in <listcomp>
    [mem(value) for key, value in g.items()]
lib/python3.11/site-packages/dill/__diff.py:101: in memorise
    seq_id = dict((id_(key),id_(value)) for key, value in s.items())
lib/python3.11/site-packages/dill/__diff.py:101: in <genexpr>
    seq_id = dict((id_(key),id_(value)) for key, value in s.items())
E   ValueError: too many values to unpack (expected 2)
__________________________________ ERROR collecting lib/python3.11/site-packages/dill/tests/test_module.py ___________________________________
ImportError while importing test module '/tmp/fundus/lib/python3.11/site-packages/dill/tests/test_module.py'.
Hint: make sure your test modules/packages have valid Python names.
Traceback:
/usr/lib/python3.11/importlib/__init__.py:126: in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
lib/python3.11/site-packages/dill/tests/test_module.py:11: in <module>
    import test_mixins as module
lib/python3.11/site-packages/dill/__diff.py:220: in _imp
    mod = __import__(*args, **kwds)
E   ModuleNotFoundError: No module named 'test_mixins'
__________________________________ ERROR collecting lib/python3.11/site-packages/dill/tests/test_objects.py __________________________________
lib/python3.11/site-packages/dill/tests/test_objects.py:20: in
load_types(pickleable=True,unpickleable=False)
lib/python3.11/site-packages/dill/init.py:71: in load_types
from . import _objects
lib/python3.11/site-packages/dill/__diff.py:223: in _imp
memorise(sys.modules[m])
lib/python3.11/site-packages/dill/__diff.py:111: in memorise
[mem(value) for key, value in g.items()]
lib/python3.11/site-packages/dill/__diff.py:111: in
[mem(value) for key, value in g.items()]
lib/python3.11/site-packages/dill/__diff.py:115: in memorise
[(mem(key), mem(item))
lib/python3.11/site-packages/dill/__diff.py:115: in
[(mem(key), mem(item))
lib/python3.11/site-packages/dill/__diff.py:91: in memorise
g = get_attrs(obj)
lib/python3.11/site-packages/dill/__diff.py:44: in get_attrs
return getattr(obj, '__dict__', None)
E ReferenceError: weakly-referenced object no longer exists
________________________________ ERROR collecting lib/python3.11/site-packages/dill/tests/test_registered.py _________________________________
lib/python3.11/site-packages/dill/tests/test_registered.py:45: in
raise e from None
lib/python3.11/site-packages/dill/tests/test_registered.py:42: in
assert not bool(success)
E AssertionError: assert not True
E + where True = bool(['PrettyPrinterType', 'StreamHandlerType'])
-------------------------------------------------------------- Captured stdout ---------------------------------------------------------------
SUCCESS: ['PrettyPrinterType', 'StreamHandlerType']
_________________________________ ERROR collecting lib/python3.11/site-packages/dill/tests/test_selected.py __________________________________
lib/python3.11/site-packages/dill/tests/test_selected.py:46: in
objects['TemporaryFileType'].close()
E OSError: [Errno 9] Bad file descriptor
__________________________________ ERROR collecting lib/python3.11/site-packages/dill/tests/test_session.py __________________________________
ImportError while importing test module '/tmp/fundus/lib/python3.11/site-packages/dill/tests/test_session.py'.
Hint: make sure your test modules/packages have valid Python names.
Traceback:
/usr/lib/python3.11/importlib/__init__.py:126: in import_module
return _bootstrap._gcd_import(name[level:], package, level)
lib/python3.11/site-packages/dill/tests/test_session.py:72: in
import test_dictviews as local_mod # non-builtin top-level module
lib/python3.11/site-packages/dill/__diff.py:220: in _imp
mod = __import__(*args, **kwds)
E ModuleNotFoundError: No module named 'test_dictviews'
__________________________________ ERROR collecting lib64/python3.11/site-packages/dill/tests/test_diff.py ___________________________________
/usr/lib/python3.11/importlib/__init__.py:126: in import_module
return _bootstrap._gcd_import(name[level:], package, level)
:1206: in _gcd_import
???
:1178: in _find_and_load
???
:1149: in _find_and_load_unlocked
???
:690: in _load_unlocked
???
lib/python3.11/site-packages/_pytest/assertion/rewrite.py:178: in exec_module
exec(co, module.__dict__)
lib/python3.11/site-packages/dill/tests/test_diff.py:9: in
from dill import __diff as diff
lib/python3.11/site-packages/dill/__diff.py:220: in _imp
mod = __import__(*args, **kwds)
lib/python3.11/site-packages/dill/__diff.py:233: in
memorise(mod)
lib/python3.11/site-packages/dill/__diff.py:111: in memorise
[mem(value) for key, value in g.items()]
lib/python3.11/site-packages/dill/__diff.py:111: in
[mem(value) for key, value in g.items()]
lib/python3.11/site-packages/dill/__diff.py:119: in memorise
[mem(item) for item in s]
lib/python3.11/site-packages/dill/__diff.py:119: in
[mem(item) for item in s]
lib/python3.11/site-packages/dill/__diff.py:111: in memorise
[mem(value) for key, value in g.items()]
lib/python3.11/site-packages/dill/__diff.py:111: in
[mem(value) for key, value in g.items()]
lib/python3.11/site-packages/dill/__diff.py:111: in memorise
[mem(value) for key, value in g.items()]
lib/python3.11/site-packages/dill/__diff.py:111: in
[mem(value) for key, value in g.items()]
lib/python3.11/site-packages/dill/__diff.py:111: in memorise
[mem(value) for key, value in g.items()]
lib/python3.11/site-packages/dill/__diff.py:111: in
[mem(value) for key, value in g.items()]
lib/python3.11/site-packages/dill/__diff.py:115: in memorise
[(mem(key), mem(item))
lib/python3.11/site-packages/dill/__diff.py:115: in
[(mem(key), mem(item))
lib/python3.11/site-packages/dill/__diff.py:111: in memorise
[mem(value) for key, value in g.items()]
lib/python3.11/site-packages/dill/__diff.py:111: in
[mem(value) for key, value in g.items()]
lib/python3.11/site-packages/dill/__diff.py:111: in memorise
[mem(value) for key, value in g.items()]
lib/python3.11/site-packages/dill/__diff.py:111: in
[mem(value) for key, value in g.items()]
lib/python3.11/site-packages/dill/__diff.py:111: in memorise
[mem(value) for key, value in g.items()]
lib/python3.11/site-packages/dill/__diff.py:111: in
[mem(value) for key, value in g.items()]
lib/python3.11/site-packages/dill/__diff.py:111: in memorise
[mem(value) for key, value in g.items()]
lib/python3.11/site-packages/dill/__diff.py:111: in
[mem(value) for key, value in g.items()]
lib/python3.11/site-packages/dill/__diff.py:111: in memorise
[mem(value) for key, value in g.items()]
lib/python3.11/site-packages/dill/__diff.py:111: in
[mem(value) for key, value in g.items()]
lib/python3.11/site-packages/dill/__diff.py:111: in memorise
[mem(value) for key, value in g.items()]
lib/python3.11/site-packages/dill/__diff.py:111: in
[mem(value) for key, value in g.items()]
lib/python3.11/site-packages/dill/__diff.py:101: in memorise
seq_id = dict((id(key),id(value)) for key, value in s.items())
lib/python3.11/site-packages/dill/__diff.py:101: in
seq_id = dict((id(key),id(value)) for key, value in s.items())
E ValueError: too many values to unpack (expected 2)
_________________________________ ERROR collecting lib64/python3.11/site-packages/dill/tests/test_module.py __________________________________
ImportError while importing test module '/tmp/fundus/lib64/python3.11/site-packages/dill/tests/test_module.py'.
Hint: make sure your test modules/packages have valid Python names.
Traceback:
/usr/lib/python3.11/importlib/__init__.py:126: in import_module
return _bootstrap._gcd_import(name[level:], package, level)
lib/python3.11/site-packages/dill/tests/test_module.py:11: in
import test_mixins as module
lib/python3.11/site-packages/dill/__diff.py:220: in _imp
mod = __import__(*args, **kwds)
lib/python3.11/site-packages/dill/__diff.py:220: in _imp
mod = __import__(*args, **kwds)
E ModuleNotFoundError: No module named 'test_mixins'
_______________________________ ERROR collecting lib64/python3.11/site-packages/dill/tests/test_registered.py ________________________________
/usr/lib/python3.11/importlib/__init__.py:126: in import_module
return _bootstrap._gcd_import(name[level:], package, level)
:1206: in _gcd_import
???
:1178: in _find_and_load
???
:1149: in _find_and_load_unlocked
???
:690: in _load_unlocked
???
lib/python3.11/site-packages/_pytest/assertion/rewrite.py:178: in exec_module
exec(co, module.__dict__)
lib/python3.11/site-packages/dill/tests/test_registered.py:45: in
raise e from None
lib/python3.11/site-packages/dill/tests/test_registered.py:42: in
assert not bool(success)
E AssertionError: assert not True
E + where True = bool(['PrettyPrinterType', 'TemporaryFileType', 'StreamHandlerType'])
-------------------------------------------------------------- Captured stdout ---------------------------------------------------------------
SUCCESS: ['PrettyPrinterType', 'TemporaryFileType', 'StreamHandlerType']
_________________________________ ERROR collecting lib64/python3.11/site-packages/dill/tests/test_session.py _________________________________
ImportError while importing test module '/tmp/fundus/lib64/python3.11/site-packages/dill/tests/test_session.py'.
Hint: make sure your test modules/packages have valid Python names.
Traceback:
/usr/lib/python3.11/importlib/__init__.py:126: in import_module
return _bootstrap._gcd_import(name[level:], package, level)
lib/python3.11/site-packages/dill/tests/test_session.py:72: in
import test_dictviews as local_mod # non-builtin top-level module
lib/python3.11/site-packages/dill/__diff.py:220: in _imp
mod = __import__(*args, **kwds)
lib/python3.11/site-packages/dill/__diff.py:220: in _imp
mod = __import__(*args, **kwds)
E ModuleNotFoundError: No module named 'test_dictviews'
___________________________________________________ ERROR collecting tests/test_parser.py ____________________________________________________
tests/test_parser.py:18: in
from tests.utility import (
lib/python3.11/site-packages/dill/__diff.py:220: in _imp
mod = __import__(*args, **kwds)
lib/python3.11/site-packages/dill/__diff.py:220: in _imp
mod = __import__(*args, **kwds)
tests/utility.py:16: in
from scripts.generate_tables import supported_publishers_markdown_path
lib/python3.11/site-packages/dill/__diff.py:220: in _imp
mod = __import__(*args, **kwds)
lib/python3.11/site-packages/dill/__diff.py:220: in _imp
mod = __import__(*args, **kwds)
scripts/generate_tables.py:9: in
from lxml.html.builder import CLASS, CODE, DIV, SPAN, TABLE, TBODY, TD, TH, THEAD, TR, A
lib/python3.11/site-packages/dill/__diff.py:220: in _imp
mod = __import__(*args, **kwds)
lib/python3.11/site-packages/dill/__diff.py:220: in _imp
mod = __import__(*args, **kwds)
lib/python3.11/site-packages/lxml/html/builder.py:32: in
from lxml.builder import ElementMaker
lib/python3.11/site-packages/dill/__diff.py:220: in _imp
mod = __import__(*args, **kwds)
lib/python3.11/site-packages/dill/__diff.py:223: in _imp
memorise(sys.modules[m])
lib/python3.11/site-packages/dill/__diff.py:111: in memorise
[mem(value) for key, value in g.items()]
lib/python3.11/site-packages/dill/__diff.py:111: in
[mem(value) for key, value in g.items()]
lib/python3.11/site-packages/dill/__diff.py:95: in memorise
attrs_id = dict((key,id(value)) for key, value in g.items())
E AttributeError: 'functools.partial' object has no attribute 'items'
========================================================== short test summary info ===========================================================
ERROR lib/python3.11/site-packages/dill/tests/test_diff.py - ValueError: too many values to unpack (expected 2)
ERROR lib/python3.11/site-packages/dill/tests/test_module.py
ERROR lib/python3.11/site-packages/dill/tests/test_objects.py - ReferenceError: weakly-referenced object no longer exists
ERROR lib/python3.11/site-packages/dill/tests/test_registered.py - AssertionError: assert not True
ERROR lib/python3.11/site-packages/dill/tests/test_selected.py - OSError: [Errno 9] Bad file descriptor
ERROR lib/python3.11/site-packages/dill/tests/test_session.py
ERROR lib64/python3.11/site-packages/dill/tests/test_diff.py - ValueError: too many values to unpack (expected 2)
ERROR lib64/python3.11/site-packages/dill/tests/test_module.py
ERROR lib64/python3.11/site-packages/dill/tests/test_registered.py - AssertionError: assert not True
ERROR lib64/python3.11/site-packages/dill/tests/test_session.py
ERROR tests/test_parser.py - AttributeError: 'functools.partial' object has no attribute 'items'
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! Interrupted: 11 errors during collection !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
============================================================= 11 errors in 1.08s =============================================================
(fundus) oertzenf@tarvos /tmp/fundus$ 2 ↵ ✭master

</details>

@MaxDall
Collaborator

MaxDall commented Apr 22, 2024

@felixvonberlin I'm a bit confused by these lines:

oertzenf@tarvos /tmp$ python3 -m venv fundus                                                                                                 
oertzenf@tarvos /tmp$ cd fundus/                                                                                                             
oertzenf@tarvos /tmp/fundus$ source bin/activate

Is your venv the same directory you cloned the repository to?

Normally, the layout is supposed to look something like this:

fundus
   | venv

@felixvonberlin
Author

Yes:
oertzenf@tarvos /tmp$ python3 -m venv fundus
Here I created the venv directly in the git repo's directory. Is this a problem?

@MaxDall
Collaborator

MaxDall commented Apr 24, 2024

Yes, I'm pretty sure that's what's causing your problems.

You should create the venv in its own subdirectory inside the repository, rather than using the repository root itself as the venv:

git clone ...
cd fundus
python3 -m venv venv
source venv/bin/activate
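
If you want to double-check the layout before running pytest again, something like the sketch below might help. It is not part of fundus; the function names are made up for illustration, and it assumes you run it from the repository root with the venv activated. It only compares `sys.prefix` (the root of the active venv) with the current directory. When both are the same directory, the venv's `lib/` and `lib64/` folders sit right next to `tests/`, which would explain why pytest tried to collect dill's own test files in the log above.

```python
import sys
from pathlib import Path


def in_virtualenv() -> bool:
    """True when the interpreter runs inside a venv (prefix differs from base prefix)."""
    return sys.prefix != sys.base_prefix


def venv_is_project_root(project_root: Path, venv_root: Path) -> bool:
    """True when the venv was created directly in the project root directory."""
    return project_root.resolve() == venv_root.resolve()


if __name__ == "__main__":
    # Assumption: the script is started from the repository root.
    project_root = Path.cwd()
    if in_virtualenv() and venv_is_project_root(project_root, Path(sys.prefix)):
        print("venv and repository share one directory; recreate the venv as ./venv")
    else:
        print("layout looks fine")
```

With the layout from the commands above, `sys.prefix` would end in `fundus/venv`, so the check should report that the layout looks fine.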

@MaxDall MaxDall merged commit f7ba274 into flairNLP:master May 6, 2024
5 checks passed