Init fails due to violated check constraints #13

Open
nikammerlaan opened this issue Jun 7, 2024 · 7 comments

@nikammerlaan

I'm initing the database with these commands:

pipx install 'mbslave'
mbslave init

and running into this error:

...
INFO:mbslave.replication:Loading l_series_work to musicbrainz.l_series_work
INFO:mbslave.replication:Loading l_url_work to musicbrainz.l_url_work
INFO:mbslave.replication:Loading l_work_work to musicbrainz.l_work_work
INFO:mbslave.replication:Loading label to musicbrainz.label
Traceback (most recent call last):
  File "/opt/pipx_bin/mbslave", line 8, in <module>
    sys.exit(main())
  File "/opt/pipx/venvs/mbslave/lib/python3.10/site-packages/mbslave/replication.py", line 787, in main
    args.func(config, args)
  File "/opt/pipx/venvs/mbslave/lib/python3.10/site-packages/mbslave/replication.py", line 297, in mbslave_auto_import_main
    load_tar(url, fileobj, db, config, config.schemas.ignored_schemas, config.tables.ignored_tables)
  File "/opt/pipx/venvs/mbslave/lib/python3.10/site-packages/mbslave/replication.py", line 258, in load_tar
    cursor.copy_expert('COPY {} FROM STDIN'.format(fulltable), tar.extractfile(member))
psycopg2.errors.CheckViolation: new row for relation "label" violates check constraint "label_label_code_check"
DETAIL:  Failing row contains (294731, dacec7dc-806e-4f1a-ab41-cc0c46b297e0, beau by Republic, 2022, 10, 9, null, null, null, 202210, 1, 7741, , 0, 2024-05-14 21:06:04.842073+00, f).
CONTEXT:  COPY label, line 39973: "294731	dacec7dc-806e-4f1a-ab41-cc0c46b297e0	beau by Republic	2022	10	9	\N	\N	\N	202210	1	7741		0	202..."

Traceback (most recent call last):
  File "/opt/pipx_bin/mbslave", line 8, in <module>
    sys.exit(main())
  File "/opt/pipx/venvs/mbslave/lib/python3.10/site-packages/mbslave/replication.py", line 787, in main
    args.func(config, args)
  File "/opt/pipx/venvs/mbslave/lib/python3.10/site-packages/mbslave/replication.py", line 640, in mbslave_init_main
    run_script('mbslave auto-import')
  File "/opt/pipx/venvs/mbslave/lib/python3.10/site-packages/mbslave/replication.py", line 593, in run_script
    subprocess.run(script, check=True, shell=True)
  File "/usr/lib/python3.10/subprocess.py", line 526, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command 'mbslave auto-import' returned non-zero exit status 1.
@nikammerlaan
Author

I did some digging and this is because version 29 has not been published.
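For anyone who wants to see which field trips the constraint, here is a minimal Python sketch that splits the failing row the way COPY's text format does. The position of label_code (tenth field) is inferred from the row in the traceback above, and both bounds are assumptions for illustration only, not values taken from the actual schema.

```python
# Position of label_code in the row: an assumption inferred from the failing
# row in the traceback, not read from the schema.
LABEL_CODE_INDEX = 9


def parse_copy_line(line):
    """Split one tab-separated COPY text line; \\N is PostgreSQL's NULL marker."""
    return [None if field == r"\N" else field
            for field in line.rstrip("\n").split("\t")]


def label_code_ok(fields, upper_bound=100000):
    """Mimic a check of the form: label_code > 0 AND label_code < upper_bound.

    The default upper bound is a guess at the older constraint (assumption).
    """
    raw = fields[LABEL_CODE_INDEX]
    if raw is None:
        return True  # a NULL value passes a CHECK constraint
    return 0 < int(raw) < upper_bound


# The failing row, rebuilt from the DETAIL line of the error message.
row = ("294731\tdacec7dc-806e-4f1a-ab41-cc0c46b297e0\tbeau by Republic\t"
       "2022\t10\t9\t\\N\t\\N\t\\N\t202210\t1\t7741\t\t0\t"
       "2024-05-14 21:06:04.842073+00\tf")
fields = parse_copy_line(row)
print(fields[LABEL_CODE_INDEX], label_code_ok(fields))
```

If the assumed bounds are roughly right, the six-digit value 202210 is what the older check rejects, and a looser bound such as `label_code_ok(fields, upper_bound=1000000)` accepts it.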

@nikammerlaan
Author

I fixed it by installing mbslave from git:

pipx install git+https://github.com/acoustid/mbslave.git

@flimzy

flimzy commented Jul 12, 2024

I'm getting the same error, even after installing mbslave from git as suggested above. 😢

Update: After uninstalling mbslave before re-installing from git, it works.
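A quick way to check which mbslave is actually installed, as a hedged sketch: it has to be run with the Python interpreter of the environment that holds mbslave (with pipx, that is the interpreter inside the mbslave venv), and it may not tell a git checkout apart from a release if both carry the same version string. The helper name `installed_version` is made up for this example.

```python
from importlib import metadata


def installed_version(dist_name):
    """Return the installed version string of a distribution, or None if absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


# Prints None when mbslave is not visible to this interpreter.
print(installed_version("mbslave"))
```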

@kevinlee819

I fixed it by installing mbslave from git:

pipx install git+https://github.com/acoustid/mbslave.git

Even after that, it still didn't work:

INFO:mbslave.replication:Latest dump is 20250205-001748
INFO:mbslave.replication:Importing data from http://ftp.musicbrainz.org/pub/musicbrainz/data/fullexport/20250205-001748/mbdump.tar.bz2
INFO:mbslave.replication:Loading alternative_release_type to musicbrainz.alternative_release_type
INFO:mbslave.replication:Loading area to musicbrainz.area
INFO:mbslave.replication:Loading area_alias to musicbrainz.area_alias
INFO:mbslave.replication:Loading area_alias_type to musicbrainz.area_alias_type
INFO:mbslave.replication:Loading area_gid_redirect to musicbrainz.area_gid_redirect
INFO:mbslave.replication:Loading area_type to musicbrainz.area_type
INFO:mbslave.replication:Loading artist to musicbrainz.artist
INFO:mbslave.replication:Loading artist_alias to musicbrainz.artist_alias
INFO:mbslave.replication:Loading artist_alias_type to musicbrainz.artist_alias_type
INFO:mbslave.replication:Loading artist_credit to musicbrainz.artist_credit
INFO:mbslave.replication:Loading artist_credit_gid_redirect to musicbrainz.artist_credit_gid_redirect
INFO:mbslave.replication:Loading artist_credit_name to musicbrainz.artist_credit_name
Traceback (most recent call last):
  File "/home/meizu/miniconda3/bin/mbslave", line 8, in <module>
    sys.exit(main())
             ^^^^^^
  File "/home/meizu/miniconda3/lib/python3.12/site-packages/mbslave/replication.py", line 806, in main
    args.func(config, args)
  File "/home/meizu/miniconda3/lib/python3.12/site-packages/mbslave/replication.py", line 304, in mbslave_auto_import_main
    load_tar(url, fileobj, db, config, config.schemas.ignored_schemas, config.tables.ignored_tables)
  File "/home/meizu/miniconda3/lib/python3.12/site-packages/mbslave/replication.py", line 265, in load_tar
    cursor.copy_expert('COPY {} FROM STDIN'.format(fulltable), tar.extractfile(member))
psycopg2.errors.QueryCanceled: COPY from stdin failed: error in .read() call: ReadError unexpected end of data
CONTEXT:  COPY artist_credit_name, line 5080688

Traceback (most recent call last):
  File "/home/meizu/miniconda3/bin/mbslave", line 8, in <module>
    sys.exit(main())
             ^^^^^^
  File "/home/meizu/miniconda3/lib/python3.12/site-packages/mbslave/replication.py", line 806, in main
    args.func(config, args)
  File "/home/meizu/miniconda3/lib/python3.12/site-packages/mbslave/replication.py", line 654, in mbslave_init_main
    run_script('mbslave auto-import')
  File "/home/meizu/miniconda3/lib/python3.12/site-packages/mbslave/replication.py", line 607, in run_script
    subprocess.run(script, check=True, shell=True)
  File "/home/meizu/miniconda3/lib/python3.12/subprocess.py", line 571, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command 'mbslave auto-import' returned non-zero exit status 1.
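
The "ReadError unexpected end of data" above suggests the compressed stream ended before the archive was complete, for example because the download was cut off (that cause is an assumption, the log does not prove it). If you have the dump as a local file, a sketch like the following reads every member to the end and reports whether the data stops early. The function name `archive_is_complete` is hypothetical, and this is not a substitute for comparing checksums.

```python
import tarfile


def archive_is_complete(path, chunk_size=1 << 20):
    """Read every member of a .tar.bz2 to the end; False if the data stops early."""
    try:
        with tarfile.open(path, mode="r:bz2") as tar:
            for member in tar:
                fileobj = tar.extractfile(member)
                if fileobj is None:  # directories, links and similar entries
                    continue
                # Drain the member in chunks so large tables fit in memory.
                while fileobj.read(chunk_size):
                    pass
        return True
    except (tarfile.ReadError, EOFError, OSError):
        # Truncated or corrupt compressed data surfaces as one of these.
        return False
```

Usage would be something like `archive_is_complete("mbdump.tar.bz2")`; expect it to take a while on a full export, since it decompresses the whole file.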

@frisi

frisi commented Feb 11, 2025

i had similar errors when trying to bootstrap a fresh database over the last few days. i tried multiple things and made a few mistakes, so sharing this will hopefully help you.
please try these out one after the other and report back which one helped, so we can fix the issues with a pull request.

A) don't split db creation and import

make sure to remove and re-create the database before you run mbslave

my mistake was doing something like this: creating the database in a first step and importing the data in a second step.

mbslave init --create-user --create-database --empty

# download package
wget http://ftp.musicbrainz.org/pub/musicbrainz/data/fullexport/20250122-001711/mbdump.tar.bz2

mbslave import mbdump.tar.bz2

⚠ this won't work, as mbslave init works differently:

  • it first creates the tables
  • then it imports the data via the postgres COPY command (from more packages than the one mentioned above)
  • and after that it installs triggers that make a second import of data next to impossible (the script will get stuck on either the copy or the commit)
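
Following the order described above, one way to tell whether a database has already gone past the trigger step is to count its non-internal triggers before starting a bulk import. This is only a sketch: the helper name `count_user_triggers` is made up, the schema name "musicbrainz" is taken from the log output earlier in this thread, and treating a non-zero count as "start over with a fresh database" is my reading of the list above, not something mbslave checks itself.

```python
# Counts triggers that were created explicitly; triggers that PostgreSQL
# generates for constraints are flagged tgisinternal and are skipped.
USER_TRIGGER_COUNT_SQL = """
    SELECT count(*)
    FROM pg_trigger t
    JOIN pg_class c ON c.oid = t.tgrelid
    JOIN pg_namespace n ON n.oid = c.relnamespace
    WHERE NOT t.tgisinternal AND n.nspname = %s
"""


def count_user_triggers(cursor, schema="musicbrainz"):
    """Return the number of user-defined triggers in a schema.

    `cursor` is any DB-API cursor using %s placeholders (e.g. from psycopg2).
    """
    cursor.execute(USER_TRIGGER_COUNT_SQL, (schema,))
    return cursor.fetchone()[0]


# Example with psycopg2 (needs a reachable server, so left commented out):
# import psycopg2
# with psycopg2.connect(dbname="musicbrainz") as conn, conn.cursor() as cur:
#     print(count_user_triggers(cur))
```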

B) set keepalive

i've adapted replication.py to set keepalive settings as suggested in https://stackoverflow.com/a/63130830

    def create_psycopg2_kwargs(self, superuser=False, no_db=False):
        kwargs = {
            "keepalives": 1,
            "keepalives_idle": 30,
            "keepalives_interval": 5,
            "keepalives_count": 5,
        }
  • you'll need to check out this repository
  • adapt the code
  • create a virtualenv
  • activate it
  • install from the local repository

git clone ...

python3 -m venv mbslave
cd mbslave
source bin/activate

# install dependencies
sudo apt-get install postgresql-client
sudo apt install libpq-dev

pip install ~/path/to/mbslave
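
If you'd rather try the keepalive settings from B outside of mbslave first, here is a small standalone sketch. The four keepalive names are standard libpq connection parameters that psycopg2 passes through; the helper `with_keepalives` and the example connection values are made up for illustration and are not part of mbslave.

```python
# Same values as in the snippet above; tune them to your network.
KEEPALIVE_DEFAULTS = {
    "keepalives": 1,            # enable TCP keepalives
    "keepalives_idle": 30,      # seconds of inactivity before the first probe
    "keepalives_interval": 5,   # seconds between probes
    "keepalives_count": 5,      # lost probes before the connection is dropped
}


def with_keepalives(kwargs):
    """Return a copy of kwargs with keepalive defaults filled in.

    Values already present in kwargs win over the defaults.
    """
    merged = dict(KEEPALIVE_DEFAULTS)
    merged.update(kwargs)
    return merged


# Hypothetical connection settings, for illustration only.
conn_kwargs = with_keepalives({"dbname": "musicbrainz", "host": "127.0.0.1"})
print(conn_kwargs)

# Requires psycopg2 and a reachable server, so left commented out:
# import psycopg2
# conn = psycopg2.connect(**conn_kwargs)
```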

helpful debug commands

i've had hanging copies and commits blocking my database (due to mistake A).
this should not happen with a fresh database.

to see what is currently going on, you can connect to your database (e.g. using psql -h <host> -U <user> <dbname>)
and check the status of the currently running copy command

Whenever COPY is running, the pg_stat_progress_copy view will contain one row for each backend that is currently running a COPY command
(see https://www.postgresql.org/docs/15/progress-reporting.html#COPY-PROGRESS-REPORTING)

select * from pg_stat_progress_copy;

query for any long running commands

    SELECT pid, age(clock_timestamp(), query_start), usename, query 
    FROM pg_stat_activity 
    WHERE state != 'idle' and query NOT ILIKE '%pg_stat_activity%' 
    ORDER BY query_start;

terminate a pid

select pg_cancel_backend(<procpid>);

-- sometimes cancelling is not enough and the backend has to be terminated
select pg_terminate_backend(<procpid>);

@kevinlee819

You're right, after repeatedly trying to download data directly using mbslave without success, I used the method you mentioned:

mbslave init --create-user --create-database --empty

# download package
wget http://ftp.musicbrainz.org/pub/musicbrainz/data/fullexport/20250122-001711/mbdump.tar.bz2

mbslave import mbdump.tar.bz2

But it gets stuck at the "release" table every time:

 402740 postgres  20   0  950484 789920 199168 R 100.0   4.9 967:02.00 postgres: 14/main: postgres musicbrainz 127.0.0.1(35860) COMMIT

It seems like the process has been running for a day. I don't know what's going on.

@frisi

frisi commented Feb 13, 2025

@kevinlee819 please read carefully: the method described in A does not work, as it creates a database with triggers in place that make importing huge datasets next to impossible (the copy takes a long time, and the deferred triggers leave the importer stuck at the commit)

so please make sure you start with a fresh database, and check whether the updated keepalive settings solve your issues.
