Very slow validation after dynamicRef implementation, even with schemas which do not use dynamicRef other than in their metaschema #941

Closed
mriedem opened this issue May 5, 2022 · 14 comments
Labels
Bug Something doesn't work the way it should.

@mriedem

mriedem commented May 5, 2022

We just started noticing that some tooling using this code hangs with jsonschema 4.5.0:

import jsonschema
import ruamel.yaml

YAML = ruamel.yaml.YAML()
...
        with open(fname, "r") as f:
            self.config = YAML.load(f)
        jsonschema.validate(
            self.config, SCHEMA,
            format_checker=jsonschema.draft7_format_checker)

The call appears to hang, with no warnings or errors. When we drop back to jsonschema<4.5.0 (i.e. 4.4.0), it works. I'm not sure what might be going on here or how to debug it.
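One standard-library way to see where a call like this is stuck is Python's `faulthandler` module. It can dump every thread's stack after a timeout without killing the process. The following is a minimal sketch. The wrapper name `call_with_stack_dump` and its 30-second default are illustrative choices and are not part of jsonschema:

```python
import faulthandler
import sys


def call_with_stack_dump(fn, *args, timeout=30.0, file=None, **kwargs):
    """Call fn(*args, **kwargs); if it is still running after `timeout`
    seconds, write the stacks of all threads to `file` (default: stderr)
    and let the call keep going.

    Hypothetical helper for diagnosing apparent hangs.
    """
    out = file if file is not None else sys.stderr
    # Schedule a one-shot stack dump; exit=False keeps the process alive.
    faulthandler.dump_traceback_later(timeout, repeat=False, file=out, exit=False)
    try:
        return fn(*args, **kwargs)
    finally:
        # Cancel the pending dump once the call returns or raises.
        faulthandler.cancel_dump_traceback_later()
```

For the snippet above, the wrapped call would be `call_with_stack_dump(jsonschema.validate, self.config, SCHEMA, format_checker=jsonschema.draft7_format_checker, timeout=30)`. If validation is still running after 30 seconds, the current Python stack is written to stderr and validation continues.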

@mriedem
Author

mriedem commented May 5, 2022

These are the packages we have installed FWIW:

attrs==21.4.0,awesome-progress-bar==1.7.2,certifi==2021.10.8,cffi==1.15.0,charset-normalizer==2.0.12,Deprecated==1.2.13,ibmq-deploy==1.14.2,idna==3.3,importlib-resources==5.7.1,jsonschema==4.5.0,pycparser==2.21,PyGithub==1.55,PyJWT==2.3.0,PyNaCl==1.5.0,pyrsistent==0.18.1,PyYAML==6.0,requests==2.27.1,ruamel.yaml==0.17.21,ruamel.yaml.clib==0.2.6,urllib3==1.26.9,wrapt==1.14.1,zipp==3.8.0

@mriedem
Author

mriedem commented May 5, 2022

Running our unit tests also hangs. I killed the test runner and got the output below. It looks like there may be a cycle in here?

========================================================================================== ERRORS ===========================================================================================
______________________________________________________________________ ERROR at setup of test_should_build_pr_no_push _______________________________________________________________________

module = <module 'test_build' from '/home/osboxes/ibmq/deploy-tool/tests/test_build.py'>

    def setup_module(module):
>       build.CONF.load()

tests/test_build.py:14: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
ibmq_deploy/config.py:28: in load
    jsonschema.validate(
.tox/py38/lib/python3.8/site-packages/jsonschema/validators.py:1036: in validate
    cls.check_schema(schema)
.tox/py38/lib/python3.8/site-packages/jsonschema/validators.py:201: in check_schema
    for error in cls(cls.META_SCHEMA).iter_errors(schema):
.tox/py38/lib/python3.8/site-packages/jsonschema/validators.py:241: in iter_errors
    for error in errors:
.tox/py38/lib/python3.8/site-packages/jsonschema/_validators.py:370: in allOf
    yield from validator.descend(instance, subschema, schema_path=index)
.tox/py38/lib/python3.8/site-packages/jsonschema/validators.py:257: in descend
    for error in self.evolve(schema=schema).iter_errors(instance):
.tox/py38/lib/python3.8/site-packages/jsonschema/validators.py:241: in iter_errors
    for error in errors:
.tox/py38/lib/python3.8/site-packages/jsonschema/_validators.py:299: in ref
    yield from validator.descend(instance, resolved)
.tox/py38/lib/python3.8/site-packages/jsonschema/validators.py:257: in descend
    for error in self.evolve(schema=schema).iter_errors(instance):
.tox/py38/lib/python3.8/site-packages/jsonschema/validators.py:241: in iter_errors
    for error in errors:
.tox/py38/lib/python3.8/site-packages/jsonschema/_validators.py:340: in properties
    yield from validator.descend(
.tox/py38/lib/python3.8/site-packages/jsonschema/validators.py:257: in descend
    for error in self.evolve(schema=schema).iter_errors(instance):
.tox/py38/lib/python3.8/site-packages/jsonschema/validators.py:241: in iter_errors
    for error in errors:
.tox/py38/lib/python3.8/site-packages/jsonschema/_validators.py:47: in additionalProperties
    yield from validator.descend(instance[extra], aP, path=extra)
.tox/py38/lib/python3.8/site-packages/jsonschema/validators.py:257: in descend
    for error in self.evolve(schema=schema).iter_errors(instance):
.tox/py38/lib/python3.8/site-packages/jsonschema/validators.py:241: in iter_errors
    for error in errors:
.tox/py38/lib/python3.8/site-packages/jsonschema/_validators.py:320: in dynamicRef
    yield from validator.descend(instance, subschema)
.tox/py38/lib/python3.8/site-packages/jsonschema/validators.py:257: in descend
    for error in self.evolve(schema=schema).iter_errors(instance):
.tox/py38/lib/python3.8/site-packages/jsonschema/validators.py:241: in iter_errors
    for error in errors:
.tox/py38/lib/python3.8/site-packages/jsonschema/_validators.py:370: in allOf
    yield from validator.descend(instance, subschema, schema_path=index)
.tox/py38/lib/python3.8/site-packages/jsonschema/validators.py:257: in descend
    for error in self.evolve(schema=schema).iter_errors(instance):
.tox/py38/lib/python3.8/site-packages/jsonschema/validators.py:241: in iter_errors
    for error in errors:
.tox/py38/lib/python3.8/site-packages/jsonschema/_validators.py:299: in ref
    yield from validator.descend(instance, resolved)
.tox/py38/lib/python3.8/site-packages/jsonschema/validators.py:257: in descend
    for error in self.evolve(schema=schema).iter_errors(instance):
.tox/py38/lib/python3.8/site-packages/jsonschema/validators.py:241: in iter_errors
    for error in errors:
.tox/py38/lib/python3.8/site-packages/jsonschema/_validators.py:340: in properties
    yield from validator.descend(
.tox/py38/lib/python3.8/site-packages/jsonschema/validators.py:257: in descend
    for error in self.evolve(schema=schema).iter_errors(instance):
.tox/py38/lib/python3.8/site-packages/jsonschema/validators.py:241: in iter_errors
    for error in errors:
.tox/py38/lib/python3.8/site-packages/jsonschema/_validators.py:47: in additionalProperties
    yield from validator.descend(instance[extra], aP, path=extra)
.tox/py38/lib/python3.8/site-packages/jsonschema/validators.py:257: in descend
    for error in self.evolve(schema=schema).iter_errors(instance):
.tox/py38/lib/python3.8/site-packages/jsonschema/validators.py:241: in iter_errors
    for error in errors:
.tox/py38/lib/python3.8/site-packages/jsonschema/_validators.py:320: in dynamicRef
    yield from validator.descend(instance, subschema)
.tox/py38/lib/python3.8/site-packages/jsonschema/validators.py:257: in descend
    for error in self.evolve(schema=schema).iter_errors(instance):
.tox/py38/lib/python3.8/site-packages/jsonschema/validators.py:241: in iter_errors
    for error in errors:
.tox/py38/lib/python3.8/site-packages/jsonschema/_validators.py:370: in allOf
    yield from validator.descend(instance, subschema, schema_path=index)
.tox/py38/lib/python3.8/site-packages/jsonschema/validators.py:257: in descend
    for error in self.evolve(schema=schema).iter_errors(instance):
.tox/py38/lib/python3.8/site-packages/jsonschema/validators.py:241: in iter_errors
    for error in errors:
.tox/py38/lib/python3.8/site-packages/jsonschema/_validators.py:299: in ref
    yield from validator.descend(instance, resolved)
.tox/py38/lib/python3.8/site-packages/jsonschema/validators.py:257: in descend
    for error in self.evolve(schema=schema).iter_errors(instance):
.tox/py38/lib/python3.8/site-packages/jsonschema/validators.py:241: in iter_errors
    for error in errors:
.tox/py38/lib/python3.8/site-packages/jsonschema/_validators.py:340: in properties
    yield from validator.descend(
.tox/py38/lib/python3.8/site-packages/jsonschema/validators.py:257: in descend
    for error in self.evolve(schema=schema).iter_errors(instance):
.tox/py38/lib/python3.8/site-packages/jsonschema/validators.py:241: in iter_errors
    for error in errors:
.tox/py38/lib/python3.8/site-packages/jsonschema/_validators.py:320: in dynamicRef
    yield from validator.descend(instance, subschema)
.tox/py38/lib/python3.8/site-packages/jsonschema/validators.py:257: in descend
    for error in self.evolve(schema=schema).iter_errors(instance):
.tox/py38/lib/python3.8/site-packages/jsonschema/validators.py:241: in iter_errors
    for error in errors:
.tox/py38/lib/python3.8/site-packages/jsonschema/_validators.py:370: in allOf
    yield from validator.descend(instance, subschema, schema_path=index)
.tox/py38/lib/python3.8/site-packages/jsonschema/validators.py:257: in descend
    for error in self.evolve(schema=schema).iter_errors(instance):
.tox/py38/lib/python3.8/site-packages/jsonschema/validators.py:241: in iter_errors
    for error in errors:
.tox/py38/lib/python3.8/site-packages/jsonschema/_validators.py:299: in ref
    yield from validator.descend(instance, resolved)
.tox/py38/lib/python3.8/site-packages/jsonschema/validators.py:257: in descend
    for error in self.evolve(schema=schema).iter_errors(instance):
.tox/py38/lib/python3.8/site-packages/jsonschema/validators.py:241: in iter_errors
    for error in errors:
.tox/py38/lib/python3.8/site-packages/jsonschema/_validators.py:340: in properties
    yield from validator.descend(
.tox/py38/lib/python3.8/site-packages/jsonschema/validators.py:257: in descend
    for error in self.evolve(schema=schema).iter_errors(instance):
.tox/py38/lib/python3.8/site-packages/jsonschema/validators.py:241: in iter_errors
    for error in errors:
.tox/py38/lib/python3.8/site-packages/jsonschema/_validators.py:47: in additionalProperties
    yield from validator.descend(instance[extra], aP, path=extra)
.tox/py38/lib/python3.8/site-packages/jsonschema/validators.py:257: in descend
    for error in self.evolve(schema=schema).iter_errors(instance):
.tox/py38/lib/python3.8/site-packages/jsonschema/validators.py:241: in iter_errors
    for error in errors:
.tox/py38/lib/python3.8/site-packages/jsonschema/_validators.py:313: in dynamicRef
    extended_schema = dynamic_anchor_extender(
.tox/py38/lib/python3.8/site-packages/jsonschema/_utils.py:406: in dynamic_anchor_extender
    extender_schema = _find_dynamic_anchor_intermediate(
.tox/py38/lib/python3.8/site-packages/jsonschema/_utils.py:386: in _find_dynamic_anchor_intermediate
    for subschema in search_schema(
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

schema = {'$comment': 'This meta-schema also defines keywords that have appeared in previous drafts in order to prevent incompa... '$id': 'https://json-schema.org/draft/2020-12/schema', '$schema': 'https://json-schema.org/draft/2020-12/schema', ...}
matcher = <function match_keyword.<locals>.matcher at 0x7f9d92b7e4c0>

    def search_schema(schema, matcher):
        """Breadth-first search routine."""
        values = deque([schema])
        while values:
            value = values.pop()
            if isinstance(value, list):
                values.extendleft(value)
                continue
            if not isinstance(value, dict):
                continue
            yield from matcher(value)
>           values.extendleft(value.values())
E           Failed: Timeout >120.0s

.tox/py38/lib/python3.8/site-packages/jsonschema/_utils.py:431: Failed
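The frame at the bottom of the traceback is a generic walk over every dict and list in a schema, which here is the entire 2020-12 metaschema. The sketch below reproduces that breadth-first walk from the lines visible in the traceback. It pairs the walk with a hypothetical keyword matcher; the real `match_keyword` in `jsonschema._utils` may differ. The sketch suggests that each lookup touches every node of the metaschema, which plausibly becomes expensive if the lookup is repeated for every `$dynamicRef` that gets evaluated:

```python
from collections import deque


def search_schema(schema, matcher):
    """Breadth-first walk yielding whatever `matcher` yields for each dict."""
    values = deque([schema])
    while values:
        value = values.pop()            # take from the right...
        if isinstance(value, list):
            values.extendleft(value)    # ...enqueue on the left: FIFO order
            continue
        if not isinstance(value, dict):
            continue
        yield from matcher(value)
        values.extendleft(value.values())


def match_keyword(keyword):
    # Hypothetical matcher: yield any dict that contains `keyword`.
    def matcher(value):
        if keyword in value:
            yield value
    return matcher


def count_dicts(schema):
    """Number of dicts a single full search visits (cost of one lookup)."""
    return sum(1 for _ in search_schema(schema, lambda d: [d]))
```

For example, with `schema = {"$defs": {"a": {"$dynamicAnchor": "meta"}}, "allOf": [{"$ref": "#"}]}`, `list(search_schema(schema, match_keyword("$dynamicAnchor")))` finds the single anchored subschema, but it still visits all four dicts to do so.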

@Julian
Member

Julian commented May 5, 2022

I'd need some sort of reproducer -- there are plenty of tests for validate, so it's certainly not failing in all cases. What schema and instance are you validating either in your tests or real code?

@mriedem
Author

mriedem commented May 5, 2022

This is the file that defines our schema:

"""The json schema for the config file."""

STRING = {"type": "string"}
BOOL = {"type": "boolean"}

DEPLOY_SCHEMA = {
    "type": "object",
    "properties": {
        "resource_group": STRING,
        "region": STRING,
        "cluster": STRING,
        "openshift": BOOL,
        "chart": STRING,
        "cloud_secret": STRING,
        "namespace": STRING,
        "tag_name": STRING,
        "image_tag": STRING,
        "items": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {
                    "jumpurl": STRING,
                    "name": STRING,
                    "namespace": STRING,
                    "value_file": STRING,
                    "secret_file": STRING,
                    "resource_group": STRING,
                    "region": STRING,
                    "cluster": STRING,
                    "openshift": BOOL,
                    "pr": BOOL
                },
                "required": ["name"]
            }
        }
    },
    "required": ["resource_group", "region", "cluster", "chart", "image_tag", "items"]
}

DEPLOYMENTS_SCHEMA = {
    "type": "object",
    "items": DEPLOY_SCHEMA
}


BRANCH_SCHEMA = {
    "type": "object",
    "properties": {
        "name": STRING,
        "cloud_secret": STRING,
        "ns": STRING
    },
    "required": ["name"]
}

IMAGE_SCHEMA = {
    "type": "object",
    "properties": {
        "registry": {
            "type": ["string", "array"],
            "items": STRING
        },
        "namespace": STRING,
        "name": STRING,
        "branches": {
            "type": "array",
            "items": BRANCH_SCHEMA
        }
    },
    "required": ["registry", "namespace", "name", "branches"]
}

HELM_SCHEMA = {
    "type": "object",
    "properties": {
        "repos": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {
                    "name": STRING,
                    "url": STRING
                },
                "required": ["name", "url"]
            }
        }
    },
    "required": ["repos"]
}

# Which tools are installable
TOOLS_SCHEMA = {
    "type": "object",
    "properties": {
        "helm": {"type": ["number", "string"]},
        "helm_timeout": {"type": ["string"]}
    }
}

SCHEMA = {
    "type": "object",
    "properties": {
        "tools": TOOLS_SCHEMA,
        "deployments": DEPLOYMENTS_SCHEMA,
        "image": IMAGE_SCHEMA,
        "helm": HELM_SCHEMA
    },
}

This is the yaml file being validated:

tools:
  helm: 3

deployments:
  master:
    resource_group: Support Services
    region: us-south
    cluster: support-services
    chart: q-site
    image_field: qsite_image_tag
    namespace: q-site-staging
    items:
      - name: q-site-dev
        value_file: values/values-master.yaml
  production:
    resource_group: Support Services
    region: us-south
    cluster: support-services
    chart: q-site
    image_field: qsite_image_tag
    namespace: q-site-prod
    items:
      - name: q-site-prod
        value_file: values/values-prod.yaml

@Julian
Member

Julian commented May 5, 2022

Thanks, it'd also be helpful if you minimized that to the smallest hanging example.

@mriedem
Author

mriedem commented May 5, 2022

it'd also be helpful if you minimized that to the smallest hanging example

I'm not sure what you mean. Our code hung while validating that exact YAML file (parsed with ruamel.yaml.YAML().load(), as above) against the SCHEMA object above. Beyond the traceback from the hung unit test, I don't know exactly what in there is causing the hang.

@Julian
Member

Julian commented May 5, 2022

What I mean is that it helps me, or anyone else who can spare time to debug, if you provide a minimal working example of the issue rather than one with a lot of unnecessary complexity. E.g. the issue almost certainly persists if you remove, say, the chart property in DEPLOY_SCHEMA. I'm asking for the smallest possible example that demonstrates the problem. If you don't have time to do so, you can certainly leave it as is, and someone else may come along to help.
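Minimizing can be partly automated. The following is a sketch of a greedy reducer. It assumes you can express "still reproduces" as a predicate, for example "validation takes longer than N seconds". The reducer repeatedly tries deleting one key or list item anywhere in the schema, and keeps the deletion whenever the predicate still holds. The names `minimize_schema` and `still_reproduces` are hypothetical:

```python
import copy


def _paths(node, prefix=()):
    """Yield the path to every key/index inside nested dicts and lists,
    parents before their children."""
    if isinstance(node, dict):
        items = node.items()
    elif isinstance(node, list):
        items = enumerate(node)
    else:
        return
    for key, child in items:
        yield prefix + (key,)
        yield from _paths(child, prefix + (key,))


def _without(node, path):
    """Return a deep copy of `node` with the item at `path` removed."""
    result = copy.deepcopy(node)
    parent = result
    for key in path[:-1]:
        parent = parent[key]
    del parent[path[-1]]
    return result


def minimize_schema(schema, still_reproduces):
    """Greedily drop keys/items while still_reproduces(candidate) is True."""
    current = copy.deepcopy(schema)
    changed = True
    while changed:
        changed = False
        for path in list(_paths(current)):
            candidate = _without(current, path)
            if still_reproduces(candidate):
                current = candidate
                changed = True
                break  # paths are stale after a deletion; start over
    return current
```

In practice, `still_reproduces` might time `jsonschema.validate(instance, candidate)` and return True only if it is still slow. It should return False if the candidate raises an error instead, so that the reducer does not drift toward an unrelated failure.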

@progval

progval commented May 5, 2022

I have this smaller example from matrix-org/synapse#12649 :

import jsonschema

_OEMBED_PROVIDER_SCHEMA = {
    "type": "array",
    "items": {
        "type": "object",
        "properties": {
            "endpoints": {
                "type": "array",
                "items": {
                    "type": "object",
                    "properties": {
                        "url": {"type": "string"},
                    },
                },
            },
        },
    },
}

config = [
    {
        "endpoints": [
            {
                "url": "https://publish.twitter.com/oembed",
            }
        ],
    }
]

jsonschema.validate(config, _OEMBED_PROVIDER_SCHEMA)

It seems to be minimal.

bodnarbm added a commit to Enterprise-CMCS/macpro-mdct-carts that referenced this issue May 5, 2022
4.5.0 is experiencing an infinite loop. python-jsonschema/jsonschema#941
@DMRobertson

Bisecting against that example, the regression seems to have been introduced in #886.

@Julian
Member

Julian commented May 5, 2022

That's essentially the only change in the release, so the problem is definitely there, but it will still take some minimizing to figure out what exactly the issue is.

I'll see if I have a bit of time in a few hours if someone doesn't see the issue by then. (And thanks all for the info so far)

@Julian
Member

Julian commented May 5, 2022

The example there does seem to complete; it's just (very) slow. Specifically, here it completes in ~11s. My guess is that the original example does too, just even more slowly, and that the issue is, unfortunately, once again some missing caching.

Julian added a commit that referenced this issue May 5, 2022
It needs performance optimization. See #941.

This reverts commit 12c791e.

@Julian Julian changed the title validate hangs in 4.5.0 Very slow validation after dynamicRef implementation, even with schemas which do not use dynamicRef other than in their metaschema May 24, 2022
@Julian Julian added the Bug Something doesn't work the way it should. label Jun 7, 2022
@Julian
Member

Julian commented Feb 23, 2023

Hello all!

This, along with many many other $ref-related issues, is now finally being handled in #1049 with the introduction of a new referencing library which is fully compliant and has APIs which I hope are a lot easier to understand and customize.

The next release of jsonschema (v4.18.0) will contain a merged version of that PR, and should be released shortly in beta, and followed quickly by a regular release, assuming no critical issues are reported.

It looks from my testing like indeed the examples from this thread work reasonably there -- i.e. aren't unusably slow! If you still care to, I'd love it if you tried out the beta once it is released, or certainly it'd be hugely helpful to immediately install the branch containing this work (https://github.com/python-jsonschema/jsonschema/tree/referencing) and confirm. You can in the interim find documentation for the change in a preview page here.

I'm going to close this given it indeed seems like it is addressed by #1049, but feel free to follow up with any comments. Sorry for the delay in getting to these, but hopefully this new release will bring lots of benefit!
