Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix assert tool url normalization #2778

Merged
merged 9 commits into from
Oct 20, 2024

Conversation

janhoy
Copy link
Contributor

@janhoy janhoy commented Oct 18, 2024

Draft PR to get a green state in branch_9x

Tests used to pass on branch_9x a few days ago. Some of the many SolrCLI changes touched the urlNormalization stuff so that you needed to add /solr at the end of some URLs passed to Solr.

In particular this was true for AssertTool, which I fix here by getting the SolrClient from SolrCLI util method instead of constructing an Http2HttpSolrClient manually. The difference is that the former adds a /solr before constructing the client, and this is consistent with other client initializations in AssertTool.

The whole Solr Base Path and /solr suffix is a mess, and it is different in main than in branch_9x. So this will keep cropping up for as long as we backport stuff from 10 to 9 :(

@janhoy janhoy marked this pull request as draft October 18, 2024 07:54
@janhoy janhoy requested a review from epugh October 18, 2024 08:28
@epugh
Copy link
Contributor

epugh commented Oct 18, 2024

@janhoy thanks for taking a stab at this... I definitly have been playing "whack a mole" of adding and removing /solr/'s in various places. I will go through this today.... I appreciate your digging in with fresh eyes on this problem.

@malliaridis
Copy link
Contributor

malliaridis commented Oct 18, 2024

The whole Solr Base Path and /solr suffix is a mess, and it is different in main than in branch_9x. So this will keep cropping up for as long as we backport stuff from 10 to 9 :(

I would love to have less classes touching the URL and base path. This was also one of the goals of the merge in #2756. Unifying the options that accept a URL should reduce the possible input variations, allowing us to simplify the logic internally.

About the remaining failing test PostToolTest#testBasicRun: currently we pass a URL with --solr-url that includes /solr, but this is probably wrong. A /solr path in the URL is only allowed in --solr-update-url, which was deprecated and replaced in #2756.

This is specific to our 9x branch.  Not sure if it needs to be on other Tools as well.
@epugh
Copy link
Contributor

epugh commented Oct 19, 2024

Okay, so now the tests all passed @janhoy.. I kind of hacked in something to work for PostTool, I actually thought it would be needed on more other tools...

hostContext = "/solr";
}

solrUrl = SolrCLI.normalizeSolrUrl(solrUrl, true, hostContext) + hostContext;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought of the same workaround, and guess I'm medium happy about it. Would of course prefer a SolrCLI util method that hides this such as SolrCLI.getAndNormalizeSolrUrl(cli), and consider using it wherever we do raw cli.getOptionValue("solr-url")`, since we'd potentially run into same issue elsewhere.

@janhoy janhoy marked this pull request as ready for review October 20, 2024 00:55
Copy link
Contributor

@malliaridis malliaridis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, although I would like to find a better way of handling URL composition. Using the v1 API should also be discouraged in the long run.

Note that I am unable to manually test on Windows, as the script is currently broken. gradlew check finishes without errors.

@janhoy janhoy merged commit 77e2ca8 into apache:branch_9x Oct 20, 2024
4 checks passed
@janhoy janhoy deleted the SOLR-17479-fix-backport-bug branch October 20, 2024 20:35
@janhoy
Copy link
Contributor Author

janhoy commented Oct 21, 2024

@epugh, @malliaridis Turns out this still fails half the time, when hostContext is set to /. I can reproduce locally, about half the runs.

See https://github.com/apache/solr/blob/branch_9x/solr/test-framework/src/java/org/apache/solr/BaseDistributedSearchTestCase.java#L137-L145

So we need some kind of extra check that we don't get two // in a row I suppose.

@malliaridis
Copy link
Contributor

malliaridis commented Oct 21, 2024

@janhoy For some reason I am not surprised. I guess a hostContext that contains only / may be valid from the user's perspective, but we cannot handle it without additional workarounds and the current logic.

I would prefer to use a URL builder which we can feed the path segments with and without leading and trailing slashes and it will take care of everything. Otherwise we would have to add just another workaround, like

if (hostContext.equals("/")) {
  hostContext = "";
}

right after the current workaround if (hostContext.isBlank()) {...}. Maybe it's time to reconsider the normalization logic?Concatenating URL paths should definitely not be done by simply appending the paths, as we cannot guarantee the presence / absence of slashes.

Using URL builders
I looked into the implementation of org.apache.http.client.utils.URIBuilder and jakarta.ws.rs.core.UriBuilder (both present right now) and only the jakarta implementation can take leading and trailing slashes into account, if used properly. This means that code like:

solrUpdateUrl = UriBuilder.fromUri(SolrCLI.normalizeSolrUrl(solrUrl, true, hostContext))
    .path(hostContext)
    .path(cli.getOptionValue("name")) // "testBasicRun"
    .path("/update")
    .path("/with/")
    .path("slashes/")
    .path("/again")
    .build();
// generates http://127.0.0.1:52371/solr/testBasicRun/update/with/slashes/again

would correctly generate the path. Maybe we could use a more robust solution like that?

Note that using UriBuilder#segment which looks like a more suitable function would escape the slashes:

solrUpdateUrl = UriBuilder.fromUri(SolrCLI.normalizeSolrUrl(solrUrl, true, hostContext) + hostContext)
   .segment(cli.getOptionValue("name"), "/update", "/with/", "slashes/", "/again")
   .build();
// generates http://127.0.0.1:52215/solr/testBasicRun/%2Fupdate/%2Fwith%2F/slashes%2F/%2Fagain

@epugh
Copy link
Contributor

epugh commented Oct 21, 2024

Of the two URI builders, I think we would want the jakarta since at some point Apache HTTP Client goes away.

What about just looking for // and removing one of them? In the 9x line, does that get us through the problem.

I remember when we looked at the hostContext a while ago and the reason we changed to just requiring /solr was because we thought it wasn't properly supported to use anything else anyway in Solr. Like, it might pass in the tests, but in the real world it would fail. What if we just "fix" the BaseDistributedSolrTestCase to not do anything but /solr.....

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants