-
Notifications
You must be signed in to change notification settings - Fork 0
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
OPA: improve error messages returned to the end user #796
Conversation
@@ -10,7 +10,7 @@ import ( | |||
"testing" | |||
) | |||
|
|||
func TestAccessor(t *testing.T) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
seems as copy pasta...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah yes. We contributing upstream, we can even split this PR in 2. The first one with this fix and the equivalent you did below. At least such PR should be accepted without problems.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My plan:
- merge it to our fork as a single PR to our master (and release branch)... I will now read our instructions on release procedure
- after we release and deploy I plan to prepare the PRs to upstream
- then cherry pick specific commits and open 2 PRs upstream if your think its necessary to split this changes
@marco-m-pix4d fishing for feedback |
Two things I forgot to mention:
In any case, its easy to do the same change here so, opinions?
The question is: is this error reported back to the use if e.g. is using fly? I will test this part now... I already know for example that For other commands, like fly set-pipeline I need to ensure that this msg reaches the end user in full... I don't remember seeing anything else than |
9f2f114
to
71b7bf7
Compare
@@ -205,7 +205,7 @@ var _ = Describe("OPA Policy Checker", func() { | |||
It("should return error", func() { | |||
result, err := agent.Check(policy.PolicyCheckInput{}) | |||
Expect(err).To(HaveOccurred()) | |||
Expect(err.Error()).To(ContainSubstring("invalid character 'h' looking for beginning of value")) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is so ironic I thought you invented it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Now I know it's even worse... The code from current master would craft this error msg:
parsing opa results: invalid character 'h' looking for beginning of value
and this is not the error msg end-user was getting... but instead end user saw only:
invalid character 'p' looking for beginning of value
because another JSON parser got the body starting with parsing...
:)
Unbelievable...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
because another JSON parser got the body starting with parsing...
Ah. Can we fix also this properly ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We did... 4th commit...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good. It was easier than I though, better like this.
@marco-m-pix4d please read also: #796 (comment) |
Ah OK. The root cause is that the Go json library sometimes doesn't prefix its error messages with "json", and sometimes it does, so we even risk stuttering ("parsing JSON: json: xxx") :-( So we can do the following: Grep for all "invalid character 'h' looking for beginning of value`" in the tests for this area of the code and see if we can prefix the error with "parsing JSON" as we are doing here. |
So I managed to solve the issue with both (maybe all commands)... As I expected the difference of behaviour between commands arrived from usage of different clients:
currently if the response is forbidden by the e.g. OPA just the status code is returned without any reason.
what we actually need is:
to print also the reason from the response body. Note that this simple fix actually affect many many commands. I would even say majority of the commands suffered from the same issue. Note also that these commands using connection client don't have a problem when there is an BadRequest error:
because they would return the proper error msg to the end user.
Even though we fixed the server side already, I was still saying: The cause of the error arrives from:
So my explanation is this: originally you enter the
The problem is that now with OPA (but maybe also due to other cases) one can not expect that body is in JSON format, and especially you should not expect that it can be converted to atc.SaveConfigResponse type. so when server was returning a body with an error msg like:
we were just returning the error msg from the JSON parser, which is this time:
even though OPA policy checker returned:
we have:
seen by the end user... My first attempt to fix this is:
This prints what we want to the end user, but we lost the actual error msg from the json.Unmarshal just above... @marco-m-pix4d FYI |
fe42ea1
to
8cb0c3c
Compare
Summary of changes:
What user saw before:
What it sees now?
What user saw before:
What it sees now?
|
@@ -30,7 +30,7 @@ func ParseOpaResult(bytesResult []byte, opaConfig OpaConfig) (opaResult, error) | |||
var results vars.StaticVariables | |||
err := json.Unmarshal(bytesResult, &results) | |||
if err != nil { | |||
return opaResult{}, err | |||
return opaResult{}, fmt.Errorf("parsing JSON: %w", err) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Personally, I am still not 100% happy with the error msg user sees:
policy check error: parsing OPA results: parsing JSON: invalid character 'h' looking for beginning of value
for sure its better than before:
invalid character 'p' looking for beginning of value
because I am somehow lacking the information that this is because the response body from the OPA server is not a valid JSON.. or in other words that the problem is because OPA server being misconfigured and sending something unexpected...
Can we append something like: Hint: targeted OPA server seems to be misconfigured
Just as an example, I will give the error msg from the fly when Concourse is not reachable...
could not reach the Concourse server called developers:
Get "https://concourse-web/api/v1/info": dial tcp XX.XX.XX.XX:443: i/o timeout
is the targeted Concourse running? better go catch it lol
e.g. this seems quite clear. Can we do something like that?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you think about it, from the point of view of the end user (the dev setting the pipeline), nothing changes:
policy check error: parsing OPA results: parsing JSON: invalid character 'h' looking for beginning of value
policy check error: parsing OPA results: parsing JSON: invalid character 'h' looking for beginning of value. Hint: targeted OPA server seems to be misconfigured
In both cases, the user will ask for help on the channel: he cannot do anything about it and he is still confused.
I really resist this "Hint" because it is adding a patch to something that is broken already.
If we really want to give a better message to the user, I think we can, because we control the system (Concourse). This is different from cogito: the system is GitHub that we do not control.
So we should be able to rework some more logic how Concourse interacts with the OPA server and detect the real problem. Once we do that, then we could report something like
policy check error: OPA server misconfigured
Now, I don't even understand what we mean by "misconfigured".:
- Do we mean that the OPA server configuration is wrong? (so: independently from Concourse).
- Do we mean that the Concourse settings about the OPA server is wrong?
To summarize, I think that this PR already made the situation way better and we should merge and deploy.
I am OK if we want to create a ticket, with some of the discussion here, to go more at the root of the problem (assuming I am not misunderstanding something).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Said in another way: this PR is too big to follow in details, at least for me. All the discussions here are useful, but we can still do the secondary improvements in a separate ticket, I think.
The PR change quite a bit since the approval..
8cb0c3c
to
8255e89
Compare
} | ||
defer resp.Body.Close() | ||
|
||
statusCode := resp.StatusCode | ||
if statusCode != http.StatusOK { | ||
return nil, fmt.Errorf("opa returned status: %d", statusCode) | ||
return nil, fmt.Errorf("OPA server returned status: %d", statusCode) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do we want to do something with the response body here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It might be useful e.g. when you configure wrong value for:
CONCOURSE_OPA_URL: http://opa:8181/v1/data/concourse/decision
and opa is there but the path is wrong or something?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fine if we do as in other places and add also the body to the error.
- the existing names seem to be a copy paste mistake Signed-off-by: aliculPix4D <[email protected]>
Signed-off-by: aliculPix4D <[email protected]>
Majority of fly commands still use the deprecated "connection" client and when and HTTP request is forbidden by the ATC server e.g. when OPA action is not allowed (or any other internal Concourse reason) the error message is not printed to the end user, leaving them in questioning why. This is especially important when OPA block the action in question, because in that case its imperative the end-user see the reason why the action was blocked by the OPA. Signed-off-by: aliculPix4D <[email protected]>
…o misconfigured OPA server) the response body is not a parsablble JSON, and hence instead of returning the JSON parser error, we need to return an actual response body containing the reason for the error. e.g. in case of OPA error the ATC server response body will never be a parsable JSON and instead its string (byte) starting with: > policy check error: reason This will always be unparsable by the JSON parser used in the HTTP client and instead of returning JSON parser error from the client itself (`invalid character 'p' looking for beginning of value`) we want to return the respons body error msg returned by the ATC server. This is especially important e.g. when a OPA server is misconfigured. Instead of: > error: invalid character 'p' looking for beginning of value end user will see: > error: Unexpected Response > Status: 400 Bad Request > Body: policy check error: reason Signed-off-by: aliculPix4D <[email protected]>
8255e89
to
65b1850
Compare
@Pix4D/integration ready for review... I split the work into 4 commits. The two very important ones (third and fourth) have longer commit msg explaining the reasoning why they are required (shorter version of my PR comments) |
@@ -353,7 +353,7 @@ var _ = Describe("ATC Connection", func() { | |||
}) | |||
}) | |||
|
|||
Describe("403 response", func() { | |||
Describe("HTTPAgent: 403 response", func() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
they had a test for 403 response for what they call HTTPagent
client, but not for what they call Connection
client
@Pix4D/integration one more note... I think reading carefully: before reviewing will help you a lot to understand what is going on... |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I left my detailed reasoning in my latest inline comment.
I think we can proceed (unless I am missing something).
Thanks!
@@ -96,7 +96,11 @@ func (team *team) CreateOrUpdatePipelineConfig(pipelineRef atc.PipelineRef, conf | |||
var validationErr atc.SaveConfigResponse | |||
err = json.Unmarshal(body, &validationErr) | |||
if err != nil { | |||
return false, false, []ConfigWarning{}, err | |||
return false, false, []ConfigWarning{}, internal.UnexpectedResponseError{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@marco-m-pix4d I am curious to hear/read your opinion on this part...
if you follow closely:
- before we were returning the
err
which is a useless error from the JSON parse on L97... - now we return
internal.UnexpectedResponseError
which I craft using response body and status code, but I never actually use the uselesserr
anywhere... I think it is not needed to append thiserr.Error()
because this error msg would always look the sameinvalid character
.
Personally, I was uncomfortable just ignoring the err
like that... We usually don't have this practice... but I don't think it would bring any value appending it.
What do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see the reasoning. As usual, the answer is that it depends.
The problem is that json.Unmarshal can, at least in theory, return any type of error, not only parse errors.
If the json package had an error sentinel, say ErrParse, then one could say:
I check for json.ErrParse. If it matches, then I do not pass it to internal.UnexpectedResponseError{}.
If on the other hand it doesn't match, then it can be helpful to know what has happened, so in this case I pass it to internal.UnexpectedResponseError{}.
Also if json.ErrParse existed, doing the reasoning just above would still be risky, because it is possible that a new version of the json
package added another error sentinel, and so on...
So my approach is to almost always include errors. If I know like in this case that it can be confusing, I add it in parenthesis "(%s)" or similar...
As discussed in video chat, I am closing this. I will split it in smaller separate PRs as suggested. |
Draft PR just to fish for some initial feedback and to show what changes are required here and in our ci-for-concourse repo..
Related to: https://github.com/Pix4D/ci-for-concourse/pull/18