Content-Type confusion for zipped gml files in ATOM dataset feed #1121

Open
dhdeangelis opened this issue Oct 10, 2024 · 13 comments
@dhdeangelis

dhdeangelis commented Oct 10, 2024

Using validator versions 2024.2 and 2024.3 in a local Docker instance.

Testing the following ATOM service:
https://stationsregister.miljodatasamverkan.se/docs/atom/atomTopFeed.xml

We get the following problem:
Expected 'application/zip' as Content-Type header but server returned 'application/x-zip-compressed'

This is new. We have served zip-compressed GML from this ATOM service for a long time without validation issues. We serve packages named *.gml.zip, and the dataset feeds declare them with type="application/zip". This has worked well before.

Testing these files locally retrieves application/zip:

file --mime-type example.gml.zip
example.gml.zip: application/zip

We have tested replacing type="application/zip" with type="application/x-gmz" because, as stated in the TG, only types described in the registry should be used, but this has not worked either.

Other ATOM services built in the same way but serving packages called file.zip (instead of file.gml.zip) pass validation with no issues. See for example:
https://geodata.havochvatten.se/download/fiskets-geografier/atom-topfeed.xml

It is not obvious that this is a problem with the validator, but it is very confusing anyway.

Could this be an issue with how browsers or operating systems read and manage MIME types?

As an example, a test using https://mimetype.io/ with the served files retrieves application/zip as content type, but recommends using application/zip-compressed instead.

What should be the right way to proceed here?
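Editor's sketch: `file --mime-type` inspects the bytes of a local file, whereas the validator message refers to the Content-Type header sent by the web server, which comes from server configuration. The two can differ. A minimal way to see what the server actually sends, assuming plain Python with only the standard library (the function names `media_type` and `served_media_type` are hypothetical helpers, not part of the validator):

```python
from urllib.request import Request, urlopen


def media_type(content_type_header: str) -> str:
    """Return the bare media type, lowercased, without parameters such as charset."""
    return content_type_header.split(";", 1)[0].strip().lower()


def served_media_type(url: str, timeout: float = 10.0) -> str:
    """Send a HEAD request and return the media type declared by the server.

    Assumption: the server answers HEAD requests with the same Content-Type
    it would send for GET. Some servers do not, in which case a GET is needed.
    """
    request = Request(url, method="HEAD")
    with urlopen(request, timeout=timeout) as response:
        return media_type(response.headers.get("Content-Type", ""))
```

Calling `served_media_type(...)` with the URL of one of the `*.gml.zip` packages would show the header value the validator compares against the `type` attribute in the feed, independently of what `file` reports locally.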

@fabiovinci
Collaborator

Dear @dhdeangelis,

I don't know if you introduced further changes since the service is valid now: https://inspire.ec.europa.eu/validator/test-run/details.html?id=EID3fc03268-aab3-4b97-8b8e-bc854f2a8841

@fabiovinci fabiovinci added the question Further information is requested label Oct 14, 2024
@dhdeangelis
Author

dhdeangelis commented Oct 16, 2024

Dear @fabiovinci thank you for testing. I see that opensearch validates OK on the production instance, yet it gives an error on "media-types" despite using a media type listed in the Inspire Registry.

https://inspire.ec.europa.eu/validator/test-run/index.html?id=EIDe51a85a6-69a7-43ff-ae0e-5555a50049dc

We will try other options, including going back to application/zip

@dhdeangelis
Author

@fabiovinci I have now tested quite extensively and find it strange that I am getting this sort of error even with services that were testing OK just a few weeks ago:

https://inspire.ec.europa.eu/media-types/media-types.es.atom

image

I am starting to suspect that there may be a bug somewhere, not necessarily in the validator itself but perhaps in the chain of connections to/from the Inspire Registry. Could that be the case?

@dhdeangelis
Author

I see it may be related to this: #1123

@fabiovinci
Collaborator

Dear @dhdeangelis,

I confirm there is a problem with the registry.
Your service was 100% valid two days ago ;-)
Test run on 15_25 - 14.10.2024 with test suite Conformance Class Pre-defined Atom - IOS 1121.html.zip

@fabiovinci
Collaborator

Dear @dhdeangelis,

the Registry issue has been solved.

@dhdeangelis
Author

Dear @fabiovinci thank you for solving the issue with the Registry.

The original issue with Content-Type still occurs when running validator 2024.3 in Docker on a Linux machine. However, it does not occur when using the staging instance (the production instance seems to be unavailable at the moment).

When testing the service from the staging instance, it passes (green).

When running 2024.3 from a local Docker instance (Linux) we get the following error message in "Iterate over Get Spatial Dataset URL":
Expected 'application/zip' as Content-Type header but server returned 'application/x-zip-compressed' for url

All our dataset links in dataset feeds look like this example:
<link href="https://stationsregister.miljodatasamverkan.se/docs/atom/SE_EF_StnReg/SE_EF_StnReg.gml.zip" hreflang="sv" rel="alternate" title="Datamängd i komprimerad GML format" length="5671534" type="application/zip"/>

How is it possible that "application/zip" is accepted by the staging instance but not by the local docker instance?

@dhdeangelis
Author

Dear @fabiovinci I wonder if there has been any chance to look at this.

I have now tested extensively and I am still baffled by the following error:

On one hand, this service:
https://stationsregister.miljodatasamverkan.se/docs/atom/atomTopFeed.xml

fails validation with the following error:
Expected 'application/zip' as Content-Type header but server returned 'application/x-zip-compressed'

The downloadable elements in all its dataset feeds are formatted like this:
<link href="https://stationsregister.miljodatasamverkan.se/docs/atom/SE_EF_StnReg_DV_Badvatten/SE_EF_StnReg_DV_Badvatten.gml.zip" hreflang="sv" rel="alternate" title="Datamängd i komprimerad GML format" length="25601" type="application/zip"/>

On the other hand, this other service:
https://geodata.havochvatten.se/download/limniska-vattentypsregioner/atom-topfeed.xml

passes validation OK.

The downloadable elements in all its dataset feeds look like this:
<link href="https://geodata.havochvatten.se/download/limniska-vattentypsregioner/2023-10-03/limniska-vattentypsregioner-2023-10-03.zip" rel="alternate" title="Datamängd i GML format" type="application/zip" hreflang="sv" length="22899335"/>

As you can see, both services are written in exactly the same way, using "application/zip" for all zipped GML files. Yet one produces a Content-Type error and the other does not. What could be happening here?

Tested on 2024.3 on a local docker instance in Linux.
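Editor's sketch: the comparison described above has two sides, the `type` attribute declared on each download link in the dataset feed and the header the server returns for that link. The declared side can be listed with the standard library, assuming the feed uses the standard Atom namespace (`declared_download_types` and `SAMPLE_FEED` are hypothetical names; the sample wraps one link quoted in this thread in a minimal feed):

```python
import xml.etree.ElementTree as ET

ATOM_NS = "{http://www.w3.org/2005/Atom}"


def declared_download_types(feed_xml: str) -> list[tuple[str, str]]:
    """Return (href, type) for every rel="alternate" Atom link in the feed."""
    root = ET.fromstring(feed_xml)
    pairs = []
    for link in root.iter(f"{ATOM_NS}link"):
        if link.get("rel") == "alternate":
            pairs.append((link.get("href", ""), link.get("type", "")))
    return pairs


# Minimal illustrative feed; a real dataset feed contains many more elements.
SAMPLE_FEED = """<feed xmlns="http://www.w3.org/2005/Atom">
  <entry>
    <link href="https://geodata.havochvatten.se/download/limniska-vattentypsregioner/2023-10-03/limniska-vattentypsregioner-2023-10-03.zip" rel="alternate" title="Datamängd i GML format" type="application/zip" hreflang="sv" length="22899335"/>
  </entry>
</feed>"""
```

Since both feeds declare the same `type`, any difference in the validation result has to come from the other side of the comparison, that is, from what each server sends for the linked file.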

@fabiovinci
Collaborator

Dear @dhdeangelis,

I tested both of your services with the production instance, and both services passed the validation:

  1. https://inspire.ec.europa.eu/validator/test-run/index.html?id=EIDaf4ed973-05e0-43d1-ba10-c3dd108ae7c1
  2. https://inspire.ec.europa.eu/validator/test-run/index.html?id=EIDc04487ba-db46-4353-be32-f11d7a665632

We have encountered challenges in the past when determining the Content-Type, as the results vary over time.
In any case, testing the various links with Postman yields different results.

The link containing the .gml string is responding with the application/x-zip-compressed Content-Type. This differs from the one declared in the XML.

image

The link not containing the .gml string is responding with the application/zip Content-Type. In this case, it corresponds with the one declared in the XML.
image

Currently, the only suggestion is to remove the .gml string from the links.

In the meantime, we will continue to investigate the Content-Type response.

@dhdeangelis
Author

Thank you @fabiovinci, that was a good catch!

@dhdeangelis
Author

@fabiovinci we have now renamed all our files to *.zip, but unfortunately this did not solve the issue. They are still seen as "application/x-zip-compressed" instead of "application/zip". I have no explanation for this, and I cannot find much information either. There appears to be no difference, or only an extremely subtle one, between "application/x-zip-compressed" and "application/zip". It is a shame that validation fails because of that. So, although this is most probably not a problem with the validator, I do believe that the validator could perhaps be made less strict on this small point. Thanks for all the help.
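Editor's sketch: a "less picky" comparison could treat known aliases of a media type as equivalent. The following is only an illustration of that idea, not how the validator works; the alias table `ZIP_ALIASES` and both function names are assumptions made for this example.

```python
# Hypothetical alias table (an assumption for illustration, not validator behaviour):
# maps alternative spellings to the media type declared in the feed.
ZIP_ALIASES = {
    "application/x-zip-compressed": "application/zip",
}


def canonical(media_type: str) -> str:
    """Lowercase, drop parameters, and map known aliases to one canonical type."""
    bare = media_type.split(";", 1)[0].strip().lower()
    return ZIP_ALIASES.get(bare, bare)


def content_type_matches(declared: str, served: str) -> bool:
    """Compare the declared and served media types after alias normalisation."""
    return canonical(declared) == canonical(served)
```

With such a table, a feed declaring `application/zip` and a server answering `application/x-zip-compressed` would compare as equal, while genuinely different types would still be reported.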

@dhdeangelis
Author

Short update on this: we found that one of our servers (Windows IIS) assigns the content type "application/x-zip-compressed" to ZIP files. This has now been changed. Our service will be retested as soon as the changes enter production. This may be of interest to anyone using this kind of server.

@fabiovinci
Collaborator

Dear @dhdeangelis,

that's a great discovery!
