Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

StAXDecoder.getNamespaceCount returns wrong number of Namespace Declarations #18

Closed
buko opened this issue Jul 10, 2018 · 14 comments
Closed

Comments

@buko
Copy link

buko commented Jul 10, 2018

Have the following XML document which is generated using JAXB:

<?xml version="1.0" ?><ns2:create-resource-request-message xmlns:ns5="http://www.bubblegumproject.com/2018/polis" xmlns="" xmlns:ns3="http://www.bubblegumproject.com/2018/uia" xmlns:ns2="http://www.bubblegumproject.com/2018/fabric"><ns2:type>urn:fabric:type:eUol2jOc4XR67WEZ8jA2vg:0.0.0?=name=com.bubblegumproject.fabric:Type/Message/CreateResourceRequestMessage</ns2:type><ns2:coordinates>urn:fabric:co:wUv2GxsVDcFdqVu81XWMqw:0.0.0</ns2:coordinates><ns2:source>urn:fabric:co:JkaddGXCtF_66OBmbo94sg:0.0.0</ns2:source><ns2:timestamp>2018-07-10T09:59:09.594470900Z</ns2:timestamp><ns2:correlation-id>wUv2GxsVDcFdqVu81XWMqw</ns2:correlation-id><ns2:destination>urn:fabric:co:qEooMf4_59zR1RKnP7VvqQ:0.0.0</ns2:destination><ns2:data xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:type="ns5:blog-post-data"><ns2:type>urn:fabric:type:rE2_W6kHEdHDm_aogLb5nQ:0.0.0?=name=com.bubblegumproject.polis:/Type/Blog/BlogPostData</ns2:type><ns2:coordinates>urn:fabric:co:H0DiR9nVOTaqySRJlxPGhg:0.0.0</ns2:coordinates><ns2:title><ns3:default><ns3:value>Hello World!</ns3:value><ns3:locale>en-US</ns3:locale></ns3:default><ns3:alternatives><ns3:alt><ns3:value>Hello World!</ns3:value><ns3:locale>en-US</ns3:locale></ns3:alt></ns3:alternatives></ns2:title><ns2:created-time>2018-07-10T09:59:09.590473600Z</ns2:created-time><ns2:modified-time>2018-07-10T09:59:09.590473600Z</ns2:modified-time><ns2:segment xsi:type="ns2:segment"><ns2:name>/hello-world</ns2:name></ns2:segment><ns5:content xsi:type="ns2:text-content"><ns2:type>urn:fabric:type:2EewWCsXigVak99a0Z6srQ:0.0.0?=name=com.bubblegumproject.fabric:/Type/Content/Text/Plain</ns2:type><ns2:coordinates>urn:fabric:co:-kyXnYxJjuUbiqdH1IRFpg:0.0.0</ns2:coordinates><ns2:text>Hello World!</ns2:text></ns5:content><ns5:published-time>2018-07-10T09:59:09.589474300Z</ns5:published-time></ns2:data></ns2:create-resource-request-message>

When I try to decode this using StAXDecoder and JAXB something interesting happens:

  1. While processing the first 'type' element in the document, ns2:type (which appears just after the first element in the document), StaXDecoder.getNamespaceCount() returns 5 even though the ns2:type element declares no new namespaces. (They are all declared on the root element.)
  2. Code in JAXB Runtime (https://github.com/javaee/jaxb-v2/blob/master/jaxb-ri/runtime/impl/src/main/java/com/sun/xml/bind/v2/runtime/unmarshaller/StAXStreamConnector.java#L224) will use this incorrect value to then decrease an index into the namespace lookup stack (see: https://github.com/javaee/jaxb-v2/blob/master/jaxb-ri/runtime/impl/src/main/java/com/sun/xml/bind/v2/runtime/unmarshaller/UnmarshallingContext.java#L912)
  3. Allowing this code to run to completion eventually produces an ArrayIndexOutOfBoundsException from deep inside JAXB:

Caused by: java.lang.ArrayIndexOutOfBoundsException: -50 at com.sun.xml.bind.v2.runtime.unmarshaller.UnmarshallingContext.startPrefixMapping(UnmarshallingContext.java:908) at com.sun.xml.bind.v2.runtime.unmarshaller.InterningXmlVisitor.startPrefixMapping(InterningXmlVisitor.java:85) at com.sun.xml.bind.v2.runtime.unmarshaller.StAXStreamConnector.handleStartElement(StAXStreamConnector.java:236) at com.sun.xml.bind.v2.runtime.unmarshaller.StAXStreamConnector.bridge(StAXStreamConnector.java:180) at com.sun.xml.bind.v2.runtime.unmarshaller.UnmarshallerImpl.unmarshal0(UnmarshallerImpl.java:415) at com.sun.xml.bind.v2.runtime.unmarshaller.UnmarshallerImpl.unmarshal(UnmarshallerImpl.java:386)

This document marshals and unmarshals just fine when EXI isn't used.

There is reason to believe that the implementation of STaxDecoder is wrong here. The code does not match up with the javadoc. According to the Javadoc it is supposed to return the number of Namespace Declarations on this START_ELEMENT or END_ELEMENT. What the code actually appears to be doing is simply returning the number of declared namespace prefixes that apply to this element. No attempt is made to check whether these prefixes were actually declared on this START_ELEMENT.

danielpeintner added a commit that referenced this issue Jul 10, 2018
@danielpeintner
Copy link
Member

I created a very simple initial test to reproduce your StAX issue (see https://github.com/EXIficient/exificient/blob/master/src/test/java/com/siemens/ct/exi/main/api/stream/StAXCoderTestCase.java#L243-L319) where I encode your XML snippet and try to decode it with StAX decoder..

When running it I don't see those 5 NS declarations. Instead I see the following which looks fine to me.

>> ns3 : {http://www.bubblegumproject.com/2018/fabric}create-resource-request-message
	NS: xsi : http://www.w3.org/2001/XMLSchema-instance
	NS: ns3 : http://www.bubblegumproject.com/2018/fabric
>> ns3 : {http://www.bubblegumproject.com/2018/fabric}type
	CH: urn:fabric:type:eUol2jOc4XR67WEZ8jA2vg:0.0.0?=name=com.bubblegumproject.fabric:Type/Message/CreateResourceRequestMessage
<< ns3 : {http://www.bubblegumproject.com/2018/fabric}type
>> ns3 : {http://www.bubblegumproject.com/2018/fabric}coordinates
...

Maybe it is an issue of the setup. Can you try to provide me a test-case that I can run... maybe just StAX (without JAXB) to reproduce the issue.. because JAXB requires schema et cetera.

May I ask you whether you use special EXI (fidelity) options?

Thanks,

-- Daniel

@buko
Copy link
Author

buko commented Jul 10, 2018

Hey Daniel,

Thanks for investigating.

I don't think your test quite captures the problem.

getNamespaceCount() is supposed to return the number of namespaces that were declared on a specific END_ELEMENT. Instead, StaxDecoder is returning the number of Namespaces that are in effect for that element.

The reader should only return five if the type element actually declared 5 namespaces eg:

<ns2:type xmlns:ns1="1" xmlns:ns2="2" xmlns:ns3="3">...</ns2:type>

Here getNamespaceCount() should return 3. In the test, for my snippet, getNamespaceCount() should return 0.

This corresponds with the Javadoc:

`
/**

  • Returns the count of namespaces declared on this START_ELEMENT or END_ELEMENT,
  • this method is only valid on a START_ELEMENT, END_ELEMENT or NAMESPACE. On
  • an END_ELEMENT the count is of the namespaces that are about to go
  • out of scope. This is the equivalent of the information reported
  • by SAX callback for an end element event.
  • @return returns the number of namespace declarations on this specific element
  • @throws IllegalStateException if this is not a START_ELEMENT, END_ELEMENT or NAMESPACE
    */
    public int getNamespaceCount();
    `

The problem is that this wrong count throws off the JAXB unmarshalling machinery which begins decrementing a counter thinking that since 5 namespaces were declared on the element, when that element is finished it can forget the last 5 namespaces it saw.

I've tried to provide the javadoc, the expected behavior, and a detailed explanation about how this blows up JAXB. An example could be provided that actually uses JAXB I think. It wouldn't be a bad idea to have some code that uses JAXB (JAXB doesn't require a schema, you can just annotate some javabeans) and use that for testing.

@buko
Copy link
Author

buko commented Jul 10, 2018

Note that specifically in your test on the 'END_ELEMENT' (and also the START_ELEMENT, but here END_ELEMENT is the problem) ... getNamespaceCount() should return the number of namespace prefixes declared on the 'ns2:type' element which is here zero. (All the namespace prefixes are declared on the root.)

@danielpeintner
Copy link
Member

danielpeintner commented Jul 10, 2018

I looked into the issue. The internal EXIficient API does not fully match with the StAX API. Hence the declared prefixes need to be retrieved before the end element and cached for a later call.

I updated the code (snapshot). Please let me know whether it resolves your issue.

(All the namespace prefixes are declared on the root.)

Note: One more issue/information. EXI decoders usually do not create namespaces when you expect them to happen because of XML. EXI creates namespace declaration when needed (unless one sets EXI to preserve prefixes as the were in the original XML)...

danielpeintner added a commit that referenced this issue Jul 11, 2018
* rename getNamespaceDeclaration() to getNamespaceDeclarations()
* return empty immutable list in case of null
@danielpeintner
Copy link
Member

@buko let me bring this back to the actual issue. May I ask you to confirm the following statements

<message xmlns:a="...">
  <type xmlns:b="...">
   </type>
</message>
  • getNamespaceContext() returns 1 result for message and 2 results for type
  • getNamespaceCount() returns 1 results for message and type on both StartElement and EndElement

@buko
Copy link
Author

buko commented Jul 11, 2018

Yes, I believe that is basically correct.

More specifically, based on the Javadoc I think the following behavior is desired:

message START_ELEMENT event:
 NamespaceContext = { a=a-uri }
 NamespaceCount = 1

type START_ELEMENT event:
 NamespaceContext = { a=a-uri,  b=b-uri },
 NamespaceCount = 1

type END_ELEMENT event: 
  NamespaceContext = { a=a-uri, b=b-uri },
  NamespaceCount = 1 

 message END_ELEMENT event:
  NamespaceContext = { a=a-uri },
  NamespaceCount = 1

Note that the Javadoc for getNamespaceCount says for END_ELEMENT events it is ' the count is of the namespaces that are about to go out of scope'. This implies that on the END_ELEMENT events those namespaces that are about to go out of scope are still in scope. My reading suggests that items should not be removed from the NamespaceContext until after the END_ELEMENT event if the user calls 'next()' again. (The javadoc for getNamespaceContext makes it clear NamespaceContext should only be updated on calls to next().)

In practice, the most important thing is that getNamespaceCount() reflect the number of new namespaces declared on START_ELEMENT events and the number going out of scope on END_ELEMENT events. There seems to be a lot of code in JAXB and in some validators which uses the getNamespaceCount() number to drive an internal stack data structure. A certain number of items are added on START_ELEMENT events and the same number are eventually removed on END_ELEMENT events. If this count is off we end up with the dreaded ArrayIndexOutOfBoundsException I described above.

@buko
Copy link
Author

buko commented Jul 11, 2018

It's also possible @jclark (https://github.com/jclark) might weigh in? He seems to be the one responsible of these interfaces ;)

@danielpeintner
Copy link
Member

I updated the code so that NamespaceContext reports all currently available namespaces (including the ones up in the hierarchy) while namespaceCount only reports the current namespaces for this element.

@danielpeintner
Copy link
Member

Based on c05dde0#commitcomment-29674026 I think we can close the issue, correct?

@buko
Copy link
Author

buko commented Jul 12, 2018

Would it be possible for you to release a SNAPSHOT to Maven Central?

If you release a snapshot I can take it, flip a switch and run my tests again and confirm the ArrayIndexOutOfBoundsException is gone.

It does look like the code is doing the right thing and your tests represents exactly the desired behavior.

@danielpeintner
Copy link
Member

We push updates that pass all tests by default to Sonatype (see https://oss.sonatype.org/content/repositories/snapshots/com/siemens/ct/exi/exificient/)

Hence, 1.0.1-SNAPSHOT should be available.

I think you need to add the snapshot repo to your POM so that it can be found.. something like this..

     <repositories>
       ...
       <repository>
         <id>snapshots-repo</id>
         <url>https://oss.sonatype.org/content/repositories/snapshots</url>
         <releases><enabled>false</enabled></releases>
         <snapshots><enabled>true</enabled></snapshots>
       </repository>
     </repositories>

Please let me know whether the previous errors are gone...

@buko
Copy link
Author

buko commented Jul 13, 2018

Good stuff, @danielpeintner.

Upgraded parent pom to exificient 1.0.1-SNAPSHOT. Changed all wire formats to use XmlBinaryEncoder/Decoder. Looks good so far: all tests pass and the tests are good bit faster!

We've got a snapshot dependency now... Any idea when 1.0.1 might be released?

@danielpeintner
Copy link
Member

We usually have no fixed schedule for releases.

Lets wait for end of next week. If there are no further issues popping up I think 1.0.1 can be released...

@danielpeintner
Copy link
Member

exificient 1.0.1 has been released and published to Maven today (23-Jul-2018)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants