-
-
Notifications
You must be signed in to change notification settings - Fork 19
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Invalid result after merge operation (related to Element#mergeText) #28
Comments
Please provide the original XML file you are trying to translate so I can reproduce the issue. There are no external dependencies for this project, so there is no need for Maven or a |
FWIW, the lines of code that you mention are correct. They do exactly what they are supposed to do. Please provide the source file for testing or close this issue. |
Hello @rmraya, Everything you need to reproduce this issue is provided above. Here are step by step instructions:
This will produce a To my understanding the produced result is not valid but if you think everything behaves as expected, feel free to close this issue |
@ajacob I need the original XML used to create the XLIFF. I want to perform a full roundtrip: generate XLIFF, translate and merge. |
@ajacob You mention that you made changes to the XLIFF, I want to see the version generated by OpenXLIFF |
<?xml version="1.0" encoding="UTF-8"?>
<ROOT>
<NIV1><G> Bonjour. </G> Ce text devrait être traduit </NIV1>
</ROOT>
and obtained this: <?xml version="1.0" encoding="UTF-8" ?>
<xliff srcLang="fr-FR" xmlns:mtc="urn:oasis:names:tc:xliff:matches:2.0" version="2.0" xmlns:mda="urn:oasis:names:tc:xliff:metadata:2.0" trgLang="en-US" xmlns="urn:oasis:names:tc:xliff:document:2.0">
<file original="/Users/rmraya/Desktop/original.xml" id="1">
<skeleton href="/Users/rmraya/Desktop/original.xml.skl"/>
<mda:metadata>
<mda:metaGroup category="format">
<mda:meta type="datatype">xml</mda:meta>
</mda:metaGroup>
<mda:metaGroup category="tool">
<mda:meta type="tool-id">OpenXLIFF</mda:meta>
<mda:meta type="tool-name">OpenXLIFF Filters</mda:meta>
<mda:meta type="tool-version">3.16.0 20231029_1141</mda:meta>
</mda:metaGroup>
<mda:metaGroup category="PI">
<mda:meta type="encoding">UTF-8</mda:meta>
</mda:metaGroup>
</mda:metadata>
<unit id="1">
<segment id="1-0">
<source xml:space="preserve"> Bonjour.</source>
</segment>
<ignorable id="1-1">
<source xml:space="preserve"> </source>
</ignorable>
</unit>
<unit id="2">
<segment id="2">
<source xml:space="preserve"> Ce text devrait être traduit </source>
</segment>
</unit>
</file>
</xliff>
<?xml version="1.0" encoding="UTF-8" ?>
<xliff srcLang="fr-FR" xmlns:mtc="urn:oasis:names:tc:xliff:matches:2.0" version="2.0" xmlns:mda="urn:oasis:names:tc:xliff:metadata:2.0" trgLang="en-US" xmlns="urn:oasis:names:tc:xliff:document:2.0">
<file original="/Users/rmraya/Desktop/original.xml" id="1">
<skeleton href="/Users/rmraya/Desktop/original.xml.skl"/>
<mda:metadata>
<mda:metaGroup category="format">
<mda:meta type="datatype">xml</mda:meta>
</mda:metaGroup>
<mda:metaGroup category="tool">
<mda:meta type="tool-id">OpenXLIFF</mda:meta>
<mda:meta type="tool-name">OpenXLIFF Filters</mda:meta>
<mda:meta type="tool-version">3.16.0 20231029_1141</mda:meta>
</mda:metaGroup>
<mda:metaGroup category="PI">
<mda:meta type="encoding">UTF-8</mda:meta>
</mda:metaGroup>
</mda:metadata>
<unit id="1">
<segment state="final" id="1-0">
<source xml:space="preserve"> Bonjour.</source>
<target xml:space="preserve"> Hello.</target>
</segment>
<ignorable id="1-1">
<source xml:space="preserve"> </source>
</ignorable>
</unit>
<unit id="2">
<segment state="final" id="2">
<source xml:space="preserve"> Ce text devrait être traduit </source>
<target xml:space="preserve"> This text should be translated </target>
</segment>
</unit>
</file>
</xliff>
and obtained this output file: <?xml version="1.0" encoding="UTF-8"?>
<ROOT>
<NIV1><G> Hello. </G> This text should be translated </NIV1>
</ROOT> I don't know what you did, but the code seems to work. |
I'll try to give you more information / STR, and hopefully help you reproduce this issue. I've created a very simple sample that involves another tool that add translations into the SETUPinput.xml <?xml version="1.0" encoding="UTF-8"?>
<ROOT>
<NIV1><G>Bonjour.</G> Ceci est un text multi-ligne
qui a pour objectif de mettre en évidence
un simple bug. J'espère que ça aidera.</NIV1>
</ROOT> config_ROOT.xml <?xml version="1.0" encoding="UTF-8" ?>
<!DOCTYPE ini-file PUBLIC "-//MAXPROGRAMS//Converters 2.0.0//EN" "configuration.dtd" >
<ini-file>
<tag hard-break="segment">NIV1</tag>
<tag ctype="x-bold" hard-break="inline">G</tag>
</ini-file> CONVERT
test.xlf <?xml version="1.0" encoding="UTF-8" ?>
<xliff srcLang="fr" xmlns:mtc="urn:oasis:names:tc:xliff:matches:2.0" version="2.1" xmlns:mda="urn:oasis:names:tc:xliff:metadata:2.0" trgLang="en-US" xmlns="urn:oasis:names:tc:xliff:document:2.0">
<file original="/input.xml" id="1">
<skeleton href="test.skl"/>
<mda:metadata>
<mda:metaGroup category="format">
<mda:meta type="datatype">xml</mda:meta>
</mda:metaGroup>
<mda:metaGroup category="tool">
<mda:meta type="tool-id">OpenXLIFF</mda:meta>
<mda:meta type="tool-name">OpenXLIFF Filters</mda:meta>
<mda:meta type="tool-version">3.16.0 20231029_1141</mda:meta>
</mda:metaGroup>
<mda:metaGroup category="PI">
<mda:meta type="encoding">UTF-8</mda:meta>
</mda:metaGroup>
</mda:metadata>
<unit id="1">
<mda:metadata id="1">
<mda:metaGroup category="attributes" id="ph0">
<mda:meta type="ctype">x-bold</mda:meta>
</mda:metaGroup>
</mda:metadata>
<originalData>
<data id="ph0"><G></data>
<data id="ph1"></G></data>
</originalData>
<ignorable>
<source xml:space="preserve"><ph id="ph0"/></source>
</ignorable>
<segment id="1-0">
<source xml:space="preserve">Bonjour.</source>
</segment>
<ignorable>
<source xml:space="preserve"><ph id="ph1"/></source>
</ignorable>
<segment id="1-1">
<source xml:space="preserve"> Ceci est un text multi-ligne qui a pour objectif de mettre en évidence un simple bug.</source>
</segment>
<segment id="1-2">
<source xml:space="preserve"> J'espère que ça aidera.</source>
</segment>
</unit>
</file>
</xliff> test.skl <?xml version="1.0" encoding="UTF-8"?>
<ROOT>
<NIV1>%%%1%%%
</NIV1>
</ROOT> TRANSLATIONThis is an important step as the xlf file is modified by another tool that:
test.xlf <?xml version="1.0" encoding="UTF-8" ?>
<xliff srcLang="fr" xmlns:mtc="urn:oasis:names:tc:xliff:matches:2.0" version="2.1" xmlns:mda="urn:oasis:names:tc:xliff:metadata:2.0" trgLang="en-US" xmlns="urn:oasis:names:tc:xliff:document:2.0">
<file original="/home/alexandre/hello/input.xml" id="1">
<skeleton href="test.skl"/>
<mda:metadata>
<mda:metaGroup category="format">
<mda:meta type="datatype">xml</mda:meta>
</mda:metaGroup>
<mda:metaGroup category="tool">
<mda:meta type="tool-id">OpenXLIFF</mda:meta>
<mda:meta type="tool-name">OpenXLIFF Filters</mda:meta>
<mda:meta type="tool-version">3.16.0 20231029_1141</mda:meta>
</mda:metaGroup>
<mda:metaGroup category="PI">
<mda:meta type="encoding">UTF-8</mda:meta>
</mda:metaGroup>
</mda:metadata>
<unit id="1">
<mda:metadata id="1">
<mda:metaGroup category="attributes" id="ph0">
<mda:meta type="ctype">x-bold</mda:meta>
</mda:metaGroup>
</mda:metadata>
<originalData>
<data id="ph0"><G></data>
<data id="ph1"></G></data>
</originalData>
<ignorable>
<source xml:space="preserve">
<ph id="ph0"/>
</source>
</ignorable>
<segment id="1-0">
<source xml:space="preserve">Bonjour.</source>
<target>Hello.</target>
</segment>
<ignorable>
<source xml:space="preserve">
<ph id="ph1"/>
</source>
</ignorable>
<segment id="1-1">
<source xml:space="preserve"> Ceci est un text multi-ligne qui a pour objectif de mettre en évidence un simple bug.</source>
<target>This is a multi-line text that is supposed to demo a simple bug.</target>
</segment>
<segment id="1-2">
<source xml:space="preserve"> J'espère que ça aidera.</source>
<target> I hope this helps.</target>
</segment>
</unit>
</file>
</xliff> MERGE
result.xml <?xml version="1.0" encoding="UTF-8"?>
<ROOT>
<NIV1>
<G>
Bonjour.
Hello.
</G>
Ceci est un text multi-ligne qui a pour objectif de mettre en évidence un simple bug. J'espère que ça aidera.This is a multi-line text that is supposed to demo a simple bug. I hope this helps.</NIV1>
</ROOT> And as you can see, we now have duplicated content. To help you better understand the changes in the xlf file, here is the diff (generated by another tool) --- test.xlf 2023-11-02 23:50:14.943475711 +0100
+++ test.xlf 2023-11-02 23:50:19.660208235 +0100
@@ -26,20 +26,27 @@
<data id="ph1"></G></data>
</originalData>
<ignorable>
- <source xml:space="preserve"><ph id="ph0"/></source>
+ <source xml:space="preserve">
+ <ph id="ph0"/>
+ </source>
</ignorable>
<segment id="1-0">
<source xml:space="preserve">Bonjour.</source>
+ <target>Hello.</target>
</segment>
<ignorable>
- <source xml:space="preserve"><ph id="ph1"/></source>
+ <source xml:space="preserve">
+ <ph id="ph1"/>
+ </source>
</ignorable>
<segment id="1-1">
<source xml:space="preserve"> Ceci est un text multi-ligne qui a pour objectif de mettre en évidence un simple bug.</source>
+ <target>This is a multi-line text that is supposed to demo a simple bug.</target>
</segment>
<segment id="1-2">
<source xml:space="preserve"> J'espère que ça aidera.</source>
+ <target> I hope this helps.</target>
</segment>
</unit>
</file> |
Your sample does not need a custom filter configuration. I used this command to convert
and got this XLIFF: <?xml version="1.0" encoding="UTF-8" ?>
<xliff srcLang="fr" xmlns:mtc="urn:oasis:names:tc:xliff:matches:2.0" version="2.0" xmlns:mda="urn:oasis:names:tc:xliff:metadata:2.0" trgLang="en" xmlns="urn:oasis:names:tc:xliff:document:2.0">
<file original="/Users/rmraya/Desktop/input.xml" id="1">
<skeleton href="/Users/rmraya/Desktop/input.xml.skl"/>
<mda:metadata>
<mda:metaGroup category="format">
<mda:meta type="datatype">xml</mda:meta>
</mda:metaGroup>
<mda:metaGroup category="tool">
<mda:meta type="tool-id">OpenXLIFF</mda:meta>
<mda:meta type="tool-name">OpenXLIFF Filters</mda:meta>
<mda:meta type="tool-version">3.16.0 20231029_1141</mda:meta>
</mda:metaGroup>
<mda:metaGroup category="PI">
<mda:meta type="encoding">UTF-8</mda:meta>
</mda:metaGroup>
</mda:metadata>
<unit id="1">
<segment id="1">
<source xml:space="preserve">Bonjour.</source>
</segment>
</unit>
<unit id="2">
<segment id="2-0">
<source xml:space="preserve"> Ceci est un text multi-ligne qui a pour objectif de mettre en évidence un simple bug.</source>
</segment>
<segment id="2-1">
<source xml:space="preserve"> J'espère que ça aidera.</source>
</segment>
</unit>
</file>
</xliff> I translated the XLIFF with Swordfish and got this translated version: <?xml version="1.0" encoding="UTF-8" ?>
<xliff srcLang="fr" xmlns:mtc="urn:oasis:names:tc:xliff:matches:2.0" version="2.0" xmlns:mda="urn:oasis:names:tc:xliff:metadata:2.0" trgLang="en" xmlns="urn:oasis:names:tc:xliff:document:2.0">
<file original="/Users/rmraya/Desktop/input.xml" id="1">
<skeleton href="/Users/rmraya/Desktop/input.xml.skl"/>
<mda:metadata>
<mda:metaGroup category="format">
<mda:meta type="datatype">xml</mda:meta>
</mda:metaGroup>
<mda:metaGroup category="tool">
<mda:meta type="tool-id">OpenXLIFF</mda:meta>
<mda:meta type="tool-name">OpenXLIFF Filters</mda:meta>
<mda:meta type="tool-version">3.16.0 20231029_1141</mda:meta>
</mda:metaGroup>
<mda:metaGroup category="PI">
<mda:meta type="encoding">UTF-8</mda:meta>
</mda:metaGroup>
</mda:metadata>
<unit id="1">
<segment state="final" id="1">
<source xml:space="preserve">Bonjour.</source>
<target xml:space="preserve">Hello.</target>
</segment>
</unit>
<unit id="2">
<segment state="final" id="2-0">
<source xml:space="preserve"> Ceci est un text multi-ligne qui a pour objectif de mettre en évidence un simple bug.</source>
<target xml:space="preserve"> This is a multi-line text that aims to highlight a simple bug.</target>
</segment>
<segment state="final" id="2-1">
<source xml:space="preserve"> J'espère que ça aidera.</source>
<target xml:space="preserve"> I hope it will help.</target>
</segment>
</unit>
</file>
</xliff> Then, I merged the file using this command:
and obtained this translated file: <?xml version="1.0" encoding="UTF-8"?>
<ROOT>
<NIV1><G>Hello.</G> This is a multi-line text that aims to highlight a simple bug. I hope it will help.</NIV1>
</ROOT> Note that the segments of the XLIFF that you merged are not confirmed. |
The sample doesn't, the 200KB+ XML file I was working on and on which I noticed the bug does When I created this issue, my objective was not to find a workaround or find a way to make the sample work, the purpose was to show you that the merge operation can fail with valid data (not produced by Swordfish). |
As far as I can see, the merge operation works as designed. I don't see a bug. The lines of code that you mentioned are doing exactly what is intended, recover text and tags from If you want me to fix anything, provide a real test case that I can use to reproduce the problem. |
Hello, thank you for sharing this tool.
I found a problem related to merge operation, I'll try to describe it in this issue
How to reproduce
test.skl
test.xlf
command
actual xml result
expected xml result
Investigation
First, I observed that this issue appears because I formatted the xliff file.
If I rollback some changes, the result is correct. Rolled back changes are:
→
(needed as well for
ph1
)Looking into the code, I think I understand the problem.
In https://github.com/rmraya/OpenXLIFF/blob/v3.15.0/src/com/maxprograms/xliff2/FromXliff2.java#L294-L326
We have:
At this point
joinedSource
andjoinedTarget
point to the same content (same java objects per references)Then l. 326 we have:
Which internally calls
joinedSource.getContent();
which itself callsElement#mergeText
.And this is precisely were we have an issue.
When harvesting for the source we call the
mergeText
function that mutates someXMLnode
s that are also referenced by the target. After harvesting for the source, the content of target becomes invalid.To confirm this hypothesis, I simply removed the call to
#mergeText
inElement#getContent
and it actually fixes the problem.Solution
I don't know what would be the best to fix this issue, I think there are different possibilities like:
#getContent
to mutate the state ofElement
)equals
method, which can be dangerous as wellElement
objets, in which case that would requires to copyXMLNode
sIf you want me to share a PR with a fix proposal, feel free to ask.
By the way, I think it could be very helpful to introduce a
pom.xml
in order to have maven dependencies (in which case it would be necessary to publishXMLJava
in maven central)The text was updated successfully, but these errors were encountered: