Trying to convert OLE objects to Latex. #15

Closed
cannshui opened this issue Sep 9, 2019 · 34 comments

Comments

@cannshui

cannshui commented Sep 9, 2019

Env: CentOS 7.4 with root login, latest calabash-frontend

I'm trying to convert only the OLE objects to LaTeX, not the whole Word/docx file. I think the process could be: OLE objects -> MathML -> LaTeX. It seems that transpect/mathtype-extension and transpect/mml2tex are necessary.

After cloning calabash-frontend recursively into calabash (git clone https://github.com/transpect/calabash-frontend.git calabash --recursive), I set MATHTYPE_CP.

Following the example in transpect/mathtype-extension, I ran this command in the calabash directory:

# cd /path/to/calabash
# java -cp $MATHTYPE_CP com.xmlcalabash.drivers.Main -c extensions/transpect/transpect-config.xml extensions/transpect/mathtype-extension/xpl/mathtype-example.xpl file=file:///my/oleObject1.bin

I got this error output:

ERROR: http://transpect.github.io/../index.html:1:107:Not a pipeline or library: html
ERROR: err:XS0044:Unexpected step name: tr:store-debug
ERROR: It is a static error if any element in the XProc namespace or any step has element children other than those specified for it by this specification. In particular, the presence of atomic steps for which there is no visible declaration may raise this error.

I found a similar problem, "Catalog related problem", but I couldn't figure it out.

Is my idea of converting only the OLE objects to LaTeX even feasible?

Thank you.

@mkraetke
Member

mkraetke commented Sep 9, 2019

The error is raised because an import URI cannot be resolved: the module xproc-util is missing and there is no appropriate XML catalog entry for it. Transpect modules are identified by canonical URIs, and these URIs have to be declared in XML catalogs.

mkdir test
cd test
git clone https://github.com/transpect/calabash-frontend --recursive
git clone https://github.com/transpect/xproc-util
mkdir xmlcatalog
touch xmlcatalog/catalog.xml

Then edit xmlcatalog/catalog.xml and add the catalog entry for xproc-util:

<?xml version="1.0" encoding="UTF-8"?>
<catalog xmlns="urn:oasis:names:tc:entity:xmlns:xml:catalog">
  <nextCatalog catalog="../xproc-util/xmlcatalog/catalog.xml"/>
</catalog>

When you invoke XML Calabash with Java, don't forget to pass the XML catalog as an option with -Dxml.catalog.files=xmlcatalog/catalog.xml. In calabash-frontend/calabash.sh you can see how to invoke Calabash with the XML resolver and XML catalogs.
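
A catalog that is misnamed or contains a broken entry leaves the canonical URIs unresolved without any catalog-specific error message, so it can help to sanity-check the file before invoking Calabash. The following is a minimal Python sketch and not part of transpect; the function name check_catalog is hypothetical. It assumes that nextCatalog targets are relative file paths and that the catalog does not use xml:base.

```python
import xml.etree.ElementTree as ET
from pathlib import Path

# Namespace of OASIS XML catalogs, as used in the catalog.xml above.
CATALOG_NS = "urn:oasis:names:tc:entity:xmlns:xml:catalog"


def check_catalog(catalog_path):
    """Return (target, exists) pairs for each nextCatalog entry of a catalog.

    Assumption: targets are relative file paths, resolved against the
    directory that contains the catalog file.
    """
    catalog_path = Path(catalog_path)
    # ET.parse raises ParseError if the file is not well-formed XML and
    # FileNotFoundError if the catalog itself is missing or misnamed.
    root = ET.parse(catalog_path).getroot()
    results = []
    for entry in root.iter(f"{{{CATALOG_NS}}}nextCatalog"):
        target = entry.get("catalog", "")
        resolved = (catalog_path.parent / target).resolve()
        results.append((target, resolved.is_file()))
    return results
```

Calling check_catalog("xmlcatalog/catalog.xml") from the test directory should list the xproc-util catalog as existing. An empty list means that no nextCatalog element was found at all, for instance because an opening angle bracket is missing and the entry was parsed as plain text.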

@mkraetke
Member

mkraetke commented Sep 9, 2019

Please see also the mml2tex docs here: https://github.com/transpect/mml2tex

@gimsieke
Contributor

gimsieke commented Sep 9, 2019

If you download a docx2tex release, you’ll have the correct catalog settings already in place – when using calabash.sh or calabash.bat; otherwise you need to set -Dxml.catalog.files to the xmlcatalog directory below the calabash frontend directory, as is done in calabash.sh.
If you only add xmlcatalog/catalog.xml below your project directory to the catalog list as Martin suggested, the mathtype extension will not be found.
calabash-frontend/xmlcatalog/catalog.xml (or calabash/xmlcatalog/catalog.xml if you have checked out Calabash to that directory) is the only XML catalog that you need to specify. If the calabash directory resides immediately below your project directory, it will look for your project’s xmlcatalog/catalog.xml by virtue of a nextCatalog instruction.
Please note that the docx2tex release still uses XML Calabash for Saxon 9.6. If you check out all submodules yourself, it is advisable to use the Saxon 9.8 branch of calabash-frontend.

@cannshui
Author

cannshui commented Sep 9, 2019

Thank you very much, @mkraetke, for the prompt guidance. Following your instructions, I succeeded. The final command looks like this, in case someone needs it:

java -Dxml.catalog.files=xmlcatalog/catalog.xml -cp $MATHTYPE_CP com.xmlcalabash.drivers.Main -c calabash/extensions/transpect/transpect-config.xml calabash/extensions/transpect/mathtype-extension/xpl/mathtype-example.xpl file=file:///my/oleObject1.bin > oleObject1-mml.xml

@gimsieke Yes, I had downloaded docx2tex before. Everything worked when using d2t or d2t.bat to convert a whole docx to LaTeX. But it generated many temporary files that I hardly need, and producing them costs some time. So I tried converting just the OLE objects, and the execution time is now much shorter.

Thanks again for your great work.

@cannshui cannshui closed this as completed Sep 9, 2019
@hardtrivedi

I get this error when running this command:
C:\Users\Mr.Trivedi>java -Dxml.catalog.files=C:\Users\Mr.Trivedi\test\xmlcatalog\catalog.xml -cp %MATHTYPE_CP% com.xmlcalabash.drivers.Main -c C:\Users\Mr.Trivedi\test\calabash-frontend\extensions\transpect\transpect-config.xml C:\Users\Mr.Trivedi\test\calabash-frontend\extensions\transpect\mathtype-extension\xpl\mathtype-example.xpl file=F:\oleObject1.bin > oleObject1-mml.xml
ERROR: http://transpect.github.io/../index.html:1:107:Not a pipeline or library: html
ERROR: err:XS0044:Unexpected step name: tr:store-debug
ERROR: It is a static error if any element in the XProc namespace or any step has element children other than those specified for it by this specification. In particular, the presence of atomic steps for which there is no visible declaration may raise this error.
Can anyone help me with this error?

@gimsieke
Contributor

gimsieke commented Apr 15, 2020

@hardtrivedi See for example transpect/xml2tex#5 (comment) and transpect/xml2tex#3 (comment)
One issue could be that xml.catalog.files needs to be a (semicolon-separated list of) file URI(s) rather than file system paths, like here. The same file URI requirement might apply to other paths that you supplied in the invocation above.
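
To illustrate the difference between a file system path and a file URI, here is a small Python sketch that turns absolute Windows paths into the semicolon-separated list of file URIs described above. The helper names to_file_uri and catalog_property are hypothetical, and the sketch only handles drive-letter paths, not UNC shares.

```python
from pathlib import PureWindowsPath
from urllib.parse import quote


def to_file_uri(windows_path):
    """Convert an absolute drive-letter path (C:\\dir\\file.xml) to a file URI."""
    path = PureWindowsPath(windows_path)
    if not path.is_absolute():
        raise ValueError(f"not an absolute Windows path: {windows_path}")
    # as_posix() switches to forward slashes; quote() escapes spaces etc.
    return "file:///" + quote(path.as_posix(), safe="/:")


def catalog_property(paths):
    """Build the -Dxml.catalog.files=... argument from one or more catalogs."""
    return "-Dxml.catalog.files=" + ";".join(to_file_uri(p) for p in paths)


print(catalog_property([r"C:\Users\Mr.Trivedi\test\xmlcatalog\catalog.xml"]))
# -Dxml.catalog.files=file:///C:/Users/Mr.Trivedi/test/xmlcatalog/catalog.xml
```

On Windows cmd the resulting argument should be quoted as a whole if it contains a semicolon or spaces.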

@hardtrivedi

hardtrivedi commented Apr 16, 2020

Hi @gimsieke, I tried running the command with a file URI, but I am getting the same error.

java "-Dxml.catalog.files=file:///C:/Users/Mr.Trivedi/test/xmlcatalog/catalog.xml;" -cp %MATHTYPE_CP% com.xmlcalabash.drivers.Main -c C://Users//Mr.Trivedi//test//calabash-frontend//extensions//transpect//transpect-config.xml C://Users//Mr.Trivedi//test//calabash-frontend/extensions//transpect//mathtype-extension//xpl//mathtype-example.xpl file=file:///F://oleObject1.bin > oleObject1-mml.xml

OUTPUT
C:\Users\Mr.Trivedi>java "-Dxml.catalog.files=file:///C:/Users/Mr.Trivedi/test/xmlcatalog/catalog.xml;" -cp %MATHTYPE_CP% com.xmlcalabash.drivers.Main -c C://Users//Mr.Trivedi//test//calabash-frontend//extensions//transpect//transpect-config.xml C://Users//Mr.Trivedi//test//calabash-frontend/extensions//transpect//mathtype-extension//xpl//mathtype-example.xpl file=file:///F://oleObject1.bin > oleObject1-mml.xml
ERROR: http://transpect.github.io/../index.html:1:107:Not a pipeline or library: html
ERROR: err:XS0044:Unexpected step name: tr:store-debug
ERROR: It is a static error if any element in the XProc namespace or any step has element children other than those specified for it by this specification. In particular, the presence of atomic steps for which there is no visible declaration may raise this error.

Can you please elaborate on the command?

@gimsieke
Contributor

Can you paste the content of test/xmlcatalog/catalog.xml?
Is the directory test/xproc-util/store-debug present and populated?

@hardtrivedi

hardtrivedi commented Apr 16, 2020

Hi @gimsieke
Content of test/xmlcatalog/catalog.xml is in the attachment:
catalog

Yes, the directory test/xproc-util/store-debug is present, and it has two folders with these files: xmlcatalog/catalog.xml and xpl/store-debug.xpl

@gimsieke
Contributor

Then I’m afraid without a full archive of your test directory we cannot debug this adequately. Have you tried calling calabash-frontend/calabash.bat instead of the explicit Java invocation yet?

@hardtrivedi

hardtrivedi commented Apr 16, 2020

@gimsieke
When I tried to run calabash.bat, it gave me this error, even though my MATHTYPE_CP environment variable is already set (I am using Windows):
err

@hardtrivedi

@gimsieke Now I am getting this after changing the calabash.bat file:
err1
I changed %classpath% to %MATHTYPE_CP% in the calabash.bat file.

@gimsieke
Contributor

Apparently you are using a Calabash with Saxon 9.9. I recommend that you use the Saxon 9.8 branch.

I just had a look at the catalog screenshot that you sent above. Is it true that nextCatalog is just text, without an opening angle bracket?
Also, it is not sufficient to only use the xproc-util as nextCatalog. I recommend that you look at the docx2tex catalog and use all the nextCatalog entries included there.

The second screenshot indicates that you didn’t supply a pipeline to invoke (or that something else doesn’t meet the requirements of invoking XML Calabash).

@gaurav-bothra

Hi @gimsieke

I am facing the same issue while converting an oleObject into MathML:

image

I followed @mkraetke's instructions and, as you mentioned in one of the replies above, used the docx2tex catalog with all its nextCatalog entries. I cloned the docx2tex repo and passed the path of the docx2tex catalog when executing the java command.

The command I executed:

java -cp %MATHTYPE_CP% -Dfile.encoding=UTF8 -Dsun.jnu.encoding=UTF-8 -Dxml.catalog.files=.\docx2tex\xmlcatalog\catalog.xml com.xmlcalabash.drivers.Main -E org.xmlresolver.Resolver -U org.xmlresolver.Resolver -c .\calabash-frontend\extensions\transpect\transpect-config.xml .\calabash-frontend\extensions\transpect\mathtype-extension\xpl\mathtype-example.xpl file=file:///C:\Users\gaura\Desktop\parser\oleObject1.bin

This is my catalog.xml:

<catalog xmlns="urn:oasis:names:tc:entity:xmlns:xml:catalog">
  
  <nextCatalog catalog="../schema/hub/xmlcatalog/catalog.xml"/>

  <rewriteURI uriStartString="http://transpect.io/docx2tex/" rewritePrefix="../"/>
  <rewriteURI uriStartString="http://customers.le-tex.de/generic/book-conversion/" rewritePrefix="../"/>

  <nextCatalog catalog="../cascade/xmlcatalog/catalog.xml"/>
  
  <nextCatalog catalog="../docx2hub/xmlcatalog/catalog.xml"/>

  <nextCatalog catalog="../fontmaps/xmlcatalog/catalog.xml"/>
  
  <nextCatalog catalog="../xproc-util/xmlcatalog/catalog.xml"/>

  <nextCatalog catalog="../htmlreports/xmlcatalog/catalog.xml"/>
  
  <nextCatalog catalog="../xslt-util/xmlcatalog/catalog.xml"/>
  
  <nextCatalog catalog="../evolve-hub/xmlcatalog/catalog.xml"/>
  
  <nextCatalog catalog="../mml2tex/xmlcatalog/catalog.xml"/>

  <nextCatalog catalog="../mml-normalize/xmlcatalog/catalog.xml"/>
  
  <nextCatalog catalog="../xml2tex/xmlcatalog/catalog.xml"/>
</catalog>

Also, when I tried to set the MATHTYPE_CP environment variable, I found that the folder /path/to/calabash/extensions/transpect/mathtype-extension/ruby/stdlib is missing from the repo.

Can you please tell me where I went wrong in the process?

Thanks

@gimsieke
Contributor

An issue could be that in many places, file URIs instead of directory paths are expected, in particular for the xml.catalog.files property.

@gaurav-bothra

Hi @gimsieke

I also tried a file URI in the xml.catalog.files property:

image
But I am still getting the same error.

@gimsieke
Contributor

Did you leave out -E org.xmlresolver.Resolver -U org.xmlresolver.Resolver now?
Have you tried invoking calabash-frontend/calabash.bat?

@gaurav-bothra

Yes @gimsieke, I tried removing both the -E and -U flags, but nothing changed.

I also tried to execute the calabash.bat file directly, and I am getting this error:

Error: Could not find or load main class com.xmlcalabash.drivers.Main

image

@gimsieke
Contributor

Ah, now I see. The classpath refers to calabash/distro/xmlcalabash-1.1.22-98.jar, but the docx2tex release zip contains xmlcalabash-1.1.22-99.jar
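
A mismatch like this can be spotted by checking each classpath entry against the disk. The following Python sketch is one hedged way to do that; missing_classpath_entries is a hypothetical helper and not part of calabash-frontend. It assumes plain file or directory entries plus Java's trailing "*" wildcard for all jars in a directory.

```python
import glob
import os


def missing_classpath_entries(classpath, sep=os.pathsep):
    """Return the classpath entries that match nothing on disk.

    sep defaults to the platform separator (";" on Windows, ":" elsewhere).
    """
    missing = []
    for entry in classpath.split(sep):
        entry = entry.strip()
        if not entry:
            continue
        # Java treats "dir/*" as "all jar files in dir"; emulate that here.
        pattern = entry[:-1] + "*.jar" if entry.endswith("*") else entry
        if not glob.glob(pattern):
            missing.append(entry)
    return missing
```

In the situation above, an entry ending in xmlcalabash-1.1.22-98.jar would be reported as missing when only xmlcalabash-1.1.22-99.jar exists in the distro directory.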

@mkraetke Can you reproduce & fix this?

@gaurav-bothra

I changed the XML Calabash version in the calabash.bat file to xmlcalabash-1.1.26-99.jar, which is the jar inside the calabash-frontend/distro/ folder.

image

@gimsieke
Contributor

This looks good so far. Now try to run a pipeline.

@gaurav-bothra

I am getting the same error after adding the .xpl file and the input file:

image

@gimsieke
Contributor

Please zip the whole test directory and upload it to some service like Dropbox or Google drive and send me a link so that I can download and debug the whole thing.

@gaurav-bothra

Hi @gimsieke ,

Here is a Google Drive link to the test folder:

https://drive.google.com/file/d/1L3wIe23Nb3e5SHAQaVJH0Szh9Og7MHMF/view?usp=sharing

@mkraetke
Member

@gimsieke I've changed the classpath of the calabash.bat and updated the v1.4 release zip

@gimsieke
Contributor

It might take a couple hours until I can further look into this.

@gaurav-bothra

Hi @gimsieke
Please let me know if there is any update regarding my issue.

@gimsieke
Contributor

It is absolutely unnecessary that you change calabash.bat. Just give the pipeline name and the parameter file=… after calabash-frontend/calabash.bat.

Also, the file expected at xmlcatalog/catalog.xml was named xmlcatalog/catelog.xml (with an 'e') in your archive. If you rename it to catalog.xml, the pipeline might get a bit further. But I suspect that some more submodule directories and their nextCatalog instructions in the catalog might be missing. Note that this was the main mistake that you made, and we could not have guessed it until we looked at your zip.

Since @mkraetke said that he fixed https://github.com/transpect/docx2tex/releases/download/v1.4/docx2tex-1.4-release.zip, I recommend that you download the zip again and try the calabash/calabash.bat (it is called calabash instead of calabash-frontend – it’s a bit arbitrary how you name this directory) and xmlcatalog/catalog.xml that shipped with that release. No need for any of your customizations. Note that the docx2tex directory that is in that zip replaces your test directory, it need not be extracted below your test directory.

@mkraetke
Member

@gaurav-bothra I've also just created the 1.5 release of docx2tex, which also fixes some other recent issues.
https://github.com/transpect/docx2tex/releases/tag/v.1.5

@gaurav-bothra

Hello @mkraetke and @gimsieke ,
I downloaded the latest release of docx2tex and tried to run the mathtype-extension pipeline:

image

But I am getting this error:

.\calabash.bat .\extensions\transpect\mathtype-extension\xpl\mathtype-example.xpl file=file:///C:\\Users\\gaura\\Desktop\\parser\\oleObject1.bin > obj.xml
Errno::ENOENT: No such file or directory - Equation Native
  dirent_from_path at uri:classloader:/ole/storage/file_system.rb:132
              open at uri:classloader:/ole/storage/file_system.rb:165
              read at uri:classloader:/ole/storage/file_system.rb:198
    read_from_file at uri:classloader:/file_parser/ole.rb:11
        initialize at uri:classloader:/file_parser/ole.rb:7
        set_parser at uri:classloader:/mathtype.rb:36
        initialize at uri:classloader:/mathtype.rb:16
            <main> at <script>:1
[ERROR] Mtef2Xml: (ENOENT) No such file or directory - Equation Native

@gimsieke
Contributor

gimsieke commented Apr 18, 2020

oleObject1.bin (of question11.docx that you previously sent) is a ChemSketch OLE object. If you convert it using mathtype-example.xpl, you will see the error that you encountered. Converting oleObject2.bin will be successful though, because it is a MathType OLE object.
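
Before running the pipeline on a single .bin file, a quick pre-check can tell whether it is likely to contain the "Equation Native" stream that the error message refers to. The Python sketch below is only a byte-level heuristic and not what mathtype-extension does internally: it assumes that an OLE compound file starts with the usual 8-byte signature and stores stream names as UTF-16LE text, and it does not actually parse the directory. The function names are hypothetical.

```python
# Signature at the start of an OLE compound file.
OLE_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")
# Stream names in the OLE directory are stored as UTF-16LE text.
EQUATION_NATIVE = "Equation Native".encode("utf-16-le")


def looks_like_equation_ole(data):
    """Heuristic: OLE signature present and an 'Equation Native' name found."""
    return data.startswith(OLE_MAGIC) and EQUATION_NATIVE in data


def file_looks_like_equation_ole(path):
    """Apply the heuristic to a file such as word/embeddings/oleObject2.bin."""
    with open(path, "rb") as f:
        return looks_like_equation_ole(f.read())
```

A file for which this returns False, such as an OLE object from another application, would be expected to fail with the ENOENT error shown above.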

When running higher-level pipelines, such as docx2hub.xpl or docx2tex.xpl, files like word\_rels\document.xml.rels and document.xml will be consulted in order to find out which of the embedded OLE object .bin files were created by MathType or the legacy Word equation editor, so you won’t run into errors like the one above.

Except that unrecognized OLE .bin files might be included by docx2tex as if they were images, which will make LaTeX choke a bit.
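
How docx2hub makes this decision internally is not shown in this thread. As a rough illustration of the idea only, the following Python sketch maps each embedded .bin target to the application identifier recorded in document.xml. It assumes that Word writes a ProgID attribute on each o:OLEObject element and that document.xml.rels maps the element's r:id to the .bin file; the function name ole_prog_ids is hypothetical.

```python
import xml.etree.ElementTree as ET

# Namespaces assumed for document.xml and document.xml.rels.
NS_O = "urn:schemas-microsoft-com:office:office"
NS_R = "http://schemas.openxmlformats.org/officeDocument/2006/relationships"
NS_REL = "http://schemas.openxmlformats.org/package/2006/relationships"


def ole_prog_ids(document_xml, rels_xml):
    """Map relationship targets (e.g. embeddings/oleObject1.bin) to ProgIDs."""
    # Relationship Id -> Target, taken from word/_rels/document.xml.rels
    targets = {
        rel.get("Id"): rel.get("Target")
        for rel in ET.fromstring(rels_xml).iter(f"{{{NS_REL}}}Relationship")
    }
    result = {}
    for ole in ET.fromstring(document_xml).iter(f"{{{NS_O}}}OLEObject"):
        target = targets.get(ole.get(f"{{{NS_R}}}id"))
        if target is not None:
            result[target] = ole.get("ProgID")
    return result
```

Given the contents of both files from an unzipped docx, the result shows which application each .bin file is attributed to, so that only the equation objects need to be passed to mathtype-example.xpl.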

@gimsieke
Contributor

You see that you don’t need to fumble with calabash.bat at all. The invocation is like:

calabash/calabash.bat calabash/extensions/transpect/mathtype-extension/xpl/mathtype-example.xpl file=../path/to/oleObject2.bin

at least on my Cygwin. I also tried it on Windows cmd, and the use of forward/backward slashes is a bit idiosyncratic:

calabash\calabash.bat calabash\extensions\transpect\mathtype-extension\xpl\mathtype-example.xpl file=../tmp/question11.tmp.docx.tmp/word/embeddings/oleObject2.bin

calabash\calabash.bat needs to have a backslash, in the path to the pipeline it can be either forward or backward slash, and the file option is interpreted as a (possibly relative) URI, which needs to have forward slashes.
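
The slash rules above can be captured in a small helper when the invocation is assembled by a script. This Python sketch only builds the argument list and does not run anything; build_cmd_invocation is a hypothetical name, and the rules it encodes are just the ones observed in this comment.

```python
from pathlib import PureWindowsPath


def build_cmd_invocation(calabash_bat, pipeline, ole_file):
    """Build the Windows cmd arguments with the slash style each part needs."""
    return [
        # The batch file itself must be given with backslashes.
        str(PureWindowsPath(calabash_bat)),
        # The pipeline path works with either style; backslashes used here.
        str(PureWindowsPath(pipeline)),
        # The file option is read as a (possibly relative) URI: forward slashes.
        "file=" + PureWindowsPath(ole_file).as_posix(),
    ]
```

The returned list can be joined with spaces for display or passed to a process launcher on Windows.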

@gimsieke
Contributor

Sorry @cannshui for all the noise. You probably have already muted this thread.

@gaurav-bothra Please open a new issue next time, even if you think that your issue is related to an existing issue (except when the other issue is still open and you notice exactly the same error that was reported in the other issue).

@gaurav-bothra

ok @gimsieke ,

Thanks for your help 👍
