Running dictionary command outputs error in JAVA memory #11

Rahul1711arora · 2019-03-11T13:48:15Z

Dear Prof. Peter,

The input for the getpapers was:
getpapers -q "((endophytic bacteria) AND (abiotic stress)) AND (PUB_TYPE:"Review" OR PUB_TYPE:"review-article")" -x -k 200 -o path\to\directory

This downloaded a total of 138 papers in XML format.

Next, I created a dictionary with around 50 terms, using the command:
ami-dictionary create --terms "many" "terms" "were" "created" --dictionary name of the dictionary --directory path\to\directory -outformats xml,json,html
After running this command, I ran the command to search for the terms in my dictionary in the papers I downloaded to get the data table and the SVG diagrams.

The command I ran was:
ami-search-new -p path\to\files\inXML\format --dictionary path\to\my\dictionary
This normalized the xml to html format. But after doing this, when the count command was running to calculate the frequency of words, an error was thrown.
Please find attached a screenshot for the same.

Also, before this error was thrown, I got the tables for a test run, but unfortunately the SVG files were not formed.

Could you kindly tell me how I can overcome this error?

The solution that I tried was changing the memory allocation for the JVM. I allocated 2 GB of memory to it so that the heap space error could be overcome, but I couldn't find a way around the problem.

I hope the error gets resolved soon so that I can start my work.

Best
Rahul


petermr · 2019-03-11T16:21:53Z

Can you indicate your operating system please? I assume it's a version of Windows because of the backslashes.

> Dear Prof. Peter,

No need to add names - the whole world can help with this :-)

> The input for the getpapers was:
> getpapers -q "((endophytic bacteria) AND (abiotic stress)) AND (PUB_TYPE:"Review" OR PUB_TYPE:"review-article")" -x -k 200 -o path\to\directory
>
> Which made me download a total of 138 papers in the xml format.
>
> Next, I created a dictionary with around 50 terms, using the command:
> ami-dictionary create --terms "many" "terms" "were" "created" --dictionary name of the dictionary --directory path\to\directory -outformats xml,json,html
> After running this command, I ran the command to search for the terms in my dictionary in the papers I downloaded to get the data table and the SVG diagrams.

^^^ History ^^^
You can omit this history - you only need the ami-search-new command.

> The command I ran was:
>
> ami-search-new -p path\to\files\inXML\format --dictionary path\to\my\dictionary
>
> This normalized the xml to html format. But after doing this, when the count command was running to calculate the frequency of words, an error was thrown.
> Please find attached a screenshot for the same.

Much better to include the actual text as it can be cut-and-pasted. Please repost the output as text.

> Also, before this error was thrown, I got the tables for a test run but unfortunately, the SVG files were not formed.

SVG will only be formed after the search completes.

> The solution that I tried was changing the memory allocation for the JVM. I allocated a 2GB memory to it so that the heap space error can be overcome, but I couldn't really find an alternative to the predicament.

Please give the exact command. Possible parameters are

-Xms and -Xmx

> Hope, the error gets resolved earlier and I can start my work soon.

Open source projects cannot promise delivery dates, sorry.

===========
My guess is that there is a very large file causing problems. If you can post the PMCs as a list, I can download them and see if I get the same error.
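One quick way to test the "very large file" guess is to list the biggest input files in the project. The following is only a sketch: it assumes each paper sits in its own subdirectory of the project containing a `fulltext.xml` (the file name that appears in the NORMA log line in this thread), and the function name `largest_files` is made up for illustration.

```python
from pathlib import Path


def largest_files(project_dir, filename="fulltext.xml", top=5):
    """Return (size_in_bytes, path) pairs for the biggest matching files.

    Assumes a layout of <project_dir>/<PMC id>/<filename>; this layout is an
    assumption based on the log output in this thread.
    """
    sizes = [
        (path.stat().st_size, path)
        for path in Path(project_dir).glob(f"*/{filename}")
    ]
    # Largest first; sort on the size only so paths are never compared.
    return sorted(sizes, key=lambda pair: pair[0], reverse=True)[:top]
```

Calling `largest_files(r"C:\Users\Rahul\Documents\New_book_chapter")` would return the five largest `fulltext.xml` files. Since the stack trace shows the crash while parsing the scholarly HTML, passing `filename="scholarly.html"` may be worth trying as well.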

Rahul1711arora · 2019-03-11T17:39:46Z

Hi,

Yes, the OS is Windows 10, version 10.0.17134.
The output is as follows along with the command:

C:\bin>ami-search-new -p C:\Users\Rahul\Documents\New_book_chapter --dictionary C:\Users\Rahul\Documents\New_book_chapter\stress_and_bacteria.xml

Generic values (AMISearchTool)

basename null
cproject C:\Users\Rahul\Documents\New_book_chapter
ctree
cTreeList 138 trees [C:\Users\Rahul\Documents\New_book_chapter\PMC1240
dryrun false
excludeBase null
excludeTrees null
file types []
forceMake false
includeBase null
includeTrees null
log4j
logfile null
verbose 0

Specific values (AMISearchTool)

dictionaryList [C:\Users\Rahul\Documents\New_book_chapter\stress_and_bacteria.xml]
dictionaryTop null
dictionarySuffix [xml]
ignorePlugins []

cProject: New_book_chapter

running: word; word([frequencies])[{xpath:@count>20}, {w.stopwords:pmcstop.txt stopwords.txt}].............Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
at java.util.Arrays.copyOf(Unknown Source)
at java.lang.AbstractStringBuilder.ensureCapacityInternal(Unknown Source)
at java.lang.AbstractStringBuilder.append(Unknown Source)
at java.lang.StringBuffer.append(Unknown Source)
at nu.xom.Element.writeStartTag(Unknown Source)
at nu.xom.Element.toXML(Unknown Source)
at org.contentmine.graphics.html.HtmlFactory.parseLegacyHtmlToWellFormedXML(HtmlFactory.java:730)
at org.contentmine.graphics.html.HtmlFactory.parse(HtmlFactory.java:643)
at org.contentmine.graphics.html.HtmlFactory.parse(HtmlFactory.java:622)
at org.contentmine.cproject.args.DefaultArgProcessor.getScholarlyHtmlElement(DefaultArgProcessor.java:1382)
at org.contentmine.cproject.files.CTree.ensureScholarlyHtmlElement(CTree.java:1239)
at org.contentmine.cproject.args.DefaultArgProcessor.extractPSectionElements(DefaultArgProcessor.java:1365)
at org.contentmine.ami.plugins.AMIArgProcessor.ensureSectionElements(AMIArgProcessor.java:257)
at org.contentmine.ami.plugins.AMIArgProcessor.runRunMethodsOnChosenArgOptions(AMIArgProcessor.java:228)
at org.contentmine.cproject.args.DefaultArgProcessor.runAndOutput(DefaultArgProcessor.java:1296)
at org.contentmine.ami.plugins.word.WordPluginOption.run(WordPluginOption.java:36)
at org.contentmine.ami.plugins.CommandProcessor.runLegacyPluginOptions(CommandProcessor.java:301)
at org.contentmine.ami.tools.AMISearchTool.runLegacyCommandProcessor(AMISearchTool.java:128)
at org.contentmine.ami.tools.AMISearchTool.runSearch(AMISearchTool.java:112)
at org.contentmine.ami.tools.AMISearchTool.processProject(AMISearchTool.java:103)
at org.contentmine.ami.tools.AMISearchTool.runSpecifics(AMISearchTool.java:93)
at org.contentmine.ami.tools.AbstractAMITool.runCommands(AbstractAMITool.java:218)
at org.contentmine.ami.tools.AMISearchTool.main(AMISearchTool.java:75)

The exact command used to allocate the memory was: -Xmx2048m

Please find the text document containing the list of PMCs

Thanks
PMC_Ids.txt

petermr · 2019-03-11T23:08:11Z

I have run this on your PMC set but with an inbuilt dictionary. No crash:

pm286macbook:test pm286$ ami-search-new -p pmc/ --dictionary country

Generic values (AMISearchTool)
================================
basename            null
cproject            /Users/pm286/workspace/cmdev/normami/test/pmc
ctree               
cTreeList           138 trees [pmc/PMC1240683, pmc/PMC2216073, pmc/PMC3202864, p
dryrun              false
excludeBase         null
excludeTrees        null
file types          []
forceMake           false
includeBase         null
includeTrees        null
log4j               
logfile             null
verbose             0

Specific values (AMISearchTool)
================================
dictionaryList       [country]
dictionaryTop        null
dictionarySuffix     [xml]
ignorePlugins        []

cProject: pmc
0    [main] DEBUG org.contentmine.ami.plugins.CommandProcessor  - running NORMA -i fulltext.xml -o scholarly.html --transform nlm2html --project pmc
PMC1240683 .PMC2216073 PMC3202864 PMC3283951 PMC3355587 PMC3417362 PMC3497943 PMC3573209 PMC3604591 PMC3706808 PMC3707038 .PMC3728534 PMC3738838 PMC3775148 PMC3812866 PMC3815904 PMC3815906 PMC3820493 PMC3825493 PMC3836376 PMC3868918 .PMC3947992 PMC4022417 PMC4045152 PMC4163387 PMC4265070 PMC4265282 PMC4285135 PMC4285865 PMC4312627 PMC4318275 .PMC4333861 PMC4358370 PMC4377440 PMC4389352 PMC4413195 PMC4440916 PMC4479509 PMC4500914 PMC4512045 PMC4522733 .PMC4527079 PMC4550782 PMC4561359 PMC4563596 PMC4581282 PMC4585250 PMC4626563 PMC4632817 PMC4646962 PMC4729944 .PMC4748402 PMC4754410 PMC4778271 PMC4792885 PMC4801890 PMC4802167 PMC4811947 PMC4819777 PMC4844426 PMC4849068 .PMC4880627 PMC4885868 PMC4909795 PMC4917562 PMC4925718 PMC4938854 PMC4949542 PMC4988986 PMC5035732 PMC5035750 .PMC5043059 PMC5067414 PMC5069422 PMC5080360 PMC5085706 PMC5099148 PMC5116465 PMC5127157 PMC5156507 PMC5244474 .PMC5299014 PMC5299024 PMC5388769 PMC5395610 PMC5403934 PMC5532450 PMC5610682 PMC5660262 PMC5662797 PMC5671593 .PMC5686270 PMC5715960 PMC5741648 PMC5742157 PMC5744479 PMC5748579 PMC5748586 PMC5767233 PMC5786577 PMC5787091 .PMC5809494 PMC5811519 PMC5812248 PMC5818412 PMC5827301 PMC5870681 PMC5872327 PMC5923616 PMC5979581 PMC5981179 .PMC5994547 PMC5996133 PMC6027233 PMC6079243 PMC6092505 PMC6094092 PMC6110075 PMC6110341 PMC6111575 PMC6116750 .PMC6125355 PMC6132428 PMC6132541 PMC6164190 PMC6206271 PMC6218572 PMC6249440 PMC6273650 PMC6274040 PMC6277688 .PMC6289982 PMC6292962 PMC6308375 PMC6311197 PMC6313892 PMC6337347 PMC6359256 
running: word; word([frequencies])[{xpath:@count>20}, {w.stopwords:pmcstop.txt stopwords.txt}]PMC1240683 .PMC2216073 PMC3202864 PMC3283951 PMC3355587 PMC3417362 PMC3497943 PMC3573209 PMC3604591 PMC3706808 PMC3707038 .PMC3728534 PMC3738838 PMC3775148 PMC3812866 PMC3815904 PMC3815906 PMC3820493 PMC3825493 PMC3836376 PMC3868918 .PMC3947992 PMC4022417 PMC4045152 PMC4163387 PMC4265070 PMC4265282 PMC4285135 PMC4285865 PMC4312627 PMC4318275 .PMC4333861 PMC4358370 PMC4377440 PMC4389352 PMC4413195 PMC4440916 PMC4479509 PMC4500914 PMC4512045 PMC4522733 .PMC4527079 PMC4550782 PMC4561359 PMC4563596 PMC4581282 PMC4585250 PMC4626563 PMC4632817 PMC4646962 PMC4729944 .PMC4748402 PMC4754410 PMC4778271 PMC4792885 PMC4801890 PMC4802167 PMC4811947 PMC4819777 PMC4844426 PMC4849068 .PMC4880627 PMC4885868 PMC4909795 PMC4917562 PMC4925718 PMC4938854 PMC4949542 PMC4988986 PMC5035732 PMC5035750 .PMC5043059 PMC5067414 PMC5069422 PMC5080360 PMC5085706 PMC5099148 PMC5116465 PMC5127157 PMC5156507 PMC5244474 .PMC5299014 PMC5299024 PMC5388769 PMC5395610 PMC5403934 PMC5532450 PMC5610682 PMC5660262 PMC5662797 PMC5671593 .PMC5686270 PMC5715960 PMC5741648 PMC5742157 PMC5744479 PMC5748579 PMC5748586 PMC5767233 PMC5786577 PMC5787091 .PMC5809494 PMC5811519 PMC5812248 PMC5818412 PMC5827301 PMC5870681 PMC5872327 PMC5923616 PMC5979581 PMC5981179 .PMC5994547 PMC5996133 PMC6027233 PMC6079243 PMC6092505 PMC6094092 PMC6110075 PMC6110341 PMC6111575 PMC6116750 .PMC6125355 PMC6132428 PMC6132541 PMC6164190 PMC6206271 PMC6218572 PMC6249440 PMC6273650 PMC6274040 PMC6277688 .PMC6289982 PMC6292962 PMC6308375 PMC6311197 PMC6313892 PMC6337347 PMC6359256 ..........................................
running: search; search([country])[]..........................................
create data tables
rrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrpm286macbook:test pm286$

What is in your dictionary? I think the problem may be there. Can you rerun my example and see if you get a crash?

And please post the command line(s) you used.

Rahul1711arora · 2019-03-12T08:24:49Z

I reran your example but again got the same crash.

C:\bin>ami-search-new -p C:\Users\Rahul\Documents\New_book_chapter\ --dictionary country

Generic values (AMISearchTool)

basename null
cproject C:\Users\Rahul\Documents\New_book_chapter
ctree
cTreeList 138 trees [C:\Users\Rahul\Documents\New_book_chapter\PMC1240
dryrun false
excludeBase null
excludeTrees null
file types []
forceMake false
includeBase null
includeTrees null
log4j
logfile null
verbose 0

Specific values (AMISearchTool)

dictionaryList [country]
dictionaryTop null
dictionarySuffix [xml]
ignorePlugins []

cProject: New_book_chapter

running: word; word([frequencies])[{xpath:@count>20}, {w.stopwords:pmcstop.txt stopwords.txt}].............Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
at java.util.Arrays.copyOf(Unknown Source)
at java.lang.AbstractStringBuilder.ensureCapacityInternal(Unknown Source)
at java.lang.AbstractStringBuilder.append(Unknown Source)
at java.lang.StringBuffer.append(Unknown Source)
at nu.xom.Element.writeStartTag(Unknown Source)
at nu.xom.Element.toXML(Unknown Source)
at org.contentmine.graphics.html.HtmlFactory.parseLegacyHtmlToWellFormedXML(HtmlFactory.java:730)
at org.contentmine.graphics.html.HtmlFactory.parse(HtmlFactory.java:643)
at org.contentmine.graphics.html.HtmlFactory.parse(HtmlFactory.java:622)
at org.contentmine.cproject.args.DefaultArgProcessor.getScholarlyHtmlElement(DefaultArgProcessor.java:1382)
at org.contentmine.cproject.files.CTree.ensureScholarlyHtmlElement(CTree.java:1239)
at org.contentmine.cproject.args.DefaultArgProcessor.extractPSectionElements(DefaultArgProcessor.java:1365)
at org.contentmine.ami.plugins.AMIArgProcessor.ensureSectionElements(AMIArgProcessor.java:257)
at org.contentmine.ami.plugins.AMIArgProcessor.runRunMethodsOnChosenArgOptions(AMIArgProcessor.java:228)
at org.contentmine.cproject.args.DefaultArgProcessor.runAndOutput(DefaultArgProcessor.java:1296)
at org.contentmine.ami.plugins.word.WordPluginOption.run(WordPluginOption.java:36)
at org.contentmine.ami.plugins.CommandProcessor.runLegacyPluginOptions(CommandProcessor.java:301)
at org.contentmine.ami.tools.AMISearchTool.runLegacyCommandProcessor(AMISearchTool.java:128)
at org.contentmine.ami.tools.AMISearchTool.runSearch(AMISearchTool.java:112)
at org.contentmine.ami.tools.AMISearchTool.processProject(AMISearchTool.java:103)
at org.contentmine.ami.tools.AMISearchTool.runSpecifics(AMISearchTool.java:93)
at org.contentmine.ami.tools.AbstractAMITool.runCommands(AbstractAMITool.java:218)
at org.contentmine.ami.tools.AMISearchTool.main(AMISearchTool.java:75)

I have almost 50 terms in my dictionary. Is there a limit on how many terms one can add to a dictionary?

petermr · 2019-03-12T09:27:36Z

Rahul1711arora wrote:

> I reran your example but again got the same crash.
> I have almost 50 terms in my dictionary, is there a limit on how many terms one can add to their own dictionary?

No. I have dictionaries with 50,000 terms.

I can't help because no one else has had this error, so it seems to be due to the setup on your machine. I shall be gradually modifying the code to make it less memory-intensive, but this won't be immediate. All I can suggest is that you use a different machine. I don't know where you are setting the memory size, but you shouldn't have to. I can't do more here.

P.


-- Peter Murray-Rust Founder ContentMine.org and Reader Emeritus in Molecular Informatics Dept. Of Chemistry, University of Cambridge, CB2 1EW, UK

Rahul1711arora · 2019-03-13T08:55:28Z

Thank you very much! I reran the entire process with another set of files and it worked fine. But the files I was originally working with still crashed. No worries, I'll run the same on another machine. Thanks!

petermr · 2019-03-13T18:18:00Z


The files and dictionaries are small, so I suspect there is a rogue file (or combination of files) of some sort. Can you do a binary chop on the files? E.g. split CProject into CProject1 and CProject2. If the error still persists, split recursively (p1.1, p1.2, p2.1, p2.2) until you find the smallest set that shows the error. If that is 1 paper, it shows which article gives problems.

(Less likely) there may be a dictionary entry that gives problems. Make sure that there are no terms which would have a huge number of hits (e.g. term="a").

P.
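The binary chop can be scripted rather than done by hand. What follows is a minimal sketch, not part of ami: `narrow_failing_set` and the `fails` predicate are hypothetical names, and the `PMC1` ... `PMC8` labels are placeholders. It keeps halving the list while one half still reproduces the error on its own, and stops if neither half fails alone, which would point to a combination of files.

```python
def narrow_failing_set(items, fails):
    """Bisect `items` to a small subset for which fails(subset) is True.

    `fails` is a caller-supplied function (hypothetical here) that runs the
    search on the given subset and reports whether the error occurred.
    Returns [] if the full list does not fail at all.
    """
    items = list(items)
    if not fails(items):
        return []
    while len(items) > 1:
        mid = len(items) // 2
        left, right = items[:mid], items[mid:]
        if fails(left):
            items = left
        elif fails(right):
            items = right
        else:
            # Neither half fails alone: the error needs files from both halves.
            break
    return items


# Demo with a fake predicate: pretend the placeholder tree "PMC6" is the rogue file.
trees = [f"PMC{i}" for i in range(1, 9)]
print(narrow_failing_set(trees, lambda subset: "PMC6" in subset))  # ['PMC6']
```

In practice `fails` would copy the chosen paper directories into a scratch project, rerun `ami-search-new -p <scratch> --dictionary country`, and return `True` when the output contains `OutOfMemoryError`. That part is left out here because it depends on the local setup.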


petermr · 2019-03-13T18:20:43Z

If you have only 50 dictionary entries, I suggest you post the whole dictionary here.
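Before posting, the dictionary can also be screened for very short terms of the kind warned about earlier (e.g. term="a"). This is a rough sketch under a loud assumption: that the dictionary XML holds `<entry>` elements with a `term` attribute. That layout is inferred from the `term="a"` example in this thread and may not match the real file. The function name `suspicious_terms` is made up.

```python
import xml.etree.ElementTree as ET


def suspicious_terms(dictionary_xml, min_length=3):
    """Return terms shorter than `min_length` characters.

    Assumes entries look like <entry term="..."/>; this is an assumption,
    so adjust the element and attribute names to the actual dictionary file.
    """
    root = ET.fromstring(dictionary_xml)
    terms = [entry.get("term", "") for entry in root.iter("entry")]
    return [term for term in terms if len(term.strip()) < min_length]


# Made-up sample dictionary, for illustration only.
sample = """<dictionary title="demo">
  <entry term="a"/>
  <entry term="abiotic stress"/>
  <entry term="pH"/>
</dictionary>"""
print(suspicious_terms(sample))  # ['a', 'pH']
```

Short terms are not necessarily wrong (`pH` is a legitimate term), so the result is a list to review, not a list to delete.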

