Running dictionary command outputs error in JAVA memory #11

Open
Rahul1711arora opened this issue Mar 11, 2019 · 8 comments

Comments

@Rahul1711arora

Rahul1711arora commented Mar 11, 2019

Dear Prof. Peter,

The input for the getpapers was:
getpapers -q "((endophytic bacteria) AND (abiotic stress)) AND (PUB_TYPE:"Review" OR PUB_TYPE:"review-article")" -x -k 200 -o path\to\directory

This downloaded a total of 138 papers in XML format.

Next, I created a dictionary with around 50 terms, using the command:
ami-dictionary create --terms "many" "terms" "were" "created" --dictionary name of the dictionary --directory path\to\directory -outformats xml,json,html
After running this command, I ran the command to search for the terms in my dictionary in the papers I downloaded to get the data table and the SVG diagrams.

The command I ran was:
ami-search-new -p path\to\files\inXML\format --dictionary path\to\my\dictionary
This normalized the XML to HTML. But after that, while the count command was running to calculate word frequencies, an error was thrown.
A screenshot of the error is attached.

Also, before this error was thrown, I got the tables for a test run, but unfortunately the SVG files were not created.

Could you please tell me how I can overcome this error?

The fix I tried was changing the memory allocation for the JVM. I allocated 2 GB to it so that the heap space error could be overcome, but I couldn't find a way around the problem.

I hope the error can be resolved soon so that I can start my work.

Best
Rahul
AMI_error

@petermr
Owner

petermr commented Mar 11, 2019

Can you indicate your operating system please? I assume it's a version of Windows because of the backslashes.

> Dear Prof. Peter,

No need to add names - the whole world can help with this :-)

> The input for the getpapers was:
> getpapers -q "((endophytic bacteria) AND (abiotic stress)) AND (PUB_TYPE:"Review" OR PUB_TYPE:"review-article")" -x -k 200 -o path\to\directory
>
> Which made me download a total of 138 papers in the xml format.
>
> Next, I created a dictionary with around 50 terms, using the command:
> ami-dictionary create --terms "many" "terms" "were" "created" --dictionary name of the dictionary --directory path\to\directory -outformats xml,json,html
> After running this command, I ran the command to search for the terms in my dictionary in the papers I downloaded to get the data table and the SVG diagrams.

^^^ History ^^^
You can omit this history - you only need the ami-search-new command.

> The command I ran was:
> ami-search-new -p path\to\files\inXML\format --dictionary path\to\my\dictionary
> This normalized the xml to html format. But after doing this, when the count command was running to calculate the frequency of words, an error was thrown.
> Please find attached a screenshot for the same.

It is much better to include the actual text, as it can be cut-and-pasted. Please repost the output as text.

> Also, before this error was thrown, I got the tables for a test run but unfortunately, the SVG files were not formed.

The SVG will only be created after the search completes.

> The solution that I tried was changing the memory allocation for the JVM. I allocated a 2GB memory to it so that the heap space error can be overcome, but I couldn't really find an alternative to the predicament.

Please give the exact command. Possible parameters are -Xms and -Xmx.

> Hope, the error gets resolved earlier and I can start my work soon.

Open source projects cannot promise delivery dates, sorry.

===========
My guess is that there is a very large file causing problems. If you can post the PMCs as a list, I can download them and see if I get the same error.
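
One way to test the "very large file" guess before sharing anything is to list the biggest input files in the project directory. The following is a minimal Python sketch, not part of AMI or getpapers. It assumes each PMC subdirectory holds a `fulltext.xml` (the file name that appears in the NORMA log line in this thread); the function name `largest_files` and its parameters are made up for illustration.

```python
from pathlib import Path


def largest_files(project_dir, pattern="fulltext.xml", top=5):
    """Return up to `top` (size_in_bytes, path) pairs, largest first.

    Searches every subdirectory of project_dir for files named `pattern`.
    The default file name is an assumption taken from the NORMA log line.
    """
    root = Path(project_dir)
    sized = [(p.stat().st_size, p) for p in root.rglob(pattern) if p.is_file()]
    # Sort on size only, so ties never fall back to comparing paths.
    sized.sort(key=lambda item: item[0], reverse=True)
    return sized[:top]
```

Calling `largest_files(r"C:\path\to\project")` and printing the pairs should show whether one paper is far larger than the rest. If so, moving that single directory out of the project and rerunning the search is a cheap experiment.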

@Rahul1711arora
Author

Hi,

Yes, the OS is Windows 10, version 10.0.17134.
The output is as follows along with the command:

C:\bin>ami-search-new -p C:\Users\Rahul\Documents\New_book_chapter --dictionary C:\Users\Rahul\Documents\New_book_chapter\stress_and_bacteria.xml

Generic values (AMISearchTool)

basename null
cproject C:\Users\Rahul\Documents\New_book_chapter
ctree
cTreeList 138 trees [C:\Users\Rahul\Documents\New_book_chapter\PMC1240
dryrun false
excludeBase null
excludeTrees null
file types []
forceMake false
includeBase null
includeTrees null
log4j
logfile null
verbose 0

Specific values (AMISearchTool)

dictionaryList [C:\Users\Rahul\Documents\New_book_chapter\stress_and_bacteria.xml]
dictionaryTop null
dictionarySuffix [xml]
ignorePlugins []

cProject: New_book_chapter

running: word; word([frequencies])[{xpath:@count>20}, {w.stopwords:pmcstop.txt stopwords.txt}].............Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
at java.util.Arrays.copyOf(Unknown Source)
at java.lang.AbstractStringBuilder.ensureCapacityInternal(Unknown Source)
at java.lang.AbstractStringBuilder.append(Unknown Source)
at java.lang.StringBuffer.append(Unknown Source)
at nu.xom.Element.writeStartTag(Unknown Source)
at nu.xom.Element.toXML(Unknown Source)
at org.contentmine.graphics.html.HtmlFactory.parseLegacyHtmlToWellFormedXML(HtmlFactory.java:730)
at org.contentmine.graphics.html.HtmlFactory.parse(HtmlFactory.java:643)
at org.contentmine.graphics.html.HtmlFactory.parse(HtmlFactory.java:622)
at org.contentmine.cproject.args.DefaultArgProcessor.getScholarlyHtmlElement(DefaultArgProcessor.java:1382)
at org.contentmine.cproject.files.CTree.ensureScholarlyHtmlElement(CTree.java:1239)
at org.contentmine.cproject.args.DefaultArgProcessor.extractPSectionElements(DefaultArgProcessor.java:1365)
at org.contentmine.ami.plugins.AMIArgProcessor.ensureSectionElements(AMIArgProcessor.java:257)
at org.contentmine.ami.plugins.AMIArgProcessor.runRunMethodsOnChosenArgOptions(AMIArgProcessor.java:228)
at org.contentmine.cproject.args.DefaultArgProcessor.runAndOutput(DefaultArgProcessor.java:1296)
at org.contentmine.ami.plugins.word.WordPluginOption.run(WordPluginOption.java:36)
at org.contentmine.ami.plugins.CommandProcessor.runLegacyPluginOptions(CommandProcessor.java:301)
at org.contentmine.ami.tools.AMISearchTool.runLegacyCommandProcessor(AMISearchTool.java:128)
at org.contentmine.ami.tools.AMISearchTool.runSearch(AMISearchTool.java:112)
at org.contentmine.ami.tools.AMISearchTool.processProject(AMISearchTool.java:103)
at org.contentmine.ami.tools.AMISearchTool.runSpecifics(AMISearchTool.java:93)
at org.contentmine.ami.tools.AbstractAMITool.runCommands(AbstractAMITool.java:218)
at org.contentmine.ami.tools.AMISearchTool.main(AMISearchTool.java:75)

The exact command used to allocate the memory was: -Xmx2048m
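
As a quick sanity check on what a flag like this requests, the small Python sketch below converts a `-Xms`/`-Xmx` option into bytes. It relies only on the standard JVM convention that the `k`, `m` and `g` suffixes mean multiples of 1024 and that no suffix means bytes. The helper name `heap_option_bytes` is hypothetical. Note that the thread does not show where the flag was set, so whether the JVM started by `ami-search-new` actually received it remains an open question that this sketch cannot answer.

```python
import re

# Multipliers for the common JVM size suffixes (case-insensitive).
_UNITS = {"": 1, "k": 1024, "m": 1024 ** 2, "g": 1024 ** 3}


def heap_option_bytes(option):
    """Convert a JVM heap flag such as '-Xmx2048m' into a byte count."""
    match = re.fullmatch(r"-Xm[sx](\d+)([kKmMgG]?)", option)
    if match is None:
        raise ValueError(f"not a -Xms/-Xmx option: {option!r}")
    number, unit = match.groups()
    return int(number) * _UNITS[unit.lower()]
```

For the flag quoted above, `heap_option_bytes("-Xmx2048m")` gives 2147483648 bytes, i.e. 2 GiB, the same as `-Xmx2g`.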

Please find attached the text document containing the list of PMC IDs.

Thanks
PMC_Ids.txt

@petermr
Owner

petermr commented Mar 11, 2019

I have run this on your PMC set but with an inbuilt dictionary. No crash:

pm286macbook:test pm286$ ami-search-new -p pmc/ --dictionary country

Generic values (AMISearchTool)
================================
basename            null
cproject            /Users/pm286/workspace/cmdev/normami/test/pmc
ctree               
cTreeList           138 trees [pmc/PMC1240683, pmc/PMC2216073, pmc/PMC3202864, p
dryrun              false
excludeBase         null
excludeTrees        null
file types          []
forceMake           false
includeBase         null
includeTrees        null
log4j               
logfile             null
verbose             0

Specific values (AMISearchTool)
================================
dictionaryList       [country]
dictionaryTop        null
dictionarySuffix     [xml]
ignorePlugins        []

cProject: pmc
0    [main] DEBUG org.contentmine.ami.plugins.CommandProcessor  - running NORMA -i fulltext.xml -o scholarly.html --transform nlm2html --project pmc

running: word; word([frequencies])[{xpath:@count>20}, {w.stopwords:pmcstop.txt stopwords.txt}
running: search; search([country])[]..........................................
create data tables
rrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrpm286macbook:test pm286$ 

What is in your dictionary? I think the problem may be there. Can you rerun my example and see if you get a crash?

Please also post the command line(s) you used.
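
To answer "what is in your dictionary?" without pasting the whole file, a short Python sketch like the one below can confirm that the XML parses and report how many elements of each kind it contains. It deliberately assumes nothing about the AMI dictionary schema and just counts whatever tags are present. `summarize_xml` is a made-up name for this illustration.

```python
from collections import Counter
import xml.etree.ElementTree as ET


def summarize_xml(path):
    """Parse an XML file and return (root_tag, Counter of all element tags).

    ET.parse raises xml.etree.ElementTree.ParseError if the file is not
    well-formed, which is itself useful information for a bug report.
    """
    root = ET.parse(path).getroot()
    # root.iter() walks the root element and all of its descendants.
    return root.tag, Counter(element.tag for element in root.iter())
```

Posting the returned root tag and counts alongside the command line gives others a compact picture of the dictionary's size and shape.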

@Rahul1711arora
Author

I reran your example but again got the same crash.

C:\bin>ami-search-new -p C:\Users\Rahul\Documents\New_book_chapter\ --dictionary country

Generic values (AMISearchTool)

basename null
cproject C:\Users\Rahul\Documents\New_book_chapter
ctree
cTreeList 138 trees [C:\Users\Rahul\Documents\New_book_chapter\PMC1240
dryrun false
excludeBase null
excludeTrees null
file types []
forceMake false
includeBase null
includeTrees null
log4j
logfile null
verbose 0

Specific values (AMISearchTool)

dictionaryList [country]
dictionaryTop null
dictionarySuffix [xml]
ignorePlugins []

cProject: New_book_chapter

running: word; word([frequencies])[{xpath:@count>20}, {w.stopwords:pmcstop.txt stopwords.txt}].............Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
at java.util.Arrays.copyOf(Unknown Source)
at java.lang.AbstractStringBuilder.ensureCapacityInternal(Unknown Source)
at java.lang.AbstractStringBuilder.append(Unknown Source)
at java.lang.StringBuffer.append(Unknown Source)
at nu.xom.Element.writeStartTag(Unknown Source)
at nu.xom.Element.toXML(Unknown Source)
at org.contentmine.graphics.html.HtmlFactory.parseLegacyHtmlToWellFormedXML(HtmlFactory.java:730)
at org.contentmine.graphics.html.HtmlFactory.parse(HtmlFactory.java:643)
at org.contentmine.graphics.html.HtmlFactory.parse(HtmlFactory.java:622)
at org.contentmine.cproject.args.DefaultArgProcessor.getScholarlyHtmlElement(DefaultArgProcessor.java:1382)
at org.contentmine.cproject.files.CTree.ensureScholarlyHtmlElement(CTree.java:1239)
at org.contentmine.cproject.args.DefaultArgProcessor.extractPSectionElements(DefaultArgProcessor.java:1365)
at org.contentmine.ami.plugins.AMIArgProcessor.ensureSectionElements(AMIArgProcessor.java:257)
at org.contentmine.ami.plugins.AMIArgProcessor.runRunMethodsOnChosenArgOptions(AMIArgProcessor.java:228)
at org.contentmine.cproject.args.DefaultArgProcessor.runAndOutput(DefaultArgProcessor.java:1296)
at org.contentmine.ami.plugins.word.WordPluginOption.run(WordPluginOption.java:36)
at org.contentmine.ami.plugins.CommandProcessor.runLegacyPluginOptions(CommandProcessor.java:301)
at org.contentmine.ami.tools.AMISearchTool.runLegacyCommandProcessor(AMISearchTool.java:128)
at org.contentmine.ami.tools.AMISearchTool.runSearch(AMISearchTool.java:112)
at org.contentmine.ami.tools.AMISearchTool.processProject(AMISearchTool.java:103)
at org.contentmine.ami.tools.AMISearchTool.runSpecifics(AMISearchTool.java:93)
at org.contentmine.ami.tools.AbstractAMITool.runCommands(AbstractAMITool.java:218)
at org.contentmine.ami.tools.AMISearchTool.main(AMISearchTool.java:75)

I have almost 50 terms in my dictionary. Is there a limit on how many terms one can add to a dictionary?

@petermr
Owner

petermr commented Mar 12, 2019 via email

@Rahul1711arora
Author

Thank you very much! I reran the entire process with another set of files and it worked fine. But the files I was originally working with still crashed. No worries, I'll run the same on another machine. Thanks!

@petermr
Owner

petermr commented Mar 13, 2019 via email

@petermr
Owner

petermr commented Mar 13, 2019 via email
