Running dictionary command outputs error in JAVA memory #11

Open
Rahul1711arora opened this issue Mar 11, 2019 · 8 comments

@Rahul1711arora

Rahul1711arora commented Mar 11, 2019

Dear Prof. Peter,

The input for the getpapers was:
getpapers -q "((endophytic bacteria) AND (abiotic stress)) AND (PUB_TYPE:"Review" OR PUB_TYPE:"review-article")" -x -k 200 -o path\to\directory

This downloaded a total of 138 papers in XML format.

Next, I created a dictionary with around 50 terms, using the command:
ami-dictionary create --terms "many" "terms" "were" "created" --dictionary name of the dictionary --directory path\to\directory -outformats xml,json,html
After running this command, I searched the downloaded papers for the terms in my dictionary, to get the data table and the SVG diagrams.

The command I ran was:
ami-search-new -p path\to\files\inXML\format --dictionary path\to\my\dictionary
This normalized the XML to HTML. But after that, while the count command was running to calculate word frequencies, an error was thrown.
A screenshot of the error is attached.

Also, before this error was thrown, I got the tables for a test run, but unfortunately the SVG files were not created.

Could you please tell me how I can overcome this error?

The one fix I tried was changing the memory allocation for the JVM. I allocated 2 GB of memory to it to overcome the heap space error, but I couldn't find any other way around the problem.

I hope the error can be resolved soon so that I can start my work.

Best
Rahul
AMI_error

@petermr
Owner

petermr commented Mar 11, 2019

Can you indicate your operating system please? I assume it's a version of Windows because of the backslashes.

> Dear Prof. Peter,

No need to add names - the whole world can help with this :-)

> The input for the getpapers was:
> getpapers -q "((endophytic bacteria) AND (abiotic stress)) AND (PUB_TYPE:"Review" OR PUB_TYPE:"review-article")" -x -k 200 -o path\to\directory
>
> This downloaded a total of 138 papers in XML format.
>
> Next, I created a dictionary with around 50 terms, using the command:
> ami-dictionary create --terms "many" "terms" "were" "created" --dictionary name of the dictionary --directory path\to\directory -outformats xml,json,html
> After running this command, I searched the downloaded papers for the terms in my dictionary, to get the data table and the SVG diagrams.

^^^ History ^^^
You can omit this history - you only need the ami-search-new command.

> The command I ran was:
> ami-search-new -p path\to\files\inXML\format --dictionary path\to\my\dictionary
> This normalized the XML to HTML. But after that, while the count command was running to calculate word frequencies, an error was thrown.
> A screenshot of the error is attached.

Much better to include the actual text as it can be cut-and-pasted. Please repost the output as text.

> Also, before this error was thrown, I got the tables for a test run, but unfortunately the SVG files were not created.

SVG will only be formed after the search completes.

> The one fix I tried was changing the memory allocation for the JVM. I allocated 2 GB of memory to it to overcome the heap space error, but I couldn't find any other way around the problem.

Please give the exact command. Possible parameters are -Xms and -Xmx.

> I hope the error can be resolved soon so that I can start my work.

Open source projects cannot promise delivery dates, sorry.
===========
My guess is that there is a very large file causing problems. If you can post the PMCs as a list, I can download them and see if I get the same error.
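
A quick way to look for such a file before re-running anything is to list the biggest documents in the project. The Python sketch below is only an illustration and is not part of ami: the function name `largest_papers` is made up, and it assumes each paper sits in its own `PMC...` directory holding `fulltext.xml` and, once normalized, `scholarly.html`.

```python
from pathlib import Path

# Assumption: these are the per-paper files worth checking; adjust if the
# project layout differs.
CANDIDATE_FILES = ("fulltext.xml", "scholarly.html")


def largest_papers(project_dir, top=10):
    """Return (size_in_bytes, pmc_id, file_name) tuples, largest first."""
    sizes = []
    for tree in Path(project_dir).iterdir():
        # Skip anything that is not a per-paper PMC directory.
        if not (tree.is_dir() and tree.name.startswith("PMC")):
            continue
        for name in CANDIDATE_FILES:
            candidate = tree / name
            if candidate.is_file():
                sizes.append((candidate.stat().st_size, tree.name, name))
    return sorted(sizes, reverse=True)[:top]
```

Calling `largest_papers` on the directory that was passed to `-p` returns the ten largest files; an entry far bigger than the rest would be the first paper to move out of the project and test separately.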

@Rahul1711arora
Author

Hi,

Yes, the OS is Windows 10, version 10.0.17134.
The command and its output are as follows:

C:\bin>ami-search-new -p C:\Users\Rahul\Documents\New_book_chapter --dictionary C:\Users\Rahul\Documents\New_book_chapter\stress_and_bacteria.xml

Generic values (AMISearchTool)

basename null
cproject C:\Users\Rahul\Documents\New_book_chapter
ctree
cTreeList 138 trees [C:\Users\Rahul\Documents\New_book_chapter\PMC1240
dryrun false
excludeBase null
excludeTrees null
file types []
forceMake false
includeBase null
includeTrees null
log4j
logfile null
verbose 0

Specific values (AMISearchTool)

dictionaryList [C:\Users\Rahul\Documents\New_book_chapter\stress_and_bacteria.xml]
dictionaryTop null
dictionarySuffix [xml]
ignorePlugins []

cProject: New_book_chapter

running: word; word([frequencies])[{xpath:@count>20}, {w.stopwords:pmcstop.txt stopwords.txt}].............Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
at java.util.Arrays.copyOf(Unknown Source)
at java.lang.AbstractStringBuilder.ensureCapacityInternal(Unknown Source)
at java.lang.AbstractStringBuilder.append(Unknown Source)
at java.lang.StringBuffer.append(Unknown Source)
at nu.xom.Element.writeStartTag(Unknown Source)
at nu.xom.Element.toXML(Unknown Source)
at org.contentmine.graphics.html.HtmlFactory.parseLegacyHtmlToWellFormedXML(HtmlFactory.java:730)
at org.contentmine.graphics.html.HtmlFactory.parse(HtmlFactory.java:643)
at org.contentmine.graphics.html.HtmlFactory.parse(HtmlFactory.java:622)
at org.contentmine.cproject.args.DefaultArgProcessor.getScholarlyHtmlElement(DefaultArgProcessor.java:1382)
at org.contentmine.cproject.files.CTree.ensureScholarlyHtmlElement(CTree.java:1239)
at org.contentmine.cproject.args.DefaultArgProcessor.extractPSectionElements(DefaultArgProcessor.java:1365)
at org.contentmine.ami.plugins.AMIArgProcessor.ensureSectionElements(AMIArgProcessor.java:257)
at org.contentmine.ami.plugins.AMIArgProcessor.runRunMethodsOnChosenArgOptions(AMIArgProcessor.java:228)
at org.contentmine.cproject.args.DefaultArgProcessor.runAndOutput(DefaultArgProcessor.java:1296)
at org.contentmine.ami.plugins.word.WordPluginOption.run(WordPluginOption.java:36)
at org.contentmine.ami.plugins.CommandProcessor.runLegacyPluginOptions(CommandProcessor.java:301)
at org.contentmine.ami.tools.AMISearchTool.runLegacyCommandProcessor(AMISearchTool.java:128)
at org.contentmine.ami.tools.AMISearchTool.runSearch(AMISearchTool.java:112)
at org.contentmine.ami.tools.AMISearchTool.processProject(AMISearchTool.java:103)
at org.contentmine.ami.tools.AMISearchTool.runSpecifics(AMISearchTool.java:93)
at org.contentmine.ami.tools.AbstractAMITool.runCommands(AbstractAMITool.java:218)
at org.contentmine.ami.tools.AMISearchTool.main(AMISearchTool.java:75)

The exact option I used to allocate the memory was: -Xmx2048m
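
For reference, the JVM reads the number after -Xms or -Xmx as bytes unless it carries a k, m or g suffix, so -Xmx2048m requests a maximum heap of 2 GiB. The Python sketch below just spells out that arithmetic; the helper name `jvm_size_to_bytes` is made up for illustration and is not part of Java or ami.

```python
# Multipliers for the size suffixes accepted by -Xms/-Xmx (case-insensitive).
_SUFFIXES = {"k": 1024, "m": 1024 ** 2, "g": 1024 ** 3}


def jvm_size_to_bytes(option):
    """Convert an option such as '-Xmx2048m' or '-Xms512m' to a byte count."""
    if not option.startswith(("-Xmx", "-Xms")):
        raise ValueError(f"not a heap size option: {option!r}")
    value = option[4:]
    suffix = value[-1:].lower()
    if suffix in _SUFFIXES:
        return int(value[:-1]) * _SUFFIXES[suffix]
    return int(value)  # no suffix: the value is already in bytes


print(jvm_size_to_bytes("-Xmx2048m"))               # 2147483648
print(jvm_size_to_bytes("-Xmx2048m") / 1024 ** 3)   # 2.0
```

The option only has an effect if it reaches the JVM that actually runs ami-search-new; how the launcher passes it on is not shown in this thread.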

Please find attached the text document containing the list of PMCs.

Thanks
PMC_Ids.txt

@petermr
Owner

petermr commented Mar 11, 2019

I have run this on your PMC set but with an inbuilt dictionary. No crash:

pm286macbook:test pm286$ ami-search-new -p pmc/ --dictionary country

Generic values (AMISearchTool)
================================
basename            null
cproject            /Users/pm286/workspace/cmdev/normami/test/pmc
ctree               
cTreeList           138 trees [pmc/PMC1240683, pmc/PMC2216073, pmc/PMC3202864, p
dryrun              false
excludeBase         null
excludeTrees        null
file types          []
forceMake           false
includeBase         null
includeTrees        null
log4j               
logfile             null
verbose             0

Specific values (AMISearchTool)
================================
dictionaryList       [country]
dictionaryTop        null
dictionarySuffix     [xml]
ignorePlugins        []

cProject: pmc
0    [main] DEBUG org.contentmine.ami.plugins.CommandProcessor  - running NORMA -i fulltext.xml -o scholarly.html --transform nlm2html --project pmc
PMC1240683 .PMC2216073 PMC3202864 PMC3283951 PMC3355587 PMC3417362 PMC3497943 PMC3573209 PMC3604591 PMC3706808 PMC3707038 .PMC3728534 PMC3738838 PMC3775148 PMC3812866 PMC3815904 PMC3815906 PMC3820493 PMC3825493 PMC3836376 PMC3868918 .PMC3947992 PMC4022417 PMC4045152 PMC4163387 PMC4265070 PMC4265282 PMC4285135 PMC4285865 PMC4312627 PMC4318275 .PMC4333861 PMC4358370 PMC4377440 PMC4389352 PMC4413195 PMC4440916 PMC4479509 PMC4500914 PMC4512045 PMC4522733 .PMC4527079 PMC4550782 PMC4561359 PMC4563596 PMC4581282 PMC4585250 PMC4626563 PMC4632817 PMC4646962 PMC4729944 .PMC4748402 PMC4754410 PMC4778271 PMC4792885 PMC4801890 PMC4802167 PMC4811947 PMC4819777 PMC4844426 PMC4849068 .PMC4880627 PMC4885868 PMC4909795 PMC4917562 PMC4925718 PMC4938854 PMC4949542 PMC4988986 PMC5035732 PMC5035750 .PMC5043059 PMC5067414 PMC5069422 PMC5080360 PMC5085706 PMC5099148 PMC5116465 PMC5127157 PMC5156507 PMC5244474 .PMC5299014 PMC5299024 PMC5388769 PMC5395610 PMC5403934 PMC5532450 PMC5610682 PMC5660262 PMC5662797 PMC5671593 .PMC5686270 PMC5715960 PMC5741648 PMC5742157 PMC5744479 PMC5748579 PMC5748586 PMC5767233 PMC5786577 PMC5787091 .PMC5809494 PMC5811519 PMC5812248 PMC5818412 PMC5827301 PMC5870681 PMC5872327 PMC5923616 PMC5979581 PMC5981179 .PMC5994547 PMC5996133 PMC6027233 PMC6079243 PMC6092505 PMC6094092 PMC6110075 PMC6110341 PMC6111575 PMC6116750 .PMC6125355 PMC6132428 PMC6132541 PMC6164190 PMC6206271 PMC6218572 PMC6249440 PMC6273650 PMC6274040 PMC6277688 .PMC6289982 PMC6292962 PMC6308375 PMC6311197 PMC6313892 PMC6337347 PMC6359256 
running: word; word([frequencies])[{xpath:@count>20}, {w.stopwords:pmcstop.txt stopwords.txt}]PMC1240683 .PMC2216073 PMC3202864 PMC3283951 PMC3355587 PMC3417362 PMC3497943 PMC3573209 PMC3604591 PMC3706808 PMC3707038 .PMC3728534 PMC3738838 PMC3775148 PMC3812866 PMC3815904 PMC3815906 PMC3820493 PMC3825493 PMC3836376 PMC3868918 .PMC3947992 PMC4022417 PMC4045152 PMC4163387 PMC4265070 PMC4265282 PMC4285135 PMC4285865 PMC4312627 PMC4318275 .PMC4333861 PMC4358370 PMC4377440 PMC4389352 PMC4413195 PMC4440916 PMC4479509 PMC4500914 PMC4512045 PMC4522733 .PMC4527079 PMC4550782 PMC4561359 PMC4563596 PMC4581282 PMC4585250 PMC4626563 PMC4632817 PMC4646962 PMC4729944 .PMC4748402 PMC4754410 PMC4778271 PMC4792885 PMC4801890 PMC4802167 PMC4811947 PMC4819777 PMC4844426 PMC4849068 .PMC4880627 PMC4885868 PMC4909795 PMC4917562 PMC4925718 PMC4938854 PMC4949542 PMC4988986 PMC5035732 PMC5035750 .PMC5043059 PMC5067414 PMC5069422 PMC5080360 PMC5085706 PMC5099148 PMC5116465 PMC5127157 PMC5156507 PMC5244474 .PMC5299014 PMC5299024 PMC5388769 PMC5395610 PMC5403934 PMC5532450 PMC5610682 PMC5660262 PMC5662797 PMC5671593 .PMC5686270 PMC5715960 PMC5741648 PMC5742157 PMC5744479 PMC5748579 PMC5748586 PMC5767233 PMC5786577 PMC5787091 .PMC5809494 PMC5811519 PMC5812248 PMC5818412 PMC5827301 PMC5870681 PMC5872327 PMC5923616 PMC5979581 PMC5981179 .PMC5994547 PMC5996133 PMC6027233 PMC6079243 PMC6092505 PMC6094092 PMC6110075 PMC6110341 PMC6111575 PMC6116750 .PMC6125355 PMC6132428 PMC6132541 PMC6164190 PMC6206271 PMC6218572 PMC6249440 PMC6273650 PMC6274040 PMC6277688 .PMC6289982 PMC6292962 PMC6308375 PMC6311197 PMC6313892 PMC6337347 PMC6359256 ..........................................
running: search; search([country])[]..........................................
create data tables
rrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrpm286macbook:test pm286$ 

What is in your dictionary? I think the problem may be there. Can you rerun my example and see if you get a crash?

And please post the command line(s) you used.

@Rahul1711arora
Author

I reran your example but again got the same crash.

C:\bin>ami-search-new -p C:\Users\Rahul\Documents\New_book_chapter\ --dictionary country

Generic values (AMISearchTool)

basename null
cproject C:\Users\Rahul\Documents\New_book_chapter
ctree
cTreeList 138 trees [C:\Users\Rahul\Documents\New_book_chapter\PMC1240
dryrun false
excludeBase null
excludeTrees null
file types []
forceMake false
includeBase null
includeTrees null
log4j
logfile null
verbose 0

Specific values (AMISearchTool)

dictionaryList [country]
dictionaryTop null
dictionarySuffix [xml]
ignorePlugins []

cProject: New_book_chapter

running: word; word([frequencies])[{xpath:@count>20}, {w.stopwords:pmcstop.txt stopwords.txt}].............Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
at java.util.Arrays.copyOf(Unknown Source)
at java.lang.AbstractStringBuilder.ensureCapacityInternal(Unknown Source)
at java.lang.AbstractStringBuilder.append(Unknown Source)
at java.lang.StringBuffer.append(Unknown Source)
at nu.xom.Element.writeStartTag(Unknown Source)
at nu.xom.Element.toXML(Unknown Source)
at org.contentmine.graphics.html.HtmlFactory.parseLegacyHtmlToWellFormedXML(HtmlFactory.java:730)
at org.contentmine.graphics.html.HtmlFactory.parse(HtmlFactory.java:643)
at org.contentmine.graphics.html.HtmlFactory.parse(HtmlFactory.java:622)
at org.contentmine.cproject.args.DefaultArgProcessor.getScholarlyHtmlElement(DefaultArgProcessor.java:1382)
at org.contentmine.cproject.files.CTree.ensureScholarlyHtmlElement(CTree.java:1239)
at org.contentmine.cproject.args.DefaultArgProcessor.extractPSectionElements(DefaultArgProcessor.java:1365)
at org.contentmine.ami.plugins.AMIArgProcessor.ensureSectionElements(AMIArgProcessor.java:257)
at org.contentmine.ami.plugins.AMIArgProcessor.runRunMethodsOnChosenArgOptions(AMIArgProcessor.java:228)
at org.contentmine.cproject.args.DefaultArgProcessor.runAndOutput(DefaultArgProcessor.java:1296)
at org.contentmine.ami.plugins.word.WordPluginOption.run(WordPluginOption.java:36)
at org.contentmine.ami.plugins.CommandProcessor.runLegacyPluginOptions(CommandProcessor.java:301)
at org.contentmine.ami.tools.AMISearchTool.runLegacyCommandProcessor(AMISearchTool.java:128)
at org.contentmine.ami.tools.AMISearchTool.runSearch(AMISearchTool.java:112)
at org.contentmine.ami.tools.AMISearchTool.processProject(AMISearchTool.java:103)
at org.contentmine.ami.tools.AMISearchTool.runSpecifics(AMISearchTool.java:93)
at org.contentmine.ami.tools.AbstractAMITool.runCommands(AbstractAMITool.java:218)
at org.contentmine.ami.tools.AMISearchTool.main(AMISearchTool.java:75)

I have almost 50 terms in my dictionary. Is there a limit on how many terms one can add to a custom dictionary?

@petermr
Owner

petermr commented Mar 12, 2019 via email

@Rahul1711arora
Author

Thank you very much! I reran the entire process with another set of files and it worked fine. But the files I was originally working with still crashed. No worries, I'll run them on another machine. Thanks!

@petermr
Owner

petermr commented Mar 13, 2019 via email

@petermr
Owner

petermr commented Mar 13, 2019 via email
