Suppress INFO messages? #3

Open
FinnWoelm opened this issue Nov 8, 2018 · 5 comments

@FinnWoelm

Hi there,

First of all: Thank you for forking Yomu and bringing it back alive. Absolutely amazing work.

Second: Any idea on how I might suppress INFO messages from showing up? These occur when I'm parsing a PDF document. My Rails logger is set to warning, but I'm guessing these show because they're coming directly from Apache Tika.

INFO  To get higher rendering speed on JDK8 or later,
INFO    use the option -Dsun.java2d.cmm=sun.java2d.cmm.kcms.KcmsServiceProvider
INFO    or call System.setProperty("sun.java2d.cmm", "sun.java2d.cmm.kcms.KcmsServiceProvider")

Cheers,
Finn

@abrom
Owner

abrom commented Nov 9, 2018

Yeah, unfortunately that's coming from PDFBox - as used by Tika (due to a change in Java 8 where the default is to use LittleCMS instead of KCMS). According to their own documentation:

KCMS is the unmaintained, legacy provider and is far faster than the newer replacement.
However, there are stability and security risks with using the unmaintained legacy provider.

So why they feel it necessary to spout all of that 'information' about it is beyond me.

The info itself is coming from:
https://github.com/apache/pdfbox/blob/f83bcc1fe60502759024a3b51983b29c7de66327/pdfbox/src/main/java/org/apache/pdfbox/rendering/PDFRenderer.java#L394

I did look into it a while back, and as far as I could tell there wasn't really a nice way to suppress this info (and not end up suppressing ALL info). I've just been putting up with it.

If you feel the need, you can overload the config for the pdfbox logger and pipe it to somewhere else.

@FinnWoelm
Author

Thanks for the fast reply! Much appreciated!

Since you've used Apache Tika much longer than I have: do you think I would lose anything important if I filtered out all 'INFO' messages from the return value of io.read?

I would imagine any issues of crucial concern would have an ERROR or WARNING status. It could even become a Henkei setting, e.g. Henkei.log_info = true/false.
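
A minimal sketch of that filtering idea, assuming the log lines always start with the word INFO followed by whitespace, as in the output above. `strip_info_lines` is a hypothetical helper for illustration, not part of Henkei:

```ruby
# Hypothetical helper, not part of Henkei: drops every line that starts
# with the literal word INFO followed by whitespace.
# Caveat: a line of real document text that starts the same way
# would be dropped as well.
INFO_LINE = /\AINFO\s/

def strip_info_lines(output)
  # each_line keeps the trailing newlines, so join restores the text as-is.
  output.each_line.reject { |line| line.match?(INFO_LINE) }.join
end

raw = <<~OUT
  INFO  To get higher rendering speed on JDK8 or later,
  WARN  something else
  Extracted text
OUT

puts strip_info_lines(raw)
```

Matching only at the start of a line keeps the filter narrow, but it cannot tell a log line from document text that happens to begin with INFO.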

@abrom
Owner

abrom commented Nov 14, 2018

Hmm, that sounds a bit dangerous (i.e. you could filter out non-info things you didn't mean to). I would think the more reliable solution would be to overload the config for the pdfbox logger and simply change its log level.

@FinnWoelm
Author

Hmmm, fair enough.

I'm pretty unfamiliar with Java, that's why I tried to avoid having to touch the pdfbox logger config 😅 Is that something I would do in jar/tika-config.xml?

@abrom
Owner

abrom commented Nov 15, 2018

It's been a while since I've looked at Java.

The pdfbox library uses the Apache Commons Logging library so I think that'd be the place to start: https://commons.apache.org/proper/commons-logging/guide.html#Quick_Start

It appears to be more of a wrapper for other logging systems, and I have no idea which one would actually be in use. It seems to depend on what you have installed.
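
As a starting point only, here is a sketch of the JVM flags that would raise the log level for the pdfbox logger, assuming Commons Logging falls back to (or is forced onto) its built-in SimpleLog backend. If the bundled Tika jar routes logging through a different backend such as Log4j, these flags would have no effect. `simplelog_flags` is a hypothetical helper, and how to pass the flags to the `java` process that Henkei starts is not covered here:

```ruby
# Sketch only: builds JVM system-property flags that Apache Commons Logging's
# SimpleLog backend reads. Assumption: SimpleLog is the active backend.
SIMPLELOG_PREFIX = "org.apache.commons.logging.simplelog"

# Hypothetical helper, not part of Henkei or Tika.
def simplelog_flags(logger_name, level)
  [
    # Ask Commons Logging to use its built-in SimpleLog implementation.
    "-Dorg.apache.commons.logging.Log=org.apache.commons.logging.impl.SimpleLog",
    # Set the threshold for this logger name (SimpleLog also applies it
    # to loggers nested under that name).
    "-D#{SIMPLELOG_PREFIX}.log.#{logger_name}=#{level}"
  ]
end

puts simplelog_flags("org.apache.pdfbox", "warn")
```

Scoping the level to `org.apache.pdfbox` leaves INFO output from other loggers alone, which avoids suppressing all info messages.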
