Using the PDF pre-processor #117

Open
nevvermind opened this issue Mar 19, 2019 · 13 comments

Comments

@nevvermind

Firstly, thanks for your effort. It's a cool setup you've done here.

And thanks @xf0e for the PR.

Unfortunately, it seems that one of the Docker images on Docker Hub is out of date: https://hub.docker.com/r/tleyden5iwx/open-ocr-preprocessor. It doesn't seem to contain the PDF pre-processor code added by @xf0e's PR.

$ docker images
REPOSITORY                          TAG                 IMAGE ID            CREATED             SIZE
busybox                             latest              d8233ab899d4        4 weeks ago         1.2MB
tleyden5iwx/open-ocr-2              latest              7b3add377eb6        3 months ago        845MB
rabbitmq                            3.6.5-management    5335b737c380        2 years ago         179MB
tleyden5iwx/open-ocr-preprocessor   latest              00a689ddd4f8        4 years ago         1.31GB

The preprocessor_rpc_worker.go file in the built image differs from the one on the master branch.

docker exec docker-compose_strokewidthtransform_1 cat /opt/go/src/github.com/tleyden/open-ocr/preprocessor_rpc_worker.go

I'm assuming that's why the No preprocessor found for: "convert-pdf" error happens: #108 (comment)
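
One way to confirm the mismatch is to diff the file inside the running container against a local checkout of master. This is only a sketch: the `compare_with_checkout` helper and the `~/src/open-ocr` path are hypothetical names introduced here, and it assumes `docker` and `diff` are available on the host.

```shell
#!/bin/sh
# Hypothetical helper (not part of open-ocr): diff a file inside a running
# container against a reference copy on the host.
# Usage: compare_with_checkout CONTAINER PATH_IN_CONTAINER REFERENCE_FILE
# Exit status is diff's: 0 when identical, non-zero otherwise.
compare_with_checkout() {
  container="$1"
  path_in_container="$2"
  reference_file="$3"
  # Stream the container's copy to diff on stdin ("-") and compare.
  docker exec "$container" cat "$path_in_container" | diff -u - "$reference_file"
}

# Example call, assuming a checkout of master at ~/src/open-ocr (assumed path):
# compare_with_checkout docker-compose_strokewidthtransform_1 \
#   /opt/go/src/github.com/tleyden/open-ocr/preprocessor_rpc_worker.go \
#   ~/src/open-ocr/preprocessor_rpc_worker.go
```

Any output from the diff means the image was built from different source than the checkout.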

Would you mind checking if the image is up to date with the code, @tleyden, when you've got some time?

@tleyden
Owner

tleyden commented Mar 19, 2019

"It doesn't seem to contain the PDF pre-processor code added by @xf0e's PR."

Thanks for tracking that down! I will update this and post to this issue.

@nevvermind
Author

Hi, @tleyden. Any news on the new build?

@tleyden
Owner

tleyden commented Apr 1, 2019

Hey, sorry I haven't had a chance yet. I will do so soon.

@tleyden
Owner

tleyden commented Apr 1, 2019

@nevvermind
Author

Thanks, @tleyden.

I can't access the above, but these instead:

It seems there's no new build for the latter. Should there be?

@tleyden
Owner

tleyden commented Apr 2, 2019

The build failed with: https://gist.github.com/tleyden/7de181727c5ab843c87654c8054c6843

I haven't looked closely into it yet. If you have any ideas, please post!

@nevvermind
Author

Looks related to streadway/amqp#291

Do you need to bump the Go version maybe?

@tleyden
Owner

tleyden commented Apr 5, 2019

Thanks @nevvermind! Seems likely.

I'm adding a ticket to use go modules.

@nevvermind
Author

Hmm, that could take a while, am I right?
I'm wondering how @xf0e built it without bumping the Go version.

@xf0e
Contributor

xf0e commented Apr 10, 2019

@nevvermind I always try to use the most recent Go version. This is why I never encountered this problem.

@tleyden
Owner

tleyden commented Apr 11, 2019

@nevvermind yeah, I don't think switching to go modules needs to block this...

quick fix: bump go version in dockerfile build
deep fix: use go modules so this doesn't keep happening
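
For the deep fix, here is a rough sketch of what adopting Go modules could look like. The `enable_go_modules` helper is hypothetical, and it assumes a toolchain with module support (Go 1.11 or newer) in the build environment. `go mod init` and `go mod tidy` are standard `go` subcommands, but the layout the project would actually end up with is not decided in this thread.

```shell
# Hypothetical helper: turn a source directory into a Go module.
# Usage: enable_go_modules SRC_DIR MODULE_PATH
enable_go_modules() {
  src_dir="$1"
  module_path="$2"
  (
    cd "$src_dir" || exit 1
    # Force module mode even if the source lives inside GOPATH
    # (needed on older toolchains, harmless on newer ones).
    export GO111MODULE=on
    # Create go.mod only if the directory is not already a module.
    [ -f go.mod ] || go mod init "$module_path" || exit 1
    # Record the dependencies the code imports in go.mod and go.sum.
    go mod tidy
  )
}

# Example (module path taken from the source path mentioned in this issue):
# enable_go_modules /opt/go/src/github.com/tleyden/open-ocr github.com/tleyden/open-ocr
```

Committing the resulting go.mod and go.sum pins dependency versions, so a build no longer picks up whatever upstream happens to publish. It does not remove the need for a Go version recent enough to support modules in the first place.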

@darmanovic

Maybe I'm missing the point, but I updated the Go version to 1.12 and the problem persists.

@tleyden
Owner

tleyden commented Apr 12, 2019

@darmanovic you're probably not talking about the go version in the container, which is what matters at build time.

I think the underlying problem is this:

https://github.com/tleyden/docker/blob/master/stroke-width-transform/Dockerfile#L2

FROM ubuntu:14.04

It's using an ancient ubuntu, and therefore an ancient Go version.
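
To check which Go toolchain an image actually carries, something along these lines may help. It is a sketch under assumptions: `go_minor` and `image_go_at_least` are hypothetical helper names, it assumes the `go` binary is on the image's PATH, and the minimum minor version is whatever you pass in (12 below is just an illustration, not a confirmed requirement).

```shell
# Read `go version` output on stdin and print the minor version number,
# e.g. "go version go1.12.4 linux/amd64" -> 12. Only handles Go 1.x.
go_minor() {
  sed -n 's/^go version go1\.\([0-9][0-9]*\).*/\1/p'
}

# Succeed if the Go toolchain in IMAGE is at least 1.MIN_MINOR.
# Assumes `go` is on PATH inside the image (an assumption, not verified).
# Usage: image_go_at_least IMAGE MIN_MINOR
image_go_at_least() {
  image="$1"
  min_minor="$2"
  minor=$(docker run --rm --entrypoint go "$image" version | go_minor)
  [ -n "$minor" ] && [ "$minor" -ge "$min_minor" ]
}

# Example:
# image_go_at_least tleyden5iwx/open-ocr-preprocessor 12 && echo "new enough"
```

Running the check against the image, rather than the host, matters because the Go version inside the container is the one used at build time.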

sfcodes added a commit to sfcodes/docker that referenced this issue Aug 16, 2019
This is a major update to the open-ocr-preprocessor Dockerfile. It updates to the latest Ubuntu LTS, the latest Go, and the latest available libraries, which in turn enables building PDF pre-processor support.

Previously, PDF support did not work because the Go version was too outdated and the build failed. This is documented in tleyden/open-ocr#117

To shrink the image size I used a two-stage build: the first stage installs the many dependencies necessary for the build, but the resulting image includes only the few dependencies required at runtime. This shrinks the image from 1.72GB to 310MB.

Finally, I eliminated the underlying stroke-width-transform image as it didn't really make sense here anymore; this new image supports both stroke-width-transform and convert-pdf.
tleyden pushed a commit to tleyden/docker that referenced this issue Sep 4, 2019