Commit

[RELEASE] iText 7 pdfOcr - 1.0.0
https://git.itextsupport.com/

* release/1.0.0:
  [RELEASE] 1.0.0-SNAPSHOT -> 1.0.0
  Hide possibility to set userWords
  Refactor MultiThreadingTest test to reuse code from IntegrationTestHelper
  Make the scope of a method stricter
  A couple of small fixes to remove workarounds from code
  Allow tesseract4 events from com.itextpdf.pdfocr space
  Implement pdfOcr licensing
  Create jar with sources in Maven
  User words file is unexpectedly removed from disk
  Increase test timeouts
  Deploy jar with sources to Artifactory as the add-on is open source
  Make pdfOcr classes autoportable
  Remove licensekey version property from root pom file
  Fix issue with saving processed images
  Performance drop on some complex halftone images
  Improve artifact descriptions
  Improve Javadocs for Tesseract implementations
  Small fix to avoid inner class in .NET
  Change in Jenkinsfile to abort possible already running automatic builds
  Change in Jenkinsfile so that the automatic build is only blocked when the build for itextcore for Java is running
  Hide AbstractTesseract4OcrEngine#doTesseractOcr(File , List<File>, OutputFormat, int)
  AbstractIntegrationTest#testSimpleTextOutput is triggered 13 times PDFOC-89
  Add copyright headers
  Add license information
  Fix several Javadoc and code remarks  PDFOC-84
  Throw proper exceptions in case the Tesseract prerequisites have possibly not been met
  Add FontProvider mechanism PDFOC-73
  Update .mailmap
  Improve test coverage
  Add ActualText if there are NotDef glyphs
  Introduce an option not to add layers to output PDF file PDFOC-74
  Move NOTICE.txt to another directory
  Update log message
  Update comments
  Update command structure for executable
  Fix remarks related to TesseractOcrUtil class and add check for NOTDEF glyphs
  Update target branch for sonar
  Fix various code remarks
  Fix various code and API design remarks
  Split to two modules
  Change name of root artifact
  Remove clirr-maven-plugin
  Improve test coverage
  Split to two modules
  Fix for SonarQube analysis
  PDFOC-65 Fix various code remarks in test code
  PDFOC-65 Fix various code remarks in test code
  Set user_defined_dpi
  Fix various code and API design remarks
  Fix various code and API design remarks
  On Linux the VM crashes at times to build the Java version of pdfOCR
  Remove vulnerable dependency
  On Linux the VM crashes at times to build the Java version of pdfOCR
  Fix various code remarks
  Build only on windows until PDFOC-68 is fixed
  Add category to tests
  Refactor test to junit ExpectedException
  Add test for invalid font
  Fix javadoc issues
  Fix code style for enums
  Rename test files
  Refactor ocr images method and remove ImgFormat enum
  Add license info for fonts
  Refactor exceptions and log messages
  Remove unused method
  Add tests for log messages
  Remove unused test files
  Remove commented code
  Change Jsoup to styled-xml-parser and fix according to review
  Refactoring for porting to .net
  Update test files
  Fix for user words
  Update dependencies
  Update tests
  Fix text positioning
  Add tests for PDFOC-31,32,33,34
  Refactor image preprocessing
  Add tests for ppm images
  Remove creating the sources jar from pom.xml
  Remove creating the sources jar from the Jenkinsfile
  Fix ocr for ppm images
  Fix Jenkinsfile: mvn workspace repository for Windows machines
  Check ppm on linux
  Fix for eng language
  Fix for tmp file in tmp directory
  Add default language with adding user-words
  Fix getting font path
  Fix for embedded font in jar
  Refactoring for porting to .net
  Add custom user words
  Fix wrong message in OCRException
  Add tests for text files
  Add tests for text file output
  Add possibility to OCR to a file + refactoring for multipage tiffs
  Small refactoring, add test for ppm images
  Fix for PNM images
  Fix for tif images
  Performance improvements of Jenkins builds
  Update text positioning PDFOC-18
  Add gitattributes
  Update default font
  Add .gitignore
  Add tests for path to hocr script
  Update preprocessing
  Fix tests
  Make path to tess data mandatory
  Add separator for tess data path
  Remove createPdfA3u parameter
  Update tests with transparent text
  Update TextInfo to public
  Change default text color to transparent
  Add new font for tests
  Add comments
  Add greek test
  Add missed test pdf
  Update compare tool test
  Replace few tests using compare tool
  Fix for tiff images
  Add preprocessing and fix tests
  Add logging for exceptions
  Move tests for lib
  Add tesseract lib and tests
  Update images coordinates calculation
  Update tesseract dir
  Add null check for imagedata
  Update scale mode tests
  Update default scale mode
  Update exception handling
  Add new test image
  Add empty text test
  Update tests and code style
  Add basic exception handling and cosmetic refactoring
  Add placeholder in case of corrupted images
  Update tests with compare tool
  Refactoring according to the checkstyle plugin checks
  Add japicmp plugin
  Clean up dependencies
  Fix logging lib
  Add tests for tiff
  Update tests for new tess data files
  Add tests for scripts
  Update directories structure and add tests for languages
  Update exception handling
  Update temp filenames in tests
  Add tests for pdfa3u
  Add tests and update structure
  Add extended tests using compare tool
  first approach
iText-CI committed Jun 25, 2020
2 parents 65582e2 + 306e56c commit f271a87
Showing 185 changed files with 14,404 additions and 82 deletions.
62 changes: 62 additions & 0 deletions .gitattributes
@@ -0,0 +1,62 @@
# Set the default behavior, in case people don't have core.autocrlf set.
* text=auto

# Explicitly declare text files you want to always be normalized and converted
# to LF line endings on checkout.
*.afm text eol=lf
*.cmap text eol=lf
*.cs text eol=lf ident
*.css text eol=lf
*.htm text eol=lf
*.html text eol=lf
*.java text eol=lf ident
*.lng text eol=lf
*.md text eol=lf
*.pom text eol=lf
*.properties text eol=lf
*.svg text eol=lf
*.txt text eol=lf
*.xfdf text eol=lf
*.xht text eol=lf
*.xhtml text eol=lf
*.xml text eol=lf
port-hash text eol=lf

# Declare files that will always have CRLF line endings on checkout.
*.bat text eol=crlf
*.csproj text eol=crlf
*.sln text eol=crlf

# Denote all files that are truly binary and should not be modified.
*.aif binary
*.aiff binary
*.bmp binary
*.cer binary
*.cmp binary
*.crt binary
*.dib binary
*.gif binary
*.icc binary
*.j2k binary
*.jb2 binary
*.jp2 binary
*.jpc binary
*.jpg binary
*.key binary
*.otf binary
*.p12 binary
*.pdf binary
*.pfb binary
*.pfm binary
*.png binary
*.snd binary
*.tif binary
*.tiff binary
*.ttc binary
*.ttf binary
*.u3d binary
*.wav binary
*.wmf binary
*.woff binary
*.woff2 binary
*.dat binary
157 changes: 157 additions & 0 deletions .gitignore
@@ -0,0 +1,157 @@
# Created by https://www.gitignore.io

### Java ###
*.class

# Mobile Tools for Java (J2ME)
.mtj.tmp/

# Package Files #
*.jar
*.war
*.ear

# virtual machine crash logs, see http://www.java.com/en/download/help/error_hotspot.xml
hs_err_pid*


### Eclipse ###
*.pydevproject
.metadata
.gradle
bin/
tmp/
*.tmp
*.bak
*.swp
*~.nib
local.properties
.settings/
.loadpath

# Eclipse Core
.project

# External tool builders
.externalToolBuilders/

# Locally stored "Eclipse launch configurations"
*.launch

# CDT-specific
.cproject

# JDT-specific (Eclipse Java Development Tools)
.classpath

# PDT-specific
.buildpath

# sbteclipse plugin
.target

# TeXlipse plugin
.texlipse


### Intellij ###
# Covers JetBrains IDEs: IntelliJ, RubyMine, PhpStorm, AppCode, PyCharm

*.iml

## Directory-based project format:
.idea/
# if you remove the above rule, at least ignore the following:

# User-specific stuff:
# .idea/workspace.xml
# .idea/tasks.xml
# .idea/dictionaries

# Sensitive or high-churn files:
# .idea/dataSources.ids
# .idea/dataSources.xml
# .idea/sqlDataSources.xml
# .idea/dynamic.xml
# .idea/uiDesigner.xml

# Gradle:
# .idea/gradle.xml
# .idea/libraries

# Mongo Explorer plugin:
# .idea/mongoSettings.xml

## File-based project format:
*.ipr
*.iws

## Plugin-specific files:

# IntelliJ
out/

# mpeltonen/sbt-idea plugin
.idea_modules/

# JIRA plugin
atlassian-ide-plugin.xml

# Crashlytics plugin (for Android Studio and IntelliJ)
com_crashlytics_export_strings.xml
crashlytics.properties
crashlytics-build.properties


### NetBeans ###
nbproject/private/
build/
nbbuild/
dist/
nbdist/
nbactions.xml
nb-configuration.xml
.nb-gradle/


### Linux ###
*~

# KDE directory preferences
.directory

# Linux trash folder which might appear on any partition or disk
.Trash-*


### Windows ###
# Windows image file caches
Thumbs.db
ehthumbs.db

# Folder config file
Desktop.ini

# Recycle Bin used on file shares
$RECYCLE.BIN/

# Windows Installer files
*.cab
*.msi
*.msm
*.msp

# Windows shortcuts
*.lnk

target/
nbactions*.xml
.checkstyle
.pmd
.pmdruleset.xml

# Ignore generated files
*.log

.vagrant/
.vscode/
79 changes: 79 additions & 0 deletions .mailmap
@@ -0,0 +1,79 @@
Alan Goo <[email protected]> <[email protected]>
Alexander Chingarev <[email protected]> <[email protected]>
Alexander Chingarev <[email protected]> <[email protected]>
Alexander Chingarev <[email protected]> <[email protected]>
Alexey Subach <[email protected]> <[email protected]>
Alexey Subach <[email protected]> <[email protected]>
Amedee Van Gasse <[email protected]> <[email protected]>
Amedee Van Gasse <[email protected]> <[email protected]>
Andrew Panfilov <[email protected]> <[email protected]>
Bart De Meyer <[email protected]> <[email protected]>
Benoît Lagae <[email protected]> <benoit@iText-blagae>
Benoît Lagae <[email protected]> <[email protected]>
Benoît Lagae <[email protected]> <[email protected]>
Bruno Lowagie <[email protected]> <[email protected]>
Bruno Lowagie <[email protected]> <[email protected]>
Bruno Lowagie <[email protected]> <iText@Catullus>
Bryan <[email protected]> <[email protected]>
Dimitry Alexandrov <[email protected]> <[email protected]>
Dmitry Trusevich <[email protected]> <[email protected]>
Dmitry Trusevich <dmitry.trusevich@duallab> <dmitry.trusevich@duallab>
Dominik Helm <[email protected]> <[email protected]>
gothinkfree <[email protected]> <[email protected]>
Ilya Idamkin <[email protected]> <ilya.idamkin@TeamCity>
iText Software <[email protected]> <[email protected]>
iText Software <[email protected]> <[email protected]>
iText Software <[email protected]> <[email protected]>
iText Software <[email protected]> <[email protected]>
iText Software <[email protected]> <[email protected]>
iText Software <[email protected]> <teamcity.bot@TeamCity>
iText Software <[email protected]> <[email protected]>
iText Software <[email protected]> <[email protected]>
Jeff Monson <[email protected]> <[email protected]>
Joris Schellekens <[email protected]> <[email protected]>
Kevin Day <[email protected]> <[email protected]>
Kevin Day <[email protected]> <[email protected]>
Kevin Willems <[email protected]> <[email protected]>
LaughingMan <[email protected]> <[email protected]>
Markus Wernig <[email protected]> <[email protected]>
Marvin Wichmann <[email protected]> <[email protected]>
Marvin Wichmann <[email protected]> <[email protected]>
Marvin Wichmann <[email protected]> <[email protected]>
Michaël Demey <[email protected]> michael.demey <>
Michaël Demey <[email protected]> <[email protected]>
Michaël Demey <[email protected]> <michael.demey@TeamCity>
Michaël Demey <[email protected]> <[email protected]>
Michael Glazunoff <[email protected]> <[email protected]>
Michael Klink <[email protected]> <[email protected]>
Michael Klink <[email protected]> <[email protected]>
Nadia Ivaniukovich <[email protected]> <[email protected]>
Nadia Ivaniukovich <[email protected]> <[email protected]>
Nadja Sych <[email protected]> <[email protected]>
Natalia Zgirovskaya <[email protected]> <[email protected]>
Natalia Zgirovskaya <[email protected]> <[email protected]>
Olivier Blaise <[email protected]> <[email protected]>
Orabi Nakhla <[email protected]> <[email protected]>
Orabi Nakhla <[email protected]> <[email protected]>
Paulo Soares <[email protected]> <[email protected]>
Paulo Soares <[email protected]> <[email protected]>
Pavel Alay <[email protected]> pavel.alay <>
Pavel Alay <[email protected]> <[email protected]>
Pavel Alay <[email protected]> <pavel.alay@TeamCity>
Pavel Morozov <[email protected]> <[email protected]>
Pavel Morozov <[email protected]> <[email protected]>
Peter Goodman <[email protected]> <[email protected]>
Peter Goodman <[email protected]> <[email protected]>
Peter Goodman <[email protected]> <[email protected]>
Peter Kjuak <[email protected]> <[email protected]>
Richard Schwark <[email protected]> <[email protected]>
Roman Leonov <[email protected]> <[email protected]>
Roman Nadvodny <[email protected]> <[email protected]>
Sasha Kalykhan <[email protected]> <[email protected]>
Sasha Kalykhan <[email protected]> <[email protected]>
Semen Yakushev <[email protected]> <[email protected]>
Valera <[email protected]> <[email protected]>
Veronika Lisovskaya <[email protected]> <veronika.lisovskaya@TeamCity>
Vit Nemecky <[email protected]> <[email protected]>
Yanina Cheremisina <[email protected]> <[email protected]>
Yulian Gaponenko <[email protected]> <duallab@DESKTOP-PG4L5J1>
Yulian Gaponenko <[email protected]> <yulian.gaponenko@TeamCity>