nexedi / slapos / Merge requests / !985

OCR: Tesseract 4.1.1 / Ghostscript 9.54.0

Merged. Jérome Perrin requested to merge feat/tesseract-version-up into master on May 20, 2021.

With Tesseract v4.0.0-beta.3, we often observe crashes with:

contains_unichar_id(unichar_id):Error:Assert failed:in file ../../src/ccutil/unicharset.h, line 511

This seems to have been fixed by https://github.com/tesseract-ocr/tesseract/pull/1954

Still, even after updating to 4.1.1, text recognition from PDF in ERP5 is too expensive. This merge request therefore also updates Ghostscript to 9.54.0, because this version has built-in OCR that works on the PDF directly, without the PDF to PNG to TIFF conversion that ERP5 currently performs.
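As an illustration of the direct PDF route, here is a minimal sketch of calling Ghostscript's `ocr` output device from Python. It is not the ERP5 integration. The helper names `build_gs_ocr_command` and `ocr_pdf`, the file paths, and the default language and resolution are assumptions made for this example. It also assumes that the `gs` binary was built with OCR support and can locate the Tesseract trained data (for instance through `TESSDATA_PREFIX`).

```python
import subprocess
from typing import List


def build_gs_ocr_command(pdf_path: str, text_path: str,
                         language: str = "eng",
                         resolution: int = 300) -> List[str]:
    """Build a Ghostscript command line that OCRs a PDF to plain text.

    Hypothetical helper for illustration; the defaults are assumptions.
    """
    return [
        "gs",
        "-q", "-dBATCH", "-dNOPAUSE", "-dSAFER",
        "-sDEVICE=ocr",                # text output device backed by Tesseract
        f"-sOCRLanguage={language}",   # trained data to load, e.g. "eng"
        f"-r{resolution}",             # rendering resolution used before OCR
        f"-sOutputFile={text_path}",
        pdf_path,
    ]


def ocr_pdf(pdf_path: str, text_path: str, **options) -> str:
    """Run Ghostscript on the PDF and return the recognized text."""
    subprocess.run(build_gs_ocr_command(pdf_path, text_path, **options),
                   check=True)
    with open(text_path, encoding="utf-8") as f:
        return f.read()
```

A call such as `ocr_pdf("scan.pdf", "scan.txt", language="eng")` takes the PDF as input and writes the text in a single Ghostscript run, so the caller does not have to produce PNG or TIFF intermediates itself. Whether this is cheaper in practice than the current pipeline would have to be measured.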

Edited May 25, 2021 by Jérome Perrin
Source branch: feat/tesseract-version-up