TESSDATA_PREFIX Environment Variable set but it still cannot find it #663

Hakxsorus opened this issue Mar 5, 2024 · 1 comment

Hakxsorus commented Mar 5, 2024

Error

Error opening data file tessdata/eng.traineddata
Please make sure the TESSDATA_PREFIX environment variable is set to your "tessdata" directory.
Failed loading language 'eng'
Tesseract couldn't load any languages!

But I do have the environment variable set to the tessdata folder that contains eng.traineddata.

It only works if I call it from my project's publish directory, which the tessdata folder is published into.

Works

PS D:\Development\Blitz\Blitz\bin\Release\net7.0\win-x86\publish> blitz scan

Does Not Work

PS C:\Users\mdabr> blitz scan
Error opening data file tessdata/eng.traineddata
Please make sure the TESSDATA_PREFIX environment variable is set to your "tessdata" directory.
Failed loading language 'eng'
Tesseract couldn't load any languages!
Unhandled exception. Tesseract.TesseractException: Failed to initialise tesseract engine.. See https://github.com/charlesw/tesseract/wiki/Error-1 for details.
   at Tesseract.TesseractEngine.Initialise(String datapath, String language, EngineMode engineMode, IEnumerable`1 configFiles, IDictionary`2 initialValues, Boolean setOnlyNonDebugVariables)
   at Tesseract.TesseractEngine..ctor(String datapath, String language, EngineMode engineMode, IEnumerable`1 configFiles, IDictionary`2 initialOptions, Boolean setOnlyNonDebugVariables)
   at Tesseract.TesseractEngine..ctor(String datapath, String language, EngineMode engineMode)
   at Blitz.Program.RunScanMoronCommand(ScanMoronOptions opts) in D:\Development\Blitz\Blitz\Program.cs:line 156
   at Blitz.Program.<>c.<Main>b__1_3(ScanMoronOptions opts) in D:\Development\Blitz\Blitz\Program.cs:line 27
   at CommandLine.ParserResultExtensions.MapResult[T1,T2,T3,T4,T5,TResult](ParserResult`1 result, Func`2 parsedFunc1, Func`2 parsedFunc2, Func`2 parsedFunc3, Func`2 parsedFunc4, Func`2 parsedFunc5, Func`2 notParsedFunc)
   at Blitz.Program.Main(String[] args) in D:\Development\Blitz\Blitz\Program.cs:line 16
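
The error names the relative path tessdata/eng.traineddata, and the command only works from the publish directory, which suggests the data path is being resolved against the current working directory. One way to check that, sketched under that assumption: look for the model in a list of candidate folders and use the first one that actually contains it. `TessdataLocator`, `FindIn` and `FindDefault` are hypothetical names for illustration, not part of the Tesseract wrapper.

```csharp
using System;
using System.IO;

public static class TessdataLocator
{
    // Hypothetical helper: returns the full path of the first candidate folder
    // that contains <language>.traineddata, or null when none of them does.
    public static string? FindIn(string language, params string?[] candidates)
    {
        var fileName = language + ".traineddata";
        foreach (var dir in candidates)
        {
            if (string.IsNullOrWhiteSpace(dir)) continue; // skip unset entries
            var full = Path.GetFullPath(dir);
            if (File.Exists(Path.Combine(full, fileName))) return full;
        }
        return null;
    }

    // Assumed candidate order: the TESSDATA_PREFIX variable first, then a
    // tessdata folder next to the executable (not the working directory).
    public static string? FindDefault(string language) =>
        FindIn(language,
            Environment.GetEnvironmentVariable("TESSDATA_PREFIX"),
            Path.Combine(AppContext.BaseDirectory, "tessdata"));
}
```

The result of `TessdataLocator.FindDefault("eng")` is an absolute path, so passing it as the datapath argument should behave the same no matter which directory the command is run from. A null result means neither location holds the file.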

Hakxsorus commented Mar 15, 2024

Workaround

I will not close this issue since the error persists. However, for anyone experiencing similar problems, I have found a workaround.

The workaround is to create the tessdata directory programmatically and download eng.traineddata to a known location in the user's file system when the app initialises.

Note that this is for a production environment and only needs to be done once. Consider disabling this check for local debugging.

1. Get a known path (e.g. AppData)

Create the tessdata directory there.

private const string AppDataFolderName = "YourAppName";
private const string TessdataFolderName = "tessdata";

/// <summary>
/// Gets the path to Blitz's directory in the AppData folder.
/// </summary>
/// <returns>The application directory path.</returns>
public string GetAppDataFolderPath()
{
    var appDataPath = Environment.GetFolderPath(Environment.SpecialFolder.ApplicationData);
    return Path.Combine(appDataPath, AppDataFolderName);
}

/// <summary>
/// Gets the path to Blitz's tessdata directory.
/// </summary>
/// <returns>The tessdata path.</returns>
public string GetTessdataFolderPath()
{
    return Path.Combine(GetAppDataFolderPath(), TessdataFolderName);
}

2. Download tessdata/*.traineddata to that path

Make sure the directory exists before downloading.

/// <summary>
/// Downloads the Tesseract English language model to the tessdata folder.
/// </summary>
/// <param name="tessdataFolderPath">The path to the tessdata folder.</param>
private static async Task DownloadTrainedData(string tessdataFolderPath)
{
    const string tessdataEngFileName = "eng.traineddata";
    const string tessdataEngUrl = "https://github.com/tesseract-ocr/tessdata_fast/raw/main/eng.traineddata";

    // Create the tessdata folder if it is missing; FileStream does not create parent directories.
    Directory.CreateDirectory(tessdataFolderPath);

    using var client = new HttpClient();

    await using var stream = await client.GetStreamAsync(tessdataEngUrl);

    // FileMode.Create truncates an existing file, so a re-download cannot leave stale trailing bytes.
    await using var fs = new FileStream(Path.Combine(tessdataFolderPath, tessdataEngFileName),
        FileMode.Create);

    await stream.CopyToAsync(fs);
}
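
Since the download only needs to happen once, a guard like the one below could decide whether to call DownloadTrainedData at all. This is a minimal sketch; `TessdataCache` and `NeedsDownload` are hypothetical names, and treating a zero-byte file as "missing" is an assumption meant to cover an interrupted earlier download.

```csharp
using System.IO;

public static class TessdataCache
{
    // Hypothetical helper: creates the folder if needed and reports whether
    // the model file still has to be downloaded (missing or zero bytes).
    public static bool NeedsDownload(string tessdataFolderPath, string fileName = "eng.traineddata")
    {
        // No-op when the folder already exists.
        Directory.CreateDirectory(tessdataFolderPath);

        var file = new FileInfo(Path.Combine(tessdataFolderPath, fileName));
        return !file.Exists || file.Length == 0;
    }
}
```

On app initialisation this would read roughly as `if (TessdataCache.NeedsDownload(path)) await DownloadTrainedData(path);`, so later runs skip the network call entirely.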

3. Initialize the engine using your defined path

using var engine = new TesseractEngine(tessdataFolderPath, "eng", EngineMode.Default);
