Hi!
We're using your wonderful language detection package :)
We're consuming version 1.0.0. However, it seems that Turkish is causing a lag:
Took 0.982967443 seconds to detect language of DANISH
Took 0.013375939 seconds to detect language of GERMAN
Took 0.00895221 seconds to detect language of ENGLISH
Took 0.004931715 seconds to detect language of SPANISH
Took 0.007953544 seconds to detect language of FRENCH
Took 0.004886938 seconds to detect language of ITALIAN
Took 0.003902518 seconds to detect language of JAPANESE
Took 0.002207307 seconds to detect language of KOREAN
Took 0.01076291 seconds to detect language of MALAY
Took 0.004513402 seconds to detect language of DUTCH
Took 0.005380963 seconds to detect language of NORWEGIAN
Took 0.009575043 seconds to detect language of POLISH
Took 0.004399465 seconds to detect language of PORTUGUESE
Took 0.00420948 seconds to detect language of SWEDISH
Took 0.001934336 seconds to detect language of THAI
Took 6.708378177 seconds to detect language of TURKISH
Took 8.58397E-4 seconds to detect language of CHINESE_SIMPLIFIED
Took 7.28538E-4 seconds to detect language of CHINESE_TRADITIONAL
Any idea what is causing this issue?
Thank you!
Elisheva
-
Hi Elisheva, thanks for using my library.
No. How could I? You haven't shown me a single line of your code, so I'm not able to help you. But if you do, there will be a chance. :)
-
Hi! We're building the detector like this: [code block lost in the page export] and calling it using: [code block lost in the page export] and that's it :) Thanks
-
Alright, but this is still too little information. Please show me the code of your benchmark, too. And the content of your text files, if possible.
-
Hi, we're running this: [code block lost in the page export]
Lingua is struggling to detect a particular Turkish string: "Bu yerli bir metin dizesinde bir dil bulmak yeteneği bizim doğrulamak gereken bir testtir. Ben iyi çalışıyor umuyoruz." (roughly: "This is a test to verify our ability to find a language in a native text string. We hope it works well.") The other sentences are detected in less than a second. Thanks!
-
Ah, now I think I know what's going on. It takes longer the first time because the library loads the language models lazily into memory, i.e. only on demand. If you load all language models beforehand, these performance differences will go away. This behavior is covered in the documentation. Build your language detector like so:
```java
LanguageDetector detector = LanguageDetectorBuilder
    .fromAllLanguages()
    .withPreloadedLanguageModels() // this method loads all models eagerly
    .build();
```
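To illustrate the effect, here is a minimal, self-contained sketch; `detectLanguageOf` is part of lingua's public API, while the class name, sample sentence, and timing code are only illustrative:
```java
import com.github.pemistahl.lingua.api.Language;
import com.github.pemistahl.lingua.api.LanguageDetector;
import com.github.pemistahl.lingua.api.LanguageDetectorBuilder;

public class PreloadDemo {
    public static void main(String[] args) {
        // With eager preloading, the model loading cost is paid once at
        // build time, so even the first detection call is fast.
        LanguageDetector detector = LanguageDetectorBuilder
                .fromAllLanguages()
                .withPreloadedLanguageModels()
                .build();

        String text = "Bu yerli bir metin dizesidir."; // placeholder sentence
        long start = System.nanoTime();
        Language language = detector.detectLanguageOf(text);
        System.out.printf("detected %s in %.3f s%n",
                language, (System.nanoTime() - start) / 1e9);
    }
}
```
With the default lazy loading, the first call for a not-yet-loaded model additionally pays the loading cost, which is the explanation given above for the slow outliers in the benchmark.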
-
ok, thanks!
-
Yes. That's why you should think about using only a subset of the supported languages for your task. It's very likely that you don't need all 75 languages for detecting the languages of your data. Alternatively, simply leave the current lazy loading as it is.
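For example, a detector restricted to a handful of expected languages could be built like this; the particular language set is only an illustration:
```java
import com.github.pemistahl.lingua.api.LanguageDetector;
import com.github.pemistahl.lingua.api.LanguageDetectorBuilder;
import static com.github.pemistahl.lingua.api.Language.*;

// Only the models for these four languages are ever loaded,
// which keeps both startup time and memory usage down.
LanguageDetector detector = LanguageDetectorBuilder
        .fromLanguages(ENGLISH, GERMAN, SPANISH, TURKISH)
        .build();
```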
-
We use lingua in an SDK that is in use by many services in my team.
-
No, the detection won't be slower. If you load all language models at once, they will just consume more memory. For version 1.2.0, I'm currently working on improving performance and reducing the memory footprint, so this will become even better in the future.
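If you want to verify the memory trade-off yourself, a rough JVM-level sketch follows; the numbers are only indicative, since they measure the whole heap rather than the models alone, and the class name is a placeholder:
```java
import com.github.pemistahl.lingua.api.LanguageDetector;
import com.github.pemistahl.lingua.api.LanguageDetectorBuilder;

public class MemoryFootprintDemo {
    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        rt.gc(); // best-effort hint to reduce noise in the measurement
        long before = rt.totalMemory() - rt.freeMemory();

        // Eagerly load every language model so the difference is visible.
        LanguageDetector detector = LanguageDetectorBuilder
                .fromAllLanguages()
                .withPreloadedLanguageModels()
                .build();

        rt.gc();
        long after = rt.totalMemory() - rt.freeMemory();
        System.out.printf("approx. heap growth after preloading: %d MiB%n",
                (after - before) / (1024 * 1024));

        // Use the detector so the loaded models stay reachable.
        System.out.println(detector.detectLanguageOf("languages are awesome"));
    }
}
```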