pre-extracted data in .tsv format #140

Digital-XxX · 2022-07-11T10:13:05Z

Please provide pre-extracted data in .tsv format. GoldenDict mobile cannot read .json dictionaries.

kristian-clausal · 2022-07-11T10:36:16Z

Unfortunately, the data we provide cannot be converted straightforwardly to .tsv or .csv. The JSON data is hierarchical: large, sprawling word entries contain smaller structures, dictionaries and lists, and translating that to .tsv has to be done on a case-by-case basis. It's not a universal data format that can be swapped between different programs (at least not yet, or in the near future); it's just a bunch of data we've put into an ad hoc data structure as needed.

To make what you want possible you need to:

- figure out what the Goldendict .tsv format needs to work
- understand the structure (which can change as time goes by) of our .json data
- create a translation or mapping to get the data you want from one to the other
- program a script that will do that translation by reading the json file object-by-object and then outputting it into .tsv
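The steps above can be sketched roughly as follows. This is a minimal illustration, not a tested converter: it assumes the dump is line-delimited JSON (one object per line), that each entry has `word`, `pos` and `senses` keys with `glosses` lists inside each sense, and that the target is a two-column `headword<TAB>definition` file. The field names can change over time, the two-column layout is an assumption about what GoldenDict accepts, and the function names are made up for this example.

```python
import json
from typing import Iterable, Iterator, Optional, Tuple


def entry_to_row(entry: dict) -> Optional[Tuple[str, str]]:
    """Map one JSON entry to (headword, definition), or None if unusable."""
    word = entry.get("word")
    if not word:
        return None
    # Collect every gloss from every sense (assumed field names).
    glosses = []
    for sense in entry.get("senses", []):
        glosses.extend(sense.get("glosses", []))
    if not glosses:
        return None
    definition = "; ".join(glosses)
    pos = entry.get("pos", "")
    if pos:
        definition = f"({pos}) {definition}"
    # Collapse tabs and newlines, which would break the TSV layout.
    return " ".join(word.split()), " ".join(definition.split())


def convert(lines: Iterable[str]) -> Iterator[str]:
    """Read the dump object-by-object and yield one TSV line per entry."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        row = entry_to_row(json.loads(line))
        if row is not None:
            yield "\t".join(row)


def convert_file(src_path: str, dst_path: str) -> None:
    """Stream a JSON Lines file to a TSV file without loading it whole."""
    with open(src_path, encoding="utf-8") as src, \
            open(dst_path, "w", encoding="utf-8") as dst:
        for tsv_line in convert(src):
            dst.write(tsv_line + "\n")
```

Calling `convert_file("dump.jsonl", "dictionary.tsv")` (both file names hypothetical) would write one line per entry. Everything beyond the glosses (translations, forms, etymology and so on) is dropped here, which is exactly the case-by-case mapping decision described above.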

We welcome any contributions to the project to make it more accessible.

Digital-XxX · 2022-07-11T10:49:27Z

> program a script that will do that translation by reading the json file object-by-object and then outputting it into .tsv

I think pyglossary supports conversion of .json to .tsv/.tab

kristian-clausal · 2022-07-11T11:12:50Z

> > program a script that will do that translation by reading the json file object-by-object and then outputting it into .tsv
>
> I think pyglossary supports conversion of .json to .tsv/.tab

We would be happy to have someone implement a conversion utility for our .json to other formats, but someone has to code it first, and our data structure and format can change as time goes by.

Vuizur · 2022-07-27T16:09:47Z

I created a project that can create tsv/stardict/kindle dictionaries from the kaikki dump. It is not extremely well tested, but it may work: https://github.com/Vuizur/ebook_dictionary_creator

Vuizur · 2022-08-22T11:51:50Z

I also now have a repository with directly downloadable dictionaries for a lot of languages in 3 different formats: https://github.com/Vuizur/Wiktionary-Dictionaries

GrimPixel · 2024-04-08T14:55:55Z

Here is a new tool: https://codeberg.org/GrimPixel/Text_to_Wordlist
You can place your text file in the corresponding directory 0_text, then check the text_setting.yaml and dictionary_setting.yaml, then run extract_text.py and extract_dictionary.py to generate a TSV file with values separated as described in README.adoc.

kristian-clausal added the enhancement New feature or request label Jul 11, 2022

kristian-clausal added the good first issue Good for newcomers label Jul 11, 2022
