utf-8 unicode #44

Open
haditabealhojeh opened this issue Jan 2, 2024 · 1 comment

haditabealhojeh commented Jan 2, 2024

Hi everybody. I am new to Norconex. I am trying to write a crawler using this config, but I get Unicode escape codes instead of readable UTF-8 (Persian) text in the results.
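If the "unicode characters" in question are JSON `\uXXXX` escapes, they are only an alternative encoding of the same Persian text, not corrupted data. The sketch below uses the Python standard library (it is not Norconex code) and takes the first word of the page title quoted later in this thread to show that an escaped and an unescaped JSON string decode to the identical value.

```python
import json

# First word of the page title quoted in this thread ("download" in Persian).
raw = "دانلود"

# By default (ensure_ascii=True) json.dumps escapes every non-ASCII
# character as a \uXXXX sequence.
escaped = json.dumps(raw)

# ensure_ascii=False keeps the characters themselves in the JSON text.
unescaped = json.dumps(raw, ensure_ascii=False)

# ASCII-only, so safe to print on any terminal.
print(escaped)  # "\u062f\u0627\u0646\u0644\u0648\u062f"

# Both JSON texts decode to exactly the same Python string.
assert json.loads(escaped) == json.loads(unescaped) == raw
```

In other words, whether a tool shows escapes or glyphs is a display choice; the stored value is the same.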

ohtwadi commented Jan 23, 2024

Please try adding pretty=true when querying ES: http://localhost:9200/dental2-index/_search?pretty=true
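A minimal sketch of running that query from Python, assuming a local cluster at `localhost:9200` and the `dental2-index` index from the URL above. The helpers `search_url` and `fetch_hits` are hypothetical names for illustration. Because the body is decoded as UTF-8 and parsed with `json.loads`, the Persian text comes back intact whether or not the server escapes non-ASCII characters.

```python
import json
import urllib.parse
import urllib.request


def search_url(host: str, index: str, pretty: bool = True) -> str:
    """Build an Elasticsearch _search URL, optionally with pretty=true."""
    base = f"{host.rstrip('/')}/{urllib.parse.quote(index)}/_search"
    if not pretty:
        return base
    return f"{base}?{urllib.parse.urlencode({'pretty': 'true'})}"


def fetch_hits(url: str) -> list:
    """GET the URL and return the list of hits, decoding the body as UTF-8."""
    with urllib.request.urlopen(url) as resp:
        body = resp.read().decode("utf-8")
    # json.loads turns any \uXXXX escapes back into the original characters.
    return json.loads(body)["hits"]["hits"]
```

Against a running cluster, `fetch_hits(search_url("http://localhost:9200", "dental2-index"))` returns the list under `hits.hits`; each hit's `_source` holds the indexed fields, whose names depend on the committer configuration (not shown in this thread).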

The JSONFileCommitter output clearly shows the characters as expected:

[{"upsert":{"reference":"https://www.paziresh24.com/app","metadata":{"title":["دانلود اپلیکیشن نوبت دهی پزشکان پذیرش24"]},"content":"دریافت نوبت\n\n         اولین نوبت خالی برای شماست! مهم نیست چه زمانی و کجا نیازی به پزشک داشته ... }}]
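To check the committer output independently of any browser or terminal rendering, the file can be parsed directly. This sketch assumes the file holds a JSON array shaped like the snippet above (entries with an `upsert` object containing `reference` and `metadata.title`). `committed_titles` and `read_committer_file` are hypothetical helpers, and since the snippet only shows upsert entries, anything without an `upsert` key is simply skipped.

```python
import json


def committed_titles(json_text: str) -> dict:
    """Map each upserted reference to its first title (or None if absent)."""
    titles = {}
    for entry in json.loads(json_text):
        upsert = entry.get("upsert")
        if upsert is None:
            continue  # not an upsert entry; outside the shape assumed here
        values = upsert.get("metadata", {}).get("title", [])
        titles[upsert["reference"]] = values[0] if values else None
    return titles


def read_committer_file(path: str) -> dict:
    # Open explicitly as UTF-8 so the result does not depend on the
    # platform's default encoding.
    with open(path, encoding="utf-8") as f:
        return committed_titles(f.read())
```

Escaped (`\u062f...`) and literal Persian titles both come back as the same Python strings, which makes this a quick way to tell a storage problem from a display problem.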

You may also want to read up on setting up your ES for the Persian language (see here).
