Option to filter tokens by languages #224

Open
tailtq opened this issue May 27, 2022 · 2 comments

tailtq commented May 27, 2022

Hi everyone, I'm using the /parser/search API to search for a location using free text. Could we have another option, besides lang, to restrict token matching to specific languages? Thanks in advance.

Use-cases

I searched for the keyword Holland, Michigan and it returned Holland, but in the War language (an Austroasiatic language spoken by a minority of people in Bangladesh and India). When mapped into English, it turned into Baraga, which confused users.

I have debugged and found the problem within this function:
https://github.com/pelias/placeholder/blob/master/lib/Queries.js#L83-L105

The two queries match_subject_distinct_subject_ids and match_subject_autocomplete_distinct_subject_ids are also involved.

Proposal

In my opinion, we should add an option that specifies which languages to match against, defaulting to all of them. For example, we could pass a search_language parameter with a value like "eng,fra" to match only English and French tokens. If the option is omitted, tokens in all languages are matched.
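
A minimal sketch of how such a parameter might be parsed, assuming three-letter language codes like the ones in the example. The helper name parseSearchLanguages is hypothetical and is not part of placeholder; returning null stands for "no language filter".

```javascript
// Hypothetical helper (not part of placeholder): turn a value such as
// "eng,fra" into a clean list of language codes.
// Returns null when the option is absent or contains no usable code,
// which the caller can treat as "match tokens in all languages".
function parseSearchLanguages(value) {
  if (typeof value !== 'string' || value.trim() === '') {
    return null;
  }

  const codes = value
    .split(',')
    .map((code) => code.trim().toLowerCase())
    // Assumption: codes are exactly three ASCII letters, e.g. "eng".
    .filter((code) => /^[a-z]{3}$/.test(code));

  // Drop duplicates while keeping the order the caller supplied.
  const unique = [...new Set(codes)];
  return unique.length > 0 ? unique : null;
}

// Usage: parseSearchLanguages('eng, FRA') yields ['eng', 'fra'].
```

Rejecting anything that is not a plain three-letter code keeps unexpected input away from the SQL layer.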

The SQL queries after editing should be:

SELECT DISTINCT( t1.id ) AS subjectId
FROM tokens AS t1
  JOIN fulltext AS f1 ON f1.rowid = t1.rowid
WHERE f1.fulltext MATCH $subject
AND t1.lang IN ('eng', 'fra')
-- AND t1.tag NOT IN ( 'colloquial' )
ORDER BY t1.id ASC
LIMIT $limit

SELECT DISTINCT( t1.id ) AS subjectId
FROM tokens AS t1
  JOIN fulltext AS f1 ON f1.rowid = t1.rowid
WHERE f1.fulltext MATCH $subject
AND t1.lang IN ('eng', 'fra')
-- AND t1.tag NOT IN ( 'colloquial' )
ORDER BY t1.id ASC
LIMIT $limit
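
The language list above is hard-coded for illustration. A rough sketch of how the clause could be generated with bound parameters instead, assuming the query text is assembled in JavaScript: buildSubjectQuery and the $lang0, $lang1 parameter names are hypothetical, and how the params object is bound depends on the SQLite driver in use.

```javascript
// Hypothetical sketch (not placeholder's actual code): build the subject
// query with an optional language filter. Each language becomes its own
// named parameter, so no user input is interpolated into the SQL text.
function buildSubjectQuery(langs) {
  const params = {};
  let langClause = '';

  if (Array.isArray(langs) && langs.length > 0) {
    const names = langs.map((lang, i) => {
      params[`lang${i}`] = lang; // key naming is an assumption
      return `$lang${i}`;
    });
    langClause = `AND t1.lang IN (${names.join(', ')})`;
  }

  const sql = [
    'SELECT DISTINCT( t1.id ) AS subjectId',
    'FROM tokens AS t1',
    '  JOIN fulltext AS f1 ON f1.rowid = t1.rowid',
    'WHERE f1.fulltext MATCH $subject',
    langClause, // empty when no filter was requested
    'ORDER BY t1.id ASC',
    'LIMIT $limit'
  ].filter((line) => line !== '').join('\n');

  return { sql, params };
}

// Usage: buildSubjectQuery(['eng', 'fra']) adds
// "AND t1.lang IN ($lang0, $lang1)"; buildSubjectQuery(null) adds nothing.
```

With no languages given, the generated text has no language clause, so existing behavior would stay unchanged.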

tailtq commented Jun 6, 2022

@missinglink Could you please take a look?

missinglink commented Jun 6, 2022

I don't completely understand the issue. Your description says:

I searched for the keyword Holland, Michigan and it returned Holland, but in the War language (an Austroasiatic language spoken by a minority of people in Bangladesh and India). When mapped into English, it turned into Baraga, which confused users.

However, when I enter Holland, Michigan in the demo, I don't see Baraga:
https://placeholder.demo.geocode.earth/demo/#eng

What am I missing?
