Remove source_id field #383

Open
orangejulius opened this issue Oct 4, 2019 · 0 comments

orangejulius commented Oct 4, 2019

Each of our Elasticsearch documents stores the original source ID from the upstream data in two places: as part of the GID in the _id field, and in the source_id field.

At a conservative estimate of 10 bytes per record, this probably amounts to about 6GB of duplicated data in a full planet build, so it makes sense to remove the source_id field when possible.
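
As a rough check on that estimate, the two figures together imply on the order of 600 million documents in a full planet build. The sketch below only restates that arithmetic; the function name is hypothetical, and treating 6GB as decimal gigabytes is an assumption, not something stated above.

```javascript
// Hypothetical helper: how many records a given amount of duplicated data
// implies, at a fixed number of duplicated bytes per record.
function impliedRecordCount(totalBytes, bytesPerRecord) {
  return totalBytes / bytesPerRecord;
}

// Figures from the estimate above; 6GB is assumed to mean 6e9 bytes.
const records = impliedRecordCount(6e9, 10);
console.log(records);
```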

Doing this will require changes to the API, pelias/model, and here in pelias/schema.

We'll also have to maintain reasonable backwards compatibility, especially in the API, where the source_id field in our GeoJSON responses will still be required.
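
One way the API could keep returning source_id without storing it is to derive it from the GID in `_id`. The following is a minimal sketch, not the actual Pelias implementation: it assumes the GID has the form `source:layer:id`, that the source and layer parts never contain a colon, and that the id part may. The function name `sourceIdFromGid` is hypothetical.

```javascript
// Hypothetical helper: recover the upstream source ID from a full GID.
// Assumes the GID looks like "source:layer:id", where only the id part
// may itself contain colons.
function sourceIdFromGid(gid) {
  const parts = gid.split(':');
  if (parts.length < 3 || parts[0] === '' || parts[1] === '') {
    throw new Error(`not a full GID: ${gid}`);
  }
  // Everything after the second colon is the source ID.
  return parts.slice(2).join(':');
}

// Example GIDs (illustrative values only).
console.log(sourceIdFromGid('whosonfirst:locality:101750367'));
console.log(sourceIdFromGid('openstreetmap:venue:node/123'));
```

Rejoining the remaining parts, rather than taking only the third one, keeps source IDs that contain colons intact.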

orangejulius added a commit to pelias/api that referenced this issue Oct 4, 2019
This change implements the 5th and final step in the GID migration plan
outlined in
pelias/pelias#672 (comment).

It adjusts the expectations of the API so that it requires the
full GID, rather than the source_id, to be stored in the `_id` field in
Elasticsearch.

In the process, the `source_id` field is also no longer used in
`helper/geojsonify.js`, making this the first step towards removing the
`source_id` field (pelias/schema#383).

Connects pelias/pelias#672
missinglink pushed a commit to pelias/api that referenced this issue Oct 9, 2019
This change implements the 5th and final step in the GID migration plan
outlined in
pelias/pelias#672 (comment).

It adjusts the expectations of the API so that it requires the
full GID, rather than the source_id, to be stored in the `_id` field in
Elasticsearch.

In the process, the `source_id` field is also no longer used in
`helper/geojsonify.js`, making this the first step towards removing the
`source_id` field (pelias/schema#383).

Connects pelias/pelias#672