Script to import resmap collection & small fixes #209
Conversation
With respect to the categories and category groups: they are not displayed correctly; I will rework them. The rest of the items appear to be shown OK.
This is very nice!
This can be merged as-is, or you can push further:
- Create a ResourceMap namespace and split a few files, for example:
  - app/models/resource_map/collection.rb
  - app/models/resource_map/site_parser.rb

  note: we might want to reuse the "normalization" term that is already used, rather than introduce a new "parser" term?
- Move the CSV generation out of the site parser and keep it in the script.
Then in a subsequent PR: use the metadata on the resmap fields to identify the fields to map, so we can use any collection in an abstract way, not just the collection we happen to have.
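As a rough sketch of the suggested namespace split (the class contents below are placeholders of my own, not the PR's actual code; only the file/module structure follows the comment):

```ruby
# app/models/resource_map/collection.rb (hypothetical skeleton)
module ResourceMap
  class Collection
    attr_reader :sites

    def initialize(sites = [])
      @sites = sites # hashes representing resmap sites
    end
  end
end

# app/models/resource_map/site_parser.rb (hypothetical skeleton)
module ResourceMap
  class SiteParser
    def initialize(site)
      @site = site
    end
  end
end
```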
bin/import-resmap-collection (Outdated)
def generate_unique_id(name)
  return nil if name.nil?
  Digest::MD5.hexdigest(name) # TODO: use UUIDs? How to encode ids?
The id can be used for in-place updates and it might be used through the API, so it must be stable (well, as stable as possible). Calculating the hash digest of a stable value is a good solution since the hash digest will be as stable as the input value.
Alternatively we could use UUIDv3 or UUIDv5 to generate stable UUIDs that are neither random nor time based (they internally use MD5 or SHA1) but I'm not sure this is worth it.
See https://en.wikipedia.org/wiki/Universally_unique_identifier#Versions_3_and_5_(namespace_name-based)
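For reference, a minimal UUIDv5 sketch in plain Ruby (stdlib only; `DNS_NAMESPACE` is the RFC 4122 predefined DNS namespace, and the helper name is mine). The point is that, like the MD5 digest, the output is fully deterministic for a given input:

```ruby
require "digest"

DNS_NAMESPACE = "6ba7b810-9dad-11d1-80b4-00c04fd430c8" # RFC 4122 DNS namespace

# Deterministic, name-based UUID (version 5, SHA1-based).
def uuid_v5(namespace, name)
  ns_bytes = [namespace.delete("-")].pack("H*")          # namespace UUID as 16 raw bytes
  bytes = Digest::SHA1.digest(ns_bytes + name.b).bytes.first(16)
  bytes[6] = (bytes[6] & 0x0f) | 0x50                    # set version 5
  bytes[8] = (bytes[8] & 0x3f) | 0x80                    # set RFC 4122 variant
  hex = bytes.pack("C*").unpack1("H*")
  [hex[0, 8], hex[8, 4], hex[12, 4], hex[16, 4], hex[20, 12]].join("-")
end
```

Same trade-off as the reviewer notes: it buys a standard-looking id, but stability-wise it is no better than hashing the name directly.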
nitpick: Digest::MD5.hexdigest(name) if name.presence feels more like Ruby.
bin/import-resmap-collection (Outdated)
return nil if category_name.nil?

# Create or find the category
unless categories.has_key?(category_name)
bug:
- unless categories.has_key?(category_name)
+ unless categories.has_key?(category_id)
bin/import-resmap-collection (Outdated)
category_group_name = properties["fac_type-2"] ? properties["fac_type-1"] : category_name
category_group_id = generate_unique_id(category_group_name)

return nil if category_name.nil?
note: this should be right after the assignment of category_name.

nitpick:
- return nil if category_name.nil?
+ return unless category_name
bin/import-resmap-collection (Outdated)
unless locations.key?(location_id)
  location = {}
  location["id"] = location_id
  location["name"] = value
  location["parent_id"] = parent_ids[level - 1] # Parent from previous level

  locations[location_id] = location
end
nitpick: Ruby's idiomatic way is:
- unless locations.key?(location_id)
-   location = {}
-   location["id"] = location_id
-   location["name"] = value
-   location["parent_id"] = parent_ids[level - 1] # Parent from previous level
-   locations[location_id] = location
- end
+ locations[location_id] ||= {
+   "id" => location_id,
+   "name" => value,
+   "parent_id" => parent_ids[level - 1], # Parent from previous level
+ }
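A quick illustration of why `||=` works here: the second assignment is a no-op because the key is already set.

```ruby
locations = {}
locations["loc-1"] ||= { "name" => "First write" }
locations["loc-1"] ||= { "name" => "Second write (ignored)" }
# locations["loc-1"]["name"] is still "First write"
```

One caveat: `||=` also reassigns when the stored value is nil or false, which doesn't matter for these always-truthy hashes.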
Additional notes: The more stability we have, the less likely we are to break anything when we do an upgrade. For example, if we use random UUIDs then the IDs will change on each upgrade, and won't that break each facility URL? I'd rather hash a stable value. I'm not fond of deleting everything and reinserting everything: that would break FPP during the upgrade, and since everything's stored in ElasticSearch and not in PostgreSQL, we can't even count on a database transaction (:face_exhaling:). Now, maybe doing the whole update in one bulk request would be fast enough that it's acceptable?
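A sketch of what "one bulk request" could look like — building the NDJSON body for ElasticSearch's `_bulk` endpoint. The index name and document shape are assumptions for illustration, not from the PR:

```ruby
require "json"

# Build an NDJSON _bulk payload: one action line plus one document line
# per facility, so the whole reindex is sent in a single request.
def bulk_index_body(facilities, index: "facilities")
  facilities.flat_map do |f|
    [{ "index" => { "_index" => index, "_id" => f["id"] } }.to_json,
     f.to_json]
  end.join("\n") + "\n"
end
```

Indexing with the same stable `_id` overwrites the existing document in place, which avoids the delete-then-reinsert window entirely.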
For facilities, the import script that is already there is doing this inside FPP:

So I won't use the uuid for them and will save the resmap one instead. In the example dataset there were uuids everywhere, so I will use them for all the other entities.
I realized that I wasn't 100% clear with respect to the deletion/updating of sites: this script creates a dataset to be imported with the

Also, I will create an issue for your suggestion:
Just one typo and a couple of suggestions, then LGTM!
facility["last"] = nil # TODO
facility["photo"] = nil # TODO
facility
end
nitpick: maybe:
def extract_facility(site, properties)
return unless site["lat"] && site["lng"]
{
"lat" => site["lat"],
...
}
end
facility["name"] = site["name"]
facility["lat"] = site["lat"]
return nil if facility["lat"].nil?
facility["lng"] = site["long"]
- facility["lng"] = site["long"]
+ facility["lng"] = site["lng"]
No, in resmap sites the field name is long, not lng.
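To make that long-to-lng mapping explicit, a tiny helper along the lines discussed in the review (the function name is illustrative, not the script's actual code):

```ruby
# resmap sites expose "long"; FPP facilities expect "lng".
# Sites without coordinates yield nil and can be skipped.
def extract_coords(site)
  return unless site["lat"] && site["long"]
  { "lat" => site["lat"], "lng" => site["long"] }
end
```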
Co-authored-by: Julien Portalier <[email protected]>
Small changes & fixes:
- The bin/import-dataset script was successful (tested with the raw dataset available on the slack channel), but the facilities couldn't be visualized since the photo field was not set for facilities; it was added.
- The es locale in development generated unnecessary problems since the datasets we're working with have no Spanish data.

Import ResMap collection:
- New script to import data, consuming it directly from ResourceMap.
- ResourceMap::Collection will be filled with hashes representing Sites in ResourceMap.
- ResmapSiteParser is a class that will parse each of these Sites and generate an FPP Facility, but also all the other entities (Category(ies), Services, Locations) with their hierarchies.
- The generated dataset is then imported with the bin/import-dataset script (there were Sites without coordinates and only one without a Category).

Some notes:
- The IDs of the entities are generated with the Digest::MD5.hexdigest(name) function. There is no visualization of the IDs; they are just used in the database. For other datasets like the raw dataset, they used UUIDs. When importing Sites from ResMap the IDs are auto-numeric ones and this could bring errors to FPP; that's why this decision was made. Any suggestions are welcome here.
- Regarding facility_types: on FPP this has a name and a priority. Since there was no data for this, a unique type, "Institution", with priority "1" was hardcoded for all the facilities. In the PdMing stage this will have to be revised.
- No photo information was present in the Tanzania dataset, so nothing was done with respect to this; to be revised also in the PdMing stage.
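Putting the id-generation note together with the review nitpicks, a minimal self-contained version might look like this (a sketch, not the script's exact code):

```ruby
require "digest"

# Deterministic id: the same name always hashes to the same id,
# so re-imports keep entity ids (and facility URLs) stable.
def generate_unique_id(name)
  return unless name
  Digest::MD5.hexdigest(name)
end
```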