Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adding relationships to dimension tables #2767

Open
atalyaalon opened this issue Jan 21, 2025 · 0 comments
Open

Adding relationships to dimension tables #2767

atalyaalon opened this issue Jan 21, 2025 · 0 comments
Milestone

Comments

@atalyaalon
Copy link
Collaborator

Adding relationships to dimension tables is relevant for CBS tables (AccidentMarker, Involved, Vehicle) and Safety Data tables (SDAccident, SDInvolved).

This solution is best practice for data integrity, that can make most or all of the Heberw View tables redundant (InvolvedMarkerView, AccidentMarkerView, AccidentMarkerView, VehicleMarkerView, VehiclesView)

Adding DB relationships for dimension tables (RoadType, Cities, MultiLane, AgeGroup) will ensure both data integrity. Querying hebrew fields (and in the coming future also English fields) will be straight forward, with no need for creating our local constant dictionaries, when there's no such special need (exceptions for that are in calculated fields of our own, for example injured_type_enriched_hebrew).

One can access tables linked in foreign keys from facts table, for example in this way makrers filters involved
Also, once the foreign keys exist, there is no need for explicitly joining them. It also makes sure that we have data integrity between foreign keys (we can't add road_type value to SDAccidents that does not exist in dictionary RoadType table)

However, to use this function we will need to make sure there is data consistency in the dictionary tables.
Hence, in each such dimension table you're using, we need to make sure that when we update data, there is consistency in DB, meaning data is not deleted w/o insertion or update, and when deletion occurs it's not committed to DB before insertion, and that rollback exists in error.
For example I added this pr for CBS dictionary tables.

Another advantage is readability:
In this case, for example, the query itself becomes more readable, no need to mention the key joining on when relationship exists:

def sd_involved_query_from_anyway_tables():
    street1_alias = aliased(Streets)
    street2_alias = aliased(Streets)

    query = (
        db.session.query(
            Involved.id,
            AccidentMarker.accident_year,
            AccidentMarker.accident_timestamp,
            AccidentMarker.city.heb_name.label("city_name"),
            street1_alias.street_hebrew.label("street1"),
            street2_alias.street_hebrew.label("street2"),
            AccidentMarker.road_light.road_light_hebrew.label("road_light"),
            AccidentMarker.road_type.road_type_hebrew.label("road_type"),
            AccidentMarker.latitude,
            AccidentMarker.longitude,
            Involved.injury_severity.injury_severity_hebrew.label("injury_severity"),
            Involved.sex.sex_hebrew.label("gender"),
            Involved.age_group.age_group_hebrew.label("age_group"),
        )
        .join(AccidentMarker)  # Implicitly uses relationships
        .join(AccidentMarker.Cities)
        .join(Involved.injury_severity)
        .join(street1_alias, AccidentMarker.street1 == street1_alias.street)
        .join(street2_alias, AccidentMarker.street2 == street2_alias.street)
        .join(Involved.sex)
        .join(Involved.age_group)
        .limit(50)
    )

    result = query.all()
    return result

This issue needs further planning before implementation, but it will add best practices into our work.

@atalyaalon atalyaalon added this to the Future milestone Jan 21, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant