rfc: SSAS ingestion #4
- Start Date: (2013-09-01)
- RFC PR: [https://github.com/datahub-project/rfcs/pull/4](https://github.com/datahub-project/rfcs/pull/4)
- Discussion Issue: (None)
- Implementation PR(s): [https://github.com/datahub-project/datahub/pull/10286](https://github.com/datahub-project/datahub/pull/10286)

# SSAS Ingestion Module

## Summary
Ingesting MSSQL OLAP (SSAS) metadata into DataHub would provide a more comprehensive view of the data landscape and enable better data discovery and analysis.
The company I work for has developed an MVP ingestion module that supports both tabular and multidimensional SSAS. We are considering contributing it to DataHub, but I have a few questions about the process.

## Motivation
Ingesting OLAP metadata from MSSQL gives DataHub users a better understanding of the data stored in MSSQL OLAP cubes, including their dimensions, hierarchies, measures, and calculations.

It can also help improve data governance and data quality: the metadata can be used to build full data lineage and to improve data discovery and analysis. With a centralized view of the OLAP metadata, DataHub can help ensure that data is used correctly and consistently across the organization.

## Requirements
- Ingest metadata from SSAS Tabular models
- Ingest metadata from SSAS Multidimensional models

### Extensibility
- Build lineage to/from SSAS models

## Detailed design
For general background, see [OLAP cubes](https://learn.microsoft.com/en-us/system-center/scsm/olap-cubes-overview?view=sc-sm-2022).

DataHub interacts with SSAS (SQL Server Analysis Services) through [Microsoft's HTTP access solution](https://learn.microsoft.com/en-us/analysis-services/instances/configure-http-access-to-analysis-services-on-iis-8-0?view=asallproducts-allversions), in which an IIS web server sits in front of the SSAS instances.

Arguments in favor of this approach:
- Cross-platform compatibility.
- A single, standardized entry point for working with SSAS.

General scheme:
```mermaid
graph LR;
    id1[DataHub]---id2[IIS web server];
    id2[IIS web server]---id3[SSAS1];
    id2[IIS web server]---id4[SSAS2];
```
Data is exchanged as XMLA queries sent over HTTP.
- For multidimensional SSAS servers, a [DISCOVER_XML_METADATA](https://learn.microsoft.com/en-us/openspecs/sql_server_protocols/ms-ssas/51647299-75c7-471d-896f-a691e4114b18) request is used.
- For tabular SSAS servers, [DMV](https://learn.microsoft.com/en-us/analysis-services/instances/use-dynamic-management-views-dmvs-to-monitor-analysis-services?view=asallproducts-allversions) (Dynamic Management View) queries are used.
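As a rough illustration of the two request types above, the sketch below builds the XMLA SOAP envelopes and wraps one of them in an HTTP POST. The endpoint URL, the catalog names, the helper function names, and the choice of the `$SYSTEM.TMSCHEMA_TABLES` DMV are assumptions made for illustration and are not taken from the MVP module. Authentication is omitted, and nothing is sent over the network.

```python
import urllib.request
from typing import Optional
from xml.sax.saxutils import escape

SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"
XMLA_NS = "urn:schemas-microsoft-com:xml-analysis"


def _envelope(body: str) -> str:
    """Wrap an XMLA command in a SOAP envelope."""
    return f'<Envelope xmlns="{SOAP_NS}"><Body>{body}</Body></Envelope>'


def _properties(catalog: Optional[str], fmt: Optional[str] = None) -> str:
    """Build the XMLA property list (target catalog and result format)."""
    items = ""
    if catalog:
        items += f"<Catalog>{escape(catalog)}</Catalog>"
    if fmt:
        items += f"<Format>{escape(fmt)}</Format>"
    return f"<Properties><PropertyList>{items}</PropertyList></Properties>"


def discover_xml_metadata(catalog: Optional[str] = None) -> str:
    """Discover request, as used for multidimensional servers."""
    return _envelope(
        f'<Discover xmlns="{XMLA_NS}">'
        "<RequestType>DISCOVER_XML_METADATA</RequestType>"
        "<Restrictions><RestrictionList/></Restrictions>"
        f"{_properties(catalog)}"
        "</Discover>"
    )


def execute_dmv(statement: str, catalog: str) -> str:
    """Execute request carrying a DMV query, as used for tabular servers."""
    return _envelope(
        f'<Execute xmlns="{XMLA_NS}">'
        f"<Command><Statement>{escape(statement)}</Statement></Command>"
        f'{_properties(catalog, "Tabular")}'
        "</Execute>"
    )


def build_request(endpoint: str, action: str, envelope: str) -> urllib.request.Request:
    """Prepare (but do not send) the HTTP POST that carries the envelope."""
    return urllib.request.Request(
        endpoint,
        data=envelope.encode("utf-8"),
        headers={
            "Content-Type": "text/xml; charset=utf-8",
            "SOAPAction": f'"{XMLA_NS}:{action}"',
        },
        method="POST",
    )


if __name__ == "__main__":
    # Hypothetical IIS endpoint and catalog name.
    endpoint = "https://iis.example.com/olap/msmdpump.dll"
    dmv = execute_dmv("SELECT * FROM $SYSTEM.TMSCHEMA_TABLES", "SalesTabular")
    req = build_request(endpoint, "Execute", dmv)
    print(req.get_method(), req.full_url)
    print(req.data.decode("utf-8"))
```

Passing the prepared request to `urllib.request.urlopen` would send it. A real source would also need credentials and response parsing, which are out of scope for this sketch.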

The following entity mapping is proposed:
```mermaid
graph TB;
    c1---b1;
    b1---a1;
    b1---a2;
    subgraph s1[Properties];
        a1["Dimension"];
        a2["Measure"];
    end;
    subgraph s2[DataSet];
        b1["Cube"];
    end;
    subgraph s3[Container];
        c1["Catalog(database)"];
    end;
```
- Server maps to a container (not shown in the diagram above).
- Catalog maps to a container, nested within the server container.
- Cube maps to a dataset.
- Dimensions and measures become properties of the dataset.
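The mapping above can be sketched in plain Python as follows. This is only an illustration under stated assumptions: the `ssas` platform id, the `server.catalog.cube` dataset naming scheme, the hash-based container ids, and all function and class names are hypothetical, and a real ingestion source would use the DataHub SDK rather than hand-built strings.

```python
import hashlib
import json
from dataclasses import dataclass, field
from typing import Dict, List

PLATFORM = "ssas"  # assumed platform id, not confirmed by the RFC


@dataclass
class Cube:
    """Minimal stand-in for the metadata extracted for one cube."""
    name: str
    dimensions: List[str] = field(default_factory=list)
    measures: List[str] = field(default_factory=list)


def container_urn(**key: str) -> str:
    """Derive a stable container URN from the fields that identify it."""
    guid = hashlib.md5(json.dumps(key, sort_keys=True).encode("utf-8")).hexdigest()
    return f"urn:li:container:{guid}"


def dataset_urn(server: str, catalog: str, cube: str, env: str = "PROD") -> str:
    """Build a dataset URN for a cube (naming scheme is an assumption)."""
    return f"urn:li:dataset:(urn:li:dataPlatform:{PLATFORM},{server}.{catalog}.{cube},{env})"


def map_cube(server: str, catalog: str, cube: Cube) -> Dict[str, object]:
    """Server -> container, catalog -> nested container, cube -> dataset."""
    server_container = container_urn(platform=PLATFORM, server=server)
    catalog_container = container_urn(platform=PLATFORM, server=server, catalog=catalog)
    return {
        "urn": dataset_urn(server, catalog, cube.name),
        "container": catalog_container,
        "parent_container": server_container,
        # Dimensions and measures are carried as dataset properties.
        "properties": {
            "dimensions": ", ".join(cube.dimensions),
            "measures": ", ".join(cube.measures),
        },
    }


if __name__ == "__main__":
    cube = Cube("SalesCube", dimensions=["Date", "Product"], measures=["Sales Amount"])
    print(json.dumps(map_cube("srv1", "Sales", cube), indent=2))
```

Deriving the container id from a hash of its identifying fields keeps the URN stable across ingestion runs, so repeated runs update the same containers instead of creating new ones.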
## How we teach this

We should create or update user guides covering:
- The search and discovery experience (how to find SSAS models in DataHub)
- The lineage experience (how to find entities connected to SSAS models)

## Rollout / Adoption Strategy

If this ships as a standalone module, only users who opt in will use it, so no migration tools are needed and there are no breaking changes.

## Future Work

Establish complete data lineage from the data sources to the analytical models.

## Unresolved questions

- Should this module be standalone, focusing solely on SSAS, or should it be integrated into the existing MSSQL module?
  > Review comment: It probably makes sense for this to be a separate ingestion source, standalone from the existing mssql module.

- Is it appropriate to add SSAS entities (catalog, cube, dimension, measure) to DataHub?
  > Review comment: The mapping you described above (catalog -> container, etc.) should be fine. For dimensions and measures, we can model those as schema fields with tags of "Dimension" or "Measure". We already do something similar for Looker. Eventually we want to add dimensions/measures as more first-class things within DataHub, and can migrate accordingly when the time comes.
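A minimal sketch of the tagging idea in the comment above, using plain Python stand-ins rather than DataHub SDK classes. The `SchemaField` dataclass and the function names are hypothetical, and the tag names follow the "Dimension" / "Measure" wording of the comment.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class SchemaField:
    """Hypothetical stand-in for a dataset schema field with tags."""
    field_path: str
    tags: List[str] = field(default_factory=list)


def tag_urn(name: str) -> str:
    """Build a tag URN for the given tag name."""
    return f"urn:li:tag:{name}"


def cube_schema_fields(dimensions: List[str], measures: List[str]) -> List[SchemaField]:
    """Model each dimension and measure as a tagged schema field."""
    fields = [SchemaField(d, [tag_urn("Dimension")]) for d in dimensions]
    fields += [SchemaField(m, [tag_urn("Measure")]) for m in measures]
    return fields


if __name__ == "__main__":
    for f in cube_schema_fields(["Date", "Product"], ["Sales Amount"]):
        print(f.field_path, f.tags)
```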

- Does the proposed communication method with SSAS align with the project's needs?
  > Review comment: Yes - the ingestion source can connect to SSAS and push metadata into datahub.

- Does the proposed entity mapping approach for SSAS entities suit the project's requirements?
  > Review comment: Yes - see my comment above

> Review comment: Given how important data lineage is to the discovery process, I wonder if this should be in scope for the initial version instead of left as future work.