rfc: SSAS ingestion #4

Open: wants to merge 4 commits into base: main
95 changes: 95 additions & 0 deletions active/4-SSAS-ingestion.md
@@ -0,0 +1,95 @@
- Start Date: (2013-09-01)
- RFC PR: [https://github.com/datahub-project/rfcs/pull/4](https://github.com/datahub-project/rfcs/pull/4)
- Discussion Issue: (None)
- Implementation PR(s): [https://github.com/datahub-project/datahub/pull/10286](https://github.com/datahub-project/datahub/pull/10286)

# SSAS Ingestion Module

## Summary

This RFC proposes ingesting MSSQL OLAP (SSAS) metadata into DataHub to provide a more comprehensive view of the data landscape and to enable better data discovery and analysis.
The company I work for has developed an MVP ingestion module that supports both tabular and multidimensional SSAS. We are considering contributing it to DataHub, but I have a couple of questions about the process.

## Motivation

By ingesting OLAP metadata from MSSQL, DataHub can provide users with a better understanding of the data stored in MSSQL OLAP cubes, including information about dimensions, hierarchies, measures, and calculations.

Ingesting MSSQL OLAP metadata into DataHub can also help improve data governance and data quality. The metadata can be used to build full data lineage and to improve data discovery and analysis. With a centralized view of the OLAP metadata, DataHub can help ensure that data is used correctly and consistently across the organization.


## Requirements

- Ingest metadata from SSAS Tabular models
- Ingest metadata from SSAS Multidimensional models


### Extensibility

- Build lineage to/from SSAS models

## Detailed design

General information about [OLAP cubes](https://learn.microsoft.com/en-us/system-center/scsm/olap-cubes-overview?view=sc-sm-2022).


Interaction with SSAS (SQL Server Analysis Services) goes through [Microsoft's HTTP access to Analysis Services on IIS](https://learn.microsoft.com/en-us/analysis-services/instances/configure-http-access-to-analysis-services-on-iis-8-0?view=asallproducts-allversions), which places the SSAS instances behind an IIS web server.

Arguments in favor of such a solution:
- Cross-platform compatibility.
- A single, standardized entry point for working with SSAS.


General scheme:
```mermaid
graph LR;
id1[DataHub]---id2[IIS web server];
id2[IIS web server]---id3[SSAS1];
id2[IIS web server]---id4[SSAS2];
```
Data exchange occurs using XMLA queries wrapped in HTTP.
- For multidimensional SSAS servers, a [DISCOVER_XML_METADATA](https://learn.microsoft.com/en-us/openspecs/sql_server_protocols/ms-ssas/51647299-75c7-471d-896f-a691e4114b18) type query is used.
- For tabular SSAS servers, [DMV](https://learn.microsoft.com/en-us/analysis-services/instances/use-dynamic-management-views-dmvs-to-monitor-analysis-services?view=asallproducts-allversions) (Dynamic Management View) queries are utilized.
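The two request shapes above can be sketched as follows. This is a minimal illustration rather than the module's actual code: the function names, the endpoint URL, and the choice of the `$SYSTEM.TMSCHEMA_TABLES` DMV are assumptions made for the example, and authentication against IIS is omitted.

```python
import urllib.request
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"
XMLA_NS = "urn:schemas-microsoft-com:xml-analysis"


def discover_xml_metadata(catalog: str) -> str:
    """Build an XMLA Discover envelope (multidimensional servers)."""
    return (
        f'<Envelope xmlns="{SOAP_NS}"><Body>'
        f'<Discover xmlns="{XMLA_NS}">'
        "<RequestType>DISCOVER_XML_METADATA</RequestType>"
        "<Restrictions><RestrictionList/></Restrictions>"
        "<Properties><PropertyList>"
        f"<Catalog>{escape(catalog)}</Catalog>"
        "</PropertyList></Properties>"
        "</Discover></Body></Envelope>"
    )


def dmv_query(catalog: str, statement: str) -> str:
    """Build an XMLA Execute envelope carrying a DMV query (tabular servers)."""
    return (
        f'<Envelope xmlns="{SOAP_NS}"><Body>'
        f'<Execute xmlns="{XMLA_NS}">'
        f"<Command><Statement>{escape(statement)}</Statement></Command>"
        "<Properties><PropertyList>"
        f"<Catalog>{escape(catalog)}</Catalog>"
        "<Format>Tabular</Format>"
        "</PropertyList></Properties>"
        "</Execute></Body></Envelope>"
    )


def post_xmla(endpoint: str, envelope: str, timeout: float = 30.0) -> ET.Element:
    """POST an envelope to the IIS endpoint and parse the XML response.

    Authentication (for example Basic or NTLM) is deliberately left out.
    """
    request = urllib.request.Request(
        endpoint,
        data=envelope.encode("utf-8"),
        headers={"Content-Type": "text/xml; charset=utf-8"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return ET.fromstring(response.read())


if __name__ == "__main__":
    # Hypothetical catalog name; printing avoids needing a live server.
    print(discover_xml_metadata("AdventureWorks"))
    print(dmv_query("AdventureWorks", "SELECT [Name] FROM $SYSTEM.TMSCHEMA_TABLES"))
```

Building the envelopes separately from the HTTP call keeps the request construction testable without a running IIS or SSAS instance.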



The following scheme was proposed for entity mapping:
```mermaid
graph TB;
c1---b1;
b1---a1;
b1---a2;
subgraph s1[Properties];
a1["Dimension"];
a2["Measure"];
end;
subgraph s2[DataSet];
b1["Cube"];
end;
subgraph s3[Container];
c1["Catalog(database)"];
end;
```
- Server maps to a container.
- Catalog maps to a container (and is hierarchically nested within the server container).
- Cube is mapped as a dataset.
- Dimension and measure become properties of the dataset.
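
The mapping rules above can be sketched in plain Python. This is an illustration under stated assumptions, not the module's implementation: the `Cube` class, the `map_cube` function, the `ssas` platform id, and the dotted dataset naming are all hypothetical. Containers are shown as a simple path (outermost first) because DataHub's real container URNs are GUID-based and are not reproduced here.

```python
from dataclasses import dataclass, field
from typing import Dict, List

PLATFORM = "ssas"  # assumed platform id, not confirmed by the RFC


@dataclass
class Cube:
    """Hypothetical in-memory description of one SSAS cube."""
    server: str
    catalog: str
    name: str
    dimensions: List[str] = field(default_factory=list)
    measures: List[str] = field(default_factory=list)


def map_cube(cube: Cube, env: str = "PROD") -> Dict[str, object]:
    """Apply the mapping: server and catalog become containers, the cube
    becomes a dataset, and dimensions/measures become dataset properties."""
    dataset_name = f"{cube.server}.{cube.catalog}.{cube.name}"
    return {
        # Server container first, then the catalog container nested inside it.
        "containers": [cube.server, f"{cube.server}.{cube.catalog}"],
        "dataset_urn": (
            f"urn:li:dataset:(urn:li:dataPlatform:{PLATFORM},{dataset_name},{env})"
        ),
        # Properties are flattened to strings, as custom properties would be.
        "properties": {
            "dimensions": ", ".join(cube.dimensions),
            "measures": ", ".join(cube.measures),
        },
    }


if __name__ == "__main__":
    cube = Cube("srv1", "AdventureWorks", "Sales", ["Date", "Product"], ["Amount"])
    print(map_cube(cube))
```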

## How we teach this

We should create or update user guides to educate users on:
- Search & discovery experience (how to find SSAS models in DataHub)
- Lineage experience (how to find different entities connected to the SSAS models)

## Rollout / Adoption Strategy

If this ships as a standalone module, only users who choose to enable it are affected, so no migration tooling is needed and no breaking changes are introduced.

## Future Work

Establish a complete data lineage from the data source to the analytical models.
Collaborator

Given how important data lineage is to the discovery process, I wonder if this should be in scope for the initial version instead of left as future work.


## Unresolved questions

- Should this module be standalone, focusing solely on SSAS, or should it be integrated into the existing mssql module?
Collaborator

It probably makes sense for this to be a separate ingestion source, standalone from the existing mssql module.

- Is it relevant to add SSAS entities (catalog, cube, dimension, measure) to DataHub?
Collaborator

The mapping you described above (catalog -> container, etc) should be fine. For dimensions and measures, we can model those as schema fields with tags of "Dimension" or "Measure". We already do something similar for Looker.

Eventually we want to add dimensions/measures as more first class things within datahub, and can migrate accordingly when the time comes.

- Does the proposed communication method with SSAS align with the project's needs?
Collaborator

Yes - the ingestion source can connect to SSAS and push metadata into datahub.

- Does the proposed entity mapping approach for SSAS entities suit the project's requirements?
Collaborator

Yes - see my comment above