
feat(ingestion): [WIP] Adding SSAS as a New Source for Data Ingestion #10286

Open: wants to merge 1 commit into base: master
Conversation

@DmytroYurchuk commented Apr 15, 2024

[WIP] Adding SSAS as a New Source for Data Ingestion

Description:

This pull request introduces support for ingesting data from SSAS (SQL Server Analysis Services) into the DataHub platform. The code is functional and currently in use within our organization, but it lacks tests and some planned features due to time constraints and shifting priorities within our team.

Changes:

  • New Source Connector: Added a new connector specifically designed to interface with SSAS instances, enabling seamless ingestion of data from SSAS into DataHub.

  • Configuration Options: Included configuration options in the connector to specify SSAS server details, database name, cube name, and other relevant parameters necessary for establishing a connection.

Notes:

  • While this pull request is a work in progress and may not meet all quality standards, the code is functional and currently in use within our organization.

  • Due to time constraints and changing priorities, we haven't been able to complete the testing and add all planned features. However, we believe that this code may still be valuable to others in the community.

  • We encourage interested contributors to utilize this code as a starting point, and we welcome any improvements, additions, or feedback that the community may have.

Additional Notes:

  • Feedback and contributions are welcome to further enhance the SSAS connector and its integration with DataHub.

  • We hope that by open-sourcing this code, it may benefit others who are looking to integrate SSAS with their data workflows.

RFC Link:
Link to RFC #4

Recipe Example:

source:
  type: ssas_multidimension
  config:
    username: user_name
    password: ***
    host_port: localhost:81
    server_alias: localhost
    virtual_directory_name: ssas
    instance: localhost
    use_https: false
    ssas_instance: ssas001
    ssas_instance_auth_type: HTTPBasicAuth
    dns_suffixes:
      - local.lan
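As a rough illustration of how the recipe fields above might be validated, here is a minimal sketch. The `SsasRecipeConfig` class and its checks are hypothetical, written only to mirror the fields in the example recipe; the actual connector defines its own configuration classes in `config.py`.

```python
from dataclasses import dataclass, field
from typing import List

# Auth types mentioned in the PR's config validation.
SUPPORTED_AUTH_TYPES = {"HTTPBasicAuth", "HTTPKerberosAuth"}


@dataclass
class SsasRecipeConfig:
    """Hypothetical mirror of the recipe fields shown above."""

    username: str
    password: str
    host_port: str
    ssas_instance: str
    ssas_instance_auth_type: str = "HTTPBasicAuth"
    use_https: bool = False
    dns_suffixes: List[str] = field(default_factory=list)

    def __post_init__(self) -> None:
        if self.ssas_instance_auth_type not in SUPPORTED_AUTH_TYPES:
            raise ValueError(
                "Invalid ssas_instance_auth_type. Supported values are "
                "'HTTPBasicAuth' and 'HTTPKerberosAuth'."
            )
        if ":" not in self.host_port:
            raise ValueError("host_port must look like 'host:port'")


cfg = SsasRecipeConfig(
    username="user_name",
    password="***",
    host_port="localhost:81",
    ssas_instance="ssas001",
    dns_suffixes=["local.lan"],
)
print(cfg.ssas_instance_auth_type)  # HTTPBasicAuth
```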

Checklist

  • The PR conforms to DataHub's Contributing Guideline (particularly Commit Message Format)
  • Links to related issues (if applicable)
  • Tests for the changes have been added/updated (if applicable)
  • Docs related to the changes have been added/updated (if applicable). If a new feature has been added a Usage Guide has been added for the same.
  • For any breaking change/potential downtime/deprecation/big changes an entry has been made in Updating DataHub

Summary by CodeRabbit

  • New Features

    • Introduced support for SQL Server Analysis Services (SSAS) integration, enhancing data ingestion capabilities.
    • Added new dependency group ssas with necessary libraries for SSAS functionality.
    • Implemented multiple classes for XMLA communication and metadata retrieval from both multidimensional and tabular SSAS models.
  • Bug Fixes

    • Improved error handling for XMLA server responses.
  • Documentation

    • Updated documentation to include new modules and classes for SSAS integration.
  • Chores

    • Added utility classes for DNS resolution and XMLA query handling.

github-actions bot added the ingestion (PR or Issue related to the ingestion of metadata) and community-contribution (PR or Issue raised by member(s) of DataHub Community) labels on Apr 15, 2024
@hsheth2 (Collaborator) commented Jun 12, 2024

I see this is marked as WIP, so we haven't taken a look at it yet. @DmytroYurchuk is this ready for review? Is there anything we can do to help with this?

hsheth2 added the pending-submitter-response (Issue/request has been reviewed but requires a response from the submitter) label on Aug 8, 2024
coderabbitai bot (Contributor) commented Aug 8, 2024

Walkthrough

The recent changes enhance data ingestion for SQL Server Analysis Services (SSAS) by introducing new modules and updating existing ones. Key features include structured handling of XMLA responses, configuration classes for SSAS connections, and data models for both multidimensional and tabular sources, streamlining metadata retrieval and integration with SSAS environments.

Changes

File(s) Change Summary
setup.py Added new dependencies for SSAS integration and defined entry points for multidimensional and tabular data sources.
src/datahub/ingestion/source/ssas/api.py Introduced classes for handling XMLA communication with SSAS, establishing interfaces for SSAS API implementations.
src/datahub/ingestion/source/ssas/config.py Created configuration classes to manage SSAS connection parameters, enhancing connection management and validation.
src/datahub/ingestion/source/ssas/domains.py Defined data classes to model various SSAS components within the ingestion framework, facilitating structured data handling and representation.
src/datahub/ingestion/source/ssas/parser.py Introduced MdXmlaParser class for parsing XMLA data from multidimensional servers, enhancing data processing capabilities.
src/datahub/ingestion/source/ssas/ssas_core.py Implemented core classes for metadata ingestion from SSAS, managing metadata changes and work unit generation.
src/datahub/ingestion/source/ssas/ssas_multidimension/* Developed modules for multidimensional SSAS integration, including APIs and domain models for managing cube metadata.
src/datahub/ingestion/source/ssas/ssas_tabular/* Established modules for tabular SSAS integration, providing DAO layers and domain models for fetching and managing tabular data metadata.
src/datahub/ingestion/source/ssas/tools.py Added reusable tools for XMLA query handling and structured query definitions for tabular models.
src/datahub/ingestion/source/ssas/utils.py Introduced classes for DNS resolution, enhancing hostname management and connectivity with DNS suffix handling.
src/datahub/ingestion/source/ssas/xmla_server_response_error.py Created a custom exception class for handling XMLA server response errors, improving error management during SSAS interactions.
src/datahub/ingestion/source/ssas/xmlaclient.py Implemented an XMLA client to facilitate communication with SSAS servers, handling request construction and response parsing effectively.

Sequence Diagram(s)

sequenceDiagram
    participant User
    participant SSASClient
    participant XmlaClient
    participant XmlaResponse
    participant Parser

    User->>SSASClient: Request metadata
    SSASClient->>XmlaClient: Send XMLA request
    XmlaClient->>XmlaResponse: Process response
    XmlaResponse->>Parser: Parse XMLA data
    Parser-->>SSASClient: Return structured data
    SSASClient-->>User: Provide metadata
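The flow in the diagram can be sketched with stub classes. All class and method names below are illustrative stand-ins for the diagram's participants, not the connector's actual API.

```python
class XmlaResponse:
    """Holds the raw XMLA payload returned by the server."""

    def __init__(self, raw: str):
        self.raw = raw


class XmlaClient:
    """Sends XMLA requests; in the real connector this is an HTTP round trip."""

    def send(self, request: str) -> XmlaResponse:
        return XmlaResponse(f"<response for {request}>")


class Parser:
    """Turns a raw XMLA response into structured data."""

    def parse(self, response: XmlaResponse) -> dict:
        return {"raw": response.raw}


class SSASClient:
    """Orchestrates the request -> response -> parse sequence from the diagram."""

    def __init__(self):
        self.xmla = XmlaClient()
        self.parser = Parser()

    def request_metadata(self, query: str) -> dict:
        response = self.xmla.send(query)
        return self.parser.parse(response)


print(SSASClient().request_metadata("DISCOVER_CATALOGS"))
# {'raw': '<response for DISCOVER_CATALOGS>'}
```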


coderabbitai bot (Contributor) left a comment

Actionable comments posted: 28

Outside diff range, codebase verification and nitpick comments (63)
metadata-ingestion/src/datahub/ingestion/source/ssas/xmla_server_response_error.py (1)

3-5: Clarify the docstring for XMLAServerResponseError.

The docstring could be more descriptive to clarify the purpose of the exception. Consider specifying what kind of errors this exception is intended to handle.

-    Any ErrorCodes occur in XMLA ssas server response.
+    Exception raised for errors in the XMLA SSAS server response.
+    
+    This exception is used to indicate that the server response contains error codes.
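Applied to a minimal sketch of the exception class, the suggested docstring might look like this. The `error_codes` attribute is illustrative only and is not taken from the PR's actual `xmla_server_response_error.py`.

```python
class XMLAServerResponseError(Exception):
    """Exception raised for errors in the XMLA SSAS server response.

    This exception is used to indicate that the server response
    contains error codes.
    """

    def __init__(self, message: str, error_codes=None):
        super().__init__(message)
        # Hypothetical field: keep the offending codes for callers to inspect.
        self.error_codes = error_codes or []


try:
    raise XMLAServerResponseError("server returned faults", error_codes=["0xC1010000"])
except XMLAServerResponseError as exc:
    print(exc.error_codes)  # ['0xC1010000']
```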
metadata-ingestion/src/datahub/ingestion/source/ssas/tools.py (1)

2-2: Typographical error in module docstring.

The word "Modul" should be corrected to "Module."

- Modul for reusable tools.
+ Module for reusable tools.
metadata-ingestion/src/datahub/ingestion/source/ssas/ssas_multidimension/api.py (2)

2-2: Typographical error in module docstring.

The word "dao" should be capitalized as "DAO" to reflect the acronym for Data Access Object.

- Module for dao layer of multidimension ms ssas.
+ Module for DAO layer of multidimension MS SSAS.

46-53: Clarify the return type of auth_credentials.

The docstring indicates a return type of authorization dataclass, but the method returns an instance of HTTPKerberosAuth. Ensure that the docstring accurately reflects the return type.

- :return: authorization dataclass.
+ :return: HTTPKerberosAuth instance.
metadata-ingestion/src/datahub/ingestion/source/ssas/ssas_tabular/tools.py (1)

2-2: Typographical error in module docstring.

The word "contain" should be corrected to "contains."

- Module contain DVM queries for tabular SSAS
+ Module contains DVM queries for tabular SSAS
metadata-ingestion/src/datahub/ingestion/source/ssas/config.py (2)

15-16: Clarify parameter descriptions.

The description for username is duplicated, and the instance parameter description is unclear. Consider revising for clarity.

-    username - Active Directory user login
-    username - Active Directory user password
+    username - Active Directory user login
+    password - Active Directory user password
-    instance -  not used ???
+    instance - SSAS instance name (clarify usage or remove if not used)

44-47: Improve error message clarity.

Consider providing a more descriptive error message to indicate the valid options for ssas_instance_auth_type.

-    raise ValueError("Support only HTTPBasicAuth or HTTPKerberosAuth auth type")
+    raise ValueError("Invalid ssas_instance_auth_type. Supported values are 'HTTPBasicAuth' and 'HTTPKerberosAuth'.")
metadata-ingestion/src/datahub/ingestion/source/ssas/utils.py (1)

126-127: Improve exception handling specificity.

Consider catching specific exceptions rather than a generic Exception to improve error handling clarity.

except socket.gaierror as excp:
    raise excp
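A hedged sketch of what suffix-aware resolution with specific exception handling could look like. Function names and behavior here are illustrative, not the PR's actual `utils.py`; the point is catching `socket.gaierror` per candidate instead of a blanket `Exception`.

```python
import socket
from typing import List, Optional


def candidate_fqdns(host: str, dns_suffixes: List[str]) -> List[str]:
    """Bare host first, then the host with each configured DNS suffix appended."""
    return [host] + [f"{host}.{suffix}" for suffix in dns_suffixes]


def resolve_host(host: str, dns_suffixes: List[str]) -> Optional[str]:
    """Return the first candidate that resolves, or None if all fail.

    Catches socket.gaierror specifically, as the review suggests,
    so unrelated errors still propagate.
    """
    for candidate in candidate_fqdns(host, dns_suffixes):
        try:
            socket.gethostbyname(candidate)
            return candidate
        except socket.gaierror:
            continue
    return None


print(candidate_fqdns("ssas001", ["local.lan"]))  # ['ssas001', 'ssas001.local.lan']
```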
metadata-ingestion/src/datahub/ingestion/source/ssas/xmlaclient.py (1)

155-157: Enhance error handling by logging request details.

Consider logging more detailed request information in the error handling block to aid debugging.

logger.error(f"Error occurred during sending request to {self.cfg.base_api_url} with data '{data}': {excp}")
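One way such logging could be wired in, sketched with a stand-in for the real HTTP call. `send_request` and its `post` parameter are hypothetical; the real client method lives in `xmlaclient.py`.

```python
import logging

logger = logging.getLogger("ssas.xmlaclient")


def send_request(base_api_url: str, data: str, post) -> str:
    """Illustrative wrapper: `post` stands in for the real HTTP call.

    Logs the target URL and payload on failure, then re-raises so the
    caller still sees the original exception.
    """
    try:
        return post(base_api_url, data)
    except Exception as excp:
        logger.error(
            f"Error occurred during sending request to {base_api_url} "
            f"with data '{data}': {excp}"
        )
        raise


print(send_request("http://localhost:81/ssas", "<Discover/>", post=lambda url, body: "<ok/>"))  # <ok/>
```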
metadata-ingestion/src/datahub/ingestion/source/ssas/ssas_tabular/domains.py (6)

12-38: Ensure docstrings provide complete information.

The class XMLACube has a brief docstring. Consider expanding it to include more details about the class's purpose and usage.

- """Class for represent OLAP cube"""
+ """Represents an OLAP cube in SSAS, containing metadata such as creation date, last schema update, and description."""

41-59: Ensure docstrings provide complete information.

The class XMLADimension has a brief docstring. Consider expanding it to include more details about the class's purpose and usage.

- """Class for represent dimension in OLAP cube"""
+ """Represents a dimension in an OLAP cube, containing metadata such as name, type, and default hierarchy."""

77-95: Ensure docstrings provide complete information.

The class XMLAMeasure has a brief docstring. Consider expanding it to include more details about the class's purpose and usage.

- """Class representation of xmla measure."""
+ """Represents a measure in an OLAP cube, containing metadata such as name, expression, and description."""

125-147: Ensure docstrings provide complete information.

The class XMLADataSource has a brief docstring. Consider expanding it to include more details about the class's purpose and usage.

- """Class representation of xmla datasource."""
+ """Represents a data source in SSAS, containing metadata such as name, ID, and connection string."""

150-174: Ensure docstrings provide complete information.

The class XMLADataBase has a brief docstring. Consider expanding it to include more details about the class's purpose and usage.

- """Class representation of xmla database."""
+ """Represents a database in SSAS, containing metadata such as name, description, and compatibility level."""

176-183: Ensure docstrings provide complete information.

The class XMLAServer has a brief docstring. Consider expanding it to include more details about the class's purpose and usage.

- """Class representation of xmla server."""
+ """Represents a server in SSAS, containing metadata such as name, ID, and version."""
metadata-ingestion/src/datahub/ingestion/source/ssas/ssas_core.py (6)

27-28: Consider configuring the logger using a configuration file.

While setting the logger level to DEBUG is useful for development, consider using a configuration file or environment variables to control logging levels in production.

# Consider configuring the logger using a configuration file for production use.
LOGGER.setLevel(logging.DEBUG)
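A minimal sketch of the suggested approach, reading the level from an environment variable. `SSAS_LOG_LEVEL` is an illustrative variable name, not an existing DataHub setting.

```python
import logging
import os


def configure_logger(name: str = "datahub.ingestion.source.ssas") -> logging.Logger:
    """Take the level from the environment, defaulting to INFO.

    Logger.setLevel accepts level names such as "DEBUG" or "INFO",
    so the variable can hold the name directly.
    """
    logger = logging.getLogger(name)
    level = os.environ.get("SSAS_LOG_LEVEL", "INFO")
    logger.setLevel(level)
    return logger


LOGGER = configure_logger()
print(logging.getLevelName(LOGGER.level))
```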

69-79: Consider renaming __to_work_unit for clarity.

The method name __to_work_unit could be more descriptive. Consider renaming it to convert_to_work_unit or similar.

- def __to_work_unit(
+ def convert_to_work_unit(

81-122: Ensure logging messages are clear and informative.

The debug logging messages in construct_set_workunits could be more descriptive to aid in troubleshooting.

- LOGGER.debug(f"as_dataplatform_data methadata: {wu.get_metadata()}")
+ LOGGER.debug(f"Generated MetadataWorkUnit for data platform: {wu.get_metadata()}")

- LOGGER.debug(f"as_dataset_properties_data methadata: {wu.get_metadata()}")
+ LOGGER.debug(f"Generated MetadataWorkUnit for dataset properties: {wu.get_metadata()}")

- LOGGER.debug(f"as_upstream_lineage_aspect_data methadata: {wu.get_metadata()}")
+ LOGGER.debug(f"Generated MetadataWorkUnit for upstream lineage aspect: {wu.get_metadata()}")

125-135: Ensure docstrings provide complete information.

The class SsasSourceReport has a brief docstring. Consider expanding it to include more details about the class's purpose and usage.

- """Class for source report"""
+ """Tracks the number of scanned and dropped reports during SSAS source ingestion."""

137-141: Ensure docstrings provide complete information.

The class SsasSource has a brief docstring. Consider expanding it to include more details about the class's purpose and usage.

- """Class build datahub entities from tabular SSAS"""
+ """Ingests metadata from SSAS and builds DataHub entities."""

178-183: Ensure docstrings provide complete information.

The method gen_key lacks a docstring. Add a docstring to describe its purpose and usage.

def gen_key(self, name):
    """
    Generate a container key for SSAS using the provided name.
    :param name: The name to use for generating the key.
    :return: An SSASContainerKey object.
    """
metadata-ingestion/src/datahub/ingestion/source/ssas/ssas_tabular/ssas_tabular.py (9)

30-31: Consider configuring the logger using a configuration file.

While setting the logger level to INFO is useful for development, consider using a configuration file or environment variables to control logging levels in production.

# Consider configuring the logger using a configuration file for production use.
logging.basicConfig(level=logging.INFO)

34-46: Ensure docstrings provide complete information.

The class SsasSourceReport has a brief docstring. Consider expanding it to include more details about the class's purpose and usage.

- """Class for source report"""
+ """Tracks the number of scanned and dropped reports during SSAS tabular source ingestion."""

48-51: Ensure docstrings provide complete information.

The class SsasTabularSource has a brief docstring. Consider expanding it to include more details about the class's purpose and usage.

- """Class build datahub entities from tabular SSAS"""
+ """Ingests metadata from SSAS tabular models and builds DataHub entities."""

59-71: Ensure docstrings provide complete information.

The method _get_catalog has a brief docstring. Consider expanding it to include more details about the method's purpose and usage.

- """Build datahub catalog entity."""
+ """Builds a DataHub catalog entity from an XMLA database representation."""

73-91: Ensure docstrings provide complete information.

The method _get_olap_cube has a brief docstring. Consider expanding it to include more details about the method's purpose and usage.

- """Build datahub cube entity."""
+ """Builds a DataHub cube entity and its lineage stream from an XMLA cube representation."""

92-106: Ensure docstrings provide complete information.

The method get_dimensions has a brief docstring. Consider expanding it to include more details about the method's purpose and usage.

- """Build list dimensions entities."""
+ """Retrieves dimensions for a given cube and database, returning them as a list of XMLADimension entities."""

107-119: Ensure docstrings provide complete information.

The method get_measures has a brief docstring. Consider expanding it to include more details about the method's purpose and usage.

- """Build list measures entities."""
+ """Retrieves measures for a given cube and database, returning them as a list of XMLAMeasure entities."""

120-128: Ensure docstrings provide complete information.

The method get_cubes has a brief docstring. Consider expanding it to include more details about the method's purpose and usage.

- """Build list OLAP cubes entities."""
+ """Retrieves OLAP cubes for a given database, returning them as a list of XMLACube entities."""

130-138: Ensure docstrings provide complete information.

The method get_databases has a brief docstring. Consider expanding it to include more details about the method's purpose and usage.

- """Build list SSAS catalogs entities."""
+ """Retrieves databases (catalogs) from the SSAS server, returning them as a list of XMLADataBase entities."""
metadata-ingestion/src/datahub/ingestion/source/ssas/ssas_tabular/api.py (8)

44-56: Clarify the method description in get_cubes_by_catalog.

The docstring states "Get list catalogs" which seems incorrect. Update it to accurately reflect that it retrieves cubes for a given catalog.

-        Get  list catalogs from dvm query.
+        Get list of cubes from DVM query for a given catalog.

95-103: Clarify the return type of auth_credentials.

The method auth_credentials returns the authentication object. Specify the return type in the docstring for better clarity.

-        :return: authorization dataclass.
+        :return: Union[HTTPBasicAuth, HTTPKerberosAuth].

223-231: Clarify the return type of auth_credentials.

The method auth_credentials returns the authentication object. Specify the return type in the docstring for better clarity.

-        :return: authorization dataclass.
+        :return: Union[HTTPBasicAuth, HTTPKerberosAuth].

70-82: Improve error handling in get_dimensions_additional_info.

The execute method in XmlaClient lacks explicit error handling, which means the get_dimensions_additional_info method should implement its own error handling to manage potential exceptions from executing DVM queries. Consider adding try-except blocks to handle exceptions gracefully.

  • File: metadata-ingestion/src/datahub/ingestion/source/ssas/ssas_tabular/api.py
  • Method: get_dimensions_additional_info
Analysis chain

Improve error handling in get_dimensions_additional_info.

Similar to other methods, ensure that this method handles exceptions and errors gracefully when executing DVM queries.

Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Verify error handling for `get_dimensions_additional_info`.

# Test: Search for error handling logic in the `execute` method of XmlaClient.
rg --type python 'def execute' -A 10

Length of output: 69


Script:

#!/bin/bash
# Description: Corrected script to verify error handling for `get_dimensions_additional_info`.

# Test: Search for the `execute` method in Python files and examine its implementation.
rg 'def execute' -A 20 --glob '*.py'

Length of output: 17954


83-94: Enhance error handling in get_measures_by_cube.

The execute method in the XmlaClient class does not currently implement error handling for the XMLA requests. Therefore, it is advisable to enhance error handling in the get_measures_by_cube method to manage potential issues during query execution.

  • Consider adding try-except blocks to handle exceptions from the execute method.
  • Ensure that meaningful error messages are logged or returned to aid in debugging.
Analysis chain

Enhance error handling in get_measures_by_cube.

Ensure that the method handles errors in the response from the DVM query execution.

Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Verify error handling for `get_measures_by_cube`.

# Test: Search for error handling logic in the `execute` method of XmlaClient.
rg --type python 'def execute' -A 10

Length of output: 69


Script:

#!/bin/bash
# Corrected script to find the `execute` method in Python files and check for error handling.

# Locate the XmlaClient class definition to ensure we are searching in the right context.
rg --type py 'class XmlaClient' -A 10

# Search for the `execute` method within XmlaClient to verify error handling.
rg --type py 'def execute' -A 20

Length of output: 18987


35-43: Improve Error Handling in find_node Method

The find_node method in xmlaclient.py uses a generic exception handler that may obscure errors during execution. Consider implementing specific error handling or logging to ensure that issues are properly captured and addressed. This will enhance the robustness of the get_catalogs method by providing better insights into potential failures.

  • File: metadata-ingestion/src/datahub/ingestion/source/ssas/xmlaclient.py
  • Method: find_node
Analysis chain

Verify the response handling in get_catalogs.

The method get_catalogs retrieves catalogs using a DVM query. Ensure that the response.get_node() method correctly handles the response, especially in cases of errors or empty results.

Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Verify response handling for `get_catalogs`.

# Test: Search for response handling logic in the `get_node` method.
rg --type python 'def get_node' -A 10

Length of output: 70


Script:

#!/bin/bash
# Description: Verify response handling for `get_catalogs`.

# Test: Search for response handling logic in the `get_node` method.
rg --type py 'def get_node' -A 10

Length of output: 1108


Script:

#!/bin/bash
# Description: Gather more details on the `find_node` method implementation.

# Test: Search for the complete implementation of the `find_node` method.
rg --type py 'def find_node' -A 20

Length of output: 2179


57-69: Add Error Handling in get_dimensions_by_cube Method

The get_dimensions_by_cube method currently lacks robust error handling, which could lead to unhandled exceptions from network issues or invalid responses. Consider implementing try-except blocks to manage potential parsing errors and invalid responses.

  • File: metadata-ingestion/src/datahub/ingestion/source/ssas/ssas_tabular/api.py
  • Method: get_dimensions_by_cube
Analysis chain

Ensure robust error handling in get_dimensions_by_cube.

The method fetches dimensions using a DVM query. Ensure it handles potential errors in the response, such as network issues or invalid queries.

Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Verify error handling for `get_dimensions_by_cube`.

# Test: Search for error handling logic in the `execute` method of XmlaClient.
rg --type python 'def execute' -A 10

Length of output: 69


Script:

#!/bin/bash
# Correct the file type and search for the `execute` method in Python files.
rg 'def execute' -t py -A 20

Length of output: 17946


Script:

#!/bin/bash
# Check for error handling in XmlaResponse and self.__request in xmlaclient.py
rg 'class XmlaResponse' -A 20

# Check for error handling in self.__request method
rg 'def __request' -A 20

# Verify if get_dimensions_by_cube has any error handling
rg 'def get_dimensions_by_cube' -A 20

Length of output: 8279


143-152: Enhance error handling and testing for get_server method

The get_server method in MdXmlaParser lacks explicit error handling, which could lead to unhandled exceptions if the XMLA input is malformed or unexpected. Additionally, there are no test cases covering this method.

  • Consider adding try-except blocks in the get_server method to handle potential parsing errors.
  • Implement test cases to verify the method's robustness against different XMLA inputs and error scenarios.
Analysis chain

Verify the data parsing in get_server.

The method parses server metadata using MdXmlaParser. Ensure that the parsing logic correctly handles different server metadata formats and potential errors.

Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Verify data parsing in `get_server`.

# Test: Search for parsing logic in `MdXmlaParser.get_server`.
rg --type python 'def get_server' -A 10

Length of output: 72


Script:

#!/bin/bash
# Re-run the search for the `get_server` method in the `MdXmlaParser` class without specifying the file type.
rg 'def get_server' -A 10

Length of output: 13255


Script:

#!/bin/bash
# Examine the implementation of the `get_server` method in `MdXmlaParser` for error handling and parsing logic.
rg 'def get_server' -A 20 metadata-ingestion/src/datahub/ingestion/source/ssas/parser.py

# Check for any test cases related to `get_server` in the test directory to understand its robustness.
rg 'get_server' metadata-ingestion/tests/

Length of output: 671

metadata-ingestion/src/datahub/ingestion/source/ssas/ssas_multidimension/ssas_multidimension.py (5)

65-77: Clarify the method description in _get_catalog.

The method constructs a Catalog object. Ensure that the parameters and return type in the docstring accurately describe the method's functionality.

-        :param database: database representation from xmla response.
+        :param database: XMLADataBase object representing the database.

143-151: Clarify the method description in get_dimensions.

The method returns dimensions for a cube. Ensure that the parameters and return type in the docstring accurately describe the method's functionality.

-        :param cube: cube entity.
+        :param cube: XMLACube object representing the cube.

152-159: Clarify the method description in get_measures.

The method returns measures for a cube. Ensure that the parameters and return type in the docstring accurately describe the method's functionality.

-        :param cube: cube entity.
+        :param cube: XMLACube object representing the cube.

161-168: Clarify the method description in get_cubes.

The method returns cubes for a database. Ensure that the parameters and return type in the docstring accurately describe the method's functionality.

-        :param database: database entity.
+        :param database: XMLADataBase object representing the database.

170-177: Clarify the method description in get_databases.

The method returns available databases. Ensure that the parameters and return type in the docstring accurately describe the method's functionality.

-        :return: server databases representation.
+        :return: XMLADataBasesContainer object representing the server databases.
metadata-ingestion/src/datahub/ingestion/source/ssas/domains.py (20)

52-59: Clarify the return type of formatted_name.

The method returns a formatted catalog name. Ensure that the docstring accurately describes the return value.

-        :return: string name.
+        :return: formatted catalog name as a string.

61-68: Clarify the return type of formatted_instance.

The method returns a formatted catalog instance. Ensure that the docstring accurately describes the return value.

-        :return: string instance.
+        :return: formatted catalog instance as a string.

70-77: Clarify the return type of full_type.

The method returns the catalog type. Ensure that the docstring accurately describes the return value.

-        :return: string type.
+        :return: catalog type as a string.

79-86: Clarify the return type of orchestrator.

The method returns the catalog orchestrator. Ensure that the docstring accurately describes the return value.

-        :return: string orchestrator.
+        :return: catalog orchestrator as a string.

88-95: Clarify the return type of cluster.

The method returns the catalog cluster. Ensure that the docstring accurately describes the return value.

-        :return: string cluster.
+        :return: catalog cluster as a string.

113-119: Clarify the return type of get_instance.

The method returns the dependency instance. Ensure that the docstring accurately describes the return value.

-        :return: string instance.
+        :return: dependency instance as a string.

147-156: Clarify the return type of as_property.

The method returns dependencies as a dictionary of properties. Ensure that the docstring accurately describes the return value.

-        :return: dictionary of properties.
+        :return: dependencies as a dictionary of properties.

173-180: Clarify the return type of full_type.

The method returns the cube type. Ensure that the docstring accurately describes the return value.

-        :return: string type.
+        :return: cube type as a string.

182-189: Clarify the return type of formatted_name.

The method returns a formatted cube name. Ensure that the docstring accurately describes the return value.

-        :return: string name.
+        :return: formatted cube name as a string.

191-198: Clarify the return type of full_name.

The method returns the cube full name. Ensure that the docstring accurately describes the return value.

-        :return: string name.
+        :return: cube full name as a string.

215-223: Clarify the return type of name.

The method returns the dataset name. Ensure that the docstring accurately describes the return value.

-        :return: string name.
+        :return: dataset name as a string.

225-233: Clarify the parameter description in add_property.

The method adds a property to the dataset. Ensure that the parameter descriptions accurately describe their usage.

-        :param value: propery value
+        :param value: property value, which can be a string, float, or int.

235-245: Clarify the return type of data_platform.

The method returns the data platform. Ensure that the docstring accurately describes the return value.

-        :return: string dataplatform.
+        :return: data platform as a string.

247-255: Clarify the return type of dataplatform_urn.

The method returns the data platform URN. Ensure that the docstring accurately describes the return value.

-        :return: string dataplatform urn.
+        :return: data platform URN as a string.

257-263: Clarify the return type of urn.

The method returns the dataset URN. Ensure that the docstring accurately describes the return value.

-        :return: string urn.
+        :return: dataset URN as a string.

266-282: Clarify the return type of as_dataplatform_data.

The method returns data for the data platform instance aspect. Ensure that the docstring accurately describes the return value.

-        :return: data in dictionary.
+        :return: data for dataPlatformInstance aspect as a dictionary.

285-298: Clarify the return type of as_upstream_lineage_aspect_data.

The method returns data for the upstream lineage aspect. Ensure that the docstring accurately describes the return value.

-        :return: data in dictionary.
+        :return: data for upstreamLineage aspect as a dictionary.

301-314: Clarify the return type of as_dataset_properties_data.

The method returns data for the dataset properties aspect. Ensure that the docstring accurately describes the return value.

-        :return: data in dictionary.
+        :return: data for datasetProperties aspect as a dictionary.

341-350: Clarify the return type of type.

The method returns the connection type. Ensure that the docstring accurately describes the return value.

-        :return: string type name.
+        :return: connection type as a string.

352-360: Clarify the parameter description in from_tuple.

The method creates an instance from a tuple. Ensure that the parameter description accurately describes its usage.

-        :param data: data in tuple.
+        :param data: list of strings representing the connection fields.
Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

Commits

Files that changed from the base of the PR and between 771ab0d and 5aa7bdf.

Files selected for processing (17)
  • metadata-ingestion/setup.py (5 hunks)
  • metadata-ingestion/src/datahub/ingestion/source/ssas/api.py (1 hunks)
  • metadata-ingestion/src/datahub/ingestion/source/ssas/config.py (1 hunks)
  • metadata-ingestion/src/datahub/ingestion/source/ssas/domains.py (1 hunks)
  • metadata-ingestion/src/datahub/ingestion/source/ssas/parser.py (1 hunks)
  • metadata-ingestion/src/datahub/ingestion/source/ssas/ssas_core.py (1 hunks)
  • metadata-ingestion/src/datahub/ingestion/source/ssas/ssas_multidimension/api.py (1 hunks)
  • metadata-ingestion/src/datahub/ingestion/source/ssas/ssas_multidimension/domains.py (1 hunks)
  • metadata-ingestion/src/datahub/ingestion/source/ssas/ssas_multidimension/ssas_multidimension.py (1 hunks)
  • metadata-ingestion/src/datahub/ingestion/source/ssas/ssas_tabular/api.py (1 hunks)
  • metadata-ingestion/src/datahub/ingestion/source/ssas/ssas_tabular/domains.py (1 hunks)
  • metadata-ingestion/src/datahub/ingestion/source/ssas/ssas_tabular/ssas_tabular.py (1 hunks)
  • metadata-ingestion/src/datahub/ingestion/source/ssas/ssas_tabular/tools.py (1 hunks)
  • metadata-ingestion/src/datahub/ingestion/source/ssas/tools.py (1 hunks)
  • metadata-ingestion/src/datahub/ingestion/source/ssas/utils.py (1 hunks)
  • metadata-ingestion/src/datahub/ingestion/source/ssas/xmla_server_response_error.py (1 hunks)
  • metadata-ingestion/src/datahub/ingestion/source/ssas/xmlaclient.py (1 hunks)
Additional context used
Ruff
metadata-ingestion/src/datahub/ingestion/source/ssas/ssas_multidimension/domains.py

584-587: Use contextlib.suppress(ValueError) instead of try-except-pass

Replace with contextlib.suppress(ValueError)

(SIM105)
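For reference, the `contextlib.suppress` form that the Ruff SIM105 hint suggests replaces a try-except-pass with a single context manager (illustrative function, not the PR's actual code):

```python
import contextlib

def parse_int(raw: str, default: int = 0) -> int:
    """Return the parsed integer, falling back to a default on failure."""
    result = default
    # Equivalent to: try: result = int(raw)  except ValueError: pass
    with contextlib.suppress(ValueError):
        result = int(raw)
    return result

print(parse_int("42"))   # 42
print(parse_int("n/a"))  # 0
```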

Additional comments not posted (30)
metadata-ingestion/src/datahub/ingestion/source/ssas/tools.py (1)

7-48: Ensure XML templates are well-formed.

The XML templates for SOAP and XMLA requests appear well-formed. However, ensure that placeholders like {xmla_query} and {query} are correctly populated before sending requests.
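One way to populate those placeholders safely is to XML-escape every substituted value first, so special characters cannot break the request body. A minimal sketch (the template and names here are illustrative, not the actual `MsXmlaTemplates` contents):

```python
from xml.sax.saxutils import escape

# Illustrative template; the real SOAP/XMLA templates live in tools.py.
DISCOVER_TEMPLATE = "<Statement>{query}</Statement>"

def render_query(template: str, **params: str) -> str:
    """Escape each placeholder value so &, <, > cannot corrupt the XML body."""
    safe = {key: escape(value) for key, value in params.items()}
    return template.format(**safe)

print(render_query(DISCOVER_TEMPLATE, query="SELECT a & b FROM <cube>"))
# <Statement>SELECT a &amp; b FROM &lt;cube&gt;</Statement>
```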

metadata-ingestion/src/datahub/ingestion/source/ssas/ssas_multidimension/api.py (2)

27-35: Verify authentication method handling.

The __get_auth method supports HTTPBasicAuth and HTTPKerberosAuth. Ensure that the ssas_instance_auth_type configuration is validated and defaults are handled appropriately.
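One pattern for that validation is an explicit whitelist of auth factories that fails fast on an unknown value instead of at request time. A sketch under assumptions (the factory values here are string stand-ins for the real `HTTPBasicAuth` / `HTTPKerberosAuth` objects from `requests` and `requests_kerberos`):

```python
from typing import Callable, Dict

# Stand-in factories; in the connector these would construct
# HTTPBasicAuth / HTTPKerberosAuth instances.
AUTH_FACTORIES: Dict[str, Callable[[], str]] = {
    "HTTPBasicAuth": lambda: "basic-auth",
    "HTTPKerberosAuth": lambda: "kerberos-auth",
}

def resolve_auth(auth_type: str = "HTTPBasicAuth") -> str:
    """Validate ssas_instance_auth_type and build the auth object, with a default."""
    try:
        return AUTH_FACTORIES[auth_type]()
    except KeyError:
        raise ValueError(
            f"Unsupported ssas_instance_auth_type: {auth_type!r}; "
            f"expected one of {sorted(AUTH_FACTORIES)}"
        ) from None
```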


37-44: Ensure proper error handling for XMLA API calls.

The get_server method calls get_server_metadata on __xmla_api. Ensure that any exceptions or errors during this call are properly handled and logged.

metadata-ingestion/src/datahub/ingestion/source/ssas/ssas_tabular/tools.py (1)

7-61: Verify SQL query correctness and security.

The SQL queries in DvmQueries are structured for metadata retrieval. Ensure that these queries are safe from SQL injection and correctly handle any special characters in input parameters.
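Since DMV queries sent over XMLA have no bind-parameter mechanism, one defensive option is to whitelist-validate any name before interpolating it into the query text. A minimal sketch with an assumed (conservative) identifier pattern:

```python
import re

# Conservative whitelist for catalog/cube names interpolated into DMV text;
# rejects quotes, brackets, and other metacharacters outright.
_IDENTIFIER_RE = re.compile(r"^[\w .\-]+$")

def safe_identifier(name: str) -> str:
    """Return the name unchanged, or raise if it could break out of a literal."""
    if not _IDENTIFIER_RE.match(name):
        raise ValueError(f"Unsafe identifier: {name!r}")
    return name

query = (
    "SELECT * FROM $SYSTEM.MDSCHEMA_CUBES "
    f"WHERE [CATALOG_NAME] = '{safe_identifier('Adventure Works DW')}'"
)
print(query)
```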

metadata-ingestion/src/datahub/ingestion/source/ssas/config.py (1)

64-68: LGTM!

The SsasServerHTTPSourceConfig class is well-structured and correctly extends SsasServerHTTPConfig.

metadata-ingestion/src/datahub/ingestion/source/ssas/ssas_tabular/api.py (8)

27-31: Ensure proper error handling in the constructor.

The constructor initializes an XmlaClient with the provided configuration and authentication. Consider adding error handling to manage potential exceptions during initialization.


110-118: Ensure correct handling of response in get_catalog_sources.

The method fetches catalog sources using a DVM query. Ensure that the response.as_dict() method correctly processes the response, especially in cases of errors or empty results.


126-142: Ensure proper error handling in the constructor.

The constructor initializes multiple components, including authentication and API clients. Consider adding error handling to manage potential exceptions during initialization.


161-168: Clarify the method description in get_cubes_by_catalog.

The docstring accurately describes the method's functionality. Ensure that the conversion to XMLACube objects handles all expected data formats and potential errors.

Verification successful

Conversion to XMLACube objects is robust due to Pydantic validation.

The XMLACube class uses Pydantic's BaseModel, which ensures data validation and conversion are handled effectively. This setup should manage expected data formats and potential errors. No changes are needed for this aspect of the code.

  • metadata-ingestion/src/datahub/ingestion/source/ssas/ssas_tabular/domains.py: XMLACube class definition confirms robust data handling.
Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Verify cube data handling in `get_cubes_by_catalog`.

# Test: Search for data handling logic in the `XMLACube` constructor.
rg --type python 'class XMLACube' -A 10

Length of output: 72


Script:

#!/bin/bash
# Description: Correctly search for the `XMLACube` class definition in Python files.

# Test: Locate the `XMLACube` class definition and inspect its constructor.
rg 'class XMLACube' --type py -A 10

Length of output: 3920


153-160: Ensure robust handling of catalog data in get_catalogs.

The method converts catalog data into XMLADataBase objects. Verify that the conversion handles all expected data formats and potential errors.


212-221: Ensure robust handling of measure data in get_measures_by_cube.

The method processes measures into XMLAMeasure objects. Verify that the conversion handles all expected data formats and potential errors.

Verification successful

Robust Data Handling in get_measures_by_cube Confirmed

The XMLAMeasure class includes optional fields and a validator for handling different data formats, ensuring robust data processing. The method get_measures_by_cube is well-equipped to handle various data scenarios.

Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Verify measure data handling in `get_measures_by_cube`.

# Test: Search for data handling logic in the `XMLAMeasure` constructor.
rg --type python 'class XMLAMeasure' -A 10

Length of output: 75


Script:

#!/bin/bash
# Corrected script to find the `XMLAMeasure` class definition and inspect its constructor.
rg 'class XMLAMeasure' -A 20

Length of output: 6951


190-210: Ensure robust handling of dimension data in get_dimensions_by_cube.

The method processes dimensions into XMLADimension objects. Verify that the conversion handles all expected data formats and potential errors.

Verification successful

Robust Handling of Dimension Data Confirmed

The XMLADimension class uses Pydantic's BaseModel to handle data conversion and validation, ensuring robust processing of dimension data in get_dimensions_by_cube. The use of optional fields and aliases provides flexibility and error handling capabilities. No further changes are needed.

Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Verify dimension data handling in `get_dimensions_by_cube`.

# Test: Search for data handling logic in the `XMLADimension` constructor.
rg --type python 'class XMLADimension' -A 10

Length of output: 77


Script:

#!/bin/bash
# Description: Corrected search for the `XMLADimension` class and its constructor.
rg 'class XMLADimension' -A 20

Length of output: 7118


233-240: Verify the data retrieval in get_server_info.

The method retrieves server information using XMLA API. Ensure that the retrieval and processing logic handles all expected data formats and potential errors.

metadata-ingestion/src/datahub/ingestion/source/ssas/ssas_multidimension/ssas_multidimension.py (4)

56-61: Ensure proper error handling in the constructor.

The constructor initializes a MultidimensionSsasAPI client. Consider adding error handling to manage potential exceptions during initialization.


179-221: Ensure robust error handling in get_workunits.

The method generates metadata work units. Ensure that it handles exceptions and errors gracefully, especially when interacting with external systems.


62-64: Verify the configuration retrieval in _get_default_ssas_instance_from_config.

Ensure that the method correctly retrieves default SSAS instances from the configuration and handles cases where the server is not found.

Verification successful

Configuration retrieval logic in _get_default_ssas_instance_from_config is correct.

The method properly retrieves default SSAS instances from the configuration dictionary default_ssas_instances_by_server and handles cases where the server is not found by returning None. No issues were found.

Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Verify configuration retrieval in `_get_default_ssas_instance_from_config`.

# Test: Search for configuration retrieval logic in SsasServerHTTPSourceConfig.
rg --type python 'class SsasServerHTTPSourceConfig' -A 10

Length of output: 90


Script:

#!/bin/bash
# Description: Verify configuration retrieval in `_get_default_ssas_instance_from_config`.

# Corrected: Search for the definition of SsasServerHTTPSourceConfig and its attributes.
rg 'class SsasServerHTTPSourceConfig' -A 20 --type py

Length of output: 672


Script:

#!/bin/bash
# Search for the definition and usage of `default_ssas_instances_by_server` in the codebase.
rg 'default_ssas_instances_by_server' --type py

Length of output: 366


79-99: Ensure robust handling of cube data in _get_olap_cube.

The method constructs a Cube object and its lineage. Verify that the construction handles all expected data formats and potential errors.

Verification successful

Robust Handling of Cube Data Verified in _get_olap_cube

The _get_olap_cube method constructs a Cube object using straightforward attributes from source_config and a Catalog object. The data handling is robust as long as source_config is correctly configured. No complex data formats or error management are involved in the Cube class.

Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Verify cube data handling in `_get_olap_cube`.

# Test: Search for data handling logic in the `Cube` constructor.
rg --type python 'class Cube' -A 10

Length of output: 68


Script:

#!/bin/bash
# Correcting the file type for ripgrep and searching for the Cube class definition and constructor.
rg 'class Cube' -A 20

Length of output: 3463


Script:

#!/bin/bash
# Verify the definitions and types of self.source_config and catalog used in _get_olap_cube.
# Check for the source_config definition.
rg 'self\.source_config' -A 5

# Check for the _get_catalog method to understand what catalog is.
rg 'def _get_catalog' -A 10

Length of output: 166649

metadata-ingestion/src/datahub/ingestion/source/ssas/domains.py (3)

30-37: Ensure the as_dict method handles all fields.

The as_dict method uses asdict to convert the dataclass to a dictionary. Verify that all fields are correctly included and handled.


130-145: Ensure the as_datasets_urn_list method handles all cases.

The method converts dependencies to a list of dataset URNs. Verify that it correctly handles all dependency types and formats.


317-328: Ensure all fields in Datasource are correctly defined.

The class defines fields for datasource information. Verify that all necessary fields are included and correctly typed.

metadata-ingestion/src/datahub/ingestion/source/ssas/ssas_multidimension/domains.py (6)

139-170: LGTM!

The XMLADimensionsContainer class is well-structured and correctly implements data aggregation and validation.


308-317: LGTM!

The XMLASchemaBinding class is straightforward and well-structured.


355-374: LGTM!

The XMLDataSourceViewsContainer class is well-structured and correctly implements data validation.


411-452: LGTM!

The XMLADataSourcesContainer class is well-structured and efficiently implements data retrieval by ID.


455-488: LGTM!

The XMLACube class is well-structured and correctly implements data aggregation and additional information properties.


491-510: LGTM!

The XMLACubesContainer class is well-structured and correctly implements data validation.

metadata-ingestion/setup.py (4)

266-272: LGTM!

The SSAS dependencies are appropriate for handling SSAS data ingestion and are correctly specified.


651-652: LGTM!

The entry points for ssas_multidimension and ssas_tabular are correctly defined and linked to the appropriate source classes.


550-550: LGTM!

Including SSAS in the development requirements is appropriate for testing and development purposes.


593-593: Verify if SSAS should be included in full test development requirements.

SSAS is commented out in the full test development requirements. Ensure this is intentional and aligns with the current state of implementation.

Comment on lines +54 to +61
class HostDefaults(str, Enum):
"""
Host default values.
"""

DOMAIN: str = ""
SERVER: str = ""
NAME: str = ""

Consider providing more descriptive default values.

The HostDefaults enum currently contains empty strings for most defaults. Consider providing more descriptive or meaningful defaults if applicable.
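If empty strings are kept intentionally, one upside is that the `str`-mixin members stay falsy and compose with `or`-style fallbacks; alternatively, meaningful defaults can be given directly. A sketch with illustrative values (not the connector's real defaults):

```python
from enum import Enum

class HostDefaults(str, Enum):
    DOMAIN = ""             # intentionally empty: falsy, so callers can fall back
    SERVER = "localhost"    # illustrative meaningful default instead of ""
    NAME = "default"

# Empty members are falsy, so they compose with `or` fallbacks:
server = "" or HostDefaults.SERVER
print(server.value)  # localhost
```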

Comment on lines +15 to +23
def get_server(self, xmla_str: str) -> Dict[str, Any]:
"""
Get server data from xmla structure.

:param xmla_str: string xmla data.
:return: server data in dictionary.
"""
bs_content = bs(xmla_str, "xml")
return xmltodict.parse(str(bs_content.find("Server")))["Server"]

Add error handling for XML parsing.

The get_server method lacks error handling for cases where the XML structure might not be as expected. Consider adding try-except blocks to handle potential parsing errors gracefully.

def get_server(self, xmla_str: str) -> Dict[str, Any]:
    """
    Get server data from xmla structure.

    :param xmla_str: string xmla data.
    :return: server data in dictionary.
    """
    try:
        bs_content = bs(xmla_str, "xml")
        server_data = xmltodict.parse(str(bs_content.find("Server")))["Server"]
    except (xmltodict.expat.ExpatError, TypeError, KeyError) as e:
        raise XMLAServerResponseError("Failed to parse XMLA server data") from e

    return server_data

Comment on lines +33 to +36
def __init__(self, config: SsasServerHTTPConfig, auth: Union[HTTPKerberosAuth, HTTPBasicAuth]):
self.__config = config
self.__auth = auth
self.__client = XmlaClient(config=config, auth=self.__auth)

Add error handling for network requests in SsasXmlaAPI.

The constructor initializes a client for making network requests. Consider adding error handling for potential network-related issues, such as connection timeouts or authentication failures.

def __init__(self, config: SsasServerHTTPConfig, auth: Union[HTTPKerberosAuth, HTTPBasicAuth]):
    self.__config = config
    self.__auth = auth
    try:
        self.__client = XmlaClient(config=config, auth=self.__auth)
    except requests.exceptions.RequestException as e:  # substitute the client's actual error type
        raise XMLAServerResponseError("Failed to initialize XMLA client") from e

Comment on lines +38 to +47
def get_server_info(self) -> Dict[str, Any]:
"""
Extract server metadata info from response
"""

server_data_xml = xmltodict.parse(self.get_server_metadata())

return server_data_xml["soap:Envelope"]["soap:Body"]["DiscoverResponse"][
"return"
]["root"]["row"]["xars:METADATA"]["Server"]

Add error handling for XML parsing in get_server_info.

The get_server_info method parses XML data without handling potential parsing errors. Consider adding try-except blocks to handle these errors gracefully.

def get_server_info(self) -> Dict[str, Any]:
    """
    Extract server metadata info from response
    """
    try:
        server_data_xml = xmltodict.parse(self.get_server_metadata())
        return server_data_xml["soap:Envelope"]["soap:Body"]["DiscoverResponse"][
            "return"
        ]["root"]["row"]["xars:METADATA"]["Server"]
    except (xmltodict.expat.ExpatError, KeyError) as e:
        raise XMLAServerResponseError("Failed to parse server metadata") from e

Comment on lines +49 to +54
def get_server_metadata(self) -> str:
"""
Get ssas server metadata
"""

return str(self.__client.discover(query=MsXmlaTemplates.QUERY_METADATA))

Add error handling for network requests in get_server_metadata.

The get_server_metadata method makes a network request without handling potential network-related issues. Consider adding error handling for these cases.

def get_server_metadata(self) -> str:
    """
    Get ssas server metadata
    """
    try:
        return str(self.__client.discover(query=MsXmlaTemplates.QUERY_METADATA))
    except requests.exceptions.RequestException as e:  # substitute the client's actual error type
        raise XMLAServerResponseError("Failed to retrieve server metadata") from e

Comment on lines +338 to +352
@validator("sources", pre=True)
def validate_sources(cls, value):
"""
Validate datasources value.
Extract data from xmla structure.
Convert to list if it is not list instance.

:return: list type value.
"""
res = value["xs:schema"]["xs:element"]["xs:complexType"]["xs:choice"][
"xs:element"
]
if isinstance(res, dict):
return [res]
return res

Simplify the validate_sources method.

The validate_sources method can be simplified by using a helper function to handle the extraction logic, improving readability.

def extract_sources(value):
    res = value["xs:schema"]["xs:element"]["xs:complexType"]["xs:choice"]["xs:element"]
    return [res] if isinstance(res, dict) else res

@validator("sources", pre=True)
def validate_sources(cls, value):
    return extract_sources(value)

Comment on lines +396 to +408
@property
def connection(self) -> DSConnection:
"""
Get connection representation.

:return: connection of datasource.
"""
return DSConnection.from_tuple(
[
item.split("=")[1]
for item in self.connection_string.replace("\n", "").split(";")
]
)

Improve connection extraction logic.

The connection property already uses a list comprehension, but item.split("=")[1] will raise an IndexError on connection-string segments without an "=" separator (for example, a trailing ";"). Guarding with if "=" in item makes the parsing more robust.

@property
def connection(self) -> DSConnection:
    return DSConnection.from_tuple(
        [
            item.split("=")[1]
            for item in self.connection_string.replace("\n", "").split(";")
            if "=" in item
        ]
    )

Comment on lines +32 to +33
def get_server(self):
pass

Implement the get_server method.

The get_server method is currently a placeholder. Implement this method to return server metadata or remove it if not needed.

Comment on lines +169 to +189
def add_dimension_additional_info(
self, catalog_name: str, dimension: XMLADimension
):
"""
Add additional info to dimension.

:return: dimension.
"""
dimension_name = dimension.name
if dimension_name is None:
return dimension

info = self.__dvm_api.get_dimensions_additional_info(
dimension_name=dimension_name, catalog_name=catalog_name
)

for item in info:
dimension.query_definition = item["QueryDefinition"]

return dimension


Optimize the add_dimension_additional_info method.

The method iterates over additional info to update dimensions. Consider optimizing this logic to handle large datasets more efficiently.

-        for item in info:
-            dimension.query_definition = item["QueryDefinition"]
+        dimension.query_definition = next((item["QueryDefinition"] for item in info), dimension.query_definition)

Comment on lines +101 to +141
def _get_cube_dependency(
self, cube: XMLACube, catalog_sources: List[DataSource]
) -> OLAPLineageStream:
"""
Build cube lineage entity.

:param cube: cube representation from xmla response.
:param catalog_sources: list of catalog data sources.
:return: datahub lineage entity.
"""
upstream_dependencies = []
cube_sources_ids = cube.sources_ids
cube_sources = [
source
for source in catalog_sources
if source.name in cube_sources_ids or source.id in cube_sources_ids
]

for dependency in cube_sources:

server = dependency.server

if self.source_config.use_dns_resolver:
resolver = DNSHostNameResolver(
hostname=server, dns_suffix_list=self.source_config.dns_suffixes
)
server = resolver.primary_hostname

upstream_dependencies.append(
CubeDependency(
source=dependency.source,
server=server,
instance=dependency.instance if dependency.instance is not None else self._get_default_ssas_instance_from_config(server=server),
db=dependency.db,
schema=dependency.schema,
name=dependency.name,
type=dependency.type.upper(),
env=self.source_config.env,
)
)
return OLAPLineageStream(dependencies=upstream_dependencies)

Optimize the _get_cube_dependency method.

The method constructs lineage dependencies for cubes. Consider optimizing the logic to handle large datasets more efficiently.

-        cube_sources = [
-            source
-            for source in catalog_sources
-            if source.name in cube_sources_ids or source.id in cube_sources_ids
-        ]

+        cube_sources_set = set(cube_sources_ids)
+        cube_sources = [source for source in catalog_sources if source.name in cube_sources_set or source.id in cube_sources_set]

@hsheth2

hsheth2 commented Oct 17, 2024

@DmytroYurchuk wanted to bump this - let me know if you need any help with updating this PR

Labels
community-contribution PR or Issue raised by member(s) of DataHub Community ingestion PR or Issue related to the ingestion of metadata pending-submitter-response Issue/request has been reviewed but requires a response from the submitter