Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add superset command to subscribe to data source changes on HQ #61

Closed
wants to merge 5 commits into from

Conversation

Charl1996
Copy link
Contributor

Ticket

As part of the linked ticket it's necessary to trigger a (re)subscribe of the data sources on CCA with that on HQ. I've written a flask command to handle this task, so we can reuse this in the future if needed.

Approach

I've decided to make the command reusable for various contexts, so we're now able to define the domain and data sources which needs subscribing in a normal text file on the server and let the command read from that to perform the necessary tasks. I think this approach would be most extensible in the long run, but I'm open to any other suggestions.

import json


def subscribe_data_sources(superset_base_url, hq_base_url, data_sources_file_path, username, apikey):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey @Charl1996

Can you share more about whose username / apikey this is?
Are we using the same one for all domains?

I assumed we would use the domain's credentials like done in

def subscribe_to_hq_datasource(domain, datasource_id):

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mkangia

Can you share more about whose username / apikey this is?

Any superuser.

Since a dev is running this command I thought it's reasonable to expect the dev to be able to generate and use the superuser credentials.

I assumed we would use the domain's credentials like done in

Initially I used it by found out soon enough that there's problems with using the HQRequest class with respect to the auth token. The class relies on the fact that there's an active user request which is oauth authenticated. Since running a script manually won't have this context I decided to take this approach in this PR.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We offlined about this and decided to skip re-subscription to avoid using a dev api key for this and additionally to have subscription authority with users/projects only.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 @Charl1996 does that mean that this PR will be closed?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kaapstorm Thanks for flagging. Uhm...do we see any future use for something like this?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, if we need it we can just come back to this PR. Closing it.

@@ -29,6 +29,8 @@ def flask_app_mutator(app):
# it. So I've run out of better options. (Norman 2024-03-13)
os.environ['AUTHLIB_INSECURE_TRANSPORT'] = '1'

register_commands(app)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of defining the command in two different places, you can use the same pattern that this flask_app_mutator() function uses to add views. For example ...

Suggested change
register_commands(app)
from . import commands # Append to import at the start of the function
app.cli.add_command(commands.subscribe_datasources)

Then you can drop the register_commands() function below, and move the command definition, with decorators and docstring, to the commands module. e.g.

# commands.py
import click


@click.command("subscribe-datasources")
@click.argument("superset_base_url")
@click.argument("hq_base_url")
@click.argument("data_sources_file_path", type=click.Path(exists=True))
@click.argument("username")
@click.argument("apikey", envvar="HQ_APIKEY")
def subscribe_datasources(
    superset_base_url,
    hq_base_url,
    data_sources_file_path,
    username,
    apikey,
):
    """
    This command takes all data sources in the data_sources_file_path
    and subscribes to their changes on HQ.

    :param superset_base_url: The base URL of the superset instance. This is
        used to construct the callback URL for publishing the changes to.

    :param hq_base_url: The base URL of the HQ server containing the data
        sources you'd like to subscribe to.

    :param data_sources_file_path: Path to file containing JSON-formatted
        data source IDs by domain. Example JSON:
        {"domain": ["data_source_id"]}

    :param username: An HQ superuser username.

    :param apikey: An HQ superuser API key. Alternatively, you can set the
        HQ_APIKEY environment variable to avoid passing this as an argument.
    """
    # params separated by blank lines for better readability on CLI:
    #     $ superset subscribe-datasources --help


failed_data_sources_by_id = {}
for domain, datasource_ids in data_sources_by_domain().items():
print(f"Handling domain: {domain}")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not necessary, but recommended "in case the terminal is misconfigured" 🤷 .

Suggested change
print(f"Handling domain: {domain}")
click.echo(f"Handling domain: {domain}")

'client_secret': client.get_client_secret(),
}
response = requests.post(
f"{hq_base_url}/{endpoint}",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use urllib.parse.urljoin to handle forward slashes gracefully. Django can figure out extra forward slashes, but proxies can make things unpredictable. Safer just to use urljoin().

Suggested change
f"{hq_base_url}/{endpoint}",
urljoin(hq_base_url, endpoint),

}
)

if response.status_code != 201:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: Just for the sake of defensive programming, in case some future dev changes HQ to subscribe asynchronously and return a 202 instead of a 201 -- who knows what future devs will think up? 🙃 😂 -- safer to check all success status codes:

Suggested change
if response.status_code != 201:
if not response.ok:

@Charl1996
Copy link
Contributor Author

See #61 (comment)

@Charl1996 Charl1996 closed this Oct 18, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants