-
Notifications
You must be signed in to change notification settings - Fork 0
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add superset command to subscribe to data source changes on HQ #61
Conversation
import json | ||
|
||
|
||
def subscribe_data_sources(superset_base_url, hq_base_url, data_sources_file_path, username, apikey): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey @Charl1996
Can you share more about whose username / apikey this is?
Are we using the same one for all domains?
I assumed we would use the domain's credentials like done in
commcare-analytics/hq_superset/services.py
Line 173 in fb979e7
def subscribe_to_hq_datasource(domain, datasource_id): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you share more about whose username / apikey this is?
Any superuser.
Since a dev is running this command I thought it's reasonable to expect the dev to be able to generate and use the superuser credentials.
I assumed we would use the domain's credentials like done in
Initially I used it by found out soon enough that there's problems with using the HQRequest
class with respect to the auth token. The class relies on the fact that there's an active user request which is oauth authenticated. Since running a script manually won't have this context I decided to take this approach in this PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We offlined about this and decided to skip re-subscription to avoid using a dev api key for this and additionally to have subscription authority with users/projects only.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 @Charl1996 does that mean that this PR will be closed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@kaapstorm Thanks for flagging. Uhm...do we see any future use for something like this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, if we need it we can just come back to this PR. Closing it.
@@ -29,6 +29,8 @@ def flask_app_mutator(app): | |||
# it. So I've run out of better options. (Norman 2024-03-13) | |||
os.environ['AUTHLIB_INSECURE_TRANSPORT'] = '1' | |||
|
|||
register_commands(app) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of defining the command in two different places, you can use the same pattern that this flask_app_mutator()
function uses to add views. For example ...
register_commands(app) | |
from . import commands # Append to import at the start of the function | |
app.cli.add_command(commands.subscribe_datasources) |
Then you can drop the register_commands()
function below, and move the command definition, with decorators and docstring, to the commands
module. e.g.
# commands.py
import click
@click.command("subscribe-datasources")
@click.argument("superset_base_url")
@click.argument("hq_base_url")
@click.argument("data_sources_file_path", type=click.Path(exists=True))
@click.argument("username")
@click.argument("apikey", envvar="HQ_APIKEY")
def subscribe_datasources(
superset_base_url,
hq_base_url,
data_sources_file_path,
username,
apikey,
):
"""
This command takes all data sources in the data_sources_file_path
and subscribes to their changes on HQ.
:param superset_base_url: The base URL of the superset instance. This is
used to construct the callback URL for publishing the changes to.
:param hq_base_url: The base URL of the HQ server containing the data
sources you'd like to subscribe to.
:param data_sources_file_path: Path to file containing JSON-formatted
data source IDs by domain. Example JSON:
{"domain": ["data_source_id"]}
:param username: An HQ superuser username.
:param apikey: An HQ superuser API key. Alternatively, you can set the
HQ_APIKEY environment variable to avoid passing this as an argument.
"""
# params separated by blank lines for better readability on CLI:
# $ superset subscribe-datasources --help
|
||
failed_data_sources_by_id = {} | ||
for domain, datasource_ids in data_sources_by_domain().items(): | ||
print(f"Handling domain: {domain}") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not necessary, but recommended "in case the terminal is misconfigured" 🤷 .
print(f"Handling domain: {domain}") | |
click.echo(f"Handling domain: {domain}") |
'client_secret': client.get_client_secret(), | ||
} | ||
response = requests.post( | ||
f"{hq_base_url}/{endpoint}", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use urllib.parse.urljoin
to handle forward slashes gracefully. Django can figure out extra forward slashes, but proxies can make things unpredictable. Safer just to use urljoin()
.
f"{hq_base_url}/{endpoint}", | |
urljoin(hq_base_url, endpoint), |
} | ||
) | ||
|
||
if response.status_code != 201: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: Just for the sake of defensive programming, in case some future dev changes HQ to subscribe asynchronously and return a 202 instead of a 201 -- who knows what future devs will think up? 🙃 😂 -- safer to check all success status codes:
if response.status_code != 201: | |
if not response.ok: |
See #61 (comment) |
Ticket
As part of the linked ticket it's necessary to trigger a (re)subscribe of the data sources on CCA with that on HQ. I've written a flask command to handle this task, so we can reuse this in the future if needed.
Approach
I've decided to make the command reusable for various contexts, so we're now able to define the domain and data sources which needs subscribing in a normal text file on the server and let the command read from that to perform the necessary tasks. I think this approach would be most extensible in the long run, but I'm open to any other suggestions.