-
Notifications
You must be signed in to change notification settings - Fork 11
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Cleaned up authentication steps and readme after install test with bu…
…siness analyst.
- Loading branch information
1 parent
1cc9181
commit 0281b84
Showing
4 changed files
with
105 additions
and
82 deletions.
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,15 +1,89 @@ | ||
# README # | ||
Google Analytics 360 Flattener. A Google Cloud Platform (GCP) solution that unnests (flattens) Google Analytics Data stored in Bigquery. The GCP resources for the solutions are installed via Deployment Manager. | ||
Google Analytics 360 Flattener. A Google Cloud Platform (GCP) solution that unnests (flattens) Google Analytics Data stored in Bigquery. | ||
The GCP resources for the solutions are installed via Deployment Manager. | ||
|
||
## Local dependencies ## | ||
* Python 3.7 or higher as base interpreter | ||
* Create a virtual environment | ||
* Install python packages using cf/requirements.txt | ||
* Google Cloud Platform SDK. Download and install from these instructions: https://cloud.google.com/sdk/docs/install | ||
* Python >= 3.7. Download and install from https://python.org. | ||
* Web browser | ||
* git (**optional** for cloning this code repository) | ||
|
||
## Directories ## | ||
## Prerequisites ## | ||
1. Browse to https://cloud.console.google.com to create Google GCP project or use | ||
an existing project that has Google Analytics data flowing to it. | ||
Referred to as **[PROJECT_ID]**. | ||
2. Grant the installing user (you most likely) the pre-defined IAM role of "Owner". | ||
3. As the installing user for **[PROJECT_ID]**, enable the following APIs | ||
* Cloud Build API | ||
* Cloud Functions API. | ||
* Identity and Access Management (IAM) API | ||
4. As the installing user for **[PROJECT_ID]**, grant the following pre-defined IAM roles to | ||
**[PROJECT_NUMBER]**@cloudservices.gserviceaccount.com (built in service | ||
account) otherwise deployment will fail with permission errors. See | ||
<https://cloud.google.com/deployment-manager/docs/access-control> for | ||
detailed explanation. | ||
* Logs Configuration Writer | ||
* Cloud Functions Developer | ||
* pub/sub Admin | ||
5. As the installing user for **[PROJECT_ID]**, create bucket for staging code during deployment, for example: | ||
**[PROJECT_NUMBER]**-function-code-staging. Referred to as **[BUCKET_NAME]**. | ||
6. Clone this github repo or download the source code from the releases section to your local machine or | ||
cloud shell. | ||
7. Edit the _ga_flattener.yaml_ and _ga_flattener_colon.yaml_ files, specifically all occurrences of _properties-->codeBucket_ value . Set the value to **[BUCKET_NAME]** (see step above) | ||
|
||
_**The following steps are only required if you plan to backfill historical tables._** | ||
8. Install python 3.7 or higher | ||
9. From a command prompt, upgrade pip (Command: py -m pip install --upgrade pip) | ||
10. Navigate to the root directory of the source code that was downloaded or cloned in step 6 above. | ||
10. From a command prompt, install python virtual environments (Command: py -m pip install --user virtualenv) | ||
11. Create a virtual environment for the source code in step 6 (Command: py -m venv venv) | ||
12. Active the virtual environment in the step above. | ||
13. Install the python dependent packages into the virtual environment. (Command: pip install -r cf\requirements.txt) | ||
|
||
## Installation steps ## | ||
1. Execute command: gcloud config set project **[PROJECT_ID]** | ||
2. Execute command: gcloud config set account <[email protected]>. **Note** - This must be the installing user from above prerequisites. | ||
3. Navigate (locally) to root directory of this repository | ||
4. If **[PROJECT_ID]** does **NOT** contain a colon (:) execute command: | ||
* gcloud deployment-manager deployments create **[Deployment Name]** --config ga_flattener.yaml | ||
otherwise follow these steps: | ||
1. execute command: | ||
* gcloud deployment-manager deployments create **[Deployment Name]** --config ga_flattener_colon.yaml | ||
2. Trigger function (with a blank message) named **[Deployment Name]**-cfconfigbuilderps. It will create the necessary configuration file in the applications Google Coud Storage bucket. An easy method to do this is to browse to https://console.cloud.google.com/functions and click the cloud function named **[Deployment Name]**-cfconfigbuilderps and go to the testing section and click "TEST THIS FUNCTION". | ||
|
||
## Verification steps ## | ||
1. After installation, a configuration file named config_datasets.json exists in **gs://[Deployment Name]-[PROJECT_NUMBER]-adswerve-ga-flat-config/** (Cloud Storage Bucket within **[PROJECT_ID]**). This file contains all the datasets that have "ga_sessions_yyyymmdd" tables and which tables to unnest. This configuration is required for this GA flattener solution to run daily or to backfill historical data. Edit this file accordingly to include or exclude certain datasets or tables to unnest. For example: | ||
* { "123456789": ["sessions","hits","products"] } will only flatten those 3 nested tables for GA view 123456789 | ||
* { "123456789": ["sessions","hits","products", "promotions", "experiments"], "987654321": ["sessions","hits"] } will flatten all possible nested tables for GA view 123456789 but only _sessions_ and _hits_ for GA View 987654321. | ||
|
||
_**The following steps are only required if you plan to backfill historical tables._** | ||
2. Modify values in the configuration section of tools/pubsub_message_publish.py accordingly. **Suggestion:** Use a small date range to start, like yesterday only. | ||
3. From a gcloud command prompt, authenticate the installating user using command: | ||
_gcloud auth application-default login_ | ||
4. Run tools/pubsub_message_publish.py locally, which will publish a | ||
simulated logging event of GA data being ingested into BigQuery. Check dataset(s) that are configured for new date sharded tables such as (depending on what is configured): | ||
* ga_flat_experiments_(x) | ||
* ga_flat_hits_(x) | ||
* ga_flat_products_(x) | ||
* ga_flat_promotions_(x) | ||
* ga_flat_sessions_(x) | ||
|
||
## Un-install steps ## | ||
1. Optional command to remove solution: | ||
* gcloud deployment-manager deployments delete **[Deployment Name]** -q | ||
|
||
## Common errors ## | ||
### Install ### | ||
* * **Message:** AccessDeniedException: 403 **[PROJECT_NUMBER]**@cloudbuild.gserviceaccount.com does not have storage.objects.list access to the Google Cloud Storage bucket. | ||
* **Resolution:** Ensure the value (Cloud Storage bucket name) configured in "codeBucket" setting of ga_flattener*.yaml is correct. **[PROJECT_NUMBER]**@cloudbuild.gserviceaccount.com only requires GCP predefined role of _Cloud Build Service Account_ | ||
### Verification ### | ||
* * **Message:** google.auth.exceptions.DefaultCredentialsError: Could not automatically determine credentials. Please set GOOGLE_APPLICATION_CREDENTIALS or explicitly create credentials and re-run the application. For more information, please see https://cloud.google.com/docs/authentication/getting-started | ||
* **Resolution:** Ensure you run the gcloud command _gcloud auth application-default login_ as this sets up the required authentication and it'll just work. | ||
|
||
## Repository directories ## | ||
* cf : pub/sub triggered cloud function that executes a destination | ||
query to unnest(flatten) the <GA View ID>.ga_sessions_yyyymmdd table | ||
immediately upon arrival in BigQuery into 5 tables: | ||
immediately upon arrival in BigQuery into these tables, depending on the configuration: | ||
* ga_flat_sessions_yyyymmdd | ||
* ga_flat_hits_yyyymmdd | ||
* ga_flat_products_yyyymmdd | ||
|
@@ -23,7 +97,7 @@ Google Analytics 360 Flattener. A Google Cloud Platform (GCP) solution that unn | |
location: | ||
[DEPLOYMENT NAME]-[PROJECT_NUMBER]-adswerve-ga-flat-config\config_datasets.json | ||
|
||
## Files ## | ||
## Repository files ## | ||
* dm_helper.py: provides consistent names for GCP resources accross | ||
solution. Configuration and constants also found in the class in this | ||
file | ||
|
@@ -34,57 +108,4 @@ Google Analytics 360 Flattener. A Google Cloud Platform (GCP) solution that unn | |
* tools/pubsub_message_publish.py : python based utility to publish a | ||
message to simulate an event that's being monitored in GCP logging. | ||
Useful for smoke testing and back-filling data historically. | ||
* LICENSE: BSD 3-Clause open source license | ||
|
||
## Prerequisites ## | ||
1. Create Google GCP project or use an existing project that has Google | ||
Analytics data flowing to it. Referred to as [PROJECT_ID] | ||
2. Enable the Cloud Build API | ||
3. Enable the Cloud Functions API | ||
4. Enable the Identity and Access Management (IAM) API | ||
5. Add "Logs Configuration Writer", "Cloud Functions Developer", "pub/sub Admin" pre | ||
defined IAM roles to | ||
[PROJECT_NUMBER]@cloudservices.gserviceaccount.com (built in service | ||
account) otherwise deployment will fail with permission errors. See | ||
<https://cloud.google.com/deployment-manager/docs/access-control> for | ||
detailed explanation. | ||
6. Install gCloud locally or use cloud shell. | ||
7. Clone this github repo | ||
8. Create bucket for staging code during deployment, for example: | ||
[PROJECT_NUMBER]-function-code-staging. Referred to as [BUCKET_NAME]. | ||
9. Edit the ga_flattener.yaml file, specifically the | ||
_properties-->codeBucket_ value of the function and httpfunction | ||
resources. Set the value for both to [BUCKET_NAME] (see previous step) | ||
|
||
## Installation steps ## | ||
1. Execute command: gcloud config set project [PROJECT_ID] | ||
2. Execute command: gcloud config set account <[email protected]> | ||
3. Navigate (locally) to root directory of this repository | ||
4. If [PROJECT_ID] does **NOT** contain a colon (:) execute command: | ||
* gcloud deployment-manager deployments create [Deployment Name] --config ga_flattener.yaml | ||
otherwise follow these steps: | ||
1. execute command: | ||
* gcloud deployment-manager deployments create [Deployment Name] --config ga_flattener_colon.yaml | ||
2. Trigger function (with a blank message) named [Deployment Name]-cfconfigbuilderps. It will create the necessary configuration file in the applications Google Coud Storage bucket. | ||
|
||
## Verification steps ## | ||
1. After installation, a configuration file named config_datasets.json will exists in gs://[Deployment Name]-[PROJECT_NUMBER]-adswerve-ga-flat-config/ (Cloud Storage Bucket within [PROJECT_ID]). This file will contains all the datasets that have "ga_sessions_yyyymmdd" tables and which tables to unnest. This configuration is required for the cloud function to execute. | ||
|
||
## Testing / Simulating Event ## | ||
1. Modify values in lines 7-17 of | ||
tools/pubsub_message_publish.py accordingly. | ||
2. Run tools/pubsub_message_publish.py locally, which will publish a | ||
simulated logging event of GA data being ingested into BigQuery. Check dataset for date sharded tables named: | ||
* ga_flat_experiments_(x) | ||
* ga_flat_hits_(x) | ||
* ga_flat_products_(x) | ||
* ga_flat_promotions_(x) | ||
* ga_flat_sessions_(x) | ||
|
||
## Un-install steps ## | ||
1. Optional command to remove solution: | ||
* gcloud deployment-manager deployments delete [Deployment Name] -q | ||
|
||
## Common install errors ## | ||
1. * **Message:** Step #2: AccessDeniedException: 403 [PROJECT_NUMBER]@cloudbuild.gserviceaccount.com does not have storage.objects.list access to the Google Cloud Storage bucket. | ||
* **Resolution:** Ensure the value (Cloud Storage bucket name) configured in "codeBucket" setting of ga_flattener*.yaml is correct. [PROJECT_NUMBER]@cloudbuild.gserviceaccount.com only requires GCP predefined role of _Cloud Build Service Account_ | ||
* LICENSE: BSD 3-Clause open source license |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters