2.0 Open a Terminal Session
2.1 Create a virtual environment
2.2 Install the fightchurn package
2.3 Create a directory for output
2.4 Start the Python virtual environment
2.5 Import the run_churn_listing module
2.6 Set the churn environment variables
2.7 Run the data simulation
2.8 Run code listings
Start by opening a terminal session.
First, make a new virtual environment in which to install the Fight Churn code:
python3 -m venv <python_environment_name>
Use whatever you like for <python_environment_name>; if you don't normally make Python environments, call it "py_venv". That command doesn't have any output.
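If you would rather create the environment from a script, the standard library's venv module does roughly what the command above does. The following is a minimal sketch; the helper name create_env is hypothetical, and with_pip=True assumes your Python installation ships ensurepip.

```python
import venv
from pathlib import Path


def create_env(env_dir, with_pip=True):
    """Create a virtual environment at env_dir and return its path."""
    env_path = Path(env_dir)
    # Similar in effect to running `python3 -m venv <env_dir>`.
    # with_pip=True bootstraps pip into the new environment.
    venv.create(env_path, with_pip=with_pip)
    return env_path
```

For example, create_env("py_venv") would create the environment in the current directory.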
Next, you need to activate your environment.
On Mac or Linux, you source the activate script in the bin folder:
~ user$ source py_venv/bin/activate
The prompt will change and you will see:
(py_venv) ~ user$
On Windows, you activate the environment by running the activate script as follows:
C:\Users\Owner\desktop> py_venv\Scripts\activate
The prompt will change and you will see:
(py_venv) C:\Users\Owner\desktop>
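Besides the changed prompt, you can confirm from inside Python that the interpreter belongs to a virtual environment. This is a small sketch using only the standard library; the function name in_virtualenv is hypothetical.

```python
import sys


def in_virtualenv():
    """Return True when the running interpreter belongs to a venv."""
    # Inside a venv, sys.prefix points at the environment, while
    # sys.base_prefix still points at the base Python installation.
    return sys.prefix != sys.base_prefix
```

Calling in_virtualenv() in a Python shell started after activation should return True.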
With the environment active, install the fightchurn package using pip:
pip install fightchurn
This produces a lot of output, starting with something like this:
(py_venv) ~ user$ pip install fightchurn
Collecting fightchurn
Using cached fightchurn-0.3.5-py3-none-any.whl (99 kB)
Collecting docutils==0.17.1
Using cached docutils-0.17.1-py2.py3-none-any.whl (575 kB)
...
Your output probably will not say "Using cached..." unless you have already installed the package before, so don't worry if yours looks a bit different from what's shown above. Regardless, after several minutes (depending on your system and internet connection) you should see this:
Successfully installed Pillow-8.1.2 Pygments-2.9.0 SQLAlchemy-1.4.3 bleach-3.3.0 build-0.4.0 certifi-2020.12.5 chardet-4.0.0 cloudpickle-1.6.0 colorama-0.4.4 cycler-0.10.0 docutils-0.17.1 fightchurn-0.3.5 greenlet-1.0.0 idna-2.10 importlib-metadata-4.5.0 joblib-1.0.1 keyring-23.0.1 kiwisolver-1.3.1 llvmlite-0.36.0 matplotlib-3.4.0 numba-0.53.1 numpy-1.20.2 packaging-20.9 pandas-1.2.3 patsy-0.5.1 pep517-0.10.0 pkginfo-1.7.0 postgres-3.0.0 psycopg2-binary-2.8.6 psycopg2-pool-1.1 pyparsing-2.4.7 python-dateutil-2.8.1 pytz-2021.1 readme-renderer-29.0 requests-2.25.1 requests-toolbelt-0.9.1 rfc3986-1.5.0 scikit-learn-0.24.1 scipy-1.6.2 shap-0.39.0 six-1.15.0 slicer-0.0.7 statsmodels-0.12.2 threadpoolctl-2.1.0 toml-0.10.2 tqdm-4.59.0 twine-3.4.1 urllib3-1.26.4 webencodings-0.5.1 xgboost-1.3.3 zipp-3.4.1
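If the install output has scrolled away, one way to confirm that a package landed in the active environment is to ask importlib.metadata for its version. This is a sketch, not part of fightchurn; the helper name installed_version is hypothetical and it requires Python 3.8 or later.

```python
from importlib import metadata


def installed_version(package_name):
    """Return the installed version string, or None if it is not installed."""
    try:
        return metadata.version(package_name)
    except metadata.PackageNotFoundError:
        # The package is not visible to this interpreter.
        return None
```

For example, installed_version("fightchurn") returns None if the package is not installed in the environment you are running, and the same check works for "xgboost".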
Windows XGBoost Warning: At the time of this writing, there have been problems reported installing the xgboost package on Windows: https://discuss.xgboost.ai/t/pip-install-xgboost-isnt-working-on-windows-x64/57. If you are unable to install xgboost with pip, you can try installing it using the instructions outlined in that link. Alternatively, you can remove that requirement. Note that you can still run all the code in the book without xgboost, except for the second half of chapter 9.
Mac XGBoost Warning: At the time of this writing, there have been problems reported installing the xgboost package on recent versions of Mac OS: https://stackoverflow.com/questions/61971851/getting-this-simple-problem-while-importing-xgboost-on-jupyter-notebook If you are unable to install xgboost with pip, try performing the Lib OMP installation described above first. Alternatively, you can remove the xgboost requirement. Note that you can still run all the code in the book without xgboost, except for the second half of chapter 9.
You should make a local folder to store your output. On Linux, that looks as follows:
mkdir my_churn_output_folder
Naturally, you can also make the folder using the GUI if you are on Mac or Windows.
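The folder can also be created from Python in a way that works the same on Linux, Mac, and Windows. This is a minimal sketch; the helper name make_output_dir is hypothetical. It returns the absolute path, which is handy later when you have to supply the output folder location.

```python
from pathlib import Path


def make_output_dir(path):
    """Create the output folder if needed and return its absolute path."""
    out_dir = Path(path).expanduser().resolve()
    # parents=True creates missing parent folders;
    # exist_ok=True makes the call safe to repeat.
    out_dir.mkdir(parents=True, exist_ok=True)
    return out_dir
```

For example, make_output_dir("my_churn_output_folder") creates the folder under the current directory.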
Next, start your Python virtual environment and enter a Python shell:
source py_venv/bin/activate
python
(The command above assumes you named your virtual environment py_venv.) You should see something like the following:
(py_venv) :~ user$ python
Python 3.9.6 (default, Jun 29 2021, 05:25:02)
[Clang 12.0.5 (clang-1205.0.22.9)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>>
Next, you will import the run_churn_listing module from the fightchurn package, which you will use to run everything:
from fightchurn import run_churn_listing
That has no output, but it might take a few moments.
Now you need to set a few environment variables. These are:
- The database name : 'churn' in the example below
- The username for the database
- The password for the database
- A local folder where outputs can be written
Set them by calling set_churn_environment with those four values, in that order:
run_churn_listing.set_churn_environment('churn','user','password','/path/to/my_churn_output_folder')
This will print out a confirmation line like this:
Setting Environment Variables user=your_user_name for db=churn, output path =`/path/to/my_churn_output_folder`
Next, you need to write some data to the database to run the code against, because no data is provided with the code distribution. Use the following command:
run_churn_listing.run_standard_simulation(init_customers=10000)
The example is for a standard simulation of 10,000 customers. If you want to speed things up you can run it for 1000 customers and things will still work okay - the results will just be a bit more noisy and random.
You will see output as follows...
Creating schema socialnet7 (if not exists)...
Creating table event (if not exists)
Creating table subscription (if not exists)
Creating table event_type (if not exists)
Creating table metric (if not exists)
Creating table metric_name (if not exists)
Creating table active_period (if not exists)
Creating table observation (if not exists)
Creating table active_week (if not exists)
Creating table account (if not exists)
Creating 2000 initial customers for month of 2020-01-01
Simulated customer 0/2000: 2 subscriptions & 100 events
Simulated customer 100/2000: 448 subscriptions & 154,047 events
Simulated customer 200/2000: 872 subscriptions & 282,882 events
Simulated customer 300/2000: 1,324 subscriptions & 426,866 events
Simulated customer 400/2000: 1,767 subscriptions & 557,543 events
...
This will continue for a while - maybe 15-30 minutes if you ran the full 10,000 customer simulation.
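Because the run time depends heavily on your machine and the number of customers, it can be useful to measure it yourself. The following is a generic timing sketch using only the standard library; the helper name timed is hypothetical and nothing in it is specific to fightchurn.

```python
import time


def timed(func, *args, **kwargs):
    """Call func(*args, **kwargs) and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = func(*args, **kwargs)
    elapsed = time.perf_counter() - start
    name = getattr(func, "__name__", repr(func))
    print(f"{name} finished in {elapsed / 60:.1f} minutes")
    return result, elapsed
```

In the Python shell from the steps above, you could try it on a smaller run with timed(run_churn_listing.run_standard_simulation, init_customers=1000).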
Now you are ready to run some code from the book! To do that, use the run_listing function of the run_churn_listing module that you previously imported. For example, the following runs chapter 2, listing 2:
run_churn_listing.run_listing(2,2)
You should see output like this:
Running chapter 2 listing 2 churn_rate on schema socialnet7
SQL:
----------
set search_path = 'socialnet7'; with
date_range as (
select '2020-03-01'::date as start_date, '2020-04-01'::date as end_date
),
start_accounts as
(
select distinct account_id
from subscription s inner join date_range d on
s.start_date <= d.start_date
and (s.end_date > d.start_date or s.end_date is null)
),
...
----------
RESULT:
Record(churn_rate=0.0570875665215288, retention_rate=0.942912433478471, n_start=2067, n_churn=118)
Explaining what you are seeing there is beyond the scope of this README; that's what the book is about! But if you have gotten this far, then you have completed all the setup and you are ready to follow along with the book (or videos, however you are learning the code).
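Without going into what the query does, the four fields in the sample RESULT line above appear to be related by simple arithmetic: the churn rate matches n_churn divided by n_start, and the retention rate matches one minus the churn rate. The sketch below checks this against the sample numbers only. It is an observation about that output, not the book's definition, and the helper name rates is hypothetical.

```python
import math


def rates(n_start, n_churn):
    """Return (churn_rate, retention_rate) from the two counts."""
    churn_rate = n_churn / n_start
    retention_rate = 1 - churn_rate
    return churn_rate, retention_rate


# Values copied from the sample RESULT line.
churn_rate, retention_rate = rates(n_start=2067, n_churn=118)
print(math.isclose(churn_rate, 0.0570875665215288, rel_tol=1e-9))
print(math.isclose(retention_rate, 0.942912433478471, rel_tol=1e-9))
```

Your own numbers will differ, because the simulation is random, but the same relationship gives you a quick sanity check on your result.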
In some parts of the book you might want to run more than one listing at once. To do this, pass a list for the listing argument. For example, to run all four chapter 2 churn calculation listings, try:
run_churn_listing.run_listing(2,[1,2,3,4])
Later in the book, some of the listings have multiple versions with different arguments. The run_listing function also takes a version argument. For example, to run a query and plot the results of the events per day for the first event created by the simulation, try the following:
run_churn_listing.run_listing(chapter=3,listing=[9,10],version=1)
That command should save a plot to your output directory.
You can also run multiple versions at once:
run_churn_listing.run_listing(chapter=3,listing=[9,10],version=[2,3])
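To see which runs a call with list arguments presumably covers, the sketch below expands the arguments into individual (chapter, listing, version) combinations. It is purely illustrative: the helpers as_list and expand_runs are hypothetical and are not the package's implementation, and this README does not state the order in which fightchurn actually runs the combinations.

```python
from itertools import product


def as_list(value):
    """Wrap a single value in a list; leave lists and tuples as lists."""
    return list(value) if isinstance(value, (list, tuple)) else [value]


def expand_runs(chapter, listing, version=None):
    """List every (chapter, listing, version) combination requested."""
    return [
        (chapter, one_listing, one_version)
        for one_listing, one_version in product(as_list(listing), as_list(version))
    ]


for run in expand_runs(chapter=3, listing=[9, 10], version=[2, 3]):
    print(run)
```

Under that assumption, the call above with listing=[9,10] and version=[2,3] corresponds to four individual runs.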
For more information about what the code listings do, see the book Fighting Churn With Data.