From a3fed8cabb393145439d684ad8928cb648bb71c3 Mon Sep 17 00:00:00 2001 From: Krishna Chaitanya Velaga <42508945+kcvelaga@users.noreply.github.com> Date: Sat, 6 Jan 2024 03:32:09 -0800 Subject: [PATCH] add html --- docs/T328913_sx_mobile_entry_points.html | 6694 ++++++++++++++++++++++ 1 file changed, 6694 insertions(+) create mode 100644 docs/T328913_sx_mobile_entry_points.html diff --git a/docs/T328913_sx_mobile_entry_points.html b/docs/T328913_sx_mobile_entry_points.html new file mode 100644 index 0000000..68b2ee0 --- /dev/null +++ b/docs/T328913_sx_mobile_entry_points.html @@ -0,0 +1,6694 @@ + + + + + + + + + + + +Mobile Entry Points Funnel Analysis + + + + + + + + + + + + + + + + + + + + + + + + +
+ + +
+ +
+
+

Mobile Entry Points Funnel Analysis

+

Section Translation

+
+ + + +
+ +
+
Author
+
+

Krishna Chaitanya Velaga, Product Analytics

+
+
+ +
+
Published
+
+

January 5, 2024

+
+
+ + +
+ + +
+ +
+
+
+ +
+
+Introduction (T290428) +
+
+
+

Section Translation is an expansion of the Content Translation capabilities. It enables users to expand existing Wikipedia articles by translating new sections. Section Translation is also designed to work on mobile devices (as well as desktop), which lets users translate on mobile, something that was not possible with Content Translation before.

+

The content translation events capture various aspects of user interactions with the Content and Section Translation tools. This analysis is a first iteration at visualizing the entry points through which users arrive, the flows they follow, and how many reach the currently instrumented next stages.

+

Data from 2023-10-02 through 2023-12-31 (90 days before 2023-12-31, both endpoints inclusive) were reviewed.

+
+
+
+
+
+ +
+
+Overall Summary +
+
+
+
+ +
+
+
    +
  • 85% of newcomers opened the translation dashboard by navigating from the frequent languages selector, which surfaces missing languages to translate an article into.
  • As users gain editing experience, they increasingly reach the dashboard through the content language selector, where they can search for a missing language to translate an article into.
  • Experienced users are also more likely than newcomers to open the dashboard directly.
  • For users with 1000+ edits, the frequent languages selector accounts for only 40% of navigations to the dashboard.
+
+
+
+
+ +
+
+
    +
  • On larger Wikipedias, the frequent languages selector was the most used route to the translation dashboard.
    • Among the top 20 Wikipedias, it accounted for 85% of dashboard accesses.
  • On smaller Wikipedias, the frequent languages selector remains the most used, but the content language selector is used more than on larger Wikipedias.
  • This is consistent with the edit bucket observations: larger Wikipedias tend to have a higher share of newcomers than smaller Wikipedias.
+
+
+
+
+ +
+
+
    +
  • Only 7% of dashboard_translation_start events occurred independently of dashboard_open events.
    • This indicates that most users start a translation having already selected an article/section from an external entry point.
  • Among users who trigger dashboard_translation_start independently:
    • Most newcomers start a translation by accepting a suggestion from the API when there is no seed article.
    • Most experienced users start a translation from search results, followed by accepting a suggestion related to one of their recent edits.
+
+
+
+
+ +
+
+

(main events: dashboard_open, dashboard_translation_start, editor_segment_end)

+
    +
  • In most cases (77%), users who opened the dashboard transitioned to the translation start screen.
    • This is because, for users navigating to the dashboard from an external entry point, both events occur consecutively.
    • 13% ended the session, and 8% refreshed the dashboard or came back to it later before the session expired.
  • Among users who proceeded to the start screen, only in 15% of cases did they progress to the editor and make an edit.
    • In 46% of cases, users went back to the main dashboard, and in 30% they ended the session.
    • Because most events were generated by users with 0 edits (newcomers), these figures largely reflect newcomer behavior.
  • Among users who made at least one edit, 80% of cases continued with additional edits, 9% went back to the main dashboard, and the rest ended the session.
+
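Transition shares like those above can be computed by pairing each event with the one that follows it in the same session and normalizing per current event type. A minimal sketch on hypothetical data, assuming events are ordered by `content_translation_session_position` and that an event with no successor counts as the session ending (labeled `session_end`, a name chosen here for illustration). This is one way to derive such percentages, not necessarily the exact code behind this report's figures.

```python
import pandas as pd

# Hypothetical events for two sessions
toy = pd.DataFrame({
    "content_translation_session_id": ["s1", "s1", "s1", "s2", "s2"],
    "content_translation_session_position": [0, 1, 2, 0, 1],
    "event_type": ["dashboard_open", "dashboard_translation_start", "editor_segment_end",
                   "dashboard_open", "dashboard_translation_start"],
})

toy = toy.sort_values(["content_translation_session_id", "content_translation_session_position"])

# Next event within the same session; the last event of a session has no successor
toy["next_event"] = (
    toy.groupby("content_translation_session_id")["event_type"]
    .shift(-1)
    .fillna("session_end")
)

# Share of each next step per current event type (each row sums to 1)
transitions = pd.crosstab(toy["event_type"], toy["next_event"], normalize="index")
print(transitions.round(2))
```

Here every dashboard_open is followed by dashboard_translation_start (share 1.0), while translation starts split evenly between reaching the editor and ending the session.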
+
+
+
+ +
+
+

(main events: dashboard_open, dashboard_translation_start, editor_segment_end)

+
    +
  • Across all edit count buckets, most users (>70%) who opened the dashboard proceeded to the translation start screen.
    • The percentage is higher for newcomers than for experienced users. Most newcomers reach the dashboard through external entry points rather than opening it directly, in which case dashboard_open and dashboard_translation_start are triggered consecutively, with no user action in between. Among experienced users, more open the dashboard directly and then click through to the translation start screen.
  • Among users who reached the translation start screen:
    • Newcomers tend to end (abandon) the session or return to the main dashboard.
    • Newcomers continued from this stage to make an edit in only 12% of cases, whereas users with 1000+ edits did so in 32% of cases.
      • The more editing experience users have, the more likely they are to continue and make an edit.
  • Among users who made at least one edit, the more editing experience they have, the more likely they are to make additional edits to the machine-translated content, and the less likely they are to end the session or return to the dashboard.
+
+
+
+
+ +
+
+

(main events: dashboard_open, dashboard_translation_start, editor_segment_end)

+
+

The rate of transition between the stages of the funnel, broken down by entry source, closely tracks how heavily each entry point is used by the different user experience levels.

+
+
    +
  • Among users who navigated through the frequent languages menu (frequent_languages), which was accessed most frequently by newcomers:
    • In 82% of cases, they proceeded to the translation start screen.
    • From the translation start screen, users made at least one edit in 13% of cases.
  • Among users who navigated through the content language selector (content_language_selector), which was accessed frequently by newcomers and experienced users alike:
    • In 75% of cases, they proceeded to the translation start screen.
    • From the translation start screen, users made at least one edit in 23% of cases.
  • Among users who opened the dashboard directly (direct), most frequently experienced users:
    • In 36% of cases, they proceeded to the translation start screen.
    • From the translation start screen, users made at least one edit in 34% of cases.
  • Among users who navigated from an invitation shown on a non-existent page (invite_new_article_creation), which was accessed frequently by newcomers and experienced users alike:
    • In 80% of cases, they proceeded to the translation start screen.
    • From the translation start screen, users made at least one edit in 14% of cases.
  • Among users who opened the dashboard directly with a link to a specific translation (direct_preselect), most frequently experienced users:
    • In 87% of cases, they proceeded to the translation start screen.
    • From the translation start screen, users made at least one edit in 11% of cases.
  • Among users who navigated from the contributions page (contributions_page), which was accessed frequently by newcomers and experienced users alike:
    • In 39% of cases, they proceeded to the translation start screen.
    • From the translation start screen, users made at least one edit in 7% of cases.
  • Among users who navigated from a notice on recently translated articles inviting them to review/expand the translation (recent_translation), most frequently experienced users:
    • In 88% of cases, they proceeded to the translation start screen.
    • From the translation start screen, users made at least one edit in 31% of cases.
+
+
+
+
+ +
+
+
    +
  • In cases where users discarded a suggested translation (677 occurrences), in 80% of cases they went on to discard the next suggested translation as well, and in 10% they proceeded to the translation start screen.
  • In cases where users requested that the list of suggestions be regenerated (110 occurrences), in 33% of cases they refreshed the suggestions again, and in 20% they proceeded to the translation start screen.
  • In cases where users initiated a search (958 occurrences), in 82% of cases they proceeded to the translation start screen, and in 12% they returned to the dashboard.
  • In cases where users selected an in-progress translation (440 occurrences), in 67% of cases they returned to the dashboard, and in 16% they made an edit to the translation.
  • In cases where users discarded an in-progress translation (132 occurrences), in 58% of cases they discarded additional in-progress translations, and in 13% they initiated a search.
+
+
+
+
+
+
+

Data Gathering

+
+

Setup

+
+
+Code +
import wmfdata as wmf
+import pandas as pd
+from datetime import datetime, timedelta
+import great_tables as gt
+
+import plotly.express as px
+import plotly.graph_objects as go
+import plotly.subplots as sp
+from plotly.offline import download_plotlyjs, init_notebook_mode, iplot
+
+from IPython.display import display_html, display, HTML, clear_output, Markdown
+
+import warnings
+
+
+
+
+Code +
init_notebook_mode(connected=True)
+
+pd.options.display.max_columns = None
+pd.options.display.max_rows = 250
+
+# width for charts
+iplot_width = 950
+max_width = 1250
+
+# always show options bar
+iplot_config = {'displayModeBar': True}
+
+# prints a string at center of the output, bold if needed
+def pr_centered(content, bold=False):
+    if bold:
+        content = f"<b>{content}</b>"
+    
+    centered_html = f"<div style='text-align:center'>{content}</div>"
+    
+    display(HTML(centered_html))
+
+
+
+
+Code +
spark_session = wmf.spark.get_active_session()
+
+if spark_session is None:
+    spark_session = wmf.spark.create_custom_session(
+        master="yarn",
+        app_name='cx-funnel-entrypoints',
+        spark_config={
+            "spark.driver.memory": "4g",
+            "spark.dynamicAllocation.maxExecutors": 64,
+            "spark.executor.memory": "16g",
+            "spark.executor.cores": 4,
+            "spark.sql.shuffle.partitions": 256,
+            "spark.driver.maxResultSize": "2g"
+        }
+    )
+
+spark_session.sparkContext.setLogLevel("ERROR")
+
+clear_output()
+
+spark_session
+
+
+ +
+

SparkSession - hive

+ +
+

SparkContext

+ +

Spark UI

+ +
+
Version
+
v3.1.2
+
Master
+
yarn
+
AppName
+
cx-funnel-entrypoints
+
+
+ +
+ +
+
+
+
+

Query

+
+
+Code +
end_dt = '2023-12-31'
+start_dt = (datetime.strptime(end_dt, "%Y-%m-%d") - timedelta(days=90)).strftime("%Y-%m-%d")
+
+
+
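As a quick check on the analysis window, the sketch below repeats the `start_dt` computation and counts the calendar days the query covers. Because the query filters with both `>=` and `<=`, both endpoints are included, so the window spans 91 calendar days (2023-10-02 through 2023-12-31) rather than exactly 90.

```python
from datetime import datetime, timedelta

# Same computation as the setup cell
end_dt = "2023-12-31"
start_dt = (datetime.strptime(end_dt, "%Y-%m-%d") - timedelta(days=90)).strftime("%Y-%m-%d")

# DATE(dt) >= start_dt AND DATE(dt) <= end_dt keeps both endpoints
n_days_inclusive = (
    datetime.strptime(end_dt, "%Y-%m-%d") - datetime.strptime(start_dt, "%Y-%m-%d")
).days + 1

print(start_dt, n_days_inclusive)  # 2023-10-02 91
```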
+
+Code +
%%time
+
+query = """
+SELECT
+    dt AS ts,
+    DATE(dt) AS dt,
+    HOUR(dt) AS hour,
+    wiki_db,
+    access_method,
+    content_translation_session_id,
+    content_translation_session_position,
+    event_type,
+    event_source,
+    translation_type,
+    translation_source_language,
+    translation_target_language,
+    user_is_anonymous,
+    user_global_edit_count_bucket,
+    year,
+    day,
+    month
+FROM 
+    event_sanitized.mediawiki_content_translation_event
+WHERE
+    DATE(dt) >= DATE('{START_DT}')
+    AND DATE(dt) <= DATE('{END_DT}')
+"""
+
+all_events = wmf.spark.run(
+    query.format(
+        START_DT=start_dt, 
+        END_DT=end_dt
+    )
+)
+
+
+
CPU times: user 4.41 s, sys: 723 ms, total: 5.14 s
+Wall time: 2min 8s
+
+
+
+
+Code +
edit_buckets_across_all_sessions = (
+    all_events[['content_translation_session_id', 'user_global_edit_count_bucket']]
+    .user_global_edit_count_bucket
+    .value_counts(normalize=True)
+    .reset_index()
+    .rename({
+        'user_global_edit_count_bucket': 'Edit Bucket',
+        'proportion': 'Percentage of events'
+    }, axis=1)
+    .sort_values('Percentage of events', ascending=False, ignore_index=True)
+)
+
+edit_buckets_across_all_sessions['Percentage of events'] = edit_buckets_across_all_sessions['Percentage of events'].apply(lambda x:f"{x:.2%}")
+pr_centered(f'Distribution of edit buckets across all sessions', True)
+edit_buckets_across_all_sessions
+
+
+
Distribution of edit buckets across all sessions
+
+
+
+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
Edit Bucket      Percentage of events
0 edits          53.93%
1000+ edits      19.20%
5-99 edits       10.76%
100-999 edits    9.87%
1-4 edits        6.24%
+ +
+
+
+
+
+

Data Cleaning

+

During analysis, several issues with the produced events were identified. The most significant involves content_translation_session_position: multiple events, of the same or different event types and recorded at different times, share the same session position. It is currently unclear whether the session position was recorded incorrectly (in which case it could be reconstructed from the timestamps) or whether these are duplicate events. More details, and the task to investigate these issues, are in T353882. For this analysis, all sessions containing potentially erroneous events are excluded.
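The position collision described above can be illustrated on a toy session. The sketch flags any (session, position) pair carrying more than one distinct event type, which is the same test the cleaning code below applies, and then shows what a timestamp-based reconstruction of positions might look like. The data and the `reconstructed_position` column are hypothetical; this analysis excludes such sessions rather than repairing them.

```python
import pandas as pd

# Toy events for one hypothetical session: two different event types share position 1
toy = pd.DataFrame({
    "content_translation_session_id": ["s1", "s1", "s1", "s1"],
    "content_translation_session_position": [0, 1, 1, 2],
    "event_type": ["dashboard_open", "dashboard_translation_start",
                   "editor_segment_end", "editor_segment_end"],
    "ts": pd.to_datetime(["2023-12-01 10:00:00", "2023-12-01 10:00:05",
                          "2023-12-01 10:03:00", "2023-12-01 10:04:00"], utc=True),
})

# Flag (session, position) pairs that carry more than one distinct event type
distinct_per_position = (
    toy.groupby(["content_translation_session_id", "content_translation_session_position"])
    ["event_type"].nunique()
)
flagged_sessions = (
    distinct_per_position[distinct_per_position > 1]
    .index.get_level_values("content_translation_session_id")
    .unique()
    .tolist()
)
print(flagged_sessions)  # ['s1']

# Hypothetical repair: renumber positions by timestamp order within each session
toy = toy.sort_values(["content_translation_session_id", "ts"])
toy["reconstructed_position"] = toy.groupby("content_translation_session_id").cumcount()
print(toy["reconstructed_position"].tolist())  # [0, 1, 2, 3]
```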

+
+
+Code +
temporal_columns = ['ts', 'dt', 'hour']
+
+# sessions with duplicate events, ignoring the temporal columns
+sessions_with_duplicate_events = (
+    all_events[[col for col in all_events.columns.tolist() if col not in temporal_columns]]
+    .value_counts()
+    .reset_index()
+    .rename({0: 'count'}, axis=1)
+    .query("""count > 1""")
+    .content_translation_session_id
+    .unique()
+    .tolist()
+)
+
+# sessions where different event types share the same session position, although the events occurred at different times
+session_event_counts = (
+    all_events.groupby(['content_translation_session_id', 'content_translation_session_position'])
+    .agg(distinct_events=('event_type', pd.Series.nunique))
+)
+
+sessions_with_same_position_events = (
+    session_event_counts.query("""distinct_events > 1""")
+    .reset_index()
+    .content_translation_session_id
+    .unique()
+    .tolist()
+)
+
+# sessions where multiple global edit count buckets were recorded
+sessions_with_multiple_edit_counts = (
+    all_events.groupby('content_translation_session_id')['user_global_edit_count_bucket']
+    .nunique()
+    .reset_index()
+    .query("""user_global_edit_count_bucket > 1""")
+    .content_translation_session_id
+    .unique()
+    .tolist()
+)
+
+# sessions with no dashboard open at start
+sessions_with_no_dopen_start = (
+    all_events.query("""(content_translation_session_position == 0) & (event_type != 'dashboard_open')""")
+    .content_translation_session_id
+    .unique()
+    .tolist()
+)
+
+sessions_with_dopen = (
+    all_events
+    .query("""event_type == 'dashboard_open'""")['content_translation_session_id']
+    .unique()
+    .tolist()
+)
+
+# sessions without dashboard open
+sessions_without_dopen = (
+    all_events
+    .query("""content_translation_session_id != @sessions_with_dopen""")['content_translation_session_id']
+    .unique()
+    .tolist()
+)
+
+# sessions where the same event type is recorded more than once at the same session position
+duplicate_events_with_same_position = (
+    all_events[['content_translation_session_id', 'content_translation_session_position', 'event_type']]
+    .value_counts()
+    .reset_index()
+    .rename({0: 'count'}, axis=1)
+    .query("""count > 1""")
+    .content_translation_session_id
+    .unique()
+    .tolist()
+)
+
+
+
+
+Code +
# remove all potentially invalid sessions
+invalid_sessions = list(
+    set(
+        [*sessions_with_duplicate_events,
+         *sessions_with_same_position_events,
+         *sessions_with_multiple_edit_counts, 
+         *sessions_with_no_dopen_start, 
+         *sessions_without_dopen, 
+         *duplicate_events_with_same_position]
+    )
+)
+
+events = all_events.query("""content_translation_session_id != @invalid_sessions""")
+
+
+
+
+Code +
# verify that session positions increase with timestamp order in every session (check only, no fix applied)
+def is_session_position_consistent(group):
+    return group['content_translation_session_position'].is_monotonic_increasing
+
+events = events.sort_values(by=['content_translation_session_id', 'ts'])
+consistency_check = events.groupby('content_translation_session_id').apply(is_session_position_consistent)
+inconsistent_sessions = consistency_check[~consistency_check.astype(bool)].index.tolist()
+
+assert len(inconsistent_sessions) == 0, \
+    f'{len(inconsistent_sessions)} sessions have inconsistent position'
+
+
+
+
+Code +
# change to appropriate datatypes
+
+edit_buckets = ['0 edits', '1-4 edits', '5-99 edits', '100-999 edits', '1000+ edits']
+
+events = (
+    events
+    .assign(
+        user_global_edit_count_bucket=pd.Categorical(events['user_global_edit_count_bucket'], categories=edit_buckets, ordered=True),
+        ts=pd.to_datetime(events['ts'], utc=True)
+    )
+    .sort_values(by=['content_translation_session_id', 'content_translation_session_position'])
+    .reset_index(drop=True)
+)
+
+print('Dataframe Information')
+events.info()
+
+
+
Dataframe Information
+<class 'pandas.core.frame.DataFrame'>
+RangeIndex: 87874 entries, 0 to 87873
+Data columns (total 17 columns):
+ #   Column                                Non-Null Count  Dtype              
+---  ------                                --------------  -----              
+ 0   ts                                    87874 non-null  datetime64[ns, UTC]
+ 1   dt                                    87874 non-null  object             
+ 2   hour                                  87874 non-null  int32              
+ 3   wiki_db                               87874 non-null  object             
+ 4   access_method                         87874 non-null  object             
+ 5   content_translation_session_id        87874 non-null  object             
+ 6   content_translation_session_position  87874 non-null  int64              
+ 7   event_type                            87874 non-null  object             
+ 8   event_source                          62045 non-null  object             
+ 9   translation_type                      87874 non-null  object             
+ 10  translation_source_language           87872 non-null  object             
+ 11  translation_target_language           87874 non-null  object             
+ 12  user_is_anonymous                     87874 non-null  bool               
+ 13  user_global_edit_count_bucket         87874 non-null  category           
+ 14  year                                  87874 non-null  int64              
+ 15  day                                   87874 non-null  int64              
+ 16  month                                 87874 non-null  int64              
+dtypes: bool(1), category(1), datetime64[ns, UTC](1), int32(1), int64(4), object(9)
+memory usage: 9.9+ MB
+
+
+
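Converting user_global_edit_count_bucket to an ordered categorical matters because the bucket labels do not sort correctly as plain strings. A small illustration using the same `edit_buckets` order as the cell above (the shuffled input is illustrative):

```python
import pandas as pd

edit_buckets = ['0 edits', '1-4 edits', '5-99 edits', '100-999 edits', '1000+ edits']
raw = pd.Series(['1000+ edits', '5-99 edits', '0 edits', '100-999 edits', '1-4 edits'])

# Plain strings sort lexicographically, which puts '5-99 edits' last
print(sorted(raw))
# ['0 edits', '1-4 edits', '100-999 edits', '1000+ edits', '5-99 edits']

# An ordered categorical sorts by the declared bucket order instead
ordered = pd.Series(pd.Categorical(raw, categories=edit_buckets, ordered=True))
print(ordered.sort_values().tolist())
# ['0 edits', '1-4 edits', '5-99 edits', '100-999 edits', '1000+ edits']
```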
+
+Code +
n_all_sessions = all_events.content_translation_session_id.nunique()
+n_all_events = all_events.shape[0]
+
+n_valid_sessions = events.content_translation_session_id.nunique()
+n_events_from_valid_sessions = events.shape[0]
+
+pct_invalid_sessions = 100 - round(n_valid_sessions / n_all_sessions * 100, 2) 
+
+print(f'- all sessions: {n_all_sessions}; all events: {n_all_events}')
+print(f'- valid sessions: {n_valid_sessions}; events from valid sessions: {n_events_from_valid_sessions}')
+print(f'- percentage of sessions with potentially erroneous events: {pct_invalid_sessions}%')
+
+
+
- all sessions: 29365; all events: 277212
+- valid sessions: 15143; events from valid sessions: 87874
+- percentage of sessions with potentially erroneous events: 48.43%
+
+
+
+
+
+

Analysis: Entry Points & Sources

+
+

Dashboard Open

+

As the goal is to understand how users reach the translation dashboard, this part of the analysis includes only events in which users navigate to the main dashboard from an external source, i.e., dashboard_open events at session position 0. For example, after adding a segment, a user can return to the dashboard for another translation; such returns are currently recorded as direct access (T353799), so those dashboard_open events are excluded here.
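To illustrate the exclusion, the hypothetical session below arrives from the frequent languages selector and later returns to the dashboard after editing, a return that is logged with a `direct` source. Keeping only the dashboard_open at session position 0, as the code below does, retains the external arrival and drops the return. The session and its values are made up for illustration.

```python
import pandas as pd

# One hypothetical session: external arrival, translation, then a return logged as "direct"
toy = pd.DataFrame({
    "content_translation_session_id": ["s1"] * 4,
    "content_translation_session_position": [0, 1, 2, 3],
    "event_type": ["dashboard_open", "dashboard_translation_start",
                   "editor_segment_end", "dashboard_open"],
    "event_source": ["frequent_languages", "frequent_languages", None, "direct"],
})

# Keep only the dashboard_open that starts the session
arrivals = toy.query(
    "(event_type == 'dashboard_open') & (content_translation_session_position == 0)"
)
print(arrivals["event_source"].tolist())  # ['frequent_languages']
```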

+
+

Overall

+
+
+Code +
# dashboard open events
+
+dopen_events = (
+    events
+    .query("""(event_type == 'dashboard_open') & (content_translation_session_position == 0)""")
+    .reset_index(drop=True)
+)
+
+dopen_edit_counts = (
+    dopen_events[['content_translation_session_id', 'user_global_edit_count_bucket']]
+    .user_global_edit_count_bucket
+    .value_counts(normalize=True)
+    .reset_index()
+    .rename({
+        'user_global_edit_count_bucket': 'Edit Bucket',
+        'proportion': 'Percentage'
+    }, axis=1)
+    .sort_values('Edit Bucket')
+)
+
+dopen_edit_counts_table = (
+    gt
+    .GT(dopen_edit_counts)
+    .fmt_percent(columns='Percentage')
+    .tab_header(title='Frequency of Users\' Edit Buckets', subtitle='that initiated dashboard open events')
+    .tab_source_note(f'across {dopen_events.content_translation_session_id.nunique()} sessions from {start_dt} to {end_dt}')
+)
+
+dopen_edit_counts_table
+
+
+ +
+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
Frequency of Users' Edit Buckets
that initiated dashboard open events
Edit Bucket      Percentage
0 edits          49.25%
1-4 edits        8.35%
5-99 edits       15.37%
100-999 edits    9.57%
1000+ edits      17.46%
across 15007 sessions from 2023-10-02 to 2023-12-31
+ +
+ +
+
+
+
+Code +
# frequency of entry points usage
+
+entry_points_freq = (
+    dopen_events.event_source
+    .value_counts(normalize=True)
+    .reset_index()
+    .rename({
+        'event_source': 'entry_point',
+        'proportion': 'percent'        
+    }, axis=1)
+)
+
+# generate table from dataframe
+entry_points_table = (
+    gt
+    .GT(entry_points_freq)
+    .fmt_percent(columns='percent')
+    .cols_label(
+        entry_point='Entry Point',
+        percent='Percentage'
+    )
+    .tab_header('Overall Distribution Of Entry Points That Users Navigate From', 'To Reach the Content Translation dashboard')
+    .tab_source_note(f'across {dopen_events.content_translation_session_id.nunique()} sessions from {start_dt} to {end_dt}')
+)
+
+entry_points_table
+
+
+ +
+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
Overall Distribution Of Entry Points That Users Navigate From
To Reach the Content Translation dashboard
Entry Point                    Percentage
frequent_languages             72.69%
content_language_selector      15.94%
direct                         6.37%
invite_new_article_creation    3.22%
direct_preselect               1.33%
contributions_page             0.25%
recent_translation             0.21%
across 15007 sessions from 2023-10-02 to 2023-12-31
+ +
+ +
+
+
+
+

by Edit Bucket

+
+
+Code +
warnings.filterwarnings('ignore')
+
+# usage of entry points by various edit buckets
+
+entry_by_edit_bucket = (
+    dopen_events
+    .groupby(['user_global_edit_count_bucket', 'event_source'])
+    .size()
+    .reset_index()
+    .rename({
+        'user_global_edit_count_bucket': 'edit_bucket',
+        'event_source': 'source',
+        0: 'count'
+    }, axis=1)
+    .sort_values(['edit_bucket', 'count'], ascending=[True, False])
+    .reset_index(drop=True)
+)
+
+# total by each edit bucket
+entry_by_edit_bucket['total'] = (
+    entry_by_edit_bucket['edit_bucket']
+    .map(entry_by_edit_bucket
+         .groupby('edit_bucket')
+         .agg({'count': sum})
+         .to_dict()['count']
+    )
+)
+
+# percentage of usage by edit bucket
+entry_by_edit_bucket = entry_by_edit_bucket.astype({'total': int})
+entry_by_edit_bucket['percent'] = entry_by_edit_bucket['count'] / entry_by_edit_bucket['total']
+
+
+
+
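The per-bucket totals above are built by aggregating counts per bucket and mapping them back onto each row. An equivalent and somewhat more compact pandas idiom is `groupby(...).transform('sum')`, which returns each group's total aligned to every row of that group. The sketch uses hypothetical counts; passing the string `'sum'` also avoids the FutureWarning that recent pandas versions may emit when the builtin `sum` is passed to `agg`.

```python
import pandas as pd

# Hypothetical counts per (edit bucket, entry point)
counts = pd.DataFrame({
    "edit_bucket": ["0 edits", "0 edits", "1000+ edits", "1000+ edits"],
    "source": ["frequent_languages", "direct", "frequent_languages", "direct"],
    "count": [85, 15, 40, 60],
})

# Bucket total broadcast back to every row, then the within-bucket share
counts["total"] = counts.groupby("edit_bucket")["count"].transform("sum")
counts["percent"] = counts["count"] / counts["total"]
print(counts)
```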
+Code +
# only display annotations if entry point accounts for more than 5%
+entry_by_edit_bucket['percent_annot'] = (
+    entry_by_edit_bucket['percent']
+    .apply(lambda x:f"{x:.0%}" if x > 0.05 else None)
+)
+
+# bar graph
+fig = px.bar(entry_by_edit_bucket, 
+             x='percent', 
+             y='edit_bucket', 
+             color='source',
+             labels={
+                 'percent':'% of Total Events', 
+                 'edit_bucket': 'Edit Bucket', 
+                 'source': 'Entry Points'
+             },
+             color_discrete_sequence=px.colors.qualitative.T10,
+             title='Usage of Entry Points by User Global Edit Bucket',
+             text='percent_annot', 
+             # display in increasing edit bucket order
+             category_orders={
+                 'edit_bucket': edit_buckets, 
+                 'source': entry_points_freq.entry_point.values.tolist()
+             }
+            )
+
+# relative stacks the bars
+fig.update_layout(barmode='relative', height=550, width=max_width)
+fig.update_xaxes(tickformat='.0%')
+fig = fig.update_traces(
+    textfont_color='white', 
+    hovertemplate="<br>".join([
+        "Edit Bucket: %{y}",
+        "Percent of Total Events: %{x:.0%}"
+    ])
+)
+
+iplot(fig, config=iplot_config)
+
+
+ +
+
+
+
+
+
+ +
+
+Summary +
+
+
+
    +
  • 85% of newcomers opened the translation dashboard by navigating from the frequent languages selector, which surfaces missing languages to translate an article into.
  • As users gain editing experience, they increasingly reach the dashboard through the content language selector, where they can search for a missing language to translate an article into.
  • Experienced users are also more likely than newcomers to open the dashboard directly.
  • For users with 1000+ edits, the frequent languages selector accounts for only 40% of navigations to the dashboard.
+
+
+
+
+

By Wiki size: Target Language

+

Comparative sizes are based on the wiki-comparison data (January 2023 snapshot).

+
+
+Code +
# wiki comparison data
+wiki_comp = pd.read_csv('https://raw.githubusercontent.com/wikimedia-research/wiki-comparison/main/data-collection/snapshots/Jan_2023.tsv', sep='\t')
+wp_comp = (
+    wiki_comp[wiki_comp['project code'] == 'wikipedia']
+    .reset_index(drop=True)
+    .reset_index()[['index', 'database code', 'language code', 'language name', 'monthly active editors']]
+    .rename({
+        'index': 'rank', 
+        'database code': 'db_code', 
+        'language code': 'lang_code', 
+        'language name': 'lang_name',
+        'monthly active editors': 'active_editors'
+    }, axis=1)
+)
+
+wp_comp['rank'] = wp_comp['rank'] + 1
+
+rank_bin_edges = [0, 5, 10, 20, 50, float('inf')]
+rank_bin_labels = ['1-5', '6-10', '11-20', '21-50', '51-max']
+
+wp_comp['rank_bin'] = pd.cut(
+    wp_comp['rank'], 
+    bins=rank_bin_edges, 
+    labels=rank_bin_labels
+)
+
+# add wiki comparison data to dashboard open events
+dopen_events = (
+    dopen_events
+    .merge(
+        wp_comp[['lang_code', 'rank_bin']],
+        how='left',
+        left_on='translation_target_language',
+        right_on='lang_code'
+    )
+    .rename(columns={'rank_bin': 'target_wp_rank'})
+    .drop('lang_code', axis=1)
+)
+
+
+
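Rank here is the row order of Wikipedia rows in the wiki-comparison snapshot, and `pd.cut` assigns bins as right-closed intervals by default, so rank 5 falls in `1-5` and rank 6 in `6-10`. The boundary ranks below are illustrative. Because of the left merge, any target language whose code does not appear in the snapshot gets a missing `target_wp_rank` and is dropped by the later `groupby`.

```python
import pandas as pd

rank_bin_edges = [0, 5, 10, 20, 50, float('inf')]
rank_bin_labels = ['1-5', '6-10', '11-20', '21-50', '51-max']

# Right-closed intervals: (0, 5], (5, 10], (10, 20], (20, 50], (50, inf]
ranks = pd.Series([1, 5, 6, 10, 11, 20, 21, 50, 51, 300])
bins = pd.cut(ranks, bins=rank_bin_edges, labels=rank_bin_labels)
print(list(zip(ranks, bins.astype(str))))
```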
+
+Code +
# usage of entry points by translation target language wiki size
+entry_by_target_wp_size = (
+    dopen_events
+    .groupby(['target_wp_rank', 'event_source'])
+    .size()
+    .reset_index()
+    .rename({
+        'event_source': 'source',
+        0: 'count'
+    }, axis=1)
+    .sort_values(['target_wp_rank', 'count'], ascending=[True, False])
+    .reset_index(drop=True)
+)
+
+# total by each rank
+entry_by_target_wp_size['total'] = (
+    entry_by_target_wp_size['target_wp_rank']
+    .map(entry_by_target_wp_size
+         .groupby('target_wp_rank')
+         .agg({'count': sum})
+         .to_dict()['count']
+    )
+)
+
+entry_by_target_wp_size = entry_by_target_wp_size.astype({'total': int})
+entry_by_target_wp_size['percent'] = entry_by_target_wp_size['count'] / entry_by_target_wp_size['total']
+
+
+
+
+Code +
# only display annotations if entry point accounts for more than 5%
+entry_by_target_wp_size['percent_annot'] = (
+    entry_by_target_wp_size['percent']
+    .apply(lambda x:f"{x:.0%}" if x > 0.05 else None)
+)
+
+# bar graph
+fig = px.bar(entry_by_target_wp_size.query("""target_wp_rank != '1-5'"""), 
+             x='percent', 
+             y='target_wp_rank', 
+             color='source',
+             labels={
+                 'percent':'% of Total Events', 
+                 'target_wp_rank': 'Target Language WP Size', 
+                 'source': 'Entry Points'
+             },
+             color_discrete_sequence=px.colors.qualitative.T10,
+             title='Usage of Entry Points by Comparative Wikipedia Size (of the Target Language)',
+             text='percent_annot', 
+             category_orders={
+                 'target_wp_rank': [i for i in rank_bin_labels if i != '1-5'], 
+                 'source': entry_points_freq.entry_point.values.tolist()
+             }
+            )
+
+
+# stack the bars
+fig.update_layout(barmode='relative', height=550, width=max_width)
+fig.update_xaxes(tickformat='.0%')
+fig.update_traces(textfont_color='white')
+
+iplot(fig, config=iplot_config)
+
+
+ +
+
+
+
+
+
+ +
+
+Summary +
+
+
+
    +
  • On larger Wikipedias, the frequent languages selector was the most used route to the translation dashboard.
    • Among the top 20 Wikipedias, it accounted for 85% of dashboard accesses.
  • On smaller Wikipedias, the frequent languages selector remains the most used, but the content language selector is used more than on larger Wikipedias.
  • This is consistent with the edit bucket observations: larger Wikipedias tend to have a higher share of newcomers than smaller Wikipedias.
+
+
+
+
+
+

Translation Start

+
+

The next step after opening the translation dashboard is the translation start page, which appears after a user confirms their choice of article/section to translate and before the translation editing screen. This section analyzes the sources through which users reach the translation start page. This step can occur in two scenarios:

+
    +
  • When users navigate to the dashboard from an external entry point (e.g., the frequent languages or content language selectors), opening the translation dashboard is immediately followed by the translation start screen, because the article/section has already been selected. In such cases, the event_source of dashboard_translation_start is the same as that of dashboard_open. For example, if a user clicks a link in the frequent languages selector, dashboard_open and dashboard_translation_start are triggered consecutively, both with frequent_languages as the event source.
  • When users reach the main dashboard by opening it directly or by returning after editing/completing a translation, they are shown suggestions in several ways, and upon selection, a source specific to dashboard_translation_start is logged.
+

For this section, only events from the second scenario are considered, because the sources in the first scenario simply mirror those of dashboard_open.
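The split between the two scenarios can be sketched as a membership test: a translation start whose source also appears among the dashboard_open sources is treated as entry-point-driven, and the rest as started from within the dashboard. The hypothetical events below mirror the filter in the next cell, written with `isin`, the explicit form of the list comparison used in that query.

```python
import pandas as pd

# Hypothetical events: one entry-point-driven start, one start from a dashboard suggestion
toy = pd.DataFrame({
    "event_type": ["dashboard_open", "dashboard_translation_start",
                   "dashboard_open", "dashboard_translation_start"],
    "event_source": ["frequent_languages", "frequent_languages",
                     "direct", "suggestion_no_seed"],
})

dopen_sources = toy.loc[toy["event_type"] == "dashboard_open", "event_source"].unique().tolist()
starts = toy[toy["event_type"] == "dashboard_translation_start"]

# Second scenario: the start's source is not one of the dashboard_open sources
independent = starts[~starts["event_source"].isin(dopen_sources)]
share_independent = len(independent) / len(starts)
print(independent["event_source"].tolist(), share_independent)  # ['suggestion_no_seed'] 0.5
```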

+
+
+
+Code +
# filter translation start events that occurred independently of dashboard open
+dopen_sources = events.query("""event_type == 'dashboard_open'""").event_source.unique().tolist()
+dtstart_self_events = events.query("""(event_type == 'dashboard_translation_start') & (event_source != @dopen_sources)""")
+dstart_sources_freq = dtstart_self_events.event_source.value_counts()
+pct_dopen_independent_events = round(dtstart_self_events.shape[0] / events.query("""(event_type == 'dashboard_translation_start')""").shape[0] * 100, 2)
+
+
+
+
+Code +
# frequency of sources for dashboard translation start
+dstart_sources_freq_by_bucket = (
+    dtstart_self_events[['event_source', 'user_global_edit_count_bucket']]
+    .value_counts()
+    .reset_index()
+    .rename({
+        'count': 'n_events',
+        'event_source': 'Source',
+        'user_global_edit_count_bucket': 'Edit Bucket'
+    }, axis=1)
+    .sort_values(['Source', 'Edit Bucket'])
+)
+
+dstart_sources_freq_by_bucket['source_total_events'] = dstart_sources_freq_by_bucket['Source'].map(dstart_sources_freq.to_dict())
+dstart_sources_freq_by_bucket['Percentage'] = (dstart_sources_freq_by_bucket['n_events'] / dstart_sources_freq_by_bucket['source_total_events']).apply(lambda x:f"{x:.2%}")
+
+dstart_sources_freq_by_bucket_tbl = dstart_sources_freq_by_bucket.pivot(index='Edit Bucket', columns='Source', values='Percentage').fillna(0).to_markdown()
+display(Markdown(dstart_sources_freq_by_bucket_tbl))
+
+
| Edit Bucket | continue_published | search_result | suggestion_nearby | suggestion_no_seed | suggestion_recent_edit |
|:--|:--|:--|:--|:--|:--|
| 0 edits | 0 | 8.13% | 32.26% | 55.61% | 1.06% |
| 1-4 edits | 0 | 4.88% | 22.58% | 8.42% | 3.72% |
| 5-99 edits | 70.00% | 8.48% | 29.03% | 16.63% | 22.87% |
| 100-999 edits | 0 | 12.43% | 16.13% | 8.84% | 17.55% |
| 1000+ edits | 30.00% | 66.09% | 0 | 10.50% | 54.79% |
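Each percentage in the table above is `n_events / source_total_events`, i.e. the share of one source's translation starts that came from a given edit bucket, so each column sums to roughly 100%. It does not directly give the share of a bucket's starts that used a given source. The following sketch on made-up counts (plain Python, not the pandas code used here) contrasts the two normalizations:

```python
from collections import Counter

# Made-up (event_source, edit bucket) pairs, one per independent translation start.
starts = (
    [("search_result", "0 edits")] * 1
    + [("search_result", "1000+ edits")] * 5
    + [("suggestion_no_seed", "0 edits")] * 2
)

pair_counts = Counter(starts)                            # n_events per (source, bucket)
source_totals = Counter(src for src, _ in starts)        # source_total_events
bucket_totals = Counter(bucket for _, bucket in starts)  # events per edit bucket

# what the table shows: share of a source's events that came from each bucket
col_share = {(s, b): n / source_totals[s] for (s, b), n in pair_counts.items()}

# a different question: share of a bucket's events that came from each source
row_share = {(s, b): n / bucket_totals[b] for (s, b), n in pair_counts.items()}

print(f"{col_share[('suggestion_no_seed', '0 edits')]:.2%}")  # 100.00%
print(f"{row_share[('suggestion_no_seed', '0 edits')]:.2%}")  # 66.67%
```

With these toy numbers, every suggestion_no_seed start comes from a newcomer, yet only two thirds of the newcomers' starts use suggestion_no_seed, so the two readings should not be conflated.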
+
+
+
+
+
+ +
+
+Summary +
+
+
+
  • Only 7% of the dashboard_translation_start events occurred independently of dashboard_open events.
    • This indicates that most users start a translation having already selected an article/section to translate through an external entry point.
  • Among the translation starts that occurred independently:
    • Newcomers (0 edits) account for most (56%) of the starts from suggestion_no_seed, i.e. suggestions offered by the API in the absence of a seed article.
    • Users with 1000+ edits account for most of the starts from search results (search_result, 66%) and from suggestions related to one of their recent edits (suggestion_recent_edit, 55%).
+
+
+
+
+
+

Analysis: User Flows (Funnel)

+
+
  • For most of the funnel analysis, we look at three main event types, which account for more than 97% of the events:
    • dashboard_open: the user opens the translation dashboard.
    • dashboard_translation_start: the user proceeds from the dashboard to the translation start screen.
    • editor_segment_add: the user adds a segment of content to the translated version in the editor.
  • Several other events are instrumented (mostly related to how users interact with suggestions), but they account for less than 3% of the events; including them in the main analysis adds a lot of noise and makes it hard to derive insights. A section at the end covers interactions with those events.
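The core of the funnel is pairing each event with the next main event in the same session. The sketch below reproduces that idea in plain Python on made-up events, as an approximation of what `plot_funnel` does with `groupby(...).shift(-1)`: non-main events are dropped before pairing, so they do not interrupt a transition, and the last event in a session is paired with "session end". The event name `some_other_event` is hypothetical.

```python
from collections import Counter, defaultdict

MAIN_EVENTS = {"dashboard_open", "dashboard_translation_start", "editor_segment_add"}

# Made-up events: (session_id, position within session, event_type)
events = [
    ("s1", 0, "dashboard_open"),
    ("s1", 1, "dashboard_translation_start"),
    ("s1", 2, "editor_segment_add"),
    ("s1", 3, "editor_segment_add"),
    ("s2", 0, "dashboard_open"),
    ("s2", 1, "dashboard_translation_start"),
    ("s2", 2, "some_other_event"),  # hypothetical non-main event, dropped below
    ("s2", 3, "dashboard_open"),
]

# keep only the main events, ordered by session and position
sessions = defaultdict(list)
for session_id, position, event_type in sorted(events):
    if event_type in MAIN_EVENTS:
        sessions[session_id].append(event_type)

# pair each event with the next one in its session; the last one leads to "session end"
transitions = Counter()
for sequence in sessions.values():
    for current, nxt in zip(sequence, sequence[1:] + ["session end"]):
        transitions[(current, nxt)] += 1

# percentage of each transition among all transitions out of the same event type
totals = Counter()
for (current, _), n in transitions.items():
    totals[current] += n
percentages = {pair: 100 * n / totals[pair[0]] for pair, n in transitions.items()}

for pair, pct in sorted(percentages.items()):
    print(pair, f"{pct:.1f}%")
```

Because percentages are normalized per source event type, the outgoing shares of each stage add up to 100%, which is how the Sankey hover values should be read.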
+
+
+
+Code +
# main events list
+main_events = ['dashboard_open', 'dashboard_translation_start', 'editor_segment_add']
+
+# function to plot the funnel of user flows
+# by default, returns a Plotly Sankey plot for a given dataframe
+#     https://plotly.github.io/plotly.py-docs/generated/plotly.graph_objects.Sankey.html
+# optional: add a table with the distribution of edit buckets that triggered the events
+# optional: return a dataframe with the transition data instead of the plot
+def plot_funnel(df, 
+                return_transition_data=False,
+                chart_title=None,
+                events_scope=main_events, 
+                incl_session_end=True,
+                incl_edit_bucket_table=False,
+                font_size=12, 
+                width=iplot_width,
+                height=iplot_width/2.25):
+    
+    warnings.filterwarnings('ignore')
+        
+    df = df.query("""event_type == @events_scope""").copy()
+    df = df.sort_values(by=['content_translation_session_id', 'content_translation_session_position'])
+    
+    # next event in order within a session
+    df['next_event_type'] = df.groupby('content_translation_session_id')['event_type'].shift(-1)
+
+    # treat the session as ended if there is no next event
+    if incl_session_end:
+        # assign back instead of a chained inplace fillna, which does not update df under copy-on-write
+        df['next_event_type'] = df['next_event_type'].fillna('session end')
+    else:
+        df = df.dropna(subset=['next_event_type'])
+    
+    transition_counts = df.groupby(['event_type', 'next_event_type']).size().reset_index(name='count')
+    total_transitions_by_source = transition_counts.groupby('event_type')['count'].sum()
+    transition_counts['total_by_source'] = transition_counts['event_type'].map(total_transitions_by_source)
+    transition_counts['percentage'] = (transition_counts['count'] / transition_counts['total_by_source']) * 100
+    
+    # return the transition data directly, without building a figure, if requested
+    if return_transition_data:
+        return transition_counts
+    else:
+        # subplots: add the edit bucket table next to the Sankey, if needed
+        if incl_edit_bucket_table:
+            fig = sp.make_subplots(rows=1, cols=2, column_widths=[0.7, 0.3], 
+                                   specs=[[{"type": "sankey"}, {"type": "table"}]])
+        else:
+            fig = sp.make_subplots(rows=1, cols=1, 
+                                   specs=[[{"type": "sankey"}]])
+
+        # node labels: every event type that appears as a source or a target
+        all_event_types = pd.concat([transition_counts['event_type'], transition_counts['next_event_type']]).unique()
+        label_mapping = {label: i for i, label in enumerate(all_event_types)}
+
+        sources = transition_counts['event_type'].map(label_mapping)
+        targets = transition_counts['next_event_type'].map(label_mapping)
+        weights = transition_counts['count']
+
+        sankey = go.Sankey(
+            node=dict(
+                pad=15,
+                thickness=20,
+                line=dict(color="black", width=0.5),
+                label=[label if label != 'session end' else '<i>session end</i>' for label in all_event_types]
+            ),
+            link=dict(
+                source=sources,
+                target=targets,
+                value=weights,
+                hovertemplate='Events: %{value}<br />' +
+                              'Percentage: %{customdata:.2f}%<extra></extra>',
+                customdata=transition_counts['percentage']
+            )
+        )
+        
+        fig.add_trace(sankey, row=1, col=1)
+
+        if incl_edit_bucket_table:
+            agg_events_by_bucket = (
+                df
+                .user_global_edit_count_bucket
+                .value_counts()
+                .reset_index()
+                .rename({
+                    'user_global_edit_count_bucket': 'Edit Bucket',
+                    'count': '# Events'
+                }, axis=1)
+                .sort_values('Edit Bucket')
+            )
+            
+            agg_events_by_bucket['% of Events'] = (
+                agg_events_by_bucket['# Events'] / agg_events_by_bucket['# Events'].sum()
+            ).apply(lambda x:f"{x:.0%}")   
+            
+            table = go.Table(
+                columnwidth = [4, 3, 4],
+                header=dict(values=list(agg_events_by_bucket.columns),
+                            align='left'),
+                cells=dict(values=[
+                    agg_events_by_bucket['Edit Bucket'], 
+                    agg_events_by_bucket['# Events'], 
+                    agg_events_by_bucket['% of Events']],
+                           align='left', 
+                           height=25)
+            )
+            
+            fig.add_trace(table, row=1, col=2)
+        
+        fig.update_layout(title_text=chart_title, font_size=font_size, height=height, width=width)
+        return fig
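For the Sankey diagram, each (event_type, next_event_type) pair has to be turned into integer node indices. A minimal plain-Python sketch of the `label_mapping` step on made-up counts, assuming the same first-appearance ordering that `pd.concat(...).unique()` produces (pandas returns uniques in order of appearance):

```python
# Made-up transition counts: (event_type, next_event_type) -> number of events
transition_counts = {
    ("dashboard_open", "dashboard_translation_start"): 8,
    ("dashboard_open", "session end"): 2,
    ("dashboard_translation_start", "editor_segment_add"): 3,
    ("dashboard_translation_start", "dashboard_open"): 5,
}

# unique labels in first-seen order: all source labels first, then any new targets
labels = list(dict.fromkeys(
    [src for src, _ in transition_counts] + [tgt for _, tgt in transition_counts]
))
label_mapping = {label: i for i, label in enumerate(labels)}

# one entry per link: node index of the source, node index of the target, link weight
sources = [label_mapping[src] for src, _ in transition_counts]
targets = [label_mapping[tgt] for _, tgt in transition_counts]
weights = list(transition_counts.values())

print(sources)  # [0, 0, 1, 1]
print(targets)  # [1, 2, 3, 0]
```

The resulting `sources`, `targets` and `weights` lists are the kind of input that `go.Sankey(link=dict(source=..., target=..., value=...))` takes, with the node labels listed in the same order as `labels`.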
+
+
+
+
+Code +
iplot(
+    plot_funnel(
+        events, 
+        chart_title='Flow of Users Through CX Workflows & Number of Events Generated by Edit Bucket', 
+        incl_edit_bucket_table=True, 
+        width=max_width, 
+        height=max_width/2.25), 
+    config=iplot_config
+)
+
+
+ +
+
+
+
+
+
+ +
+
+Summary +
+
+
+

(main events: dashboard_open, dashboard_translation_start, editor_segment_add)

+
  • In most cases (77%), users who opened the dashboard transitioned to the translation start screen.
    • This is because, for users navigating to the dashboard from an external entry point, both events occur consecutively.
    • In 13% of the cases the session ended, and in 8% users refreshed the dashboard or returned to it before the session expired.
  • Among users who proceeded to the translation start screen, only in 15% of the cases did they progress to the editor and make an edit.
    • In 46% of the cases, users went back to the main dashboard, and in 30% the session ended.
    • As most of the events were generated by users with 0 edits (newcomers), these rates are largely driven by newcomers.
  • Among users who made at least one edit, in 80% of the cases they continued to make additional edits, while in 9% they went back to the main dashboard, and the rest ended the session.
+
+
+
+

By Edit Bucket

+
+
+Code +
n_events = events.query("""(user_global_edit_count_bucket == '0 edits') & (event_type == @main_events)""").shape[0]
+iplot(
+    plot_funnel(events.query("""user_global_edit_count_bucket == '0 edits'"""), 
+                chart_title=f'Flow of Users Through CX Workflows Having 0 Global Edits ({n_events} events)'), 
+          config=iplot_config)
+
+
+ +
+
+
+
+ +
+
+

Among the users who opened the dashboard:

  • In 80% of the cases, they proceeded to the translation start screen.
  • In 12% of the cases, they ended the session.
  • In 8% of the cases, they refreshed the dashboard or returned to it later, before the session expired.

Among the users who reached the translation start screen:

  • In 12% of the cases, they transitioned to the editor and made an edit.
  • In 42% of the cases, they went back to the main dashboard.
  • In 35% of the cases, they ended the session.

Among the users who made at least one edit:

  • In 69% of the cases, they continued to make additional edits.
  • In 11% of the cases, they went back to the main dashboard.
  • In 11% of the cases, they ended the session.
+
+
+
+
+
+Code +
n_events = events.query("""(user_global_edit_count_bucket == '1-4 edits') & (event_type == @main_events)""").shape[0]
+iplot(plot_funnel(events.query("""user_global_edit_count_bucket == '1-4 edits'"""), 
+                      chart_title=f'Flow of Users Through CX Workflows Having 1-4 Global Edits ({n_events} events)'), 
+          config=iplot_config)
+
+
+ +
+
+
+
+ +
+
+

Among the users who opened the dashboard:

  • In 82% of the cases, they proceeded to the translation start screen.
  • In 12% of the cases, they ended the session.
  • In 5% of the cases, they refreshed the dashboard or returned to it later, before the session expired.

Among the users who reached the translation start screen:

  • In 8% of the cases, they transitioned to the editor and made an edit.
  • In 60% of the cases, they went back to the main dashboard.
  • In 27% of the cases, they ended the session.

Among the users who made at least one edit:

  • In 75% of the cases, they continued to make additional edits.
  • In 9% of the cases, they went back to the main dashboard.
  • In 8% of the cases, they ended the session.
+
+
+
+
+
+Code +
n_events = events.query("""(user_global_edit_count_bucket == '5-99 edits') & (event_type == @main_events)""").shape[0]
+iplot(plot_funnel(events.query("""user_global_edit_count_bucket == '5-99 edits'"""), 
+                      chart_title=f'Flow of Users Through CX Workflows Having 5-99 Global Edits ({n_events} events)'), 
+          config=iplot_config)
+
+
+ +
+
+
+
+ +
+
+

Among the users who opened the dashboard:

  • In 77% of the cases, they proceeded to the translation start screen.
  • In 14% of the cases, they ended the session.
  • In 9% of the cases, they refreshed the dashboard or returned to it later, before the session expired.

Among the users who reached the translation start screen:

  • In 13% of the cases, they transitioned to the editor and made an edit.
  • In 54% of the cases, they went back to the main dashboard.
  • In 28% of the cases, they ended the session.

Among the users who made at least one edit:

  • In 80% of the cases, they continued to make additional edits.
  • In 10% of the cases, they went back to the main dashboard.
  • In 6% of the cases, they ended the session.
+
+
+
+
+
+Code +
n_events = events.query("""(user_global_edit_count_bucket == '100-999 edits') & (event_type == @main_events)""").shape[0]
+iplot(plot_funnel(events.query("""user_global_edit_count_bucket == '100-999 edits'"""), 
+                      chart_title=f'Flow of Users Through CX Workflows Having 100-999 Global Edits ({n_events} events)'), 
+          config=iplot_config)
+
+
+ +
+
+
+
+ +
+
+

Among the users who opened the dashboard:

  • In 71% of the cases, they proceeded to the translation start screen.
  • In 18% of the cases, they ended the session.
  • In 11% of the cases, they refreshed the dashboard or returned to it later, before the session expired.

Among the users who reached the translation start screen:

  • In 16% of the cases, they transitioned to the editor and made an edit.
  • In 49% of the cases, they went back to the main dashboard.
  • In 30% of the cases, they ended the session.

Among the users who made at least one edit:

  • In 85% of the cases, they continued to make additional edits.
  • In 7% of the cases, they went back to the main dashboard.
  • In 5% of the cases, they ended the session.
+
+
+
+
+
+Code +
n_events = events.query("""(user_global_edit_count_bucket == '1000+ edits') & (event_type == @main_events)""").shape[0]
+iplot(plot_funnel(events.query("""user_global_edit_count_bucket == '1000+ edits'"""), 
+                      chart_title=f'Flow of Users Through CX Workflows Having 1000+ Global Edits ({n_events} events)'), 
+          config=iplot_config)
+
+
+ +
+
+
+
+ +
+
+

Among the users who opened the dashboard:

  • In 72% of the cases, they proceeded to the translation start screen.
  • In 17% of the cases, they ended the session.
  • In 10% of the cases, they refreshed the dashboard or returned to it later, before the session expired.

Among the users who reached the translation start screen:

  • In 32% of the cases, they transitioned to the editor and made an edit.
  • In 40% of the cases, they went back to the main dashboard.
  • In 23% of the cases, they ended the session.

Among the users who made at least one edit:

  • In 86% of the cases, they continued to make additional edits.
  • In 8% of the cases, they went back to the main dashboard.
  • In 6% of the cases, they ended the session.
+
+
+
+
+

All Edit Buckets

+
+
+Code +
# consolidated view of users flows by edit bucket
+
+transition_by_bucket = pd.concat([
+    plot_funnel(events.query(f"user_global_edit_count_bucket == '{bucket}'"), return_transition_data=True)
+    .query("percentage > 5")
+    .assign(edit_bucket=bucket)
+    for bucket in edit_buckets
+])
+
+steps = ['dashboard_open', 'dashboard_translation_start', 'editor_segment_add']
+transition_by_bucket = transition_by_bucket.assign(
+    event_type=pd.Categorical(transition_by_bucket['event_type'], categories=steps, ordered=True),
+    next_event_type=pd.Categorical(transition_by_bucket['next_event_type'], categories=steps + ['session end'], ordered=True)
+)
+
+labels_map = {
+    'dashboard_open': 'main dashboard',
+    'dashboard_translation_start': 'translation start screen',
+    'editor_segment_add': 'made an edit',
+    'session end': 'session ended'
+}
+
+transition_by_bucket = transition_by_bucket.assign(
+    event_type=transition_by_bucket['event_type'].replace(labels_map),
+    next_event_type=transition_by_bucket['next_event_type'].replace({k: f'➔ {v}' for k, v in labels_map.items()})
+)
+
+transition_by_bucket = (
+    transition_by_bucket
+    .sort_values(['event_type', 'next_event_type'])
+    .pivot_table(
+        index=['event_type', 'next_event_type'], 
+        columns='edit_bucket', 
+        values='percentage',
+        sort=False
+    )
+    .reindex(edit_buckets, axis='columns')
+    .reset_index()
+)
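Because the consolidated view keeps only transitions with `percentage > 5` before pivoting, a transition at or below 5% for some bucket has no value in that bucket's column, which shows up as an empty cell, and the remaining cells of a group need not add up to exactly 100%. A plain-Python sketch of that filter-then-pivot step on made-up percentages (the values are illustrative only):

```python
# Made-up long-format rows: (event_type, next_event_type, edit bucket, percentage)
rows = [
    ("dashboard_open", "session end", "0 edits", 10.0),
    ("dashboard_open", "session end", "1000+ edits", 20.0),
    ("dashboard_translation_start", "dashboard_translation_start", "0 edits", 8.0),
    ("dashboard_translation_start", "dashboard_translation_start", "1000+ edits", 3.0),
]
edit_buckets = ["0 edits", "1000+ edits"]
THRESHOLD = 5  # same cut-off as .query("percentage > 5"); exactly 5% is also dropped

kept = [r for r in rows if r[3] > THRESHOLD]

# pivot: one row per (event_type, next_event_type), one column per edit bucket;
# a bucket whose transition was filtered out simply has no entry (an empty cell)
table = {}
for event_type, next_event_type, bucket, pct in kept:
    table.setdefault((event_type, next_event_type), {})[bucket] = pct

for key, values in table.items():
    print(key, [values.get(b) for b in edit_buckets])
```

An empty cell in the consolidated tables therefore means "5% or less", not "zero".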
+
+
+
+
+Code +
transition_by_bucket_tbl = (
+    gt
+    .GT(
+        transition_by_bucket,
+        groupname_col='event_type', 
+        rowname_col='next_event_type',
+    )
+    .tab_header('Transitions to Various Stages by Edit Bucket')
+    .fmt_percent(edit_buckets, decimals=0, scale_values=False)
+    .tab_style(
+        style=gt.style.text(size="16px"), 
+        locations=gt.loc.body(columns=edit_buckets)
+    )
+    .tab_style(
+        style=gt.style.borders('right', '#bdbdbd'), 
+        locations=gt.loc.body(columns=edit_buckets)
+    )
+)
+
+transition_by_bucket_tbl
+
+
+ +
Transitions to Various Stages by Edit Bucket +
0 edits1-4 edits5-99 edits100-999 edits1000+ edits
main dashboard
➔ main dashboard7%6%8%11%10%
➔ translation start screen80%82%77%71%73%
➔ session ended12%12%14%18%17%
translation start screen
➔ main dashboard43%59%54%49%40%
➔ translation start screen9%7%6%6%
➔ made an edit12%8%13%16%32%
➔ session ended35%26%28%30%23%
made an edit
➔ main dashboard11%9%10%7%8%
➔ translation start screen9%6%
➔ made an edit69%77%80%86%86%
➔ session ended12%8%6%6%6%
+ +
+ +
+
+
+
+
+ +
+
+Summary +
+
+
+

(main events: dashboard_open, dashboard_translation_start, editor_segment_add)

+
  • Across all edit count buckets, most users (>70%) who opened the dashboard proceeded to the translation start screen.
    • The percentage is higher for newcomers than for experienced users. Most newcomers reach the dashboard through external entry points rather than opening it directly; in that case, dashboard_open and dashboard_translation_start are triggered consecutively, with no user action in between. Among experienced users, more users open the dashboard directly and then click through to the translation start screen.
  • Among users who reached the translation start screen:
    • Newcomers tend to end (abandon) the session or return to the main dashboard.
    • Newcomers continued to make an edit from this stage in only 12% of the cases, whereas users with 1000+ edits did so in 32% of the cases.
      • Generally, the more editing experience users have, the more likely they are to continue to make an edit.
  • Among users who made at least one edit, the more editing experience they have, the more likely they are to continue making additional edits to the machine-translated content, and the less likely they are to end the session or return to the dashboard.
+
+
+
+
+
+

By Entry Point

+
+
+Code +
dopen_sources = events.query("""event_type == 'dashboard_open'""").event_source.unique().tolist()
+
+# plot the funnel for a given source
+# identifies sessions starting with a dashboard_open from the specified source
+# reuses the original plot_funnel function
+# includes the edit bucket table by default 
+def plot_funnel_for_source(source, incl_edit_bucket_table=True):
+    
+    sessions_with_source = (
+        events
+        .query(f"""(event_source == '{source}') & (event_type == 'dashboard_open') & (content_translation_session_position == 0)""")
+        .content_translation_session_id
+        .unique()
+        .tolist()
+    )
+    
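+    # count the main events in the selected sessions, i.e. the events that the funnel plots
+    n_events = events.query("""(content_translation_session_id == @sessions_with_source) & (event_type == @main_events)""").shape[0]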
+    
+    iplot(plot_funnel(events.query("""content_translation_session_id == @sessions_with_source"""), 
+                      chart_title=f'Flow of Users Through CX Workflows; Source: {source} ({n_events} events) & Number of Events Generated by Edit Bucket', 
+                      incl_edit_bucket_table=incl_edit_bucket_table, width=max_width, height=max_width/2.25), 
+          config=iplot_config)
+
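`plot_funnel_for_source` selects whole sessions by their first event, rather than individual events by their own `event_source`. A plain-Python sketch on made-up events, assuming (as the query does) that position 0 marks the first event of a session, shows why the two counts can differ:

```python
# Made-up events: (session_id, position, event_type, event_source)
events = [
    ("s1", 0, "dashboard_open", "frequent_languages"),
    ("s1", 1, "dashboard_translation_start", "frequent_languages"),
    ("s2", 0, "dashboard_open", "direct"),
    ("s2", 1, "dashboard_translation_start", "search_result"),
    ("s3", 0, "dashboard_open", "direct"),
    ("s3", 1, "dashboard_open", "frequent_languages"),  # not the first event of s3
]

def sessions_starting_with(events, source):
    """IDs of sessions whose first event (position 0) is a dashboard_open logged with `source`."""
    return {
        session_id
        for session_id, position, event_type, event_source in events
        if position == 0 and event_type == "dashboard_open" and event_source == source
    }

def events_in_sessions(events, session_ids):
    """All events of the selected sessions, regardless of each event's own source."""
    return [e for e in events if e[0] in session_ids]

direct_sessions = sessions_starting_with(events, "direct")
print(sorted(direct_sessions))                           # ['s2', 's3']
print(len(events_in_sessions(events, direct_sessions)))  # 4
print(sum(1 for e in events if e[3] == "direct"))        # 2
```

The session-based selection keeps the search_result start in s2 and the second dashboard_open in s3, which a filter on `event_source == 'direct'` would miss.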
+
+
+
+Code +
plot_funnel_for_source('frequent_languages')
+
+
+ +
+
+
+
+ +
+
+

Among the users who opened the dashboard:

  • In 82% of the cases, they proceeded to the translation start screen.
  • In 11% of the cases, they ended the session.
  • In 6% of the cases, they refreshed the dashboard or returned to it later, before the session expired.

Among the users who reached the translation start screen:

  • In 13% of the cases, they transitioned to the editor and made an edit.
  • In 46% of the cases, they went back to the main dashboard.
  • In 32% of the cases, they ended the session.

Among the users who made at least one edit:

  • In 79% of the cases, they continued to make additional edits.
  • In 8% of the cases, they went back to the main dashboard.
  • In 8% of the cases, they ended the session.
+
+
+
+
+
+Code +
plot_funnel_for_source('content_language_selector')
+
+
+ +
+
+
+
+ +
+
+

Among the users who opened the dashboard:

  • In 75% of the cases, they proceeded to the translation start screen.
  • In 19% of the cases, they ended the session.
  • In 6% of the cases, they refreshed the dashboard or returned to it later, before the session expired.

Among the users who reached the translation start screen:

  • In 23% of the cases, they transitioned to the editor and made an edit.
  • In 45% of the cases, they went back to the main dashboard.
  • In 25% of the cases, they ended the session.

Among the users who made at least one edit:

  • In 75% of the cases, they continued to make additional edits.
  • In 13% of the cases, they went back to the main dashboard.
  • In 9% of the cases, they ended the session.
+
+
+
+
+
+Code +
plot_funnel_for_source('direct')
+
+
+ +
+
+
+
+ +
+
+

Among the users who opened the dashboard:

  • In 36% of the cases, they proceeded to the translation start screen.
  • In 35% of the cases, they ended the session.
  • In 27% of the cases, they refreshed the dashboard or returned to it later, before the session expired.

Among the users who reached the translation start screen:

  • In 34% of the cases, they transitioned to the editor and made an edit.
  • In 40% of the cases, they went back to the main dashboard.
  • In 14% of the cases, they ended the session.

Among the users who made at least one edit:

  • In 90% of the cases, they continued to make additional edits.
  • In 5% of the cases, they went back to the main dashboard.
  • In 3% of the cases, they ended the session.
+
+
+
+
+
+Code +
plot_funnel_for_source('invite_new_article_creation')
+
+
+ +
+
+
+
+ +
+
+

Among the users who opened the dashboard:

  • In 80% of the cases, they proceeded to the translation start screen.
  • In 9% of the cases, they ended the session.
  • In 10% of the cases, they refreshed the dashboard or returned to it later, before the session expired.

Among the users who reached the translation start screen:

  • In 14% of the cases, they transitioned to the editor and made an edit.
  • In 57% of the cases, they went back to the main dashboard.
  • In 25% of the cases, they ended the session.

Among the users who made at least one edit:

  • In 76% of the cases, they continued to make additional edits.
  • In 13% of the cases, they went back to the main dashboard.
  • In 9% of the cases, they ended the session.
+
+
+
+
+
+Code +
plot_funnel_for_source('direct_preselect')
+
+
+ +
+
+
+
+ +
+
+

Among the users who opened the dashboard:

  • In 87% of the cases, they proceeded to the translation start screen.
  • In 6% of the cases, they ended the session.
  • In 6% of the cases, they refreshed the dashboard or returned to it later, before the session expired.

Among the users who reached the translation start screen:

  • In 11% of the cases, they transitioned to the editor and made an edit.
  • In 50% of the cases, they went back to the main dashboard.
  • In 35% of the cases, they ended the session.

Among the users who made at least one edit:

  • In 82% of the cases, they continued to make additional edits.
  • In 7% of the cases, they went back to the main dashboard.
  • In 11% of the cases, they ended the session.
+
+
+
+
+
+Code +
plot_funnel_for_source('recent_translation')
+
+
+ +
+
+
+
+ +
+
+

Among the users who opened the dashboard:

  • In 87% of the cases, they proceeded to the translation start screen.
  • In 8% of the cases, they ended the session.
  • In 5% of the cases, they refreshed the dashboard or returned to it later, before the session expired.

Among the users who reached the translation start screen:

  • In 31% of the cases, they transitioned to the editor and made an edit.
  • In 44% of the cases, they went back to the main dashboard.
  • In 17% of the cases, they ended the session.

Among the users who made at least one edit:

  • In 79% of the cases, they continued to make additional edits.
  • In 10% of the cases, they went back to the main dashboard.
  • In 10% of the cases, they ended the session.
+
+
+
+
+
+Code +
plot_funnel_for_source('contributions_page')
+
+
+ +
+
+
+
+ +
+
+

Among the users who opened the dashboard:

  • In 39% of the cases, they proceeded to the translation start screen.
  • In 49% of the cases, they ended the session.
  • In 7% of the cases, they refreshed the dashboard or returned to it later, before the session expired.

Among the users who reached the translation start screen:

  • In 7% of the cases, they transitioned to the editor and made an edit.
  • In 64% of the cases, they went back to the main dashboard.
  • In 14% of the cases, they ended the session.

Among the users who made at least one edit:

  • In 50% of the cases, they continued to make additional edits.
  • In 20% of the cases, they went back to the main dashboard.
  • In 30% of the cases, they ended the session.
+
+
+
+
+

All Entry Points

+
+
+Code +
# consolidated view of transitions by entry point
+
+transition_by_source = pd.DataFrame()
+
+for source in dopen_sources:
+    
+    sessions_with_source = (
+        events
+        .query(f"""(event_source == '{source}') & (event_type == 'dashboard_open') & (content_translation_session_position == 0)""")
+        .content_translation_session_id
+        .unique()
+        .tolist()
+    )
+    
+    transition_data = (
+        plot_funnel(
+            events
+            .query("""content_translation_session_id == @sessions_with_source"""),
+            return_transition_data=True)
+        .query("percentage > 5")
+        .assign(source=source)
+    )
+    
+    transition_by_source = pd.concat([transition_by_source, transition_data])
+    
+steps = ['dashboard_open', 'dashboard_translation_start', 'editor_segment_add']
+transition_by_source = transition_by_source.assign(
+    event_type=pd.Categorical(transition_by_source['event_type'], categories=steps, ordered=True),
+    next_event_type=pd.Categorical(transition_by_source['next_event_type'], categories=steps + ['session end'], ordered=True)
+)
+
+transition_by_source = transition_by_source.assign(
+    event_type=transition_by_source['event_type'].replace(labels_map),
+    next_event_type=transition_by_source['next_event_type'].replace({k: f'➔ {v}' for k, v in labels_map.items()})
+)
+
+transition_by_source = (
+    transition_by_source
+    .sort_values(['event_type', 'next_event_type'])
+    .pivot_table(
+        index=['event_type', 'next_event_type'], 
+        columns='source', 
+        values='percentage',
+        sort=False
+    )
+    .reindex(entry_points_freq.entry_point.tolist(), axis='columns')
+    .reset_index()
+)
+
+
+
+
+Code +
transition_by_source_tbl = (
+    gt
+    .GT(
+        transition_by_source,
+        groupname_col='event_type', 
+        rowname_col='next_event_type',
+    )
+    .tab_header('Transitions to Various Stages of Translation Funnel by Source of Entry to the Dashboard')
+    .fmt_percent(dopen_sources, decimals=0, scale_values=False)
+    .tab_style(
+        style=gt.style.text(size="16px"), 
+        locations=gt.loc.body(columns=dopen_sources)
+    )
+    .tab_style(
+        style=gt.style.borders('right', '#bdbdbd'), 
+        locations=gt.loc.body(columns=dopen_sources)
+    )
+)
+
+transition_by_source_tbl
+
+
+ +
Transitions to Various Stages of Translation Funnel by Source of Entry to the Dashboard +
frequent_languagescontent_language_selectordirectinvite_new_article_creationdirect_preselectcontributions_pagerecent_translation
main dashboard
➔ main dashboard7%6%27%10%6%7%8%
➔ translation start screen82%75%36%80%87%39%88%
➔ session ended11%19%35%9%6%49%
translation start screen
➔ main dashboard46%45%40%57%50%64%45%
➔ translation start screen8%7%12%14%7%
➔ made an edit13%23%34%14%11%7%31%
➔ session ended33%25%14%25%36%14%17%
made an edit
➔ main dashboard8%13%5%13%7%20%10%
➔ made an edit79%75%91%76%81%50%79%
➔ session ended8%9%9%11%30%10%
+ +
+ +
+
+
+
+
+ +
+
+Summary +
+
+
+

(main events: dashboard_open, dashboard_translation_start, editor_segment_add)

+
+

The transition rates between stages of the funnel, by source of entry, appear to be closely tied to which user experience levels use each entry point.

+
+
  • Among users who navigated through the frequent languages menu (frequent_languages), which was most frequently used by newcomers:
    • In 82% of the cases, they proceeded to the translation start screen.
    • From the translation start screen, they made at least one edit in 13% of the cases.
  • Among users who navigated through the content language selector (content_language_selector), which was frequently used by newcomers and experienced users alike:
    • In 75% of the cases, they proceeded to the translation start screen.
    • From the translation start screen, they made at least one edit in 23% of the cases.
  • Among users who opened the dashboard directly (direct), which was most frequently done by experienced users:
    • In 36% of the cases, they proceeded to the translation start screen.
    • From the translation start screen, they made at least one edit in 34% of the cases.
  • Among users who navigated from an invitation shown on a non-existent page (invite_new_article_creation), which was frequently used by newcomers and experienced users alike:
    • In 80% of the cases, they proceeded to the translation start screen.
    • From the translation start screen, they made at least one edit in 14% of the cases.
  • Among users who directly opened the dashboard with a link to a specific translation (direct_preselect), which was most frequently done by experienced users:
    • In 87% of the cases, they proceeded to the translation start screen.
    • From the translation start screen, they made at least one edit in 11% of the cases.
  • Among users who navigated from the contributions page (contributions_page), which was frequently used by newcomers and experienced users alike:
    • In 39% of the cases, they proceeded to the translation start screen.
    • From the translation start screen, they made at least one edit in 7% of the cases.
  • Among users who navigated from the notice on recently translated articles to review or expand the translation (recent_translation), which was most frequently used by experienced users:
    • In 88% of the cases, they proceeded to the translation start screen.
    • From the translation start screen, they made at least one edit in 31% of the cases.
+
+
+
+
+
+
+
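The two rates quoted for each entry point can be combined into a rough two-step figure. The sketch below multiplies them. This assumes, as a simplification the analysis itself does not make, that each step depends only on the current stage and the entry point (a first-order Markov chain). It also ignores loops such as start screen ➔ dashboard ➔ start screen ➔ edit, so the result is an approximation and not a measured conversion rate. The names `rates` and `two_step_estimate` are illustrative only.

```python
# Hypothetical helper: rough two-step conversion per entry point, built from the
# rounded rates quoted in the summary. Assumes a first-order Markov chain (each
# step depends only on the current stage), so paths containing loops are not counted.
rates = {
    # entry point: (% dashboard -> start screen, % start screen -> at least one edit)
    "frequent languages menu": (82, 13),
    "content language selector": (75, 23),
    "dashboard opened directly": (36, 34),
    "invitation on non-existent page": (80, 14),
    "link to a specific translation": (87, 11),
    "contributions page": (39, 7),
    "recently translated article notice": (88, 31),
}


def two_step_estimate(to_start_pct: float, to_edit_pct: float) -> float:
    """Approximate % of dashboard visits followed by the start screen and then an edit."""
    return to_start_pct * to_edit_pct / 100


# Print entry points from highest to lowest estimated two-step conversion
for source, (to_start, to_edit) in sorted(
    rates.items(), key=lambda kv: two_step_estimate(*kv[1]), reverse=True
):
    print(f"{source:<36} {two_step_estimate(to_start, to_edit):5.1f}%")
```

Under this rough estimate, the notice on recently translated articles (about 27%) and the content language selector (about 17%) most often lead to an edit within two steps, while the contributions page (about 3%) lags behind.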

User Flows: Other Events

# User flows and interactions with events other than the main events (open, start, edit)
other_event_transitions = (
    plot_funnel(
        events,
        events_scope=events.event_type.unique().tolist(),
        return_transition_data=True)
    .query("event_type not in @main_events")
    .sort_values(['event_type', 'percentage'], ascending=[True, False])
    .drop(columns='total_by_source')
)

# Prefix each next event with an arrow; 'session end' marks the last event of a session
other_event_transitions['next_event_type'] = (
    other_event_transitions['next_event_type']
    .replace({i: f'➔ {i}' for i in events.event_type.unique().tolist() + ['session end']})
)

# Label each event type group with its total number of events
event_counts = events['event_type'].value_counts()
other_event_transitions['event_type'] = (
    other_event_transitions['event_type']
    .replace({i: f'{i} ({event_counts[i]} events)' for i in event_counts.index})
)

other_event_transitions_tbl = (
    gt
    .GT(
        other_event_transitions,
        rowname_col='next_event_type',
        groupname_col='event_type'
    )
    .fmt_percent('percentage', scale_values=False, decimals=1)
    .cols_label(
        count='# Events',
        percentage='Percentage'
    )
    .tab_header('Transitions Between Other Event Types',
                'apart from dashboard_open, dashboard_translation_start, editor_segment_add')
)

other_event_transitions_tbl
Transitions Between Other Event Types
apart from dashboard_open, dashboard_translation_start, editor_segment_add

Next event | # Events | Percentage
dashboard_discard_suggestion (677 events)
➔ dashboard_discard_suggestion | 540 | 79.8%
➔ dashboard_translation_start | 70 | 10.3%
➔ session end | 34 | 5.0%
➔ dashboard_open | 16 | 2.4%
➔ dashboard_search | 6 | 0.9%
➔ dashboard_translation_continue | 6 | 0.9%
➔ dashboard_refresh_suggestions | 3 | 0.4%
➔ dashboard_translation_discard | 1 | 0.1%
➔ editor_segment_add | 1 | 0.1%
dashboard_refresh_suggestions (110 events)
➔ dashboard_refresh_suggestions | 37 | 33.6%
➔ dashboard_translation_start | 23 | 20.9%
➔ session end | 18 | 16.4%
➔ dashboard_open | 17 | 15.5%
➔ dashboard_search | 7 | 6.4%
➔ dashboard_discard_suggestion | 4 | 3.6%
➔ dashboard_translation_continue | 4 | 3.6%
dashboard_search (958 events)
➔ dashboard_translation_start | 791 | 82.6%
➔ dashboard_open | 116 | 12.1%
➔ session end | 33 | 3.4%
➔ dashboard_translation_continue | 14 | 1.5%
➔ dashboard_search | 3 | 0.3%
➔ editor_segment_add | 1 | 0.1%
dashboard_translation_continue (440 events)
➔ dashboard_open | 294 | 66.8%
➔ editor_segment_add | 70 | 15.9%
➔ session end | 64 | 14.5%
➔ dashboard_translation_continue | 5 | 1.1%
➔ dashboard_translation_discard | 3 | 0.7%
➔ dashboard_translation_start | 3 | 0.7%
➔ dashboard_search | 1 | 0.2%
dashboard_translation_discard (132 events)
➔ dashboard_translation_discard | 76 | 57.6%
➔ dashboard_search | 18 | 13.6%
➔ session end | 14 | 10.6%
➔ dashboard_open | 12 | 9.1%
➔ dashboard_translation_continue | 8 | 6.1%
➔ dashboard_translation_start | 3 | 2.3%
➔ dashboard_discard_suggestion | 1 | 0.8%
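Transition tables like the one above pair each event with the event that immediately follows it in the same session, then express each count as a share of that event type's total. The notebook's `plot_funnel` helper is not shown here, so the following is only a minimal pandas sketch of the idea and not its implementation. The column names `session_id` and `dt`, and the function name `next_event_transitions`, are assumptions. Only `event_type` appears in the original code.

```python
import pandas as pd


def next_event_transitions(events: pd.DataFrame) -> pd.DataFrame:
    """Count each event's immediate successor within its session.

    Assumes hypothetical columns: session_id, dt (event timestamp), event_type.
    The last event of a session transitions to the sentinel 'session end'.
    """
    ordered = events.sort_values(["session_id", "dt"])
    # Next event in the same session; NaN for the last event of each session
    next_type = ordered.groupby("session_id")["event_type"].shift(-1)
    pairs = ordered.assign(next_event_type=next_type.fillna("session end"))
    counts = (
        pairs.groupby(["event_type", "next_event_type"])
        .size()
        .reset_index(name="count")
    )
    # Share (%) of each event type's occurrences that lead to each next event
    totals = counts.groupby("event_type")["count"].transform("sum")
    counts["percentage"] = (100 * counts["count"] / totals).round(1)
    return counts.sort_values(["event_type", "count"], ascending=[True, False])


# Toy log with two sessions (illustrative data, not from the analysis)
demo = pd.DataFrame({
    "session_id": [1, 1, 1, 2, 2],
    "dt": pd.to_datetime([
        "2023-12-01 10:00", "2023-12-01 10:01", "2023-12-01 10:02",
        "2023-12-01 11:00", "2023-12-01 11:05",
    ]),
    "event_type": [
        "dashboard_search", "dashboard_translation_start", "editor_segment_add",
        "dashboard_search", "dashboard_open",
    ],
})
print(next_event_transitions(demo))
```

Normalizing by each event type's total, rather than by all transitions, makes the percentages within each group sum to 100%, which matches how the table above is laid out.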

Summary

  • In cases where users discarded a translation suggestion (677 occurrences), in 80% of the cases they went on to discard the next suggestion shown as well, and in 10% they proceeded to the translation start screen.
  • In cases where users requested a regenerated list of suggestions (110 occurrences), in 34% of the cases they refreshed the suggestions again, and in 21% they proceeded to the translation start screen.
  • In cases where users initiated a search (958 occurrences), in 83% of the cases they proceeded to the translation start screen, and in 12% they returned to the dashboard.
  • In cases where users selected an in-progress translation (440 occurrences), in 67% of the cases they returned to the dashboard, and in 16% they made an edit to the translation.
  • In cases where users discarded an in-progress translation (132 occurrences), in 58% of the cases they discarded another in-progress translation, and in 14% they initiated a search.
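The repeat behavior noted in the summary can be read directly from the table counts. It is the share of an event type's occurrences that are immediately followed by another event of the same type. Below is a small sketch using counts copied from the table. The names `repeat_counts` and `repeat_rate` are illustrative only.

```python
# Counts copied from the "Transitions Between Other Event Types" table:
# event type -> (total events of this type, transitions to the same event type)
repeat_counts = {
    "dashboard_discard_suggestion": (677, 540),
    "dashboard_refresh_suggestions": (110, 37),
    "dashboard_search": (958, 3),
    "dashboard_translation_continue": (440, 5),
    "dashboard_translation_discard": (132, 76),
}


def repeat_rate(total: int, repeats: int) -> float:
    """Percentage of events immediately followed by another event of the same type."""
    return round(100 * repeats / total, 1)


for event_type, (total, repeats) in repeat_counts.items():
    print(f"{event_type:<32} {repeat_rate(total, repeats):5.1f}%")
```

Discards and refreshes repeat often (about 80%, 58%, and 34%). Searches and continued translations repeat in under 2% of cases.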