Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Is it possible to track #requests per user agent? #6

Open
robisonboxedup opened this issue Nov 25, 2022 · 11 comments
Open

Is it possible to track #requests per user agent? #6

robisonboxedup opened this issue Nov 25, 2022 · 11 comments

Comments

@robisonboxedup
Copy link

One metric I'd like to track is the user agent accessing the application, so I can divide my users into cohorts per device (desktop vs mobile). Is there any way to do this already? Is that in the roadmap?

@alexisbernard
Copy link
Member

This is something we have in mind. We do not plan. We do when we have time :-) However, feel free to submit a PR, or discuss how you see it.

@MarcusRiemer
Copy link

I would like to tackle this in a PR. The most straightforward way seems to be to add a counter for every specific user agent property that could be of interest and increment those. I would therefore propose to add the following columns:

  • total_desktop
  • total_tablet
  • total_phone
  • total_bot

Alternatively I could see an appeal to use more branded column names, but this would mean we would have to do database migrations every time a operating system of interest is launched. Something along the lines of:

  • total_linux
  • total_macos
  • total_windows
  • total_ios
  • total_android
  • total_bot

For browser I do very much see the appeal of having branded names, but this will be hard to get right. I wouldn't want to add a new column for every chromium fork. I would therefore propose to just mention the browser engine:

  • total_gecko
  • total_webkit
  • total_blink

Alternatively I would love to use a PostgreSQL jsonb or hstore field (which can be properly indexed). But I have no idea how support for this looks outside my beautiful PostgreSQL bubble and I'm not really eager to find out how MySQL or SQLite deals with this.

@robisonboxedup
Copy link
Author

robisonboxedup commented Feb 27, 2024 via email

@MarcusRiemer
Copy link

The current schema works with a single table with a single row per counted day and navigation target. Most of the aggregation is done immediately when tracking, not afterwards when evaluating. This does have the benefit of being comparatively fast, not requiring any regular maintenance and an overall very simple code base. Introducing more columns that pile up more rows per day seemed ... against this spirit ... to me.

But the more I think about it the more I lean into the direction of "there are already many rows with very small totals for a single target due to navigation, maybe this doesn't really matter after all".

So in this case I would propose the following additional columns:

  • device_type which would be desktop, mobile, tablet or bot
  • operating_system which would be linux, macos, ios, android or windows. Or possibly the name of the search engine when encountering a bot?
  • browser or browser_engine. I'm still not keen on counting every browser engine individually.

Technically the device type could almost be computed from the operating system, but this gets muddy for the phone and tablet distinction.

This approach could also allow developers to provide their own function to express the desired level of granularity for these fields. That way people who would like to also track things like versions could do so.

@alexisbernard
Copy link
Member

Thanks for your suggestions. Is it important for you to know the browsers, devices ans OS per page ? In my case, knowing there are X% of Firefox version Y today across the whole site is enough. I'm asking because I can't see a case where that would be important. Maybe for a marketing person of a big company this detail matters. But ActiveAnalytics does not target people who might consider all the fancy features of GA for example.

This is the schema I have in my mind by adding 3 more tables :

DevicesPerDay(name)
BrowsersPerDay(name, version)
OperatingSystemsPerDay(name, a version maybe ?)

That should cover all cases you listed. Let me know if I miss one.

Regarding the bots, I try to skip them from ActiveAnalytics because I am interested by natural trafic only. But that is really up to you. Indeed it could make sens to have a device named bot, or search engine X.

But I have no idea how support for this looks outside my beautiful PostgreSQL bubble and I'm not really eager to find out how MySQL or SQLite deals with this.

Since January 2024 SQLite supports JSONB and JSON for much longer. I prefer to avoid the non standard types to keep the code very simple.

Thanks again to both of you. I am happy to see people willing to bring new features!

@MarcusRiemer
Copy link

Thanks for your suggestions. Is it important for you to know the browsers, devices ans OS per page ? In my case, knowing there are X% of Firefox version Y today across the whole site is enough. I'm asking because I can't see a case where that would be important.

I personally don't need versions at all.

This is the schema I have in my mind by adding 3 more tables :

DevicesPerDay(name)
BrowsersPerDay(name, version)
OperatingSystemsPerDay(name, a version maybe ?)

That should cover all cases you listed. Let me know if I miss one.

It does indeed cover all use cases, but it also means suddenly have to do up to 4 upserts per visit. Although I do imagine these could be fired simultaneously instead of serially. And when reading the data it requires 3 joins unless a person wouldn't be interested in seeing this data by default. Not sure whether keeping the single table would be premature optimization on my part. But I would still prefer to add 3 columns instead of 3 tables.

Regarding the bots, I try to skip them from ActiveAnalytics because I am interested by natural trafic only. But that is really up to you. Indeed it could make sens to have a device named bot, or search engine X.

Ultimately we can document this and leave it to the user with (imho) negligible impact on complexity.

But I have no idea how support for this looks outside my beautiful PostgreSQL bubble and I'm not really eager to find out how MySQL or SQLite deals with this.

Since January 2024 SQLite supports JSONB and JSON for much longer. I prefer to avoid the non standard types to keep the code very simple.

Not sure how to read this: Would you be open to using JSONB or not?

Thanks again to both of you. I am happy to see people willing to bring new features!

Works the other way around as well: I was happy to find your project that almost does exactly what I needed :-)

One more thing from my part: I would pull in the browser gem to do the actual user agent detection. Is pulling in a very mature gem okay with you or do you prefer to keep this practically dependency free?

MarcusRiemer added a commit to MarcusRiemer/active_analytics that referenced this issue Feb 28, 2024
@MarcusRiemer
Copy link

Ahem, I added a MR of my initial attempt and only at the end I properly understood why you asked whether the user agent stuff should be trackable per page. I somehow jumped directly to the version question ... I stilll wouldn't see 3 additional tables, but maybe one table with the three new columns and a date to group by would've also done the trick.

@alexisbernard
Copy link
Member

The browser version is very important to check on https://caniuse.com the supported CSS and JS features. For example, if too many people use the X browser before version Y, I might not use all the new fancy JS and CSS stuff. I still don't know how to store the versions. Indeed strings cannot handle all cases : "1.2.3" < "1.2.10" # => false.

Yes, more tables means more writes. But writes are delayed and aggregated with ActiveAnalytics.queue_request. That means currently there is 1 write per page instead of pages * views. Thus the cost of ActiveAnalytics is very small even for sites with a large trafic. With a large trafic it's not reasonable to perform a write for each view.

By adding 3 columns the table views_per_days will grow exponentially. Currently, in the worst case ViewsPerDay.count == days * pages which is reasonable. By adding more columns the worst case becomes ViewsPerDay.count == days * pages * browser_name * browser_version * device * os_name * os_version. It's scary, because we are close to have one row per view or only a few.

I also have in mind to add geolocation which will add more combinations. By adding 2 more columns such as country and city the worst case becomes even more exponential ViewsPerDay.count == days * pages * browser_name * browser_version * device * os_name * os_version * cities.

By adding more tables, (devices_per_days, os_per_days, browsers_per_day) the worst case is ViewsPerDay.count + DevicesPerDay.count + OsPerDay.count + BrowserPerDay.count == days * (pages + browser_name * browser_version + device + os_name * os_version). That will grow slower and should produce less writes.

Not sure how to read this: Would you be open to using JSONB or not?

In this cas no.

One more thing from my part: I would pull in the browser gem to do the actual user agent detection. Is pulling in a very mature gem okay with you or do you prefer to keep this practically dependency free?

I try to avoid adding dependencies, but in this case it makes sens. I don't want to write a user agent parser, and all the code won't be binded to it.

After writing this, I just thought the models DevicesPerDay and OperatingSystemPerDay could be merged without increasing too much the numbers of rows. Indeed, if the OS is iOS this is a either a phone or a tablet and same for Android.

BrowsersPerDay(name, version)
PlatformsPerDay(name, version, device)

I am very motivated to work on it but I would like to finish a big improvement on ActiveHashcash first. I don't know if you finally agree with my idea after explaining in details.

@robisonboxedup
Copy link
Author

robisonboxedup commented Feb 29, 2024 via email

@MarcusRiemer
Copy link

The browser version is very important to check on https://caniuse.com/ the supported CSS and JS features. For example, if too many people use the X browser before version Y, I might not use all the new fancy JS and CSS stuff. I still don't know how to store the versions. Indeed strings cannot handle all cases : "1.2.3" < "1.2.10" # => false.

My personal use is to distinguish form factors of devices, I don't care about anything else. But I figured I could throw in the other metrics for other people.

By adding more tables, (devices_per_days, os_per_days, browsers_per_day) the worst case is ViewsPerDay.count + DevicesPerDay.count + OsPerDay.count + BrowserPerDay.count == days * (pages + browser_name * browser_version + device + os_name * os_version). That will grow slower and should produce less writes.

Yeah, I have wandered off too far down the "but I don't want to overly emphasize people who are visiting exactly once" road. That might have worked if I had only pulled in the form factor, but hindsight is always 20/20.

BrowsersPerDay(name, version)
PlatformsPerDay(name, version, device)

So just to make this explicit: You are proposing to count every user agent characteristic once per day?

I am very motivated to work on it but I would like to finish a big improvement on ActiveHashcash first. I don't know if you finally agree with my idea after explaining in details.

I do agree now. My MR could be salvageable with not too many issues, but I'm currently struck down by rather annoying health issues and am not sure when I have time to revisit that MR.

@alexisbernard
Copy link
Member

BrowsersPerDay(name, version)
PlatformsPerDay(name, version, device)

So just to make this explicit: You are proposing to count every user agent characteristic once per day?

Yes, as views are counted one per day and per page, browsers will be counted one per day, per version and per device.

I'm sorry to hear about your health issue. Take care!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants