Compute histograms for numeric attributes #3

Open
joaquinvanschoren opened this issue Jan 29, 2018 · 1 comment

@joaquinvanschoren
Contributor

Currently the website shows a box plot for numeric attributes. This does not always look good, plus it hides a lot of information.

It would be better to store a histogram of the distribution, which can be computed beforehand.
For example, something like this: https://www.mathworks.com/help/examples/matlab/win64/AdjustHistogramPropertiesExample_01.png

For categorical targets we could also compute it per class value: https://3qeqpr26caki16dnhd19sv6by6v-wpengine.netdna-ssl.com/wp-content/uploads/2014/03/histograms.png

Looking at the code, we could extend models.AttributeStatistics with a new function that returns something of the form
[[b1,b2,b3],[123],[234],[354]], where b1, b2, b3 are the bucket values and each following list holds the count for one bucket.
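
To make the proposed format concrete, here is a minimal Python sketch. It is only an illustration and not the existing models.AttributeStatistics code: the function name `numeric_histogram`, the `n_buckets` parameter, the use of equal-width buckets, and the reading of "bucket values" as the lower edge of each bucket are all assumptions.

```python
def numeric_histogram(values, n_buckets=3):
    """Return [[b1, ..., bk], [c1], ..., [ck]] for a numeric attribute.

    Assumption: b_i is the lower edge of equal-width bucket i and
    c_i is the number of values that fall into that bucket.
    """
    if not values:
        raise ValueError("values must not be empty")
    lo, hi = min(values), max(values)
    # Fall back to a width of 1.0 when all values are identical.
    width = (hi - lo) / n_buckets or 1.0
    counts = [0] * n_buckets
    for v in values:
        # The maximum value is clamped into the last bucket.
        idx = min(int((v - lo) / width), n_buckets - 1)
        counts[idx] += 1
    buckets = [lo + i * width for i in range(n_buckets)]
    return [buckets] + [[c] for c in counts]


print(numeric_histogram([0, 1, 2, 3, 4, 5, 6, 7, 8, 9]))
# [[0.0, 3.0, 6.0], [3], [3], [4]]
```

Storing edges plus counts keeps the payload small, and the website could redraw the plot without access to the raw data.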

For categorical targets, we could compute something like
[[b1,b2,b3],[123,12,23],[234,23,34],[354,34,45]] for a 3-class dataset, i.e. one count per class value in each bucket.
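
The per-class variant could be sketched along the same lines. Again this is only an illustration under assumptions: `per_class_histogram`, its parameters, the equal-width buckets, and the ordering of counts by the position of each class in `classes` are hypothetical choices, not existing code.

```python
def per_class_histogram(values, labels, classes, n_buckets=3):
    """Return [[b1, ..., bk], counts_1, ..., counts_k].

    Assumption: b_i is the lower edge of equal-width bucket i and
    counts_i lists, per class in `classes`, how many values fall in it.
    """
    if not values:
        raise ValueError("values must not be empty")
    lo, hi = min(values), max(values)
    # Fall back to a width of 1.0 when all values are identical.
    width = (hi - lo) / n_buckets or 1.0
    class_index = {c: i for i, c in enumerate(classes)}
    counts = [[0] * len(classes) for _ in range(n_buckets)]
    for v, label in zip(values, labels):
        # The maximum value is clamped into the last bucket.
        b = min(int((v - lo) / width), n_buckets - 1)
        counts[b][class_index[label]] += 1
    buckets = [lo + i * width for i in range(n_buckets)]
    return [buckets] + counts


values = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
labels = ["a", "b", "a", "b", "a", "b", "a", "b", "a", "b"]
print(per_class_histogram(values, labels, classes=["a", "b"]))
# [[0.0, 3.0, 6.0], [2, 1], [1, 2], [2, 2]]
```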

What do you think would be the best way to implement this?

@janvanrijn
Member

If you really want to take this to the next level, please consider a CDF (cumulative distribution function) rather than a histogram.

https://www.andata.at/en/software-blog-reader/why-we-love-the-cdf-and-do-not-like-histograms-that-much.html
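
For comparison, an empirical CDF needs no bucket count at all. A minimal Python sketch follows; the function name `empirical_cdf` and the list-of-pairs output are assumptions made for illustration only.

```python
def empirical_cdf(values):
    """Return sorted (x, F(x)) pairs, where F(x) is the fraction of values <= x."""
    if not values:
        raise ValueError("values must not be empty")
    ordered = sorted(values)
    n = len(ordered)
    points = []
    for i, v in enumerate(ordered, start=1):
        if points and points[-1][0] == v:
            # Repeated value: keep a single point with the highest fraction.
            points[-1] = (v, i / n)
        else:
            points.append((v, i / n))
    return points


print(empirical_cdf([3, 1, 2, 2]))
# [(1, 0.25), (2, 0.75), (3, 1.0)]
```

Unlike the histogram, the result does not depend on a choice of bucket width, although it stores one point per distinct value unless it is subsampled.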

There are two ways to implement this: (1) at the Evaluation Engine level, or (2) at the ES level.

My preference is (2), and I can add a reason when I have a bit more time, but what is your opinion, and why?
