Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ML notebook: Add interpretation section #47

Open
panchiwalashivani opened this issue Oct 5, 2020 · 0 comments
Open

ML notebook: Add interpretation section #47

panchiwalashivani opened this issue Oct 5, 2020 · 0 comments

Comments

@panchiwalashivani
Copy link

What features are being used to make the classification?

_Type of Classification Description Example_
Categorical (Nominal) Classification of entities into particular categories. That thing is a dog.That thing is a car.
Ordinal Classification of entities in some kind of ordered relationship. You are stronger than him.It is hotter today than yesterday.
Adjectival or Predicative Classification based on some quality of an entity. That car is fast.She is smart.
Cardinal Classification based on a numerical value. He is six feet tall.It is 25.3 degrees today.

Categorical classification is also called nominal classification because it classifies an entity in terms of the name of the class it belongs to. This is the type of classification we focus on in this document.

Why are those features important?

Let’s imagine that you’ve landed a consulting gig with a bank who have asked you to identify those who have a high likelihood of default on the next month’s bill. Armed with the machine learning techniques that you’ve learnt and practiced, let’s say you proceed to analyze the data set given by your client and have used a random forest algorithm that achieves a reasonably high accuracy. Your next task is to present to the business stakeholders from the client’s team how you achieved these results. What would you say to them? Will they be able to understand all the hyperparameters of the algorithm that you tweaked in order to land on your final model? How will they react when you start talking about the number of estimators and Gini criterion of the random forest?
Although it is important to be proficient in understanding the inner workings of the algorithm, it is far more essential to be able to communicate the findings to an audience who may not have any theoretical / practical knowledge of machine learning. Just showing that the algorithm predicts well is not enough. You have to attribute the predictions to the elements of the input data that contribute to your accuracy. Thankfully, the random forest implementation of sklearn does give an output called “feature importances” which helps us explain the predictive power of the features in the dataset. But, there are certain drawbacks to this method that we will explore in this post, and an alternative technique to assess the feature importances that overcomes these drawbacks.

What does that say about the problem domain?

A problem domain is the area of expertise or application that needs to be examined to solve a problem. A problem domain is simply looking at only the topics of an individual's interest, and excluding everything else. For example, when developing a system to measure good practice in medicine, carpet drawings at hospitals would not be included in the problem domain. In this example, the domain refers to relevant topics solely within the delimited area of interest: medicine. This points to a limitation of an overly specific, or overly bounded, problem domain. An individual may think they are interested in medicine and not interior design, but a better solution exists outside of the problem domain as it was initially conceived. For example, when IDEO researchers noticed that patients in hospitals spent a huge amount of time staring at acoustic ceiling tiles, which "became a symbol of the overall ambiance: a mix of boredom and anxiety from feeling lost, uninformed, and out of control."

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant