Potential book layout #14

Open
bradleyboehmke opened this issue Oct 30, 2023 · 2 comments

@bradleyboehmke
Member

Fundamentals

  1. Introduction to ML
    • What is ML
    • Types of ML systems
    • ML in R
  2. Before the modeling process
    • Problem framing
    • Planning & scoping
    • Experimentation
    • Production
  3. The basic modeling process
    • Data splitting
    • Building models
    • Making predictions
    • Model evaluation
      • Understanding residuals
      • Aggregate residual metrics
      • Performance plots (e.g., ROC curve, lift chart)
  4. Data preprocessing
    • Target engineering
    • Missing values
    • Feature filtering
    • Numeric feature engineering
    • Categorical feature engineering
    • Data compression (PCA)
  5. A more robust modeling process
    • Bias-variance trade-off
    • Resampling
    • Hyperparameter tuning
  6. Model trust
    • Ethics
    • Interpretability vs. Explainability
    • Global explainability
    • Local explainability

Supervised Modeling

  1. Linear regression
  2. Logistic regression
    • Add section on Multinomial problems
  3. Regularized regression
  4. Transitioning to non-linearity
    • Polynomial
    • MARS
    • GAMs
  5. KNN
  6. Decision trees
  7. Bagging
  8. Random forests
  9. Gradient boosting
  10. Support vector machines
  11. Stacked models

Deep Learning

  1. Intro to DL
  2. The DL modeling process
  3. Transfer learning
  4. Computer vision
  5. Word embeddings
  6. Language models
@bradleyboehmke
Member Author

@bgreenwell, I thought a lot about our recent discussions and it made me go back and reconsider the layout. Above is a proposed new TOC layout. The middle section doesn't change a whole lot but the first section adds some new content that I think would help set the book apart.

For example, ch. 2 would cover framing and scoping ML problems, along with production concerns. This is where we can touch on the lifecycle of an ML project (e.g., drift) while noting that our book does not focus on this topic (we can point to other resources).

Also, notice that I removed the unsupervised section and added a DL section. This modernizes the book, and I already have a lot of DL notebooks built out that I can migrate, so this isn't starting from scratch.

What are your thoughts?

@bgreenwell
Member

bgreenwell commented Nov 26, 2023

Lots to discuss at our next catch-up, but here are some (very) high-level thoughts:

  • The proposed chapter 2 makes me think about the Microsoft ML checklist, which I really like. Can we try to incorporate and/or align with that? Are there others?

  • In the interest of any discussion on leakage, I think preprocessing should be introduced before data splitting in chapter 3, with a pointer to the later chapter on preprocessing methods (this ties in STRONGLY with leakage). Maybe this is where we introduce the leakage framework?

  • Unsupervised is missing?

  • I think we need a special chapter on additional topics up front? E.g., missing values, collinearity in general, interpretability, variable selection and ranking, "Responsible AI", ...

    • I say up front because I think it's too critical to leave for the end, but also hard to discuss prior to the core content. Still pondering on this.
  • I don't like the idea of deep learning being separated from the rest, but perhaps it's worthwhile because of its broader applications, like embeddings, etc.? But the same goes for random forests (e.g., isolation forests for anomaly detection) and many other methods. I can be persuaded here, but that's my initial thought.
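
To make the leakage point above concrete, here is a minimal sketch of why the order of splitting and preprocessing matters. It is illustrative only: it uses plain Python rather than the book's R tooling, the data are simulated, and every name in it (`rows`, `standardize`, and so on) is hypothetical rather than taken from the book.

```python
import random
import statistics

# Simulated feature values; purely illustrative, not data from the book.
random.seed(42)
rows = [random.gauss(50, 10) for _ in range(100)]
random.shuffle(rows)

# Split first: 80 training rows, 20 held-out test rows.
train, test = rows[:80], rows[80:]


def standardize(xs, mean, sd):
    """Center and scale values using the supplied mean and standard deviation."""
    return [(x - mean) / sd for x in xs]


# Leaky version: the scaling parameters are estimated from ALL rows,
# so information about the test set influences the preprocessing step.
leaky_mean = statistics.mean(rows)
leaky_sd = statistics.stdev(rows)

# Leak-free version: estimate the parameters from the training rows only,
# then apply those same parameters to both the training and the test rows.
train_mean = statistics.mean(train)
train_sd = statistics.stdev(train)
train_scaled = standardize(train, train_mean, train_sd)
test_scaled = standardize(test, train_mean, train_sd)

print(f"mean estimated on all rows:   {leaky_mean:.3f}")
print(f"mean estimated on train only: {train_mean:.3f}")
```

The two means differ because the first one has seen the held-out rows. That is the kind of leakage a chapter 3 introduction to preprocessing could flag before the detailed methods chapter.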
