Boruta Boruta(SHAP) Does Not Work For The Reason You Think It Does! Everything you wish you knew about Boruta, and more. Yves-Laurent Kom Samo, PhD 3 May 2022 · 8 min read
Common Pitfalls Autoencoders: What Are They, and Why You Should Never Use Them For Pre-Processing Fundamental limitations you need to be aware of before using autoencoders as a pre-processing step in predictive modeling problems on tabular data. Yves-Laurent Kom Samo, PhD 19 Apr 2022 · 14 min read
Principal Feature Selection How To Fix PCA To Make It Work For Feature Selection Introducing Principal Feature Selection (With Code) Yves-Laurent Kom Samo, PhD 8 Apr 2022 · 9 min read
Common Pitfalls 5 Reasons You Should Never Use PCA For Feature Selection Fundamental limitations you need to be aware of before using Principal Component Analysis for feature selection. Yves-Laurent Kom Samo, PhD 31 Mar 2022 · 7 min read
Feature Selection Feature Engineering With Game Theory: Beyond SHAP values Understanding the difference between feature importance, feature usefulness, and feature potential using Shapley values. Yves-Laurent Kom Samo, PhD 23 Mar 2022 · 7 min read
Common Pitfalls 5 Reasons Why You Should Never Use Recursive Feature Elimination Fundamental limitations you need to be aware of before using Recursive Feature Elimination (RFE) or any other feature selection algorithm based on feature importance. Yves-Laurent Kom Samo, PhD 15 Mar 2022 · 4 min read
Model Compression AutoML: How To Reduce Your Model Size by 95% While Improving Model Performance We show you how to reduce the number of features used by AWS' AutoGluon tabular models in Python by 95% while improving model performance. Yves-Laurent Kom Samo, PhD 8 Mar 2022 · 10 min read
Model Compression Random Forest: How To Reduce Your Production Model Size by 95% We show you how to reduce the number of features used by your Random Forest model in Python by 95%, at no performance cost. Yves-Laurent Kom Samo, PhD 8 Mar 2022 · 7 min read
Model Compression XGBoost: How To Reduce Your Production Model Size by 95% We show you how to reduce the number of features used by your XGBoost model in Python by 95%, at no performance cost. Yves-Laurent Kom Samo, PhD 7 Mar 2022 · 7 min read
Model Compression LightGBM: How To Reduce Your Production Model Size by 95% We show you how to reduce the number of features used by your LightGBM model in Python by 95%, at no performance cost. Yves-Laurent Kom Samo, PhD 6 Mar 2022 · 7 min read
Model Compression How To Seamlessly Compress Any Tabular Model in Python Train your favorite predictive models in Python (e.g. LightGBM, XGBoost, Scikit-Learn, TensorFlow, and PyTorch models) using at least 80% fewer features, at no performance cost. Yves-Laurent Kom Samo, PhD 4 Feb 2022 · 18 min read
Automating Feature Engineering Effective Feature Selection: Beyond Shapley Values, Recursive Feature Elimination (RFE) and Boruta We explain why feature selection matters, why RFE, Boruta, and SHAP values aren't good enough, and we propose an information-theoretical alternative. Yves-Laurent Kom Samo, PhD 21 Jan 2022 · 12 min read
Automating Feature Engineering Feature Engineering: Why It Matters, and How To Do It Right We explain why every great model needs great features, and we arm you with tools and principles that will help you create better features. Yves-Laurent Kom Samo, PhD 16 Dec 2021 · 6 min read