tidy.outliers

tidymodels
code
package
Author

Bruno Testaguzza Carlin

Published

January 31, 2023

What is tidy.outliers?

tidy.outliers is a pet project of mine born out of a project when I saw a coworker manually implement an outlier detection algorithm and wondered if tidy.models had an easy way to do it.

I knew that scikit had up to 5 established ways of detecting outliers and even a great page talking about outlier removal

This situation left me wondering why no one had written something similar for the incredible tidymodels ecosystem. So I decided to do it myself.

What v 0.2.0 adds?

Univariate methods

I have added a method for users to pass a function to outlier steps, so now if you can have your custom univariate based rules to score outliers, using

step_outliers_univariate

h2o integration

With the new step step_outliers_h2o.extendedIsolationForest, you can read more about the function on Extended Isolation Forest .

outForest method

The new step_outliers_outForest uses the main function from the package, you can read more about it here outForest.

Improvements to CI/CD

The package reached 100% test coverage, and it currently passes four out of five setups for a possible cran release. I even posted my first covering it.

My first video

I also upgraded the package to the v2 action framework of the rstudio team, and you can read more about it actions