source("https://git.io/de-iris-my-repos")
iris_issues <- de_iris_my_repos()
It’s time for iris to go. Use de_iris_my_repos() to help find references to iris in your public GitHub code so you can replace it with something better.
It only takes two lines to get started. First, check the source code on https://git.io/de-iris-my-repos. Then run the two lines above in your R console.
Last week, motivated by the Black Lives Matter movement and protests around the United States, Daniela Witten wrote a long and insightful Twitter thread about the origins of an often-used and completely boring dataset: iris.
I’ve long known about Ronald Fisher’s eugenicist past, but I admit that I have often thoughtlessly turned to iris when I needed a small, boring data set to demonstrate a coding or data principle.
But Daniela and Timothée Poisot are right: it’s time to retire iris.
Like many people, I have spent the last 10 days watching so much tragedy unfold. So much anguish from Black colleagues here on twitter.
And so I’ve been trying to think of ways that I can improve my tiny corner of the world.
A thread on why change is hard in academia 1/
— Dr. Daniela Witten (@daniela_witten) June 4, 2020
Other Options
I read Daniela’s thread and Timothée’s blog post and immediately realized that I needed to be more thoughtful in my choice of datasets. There is absolutely no need for iris in my examples; there are plenty of other options available.
I’m particularly excited about a new penguins dataset announced on Twitter by the amazing Allison Horst.
The Iris dataset feels really gross now.
— Chris Albon (@chrisalbon) June 4, 2020
Here’s a short list of other data sets you can turn to instead:
Anything else in data().
ggplot2::mpg
ggplot2::diamonds
dplyr::starwars
nycflights13
fivethirtyeight
Any of the many #TidyTuesday datasets
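As a small illustration of the swap, here is a sketch that lists the datasets bundled with base R and then reruns a typical iris-style demo (a grouped mean) on mtcars instead. The choice of mtcars and the grouped-mean example are my own assumptions for illustration; any dataset from the list above would do just as well.

```r
# List the names of the datasets bundled with base R's datasets package
builtin <- data(package = "datasets")$results[, "Item"]
head(builtin)

# Before: aggregate(Sepal.Length ~ Species, data = iris, FUN = mean)
# After: the same grouped-mean demo using mtcars (chosen only as an example)
mpg_by_cyl <- aggregate(mpg ~ cyl, data = mtcars, FUN = mean)
mpg_by_cyl
```

The shape of the example is unchanged: one numeric column summarised by one grouping column, so most iris demos translate with a change of names.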
De-Iris Your Repos
To help us move on into an iris-free world, I’ve created a small command-line utility: de_iris_my_repos().
The code is available on GitHub at gadenbuie/de-iris-my-repos, and it only takes two lines in your console to find any references to iris in your repositories and open an issue in each repo reminding you to kick iris out.
de_iris_my_repos() won’t do anything without your explicit consent, but you should still probably check the R script before you source it.
source("https://git.io/de-iris-my-repos")
iris_issues <- de_iris_my_repos()
When you run de_iris_my_repos(), it searches your public code for mentions of iris and asks you if you want to open an issue in each repo. If you do, it opens an issue using the template in the screenshot below so that you can remember to remove iris.
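If you also want to check code that only lives on your own machine, a similar search can be sketched locally. To be clear, this is not how de_iris_my_repos() works (it searches your public GitHub code); find_iris_mentions() is a hypothetical helper written for this illustration, and its extensions argument only loosely mirrors the idea of limiting the search to certain file types.

```r
# Hypothetical helper (not part of de-iris-my-repos): scan local files
# under `dir` for whole-word mentions of iris.
find_iris_mentions <- function(dir = ".", extensions = c("R", "Rmd")) {
  # Build a pattern such as "\\.(R|Rmd)$" to select files by extension
  ext_pattern <- paste0("\\.(", paste(extensions, collapse = "|"), ")$")
  files <- list.files(dir, pattern = ext_pattern, recursive = TRUE,
                      full.names = TRUE, ignore.case = TRUE)

  hits <- lapply(files, function(path) {
    lines <- readLines(path, warn = FALSE)
    # Match iris as a whole word so that "irises" is not reported
    idx <- grep("\\biris\\b", lines, perl = TRUE)
    if (length(idx) == 0) return(NULL)
    data.frame(file = path, line = idx, text = trimws(lines[idx]),
               stringsAsFactors = FALSE)
  })

  # Drop files without matches, then stack the rest into one data frame
  hits <- Filter(Negate(is.null), hits)
  if (length(hits) == 0) {
    return(data.frame(file = character(), line = integer(),
                      text = character(), stringsAsFactors = FALSE))
  }
  do.call(rbind, hits)
}
```

Calling find_iris_mentions("path/to/project") returns one row per matching line, which makes a handy checklist while you swap in another dataset.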
Options
A few options are available in de_iris_my_repos():
Choose which GitHub user name to review, by default the user associated with the GitHub PAT used by gh.
Set dry_run = TRUE to return results without doing anything.
Set ask = FALSE to go ahead and open issues in all repositories.
Use extensions to provide a list of file types where iris might be found.
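A cautious first pass might combine these options as in the sketch below. It assumes the argument names are exactly user and dry_run as listed above, that the script has already been sourced in the current session, and that a GitHub PAT is available to gh. The user name is a placeholder, and I have left out extensions because the expected format of its value is not described here.

```r
# Arguments for a first pass that reports matches but opens no issues.
# "your-github-username" is a placeholder, not a real account.
first_pass <- list(user = "your-github-username", dry_run = TRUE)

# Only call the function if the script has been sourced in this session
if (exists("de_iris_my_repos", mode = "function")) {
  iris_hits <- do.call(de_iris_my_repos, first_pass)
}
```

Once the dry run looks right, rerunning without dry_run = TRUE lets the function ask before opening each issue.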