Dev, data, deploy — summer vibes, Posit style

My 2025 summer internship experience at Posit

Published

August 20, 2025

This summer, I worked at Posit (formerly RStudio), focusing on machine learning and software development as part of the tidymodels team. I primarily worked on {filtro}, an R package for filter-based supervised feature selection.

My team at tidymodels is responsible for the creation and maintenance of the {tidymodels} 1 framework, a collection of R packages for modeling and machine learning. It is similar in scope and purpose to {scikit-learn} in Python.

Over the summer, I contributed to:

The Perks of Being a Posit Intern

Though the internship was entirely remote, I had the pleasure of meeting Jules, the other intern on the Python open-source team, as well as engineers based in Europe and across the U.S. They were all very friendly and generous with tips and guidance.

The most exciting, though initially intimidating, experience was the monthly tidy team meeting, where the tidyverse, tidymodels, and r-lib 3 teams all gathered on Zoom. It felt like seeing all the idols at Posit at once!

I also enjoyed the weekly tidymodels meetings with Max, Emil, Hannah, and Simon. The depth of the team’s many packages could be a lot to absorb at times, but it offered a glimpse into collaborative software development, the challenges of managing a large ecosystem, and effective communication in an open-source environment. Plus, Simon’s work on AI assistance through LLM integration (RAG) was always refreshing to hear about.

The Journey So Far

filtro

As mentioned earlier, I primarily worked on {filtro}, an R package for filter-based supervised feature selection. Feature selection is important because it helps models prioritize the most relevant information, reducing noise, speeding up computation, and improving performance and interpretability.

At its core, the package allows users to compute and rank feature relevance, then select the most important features based on these scores. It also supports multiparameter optimization, enabling users to optimize multiple scores at once. If you want to learn more about {filtro}, check it out here.
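To make the "score, rank, select" idea concrete, here is a minimal base-R sketch. It does not use {filtro}'s API: the helpers `score_abs_cor()` and `select_top_k()` are hypothetical names invented for illustration, and absolute Pearson correlation is just one simple stand-in for a filter score. The built-in `mtcars` data set is used so the example is self-contained.

```r
# Hypothetical helper (not part of {filtro}): score each numeric predictor
# by its absolute Pearson correlation with the outcome, highest first.
score_abs_cor <- function(data, outcome) {
  predictors <- setdiff(names(data), outcome)
  scores <- vapply(
    predictors,
    function(p) abs(cor(data[[p]], data[[outcome]])),
    numeric(1)
  )
  sort(scores, decreasing = TRUE)
}

# Hypothetical helper: return the names of the `k` highest-scoring predictors.
# Assumes `scores` is already sorted in decreasing order.
select_top_k <- function(scores, k) {
  names(scores)[seq_len(min(k, length(scores)))]
}

# Rank every predictor of `mpg` in mtcars, then keep the top three.
scores <- score_abs_cor(mtcars, outcome = "mpg")
top3 <- select_top_k(scores, k = 3)

print(round(scores, 2))
print(top3)
```

A real filter would typically offer several scoring methods and handle categorical predictors and outcomes; this sketch only covers the numeric-numeric case.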

tidymodels

{tidymodels} supports the full modeling workflow, from preprocessing and optimization to evaluation, postprocessing, and prediction. {filtro} fits into the preprocessing and optimization step. It is just one of several packages available, with others complementing different steps of the workflow.

But why {tidymodels} instead of {scikit-learn}? I’ve found myself drawn to {scikit-learn} because of Python’s vast machine learning ecosystem, and because I already use {pytorch} for deep learning tasks. At the same time, I really appreciate {tidymodels} when I’m working in R. Its syntax feels natural, the workflows are well-structured, and it reflects the kind of statistical rigor and thoughtful design that statisticians and data scientists had been shaping long before “machine learning” became popular. If you want to learn more about {tidymodels}, check it out here.

In a rush? Jump to the End.

Still reading? Stay for Behind the Scenes.

Behind the Scenes

PR #1: All the Things, All at Once

My first two weeks at Posit were packed with learning and collaboration. Max brings deep expertise in modeling, machine learning, and software design, and Emil is highly knowledgeable about software development, from Positron (a VS Code–like IDE, but better!) and R’s S3 object-oriented programming (OOP) to Posit’s developer tools, Git, and GitHub Actions CI/CD. I later realized that Hannah excels at pull requests (PRs), unit tests, and software design best practices, and I learned a lot from her as well!

One important lesson from Max is that, while it is tempting to jump straight into coding, it’s far more important to step back and think through the software design — often, the real challenge lies in the methodology, not the implementation. Meanwhile, Emil and I did some pair programming, which led to two valuable learning sessions from him:

  • “Everything Positron” (Thanks to my advisor Charlotte for already giving me a brief intro! 4)
  • “Everything S3 OOP”.

S3 OOP Brush-Up

Prior to this internship, I had little to no experience with R’s S3 OOP system. While Xiaotian’s {mtd} 5 package is quite well-organized, it makes minimal use of OOP, though in its defense I would argue that the most challenging aspect of extending {mtd} lies more in model development than in software engineering.

So here I am, pretty much glued to Advanced R by Hadley and R Packages by Hadley and Jenny, while poking around {dials}, {yardstick}, {tune}, and a few other packages through weeks 1 to 4 to level up my S3 OOP game. (I am also glued to Tidy Modeling with R and Applied Predictive Modeling, both authored by Max with co-authors, but for other reasons.)
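For readers who are as new to S3 as I was, here is a minimal base-R sketch of its three building blocks: a constructor that sets a class attribute, a generic that dispatches with `UseMethod()`, and methods named `generic.class`. The class `score_result` and the generic `top_features()` are hypothetical names made up for this example; they are not taken from {filtro} or any other tidymodels package.

```r
# Constructor for a hypothetical S3 class that stores named feature scores.
new_score_result <- function(scores, method) {
  stopifnot(is.numeric(scores), !is.null(names(scores)), is.character(method))
  structure(list(scores = scores, method = method), class = "score_result")
}

# A new generic: UseMethod() picks the method based on the class of `x`.
top_features <- function(x, ...) {
  UseMethod("top_features")
}

# Method for our class: names of the `n` highest-scoring features.
top_features.score_result <- function(x, n = 2, ...) {
  ranked <- sort(x$scores, decreasing = TRUE)
  names(ranked)[seq_len(min(n, length(ranked)))]
}

# Method for the existing base generic print().
print.score_result <- function(x, ...) {
  cat("<score_result> method:", x$method, "\n")
  print(sort(x$scores, decreasing = TRUE))
  invisible(x)
}

res <- new_score_result(c(a = 0.2, b = 0.9, c = 0.5), method = "toy")
print(res)              # dispatches to print.score_result()
top_features(res, n = 2) # dispatches to top_features.score_result()
```

Because S3 classes are just an attribute on an ordinary object, nothing stops a user from building an invalid `score_result` by hand; the constructor's checks only apply when it is actually called.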

A Turning Point

Week 5 began with a meeting between Max and Hadley to discuss the API design for the {desirability2} package, the package that powers {filtro} for multiparameter optimization. During the discussion, Hadley demonstrated some code using S7 6 and suggested incorporating it. 😱 At the time, it felt like a daunting task, and I was both thrilled and frightened — thrilled to meet him, yet nervous about living up to expectations!

To cap off an exciting week, Emil and I prepared and submitted {filtro} 7 to CRAN 8.

CRAN Horror Story to Acceptance and Dual Branches

After I resubmitted to fix minor errors, the reviewer sent a vague revision request with no details, leaving me in the dark. (I had this unrealistic view of the reviewer as some strict gatekeeper lurking in the shadows, always ready to reject.) Following advice from Max and Emil, I replied to ask for clarification, and shortly after received an apology along with the great news that {filtro} was now on CRAN. 🚀

The S3 branch was now on CRAN, while the S7 branch lived on as the development version, well ahead in features and updates. In practice, I was juggling multiple branches, and along the way I got used to merging, resolving conflicts, and navigating real-world Git and GitHub workflows, though I still find CI/CD concepts a bit abstract and challenging.

The Great S7 Migration 🛥

After the meeting with Hadley in week 5, weeks 5 to 8 were largely focused on implementing S7 for all scoring methods. Max laid the groundwork with one or two methods, and I spent a lot of time rewriting tests to match the new setup and helping with code refactoring.

These four weeks were the most challenging yet rewarding part of my internship. I had the opportunity to work closely with Max, and I learned a lot just by observing how he approached problems, structured code, and made decisions during refactoring. It was a steep learning curve, but I grew a lot, from writing cleaner, more maintainable code to thinking more critically about software design and user experience.

Write, Write, and Write…

From week 9 onward, it’s all about writing, and thankfully, not the dissertation kind. The vignette pages for all the other tidy packages are phenomenal: clear, concise, and accessible. That sets the tone and adds a bit of pressure to deliver my best.

It also feels like just a blink from developing, refactoring, and deploying the package to now writing documentation, demos, and tutorials. But at the very least, the package website for {filtro} is up and running!

Onward & Upward

As I write this, I have two weeks left in my internship, but there’s still quite a bit left to do:

For {filtro}:

  • Refining documentation and writing a tidymodels developer post.
  • Updating the CRAN version to match the development version.
  • Fixing bugs and reviewing PRs for enhancements and new features.

For {important} 9:

  • Working with Max and Emil to develop the recipe steps.

And of course:

  • Finalizing this very post about my internship. 😀

I look forward to wrapping up these projects and seeing them contribute to the broader tidymodels ecosystem.

A big thank you to everyone at Posit for an invaluable experience, and special thanks to Max, Emil, Hannah, Simon, and Hadley for their guidance, support, and inspiration throughout this journey.

Thanks for reading this too!

Footnotes

  1. tidymodels https://www.tidymodels.org.↩︎

  2. I owe Jules, the other intern, the idea to start this summary with bullet points.↩︎

  3. Some of the most useful developer tools we used daily, such as {devtools}, {usethis}, {testthat}, and {rlang}, were all developed by the r-lib team https://github.com/r-lib.↩︎

  4. I want to thank my co-advisor, Charlotte, for encouraging me to write and organize my dissertation using Positron. But it wasn’t until this internship that I unlocked its full potential for software development and Git integration, though it was also Charlotte who helped me get this internship. 🤩↩︎

  5. This is the package I build upon in my PhD dissertation. See Xiaotian’s package at https://github.com/xzheng42/mtd. My extended version will be available on GitHub in late December or later. 🤞↩︎

  6. S7: a new OO system for R https://rconsortium.github.io/S7/.↩︎

  7. We considered names like {filter}, {filters}, {filtrr}, {flitrrr}, and {filtron}, but unfortunately, most of them were already taken.↩︎

  8. Comprehensive R Archive Network https://cran.r-project.org.↩︎

  9. I thought Max did a great job coming up with this name!↩︎