Photo by Brandable Box on Unsplash

This is the second of three blog posts on package management.

Registration for our webinar on Managing Packages for Open-Source Data Science on February 17 is now open.

If you’re a data scientist, you’ve been hired to generate insights and create assets – not manage R and Python package environments. But “I spend my day managing packages,” or even worse, “I spend my day fighting IT for the packages I need,” is an all-too-common refrain.

It doesn’t have to be this way. With a little forethought and planning, your organization can adopt a package management strategy that will drastically reduce the amount of hassle data scientists have to endure managing packages.

In this blog post, we’ll explore the frustration your data scientists probably feel if your package management plan doesn’t provide both flexibility to get work done and structure to ensure reproducibility. Then we’ll dig into the first step to make it better: determining your organization’s package management requirements.

When Package Management is Pain

When package management isn’t going well, data scientists or engineers are usually the first ones to feel the sting. Here are some of the ways data scientists experience bad package management plans:

As we discussed in the first blog in this series, successful package management requires attention from both IT/Admins and data scientists, because the process spans both the shared repository and the private library.

That means that there’s no single solution to package management.

But these issues are solvable by developing a package management plan for your organization. The first step is to clearly identify how packages are managed in your environment and who’s responsible.

Discovering Package Management Requirements

Your organization’s package management requirements depend on your organization’s size and complexity. In some organizations, package management involves stakeholders from the data science, IT/Ops, security, and other teams.

Virtually all environments share a few requirements. To successfully manage open source packages for data science, your organization needs:

And depending on your organization, you might need the ability to:

It’s worth taking a minute to think about how your organization currently manages packages and whether that approach meets the requirements you face.
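
One small, concrete way to start taking stock on the Python side is to list what is actually installed in a given environment. The sketch below is only an illustration, assuming Python 3.8 or later; the helper name `installed_packages` is hypothetical and not part of any tool mentioned in this series.

```python
from importlib.metadata import distributions


def installed_packages():
    """Return a sorted list of (name, version) pairs for the current environment.

    Hypothetical helper for illustration only.
    """
    found = {}
    for dist in distributions():
        name = dist.metadata["Name"]
        if name:  # skip distributions whose metadata has no name
            found[name.lower()] = dist.version
    return sorted(found.items())


if __name__ == "__main__":
    # Print one "name==version" line per installed distribution.
    for name, version in installed_packages():
        print(f"{name}=={version}")
```

Running this in each environment your team uses gives you a quick snapshot to compare against the requirements you identify. It only reports what is installed; it says nothing about where a package came from or who approved it.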

In the (forthcoming) final blog in this series, we’ll dive into how to turn the requirements you’ve identified into your organization’s package management plan. That includes dividing responsibility for package management between IT Admins and data scientists, and using tools like renv, Python virtual environments, and public and private RStudio Package Manager to execute your plan.

Please sign up for our free webinar on February 17 to learn more about managing open source packages for R and Python.