“We all deserve better than brittle, error-prone Excel dashboards on somebody’s laptop.” – Comment from a public health colleague
Public health professionals often find themselves trapped in complex Excel workflows. We maintain beautifully formatted spreadsheets with intricate formulas that look impressive but become maintenance nightmares. The truth is, these Excel workbooks – often unsecured on a single laptop – can’t scale to our growing data needs. In an era where accuracy and timeliness trump visual polish, there’s a compelling need to move beyond Excel and embrace more robust, automated tools. But making that leap can be intimidating, especially if you don’t consider yourself a “coder.” I understand this challenge firsthand – and I want to share how you can incrementally build your analytics capacity by learning R, even under tight budgets and limited time.
Several years ago, I was working as a public health informaticist responsible for monitoring data collection at 30 hospitals across three provinces. Every day I manually downloaded data and ran checks in Excel. The process took hours and was barely keeping up with the project’s demands. I foresaw limitations in Excel for tasks like connecting to specialized databases, merging multiple datasets, and performing complex cleaning. The wake-up call came when an engineering friend helped me reproduce one of my elaborate Excel reports using Python in a matter of seconds. That eye-opener made me realize: if I learned to script, I could save enormous time and reduce errors.
I decided to learn programming myself. After exploring options, I chose R – inspired by a colleague’s R workflows and a mentor’s advice to take the Johns Hopkins Data Science Specialization on Coursera. Getting started was hard. I remember feeling overwhelmed during the first courses, but I also knew I had no choice if I wanted to keep up with the data. My Excel-based approach was not sustainable for the project’s scale and complexity. So I started small. I continued with my Excel routine, but each day I picked one repetitive task and tried to replace it with an R script. In the beginning, Excel was still in the mix, but gradually the heavy lifting shifted to R. Over a month or two, I scripted most of the key steps that I used to do by hand. For example, instead of manually cleaning and merging CSV files, I wrote an R script to import all files and perform the cleaning in one go. Instead of copy-pasting charts, I created them with ggplot2. Bit by bit, my workflow transformed – tasks that once took an afternoon in Excel ran in minutes or seconds in R. Equally important, the R code made my work reproducible and easy to update. If new data came in, I could re-run the script and get updated results without rebuilding spreadsheets from scratch.
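To make that concrete, here is a minimal sketch of the kind of script I mean: read every daily CSV export, stack the files, and apply the same cleaning in one pass (the folder path and column names are hypothetical):

```r
# A minimal sketch, not my original script: read every daily CSV export,
# stack them, and apply the same cleaning in one pass.
# The folder path and column names below are hypothetical.
library(readr)
library(dplyr)
library(purrr)

files <- list.files("data/daily_exports", pattern = "\\.csv$", full.names = TRUE)

daily_data <- files |>
  map(read_csv) |>   # read each file into a data frame
  bind_rows() |>     # stack them into one table
  mutate(
    collection_date = as.Date(collection_date),       # standardize dates
    facility_name   = trimws(toupper(facility_name))  # tidy facility names
  ) |>
  distinct()         # drop accidental duplicate rows
```

When new files arrive, re-running this script rebuilds the cleaned dataset from scratch, which is exactly the reproducibility that copy-paste workflows can't offer.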
Eventually, I had scripted and automated my entire reporting workflow. As soon as I switched on my computer, I went to make my coffee while scripts automatically downloaded the latest data, performed cleaning, and generated reports. These reports gave me immediate situational awareness of what happened the day before at all the data collection sites, helping me know exactly where to provide support. This automation freed up significant time, allowing me to write additional scripts to monitor and check the data for new issues that might arise. The transformation was remarkable – from spending hours manually processing data to having a fully automated system that delivered insights before I even finished my morning coffee.
This incremental approach was key. It’s daunting to imagine rewriting everything in R, but you don’t have to do it overnight. Focus on one small win at a time. As I automated more tasks, I freed up time to learn and improve my coding. Before long, I was spending more time thinking about analysis and less time fighting with Excel. And I still used Excel for certain things (e.g. quick ad-hoc looks at data or final formatting of a report), but it was no longer the backbone of my process. R became that backbone.
You might ask, why choose R (and not Python or another tool)? The choice can depend on your context, but R has a strong community in public health and a rich ecosystem of packages for epidemiology and biostatistics. More importantly, learning any programming tool is an investment in your own capacity. It enables automation, reproducibility, and scalability that Excel can’t match. In public health analytics, those advantages translate to faster insights and fewer errors – which ultimately means better decisions and outcomes. Upskilling staff in data programming is something even resource-strapped health departments are beginning to prioritize, because it pays off in efficiency and impact. (In my last article, I noted how doing more with less often requires new skills and partnerships. This is one way to cultivate those skills.)
An additional benefit: skills you learn in one programming language transfer to others – making it easier to become proficient in multiple languages. The core concepts of data manipulation, visualization, and automation remain consistent across languages. Once you’ve learned R, picking up Python or SQL becomes much more intuitive. Even better, if you build a repository of code in R to perform specific tasks, modern AI tools can translate that code to other languages like Python or SQL as needed. This flexibility gives you options as your career and needs evolve, making the initial investment in learning R even more valuable.
In a previous article, I emphasized a principle that seems to resonate: “Prioritize automation over aesthetics in your workflows… focus on script-driven reports that might look plainer but update automatically and reliably… In public health analytics, accuracy and timeliness trump visual polish every time.” I was pleased to see a LinkedIn connection quote and reshare this sentiment, as it captures a fundamental truth about modern public health data work.
Yes, an R script might initially produce a report that looks simpler than a handcrafted Excel dashboard. But the time saved and errors avoided through automation far outweigh decorative flair. And with the right tools (R Markdown, Quarto, styling packages), you can eventually make R outputs just as polished as you want. The bottom line: by learning to code, you’re building a capability that enables you to do better work, faster.
This automation-first mindset reflects a maturing perspective on public health analytics. The field is increasingly recognizing that reproducible, script-driven workflows provide more sustainable value than manually crafted dashboards, no matter how visually impressive those one-offs might be.
For public health departments, encouraging analysts to develop these skills is strategic. It reduces dependence on fragile spreadsheets and distributes knowledge in a reproducible form (code can be versioned and shared, whereas that Excel file often lives and dies with its creator). Whether you’re an early-career professional trying to move beyond Excel, a mid-level epidemiologist aware of R/Python but unsure how to begin, or a health department leader considering staff training – investing in data programming skills is a smart move.
So, how can you start building programming capacity in R? Here are some recommended resources that I have found invaluable. These cover the foundational skills you’ll need for 80% of public health data tasks. I’ve also included tools and packages that I use in most analyses. (Many of these are free or open-source, aligning with the “do more with less” budget mindset.)
The Johns Hopkins Data Science Specialization (Coursera): an excellent introduction to data science with R. The first five courses cover setting up R and RStudio, R programming basics, getting and cleaning data, exploratory data analysis, and reproducible research. It walks you through the entire data pipeline from raw data to published results, emphasizing good practices like tidy data and documentation. (This is the path I took – it builds a solid foundation for public health data work.)
A four-course specialization focused on biostatistics in a public health context. It starts from basic statistics and goes through linear regression, logistic regression, and survival analysis, all taught in R. You’ll learn to apply statistical thinking to real public health datasets (e.g. risk factors for diabetes, outcomes after hospitalization). This is great if you want to learn R and epidemiological stats side by side. No prior R or stats knowledge is assumed, making it beginner-friendly.
The Epidemiologist R Handbook: a free, open-access reference manual for applied epidemiology in R. It’s essentially a massive cookbook of code examples for common tasks in outbreak analysis, surveillance, data cleaning, visualization, and more. Written by and for field epidemiologists, it has task-centered chapters (e.g. how to clean linelists, analyze survey data, plot epi curves). The handbook has been used over 3 million times by 850,000 people worldwide – a testament to its usefulness. Keep this bookmarked; when you’re wondering “How do I do X in R?”, you can likely find a worked example here.
The Posit Community forum: an online forum where R users (from novices to experts) help each other. This is one of the friendliest communities for getting help with R problems. If you run into a roadblock, you can search the archives or ask a question. Chances are someone has solved a similar issue or will guide you. (I frequented this forum often before the days of AI coding assistants, and it’s still a great place for human-to-human advice.)
The tidyverse is a collection of R packages designed for data science that share a common philosophy and data structures. For everyday work, two core packages stand out: dplyr and ggplot2. dplyr is a grammar of data manipulation – it provides simple verbs like filter, select, mutate to wrangle data frames efficiently. ggplot2 is a powerful system for creating graphics based on the Grammar of Graphics, allowing you to make publication-quality charts by mapping data to visual elements. Mastering these two will cover the majority of data cleaning, transformation, and visualization tasks in public health analysis. (Other handy tidyverse packages include readr for reading data, tidyr for reshaping data, etc.)
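As a small illustration of how the two fit together, here is a sketch that counts cases per week and plots an epi-style bar chart (the linelist data frame and its columns are hypothetical):

```r
# Illustrative only: `linelist`, `onset_date`, and `district` are hypothetical.
library(dplyr)
library(ggplot2)

weekly_cases <- linelist |>
  filter(!is.na(onset_date)) |>                       # drop records without an onset date
  mutate(week = as.Date(cut(onset_date, "week"))) |>  # bin onset dates into weeks
  count(week, district, name = "cases")               # cases per week and district

ggplot(weekly_cases, aes(x = week, y = cases, fill = district)) +
  geom_col() +
  labs(title = "Weekly reported cases by district",
       x = "Week of symptom onset", y = "Cases")
```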
Quarto is a modern, open-source publishing system for data analysis – essentially the next generation of R Markdown. It lets you combine text, code, and results in one reproducible document. You can generate HTML reports, PDF papers, presentation slides, even interactive dashboards from the same script. This means your analyses and reports are transparent and repeatable. Quarto is invaluable for creating automated reports (weekly surveillance updates, grant report outputs, etc.) that look professional and can be regenerated with new data with a single click.
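A bare-bones Quarto document looks something like the sketch below; the file name, data path, and column names are placeholders:

````
---
title: "Weekly surveillance update"
format: html
execute:
  echo: false
---

```{r}
library(ggplot2)
# Hypothetical extract; file and column names are placeholders
cases <- readr::read_csv("data/latest_extract.csv")
```

Cases reported this week: `r nrow(cases)`

```{r}
ggplot(cases, aes(x = report_date)) +
  geom_bar() +
  labs(title = "Reports received by day")
```
````

Rendering it with `quarto render` on the command line (or `quarto::quarto_render()` from R) regenerates the entire report against whatever data is in the extract at that moment.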
openxlsx: Many of us still need to interface with Excel in our jobs – for example, to deliver results to colleagues or import legacy data. This package allows R to read, write, and edit Excel .xlsx files directly, without needing Excel or any Java dependencies. It simplifies creating formatted spreadsheets from R. For instance, you can script the export of cleaned data or summary tables to an Excel file, rather than copying and pasting. This helps integrate your R workflow with existing Excel-based processes during the transition period.
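As a sketch, assuming you already have a cleaned data frame called summary_table, exporting it with basic formatting looks like this:

```r
# Minimal sketch: `summary_table` is a hypothetical cleaned data frame.
library(openxlsx)

wb <- createWorkbook()
addWorksheet(wb, "Summary")
writeData(wb, sheet = "Summary", x = summary_table, withFilter = TRUE)  # add filter row
setColWidths(wb, sheet = "Summary", cols = 1:ncol(summary_table), widths = "auto")
saveWorkbook(wb, "weekly_summary.xlsx", overwrite = TRUE)
```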
gtsummary: an extremely useful package for creating publication-ready summary tables (common in epidemiology and clinical research). With gtsummary, you can turn a regression model or a data frame into a neat table of results or descriptive statistics with minimal code. It comes with sensible defaults (for example, it knows how to format continuous vs categorical variables, p-values, etc.) and is highly customizable. If you need to make a “Table 1” of patient demographics or output regression results for a report, gtsummary can do it in one line and output to Word, PDF, or HTML. No more manually assembling tables in Excel!
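Here is a sketch of what that looks like, using the trial example dataset that ships with gtsummary (swap in your own data frame and variables):

```r
library(gtsummary)

# Descriptive "Table 1" by treatment group
trial |>
  tbl_summary(by = trt, include = c(age, grade, response)) |>
  add_p()

# Regression results as a formatted table (odds ratios via exponentiate = TRUE)
glm(response ~ age + grade, data = trial, family = binomial) |>
  tbl_regression(exponentiate = TRUE)
```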
esquisse: a GUI add-in for RStudio that provides a drag-and-drop interface to create ggplot2 charts. This is fantastic for beginners who want to explore their data visually without writing code from scratch. You can select your dataset, drag variables to the X or Y axis, choose a geometry (bar, line, boxplot, etc.), and tweak aesthetics – and esquisse will show you the plot and generate the corresponding ggplot2 code. It’s built on Shiny, so it’s interactive. Using esquisse, you can quickly prototype graphs and copy the code into your script for further refinement. It’s like having a Tableau-like tool within R, which eases the learning curve for visualization.
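If you want to try it, the whole thing launches with one function call (iris is just a built-in practice dataset):

```r
# install.packages("esquisse")   # one-time install
library(esquisse)
esquisser(iris)  # opens the drag-and-drop builder preloaded with the iris data
# (also available from the RStudio Addins menu)
```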
In 2025, we have an ace up our sleeve that I wish I had had when I started learning R: AI assistants (e.g. Claude, ChatGPT, GitHub Copilot). Tools like ChatGPT can generate R code snippets, help troubleshoot error messages, or suggest how to approach a task in code. They’re particularly helpful for automating the tedious parts of coding or getting unstuck when the solution is on the tip of your tongue. For example, you can ask “How do I merge two data frames by county name in R?” and get a quick answer. These AI tools can accelerate your scripting once you know the basics (and they’re great for folks who “don’t code much”). Caution: AI is not foolproof – you should still understand and verify the code it gives you – but it can be like a coach/tutor available 24/7. Leverage it to augment your learning and save time on routine coding tasks.
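For that merge question, the kind of answer an assistant might sketch looks like this (cases and population are hypothetical data frames that share a county column):

```r
# Hypothetical example: `cases` and `population` both contain a `county` column.
library(dplyr)

merged <- left_join(cases, population, by = "county")  # keep all rows from `cases`

# Base R equivalent
merged <- merge(cases, population, by = "county", all.x = TRUE)
```

Either way, verify the result yourself (row counts, unmatched counties) before trusting it in a report.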
(Above are just a few of the resources I recommend. There are many others – for instance, other Coursera courses, YouTube tutorials, books like “R for Data Science”, etc. – but the list above covers the tools I use 80% of the time.)
Learning R (or any programming) as a public health professional is a journey. Here are some practical tips to make the transition smoother:
Start small and build gradually: Don’t try to rewrite your entire workflow all at once. Pick one repetitive task and automate it with R. For example, import a CSV and do a simple cleaning or analysis that you’d normally do in Excel. Run it side by side with your usual process to verify results. Once that works, tackle another task. Each small victory will build your confidence and capability.
Combine R with your existing tools initially: It’s okay if you still use Excel or other tools while learning R. You might generate an intermediate CSV with R and then use Excel for final formatting, for instance. Over time, as you get comfortable, you can shift more of the process into R. I found that as I scripted more, Excel naturally took a back seat. But during the learning phase, a hybrid approach is perfectly fine.
Focus on automation, not aesthetics (at first): As noted earlier, an automated report that runs in seconds is worth more than a pretty spreadsheet that consumes hours. When coding, prioritize getting the correct results and making the process reproducible. You can always improve the formatting later. R Markdown/Quarto with a bit of styling can produce very nice-looking outputs once the logic is nailed down. Remember, the primary goal is to save time and reduce errors; polish can come next.
Use version control for your scripts: If you have IT support or even just a GitHub account, consider using Git for your R scripts. It helps you track changes and revert if something breaks. This is a good practice as you start developing more code. It also facilitates collaboration if you’re working with others (or handing off scripts to colleagues in the future).
Leverage the community and ask for help: Don’t struggle alone for days on a coding problem. The R community is incredibly supportive. Sites like the Posit Community forum or Stack Overflow can be lifesavers. If you describe your problem and show what you’ve tried, people often respond with guidance or even example code. This not only fixes your issue but teaches you something new for next time.
Take advantage of AI assistants (wisely): As mentioned, tools like ChatGPT can boost your productivity. If you’re stuck on syntax or need to write a loop/apply statement and can’t remember how, try asking the AI. It can also explain error messages or suggest packages for a task. Use these assistants to supplement your learning – but always double-check their output. Think of AI as an eager intern: it works fast, but you are the expert who needs to review and ensure quality.
Practice reproducible research principles: As you code, adopt habits that make your work reproducible. This includes documenting your code (comments, meaningful object names), organizing your files, and setting seeds for randomness when needed (see the short sketch after these tips). It might feel tedious at first, but it pays off. Reproducibility isn’t just a buzzword – in public health, it means that if someone else (or you in 6 months) needs to update the analysis, they can do so confidently. The Coursera course on Reproducible Research and tools like R Markdown will instill these practices.
Celebrate and apply your new skills: Finally, apply your R skills to real work projects as soon as you can. There’s no better way to solidify knowledge than by using it. Each time you deliver a result faster or catch an issue because of a script, acknowledge that win. It will motivate you to keep going. Over time, these wins accumulate – maybe you prevented an error in an official report, or you were able to analyze a new dataset that was previously too cumbersome to handle. Make sure your team or supervisor knows how you achieved it. This builds the case within your organization that coding skills are worth cultivating.
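On the reproducibility tip above, a couple of the habits are easier to show than describe; here is a short sketch (site_ids is a hypothetical vector of site identifiers):

```r
# Small reproducibility habits, sketched:
set.seed(2024)                      # fix randomness so sampled results can be re-created
audit_sites <- sample(site_ids, 5)  # e.g. pick 5 sites for a data-quality audit

sessionInfo()                       # record R and package versions alongside your results
```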
Transitioning from Excel to R (or any programming language) is a significant change. It takes time, patience, and practice. I spent many late evenings early on, wrestling with code that “just wouldn’t work.” But I promise you, it gets easier. Each new function you learn, each bug you debug, makes the next one easier. And the payoff is huge: you’ll be able to do things that were practically impossible in Excel, whether it’s handling larger data, automating complex analyses, or integrating data from multiple sources seamlessly.
For early-career public health professionals, learning R can set you apart and accelerate your impact. You’ll be the one who can quickly turn raw data into insights without manual drudgery. For mid-level analysts or epidemiologists who have been meaning to “get into R or Python,” it’s not too late – start with the basics and you’ll find that your domain expertise gives you an edge in applying coding to real problems. And for health department leaders, consider creating space for your staff to upskill (through trainings, dedicated project time for learning, or hiring individuals with these skills). In lean budget times, having internal capacity to automate and innovate can save salaries’ worth of productivity and prevent costly mistakes.
We may indeed be witnessing “the tail end of the era of brittle Excel dashboards” in public health analytics. The future points toward code, reproducibility, and open collaboration – a future where our analyses are more transparent and robust. Embracing R is one practical and achievable step toward that future. It has certainly empowered me to do more with the data I have, in less time, and with greater confidence in the results.
Your turn: Have you started using R or Python in place of Excel? What challenges or successes have you experienced in the transition? I’d love to hear your stories or tips. Feel free to share in the comments or connect with me on LinkedIn to continue the conversation. Learning from peers is a big part of how we grow. Together, by incrementally building our skills, we can modernize public health analytics – even on a shoestring – and deliver insights that ultimately save lives and improve community health. Good luck on your coding journey!
– André van Zyl, MPH, Founder & Principal Consultant at Intersect Collaborations LLC (former CDC Health Scientist and passionate advocate for data-driven public health in low-resource settings)