Lars Vilhuber
Laurel Krovetz
September 1, 2025
We introduce the various modules, talk about best practices, describe what we mean by replication package, and illustrate practically using a toy example.
We discuss how to set yourself up for reproducibility from the very first day of your project. Anything not covered on August 26 will be covered at the beginning of the session on September 8.
How do you check your project for reproducibility before you submit it? We discuss various strategies.
What if you have data you are not allowed to publish? How do you work with it, and how do you share as much of your work as possible? We discuss general and specific methods and strategies.
Once you’ve covered all the bases, how do you document it all? We discuss how to use the template README to document your project. Much of this will recall elements from previous sessions.
This is a recap of the Day N-1 reproducibility module, but with a focus on how to run code reproducibly. We discuss how to create log files, use environments, and the importance of data cleaning.
AI is infusing much of what we do. We discuss how you can produce research that uses AI and APIs while remaining reproducible.
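One common technique for making API-dependent research reproducible is to cache every response to disk on first use, and then ship the cache with the replication package so later runs never depend on the live (possibly changed) service. The sketch below is a generic illustration; the function names and cache layout are hypothetical, not a specific tool.

```python
import hashlib
import json
from pathlib import Path

# Hypothetical cache location; archive this directory with the package.
CACHE_DIR = Path("api_cache")

def cached_query(query: str, fetch) -> dict:
    """Return a cached API response if one exists; otherwise call
    fetch(query) once and store the result on disk."""
    CACHE_DIR.mkdir(exist_ok=True)
    # Hash the query so it can serve as a safe file name.
    key = hashlib.sha256(query.encode()).hexdigest()
    cache_file = CACHE_DIR / f"{key}.json"
    if cache_file.exists():
        return json.loads(cache_file.read_text())
    result = fetch(query)  # the only place the live API is contacted
    cache_file.write_text(json.dumps(result))
    return result
```

On the first run, `fetch` hits the API; on every later run, including a replicator's, only the archived cache is read.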
If you create data, or have data that needs to be preserved, what can you do? We start by discussing the distinction between sharing and preserving data, and then look at options that you can embed within your research workflow.
Several methods exist for combining the processing of data with the writing up of results more efficiently. Word is not one of them. We discuss the pros and cons of various methods.
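One simple version of this idea, illustrated below under assumed file and macro names, is to have the analysis code write every number cited in the paper to a small file of LaTeX macros that the manuscript then `\input{}`s, so the text can never drift out of sync with the computation.

```python
# Stand-in for a statistic computed earlier in the analysis.
mean_income = 52_341.7

# Write the value as a LaTeX macro; the manuscript would contain
# \input{values.tex} and then use \meanincome in the text.
with open("values.tex", "w") as f:
    f.write(f"\\newcommand{{\\meanincome}}{{{mean_income:,.0f}}}\n")
```

Fuller-featured versions of the same idea are literate-programming tools such as R Markdown, Quarto, and Jupyter notebooks, which interleave code and text in a single source document.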
You run your code on your laptop, and it crashes. Use high-performance computing! We discuss when to use it, what the benefits and costs are, and where to use it.
How do you create transparency and credibility when you cannot share the data you use? We discuss strategies for data collection, data use, and data sharing. This is relevant if you collect your own data, or if you use confidential data.