Computational Tools for Social Scientists Workshop

Lars Vilhuber, Ivan Rudik, and some others
2021/8/18-2021/8/20

Location: In-person, Ives 109 (Aug 18, 19) and Ives 115 (Aug 20)

Time: 9:00 a.m. - 4:00 p.m. (some days will end earlier)


Goal

The goal of this workshop is to make early-career researchers in the social sciences (e.g., Ph.D. students and early-career faculty) aware of computational tools and toolkits that will allow them to conduct their research more efficiently and reproducibly. We will showcase computer-oriented techniques and tools, from basic command-line tools on Windows, Linux, and Mac, to version control, to optimization and parallelization techniques. These methods are not just useful for high-performance computing; by their nature, they also make research reproducible. If you touch a computer as part of your research, you should come!

More goals and non-goals

This workshop is designed to open your eyes to the possibilities: it scratches the surface of many topics, but mostly does not dive deeply into any one of them. The goal is NOT to teach a full course on SAS, Stata, Matlab, R, Python, MPI, Fortran, etc. - there are other classes for that. We will teach just enough of each programming language to be able to highlight additional techniques. There will be hands-on training on a few systems (TBD).

You and the workshop

We highlight that this is a workshop - we will work on problems as a group, drawing on expertise in the “room” as needed. If you have a specific question and want to work on it, we may do so. If you want to primarily listen, that's fine too.

Some parts of the workshop will be asynchronous (pre-recorded), but a significant portion will be live. You should expect to do some exercises each day.

Target group

Second-year Ph.D. students and above, and faculty, in Economics or other social sciences. If you haven't taken this workshop in the past, or want a refresher, you should participate.

Requirements

  • Working knowledge of at least one statistical programming language (R, SAS, Stata, Matlab, Gauss) - the specific language is not important.
  • An interest in computational social science

Setup

  • Register
  • (optional) Request an account on the Econ Cluster via the BioHPC account request page (enter “Econ Grad Student” in the Comment field)
  • Various software tools will be installed during the class as part of a standard toolkit.

Tentative Agenda - Prior to Day 1

Tentative Agenda - Day 1 - morning

  • 9:00 - 10:00 Intro and Motivation: why we do this, why we need this (it's not just high-performance computing)
  • 10:00 - 10:30 Text editors and software-agnostic development environments (+ hands-on)
    • Visual Studio Code
    • RStudio
    • Jupyter
  • BREAK


  • 11:00 - 12:00 Command line or “shell” or “bash” (+ hands-on)
    • Framework: Carpentries' “The Unix Shell”
    • We'll breeze through Parts 1-3
    • We'll emphasize pipes (used in other programming languages, too), loops (the most basic form of high-performance computing!), and scripts (they make it all reproducible, and we'll need them on Day 2).
    • You should explore “Finding things” on your own (very useful!)
  • LUNCH (on your own)
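In bash, a pipe such as `cat file.txt | grep income | wc -l` chains small single-purpose commands, and `for f in *.txt; do ...; done` repeats the same command over many files. Since pipes show up in other languages too, here is a minimal Python sketch of the same two ideas. The folder, file names, and contents are made up for illustration; each generator function plays the role of one command in the pipe, and the `for` loop over files plays the role of the shell loop. Note that this `grep` stand-in matches a plain substring, not a regular expression.

```python
import tempfile
from pathlib import Path

def read_lines(path):
    """Stage 1 (like `cat`): yield each line of a file without its newline."""
    with open(path) as fh:
        for line in fh:
            yield line.rstrip("\n")

def grep(lines, pattern):
    """Stage 2 (like `grep`): keep only lines containing the plain substring."""
    for line in lines:
        if pattern in line:
            yield line

def count(lines):
    """Stage 3 (like `wc -l`): count the lines that reach the end of the pipe."""
    return sum(1 for _ in lines)

def count_matches(path, pattern):
    # Same idea as: cat path | grep pattern | wc -l
    return count(grep(read_lines(path), pattern))

def count_all(folder, pattern):
    # Same idea as: for f in *.txt; do ...; done
    results = {}
    for path in sorted(Path(folder).glob("*.txt")):
        results[path.name] = count_matches(path, pattern)
    return results

if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        # Hypothetical example data: two small text files.
        Path(tmp, "a.txt").write_text("income 100\nage 30\nincome 250\n")
        Path(tmp, "b.txt").write_text("age 41\nincome 80\n")
        print(count_all(tmp, "income"))  # {'a.txt': 2, 'b.txt': 1}
```

Because each stage only consumes and yields lines, stages can be recombined freely, which is the same design idea that makes small shell commands composable.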

Tentative Agenda - Day 1 - afternoon

Homework - Day 1

  • Get comfortable with version control
    • create a file, version it, delete it, recover a previous version, create a branch, and merge a branch
  • Try out the Jupyter report yourself
  • More info on dynamic documents:
    • RMarkdown
    • Jupyter Notebook (or Python) with Stata
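The version-control homework is done with a real tool (Git), but the operations in the first bullet are easier to reason about with a mental model. Below is a deliberately simplified toy model in Python, not Git's implementation or commands: each commit stores a full snapshot of the files plus pointers to its parent commit(s), and a branch is just a name pointing at a commit. Real Git adds content hashing, a staging area, and line-level conflict handling, all omitted here; the class and method names are hypothetical.

```python
class Repo:
    """Toy version-control model: commits are snapshots, branches are pointers."""

    def __init__(self):
        self.commits = []              # list of (snapshot, parent_ids, message)
        self.branches = {"main": None}
        self.head = "main"             # name of the checked-out branch
        self.files = {}                # working copy: filename -> content

    def commit(self, message, parents=None):
        if parents is None:
            tip = self.branches[self.head]
            parents = () if tip is None else (tip,)
        self.commits.append((dict(self.files), parents, message))
        self.branches[self.head] = len(self.commits) - 1
        return self.branches[self.head]

    def snapshot(self, commit_id):
        return self.commits[commit_id][0]

    def recover(self, filename, commit_id):
        """Restore one file as it was in an earlier commit."""
        self.files[filename] = self.snapshot(commit_id)[filename]

    def branch(self, name):
        self.branches[name] = self.branches[self.head]

    def checkout(self, name):
        self.head = name
        self.files = dict(self.snapshot(self.branches[name]))

    def ancestors(self, commit_id):
        seen, stack = set(), [commit_id]
        while stack:
            c = stack.pop()
            if c not in seen:
                seen.add(c)
                stack.extend(self.commits[c][1])
        return seen

    def merge(self, other):
        """File-level three-way merge; assumes a clean working copy."""
        ours_id, theirs_id = self.branches[self.head], self.branches[other]
        common = self.ancestors(ours_id) & self.ancestors(theirs_id)
        # Ids grow over time, so the newest shared commit is the merge base.
        base = self.snapshot(max(common))
        ours, theirs = self.snapshot(ours_id), self.snapshot(theirs_id)
        merged = {}
        for name in set(base) | set(ours) | set(theirs):
            b, o, t = base.get(name), ours.get(name), theirs.get(name)
            if o == t:
                result = o
            elif o == b:
                result = t             # only their side changed this file
            elif t == b:
                result = o             # only our side changed this file
            else:
                raise ValueError(f"conflict in {name}")
            if result is not None:     # None means the file was deleted
                merged[name] = result
        self.files = merged
        return self.commit(f"merge {other}", parents=(ours_id, theirs_id))

if __name__ == "__main__":
    repo = Repo()
    repo.files["paper.md"] = "draft 1"   # create a file
    first = repo.commit("add paper")      # version it
    del repo.files["paper.md"]            # delete it
    repo.commit("oops, deleted paper")
    repo.recover("paper.md", first)       # recover the earlier version
    repo.commit("restore paper")
    repo.branch("robustness")             # branch
    repo.checkout("robustness")
    repo.files["appendix.md"] = "extra tables"
    repo.commit("add appendix")
    repo.checkout("main")
    repo.merge("robustness")              # merge the branch back
    print(sorted(repo.files))             # ['appendix.md', 'paper.md']
```

Storing a parent pointer in every commit is what makes "recover a previous version" and "find the common ancestor for a merge" possible; the rest is bookkeeping.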

Tentative Agenda - Day 2 (morning)

  • 9:00 - 9:30 Automation and reproducibility go hand-in-hand
    • Setup: A simple reproducible report in R
    • Scripting the reproducible report
  • 9:30 - 12:00 Docker / containers (with breaks)
    • What is Docker
    • How do you create and use containers?
    • Full automation
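The session builds its reproducible report in R. As a language-neutral illustration of why automation and reproducibility go hand-in-hand, here is a minimal Python sketch of a scripted report on simulated (made-up) data. Two ideas carry over: a fixed random seed makes the numbers identical on every run (on the same Python version), and the report text is generated from the computed values rather than typed by hand, which is the same idea behind RMarkdown and Jupyter reports. The output file name `report.md`, the seed value, and all function names are hypothetical.

```python
import random
import statistics
from pathlib import Path

SEED = 20210818  # fixed seed: every run draws the same "random" sample

def simulate(n, seed=SEED):
    """Data step (hypothetical): draw n incomes from a log-normal distribution."""
    rng = random.Random(seed)  # local generator, so no hidden global state
    return [rng.lognormvariate(10, 0.5) for _ in range(n)]

def summarize(values):
    """Analysis step: compute the statistics that go into the report."""
    return {
        "n": len(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
    }

def render(stats):
    """Report step: numbers are inserted programmatically, never copy-pasted."""
    return (
        "# Income summary\n\n"
        f"Sample size: {stats['n']}\n\n"
        f"Mean income: {stats['mean']:,.0f}\n\n"
        f"Median income: {stats['median']:,.0f}\n"
    )

def main(outfile="report.md", n=1000):
    # One entry point runs every step in order: data -> analysis -> report.
    text = render(summarize(simulate(n)))
    Path(outfile).write_text(text)
    return text

if __name__ == "__main__":
    print(main())
```

Because a single `main()` regenerates everything from scratch, "full automation" reduces to running one command, which is also what a container would run.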

Tentative Agenda - Day 2 (afternoon)

Tentative Agenda - Day 3

(partially joint with LDI Replicator training)

Awesome other resources

There are full 13-week graduate-level courses on these topics. The material we touch on here can serve as a starting point for many of those dedicated courses, or you may be able to peruse them and learn on your own. Let me just call out a few by economists (there are many, and apologies if I missed some):

Past contributors

John Abowd, Rick Mansfield, Daniel Lin, Hautahi Kingi, Flavio Stanchi, Jean-Francois Houde, Sylverie Herbert, Sida Peng, Kevin L. McKinney
