Computational Tools for Social Scientists Workshop

Lars Vilhuber and some others
2020/8/24-2020/8/27

Location: Online (Zoom meeting)

Time: 2:00 - 6:00 p.m. (some days will end earlier)

Goal

The goal of this workshop is to showcase computer-oriented techniques and tools for social science students, from basic command line tools on Linux and Mac, through version control, to optimization and parallelization techniques for high-performance computing, with training in reproducible methods thrown in for good measure. The goal is NOT to teach a full course on SAS, Stata, Matlab, R, Python, MPI, Fortran, etc. - there are other classes for that. We will teach just enough of each programming language to highlight additional techniques. There will be hands-on training on a few systems (TBD). This workshop is designed to open your eyes to the possibilities: it scratches the surface of many topics, but mostly does not go into depth on any of them. Follow-on short courses may meet those needs. For specific programming languages, we point to offerings elsewhere on campus, for instance at CISER.

We highlight that this is a workshop - we will work on problems as a group, drawing on expertise in the “room” as needed. If you have a specific question, and want to work on it, we may do so. If you want to primarily listen, that's fine too.

Some parts of the workshop will be asynchronous (pre-recorded), but a significant portion will be live. You should expect to do some exercises each day, but none are required.

Target group

Second-year Ph.D. students and higher, and faculty, in Economics or other social sciences. If you haven't taken the course in the past, or want a refresher, you should participate.

Requirements

  • Working knowledge of at least one statistical programming language (R, SAS, Stata, Matlab, Gauss) - the specific language is not important.
  • An interest in computational social science

Setup

  • Register
  • (optional) Request an account on the Econ Cluster via the BioHPC account request page (enter “Econ Grad Student” in the Comment field)
  • Various software, forming a standard toolkit, will be installed as part of the class.

Tentative Agenda - Prior to Day 1

Agenda - Day 1

(videos are accessible only to logged-in students and faculty of Cornell University - open soon)

Homework - Day 1

  • Read an economics article (> 1 year old) and attempt to fill in the missing data citations
  • Try out the Markdown report yourself
  • Try out the Jupyter report yourself
  • Try out the Stata reproducible report (not covered in class)

Agenda - Day 2

Homework - Day 2

  • Get comfortable with version control
    • create a file, version a file, delete a file, recover a previous file, branch, merge a branch

Agenda - Day 3

Agenda - Day 4 (optional)

  • 1:00 - 2:00 PM Coffee hour: What does the AEA Data and Code Availability policy imply for an economist's research? (Lars) (different video link)
  • 2:00 - 4:50 PM Optional themes:
    • Virtual machines and containers: why?
    • Virtual machine software for your own computer: VirtualBox
    • Virtual machines at Cornell: RedCloud; may also be available through CISER.
    • Container software: Docker
    • Automating processing in the cloud (Docker, etc.)
    • Example of an automated data cleaning step using Docker and GitHub Actions: Covid-19 expectations data
    • Setting up an Amazon cluster (basics of cluster computing) (skipped)

Past contributors

John Abowd, Rick Mansfield, Daniel Lin, Hautahi Kingi, Flavio Stanchi, Jean-Francois Houde, Sylverie Herbert, Sida Peng, Kevin L. McKinney
