2023-09-26

The Plan

9:00

  • Konnichi wa こんにちは
  • Overview of reproducibility and replicability in economics, including replication packages (presentation)

10:00-10:50 … Let’s make a replication package (hands-on using a simple example, from directory structure to (almost) deposit at a trusted repository) 0

11:00-11:50 … Principles and structure of replication packages (including README files) 1 2 (for confidential data: 1b)

11:50-13:00 … Lunch time

13:00-13:50 … Foundations of reproducible programming practices (in any programming language) 3 4

14:00-14:50 … Continuation and open discussion about specific examples (if you have one, bring your own project!) 5

15:00-15:50 … Discussions on institutional support for replicability in Japanese context (including Q&A)

15:50-16:00 … Closing remarks

How to run things in the cloud

What IS the cloud??

What is the cloud?

Cloud computing is the on-demand delivery of IT resources over the Internet with pay-as-you-go pricing. Instead of buying, owning, and maintaining physical data centers and servers, you can access technology services, such as computing power, storage, and databases, on an as-needed basis from a cloud provider.
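To make "pay-as-you-go" concrete, here is a minimal sketch of the arithmetic. The hourly rate and the server purchase price are hypothetical placeholders chosen for illustration; they are not actual quotes from any provider.

```python
# Illustrative only: both prices below are HYPOTHETICAL, not real provider quotes.
HOURLY_RATE = 0.50    # assumed cost of one rented cloud machine per hour
SERVER_COST = 3000.0  # assumed purchase price of a comparable physical server


def cloud_cost(hours_used: float, hourly_rate: float) -> float:
    """Pay-as-you-go: you are billed only for the hours you actually use."""
    return hours_used * hourly_rate


def break_even_hours(purchase_cost: float, hourly_rate: float) -> float:
    """Hours of use at which renting has cost as much as buying outright."""
    return purchase_cost / hourly_rate


# A replication run that needs 40 hours of computing:
print(cloud_cost(40, HOURLY_RATE))
# Hours of rented computing before buying would have been cheaper:
print(break_even_hours(SERVER_COST, HOURLY_RATE))
```

The sketch ignores electricity, maintenance, and storage charges, all of which matter in practice.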

Cloud providers

Big ones:

  • Amazon Web Services (AWS) - about 30% market share
  • Azure (Microsoft)
  • Google Cloud Platform (GCP)

Many smaller ones.

How big?

Nobody knows, except perhaps the companies themselves. A 2014 estimate for AWS was 1.4 million servers, and AWS and its competitors have only grown since then.

Cloud and you

Is it easy to use the cloud? Absolutely not (unless you are a Certified AWS/Azure/etc. Professional (TM))

Exceptions: some academic clouds (in the US, those funded by NSF; in Canada, Compute Canada)

So how does it help me?

Many secondary services have sprung up to enable easy use. Here are just a few for the academic sector (there are many more in the private sector):

  • Codeocean.com (CO)
  • Google Colab
  • Wholetale.org (WT)
  • GitHub Codespaces or GitHub Actions

These often have a data-science-centric interface, but CO and WT enable native Stata functionality!

Python in the cloud

R in the cloud

Stata in the cloud

A tool for cloud-centric computing

Many cloud-centric tools are based on Docker containers.

We won’t go into those today, but we will leverage one.
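For orientation only, here is a minimal sketch of what running code inside a container typically looks like. It only assembles and prints a `docker run` command and does not execute anything. The project path and the script name `main.R` are hypothetical, and the `rocker/r-ver` image is used as an assumed example of a public R image, not necessarily the one used in the demos.

```python
import shlex


def docker_run_command(project_dir: str, image: str, script: str) -> list[str]:
    """Build (but do not execute) a command that runs an R script in a container."""
    workdir = "/project"  # path inside the container (arbitrary choice)
    return [
        "docker", "run",
        "--rm",                             # delete the container when it exits
        "-v", f"{project_dir}:{workdir}",   # mount the project folder into the container
        "-w", workdir,                      # start in the mounted folder
        image,                              # pins the software environment
        "Rscript", script,                  # the command run inside the container
    ]


# Hypothetical project location and script name, for illustration only.
cmd = docker_run_command("/home/user/replication", "rocker/r-ver:4.3.1", "main.R")
print(shlex.join(cmd))
```

The point for reproducibility is the image tag: everyone who runs the same image gets the same software environment, whether on a laptop or in the cloud.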

Demo: CodeOcean

Demo: WholeTale

Demo: GitHub Codespaces with R and Stata

Let’s give it a try