Preparing to Run Code (Part B)

Signal that you are working on Part B#

To signal that you are starting to run the code, transition the Jira subtask to “In Progress”.

Move Part B to In Progress

If you are the replicator who was previously assigned Part A

In this case, your working area is already prepared, and you can skip to Get the data.

Prepare your working area#

Before we can verify code, data, and documentation, we need to first get the code, then the data into your “working area”. Let’s start with the code.

Note

You now need to decide on which computer you will do the data analysis; that is where you should do the next few steps. See Access Computer for details. The choice matters because the git setup we use does not allow data files to be included in the Bitbucket repository: when you download the replication package from openICPSR or elsewhere, its data files are not added to the Bitbucket repository.

Verify that the code is present#

With the first step, you obtained a copy of the Bitbucket repository. You should now have a folder called aearep-123 (or whatever the repository number is). All subsequent steps should be done from there.

Important

If you do not see a folder like 123456 or dropbox-xyz in the repository, then the Bitbucket Pipeline likely did not work. You will then need to populate the code (and data) manually. You will do this AFTER downloading the data, in the next step.
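The check described above can be sketched as a small shell function. This is only an illustration: the function name `has_deposit_folder` is hypothetical, and it assumes that deposit folders are named either with digits only (like 123456) or with a `dropbox-` prefix (like dropbox-xyz), as in the examples above.

```bash
#!/usr/bin/env bash
# Sketch: report whether a repository checkout contains a deposit folder.
# Assumption: deposit folders are named with digits only (e.g. 123456)
# or start with "dropbox-" (e.g. dropbox-xyz).

has_deposit_folder() {
    local repo_dir="${1:-.}"
    local entry name
    for entry in "$repo_dir"/*/; do
        [ -d "$entry" ] || continue      # glob matched nothing
        name="$(basename "$entry")"
        case "$name" in
            dropbox-*) return 0 ;;       # dropbox-style deposit folder
            *[!0-9]*)  ;;                # contains a non-digit: not a numeric ID
            *)         return 0 ;;       # digits only: looks like a deposit ID
        esac
    done
    return 1
}

# Example use, from the directory that contains the repository:
#   if has_deposit_folder aearep-123; then
#       echo "deposit folder found"
#   else
#       echo "no deposit folder: populate manually after downloading the data"
#   fi
```

A return value of 1 only suggests that the Pipeline did not populate the repository; confirm by looking at the folder yourself.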

Verify that you can actually run the code#

Before you spend time obtaining the data, now is a good time to assess (again) if you have everything needed to run the code.

Have a look at the README again.

Is there information about Requirements?
- Is there information about the software?
- How long does the author say the code will run? Is it a reasonable time, or do we maybe need to run it on a more powerful computer?
- How much memory, or processors, does the code need? Again, is the computer you intended to choose sufficient, or do we need to get access to a more powerful computer, or even a cluster of computers?
- What operating system (Mac, Windows, Linux) does the author appear to have used?
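To compare the README's stated requirements with the computer you intend to use, it can help to print a few basic facts about that machine. The following is a minimal sketch, not part of the official tooling: the function name `describe_machine` is hypothetical, and the memory figure is only read on systems that expose `/proc/meminfo` (typically Linux).

```bash
#!/usr/bin/env bash
# Sketch: print OS, processor count, and (on Linux) total memory,
# to compare against the requirements stated in the README.

describe_machine() {
    echo "os=$(uname -s)"
    # getconf reports the number of online processors on Linux and macOS
    echo "cpus=$(getconf _NPROCESSORS_ONLN 2>/dev/null || echo unknown)"
    if [ -r /proc/meminfo ]; then
        # MemTotal is reported in kB
        awk '/^MemTotal:/ { printf "memory_gb=%.1f\n", $2 / 1024 / 1024 }' /proc/meminfo
    else
        echo "memory_gb=unknown"
    fi
}

# Example use:
#   describe_machine
```

If the numbers fall short of what the README asks for, discuss with your supervisor whether a more powerful computer is needed.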

Now fill out the Stated Requirements section of Part B of the report.

Check if the deposit has a ‘main’ or ‘master’ file and fill out the ‘MainFile’ field in Jira under the ‘Repl. info’ tab.
Mark the operating system (OS) that the authors used in the field Original OS. Leave blank if you do not know.

Be sure to use the REPLICATION-PartB.md for this section!

As part of the automated processing, the REPLICATION.md is split into two parts, REPLICATION-PartA.md and REPLICATION-PartB.md. Somebody else may be working on Part A at the same time as you are working on Part B. Please be sure to use the correct file for your work.

Prepare the code-check#

Now is a good time to understand the code in a bit more detail:

In the template, you will find code-check.xlsx.
- Use this to create a list of all Tables and Figures in the paper
- You will use this later as a guide to tabulate your findings!
Fill out the “Code Description” section of the REPLICATION-PartB.md
- Provide some information about the program files (are there 3 Stata files? Are there 5 Matlab programs?). You will use this information to fill out the Software Used (in the main task) later as well, but provide details here.
  - You can use the file “generated/programs-list.txt” to help you here.
- Did you have difficulty aligning the README with the files? Does the sequence suggested by the programs differ from what’s written in the README?
- Are there files in the archive not explained in the README?
- Copy and paste the contents of code-check.xlsx into the code description part, listing the programs. Omit the “Reproduced?” column when doing so. Use the Excel-to-Markdown plugin for VSCode.
  - This table will be pasted in again under “Findings”, with the “Reproduced?” column, once the code has been run.
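To answer questions such as "are there 3 Stata files?", a quick count of program files by extension can complement generated/programs-list.txt. The sketch below is an illustration only: the function name `count_programs` is hypothetical, and the extension list (`.do` for Stata, `.m` for Matlab, plus `.R` and `.py`) is an assumption that you should adapt to the package at hand. Matching is case-sensitive, so files ending in `.r` would not be counted.

```bash
#!/usr/bin/env bash
# Sketch: count program files per extension below a directory.
# Assumption: the extensions listed here cover the software used;
# adjust the list for the package you are working on.

count_programs() {
    local dir="${1:-.}"
    local ext count
    for ext in do m R py; do
        # Skip anything inside the .git directory
        count="$(find "$dir" -type f -name "*.${ext}" ! -path '*/.git/*' | wc -l | tr -d ' ')"
        echo "${ext}: ${count}"
    done
}

# Example use, from inside the repository:
#   count_programs 123456
```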

Verify that you can actually run the code#

Do you think you know how to run the code in the software mentioned?

You may not have the right experience. If so, talk to your supervisor!

If both of you agree that nobody will run the code, then

Move the subtask to “Part B is complete” via No code was run
Go straight to Part C, writing the Findings section

Describe the provided data#

The automated scripts should have filled out the “All data files provided” section, but if not, please do so here.

If the list is VERY long, put it into an appendix, but make a note in this section that there is an appendix with this info.

What if there are no data provided at all?

This can happen, for instance, when the data are confidential and only available through a computer system at the Census Bureau or in Sweden.

Then you will skip getting the data and running code, and you are done with Part B!

Move the subtask to “Part B is complete” via No code was run
Go straight to Part C, writing the Findings section

If there is ANY data at all present, then continue to get the data.

Get the Data#

If you think that you are ready to run the code, you need to get the data. When getting the data, please take care to distinguish

data that is part of the openICPSR deposit
data that the README tells you to download or otherwise access
data that you are provided on the L-Drive, which is typically provided under an agreement with the authors, and cannot be redistributed.

Here, we describe the most likely first step: getting the data from openICPSR. Any other data you download should also be stored on the same computer; we do not explicitly describe those downloads here. CCSS is the most likely place where you do this, but double-check with your supervisor.

What if the data are not on openICPSR?

Sometimes, data are provided on other repositories: OSF, Dataverse, Zenodo are the most frequent ones.

You may need to adapt the process below to those circumstances. See the following mapping:

Deposit	Name of download folder	Name of script	Tested?
openICPSR	`111234`	`tools/download_openicpsr-private.py`	Yes
OSF	`osf-ZX123A`	`tools/download_osf.sh`	Partially
Zenodo	`zenodo-1234567`	`tools/download_zenodo.sh`	Maybe
Dataverse	`dv-2YWLWG`	`tools/download_dv.py`	Maybe

In all cases, you should obtain the data from the deposit, and otherwise follow the same principles as are described below for openICPSR.
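The mapping in the table can be expressed as a small lookup, which is handy when checking that the expected script is present in your repository. This is a sketch: the function name `download_script_for` is hypothetical, and the script paths are copied from the table above. How each script is invoked is not shown here, since its arguments are not described in this section.

```bash
#!/usr/bin/env bash
# Sketch: look up the download script for a deposit type,
# using the script names from the mapping table.

download_script_for() {
    case "$1" in
        openICPSR) echo "tools/download_openicpsr-private.py" ;;
        OSF)       echo "tools/download_osf.sh" ;;
        Zenodo)    echo "tools/download_zenodo.sh" ;;
        Dataverse) echo "tools/download_dv.py" ;;
        *)         echo "unknown deposit type: $1" >&2
                   return 1 ;;
    esac
}

# Example use, from the root of the repository:
#   script="$(download_script_for OSF)" && ls -l "$script"
```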

Did the Bitbucket Pipeline scripts work?#

In cases where the package downloaded from openICPSR is too big, or where the data do not come from openICPSR, the automated Bitbucket Pipeline scripts will not work. In this case, you will need to take some additional steps to run the automated scripts yourself. This is best done on BioHPC or GitHub Codespaces (CS).

The Pipeline worked!

You don’t have to do anything additional, you can move on.

Running on CCSS

Running the automated steps on CCSS has not been fully tested yet.

Running on BioHPC

Access BioHPC, see Access Computer for details.
Change directory to the place where you downloaded the repository and the data:

cd /home2/ecco_lv39/Workspace/aearep-123

Then run the ingest scripts:

export PATH=$PATH:/usr/local/stata16/
./tools/pipeline-steps1-4.sh 123456
git push

Running on GitHub Codespaces (CS)

Change directory to the place where you downloaded the repository and the data, typically:

cd /workspaces/aearep-123

Then run the ingest scripts:

./tools/pipeline-steps1-4.sh 123456
git push
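Before running the ingest scripts on either system, a brief pre-flight check can catch the most common mistake, which is being in the wrong directory. The following is a sketch under stated assumptions: the function name `ingest_preflight` is hypothetical, and it only checks that an argument was given, that `./tools/pipeline-steps1-4.sh` is executable, and that the current directory looks like the root of a git repository.

```bash
#!/usr/bin/env bash
# Sketch: sanity checks before running the ingest scripts.
# Assumption: you run this from the root of the repository (e.g. aearep-123).

ingest_preflight() {
    local deposit="${1:-}"
    if [ -z "$deposit" ]; then
        echo "usage: ingest_preflight <deposit folder name, e.g. 123456>" >&2
        return 2
    fi
    if [ ! -x ./tools/pipeline-steps1-4.sh ]; then
        echo "./tools/pipeline-steps1-4.sh is missing or not executable in $(pwd)" >&2
        return 1
    fi
    if [ ! -e .git ]; then
        echo "$(pwd) does not look like the root of a git repository" >&2
        return 1
    fi
    return 0
}

# Example use:
#   ingest_preflight 123456 && ./tools/pipeline-steps1-4.sh 123456 && git push
```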

Ready!#

You are now ready to run code.
