G Stata-related procedures
In this section, we will show you a few things related specifically to running code reproducibly with Stata. For more general debugging tips for Stata and other computer languages, see our wiki.
G.1 Using config.do in STATA
In the “Verification” stage, we ask you to keep a log of what you do. Moreover, authors often use packages that are not part of the default Stata installation. The template repository you clone includes template-config.do, which addresses both needs.
G.1.1 Why do we need log files?
- Log files record each step of the analysis and its results as text. They also record any error messages you encounter when running the code.
- Log files serve other purposes too, but for us their main role is communication with other team members.
- When a replicator submits the report, a preapprover (and an approver) needs to verify how the code ran. This ensures that any discrepancies we find are not due to mistakes on our end.
- The log file is crucial for this verification. Without it, preapprovers and approvers would have to run the code again, which is neither a good use of time nor an efficient way to process the case.
G.1.2 Why do we have to install programs?
- Stata, like any statistical software, does not ship with every package needed to enable or facilitate an analysis. Therefore, many user-written programs or extensions are publicly available for download.
- Our installation process differs from most others in that we install programs in a specified directory that is NOT a system directory.
- This is to ensure that the replication package is complete. A complete replication package should be stand-alone, regardless of which packages are installed on the machine where the code is run.
G.1.3 Explaining template-config.do
G.1.3.1 Directory paths for log files.
config.do creates a subdirectory and saves log files in it. Area 1 sets these directory paths. Suppose the Jira issue number is AEAREP-9999 and the openICPSR case number is 111111, so that the current working directory is:
U:/Workspace/aearep-9999/111111
- line 50,
  global rootdir : pwd
  sets the current working directory as the root directory, a.k.a. rootdir.
- line 59,
  global logdir "${rootdir}/logs"
  sets the following directory as the directory for log files: U:/Workspace/aearep-9999/111111/logs
- Notice that no such directory exists yet. Therefore, the do file creates it in line 60. mkdir is the command to create a directory.
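As a minimal sketch of what Area 1 does (the line numbers above refer to the template itself; the capture prefix is an assumption added here so that the snippet can be rerun when the logs directory already exists):

```stata
* Record the current working directory as the root directory
global rootdir : pwd

* Log files go into a logs subdirectory of the root directory
global logdir "${rootdir}/logs"

* Create the directory; capture suppresses the error if it already exists
capture mkdir "${logdir}"

display "Root directory: ${rootdir}"
display "Log directory:  ${logdir}"
```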
G.1.3.2 Opening a log file with current date and time
Since we usually run the program several times before completing the replication, we would like to record every run. Therefore, we record the time at which the code starts running and use it in the name of the log file. Area 2 stores the current date and time in local macros and opens the log file.
- lines 64-67: store the current date and time in local macros.
- line 69: starts the log file with an internal name, ldi, which prevents collisions with any log files opened by the authors.
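The idea behind Area 2 can be sketched as follows. The macro names and the log file name pattern are illustrative assumptions, not necessarily those used in the template; only the internal log name ldi comes from the description above. The sketch assumes that ${logdir} has already been defined and created.

```stata
* Store the current date and time, replacing characters that are awkward in file names
local c_date = c(current_date)                      // e.g. "17 Feb 2024"
local cdate  = subinstr("`c_date'", " ", "_", .)
local c_time = c(current_time)                      // e.g. "14:05:33"
local ctime  = subinstr("`c_time'", ":", "_", .)

* Open a named text log; the name ldi keeps it separate from logs opened by the authors
log using "${logdir}/logfile_`cdate'-`ctime'.log", name(ldi) replace text
```

Because the start time is part of the file name, each run produces its own log file instead of overwriting the previous one.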
G.1.3.3 System information
We require system information as part of the replication package. This is because some commands are sensitive to the OS, Stata version, machine type, etc. Area 3 retrieves that information from the system and displays it in the log file.
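A hedged sketch of the kind of information Area 3 could write to the log is shown below. It uses Stata's built-in c() system values; the exact set of values and the wording of the output in the template may differ.

```stata
* Display system information so that it is recorded in the log file
display "=== SYSTEM DIAGNOSTICS ==="
display "Stata version:    `c(stata_version)'"
display "Updated as of:    `c(born_date)'"
display "MP / SE:          `c(MP)' / `c(SE)'"
display "Processors:       `c(processors)'"
display "Operating system: `c(os)' `c(osdtl)'"
display "Machine type:     `c(machine_type)'"
display "=========================="
```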
G.1.3.4 Package installation
As explained above, we often need to install packages. Even if the packages were installed for earlier cases, that is irrelevant to your current case, since we install packages within our deposit directory so that we can verify the completeness of the replication package. Area 4 does this job.
- The sysdir commands (in lines 89-91) redirect Stata to search for, and install, ado files in the referenced directories. They do not install anything by themselves.
- If the authors provided the ado files, adding one command to the end of config.do suffices. For instance, if the authors provided ado files in the directory packages, then add
  adopath ++ "${rootdir}/packages"
- Add the list of packages inside the quotation marks in line 37.
- line 39 provides an example.
- lines 97-106 install each listed package that does not already exist.
- In some cases, the installation will fail because you have to use “net install ...” instead of “ssc install”. In this case, write such net install commands after line 112; an example is given in line 111.
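Put together, Area 4 can be sketched roughly as follows. This is an illustration under stated assumptions, not a copy of the template: the ado subdirectory names are hypothetical, and the loop assumes that each package's name is also the name of a command it installs (which holds for estout but not for every package). It also assumes ${rootdir} is already defined.

```stata
* Packages to install from SSC (example list)
local ssc_packages "estout"

* Redirect Stata's PERSONAL, PLUS and SITE ado directories into the replication directory
* (the ado/... folder names are hypothetical)
sysdir set PERSONAL "${rootdir}/ado/personal"
sysdir set PLUS     "${rootdir}/ado/plus"
sysdir set SITE     "${rootdir}/ado/site"

* Install each listed package only if Stata cannot already find it
if !missing("`ssc_packages'") {
    foreach pkg of local ssc_packages {
        capture which `pkg'
        if _rc == 111 {
            display "Installing `pkg'"
            ssc install `pkg', replace
        }
    }
}

* Packages that are not on SSC would be installed with net install, for example:
* net install PACKAGENAME, from("URL-OF-PACKAGE")   // placeholders, not a real package
```

Since PLUS now points inside the replication directory, ssc install and net install place the ado files there rather than in a system-wide location.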
G.1.4 How to use config.do
G.1.4.1 Rename the config file.
The file is provided as template-config.do. To use it, rename it to config.do and move it into the openICPSR folder (e.g., 111111).
G.1.4.2 Include config.do
Add the following line at the beginning of each code file:
include config.do
and add the following at the end of each code file:
log close _all
If there is a master do file, put these lines at the beginning and the end of master.do once, and do NOT include them in the individual code files.
There will be cases where authors create their own log files. Do NOT comment out the log file creation in config.do, as the named log file will not conflict with any author-generated log files.
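As an illustration, a code file that also opens its own log might look like the sketch below. The author log name results.log and the analysis commands are hypothetical; the sketch assumes that config.do is in the current working directory and opens its log under the internal name ldi, as described above.

```stata
include "config.do"                       // opens the named log ldi, among other things

* Author-generated log (hypothetical); it is unnamed, so it does not clash with ldi
log using "results.log", replace text

sysuse auto, clear                        // example dataset shipped with Stata
summarize price mpg

log close _all                            // closes both the author's log and ldi
```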
G.2 Running Code in Stata
Although there are plenty of ways to run code in Stata, our goal with these instructions is to show the easiest one, minimizing both the manual steps replicators have to go through and the chance of a mistake that prevents a successful run.
In essence, these instructions show how to deal with the three most common actions that replicators have to undertake when running Stata code:
- Making sure that paths (i.e., something like "Mycomputer/Documents/Workspace/") in the .do files (Stata scripts) reflect the appropriate location of code, data, and output on the computer where the code is run.
- Installing user-written functions, programs, or packages that are necessary to do computations and produce tables/figures.
- Creating .log files (files that record, in this case, Stata output) of the replication attempts.
G.2.1 Step 1: check for a “master” .do file
[ACTION] Check the README or the repository and determine if a master .do file was provided.
A master .do file is a Stata script that will call, in the correct sequence, all the programs necessary to construct analysis datasets, do all computations, and produce figures and tables. If a master do file exists, it should be mentioned in the README. In most cases, running a single master do file is sufficient to complete the reproduction. In general, a master script does not need to be a .do file. However, we will focus on cases where all work done in Stata is reduced to executing a single .do file.
G.2.1.1 When a master .do file is provided
If there is a master do file, continue with Step 2.
G.2.1.2 When a master .do file is not provided
If a master .do file is not provided, you should create one.
To create a master do file, follow these steps:
- Check the README for specific instructions about the order in which each program is supposed to be run. If there are no such instructions and the order is not obvious from the names of the programs, it is probably best not to create a master do file.
- Assuming that the sequence of programs is clear to you, open Stata and click on “New do file editor” (you can also work in Visual Studio Code).
- In the first line, write
  include "config.do"
- Write the command do followed by the path of each program that needs to be run, in the correct sequence.
Example:
include "config.do"
* Assuming scenario "A"
do "${rootdir}/code/0_first_program.do"
do "${rootdir}/code/1_second_program.do"
do "${rootdir}/code/2_third_program.do"
do "${rootdir}/code/appendix_code/appendix.do"
- Save your master file.
At the end, your master .do file may look like the example above.
With your master do file done, continue with Step 2.
G.2.2 Step 2: place config.do where the master .do file is located
[ACTION] Copy the file template-config.do and paste it into the folder where the master file is located. Change the name from template-config.do to config.do.
The folder with the code, whether it is the root directory or a subfolder, should now contain config.do next to the master .do file.
G.2.3 Step 3: include config.do in the master .do file
[ACTION] Open the master .do file. At the beginning, add the line:
include "config.do"

/* This is Master do file */
Save.
More information about config.do can be found in Appendix F of the training materials.
In summary, config.do does 4 things:
- Creates a global variable called “rootdir” with the local path to the root directory.
- Creates log files.
- Sets a path for saving the packages to be installed within the replication repository.
- Allows you to install the packages simply by listing their names.
A crucial function of config.do is that it allows for the local installation of Stata packages, which is important for two reasons. First, it enables us to check the completeness of the replication materials. Second, when running code on servers, we often do not have the necessary permissions to install Stata packages freely. config.do allows us to install packages in the replication directory.
G.2.4 Step 4: modifying paths if necessary
[ACTION]
- Check the README and determine if (and where) the root directory should be modified.
- Open the .do file to be modified (probably the master .do file) and use the global variable $rootdir as the path.
- Save.
To run the code, we need to make sure that Stata can access the locally-saved data, access the packages that will be installed, and save the output in the computer where you are running the code. To do that, we often need to change some directory paths defined in the .do files provided. This step may vary in each replication package, so you need to look at the README instructions closely. Some packages may not require any change, while others may require a little more work.
However, the typical case will only require one modification, either to the master .do file or to a program called by the master .do file, where you define the path of the location of the replication package. This location is what we refer to as the “root directory”. Once this change is made, the code provided (if it follows good practices) will define every other path relative to the root directory.
G.2.4.1 Example
In the author’s master file, a global variable “maindir” defines the path of the root directory as:
/* This is Master do file */
global maindir "C:\Users\Author\Dropbox\Project1" // this is the path to the repository
global data "$maindir/data" // path to data folder
global figures "$maindir/figures" // path to figures folder
You would include config.do and change the global maindir.
After the change:
include "config.do"
/* This is Master do file */
global maindir "$rootdir" // this is the path to the repository
global data "$maindir/data" // path to data folder
global figures "$maindir/figures" // path to figures folder
G.2.5 Step 5: Check the location of the master .do file and modify config.do
[ACTION]
- If the master .do file is placed directly in the root directory, open config.do, set the parameter scenario to B, and save.
- If the master .do file is inside a folder, open config.do, set the parameter scenario to A, and save. (This is the default, so really no action is necessary.)
- If the replication package includes a folder with Stata packages, add the line adopath ++ followed by the path of that folder, and save. See Appendix F for details.
- Add packages that need to be installed to config.do. See Appendix F for details.
G.2.5.1 Scenario A
A simplified directory structure that corresponds to scenario “A” looks like this:
directory/
    code/
        main.do
        01_dosomething.do
    data/
        data.dta
        otherdata.dta
G.2.5.1.1 Example
- The master .do file is inside a folder and you have placed config.do in that same folder. The package estout needs to be installed:
/* Template config.do */
local scenario "A"
*** Add required packages from SSC to this list ***
local ssc_packages "estout"
// Example:
// local ssc_packages "estout boottest"
// If you need to "net install" packages, go to the very end of this program, and add them there.
G.2.5.2 Scenario B
A simplified directory structure that corresponds to scenario “B” looks like this:
directory/
    main.do
    scripts/
        01_dosomething.do
    data/
        data.dta
        otherdata.dta
G.2.5.2.1 Example
- The master .do file is in the main directory, and you have placed config.do in the main directory. The packages estout and ivreg2 need to be installed:
/* Template config.do */
local scenario "B" // around line 30
*** Add required packages from SSC to this list ***
local ssc_packages "estout ivreg2"
// Example:
// local ssc_packages "estout boottest"
// If you need to "net install" packages, go to the very end of this program, and add them there.
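One way to picture why the scenario parameter matters is the sketch below, which derives the root directory from the folder in which config.do sits. This is an assumption about how such a parameter could work, written for illustration only; consult config.do itself for the actual logic.

```stata
local scenario "A"          // "A": master .do file in a subfolder; "B": in the root directory
local pwd : pwd             // folder from which the master .do file is run

* Scenario A: the root directory is one level above the code folder
if "`scenario'" == "A" {
    global rootdir "`pwd'/.."
}
* Scenario B: the root directory is the current folder
if "`scenario'" == "B" {
    global rootdir "`pwd'"
}

display "Root directory: ${rootdir}"
```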
G.2.6 Step 6: Run the Code
G.2.6.1 Windows
[ACTION] Right-click on the master .do file and select the option Execute (do).
This option sets the working directory to the location of the master .do file. It opens Stata and shows the processes in the Stata window.
G.2.6.2 Mac/Linux
On Unix-style systems, the preferred way is to use the command line to run Stata code.
Mac-specific one-time setup: Open Stata on your Mac, go to the “Stata” menu at the top of your screen and click “Install Terminal Utility…”.
Open a terminal in the folder where the master.do file is located. How to do this may differ depending on your system, and may involve using “cd /path/to/code” commands. Confirm with “ls” that you see the same files you would see in Finder / File Explorer. Refer to the command line training in the initial training.
Identify which Stata version you have installed (some systems have only one, some have multiple):
which stata
which stata-se
which stata-mp
Each version is increasingly powerful. Choose the most powerful one installed on your system. (We will assume that you have stata-mp, but adjust accordingly.)
Then type “stata-mp -b do master.do”.
G.2.6.3 Checking for a complete run, debugging and running the master in pieces
After running the code, the log files will need to be checked for a complete run. Use Visual Studio Code to open and inspect log files. Any bugs that prevent a complete run will also show up in the log files.
If you find a bug that is simple enough to fix, you can make changes to the do files. Then, you can right-click on the master file and select Execute (do), as this option opens Stata and allows you to run the code interactively.
If you decide the code needs to be run in pieces, you can comment out (using the symbol *) the programs in the master .do file that are not to be run, and save the master. Then, you can right-click on the master and select the option Execute (do).
When debugging is complete, you can uncomment all programs in the master and make a clean run, using Execute Quietly (run).
Consider how much time a complete run would take before you run everything one last time. If it would take too long, you may want to skip a complete run, but ensure that you have log files for all partial runs.
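For instance, to rerun only the last two programs of the example master file from Step 1 (the file names are those of that example), the master could temporarily look like the following sketch:

```stata
include "config.do"

* Programs that already ran successfully are commented out with *
* do "${rootdir}/code/0_first_program.do"
* do "${rootdir}/code/1_second_program.do"

* Programs still to be run
do "${rootdir}/code/2_third_program.do"
do "${rootdir}/code/appendix_code/appendix.do"

log close _all
```

Because config.do names each log file after the time the run started, every partial run leaves its own log file behind.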
G.3 Using scan_packages.do
In the “Writing Preliminary Report” stage, we ask you to check the completeness of the information on system requirements. Often, authors do not list the packages they installed that are not part of the default Stata installation. The authors should list them in the README (even when they provide ado files!), but this does not always happen. To help you identify these packages, we provide a useful tool.
- Locate the directory named “tools/Stata_scan_code/”.
- Change the following command in line 11 to match your system:
  global codedir "XXXCODEDIRXXX"
  You should point it to the directory where the code is (typically 112233, the openICPSR space number):
  global codedir "../../112233"
  This will be the directory where the output Excel file will be saved.
- Execute the do file.
- Locate the file “candidatepackages.xlsx”, use the information there, and remember to push the file to the repository.