HowTo Post-Process the CG2 Output

The post-processing backs out the constant (a = mean(y) - mean(x)b) and imposes the identification rules. The person and firm effects are identified only within connected groups, so the first step is to identify the groups present in your data. The group information is then combined with the CG2 parameter estimates to produce a final SAS dataset.
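As a minimal illustration of backing out the constant, the Python sketch below applies a = mean(y) - mean(x)b to a vector of covariate means and estimated coefficients. The function name back_out_constant is hypothetical. In the actual workflow this calculation is done in SAS by 03_constant.sas, and the sketch takes the formula exactly as stated here.

```python
from typing import Sequence


def back_out_constant(ybar: float, xbar: Sequence[float], beta: Sequence[float]) -> float:
    """Return a = ybar - xbar'b (hypothetical helper, not part of CG2).

    ybar: mean of the dependent variable
    xbar: means of the right-hand-side variables
    beta: estimated coefficients, in the same order as xbar
    """
    if len(xbar) != len(beta):
        raise ValueError("xbar and beta must have the same length")
    # The fitted regression passes through the means, so ybar = a + xbar'b,
    # which rearranges to a = ybar - xbar'b.
    return ybar - sum(x * b for x, b in zip(xbar, beta))


if __name__ == "__main__":
    print(back_out_constant(10.0, [2.0, 1.0], [0.5, 3.0]))
```

With means 2.0 and 1.0 and coefficients 0.5 and 3.0, the sketch returns 10.0 - (1.0 + 3.0) = 6.0.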

The Groups Program

The grouping program identifies the set(s) of persons and firms that are connected to each other. Connectedness is most easily defined through an example. Pick any firm in the data and identify all the workers ever employed at that firm. Then identify all of the firms at which each of those workers ever worked. For this expanded set of firms, identify all of the workers ever employed at those firms. Repeat these steps until no more firms or workers can be added; the resulting set of persons and firms is one connected group, and starting again from a firm not yet in any group yields the next group. Labor market data typically have 95+ percent of the workers and firms in the first group, with the remaining workers spread across many small groups.
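The expansion procedure above is a search for connected components in a bipartite graph whose nodes are persons and firms and whose edges are employment matches. The Python sketch below shows that idea using breadth-first search. It is not the groups binary itself; the function name find_groups, the input format (an iterable of (person, firm) pairs), and the choice to number groups from largest to smallest are assumptions made for illustration.

```python
from collections import defaultdict, deque
from typing import Dict, Hashable, Iterable, Tuple


def find_groups(
    matches: Iterable[Tuple[Hashable, Hashable]]
) -> Tuple[Dict[Hashable, int], Dict[Hashable, int]]:
    """Label connected groups of persons and firms (hypothetical helper).

    matches: (person_id, firm_id) pairs, one per employment match.
    Returns (person_group, firm_group), numbering groups 1, 2, ...
    from largest to smallest (by persons, then firms).
    """
    firms_of = defaultdict(set)    # person -> every firm the person worked at
    persons_of = defaultdict(set)  # firm -> every person ever employed there
    for person, firm in matches:
        firms_of[person].add(firm)
        persons_of[firm].add(person)

    components = []
    assigned = set()  # firms already placed in some group
    for start in persons_of:  # pick any firm not yet in a group
        if start in assigned:
            continue
        group_persons, group_firms = set(), {start}
        assigned.add(start)
        queue = deque([start])
        while queue:
            firm = queue.popleft()
            for person in persons_of[firm]:  # workers ever employed at this firm
                if person in group_persons:
                    continue
                group_persons.add(person)
                for other in firms_of[person]:  # firms those workers ever worked at
                    if other not in assigned:
                        assigned.add(other)
                        group_firms.add(other)
                        queue.append(other)
        components.append((group_persons, group_firms))

    # Number groups from largest to smallest.
    components.sort(key=lambda c: (len(c[0]), len(c[1])), reverse=True)
    person_group, firm_group = {}, {}
    for g, (persons, firms) in enumerate(components, start=1):
        for p in persons:
            person_group[p] = g
        for f in firms:
            firm_group[f] = g
    return person_group, firm_group


if __name__ == "__main__":
    matches = [("p1", "f1"), ("p2", "f1"), ("p2", "f2"), ("p3", "f2"), ("p4", "f3")]
    person_group, firm_group = find_groups(matches)
    print(person_group, firm_group)
```

Each firm enters the queue once and each person is expanded once, so the work grows roughly linearly with the number of matches. In the usage example, p2 links f1 and f2 into one group, while p4 and f3 form a second, separate group.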

The groups program is crucial because the person and firm effects are only identified within a group (unless you are willing to make additional assumptions). The group information must be available before any identification rule can be implemented.

Go to the 02_runcg_out/groups directory.
Run the firmcells.sas program to create the firmcells file. The firmcells file is the same as cellsout, but sorted by firm ID and then by person ID.
Open the rungroups.ksh file with a text editor. At the bottom of the file, make sure that the location of the groups binary is correct. Use an explicit path, because some versions of Unix have a system program named groups. Then run rungroups.ksh.
Examine groups.log for any errors.
Run groupstats.sas if you are interested in the number and size of the groups (the synthetic data should have only two groups).
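For a quick check of the groups outside SAS, the sketch below tallies persons and firms per group, largest first, from person-to-group and firm-to-group mappings. It is a hypothetical stand-in for the kind of summary groupstats.sas produces, not a description of that program's actual output; the function name group_stats and the input dictionaries are assumptions.

```python
from collections import Counter
from typing import Dict, Hashable, List, Tuple


def group_stats(
    person_group: Dict[Hashable, int], firm_group: Dict[Hashable, int]
) -> List[Tuple[int, int, int]]:
    """Return (group, n_persons, n_firms) rows sorted from largest to smallest group."""
    n_persons = Counter(person_group.values())
    n_firms = Counter(firm_group.values())
    rows = [(g, n_persons[g], n_firms[g]) for g in set(n_persons) | set(n_firms)]
    # Largest groups first (by persons, then firms); ties broken by group number.
    rows.sort(key=lambda r: (-r[1], -r[2], r[0]))
    return rows


if __name__ == "__main__":
    persons = {"p1": 1, "p2": 1, "p3": 1, "p4": 2}
    firms = {"f1": 1, "f2": 1, "f3": 2}
    rows = group_stats(persons, firms)
    total = sum(r[1] for r in rows)
    for g, n_p, n_f in rows:
        print(f"group {g}: {n_p} persons, {n_f} firms ({100 * n_p / total:.1f}% of persons)")
```

On real labor market data, the first row should hold the large connected group, followed by many small groups.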

Identification

The final stage involves calculating the constant, bringing in the parameter estimates, imposing the identification rule, and decomposing earnings into various components (constant, xb, experience, person, firm, h=person + exper).
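The decomposition can be sketched per observation under an assumed additive model, y = constant + xb + person + firm + residual, with h = person + exper. The Python sketch below computes h and the residual for one observation. It assumes, for illustration only, that xb is the full covariate index and already contains the experience terms, with exper being the experience portion of xb. If your specification keeps experience outside xb, add it to the fitted value. The Obs record and its field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Obs:
    """One person-year observation (hypothetical field names)."""
    y: float         # dependent variable
    constant: float  # backed-out constant
    xb: float        # covariate index; ASSUMED to already include the experience terms
    exper: float     # experience portion of xb
    theta: float     # person effect
    psi: float       # firm effect


def decompose(o: Obs) -> dict:
    h = o.theta + o.exper  # human capital: h = person + exper
    # Residual under the assumed additive model y = constant + xb + theta + psi + e.
    resid = o.y - (o.constant + o.xb + o.theta + o.psi)
    return {"h": h, "resid": resid}


if __name__ == "__main__":
    print(decompose(Obs(y=3.0, constant=1.0, xb=0.75, exper=0.5, theta=0.25, psi=0.5)))
```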

Go to the 03_cgpost directory
Open the config_param.sas file with a text editor. Set the depvar, persid, and firmid macro variables. Set the betadir macro variable to the directory where you ran CG2. A few lines further down, set the rhs macro variable to the right-hand-side variables used in your CG2 run.
Run the 00_setup.sas program. The first time you use it, run it twice; otherwise the program will not finish properly. The program automatically creates the cg.coef file used by the other programs and sets up links to cg.betas, cg.in, and cg.means.
Run the 01_rhs.ksh shell script with cg as its first argument. The script generates a SAS program that creates a SAS dataset (rhs.sas7bdat) containing the coefficient estimates (betas) on the covariates from the CG2 run.
Run the 02_means_2v3.ksh script with the argument cg. The script generates a SAS program that creates a SAS dataset (means02.sas7bdat) containing the means of the dependent and right-hand-side variables.
Run the 03_constant.sas program. Creates the constant using the property that a regression with a constant term passes through the means (a = mean(y) - mean(x)b).
Run the 04_fe_read.sas program. Reads in the groups, person effects, and the firm effects and creates SAS datasets for each of them in the same location where CG2 was run (groups.sas7bdat, theta.sas7bdat, psi.sas7bdat).
Run the 05_xb.sas program. Calculates the Xb and experience index for each observation (stored in xb.sas7bdat). Depending on how experience is specified in your model, you may need to modify this program.
Run the 06_join_all.sas program. Brings all of the components (groups, person, firm, xb, exper) together into one file (hcest1.sas7bdat).
Run the 07_identify.sas program. The first step is identifying the model. The person effects are set to mean zero within each group. In contrast, the firm effects are not constrained to mean zero within each group; the extra degree of freedom is used to estimate an additional firm effect. The firm effects are set to mean zero for the entire sample only. At this point everything is almost ready, except that there are usually some groups in which we cannot separately identify the person and firm effects (groups with only one person and one firm). For these groups I randomly draw a person effect and a firm effect from distributions similar to the overall distributions. Finally, the program calculates our measure of human capital (h) and the residuals.
YOU ARE FINISHED!!!
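The identification step in 07_identify.sas can be sketched as follows. This is one normalization consistent with the rule described there, written under stated assumptions: each observation carries its group, person effect (theta), firm effect (psi), and constant; group means are observation-weighted; the within-group mean of the person effects is moved into that group's firm effects; and the overall firm-effect mean is moved into the constant, so theta + psi + constant is unchanged for every observation. The function name identify is hypothetical, and the random draws for groups with one person and one firm are not covered.

```python
from collections import defaultdict
from typing import Dict, List


def identify(rows: List[Dict[str, float]]) -> List[Dict[str, float]]:
    """Normalize person and firm effects; returns new row dicts.

    Each row needs keys 'group', 'theta', 'psi', 'constant'.
    """
    if not rows:
        return []

    # Observation-weighted mean person effect within each group (ASSUMED weighting).
    sums = defaultdict(float)
    counts = defaultdict(int)
    for r in rows:
        sums[r["group"]] += r["theta"]
        counts[r["group"]] += 1
    group_mean = {g: sums[g] / counts[g] for g in sums}

    # Person effects mean zero within each group. Every firm sits in exactly one
    # group, so adding the group mean to psi keeps psi a firm-level effect.
    out = []
    for r in rows:
        m = group_mean[r["group"]]
        out.append({**r, "theta": r["theta"] - m, "psi": r["psi"] + m})

    # Firm effects mean zero over the whole sample; the constant absorbs the shift.
    psi_bar = sum(r["psi"] for r in out) / len(out)
    for r in out:
        r["psi"] -= psi_bar
        r["constant"] += psi_bar
    return out


if __name__ == "__main__":
    data = [
        {"group": 1, "theta": 1.0, "psi": 0.5, "constant": 2.0},
        {"group": 1, "theta": 3.0, "psi": 0.5, "constant": 2.0},
        {"group": 2, "theta": 0.5, "psi": -1.0, "constant": 2.0},
        {"group": 2, "theta": 1.5, "psi": -1.0, "constant": 2.0},
    ]
    for row in identify(data):
        print(row)
```

Because each adjustment is offset elsewhere in the same observation, the normalization changes how the fit is split among the components without changing the fitted values.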
