CG2

Fixed Effects Estimation Software

Experienced users can skip directly to the Downloads and HowTo sections.

Overview

CG2 is a package of Fortran 90 and SAS programs that estimate non-nested two-component fixed effects models. The programs have been successfully compiled and tested on multiple *nix platforms: IA32 (SUSE and Red Hat Linux), Itanium (SUSE Linux), and Sparc (Solaris). The code should work on other platforms as well (e.g., IA32 Windows). A port to Stata is maintained by Amine Ouazad and is also available on this site (see the Stata branch in the repository).

The estimation algorithms were developed to solve large-scale wage models with fixed person and firm effects. In the typical scenario, a person's earnings and place of employment are observed over time, and persons move across firms. This mobility, or "connectedness," of persons is what makes the model estimable. For problems with millions of persons and firms, however, obtaining results requires a substantial amount of computing power. For users with large problems to solve, the main constraint will be acquiring a computing platform with sufficient physical memory (RAM). For example, the largest problem we have solved required 137GB of physical memory and took zzzz hours of CPU time on a Sun Fire 12k server. This does not imply that CG2 is inefficient: we know of no software that can solve a problem of the same size using fewer system resources. To further reduce single-system-image memory requirements, a cluster-aware version of CG2 is under development.
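
To make the idea of connectedness concrete, the following Python sketch groups persons and firms that are linked by at least one employment record. It is not part of CG2 and is not the algorithm CG2 uses; it is a plain union-find over hypothetical (person, firm) pairs, and the function name and toy identifiers are assumptions made for illustration only.

```python
def connected_groups(records):
    """Group persons and firms linked by employment records.

    records: iterable of (person_id, firm_id) pairs (hypothetical layout).
    Returns a list of sets; each set holds ("person", id) and ("firm", id)
    nodes that belong to the same connected group.
    """
    parent = {}

    def find(node):
        # Register unseen nodes as their own root, then walk to the root,
        # shortening the path along the way (path halving).
        parent.setdefault(node, node)
        while parent[node] != node:
            parent[node] = parent[parent[node]]
            node = parent[node]
        return node

    for person_id, firm_id in records:
        root_person = find(("person", person_id))
        root_firm = find(("firm", firm_id))
        if root_person != root_firm:
            parent[root_person] = root_firm  # merge the two groups

    groups = {}
    for node in list(parent):
        groups.setdefault(find(node), set()).add(node)
    return list(groups.values())


# Toy data: p1 moves between firms fA and fB, which links p2 to fA as well.
# p3 and fC are connected to nobody else.
toy_records = [("p1", "fA"), ("p1", "fB"), ("p2", "fB"), ("p3", "fC")]
toy_groups = connected_groups(toy_records)
print(len(toy_groups), sorted(len(g) for g in toy_groups))
```

In this toy example the single move by p1 is what ties fA and fB into one group; without it, the two firms could not be compared with each other.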

If you need additional information, please see the References section below.

Acknowledgements

John Abowd, Robert Creecy, Kevin McKinney, and Lars Vilhuber; the Census Bureau and the rest of the LEHD staff.

Getting Started

To use the CG2 package, certain basic software must be available on a current *nix platform. (The software will likely port to Windows, but we have not tested this. If anyone succeeds in using this software on IA32 Windows, please let us know and we will post the information here.)

The first requirement is a suitable Fortran 90/95 compiler. The software is known to work with the Intel compilers on IA32 and Itanium and with the Sun compilers on Sparc/Solaris. The Intel compilers are available for Linux and Windows and are free of charge for non-commercial users. The Sun compilers are available for Linux and Sparc/Solaris, but are NOT available free of charge. Other Fortran compilers will likely work as well, but are not supported.

The second requirement is the SAS statistical and data management software package from the SAS Institute. SAS is available for a wide range of hardware/software combinations. If SAS is not available for your platform, other software packages MIGHT be used to pre- and post-process the data, but this is not supported.

A suitable shell environment such as Bash or Ksh, while not strictly required, is very desirable. Either shell should be available on virtually all *nix platforms.

Even if the above requirements can be met, the user must ensure that their computing platform has sufficient physical memory. Not surprisingly, the amount of memory required depends directly on the size of the problem you would like to solve. To determine the size of the problem, four characteristics of the input data must be calculated: the number of cells, persons, firms, and covariates. The number of persons and firms, as well as the number of covariates (right-hand-side variables), should be relatively easy to ascertain. Give the number of covariates some thought and use an upper bound, since memory usage increases almost linearly with it. The cells total is the sum over all persons of the number of unique firms each person has worked for. A sample SAS program to calculate this value is available here.
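
As an illustration of the cells definition above, here is a minimal Python sketch. It is not the sample SAS program referred to in the text; the function name and the input layout (one (person, firm) pair per observed record) are assumptions made for this example.

```python
def problem_size(records):
    """Count cells, persons, and firms in employment records.

    records: iterable of (person_id, firm_id) pairs, possibly with repeats
    when a person is observed at the same firm in several periods
    (hypothetical layout).
    """
    # A cell is a distinct (person, firm) combination, so counting distinct
    # pairs equals summing, over persons, the number of unique firms each
    # person has worked for.
    cells = set(records)
    persons = {person_id for person_id, _ in cells}
    firms = {firm_id for _, firm_id in cells}
    return {"cells": len(cells), "persons": len(persons), "firms": len(firms)}


# Toy data: p1 appears twice at fA, which still counts as a single cell.
toy_records = [("p1", "fA"), ("p1", "fA"), ("p1", "fB"), ("p2", "fB"), ("p3", "fC")]
sizes = problem_size(toy_records)
print(sizes)
```

For real data with millions of records, the same distinct count is usually better done in the data management software that already holds the data. The number of covariates is not derived from these records and has to be supplied separately.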

Once you have obtained the number of cells, persons, firms, and covariates, enter them into the calculator and press the calculate button. The bottom row then shows the minimum memory (in megabytes) required to run CG2.

CG2_4v3 Memory Calculator
Person/Firm Cells:
Persons:
Firms:
Covariates:
Minimum Memory Required in MB:
If the value returned by the calculator is less than about 0.7 times your physical memory and you can meet all of the other requirements, then CONGRATULATIONS, you are ready to begin installing CG2. Keep in mind that the 0.7 figure is only an estimate based on our experience and will likely vary across platforms depending on how much memory is used by the operating system, other users, daemons, etc. Feel free to replace our estimate with a number appropriate for your situation.
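
The rule of thumb above can be written as a small check. This Python sketch is illustrative only: the function and parameter names are assumptions, and the required-memory figure must come from the calculator, whose formula is not reproduced here.

```python
def fits_in_memory(required_mb, physical_mb, usable_fraction=0.7):
    """Return True if the calculator's figure is below the usable share of RAM.

    required_mb:     minimum memory reported by the calculator, in MB
    physical_mb:     physical memory installed on the machine, in MB
    usable_fraction: share of RAM assumed to be available to CG2; 0.7 is the
                     rough estimate from the text and should be adjusted
                     for your own system
    """
    if not 0 < usable_fraction <= 1:
        raise ValueError("usable_fraction must be in (0, 1]")
    return required_mb < usable_fraction * physical_mb


# Hypothetical machine with 16 GB (16384 MB) of RAM: about 11469 MB usable.
print(fits_in_memory(10000, 16384))
print(fits_in_memory(12000, 16384))
```

On a shared machine where other users or daemons take a large share of memory, passing a smaller usable_fraction gives a more cautious answer.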

Downloads

The CG2 software is available in the VirtualRDC/LDI GitHub repository. You can download either the entire source tree, including this documentation, at https://github.com/labordynamicsinstitute/cg2/ or individual packages in the subdirectories thereof, as specified on the downloads page.

HowTo

Various HowTos are available to guide you through installing and then using CG2.

References

The primary reference is Abowd, Creecy, and Kramarz, "Computing Person and Firm Effects Using Linked Longitudinal Employer-Employee Data," LEHD Technical Paper TP-2002-06 (also available here).

Contact Us

If you have any questions or comments, please contact Kevin McKinney at kevinm@ccrdc.ucla.edu.
The home page of CG2 is at http://www.vrdc.cornell.edu/cg2/.