Experienced users can skip directly to the Downloads/HowTo section.
Overview
CG2 is a package of Fortran 90 and SAS programs that estimate
non-nested two-component fixed-effect models. The programs have been
successfully compiled and tested on multiple *nix platforms: IA32
(SUSE and Red Hat Linux), Itanium (SUSE Linux), and SPARC (Solaris).
The code should also work on other platforms (e.g., IA32 Windows),
although we have not tested this. A port to Stata is maintained by
Amine Ouzad and is also available on this site (see the Stata branch
in the repository).
The estimation algorithms were developed to solve large-scale wage
models with fixed person and firm effects. In the typical scenario, a
person's earnings and place of employment are observed over time, with
persons moving across firms. This mobility, or "connectedness", of
persons is what enables the estimation of the model, although for
problems with millions of persons and firms, obtaining results
requires a substantial amount of computing power.
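To make the idea of "connectedness" concrete, the sketch below groups persons and firms that are linked by worker mobility, using a simple union-find over (person, firm) pairs. It is only an illustration in Python, not part of the CG2 package; the function name connected_groups and the sample identifiers are hypothetical.

```python
def connected_groups(records):
    """Group persons and firms linked by employment spells.

    `records` is an iterable of (person_id, firm_id) pairs. Persons and
    firms are treated as nodes of a bipartite graph; each pair is an edge.
    Returns a list of sets, one per connected group of nodes.
    """
    parent = {}

    def find(node):
        # Register unseen nodes as their own root, then walk to the root
        # with path halving.
        parent.setdefault(node, node)
        while parent[node] != node:
            parent[node] = parent[parent[node]]
            node = parent[node]
        return node

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    for person_id, firm_id in records:
        # Tag the ids so a person and a firm with the same id never collide.
        union(("person", person_id), ("firm", firm_id))

    groups = {}
    for node in list(parent):
        groups.setdefault(find(node), set()).add(node)
    return list(groups.values())


# Hypothetical sample: p1 moves between fA and fB, p2 works at fB,
# p3 only ever works at fC.
sample = [("p1", "fA"), ("p1", "fB"), ("p2", "fB"), ("p3", "fC")]
groups = connected_groups(sample)
print(len(groups))
```

In this sample, p1's move links fA and fB into one group, while p3 and fC form a separate group because no worker connects them to the rest.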
For users with large problems to solve, the main constraint will be
acquiring a computing platform with sufficient physical memory (RAM).
For example, the largest problem we have solved required 137GB of
physical memory and took zzzz hours of CPU time on a Sun Fire 12k
server. However, this does not imply that CG2 is inefficient, since no
known software can solve a problem of the same size using fewer system
resources. To further reduce the memory required on a single system
image, a cluster-aware version of CG2 is under development.
If you need additional information, please see the References section below.
Acknowledgements
John Abowd, Robert Creecy, Kevin McKinney, and Lars Vilhuber.
Census Bureau and the rest of the LEHD staff.
Getting Started
To use the CG2 package, certain basic software must be available on a
current *nix platform. (The software will likely port to Windows
without difficulty, but we have not devoted time to testing this
assertion. If anyone succeeds in using this software on IA32 Windows,
please let us know and we will post the information here.)
The first requirement is a suitable Fortran 90/95 compiler. The
software is known to work with the Intel compilers on IA32 and Itanium
and with the Sun compilers on SPARC/Solaris. The Intel compilers are
available for Linux and Windows and are free of charge for
non-commercial users. The Sun compilers are available for Linux and
SPARC/Solaris, but are NOT available free of charge. Other Fortran
compilers will likely work as well, but are not supported.
The second requirement is the SAS statistical/data management software
package from the SAS Institute. SAS is available for a wide range of
hardware/software combinations. If SAS is not available for your
platform, other software packages MIGHT be used to pre- and
post-process the data, but this is not supported.
A suitable shell environment such as Bash or Ksh, while not strictly
required, is very desirable. Either shell should be available on
virtually all *nix platforms.
Even if the above requirements can be met, the user must ensure that
their computing platform has sufficient physical memory. Not
surprisingly, the amount of memory required depends directly on the
size of the problem you would like to solve. To determine the size of
the problem, four characteristics of the input data must be
calculated: the number of cells, persons, firms, and covariates. The
number of persons and firms, as well as the number of covariates
(right-hand-side variables), should be relatively easy to ascertain.
Give the number of covariates some thought and use an upper bound,
since memory usage increases almost linearly with it. The number of
cells is the sum, over all persons, of the number of unique firms each
person has worked for. A sample SAS program to calculate this value is
available here.
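As an illustration of the cells definition above (and not a replacement for the sample SAS program), the following Python sketch counts cells, persons, and firms from a list of (person, firm) observations. The function name count_cells and the sample identifiers are hypothetical; the input is assumed to hold one pair per observed employment record, possibly with repeats across time.

```python
def count_cells(records):
    """Return (cells, persons, firms) for (person_id, firm_id) records.

    Summing, over all persons, the number of unique firms each person
    has worked for is the same as counting distinct (person, firm) pairs.
    """
    pairs = set()
    for person_id, firm_id in records:
        pairs.add((person_id, firm_id))
    persons = {person_id for person_id, _ in pairs}
    firms = {firm_id for _, firm_id in pairs}
    return len(pairs), len(persons), len(firms)


# Hypothetical sample: repeated observations of the same person at the
# same firm (e.g. several years) contribute only one cell.
sample = [
    ("p1", "fA"), ("p1", "fA"), ("p1", "fB"),
    ("p2", "fB"),
    ("p3", "fB"), ("p3", "fC"), ("p3", "fC"),
]
cells, persons, firms = count_cells(sample)
print(cells, persons, firms)  # 5 3 3
```

Here p1 contributes two cells (fA, fB), p2 one (fB), and p3 two (fB, fC), for five cells in total.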
Once you have obtained the number of cells, persons, firms, and
covariates, enter them into the calculator and press the calculate
button. The bottom row will show the minimum memory (in megabytes)
required to run CG2.
If the value returned by the calculator is less than about
0.7 * (physical memory) and you can meet all of the other
requirements, then congratulations: you are ready to begin installing
CG2. Keep in mind that the 0.7 figure is only an estimate based on our
experience and will likely vary across platforms depending on how much
memory is used by the operating system, other users, daemons, etc.
Feel free to replace our estimate with a number appropriate for your
situation.
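The rule of thumb above can be written as a small check. This is a minimal Python sketch, assuming you already have the calculator's result in megabytes; the function name fits_in_memory and the example figures are hypothetical, and the default fraction of 0.7 is simply the estimate quoted above.

```python
def fits_in_memory(required_mb, physical_mb, usable_fraction=0.7):
    """Apply the rule of thumb: required < usable_fraction * physical.

    required_mb      minimum memory reported by the calculator (MB)
    physical_mb      physical memory installed on the machine (MB)
    usable_fraction  share of physical memory assumed to be available
                     to CG2 (0.7 is only an estimate; adjust as needed)
    """
    if not 0 < usable_fraction <= 1:
        raise ValueError("usable_fraction must be in (0, 1]")
    return required_mb < usable_fraction * physical_mb


# Hypothetical machine with 16 GB (16384 MB) of RAM:
# 0.7 * 16384 = 11468.8 MB is treated as usable.
print(fits_in_memory(10_000, 16_384))  # True
print(fits_in_memory(12_000, 16_384))  # False
```

On a busy shared machine a smaller usable_fraction is the safer choice.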