Economics 321:  Applied Econometrics

Prof. George Jakubson



               A Quick Primer on SAS Commands



A SAS job has two distinct components.



1.   A DATA step (or steps) readies a dataset for analysis.

2.   A PROC (for procedure) step (or steps) performs the

     analysis on the dataset created by a prior DATA step.



I'm not going to show you everything there is to know about

SAS, but enough so you can do the exercises for this course

and be able to do similar things for other courses (e.g. for

a term paper).



I.   General Comments



1.   SAS commands end with a semicolon (;).  They can span

     multiple lines - the program considers everything up to

     the next semicolon as part of the same logical

     statement.  A common source of errors is forgetting a

     semicolon.
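
A minimal sketch of this point. The dataset name DEMO and the variable TOTAL are made up for illustration. Each statement below is spread over several lines, and SAS treats it as one statement because it stops only at the semicolon.

```sas
DATA DEMO ;                  /* hypothetical dataset name */
  INPUT A
        B
        C ;                  /* one INPUT statement written over three lines */
  TOTAL = A + B
          + C ;              /* one assignment written over two lines */
  DATALINES ;
1 2 3
;
RUN ;
```

Deleting the semicolon after C would make SAS run the INPUT and assignment lines together as a single statement, which is the kind of error described above.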



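2.   In all environments, comments start with an asterisk

     (*) and end with a semicolon, as follows:

          ** this is a comment ;

     Because the semicolon ends it, a comment of this kind

     cannot itself contain a semicolon.  In some but not all

     environments, the following syntax will also work:

          /* this is also a comment sometimes*/

     The examples in this primer use this second form; if it

     gives you errors, switch to the asterisk form.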



3.   There are two kinds of variables.  Numeric variables

     contain numerical values (numbers) and character

     variables contain text.  Therefore, the number 1 and

     the character '1' are different.  Means, regressions,

     etc., can only be calculated on numerical values.

     Frequency distributions are the exception - they can be

     calculated on character variables (but it's a real

     nuisance).
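
A small sketch of the difference between the number 1 and the character '1'. The dataset TYPES and the variables CODE, NUM, FROMCODE, and FROMNUM are hypothetical. CODE is read as character (note the $) and NUM as numeric, so each comparison uses a value of the matching type: quotes for CODE, a bare number for NUM.

```sas
DATA TYPES ;                       /* hypothetical dataset */
  INPUT CODE $ NUM ;               /* CODE is character, NUM is numeric */
  DATALINES ;
1 1
2 2
;
RUN ;

DATA TYPES2 ;
  SET TYPES ;
  IF (CODE EQ '1') THEN FROMCODE = 1 ; ELSE FROMCODE = 0 ;  /* character vs. character */
  IF (NUM EQ 1)    THEN FROMNUM  = 1 ; ELSE FROMNUM  = 0 ;  /* number vs. number */
RUN ;

PROC MEANS DATA=TYPES2 ;   /* no VAR statement: only the numeric variables NUM, FROMCODE, FROMNUM */
RUN ;
```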

     





II.  The DATA Step version 1 (reading in raw data from

     within the program)



DATA NEW ; /* create the temporary dataset called new */

 INPUT A $ B C ;

          /* read in the variables A, B, and C.  The $ after A

          makes A a character variable; B and C are numeric */

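          /* The data go on the lines immediately following

          the CARDS statement.  Anything placed there, even a

          comment, would be read as data, so comments must

          come before CARDS */

 CARDS ;

George 1 2

Jen    3 4

;

          /* a semicolon on a line of its own marks the end

          of the data */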

RUN ; /* run command marks the end of a DATA or PROC step */
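
Two details about reading raw data this way, shown in a small sketch. First, DATALINES is another name for CARDS. Second, with this simple style of INPUT a character variable gets room for only 8 characters unless you say otherwise, so a longer name is cut short. The LENGTH statement (12 is an arbitrary choice) and the third data line are hypothetical additions for illustration.

```sas
DATA NEW ;
  LENGTH A $ 12 ;          /* assumption: allow up to 12 characters; default would be 8 */
  INPUT A $ B C ;
  DATALINES ;              /* same meaning as CARDS */
George 1 2
Jen    3 4
Christopher 5 6
;
RUN ;
```

Without the LENGTH statement, the third name would be stored as 'Christop'.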



III. The DATA Step version 2 (reading in data from an

     existing dataset and creating new variables)



DATA NEW2; /* create a new temporary dataset called new2 */

SET NEW ; /* read in the data from a dataset called new.  If

          both names are the same, this will overwrite the

          original dataset */

LNB = LOG(B) ;

          /* take the natural logarithm of variable B */

IF (C GT 3) THEN DUM1 = 1 ; ELSE DUM1 = 0 ;

          /* Create a dummy variable which takes the value 1

          if variable C is greater than 3 and 0 otherwise.

          You have the following operators available to you:

          EQ for equals, NE for not equal to, GT for greater

          than, GE for greater than or equal to, LT for less

          than, LE for less than or equal to, AND for a

          logical and, OR for a logical or, NOT for a

          logical not, MIN and MAX for minimum and maximum,

          respectively. */

D = (B+LNB)**3 ;

          /* create a new variable D using arithmetic

          operations on existing variables.  You have the

          following symbols available:  + for addition, -

          for subtraction, * for multiplication, / for

          division, and ** for exponentiation.  You can use

          parentheses to group operations as I did above. */

IF (A EQ 'Jen') ;

          /* this is a subsetting if statement - only keep

          those observations for which the variable A takes

          the value 'Jen'.  Note that this is a character

          variable, so the value we match must be character

          (in quotes) and not numeric.  The match is also

          case-sensitive: 'jen' would not match */

IF (B LT 2) ;

          /* Keep the observation if variable B has a value

          less than 2.  Note that successive subsetting if

          statements will have a cumulative effect - in this

          example, only those observations for which A

          equals 'Jen' and B is less than 2 will be kept. */

RUN ; /* end the data step */
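
It helps to trace this step by hand on the two observations from section II. The sketch below repeats the calculations without the subsetting IFs (the dataset name ALL is hypothetical), with the resulting values noted in comments, and then applies both conditions in one IF joined by AND, which has the same effect as the two separate subsetting IFs. Neither sample observation passes both conditions (George fails A EQ 'Jen', Jen fails B LT 2), so NEW2 ends up with no observations, and any PROC run on NEW2 with these sample data has nothing to analyze.

```sas
DATA NEW ;
  INPUT A $ B C ;
  DATALINES ;
George 1 2
Jen    3 4
;
RUN ;

DATA ALL ;                        /* hypothetical name: same calculations, no subsetting */
  SET NEW ;
  LNB = LOG(B) ;                  /* George: 0       Jen: about 1.0986 */
  IF (C GT 3) THEN DUM1 = 1 ; ELSE DUM1 = 0 ;   /* George: 0   Jen: 1 */
  D = (B+LNB)**3 ;                /* George: 1       Jen: about 68.85 */
RUN ;

DATA NEW2 ;
  SET ALL ;
  IF (A EQ 'Jen') AND (B LT 2) ;  /* one IF with AND = the two subsetting IFs */
RUN ;
```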





IV.  The PROC Step



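PROC steps perform analyses.  The syntax varies with the

procedure, but they all start with

  PROC procname DATA=dataset ;

If you leave out DATA=, SAS uses the most recently created

dataset, so naming the dataset explicitly is safer.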



I'll sketch out means, correlations, frequencies, and

regression for you:



     A.   PROC MEANS to get means, standard deviations, etc.



PROC MEANS DATA=NEW;

          /* take means of variables in dataset new */

 VAR B C ;

          /* only analyze variables B and C.  If not

          included, the default action is to analyze all

          numeric variables */

  RUN ; /* end the PROC step */
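
By default PROC MEANS prints N, mean, standard deviation, minimum, and maximum; you can instead list the statistics you want on the PROC statement. A runnable sketch on the section II data (the choice of N MEAN STD is just an example):

```sas
DATA NEW ;
  INPUT A $ B C ;
  DATALINES ;
George 1 2
Jen    3 4
;
RUN ;

PROC MEANS DATA=NEW N MEAN STD ;   /* print only these three statistics */
  VAR B C ;
RUN ;
```

For these data the means are 2 (B) and 3 (C), and both standard deviations are the square root of 2, about 1.414.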



     B.   PROC CORR to get correlations



PROC CORR DATA=NEW ;

  VAR B C ;

  RUN ;
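
A runnable version on the section II data. One caution worth seeing for yourself: with only two observations, two variables that both vary are always perfectly correlated (two points always lie on a line), so the correlation printed here is 1 and tells you nothing. Use more data for anything real.

```sas
DATA NEW ;
  INPUT A $ B C ;
  DATALINES ;
George 1 2
Jen    3 4
;
RUN ;

PROC CORR DATA=NEW ;   /* Pearson correlations by default */
  VAR B C ;
RUN ;
```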



     C.   PROC FREQ to get frequency distributions



PROC FREQ DATA=NEW2 ; /* DUM1 was created in NEW2, not NEW */

  TABLES DUM1 A DUM1*A ;

          /* TABLES tells the procedure which variables to

          analyze.  To get a frequency distribution on a

          variable, include its name in the tables command.

          To get a 2-way crosstabulation of the values of

          DUM1 against the values of A, use the DUM1*A

          syntax. */

 RUN ;
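
A self-contained sketch. Because DUM1 is created in a DATA step, PROC FREQ has to be pointed at the dataset that step produced. The step below is named NEW2 as in section III, but it leaves out the subsetting IFs so both observations survive. The TABLES statement then gives a one-way table for DUM1, one for A, and a 2 x 2 crosstabulation of the two.

```sas
DATA NEW ;
  INPUT A $ B C ;
  DATALINES ;
George 1 2
Jen    3 4
;
RUN ;

DATA NEW2 ;                 /* DUM1 exists only once this step creates it */
  SET NEW ;
  IF (C GT 3) THEN DUM1 = 1 ; ELSE DUM1 = 0 ;
RUN ;

PROC FREQ DATA=NEW2 ;
  TABLES DUM1 A DUM1*A ;    /* two one-way tables and one crosstab */
RUN ;
```

In the crosstab, George falls in the DUM1 = 0 row and Jen in the DUM1 = 1 row.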



     D.   PROC REG to run regressions



PROC REG DATA=NEW2 ; /* LNB and DUM1 were created in NEW2 */

  MODEL LNB = C DUM1 ;

          /* The MODEL command specifies a regression

          equation to be estimated from the data.  It starts

          with the word MODEL.  The next element is the name

          of the dependent (Y) variable.  Then there is an

          equals sign.  Then come the names of the

          explanatory (X) variables.  By default SAS will

          include an intercept for you. */

  RUN ;

  QUIT ; /* PROC REG stays active after RUN; QUIT ends it */
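
The two observations in section II are too few to estimate an intercept plus two slopes, so this sketch uses four made-up observations instead (the dataset TOY and the variables Y and X are hypothetical) with one explanatory variable, small enough to check by hand. The least squares line for these numbers is Y = 2 + 0.7 X, so SAS should report an intercept of 2 and a slope of 0.7.

```sas
DATA TOY ;                  /* hypothetical data, for illustration only */
  INPUT Y X ;
  DATALINES ;
2 1
4 2
5 3
4 4
;
RUN ;

PROC REG DATA=TOY ;
  MODEL Y = X ;             /* intercept included by default */
RUN ;
QUIT ;                      /* PROC REG keeps running until QUIT */
```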



V.   Temporary and Permanent SAS datasets



SAS datasets are either permanent or temporary.



     a.   A temporary dataset has a one-level name, for

          example, new.  Temporary datasets are erased when

          the job has completed.

     b.   A permanent dataset has a two-level name, for

          example, sasdat.new.  Permanent datasets remain in

          existence until they are explicitly deleted.  The

          first level of the name (sasdat, above) refers to

          the location of the directory which contains the

          dataset.  That is specified using a LIBNAME

          statement:

     

               LIBNAME sasdat 'directory location' ;

               

          so to make the current directory the location you

          could put '.' (Un*x speak for the current

          directory) or to make it /usr2/gj10 you would put

          '/usr2/gj10'.  My examples use temporary datasets;

          to use permanent datasets, just use two-level names.
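
Putting the pieces together, here is a minimal sketch of creating a permanent dataset in the current directory (the '.' location from the LIBNAME example above) and then using it. It assumes a Unix system where '.' means the current directory. In a later job you would repeat the LIBNAME statement and use the two-level name wherever you would have used NEW.

```sas
LIBNAME sasdat '.' ;       /* '.' = current directory, as in the text */

DATA sasdat.new ;          /* two-level name, so the dataset is permanent */
  INPUT A $ B C ;
  DATALINES ;
George 1 2
Jen    3 4
;
RUN ;

/* In this job or a later one (after the same LIBNAME statement), */
/* refer to the dataset by its two-level name.                     */
PROC MEANS DATA=sasdat.new ;
  VAR B C ;
RUN ;
```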



There's lots more that one can do, but these basics will

cover the vast majority of the tasks you'll ever need.  If

you start using SAS more regularly, or for an honors thesis,

the manuals are a reasonable investment.  Alternatively, check
out the information under "Introduction to SAS" for a middle ground between
this primer and the (expensive) investment in manuals.