C&C Short Course
SAS for C&C Computers R344
(Related: SAS Graphics R354)
Section I. Overview and Summary
SAS Software
SAS (once "Statistical Analysis System") is a software package used for DATA ACCESS, MANIPULATION, STATISTICAL ANALYSIS, REPORT WRITING and the production of high resolution GRAPHICS.
Availability of SAS
SAS is available on the C&C Unix cluster Mead/Goodall
Windows (8.2) and Mac versions (6.12) are available via a campus site license.
See the Summary on http://depts.washington.edu/sasclass/
SAS Language/Commands
SAS LANGUAGE consists of LIBRARIES of COMMANDS (SAS statements, keywords, etc.) and SYNTAX rules about how the commands may be combined. The command libraries are invoked in a SAS DATA or PROC STEP. The SAS DATA STEP is used to DEFINE data as a matrix of VARIABLES and OBSERVATIONS (cases) and to MANIPULATE DATA using the PROGRAMMING language. The PROC STEPs are used to perform a specific TASK (e.g., sort, print, correlation).
A SAS PROGRAM contains DATA and/or PROC STEP(s), each with commands to perform some TASK on DATA. Data are referred to by variable names, which must begin with a letter or underscore (_) and may contain up to 8 alphanumeric characters. SAS statements end with a SEMICOLON (;). Spacing of commands and statements is up to you and can be used to make the program easier to read.
Sample SAS PROGRAM:
The following SAS program creates a working SAS dataset called "students" with the variables age, height, weight and sex. Observations with incomplete data
are deleted from the dataset. The dataset is printed, and descriptive statistics and graphics are requested:
OPTIONS LINESIZE=80 NOCENTER; /* linesize 80, left justified */
DATA students;
INPUT age 1-2 height 4-5 weight 7-9 sex $ 11;
IF age= . OR height = . OR weight = . OR sex = ''
THEN DELETE; /* drop observations with incomplete data */
/* one line of data as an example */
CARDS;
14 64 110 M
;
PROC SORT; /* the BY statements below need data sorted by sex */
BY sex;
PROC PRINT;
VAR sex age height weight;
PROC MEANS n mean min max std maxdec=2;
VAR age height weight;
BY sex;
PROC CORR;
VAR AGE;
WITH HEIGHT WEIGHT;
GOPTIONS DEVICE=TEK4010;
PROC GPLOT;
PLOT HEIGHT*AGE=sex WEIGHT*AGE=SEX;
/* SYMBOLn follows the sorted values of sex: SYMBOL1 = F, SYMBOL2 = M */
SYMBOL1 V='F' I=JOIN L=1;
SYMBOL2 V='M' I=JOIN L=2;
TITLE1 F=DUPLEX C=BLUE 'Height & Weight by Sex';
RUN;
QUIT; /* GPLOT keeps running until QUIT or the next step */
Invoking SAS
SAS commands can be entered interactively or a FILE of SAS commands can be created with an editor and then submitted to the SAS software.
Submitting a sas program (batch mode):
Mead% sas filename <cr>
For example on Mead (Goodall uses the same syntax):
mead% sas prog1
submits the file "prog1.sas" to SAS; control returns to the operating system when the program ends. SAS creates a log file with a report of the run (prog1.log) and a listing file with the output from the run (prog1.lst).
SAS OUTPUT
SAS creates a LOG file containing errors, notes, warnings and other information from the run of the program. OUTPUT is placed in a LISting file.
Examples:
Unix
Program name: pgm1.sas Log name: pgm1.log Listing Name: pgm1.lst
Summary - Using SAS in non-interactive mode -
o Create a SAS program, e.g., try1.sas, using any editor
o Run the SAS program using the sas command: sas try1
  SAS writes a log of the run to fn.log and any output to fn.lst,
  e.g., try1.log and try1.lst
o Examine the .log (e.g., try1.log) to check the run for errors; debug as needed.
Interactive SAS
Interactive SAS (best on a PC or Mac) is started by clicking on the SAS Icon or by starting SAS from the pull down menus. Once SAS has started you can choose the task from the Pull down menus (e.g., FILE READ) or start SAS ASSIST (GLOBALS ASSIST).
More about SAS LANGUAGE and PROGRAMS
SAS Program Components
Most SAS programs will contain four parts:
1. File definitions - define input and output files
Create a key word (fileref or libref) pointing to a file or directory name
2. Options and/or Goptions statements - set options
Set the linesize for output and other options
3. DATA step(s) - read, create, combine data
Name a sas data set and describe data using input statement
4. PROC step(s) - act on data: Utilities, Statistics, Graphics.
proc print; proc sort; proc means ; proc freq; etc....
The four parts are combined into a program to read data:
/* comments can be placed in a file between slash star - star slash */
/* use the Filename statement to define the input file */
filename in 'student.data';
/* use the data statement to name the dataset and start the datastep */
data students;
/* the infile statement points to the input dataset */
infile in;
/* the input statement describes the format of the input data */
input name $ 1-8 age 10-11 grade 13;
/* print the data */
proc print data=students;
/* get the mean age */
proc means data=students;
var age;
/* sort by grade so we can analyze by grade */
proc sort data=students;
by grade;
/* get the mean age for each value of grade */
proc means data=students;
by grade;
var age;
The same program as it might be typed into a file, e.g., "rdstdat.sas".
Note: SAS statements end in semicolons (;);
spacing and indentation can be used to make code
easier to read. Case is also up to you and is only
important for the value of a character variable (e.g., 'New York').
/* rdstdat.sas */
filename in 'student.data';
data students;
infile in;
input name $ 1-8 age 10-11 grade 13;
proc print data=students;
proc means data=students;
var age;
proc sort data=students;
by grade;
proc means data=students;
by grade;
var age;
o Running the program.......... (at the "Mead% " prompt)
Mead% sas rdstdat
or to run in background (unix) add an ampersand:
Mead% sas rdstdat &
When the program finishes......
Examine the log: rdstdat.log for notes, errors and warnings.
Examine the .lst: rdstdat.lst for the output from proc print and means
More Detail on the SAS DATA step
The SAS DATA step is used to access, manage and manipulate data.
Use the sas data step to:
1. Enter Data into a SAS data set
2. Name SAS data sets (temporary or permanent)
3. Create sub-sets, limit or separate cases.
4. Combine (interleave and/or concatenate) data sets
5. Drop, rename, create, label variables.
6. Modify, manipulate data using the programming language.
The SAS DATA step processes each line of data read (or created) one CASE at a time and executes all the statements in the data step once for each observation or case.
SAS Dataset
The SAS DATA step creates a SAS dataset which stores the data by variable name along with any associated labels or formats.
A SAS dataset may be pictured as ROWs of CASES or observations in
which COLUMN fields represent the VARIABLES.
SAS dataset Naming
A sas data set is created by naming it on the DATA statement:
DATA students;
The dataset name must start with a letter or underscore (_)
The dataset name must be 1-8 alphanumerics in length
Temporary data sets
SAS names each dataset using the convention WORK.name, where you provide "name". SAS refers to the data set as "WORK.name" (e.g., WORK.students) in the .log file. You may omit "WORK" when naming or referring to a SAS data set
in a SAS program, e.g., proc print data=students; Temporary datasets are erased at the end of a SAS run.
Permanent data sets - saving a sas dataset
The data statement is used to save a sas dataset.
(data sets can also be output from sas procedures)
A double name is used, e.g., data mylib.students;
A Libname statement is used to define 'mylib':
libname mylib 'mydirectory';
libname mylib '~jamieson/sasdat';
Note that the libname points to the directory only *not*
to a specific filename. The libname statement
tells sas where to write the file...which in this
example will be named: students.ssd02
'mylib', created by libname, is just a pointer..
or library name.
The symbol created with the libname statement is used on
the data statement to name the dataset, e.g., mylib.name.
Sample Program to save a sas dataset
The sample program is repeated (this time as 'mkstdat.sas')
The program reads a 'raw' or ascii file: student.data
and writes students.ssd02.
/* mkstdat.sas */
libname mylib '~jamieson/sasdat';
filename in 'student.data';
data mylib.students;
infile in;
input name $ 1-8 age 10-11 grade 13;
Sample Program to read a saved sas dataset
The libname statement is used to indicate where the datasets are stored. The double name (e.g.,mylib.students) is used on the "proc" statements to refer to the data. The following program analyzes a previously saved sas dataset:
/* stdmns.sas */
libname mylib '~jamieson/sasdat';
proc print data=mylib.students;
proc means data=mylib.students;
var age;
proc sort data=mylib.students;
by grade;
proc means data=mylib.students;
by grade;
var age;
More sas data step notes:
Default SAS Language
The SAS system provides DEFAULT LANGUAGE underlying the DATA and PROC steps. The user may take advantage of these defaults or override and/or add to them. The defaults allow quick construction of simple programs without the need to learn all of the programming language.
Programming Language
The SAS DATA step offers an extensive PROGRAMMING LANGUAGE with a vocabulary of commands and functions similar to those found in such languages as Fortran, C, and Basic. Additional programming capacity is available using SAS MACRO language. PROC IML provides an Interactive MATRIX Language. Custom GRAPHICS can be produced using SAS DATA step and/or MACRO language in combination with a library of ANNOTATE graphics commands and macros.
PROCedures "PROC" Library.
The PROC library contains a set of procedures called by name (e.g.,PROC PRINT) which perform a specific task on a SAS dataset. SAS PROCs (except CONVERT) work only on SAS datasets. PROCs act on the last data set created in a SAS program or a named data set. (see examples).
Data is MODIFIED (subset, recoded, etc.) in a SAS DATA step, then passed to a PROC. PROC syntax is limited to a specific set of commands unique to each PROC. PROC commands refer to data using SAS variable names. Data can be passed
back and forth between PROC and DATA steps. The results from a PROC can be OUTPUT for printing, passed to another PROC or DATA step, or written to disk or tape.
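As a sketch of passing PROC results on to another step, the program below reuses the four observations from example 1 in Section II; the dataset name htmeans and the variable name meanht are made up for illustration. PROC MEANS with NOPRINT and an OUTPUT statement writes one observation per BY group to a new SAS dataset instead of printing a report, and that dataset can then be read by PROC PRINT or by any DATA step.

```sas
/* Data from example 1 (Section II); htmeans and meanht are illustrative names */
DATA physical;
  INPUT age height weight sex $;
  CARDS;
14 55 120 M
12 46 110 F
19 66 150 M
13 52 136 M
;
PROC SORT DATA=physical;           /* BY processing needs sorted data     */
  BY sex;
PROC MEANS DATA=physical NOPRINT;  /* no printed report ...               */
  BY sex;
  VAR height;
  OUTPUT OUT=htmeans MEAN=meanht;  /* ... the results go to a new dataset */
PROC PRINT DATA=htmeans;           /* one observation per value of sex    */
RUN;
```

Besides sex and meanht, htmeans also holds the automatic variables _TYPE_ and _FREQ_ that PROC MEANS adds to its output dataset.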
The PROC Library includes:
Utilities:  PRINT       print
            SORT        sort
            TRANSPOSE   transpose a SAS dataset (observations become variables and vice versa)
            CONTENTS    directory of a SAS dataset
            COPY        copy, export
            DELETE      delete a SAS dataset or library
            DATASETS    utilities...replaces DELETE
            CONVERT     convert some datasets to SAS
            FORMAT      create formats for variable values
            OPTIONS or GOPTIONS   show current settings
            GTESTIT     test graphics device
Report writing: PRINT, CHART, PLOT, CONTOUR, CALENDAR, FORMS
Descriptive Statistics: MEANS, FREQ, UNIVARIATE, SUMMARY,
TABULATE, CORR
Statistical Analysis: ANOVA, GLM, TTEST, FSQUARE, FACTOR,
CLUSTER, CATMOD
Graphics: GCHART, GPLOT, G3D, GMAP, GSLIDE, ANNOTATE.
OPTIONS and GOPTIONS:
OPTIONS sets program parameters such as LINESIZE, PAGESIZE,
OBS, affecting printing and the number of observations
processed. PROC OPTIONS; shows current default options.
e.g., OPTIONS OBS=0; /* check syntax only, no observations */
OPTIONS ls=72 ps=24; /* linesize 72 pagesize 24 */
GOPTIONS sets graphics options such as DEVICE, GPROTOCOL,
GSFNAME, GSFMODE, affecting the type of graphics
output which is created and where it is directed.
TITLE, FOOTNOTE, NOTE statements are used to write text
on output and can be part of a DATA or PROC step
or they can stand alone.
SYMBOL and PATTERN statements specify symbols and patterns
used in SAS/Graph (e.g., GPLOT, GCHART).
File Definitions - define input and output files for SAS.
Command syntax varies by Operating system. Input and Output
files can be defined for reading and writing 'Raw' data and
SAS datasets. Filedefs can also indicate output files for graphics.
Filedefs are covered in later sections.
Section II. BASICS Getting Started with SAS
STEPS and STATEMENTS
A SAS program is a series of SAS DATA and/or PROC STEPS, each containing a series of SAS STATEMENTS. With few exceptions all SAS code will be part of a DATA Step or PROC STEP.
Use the DATA step language to DEFINE and MODIFY (e.g.,subset, recode) DATA. Use the PROC step for specific TASKS (e.g., sort, print, statistics, graphics).
The DATA step is begun with the sas DATA statement and optionally
the name you assign to the data set:
e.g., DATA students; or DATA A B C;
The PROC step is begun with the SAS PROC statement and the name
of the PROC, e.g., PROC MEANS; or PROC MEANS DATA=students;
SAS DATA STEP LANGUAGE is composed of STATEMENTS which contain
a SAS KEYWORD, FUNCTION or OPERATOR.
SAS statements END with a SEMICOLON (;).
SAS code can be written in UPPER or LOWER CASE.
Case is observed only in character variable values.
KEYWORDs and VARIABLE names must be separated by at least one SPACE unless an operator or delimiter is used (e.g., X=MEAN(X,Y);). Spacing is otherwise not important. You can indent and space your SAS code to make it easier to read. Statements may be continued on the next line.
SAS VARIABLE NAMES begin with a LETTER or Underscore (_) and
may contain up to EIGHT alphanumeric characters.
SAS COMMENTS are not executed and allow you to
annotate your program or turn statements on and off.
Comments take two forms:
The first type begins with an asterisk (*) and ends with the next semicolon (;), e.g., *this is a comment;
*PROC PRINT; *off ;
A second form of comment begins with '/*' and ends with '*/' and may occupy any valid space,
e.g., /* this is a valid comment */
Example using comments and variable spacing:
The following are all acceptable and yield the same result,
but vary in the spacing of statements:
/*1*/ DATA a;SET b;y=2*x;street='Main';PROC PRINT;
/*2*/ DATA a; /*start a data step and name the data set 'a' */
SET b; /* get the SAS dataset 'b' to create 'a' */
y=x*2; /*create variable y equal to 2 times x */
street='Main'; /*create a character variable called */
/* 'street' set the value of street */
/* to 'Main' for all observations */
PROC PRINT; /*print the data */
/*3*/ DATA a; SET b; y = 2 * x ; street= 'Main' ;
PROC
PRINT;
Multiple commands and statements may occur on one line or a single statement may continue to the next line in a file. The SEMICOLON ends SAS statements.
SAS DATA STEP is a LOOP
SAS executes each statement in your program in the order written (some exceptions are noted later, e.g., KEEP). SAS executes the statements in your DATA step once for each observation read in or created. Thus the DATA step in SAS is itself a loop, performing each statement in the DATA step over and over until all observations are processed. SAS stops when the last observation is processed.
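Variables read with an INPUT statement or created by assignment statements are set to MISSING at the start of each DATA STEP loop, before a new observation is read or created. (Variables read with SET or MERGE, and variables named on a RETAIN statement, are not reset.) This is important to remember if you write your own program for summing. (See RETAIN in section 4.)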
It is important to remember that SAS reads DATA in sequence, one observation at a TIME. So if SAS is on observation 3 (3rd case) it sees only that case. LAG and DIF functions (see section 4) are available to compare the values of variables from the current observation with their values in a previous observation. The RETAIN statement can also be used to hold variable values throughout a SAS DATA step.
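A minimal sketch of the reset-to-missing behaviour and of RETAIN; the dataset running and the variables amount, total and nototal are hypothetical. TOTAL is retained and accumulates a running sum, while NOTOTAL, computed the same way but without RETAIN, is reset to missing on every loop and so never gets a value (the log will show a note about missing values being generated).

```sas
/* Hypothetical names: running, amount, total, nototal */
DATA running;
  INPUT amount;
  RETAIN total 0;              /* TOTAL keeps its value from loop to loop */
  total = total + amount;      /* running sum: 10, 30, 60                 */
  nototal = nototal + amount;  /* reset to missing each loop, so missing  */
                               /* plus AMOUNT is always missing           */
  CARDS;
10
20
30
;
PROC PRINT DATA=running;
RUN;
```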
START with the DATA step to input your data.
Unless you are starting with SAS datasets which you received on tape or from another user or you are using PROC CONVERT or FILE IMPORT to convert data from another format (e.g.,SPSS, Excel, OSIRIS) you will start with a SAS DATA step and INFILE and INPUT statements to read in your data and create a SAS dataset.
The DATA statement names the dataset(s) to be created: DATA students; the infile statement names the raw datafile: infile 'student.data'; or if used with filename:
filename in 'student.data';
data students;
infile in;
the input statement describes the columns of data: input name $ grade;
MODIFY DATA and CREATE variables in the DATA step.
You would also use the SAS DATA step to make any modifications in the data, create subsets of the data, new variables and the like. For example, the creation of a dataset which contains only males from a larger dataset is done with the DATA step.
The DATA STEP begins with the DATA STATEMENT and ends when the program encounters another DATA step, a PROC step, a RUN statement, an OPTIONS or GOPTIONS statement or the end of your SAS program.
The DATA STEP may contain any number of STATEMENTS from the DATA step library. Each statement will be executed once for each observation read in.
EXAMPLES
The first example uses the DATA statement to name the
data set and load the SAS DATA step library.
The INPUT statement describes the format of the data to be
read in and assigns variable names.
The CARDS statement indicates that the data is in the program
file and follows the CARDS statement:
/* example 1 data in the file, free form list input */
DATA physical;                  /* create data set 'physical'       */
INPUT age height weight sex $;  /* input 4 variables; '$' indicates */
                                /* character data                   */
/* CARDS indicates that the data lines follow */
CARDS;
14 55 120 M
12 46 110 F
19 66 150 M
13 52 136 M
;
PROC PRINT;                     /* print the data set               */
PROC MEANS;                     /* call PROC MEANS                  */
VAR age;                        /* get the mean for variable AGE    */
RUN;
The DATA statement names the SAS dataset(s) being created.
Any number of DATA sets can be created in one statement e.g.:
DATA a b x males females; /*creates five data sets */
If you do not name the DATA set SAS uses a DATAn naming
convention to name the first dataset created DATA1, the second
DATA2, and so on. If you do not need to create a data set but
want to use the DATA step language you can use the special
designation _NULL_, e.g., DATA _NULL_;
If you name a dataset which has already been created SAS will
overwrite the old data set. e.g.,
DATA a;
INPUT x y z;
DATA a;
INPUT c d e;
The data from the first DATA set 'a' containing x, y and z will be
overwritten and lost.
SAS names each data set you create 'WORK.name',
e.g., 'WORK.a' above. You refer to the data set omitting 'WORK',
e.g., PROC PRINT DATA=a;
The dataset exists for the duration of the program and is deleted from memory or disk when the program finishes.
To write out a SAS dataset you must give it a double name, replacing WORK with your own DD (data definition) name.
Under Unix you first define a "DDname" using the LIBNAME statement. A 'DDname' (also called a libref) is merely a symbol which represents the directory where you want SAS to store or read SAS datasets.
e.g., LIBNAME MYDISK '~user/sasdat';
DATA MYDISK.students;
will create a SAS data set students.ssd02 in directory ~user/sasdat.
More about writing out SAS and RAW data will be included in
a subsequent section covering VMS and DOS as well.
The INPUT statement
The INPUT statement can only be used in the DATA step. It describes to SAS the format of the data to be read from a file (on disk or tape) and assigns variable names, which can be 1 to 8 alphanumeric characters long but must begin with a letter (a-z) or underscore (_).
The INPUT statement can take three forms:
list or free form   INPUT A B C $;
column              INPUT A 1-2 B 4-5 C $ 6-7;
formatted           INPUT @1 A 2. @4 B 2. @6 C $2.;
CHARACTER DATA is indicated with a DOLLAR SIGN ($).
Character data are fields which contain letters or symbols.
Numeric data contain only digits, signs (+-) and decimal points.
The formats can be combined or mixed on an input statement.
LIST INPUT lists the variables in the order they occur in the data. Each data value must be separated from the next by at least one blank. Missing values must be represented by a period (.) or some other code; otherwise SAS, by default, moves on to the next value it finds and reads it, and your data may be misread. SAS assigns the default
format and length (8) for variables unless overridden with an INFORMAT or LENGTH statement.
input age height weight sex $ ;
COLUMN INPUT names the variables and the columns in which the data can be found. The data in example 1 could be read with the following column style input:
INPUT age 1-2 height 4-5 weight 7-9 sex $ 11;
The advantage of column input is that missing data may be left blank, as SAS will only look for the data in the columns indicated. The data must be in column form; no spaces between variable values are necessary. Fields may be skipped and data input in any order (e.g., you can input from columns 20-30 and then from columns 6-9).
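To see the difference in how missing values are handled, here is a small sketch with example 1's layout and one height value missing; the dataset names colin and listin are hypothetical. With column input the blank field in columns 4-5 is simply read as missing, while with list input the gap has to be marked with a period so that 120 is not taken as the height.

```sas
/* Hypothetical names: colin, listin. Height is missing in the first line. */
DATA colin;                                    /* column input */
  INPUT age 1-2 height 4-5 weight 7-9 sex $ 11;
  CARDS;
14    120 M
12 46 110 F
;
DATA listin;                                   /* list input   */
  INPUT age height weight sex $;
  CARDS;
14 . 120 M
12 46 110 F
;
PROC PRINT DATA=colin;
PROC PRINT DATA=listin;
RUN;
```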
FORMATTED INPUT names the variables, the starting position and the format to be used to read the data (i.e. length and type). The data in example 1 could be read with the following formatted input style:
INPUT @1 age 2. @4 height 2. @7 weight 3. @11 sex $2. ;
(a six-column numeric field whose last 2 digits are decimal places
would be read as: INPUT @1 pay 6.2; so that 123456 is read as 1234.56).
Formatted input requires a starting column, variable name and format. As with column input data can be read from a given line in any order and fields can be read more than once e.g.:
INPUT @1 FIRSNAME $8. @9 LASTNAME $12. @1 WHOLNAM $CHAR20.;
(the $CHAR character format allows embedded blanks in the field,
e.g., NEW YORK, JOHN SMITH)
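A short sketch of the w.d informat described above; the dataset payroll and variable pay are illustrative. The first data line has no decimal point, so the 2 implied decimal places are applied; the second line contains its own decimal point, which overrides the implied decimals.

```sas
/* Hypothetical names: payroll, pay */
DATA payroll;
  INPUT @1 pay 6.2;   /* 6 columns, 2 implied decimal places            */
  CARDS;
123456
12.50
;
/* line 1 is read as 1234.56; line 2 has an explicit point: 12.5 */
PROC PRINT DATA=payroll;
RUN;
```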
All three INPUT STYLES can be COMBINED in one input statement.
Abbreviations can be used in the input statement such as:
INPUT (x1-x1000) (1.); /* reads 1000 one-column variables */
SAS also provides pointer control language allowing precise control
of the input statement. Some examples include:
/ skip to next card
# n go to card n
@ column go to column
@@ hold the card for further input
(see SAS Language Guide for more input syntax).
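For example, @@ lets one data line hold several observations. In this sketch (the dataset pairs and the variables x and y are hypothetical) the two data lines yield four observations: (1,2), (3,4), (5,6) and (7,8).

```sas
/* Hypothetical names: pairs, x, y */
DATA pairs;
  INPUT x y @@;   /* @@ holds the line for the next loop of the DATA step */
  CARDS;
1 2 3 4 5 6
7 8
;
PROC PRINT DATA=pairs;
RUN;
```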
CARDS and CARDS4 statements. Use the CARDS statement to indicate that the data to be read is included in the SAS program file. The end of the data can be indicated by a semicolon in column 1, a DATA statement, or a PROC statement. Use CARDS4 if the data contain semicolons, and indicate the end of the data with
four semicolons in columns 1-4.
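A minimal sketch of CARDS4 (the dataset notes and its variables code and text are hypothetical): the data values contain semicolons, so the end of the data is marked with four semicolons in columns 1-4 instead of a single semicolon.

```sas
/* Hypothetical names: notes, code, text */
DATA notes;
  INPUT code $ text $;
  CARDS4;
A1 x;y
B2 p;q
;;;;
PROC PRINT DATA=notes;
RUN;
```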
DATA in a Separate file: INFILE.
INFILE allows you to bring in data from a file on tape or disk.
The INFILE statement is part of a DATA step and refers to a file definition as the source of data. The infile statement follows the DATA statement and
precedes the INPUT statement. The file definition varies by operating system:
Unix (Mead) example reads the data from
unix file 'physical.dat'
FILENAME bbb 'physical.dat';
DATA a;
INFILE bbb;
INPUT age height weight sex $;
PROC PRINT;
VMS (MAX) example reads the data from
VMS file 'physical.dat;1':
FILENAME bbb 'physical.dat;1';
DATA a;
INFILE bbb;
INPUT age height weight sex $;
PROC PRINT;
SET, MERGE Accessing SAS datasets (Concatenate, Interleave)
SET and MERGE are part of the DATA step language and are used to bring SAS data set(s) (not raw data) into a DATA step.
SET is used to concatenate data sets (one after the other adding cases)
MERGE is used to combine data sets side by side, matching observations by common variable(s) (adding variables). To match-merge with a BY statement, the data sets must first be sorted by the common variable.
Use PROC SORT to make sure the data are properly sorted.
Example using SET to concatenate two datasets:
DATA males;
INPUT name $ age height sex $;
CARDS;
Smith 41 72 M
Jones 38 78 M
Brown 33 70 M
;
DATA females;
INPUT name $ age height sex $;
CARDS;
Burns 32 67 F
James 49 59 F
Allen 30 70 F
;
DATA both;
SET males females;
PROC PRINT;
run;
Data set 'both' would look like this:
Smith 41 72 M
Jones 38 78 M
Brown 33 70 M
Burns 32 67 F
James 49 59 F
Allen 30 70 F
Example using MERGE to interleave two datasets by a common variable:
DATA scores;
INPUT name $ score;
CARDS;
Smith 14.7
Jones 12.6
Brown 14.2
Burns 13.1
James 11.6
Allen 14.0
;
PROC SORT; /* sorts dataset 'scores' by name */
BY name;
PROC SORT DATA=both; /* sorts dataset 'both' by name */
BY name;
DATA combo;
MERGE both scores;
BY name;
PROC PRINT;
The dataset 'combo' would look like this:
Allen 30 70 F 14.0
Brown 33 70 M 14.2
Burns 32 67 F 13.1
James 49 59 F 11.6
Jones 38 78 M 12.6
Smith 41 72 M 14.7
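In SAS terms, interleaving (the second word in the SET, MERGE heading) is done with SET plus a BY statement: observations from sorted input datasets are combined in BY-variable order rather than one dataset after the other. A sketch using cut-down copies of the males and females data, under the hypothetical names m1, f1 and inter, with only name and age kept and the rows already in name order:

```sas
/* Hypothetical names: m1, f1, inter; both inputs are already in NAME order */
DATA m1;
  INPUT name $ age;
  CARDS;
Brown 33
Smith 41
;
DATA f1;
  INPUT name $ age;
  CARDS;
Allen 30
Burns 32
;
DATA inter;
  SET m1 f1;   /* without BY this would concatenate: Brown Smith Allen Burns */
  BY name;     /* with BY the observations are interleaved by name           */
PROC PRINT DATA=inter;
RUN;
```

The resulting order is Allen, Brown, Burns, Smith. As with MERGE, unsorted inputs would need a PROC SORT first.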
Section III. BASICS Manipulating SAS datasets
Subsetting, Renaming Variables, Recoding Variable Values,
Dropping Variables, Deleting Cases.
This section describes some of the more fundamental DATA step
programming language used to:
SUBSET data sets by limiting or deleting observations,
DROP, ADD and RENAME variables
change variable values
SAS keywords used include RENAME, DROP, KEEP, OUTPUT
DELETE, IF...THEN..ELSE, DO, END and SELECT..WHEN.
Most SAS OPERATORS are also used in this section
DATA step language
All of the language described in this section is a part of the SAS DATA step and must be invoked after a DATA statement. A DATA step ends when a PROC, RUN, or DATA statement is encountered or the end of a file of code is reached.
In the examples the SAS KEYWORDS are in UPPERCASE and user defined variables are in lowercase. In SAS programming both may be used. Examples are commented using the SAS comment syntax /* comment */
To subset or act on SAS datasets, the most commonly used SAS statement is a conditional IF that tests variable values for inclusion or exclusion.
TESTING VARIABLE VALUES
The general form for testing SAS variable values is:
IF varname operator value THEN SasStatement;
Example                  Meaning
IF ID = 0 THEN DELETE;   if ID is 0, delete the case from the dataset
IF ID NE 0;              if ID is not 0, include the case in the dataset
Note the second example, which illustrates the special SAS convention (the "subsetting IF") for testing a variable value for inclusion in the dataset. The two statements above result in the same action.
example:
DATA males; /*create data set 'males' , start datastep*/
SET master; /*bring in data set 'master' */
IF sex='M'; /*if the value for the variable sex is 'M'*/
/*include the observation in the data set */
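The two equivalent forms can be checked side by side. A sketch with a small hypothetical dataset master (variables id and sex): keep1 uses the subsetting IF, keep2 uses IF...THEN DELETE, and both end up with the same two observations; males applies the example just given to master.

```sas
/* Hypothetical names: master, keep1, keep2 */
DATA master;
  INPUT id sex $;
  CARDS;
0 M
1 F
2 M
;
DATA keep1;               /* subsetting IF: keep cases where the test is true */
  SET master;
  IF id NE 0;
DATA keep2;               /* IF...THEN DELETE: drop cases where it is true    */
  SET master;
  IF id = 0 THEN DELETE;
DATA males;               /* the males example, applied to master             */
  SET master;
  IF sex = 'M';
RUN;
```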
more examples below... first a few more SAS CONVENTIONS
used in TESTING Variable Values.
MISSING DATA
NUMERIC MISSING data is represented by a period (.):
e.g., IF N = . THEN DELETE;
CHARACTER MISSING data is represented by two single quotes (''):
e.g., IF NAME = '' THEN DELETE;
SAS OPERATORS and comparison symbols:
OPERATION           SYMBOL     EXAMPLES
equal               = or EQ    IF A = B;                IF A EQ B;
not equal           ^= or NE   IF A ^= . THEN DELETE;   IF A NE B;
less than           < or LT    IF A < B THEN OUTPUT;    IF A LT B;
greater than        > or GT    IF A > 0 THEN X=1;       IF A GT 0 THEN X=1;
less than or =      <= or LE   IF A <= 1;               IF A LE 1;
greater than or =   >= or GE   IF A >= 0;               IF A GE 0;
and                 & or AND   IF A = 1 & B = 2;        IF A = 1 AND B = 2;
or                  | or OR    IF X = 1 | X = 2;        IF X = 1 OR X = 2;
addition            +          A = B + C;
subtraction         -          D = X - Y;
multiplication      *          Z = Q * 4;
division            /          Q = Z / 4;
power               **         T = (X)**3;
(NOTE: Use the alphabetic comparison forms (EQ, NE, etc.) to TEST variable values but NOT to ASSIGN values; = serves both purposes, e.g.:
A = 1; is valid
A EQ 1; is not
IF X EQ 10 THEN Y = 4; is valid
IF X EQ 10 THEN Y EQ 4; is not valid)
SUBSETTING - Limiting a data set to specific cases
Multiple data sets can be created and combined in one data step. You can selectively OUTPUT the datasets named on the DATA statement according to variable values. Use IF...THEN with OUTPUT to conditionally OUTPUT cases to each dataset. (Could also be done with SELECT..WHEN...OTHERWISE).
DATA A B C D E REST;
MERGE ONE TWO THREE;
BY ID;
IF SCORE='A' THEN OUTPUT A;
ELSE IF SCORE='B' THEN OUTPUT B;
ELSE IF SCORE='C' THEN OUTPUT C;
ELSE IF SCORE='D' THEN OUTPUT D;
ELSE IF SCORE='E' THEN OUTPUT E;
ELSE OUTPUT REST;
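The example above assumes that datasets ONE, TWO and THREE already exist. A self-contained sketch of the same IF...THEN OUTPUT pattern, using a hypothetical dataset grades, is shown below; each observation is written to exactly one of a, b or rest.

```sas
/* Hypothetical names: grades, a, b, rest */
DATA grades;
  INPUT id score $;
  CARDS;
1 A
2 B
3 A
4 F
;
DATA a b rest;
  SET grades;
  IF score = 'A' THEN OUTPUT a;
  ELSE IF score = 'B' THEN OUTPUT b;
  ELSE OUTPUT rest;           /* everything that is not A or B */
RUN;
```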
LIMITING CASES according to variable values :
IF age NE . ; /* let in only if age is not missing */
IF age = . THEN DELETE; /* same result */
IF age NE . AND sex NE ' ' AND name NE ' ';
/* cases with missing age, sex or name (sex and name are character) are excluded */
Example:
DATA a; /*create dataset 'a' and start data step*/
SET b; /* bring in data set 'b' (created above)*/
IF age ne . AND name ne ''; /* include only cases for which age*/
/* and name (character) are not missing*/
RECODING Variables
Simple assignment:
e.g.,
data a;
set b ;
age=dinterv - dbirth; /* interview date minus birth date */
x=y=2;                /* NOT a double assignment: x is 1 if y equals 2, else 0 */
Conditional assignment:
data a;
set b;
IF 1 LE age LE 30 THEN agecat=1;
ELSE IF 31 LE AGE LE 60 THEN AGECAT=2;
ELSE IF AGE > 60 THEN AGECAT=3;
/* if age is between 1 and 30 (inclusive), set agecat to 1; when    */
/* this test is true the two ELSE IF statements are skipped. If it  */
/* is false, the second test is checked, and only if the second is  */
/* also false is the third checked.                                 */
Some notes about IF...THEN and IF...THEN ELSE...:
IF THEN statements can cause trouble because the computer is more
literal than we humans expect. It is best to spell out each comparison.
Some examples of common problems:
IF age=30 AND age=20 THEN agecat=2;
/* agecat will never be set to 2, as age cannot be 20 and 30 at the */
/* same time; SAS looks at each observation one at a time. What     */
/* should be used is the following:                                 */
IF age=30 OR age=20 THEN agecat=2;
IF score=1 or score=2 then scorcat=1;
IF score=3 or score=4 then scorcat=2;
IF score=5 or score=6 then scorcat=3;
IF score=7 or score=8 then scorcat=4;
IF score=9 or score=10 then scorcat=5;
ELSE scorcat= 0;
/* this will result in values for scorcat of 5 and 0 only!!!*/
/* the ELSE works with the LAST IF statement in this case */
/* so that any value for score which is not 9 or 10 will yield*/
/* a value for scorcat of 0 ! */
/* there are many solutions to this, one of which follows */
scorcat=0;
IF score=1 or score=2 then scorcat=1;
IF score=3 or score=4 then scorcat=2;
IF score=5 or score=6 then scorcat=3;
IF score=7 or score=8 then scorcat=4;
IF score=9 or score=10 then scorcat=5;
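Another of the many solutions is to chain the tests with ELSE IF, so that the final ELSE applies only when every test has failed. A sketch, with a hypothetical dataset scored and made-up test scores; @@ is used only to put several scores on one data line.

```sas
/* Hypothetical names: scored, score, scorcat */
DATA scored;
  INPUT score @@;
  IF score=1 OR score=2 THEN scorcat=1;
  ELSE IF score=3 OR score=4 THEN scorcat=2;
  ELSE IF score=5 OR score=6 THEN scorcat=3;
  ELSE IF score=7 OR score=8 THEN scorcat=4;
  ELSE IF score=9 OR score=10 THEN scorcat=5;
  ELSE scorcat=0;      /* now applies only when every test above failed */
  CARDS;
1 4 6 8 10 11
;
PROC PRINT DATA=scored;
RUN;
```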
An alternative to the IF...THEN...ELSE construction uses
SELECT...WHEN...OTHERWISE to recode variables or assign new ones.
SELECT WHEN OTHERWISE........e.g.,
DATA NEW;
SET OLD; X=UNIFORM(0);
SELECT (A);
WHEN (1) X=X*10;
WHEN (2);
WHEN (3) X=X*100;
OTHERWISE X=1; /* only values of A that match none of */
END;           /* the WHEN statements; compare the    */
               /* IF THEN ELSE problems above!        */
Multiple Assignments
While any number of conditions may be specified on the left side of an IF...THEN clause, the right side may contain only one statement:
e.g., IF.... THEN A=1 AND B=2; does not assign both values; SAS reads it as A = (1 AND B=2), so A is set to 1 or 0 and B is unchanged.
Use a DO and END statement to make multiple assignments:
(see also SELECT WHEN above)
e.g.,
DATA a;
SET b;
IF X =1 THEN DO;
a=1;
b=2;
END;
IF x =2 THEN DO;
a=4;
b=6;
END;
LIMITING VARIABLES in a dataset
DROP and KEEP
DROP removes variables listed.
e.g.,DROP age name;
KEEP keeps only those variables listed.
e.g.,KEEP age height;
If no DROP or KEEP statement is used all variables remain in the dataset. Even if variables are dropped they are available for calculation for the duration of the datastep but are not output.
Examples:
DATA a;
SET b;
age = today - dbirth;
KEEP AGE;
DATA new;
SET old;
age = x1;
DROP x1;
DROP and KEEP are also available as data set options on the DATA and SET statements:
e.g., DATA A (KEEP=age height);
SET b;
or DATA A;
SET b (KEEP=age height) c (KEEP=age height);
RENAMING Variables
The RENAME statement allows variable NAMES to be changed.
RENAME OLD=NEW;
e.g., RENAME nam=name;
NOTE: DROP and KEEP statements issued in the same data step as a RENAME statement must refer to the OLD, not the NEW, names.
e.g., DATA b; SET a; RENAME x1=age; KEEP x1;
Another way to change variable names (which can be less confusing) is to create a new variable from the old with an assignment statement and then drop the old variable
e.g.,
DATA a2;
SET a1;
name=x1; age=x2; sex=x3;
DROP x1 x2 x3;
is the same as
DATA a2;
SET a1;
RENAME x1=name x2=age x3=sex;
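A sketch of the RENAME/KEEP rule from the NOTE above, using a hypothetical dataset a1 with two observations: KEEP names the OLD variables x1 and x2, yet the output dataset a2 contains them under their NEW names, name and age, while x3 (sex) is dropped.

```sas
/* Hypothetical names: a1, a2 */
DATA a1;
  INPUT x1 $ x2 x3 $;
  CARDS;
Smith 41 M
Brown 33 M
;
DATA a2;
  SET a1;
  RENAME x1=name x2=age x3=sex;
  KEEP x1 x2;           /* KEEP uses the OLD names */
RUN;
PROC PRINT DATA=a2;
RUN;
```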
Section IV. BASICS 2 Further DATA STEP Language
This section covers DATA step language to:
set variable lengths and formats (LENGTH, FORMAT, INFORMAT)
label variables (LABEL)
indicate special missing variable values (MISSING)
use abbreviated variable lists (e.g., x1-x10 and firstvar--lastvar)
handle a number of variables quickly (ARRAY)
invoke functions (e.g.,SUM, ABS, INT, SQRT)
special SAS variables (e.g.,_N_, FIRST.var, LAST.var)
retain variable values for all cases (RETAIN)
branching statements (GOTO, LINK, RETURN)
VARIABLE LENGTHS AND FORMATS
If variable lengths and formats are not specified SAS sets them to 8 columns for both numeric and character data. To override these default lengths use the LENGTH, INFORMAT, and FORMAT statements. These statements can save space or allow you to read variables longer than the default.
LENGTH is used for both INPUT and OUTPUT. INFORMAT for input and FORMAT for output.
LENGTH and INFORMAT are used in the DATA step.
FORMAT can be used in the DATA step or PROC step. If used in the PROC step the formats are temporary for the duration of the proc. If used in the DATA step they are used throughout all subsequent PROCs when the particular DATA set is used.
SAS Language Guide lists the formats available. (PROC FORMAT allows you to create your own formats.)
examples:
DATA a;
LENGTH age height 3 name $ 14; /* 3 is the minimum numeric length on Unix and Windows */
INPUT age height name $;
DATA a;
INFORMAT firsname lastname $15.; /* notice the period (.) */
INPUT firsname lastname;
PROC PRINT;
VAR firsname lastname;
FORMAT firsname lastname $15.;
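To see the difference between a FORMAT stored in the DATA step and one given in a PROC step, here is a small sketch. The data set PAY and its values are made up; DOLLAR10.2 is a standard SAS format.

```sas
DATA pay;                    /* hypothetical data */
  INPUT id salary;
  FORMAT salary DOLLAR10.2;  /* stored with data set PAY */
  CARDS;
1 45000
2 52500.5
;
RUN;

PROC PRINT DATA=pay;         /* SALARY prints as $45,000.00 */
RUN;

PROC PRINT DATA=pay;
  FORMAT salary 8.;          /* temporary: applies to this PROC only */
RUN;

PROC PRINT DATA=pay;         /* back to the stored DOLLAR10.2 */
RUN;
```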
SAS has many INFORMATS for reading numeric and character data. They are too numerous to cover
here. Some examples include:
$CHARn. reads character data with embedded blanks (e.g., NEW YORK)
DATA cities;
INFORMAT ctyname $CHAR12.;
INPUT @1 ctyname $CHAR12.; /* note the input: data start in col 1 */
CARDS;
Seattle
New York
Los Angeles
Denver
St. Louis
;
MMDDYY6. reads a date value of the form 120187 (see the other date
informats)
DATA scores;
INFORMAT score 2. date MMDDYY6.;
INPUT score 1-2 @4 date mmddyy6.;
CARDS;
31 070182
35 070282
35 070382
33 070482
;
PROC PRINT;
VAR score date;
FORMAT date MMDDYY6.; /*without the FORMAT for date the date */
/*would print as the number of days*/
/*since Jan. 1 1960 e.g.,070482 would*/
/*print as 8220 */
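Because a SAS date is simply a count of days since January 1, 1960, dates can be subtracted directly. A minimal sketch (the dates are arbitrary examples) that writes the same date value to the LOG unformatted and with two standard formats:

```sas
DATA _NULL_;
  d = MDY(7, 4, 1982);          /* SAS date: days since 01JAN1960 */
  days = d - MDY(7, 1, 1982);   /* date arithmetic gives 3 days */
  PUT d=;                       /* d=8220 */
  PUT d= MMDDYY8.;              /* d=07/04/82 */
  PUT d= DATE9.;                /* d=04JUL1982 */
  PUT days=;                    /* days=3 */
RUN;
```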
LABELLING Variable NAMES
(See PROC FORMAT and the DATA step FORMAT
statement for LABELLING Variable VALUES.)
It is often convenient to use the SAS abbreviated INPUT form to read data (e.g., INPUT x1-x200;). This is quicker to program and allows abbreviations to be used when naming variables in VAR statements, ARRAYS (see below), KEEP, DROP, LENGTH, INFORMAT, etc. In addition, since SAS allows variable names of only 8 alphanumeric characters, you may need variable labels
to make output easier to read. The LABEL statement associates a longer description (up to 40 characters) with a variable.
example:
DATA a;
LABEL x1='Test One' x2='Test Two' x3='Test Three' x4='Test Four'
x5='Test Five' name='Student Name';
INPUT @1 name $CHAR12. (x1-x5) (3.); /* each score field is 3 cols wide */
CARDS;
Jose Jimenez 33 35 36 38 39
etc.
;
PROC MEANS;
VAR x1-x5;
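PROC PRINT uses variable names as column headings unless its LABEL option is given. A small sketch with made-up data (list input is used here, so names must not contain blanks):

```sas
DATA a;
  LABEL x1='Test One' x2='Test Two';
  INPUT name $ x1 x2;        /* hypothetical data */
  CARDS;
Jose 33 35
Ana 31 36
;
RUN;

PROC PRINT DATA=a LABEL;     /* column headings use the labels */
  VAR name x1 x2;
RUN;
```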
MISSING
While the SAS designation for missing data is a period (.) for numeric data and a blank for character data, you may want to designate your own missing codes.
Use the MISSING statement to designate additional missing values. The missing values may be any of the 26 CAPITAL letters or the underscore (_); a value read as M, for example, is stored as the special missing value .M. Use assignment statements to recode other missing codes to the SAS value (.). The form of the MISSING statement is:
MISSING values;
example
DATA a;
MISSING M;
INPUT x ;
CARDS;
1
2
3
M
4
5
7
8
;
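A sketch combining both approaches: M is declared as a special missing value, and a hypothetical numeric code 999 is recoded to the ordinary missing value. Statistics procedures skip both kinds of missing value.

```sas
DATA a;
  MISSING M;                  /* M in the data is read as .M */
  INPUT x;
  IF x = 999 THEN x = .;      /* recode another missing code to '.' */
  CARDS;
1
M
999
4
;
RUN;

PROC MEANS DATA=a N NMISS MEAN;  /* N=2  NMISS=2  MEAN=2.5 */
  VAR x;
RUN;
```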
Abbreviated Variable Lists
SAS allows two abbreviated forms of variable lists.
The first is a numbered range of the form PREFIXm-PREFIXn
(e.g., x1-x5, score1-score5).
The second, firstvar--lastvar, works by position: it refers to
the first variable named, the last variable named, and every
variable stored between them in the SAS dataset. Position is determined
by the INPUT order. Use PROC CONTENTS DATA=a POSITION; if you
do not know the variables' positions.
e.g.,
FILENAME dat 'physical.dat';
DATA a;
INFILE dat;
INPUT id sex $ age height weight score1-score10;
PROC PRINT;
VAR age--score10;
/*variables ID and SEX would NOT be printed, all the rest would*/
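Both list forms also work inside SAS functions, provided the list is preceded by the keyword OF. A minimal sketch with made-up scores; note that SUM and MEAN ignore missing values:

```sas
DATA a;
  INPUT id score1-score3;           /* hypothetical data */
  total = SUM(OF score1-score3);    /* numbered range: needs OF */
  avg   = MEAN(OF score1--score3);  /* name range: by position */
  CARDS;
1 10 20 30
2 5 . 15
;
RUN;
```

For the second observation TOTAL is 20 and AVG is 10, because the missing SCORE2 is skipped.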
ARRAYS
An ARRAY is a name that stands for a group of variables within the same observation (case).
An ARRAY consists of a NAME, a list of ELEMENTS and an INDEX
variable. One example: ('Implicit' type see below)
ARRAY xvars (xv) x1-x10;
The ARRAY NAME is xvars, the ELEMENTS are variables x1 through x10 and the index variable is xv. When xv is 1 a reference to xvars is the same as a reference to x1. When xv is 10 the statement:
y=xvars is equivalent to y=x10. The value of the index
variable (xv here) is the SUBSCRIPT referred to below.
The ARRAY statement in SAS can be used to treat a number of variables in the SAME CASE in the same way.
e.g.,ARRAY vars (xv) age height weight;
do over vars;
if vars = 99 then vars = .;
end;
There are two forms of the ARRAY statement: the explicitly subscripted array and the implicitly subscripted array. The subscript indicates which element of the array is processed. When the subscript is 1 the first element is processed, when it is 2 the second, and so on.
The form of the explicitly subscripted array is:
ARRAY name{n} elements; n is the number of elements
ARRAY name{n} $ elements; (character data)
The form of the implicitly subscripted array is:
ARRAY name (indexvar) elements; (e.g.,"xv" above)
ARRAY name (indexvar) $ elements;
Note: The DEFAULT INDEX VARIABLE for implicitly subscripted arrays
is "I". Thus for ARRAY invars x1-x10; ARRAY outvars y1-y10;
the index variable for both will be "I" such that
DO I = 1 to 10; outvars = sqrt(invars); end;
will give the values y1-y10 the square roots of the
values of x1-x10.
Examples of explicitly subscripted ARRAYs:
ARRAY rain{5} janr febr marr aprr mayr;
ARRAY month{*} jan feb jul oct nov; /*num.of elements not needed*/
ARRAY x{*} _NUMERIC_; /*indicates all numeric variables */
Usage:
DATA a; /* recodes 9's to SAS missing '.' */
SET in;
ARRAY x{*} x1-x10;
DO I = 1 TO 10;
IF x{I} = 9 THEN x{I} = . ;
END;
and
FILENAME scores 'scores.data';
DATA new;
/*write elements 4 and 6 of TEST (qa4 and qb1) to the file 'scores' */
INPUT qa1-qa10 qb1-qb10;
ARRAY test{10} qa1-qa5 qb1-qb5;
FILE scores;
PUT test{4}= test{6}=;
CARDS;
/* data follows*/
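Rather than hard-coding the upper bound of the loop, the DIM function returns the number of elements in an explicitly subscripted array, so the loop stays correct if the array changes. A minimal sketch with one made-up observation:

```sas
DATA a;
  INPUT x1-x4;
  ARRAY x{*} x1-x4;
  DO i = 1 TO DIM(x);        /* DIM(x) is 4 here */
    IF x{i} = 9 THEN x{i} = .;
  END;
  DROP i;                    /* don't keep the loop index */
  CARDS;
1 9 3 9
;
RUN;
```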
Examples using Implicitly Subscripted Arrays:
DATA a; /* recodes 9's to SAS missing '.' */
SET in;
ARRAY x (I) x1-x10;
DO OVER x; /*do over is the same as do I = 1 to 10 */
IF x = 9 THEN x = . ;
END;
DATA three;
ARRAY test1 (J) t1q1-t1q10;
ARRAY test2 (J) t2q1-t2q10;
ARRAY test3 (J) t3q1-t3q10;
ARRAY answer (K) test1-test3;
INPUT t1q1-t1q10 t2q1-t2q10 t3q1-t3q10;
DO K=1 TO 3;
DO J=1 TO 10;
IF answer=. THEN answer=0;
END;
END;
CARDS;
/*data follows */
The outer DO loop determines which element of ANSWER (ARRAY TEST1, TEST2, or TEST3) is being processed. The inner DO loop determines which element in the current ARRAY (question 1 through 10) is being processed.
Arrays can also be processed with other forms of DO statements such as:
DO UNTIL(expression); /*(See the SAS BASICS for these) */
DO WHILE(expression);
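As a sketch of DO WHILE with an array, the step below finds the position of the first answer coded 9 (0 if there is none). The condition is tested at the top of each pass, and the index check keeps the subscript from running past the end of the array. The data and the variable name FIRST9 are made up.

```sas
DATA a;
  INPUT q1-q5;
  ARRAY q{5} q1-q5;
  first9 = 0;                            /* 0 means no 9 found */
  i = 0;
  DO WHILE (first9 = 0 AND i < DIM(q));  /* tested before each pass */
    i = i + 1;
    IF q{i} = 9 THEN first9 = i;
  END;
  DROP i;
  CARDS;
3 9 4 9 1
1 2 3 4 5
;
RUN;
```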
SAS Functions:
SAS functions are used in the DATA step to manipulate
numeric, character and special types of data. Some examples:
FUNCTION  SUMMARY DESCRIPTION            EXAMPLE
--------  -----------------------------  --------------------------------
ABS       Absolute value                 X=ABS(T);
MAX       Maximum value                  TMAX=MAX(X1,X2,X3);
SQRT      Square root                    P=SQRT((X1-X2)**2 + (Y1-Y2)**2);
ROUND     Round to nearest unit          RX=ROUND(R,.01);
LOG       Log base e                     LX=LOG(X);
MEAN      Arithmetic mean                LM=MEAN(OF L1-L10);
UNIFORM   Pseudo-random uniform          Z=UNIFORM(Y);
          variate (Y is the seed)
LAGn      Value of a variable from       LX3=LAG3(X); (X three
          n observations earlier         observations earlier)
MONTH     Month from a date              MON=MONTH(DATE);
HOUR      Hour from a time or datetime   HR=HOUR(TIME);
MDY       SAS date value from month,     IF DATE=MDY(12,31,87);
          day and year
LAG       Lagged values                  L4=LAG4(L);
LENGTH    Length of a character string   LS=LENGTH(WORD);
SUBSTR    Substring of a character       FIRST=SUBSTR(NAME,1,12);
          string
SEE SAS Language Guide for more Examples.
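A short sketch exercising several of the functions in the table on made-up data:

```sas
DATA f;
  INPUT city $ r;               /* hypothetical data */
  rx    = ROUND(r, .01);        /* 3.14159 -> 3.14 */
  abbr  = SUBSTR(city, 1, 3);   /* first three characters */
  ls    = LENGTH(city);         /* length without trailing blanks */
  prevr = LAG(r);               /* R from the previous observation */
  CARDS;
Seattle 3.14159
Denver 2.71828
;
RUN;
```

LAG returns the value from the previous time the LAG call itself executed, so placing it inside an IF-THEN usually does not give the previous observation's value; call it unconditionally, as here.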
SAS SPECIAL Variables
SAS has a number of special variables and names, most of which are not written to the output data set. They can be used while a DATA step executes. Some examples:
Variable    Description                      Example
----------  -------------------------------  ---------------------------
_N_         Current observation number       IF _N_ = 1 THEN DO; N=_N_;
            (DATA step iteration count)
FIRST.var   First obs. of a BY group         IF FIRST.DATE THEN DO;
LAST.var    Last obs. of a BY group          IF LAST.DATE THEN OUTPUT;
_NUMERIC_   All numeric variables            PROC FREQ; TABLES _NUMERIC_;
_CHAR_      All character variables          PROC FREQ; TABLES _CHAR_;
_ALL_       All variables
_NULL_      Null data set; no data set       DATA _NULL_;
            is created
_INFILE_    The current input line           INPUT; PUT _INFILE_;
Some Examples:
_N_
DATA a;
SET b;
IF _N_ = 1 THEN DO;
sumscor=0;
sumscor + score;
END;
ELSE sumscor + score;
/* For the first observation in the dataset, set sumscor to 0 */
/* (initialize it), then add the first score. NOTE: the SAS   */
/* sum statement 'sumscor + score;' acts like                 */
/* 'sumscor = sumscor + score; RETAIN sumscor;' (and treats a */
/* missing score as 0). Without the implied RETAIN, sumscor   */
/* would be reset to missing each time SAS processes an obs.  */
/* See more under RETAIN below.                               */
FIRST.var LAST.var
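When a data set is SORTED by a variable and then read with SET and a BY statement for that variable, SAS creates the temporary variables FIRST.var and LAST.var. FIRST.var is 1 for the first occurrence of each value of the variable and LAST.var is 1 for the last occurrence of that value; otherwise they are 0.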
This can be useful in processing data in groups by a variable.
Example: a modification of the example using _N_....
PROC SORT DATA=b; by ID; /*sort the data by student ID */
DATA a;
SET b;
BY id; /*bring in the data sorted by student ID */
IF FIRST.ID THEN DO; /*reset sum to zero for each new student*/
sumscor=0;
sumscor + score;
END;
ELSE sumscor + score;
IF LAST.id then OUTPUT; /*only output the last observation for*/
/*each student with ID and sumscor */
PROC PRINT;
VAR id sumscor;
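The same BY-group pattern as a complete, runnable sketch: made-up scores are sorted, then counted, summed and averaged per student, with one output observation per ID.

```sas
DATA b;                       /* hypothetical student scores */
  INPUT id score;
  CARDS;
2 10
1 7
2 5
1 3
;
RUN;

PROC SORT DATA=b;
  BY id;
RUN;

DATA a;
  SET b;
  BY id;
  IF FIRST.id THEN DO;        /* new student: reset the counters */
    n = 0;
    sumscor = 0;
  END;
  n + 1;                      /* sum statements are retained */
  sumscor + score;
  IF LAST.id THEN DO;         /* one output observation per student */
    avgscor = sumscor / n;
    OUTPUT;
  END;
  KEEP id n sumscor avgscor;
RUN;
```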
_NUMERIC_, _CHAR_ and _ALL_
These abbreviations can be used in forming ARRAYS or in PROCS to
represent all variables (_ALL_), the numeric variables (_NUMERIC_)
or the character variables (_CHAR_) in a dataset.
e.g.,ARRAY x (I) $ _CHAR_; /*form an array of all the char. vars*/
PROC MEANS;
VAR _NUMERIC_;
PROC FREQ;
TABLES _CHAR_;
_NULL_ used in the DATA statement (e.g., DATA _NULL_;) tells SAS to run the DATA step programming statements without creating a SAS dataset. This can help conserve resources.
example:
filename out 'raw.data';
DATA _NULL_; /*writes raw data from a SAS dataset */
SET b;
FILE out;
PUT id score1-score5;
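DATA _NULL_ is also handy for small reports written to the LOG. A sketch with made-up data, using the END= option of SET to act only on the final observation:

```sas
DATA b;                        /* hypothetical scores */
  INPUT id score;
  CARDS;
1 10
2 15
;
RUN;

DATA _NULL_;
  SET b END=last;              /* LAST = 1 on the final observation */
  total + score;               /* sum statement: retained, starts at 0 */
  IF last THEN PUT 'Total of all scores: ' total;   /* goes to the LOG */
RUN;
```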
RETAIN: retaining a variable value in the SAS DATA step
The SAS DATA step is a loop that executes once for each observation. At the start of each iteration SAS sets variables created by INPUT or by assignment statements to the system value for missing [period (.) for numeric, blank (' ') for character]. When RETAIN is issued for a variable, the variable is not reset to missing and keeps its value from the previous iteration of the DATA step. (Variables read with SET or MERGE, and variables in sum statements, are retained automatically.)
RETAIN can also be used to initialize a variable to a value.
examples:
DATA a;
set scores;
RETAIN SUMSCOR 0; /*set sumscor to 0 for first obs. */
sumscor = sumscor + score; /*add score to sumscor which will*/
/*not be set to missing */
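RETAIN is also the usual way to carry a value forward to the next observation, for example to compute a change from the previous reading. A sketch with made-up temperatures:

```sas
DATA a;
  INPUT day temp;                /* hypothetical readings */
  RETAIN prevtemp;               /* not reset to missing between obs. */
  change = temp - prevtemp;      /* missing for the first observation */
  prevtemp = temp;               /* remember this reading for the next */
  CARDS;
1 50
2 53
3 51
;
RUN;
```

CHANGE comes out as ., 3 and -2. Without the RETAIN statement PREVTEMP would be missing at the start of every iteration, so every CHANGE would be missing.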
GOTO (GO TO), LINK, with label, RETURN
GOTO label; can be used to send the program past statements to another part of the program (jump) and begin executing from that point. LINK ...RETURN has the same effect with the exception that when RETURN is encountered execution returns to the point just after the LINK statement. With GOTO execution continues from the label on and does not return to the point after the GOTO.
Example:
DATA a; /* divide x by 100 if too big (assume an error)*/
SET b; /*mark those not fixed and those fixed */
IF x > 100 THEN GOTO fixup;
type ='NOFIX';
OUTPUT;
RETURN; /*without this, every obs would also fall into FIXUP */
FIXUP: x = x / 100;
TYPE = 'FIXED';
OUTPUT;
/*same result as above */
DATA a;
SET b;
type='NOFIX';
IF x > 100 THEN LINK fixup;
OUTPUT;
RETURN; /*end of main processing: skip the FIXUP routine */
fixup: x = x / 100;
type='FIXED';
RETURN;
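The same result can usually be had without branching, using IF-THEN/ELSE with a DO group; many programmers find this easier to follow. A sketch with a made-up data set B:

```sas
DATA b;                      /* hypothetical values */
  INPUT x;
  CARDS;
50
2500
;
RUN;

DATA a;
  SET b;
  LENGTH type $ 5;
  IF x > 100 THEN DO;        /* assume values over 100 are errors */
    x = x / 100;
    type = 'FIXED';
  END;
  ELSE type = 'NOFIX';
RUN;
```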
There are many more programming statements too numerous to cover in this introduction. SEE SAS BASICS 'Statements Used in the DATA Step' for more information, examples and help.
Section V. Sample of SAS PROCs (PROCEDURES).
SAS has a large library of PROCS or procedures with a variety of functions, including utilities, statistics and graphics. This section covers some of the PROCS from the first two categories.
In general the language for each PROC is specific to that PROC. TITLE, LABEL, and FORMAT statements may be used to write text on output, label variables and specify formats for variables respectively. Use OPTIONS to set page and linesize.
All PROCs act on the last SAS data set created in your program unless a data set is specified using the DATA= option. The statistical procs and some others (such as sort) also offer the option to OUTPUT results to a SAS dataset.
The following is by no means exhaustive in terms of the library of PROCs or what those mentioned can do. The Utilities are described in SAS USERS GUIDE BASICS and the more advanced statistical PROCS are described in SAS USERS GUIDE STATISTICS.
PROCS
'Utility' PROCS:
PROC SORT SORTS data BY variable(s) values in ascending order unless DESCENDING is specified in the BY statement.
A BY statement MUST be used with PROC SORT. SORT will sort numeric and character data; see the SAS BASICS GUIDE for the sort order.
PROC SORT is needed before running other PROCs with a BY statement, and before MERGEing or interleaving (SET with BY) datasets by a common variable or variables.
PROC SORT is also used to SET a dataset BY a variable in order to create system variables FIRST.var and LAST.var.
Data sets may be sorted by more than one variable. Data sets may be sorted in ascending (the default) or descending order. The data set to be sorted will be replaced by the sorted data set unless an OUT=dataset option is used.
Examples:
(see sections on DATA step for examples of using MERGE, SET
and FIRST.var.)
PROC SORT DATA=a; /*obtain means for age, height and*/
BY sex; /* weight for males and females */
PROC MEANS;
VAR age height weight;
BY sex;
PROC SORT DATA=b; /* sort the data by ascending date */
BY date DESCENDING time; /* and descending time */
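A sketch of the OUT= option, which leaves the original data set untouched, combined with NODUPKEY, which keeps only the first observation for each BY value. The data set names and values are made up.

```sas
DATA b;                                 /* hypothetical data */
  INPUT id score;
  CARDS;
2 10
1 7
2 5
;
RUN;

PROC SORT DATA=b OUT=firsts NODUPKEY;   /* B itself stays unsorted */
  BY id;                                /* one observation per ID */
RUN;
```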
PROC PRINT prints a SAS dataset. PRINT prints all the variables unless a VAR statement is used. PRINT also offers ID, BY, PAGEBY, SUM and SUMBY statements to customize the output. [Use OPTIONS to set the format for PRINT, such as linesize (width) and pagesize, e.g., OPTIONS LS=80 PS=66 NOCENTER;]
e.g.,PROC PRINT DATA=b;
VAR age height ;
PROC FORMAT creates formats for variable values. It can be used to group continuous data into discrete groups for frequency counts etc.
e.g.,PROC FORMAT;
VALUE mysexf 1='Females' 2='Males' .='N/A';
VALUE myagef 0-1='Babes' 2-9='Kids' 10-12='Youngsters'
13-19='Bad Craziness1' 20-30='Young Adults'
31-41='Folks' 42-49='Bad Craziness2'
50-64='Til Senior' 65-87='Golden Years'
88-100='George Burns'; /* ranges must not overlap */
PROC FREQ;
TABLES age*sex ;
FORMAT age myagef. sex mysexf.;
(Format libraries can also be written out.)
(Formats associated with the DATA step remain
in effect for that data set; formats associated
with PROC steps are temporary for that PROC.)
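A user-written format can also create a new grouping variable in a DATA step through the PUT function, which returns the formatted value as character. A sketch; the format name AGEGRP and its ranges are made up, and HIGH is the PROC FORMAT keyword for the largest value:

```sas
PROC FORMAT;
  VALUE agegrp 0-12='Child' 13-19='Teen' 20-HIGH='Adult';
RUN;

DATA a;
  INPUT age;                   /* hypothetical ages */
  group = PUT(age, agegrp.);   /* character copy of formatted value */
  CARDS;
8
15
42
;
RUN;

PROC FREQ DATA=a;
  TABLES group;
RUN;
```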
PROC CONTENTS lists data set directory, variable names, positions, length, and format.
e.g., PROC CONTENTS DATA=work._ALL_ NODS;
(lists all the datasets in the WORK library; NODS
suppresses the details of each data set, so only the
data set NAMES are shown).
and
LIBNAME sasdat '.'; /* '.' is the current directory on Unix ('[]' on VMS) */
PROC CONTENTS DATA=sasdat._ALL_ SHORT;
(show all the datasets on your default directory
and the variables in short format).
PROC TRANSPOSE transposes a data set, turning observations into variables and variables into observations. Transpose can be helpful when you have data in a horizontal form (e.g., score1 score2 score3) and need it in a vertical form (e.g., variables SCORE and TRIAL, with three observations for each original one). SEE SAS BASICS for examples.
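A sketch of the horizontal-to-vertical case just described, with made-up data. BY keeps each ID together (the data must already be in ID order), NAME= names the variable that holds the original variable names, and the output COL1 is renamed to SCORE.

```sas
DATA wide;                      /* hypothetical data, one row per ID */
  INPUT id score1-score3;
  CARDS;
1 10 20 30
2 15 25 35
;
RUN;

PROC TRANSPOSE DATA=wide OUT=long(RENAME=(col1=score)) NAME=trial;
  BY id;
  VAR score1-score3;
RUN;

PROC PRINT DATA=long;           /* 6 obs: ID, TRIAL, SCORE */
RUN;
```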
PROC COPY Copies SAS data sets.
PROC CPORT/CIMPORT are used to EXPORT a dataset from one
machine type and import it on another.
SAS datasets can also be copied with DATA and SET statements.
PROC DATASETS Manage datasets (delete, copy, rename)
DATA COMBO;
SET A B C D;
PROC DATASETS LIB=work;
Delete A B C D;
PROC MEANS DATA=COMBO;
Descriptive Statistics PROCS
Simple descriptive PROCS can output results for printing or copying, or as a SAS dataset to be passed to another PROC or DATA step. A BY statement can be used to produce statistics for each value of the BY variable(s).
PROC MEANS provides mean, min, max, sum, range, standard
error, standard deviation.
(see also PROC SUMMARY and UNIVARIATE)
e.g.,PROC MEANS MAXDEC=3 DATA=a;
VAR age;
BY sex;
OUTPUT OUT=sxmeans MEAN=agemean;
(means for variable age in data set a, one mean for each value of sex, written to dataset 'sxmeans' as the variable 'agemean'; data set a must first be sorted by sex)
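A CLASS statement gives the same per-group means without sorting first. A sketch with made-up data; NOPRINT suppresses the printed report, and NWAY keeps only the per-group rows (not the overall mean) in the output data set:

```sas
DATA a;                            /* hypothetical data, unsorted */
  INPUT sex $ age;
  CARDS;
F 14
M 15
F 16
M 13
;
RUN;

PROC MEANS DATA=a NOPRINT NWAY;    /* no PROC SORT needed */
  CLASS sex;
  VAR age;
  OUTPUT OUT=sxmeans MEAN=agemean;
RUN;
```

SXMEANS then holds one observation per sex (AGEMEAN 15 for F and 14 for M), plus the automatic variables _TYPE_ and _FREQ_.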
PROC FREQ provides tabulation of frequency of occurrence of variable values and cross tabs. (see also PROC TABULATE and CATMOD)
e.g.,PROC FREQ DATA=a;
TABLES choice/ out=c;
BY county;
(frequencies for the variable choice by county are
written to the SAS dataset work.c. Its variables are:
county, choice, COUNT (frequency) and PERCENT.)
PROC CORR provides Pearson and Spearman correlation
coefficients between pairs of numeric variables
e.g.,PROC CORR SPEARMAN;
VAR AGE HEIGHT;
WITH WEIGHT;
BY SEX;
APPENDIX I - SAS OUTPUT, FILE DEFINITIONS, ONLINE HELP
1. SAS OUTPUT -
SAS creates temporary work files on your disk while running and erases them when it finishes. It writes LOG and LISTING files to your disk after each run. LOG (SASLOG) files contain notes, warnings and error messages. Errors are underlined (__).
Use the LOG file for debugging programs.
The LOG file is named after the program file with the extension .log (e.g., try1.log).
e.g.,running a file called PROG1 will create:
PROG1.LOG (Unix)
If you run a PROC, its output is put into a LISTING file. Running PROG1 will create:
PROG1.lst (Unix)
Use PRT to print the log and listing.
Unix: PRT filename.lst
2. FILE DEFINITIONS
While SAS syntax is the same in Unix, VMS, and DOS, the file definitions vary. File definitions tell SAS where to read and write files.
READING a 'RAW DATA FILE'
Unix
In Unix (Mead) the FILENAME command is used. It takes the form: FILENAME DDname 'file.name'; Where file.name is the file being defined and DDname is the keyword you choose (8 alphanumerics starting with letter or underscore).
FILENAME monk1 'monkey.data';
DATA a;
INFILE monk1;
INPUT date MMDDYY6. time LOCA $8.;
/* rest of SAS code */
WRITING SAS DATASET
Unix:
In Unix (Mead) the LIBNAME command is used. It takes the form:
LIBNAME DDname 'path or directory';
Where path or directory is the location to write the file and
DDname is the key word chosen to represent the path. The
DDname may be 8 alphanumerics starting with letter or
underscore. The sample sas code:
FILENAME monk1 'monkey.data';
LIBNAME outdat '~user/sasdat'; /* writes to ~user/sasdat */
DATA outdat.monk;
INFILE monk1;
INPUT date MMDDYY6. time loca $8.;
/* rest of sas code */
writes out SAS dataset as: monk.ssd01
/* more SAS Statements */
READING a saved SAS DATASET:
Unix:
LIBNAME outdat '~user/sasdat';
PROC MEANS DATA=outdat.monk;
VAR AGE WT;
or
LIBNAME outdat '~user/sasdat';
DATA a;
SET outdat.monk;
3. ONLINE HELP and INFORMATION
Online HELP can only be obtained via SAS DMS (Display Manager System) on a PC or Mac.
Windows or PC
Type HELP (or HELP topic) on any command line.
Use GOBACK, END and KEYS (type KEYS to see key definitions)
to navigate the help windows. To exit all help windows use: =X
Use: ENDSAS or BYE to exit DMS sas.
SAS SAMPLE PROGRAM LIBRARY
SAS provides a Library of SAMPLE Programs as Examples.
The sample programs cover BASE, STAT, GRAPH and ETS, IML etc.
Unix - (Mead)
The directory with the samples is:
/sy30/products/aix/sas612_ts020/sas612/samples
in the directory are subdirectories for each sas module:
base/ eis/ gis/ insight/ spectraview/
connect/ english/ graph/ or/ stat/
dbi/ ets/ iml/ qc/
Windows and Mac:
Start SAS and click HELP, then Sample Programs.
FINDING DEFAULT SETTINGS - OPTIONS or GOPTIONS
OPTIONS - to find the default settings for OPTIONS run a sas
program with the following line(s):
/* this file gets sas options and goptions */
proc options;
proc goptions;
Run this program. The options are printed in the .log
Or, under Windows, click GLOBALS, then OPTIONS, then Global Options.