Convert Continuous to Ordinal in Sas Jmp

SAS formats are flexible, dynamic, and have many uses. For example, you can use formats to count missing values and to change the order of a categorical variable in a table or plot. Did you know that you can also use SAS formats to recode a variable or to bin a numerical variable into categories? This can be very convenient because you do not need to create a new variable in the data set; you merely apply a format to an existing variable.

Income categories: Are you middle class?

Many people use several IF-THEN/ELSE statements in the DATA step (or in a DATA step view) to create a new discrete variable that represents binned values of a continuous variable. That is a fine technique, but an alternative technique is to create a user-defined format that bins a continuous variable. One advantage of a custom format is that you can apply the format to multiple variables in multiple data sets.

For example, suppose that you want to define income categories such as "working class," "middle class," and the ultra-rich "1 percenters." According to a 2012 Money magazine article, the following cut points divide US household income into seven categories:

              /* 2012 income categories for US according to Money magazine */              proc              format; value IncomeFmt         low   -<              23000              =              "Poverty"              /* < 23 thousand         */              23000              -<              32500              =              "Working"              /* [ 23,  32.5) thousand */              32500              -<              60000              =              "Lower Middle"              /* [ 32.5, 60) thousand  */              60000              -<              100000              =              "Middle"              /* [ 60, 100) thousand   */              100000              -<              150000              =              "Upper Middle"              /* [100, 150) thousand   */              150000              -<              250000              =              "5 Percenter"              /* [150, 250) thousand   */              250000              -   high  =              "1 Percenter";              /* > 250 thousand        */              run;

The call to PROC FORMAT creates a custom format called IncomeFmt. When you assign the IncomeFmt format to a numerical variable, SAS will look at the value of each observation and determine the formatted value from the raw value. For example, a value of 18,000 is less than 23,000, so that value is formatted as "Poverty." A value of 85,000 is in the half-open interval [60000, 100000), so that value is formatted as "Middle."

The following DATA step defines the incomes for 26 fictitious individuals. The IncomeFmt format is assigned to the Income variable:

              data              incomes;              length              Name $10.;              input              Name Income @@;              format              Income IncomeFmt.;              /* assign IncomeFmt format to Income variable */              datalines; Amy              65100              Brad              146500              Carlos              113300              Dimtri              28800              Eduardo              233300              Felicity              14600              Guo              43400              Hector              141700              Irene              53400              Jacob              170300              Katerina              43100              Liu              66800              Michael              15800              Nancy              30900              Oscar              31800              Pablo              65800              Quentin              40000              Rick              62200              Stephan              32900              Tracy              64000              Umberto              124000              Victoria              220700              Wanda              263800              Xie              9300              Yolanda              23400              Zachary              93800              ;

The Income variable is a continuous variable, but the format bins each value into one of seven discrete values. Consequently, SAS procedures can analyze the Income variable as if it were a discrete variable. For example, you can count the number of individuals in each income category by calling PROC FREQ:

              proc freq              data=incomes;     tables Income / nocum norow nocol;              run;

Use SAS format to bin a numeric variable

Assigning or unassigning formats at run time

The underlying data values are not lost. You can use a FORMAT statement in a SAS procedure to temporarily assign or unassign a format. If you remove the format, you can analyze the underlying raw data. For example, the following call to PROC UNIVARIATE analyzes the raw incomes:

              proc univariate              data=incomes;              format              Income;              /* remove the format for this analysis */              var              Income;              run;

In a similar way, if you specify the Income variable on a CLASS statement in a regression procedures, the formatted values are used for the analysis. However, if you do NOT include it on the CLASS statement, then the variable is treated as a continuous variable and the unformatted values are used.

Subset data by using formatted values

If you run PROC PRINT on the income data, it LOOKS like the Income variable is a character variable. Furthermore, it is analyzed like a character variable when used in some SAS procedures such as PROC FREQ. Consequently, you might forget that the Income variable is actually numeric. However, if you treat Income as a character variable in the DATA set or a WHERE clause, then SAS will report an error. For example, the following WHERE clause is incorrect:

              proc print              data=incomes;              where              Income              in              (              "5 Percenter",              "1 Percenter"              );              /* WRONG: Income is numeric */              run;
ERROR: WHERE clause operator requires compatible variables.          

SAS reports an error because the WHERE clause cannot compare the raw (numeric) values of the Income variable with elements of a set that contains two strings. When you see an error message like this, use PROC CONTENTS to investigate the attributes of the variable:

ods              select              Position;              proc contents              data=incomes              order=varnum;              run;

The format associated with a SAS variable

The output from PROC CONTENTS informs you that the Income variable is numeric and displays the name of the format that is attached to it.

If you know the cutoff values that are used for the format, you could create a valid WHERE clause that uses numeric values: where Income GE 150000. However, usually it makes more sense to create a valid WHERE clause by using the PUT statement to apply the format to the raw data and compare formatted values:

              /* use formatted values to subset data */              proc print              data=incomes;              where              put              (Income, IncomeFmt.)              in              (              "5 Percenter",              "1 Percenter"              );              run;

Use a format to subset data in SAS

You can use other DATA step functions when constructing WHERE clauses. A typical example is when a variable is a SAS date. For example, the Sashelp.Air data set contains a variable named Date. You can use the following WHERE clause to analyze the subset of data that corresponds to the month of January in years prior to 1955:

              proc print              data=Sashelp.Air;              where              month              (              date              )=1              and              year              (              date              )<1955;              /* all January dates prior to 1955 */              run;

Summary

As shown in this article, SAS formats are very useful and flexible:

  • You can use a custom format to bin a continuous variable into categories.
  • Within a SAS procedure, you can temporarily assign or unassign a format to change the way that the data are analyzed.
  • The WHERE clause looks at raw data values, so use the PUT function in a WHERE clause if you want to subset the data according to the formatted values.

downingevaithere1999.blogspot.com

Source: https://blogs.sas.com/content/iml/2016/08/08/sas-formats-bin-numerical-variables.html

0 Response to "Convert Continuous to Ordinal in Sas Jmp"

Postar um comentário

Iklan Atas Artikel

Iklan Tengah Artikel 1

Iklan Tengah Artikel 2

Iklan Bawah Artikel