*Program to Determine Women's Labor Force Participation Rate;
*Sara Gieseke;
*November 2004;
*Program name: workforce_part1;

/* Variables used in the model
   Data set: 1990 data for the 50 states

   wlfp = participation rate (%) of all women over 16
   yf   = median earnings (in thousands of dollars) of females
   ym   = median earnings (in thousands of dollars) of males
   educ = percentage of females over 24 years of age who are high school graduates
   ue   = unemployment rate (%)
   mr   = marriage rate (%) of women at least 16 years of age
   dr   = divorce rate (%)
   urb  = percentage of the state's population living in urban areas
   wh   = percentage of females over 16 years who are
*/

/* DESCRIPTIVE STATISTICS */

/* Basic summary statistics for all variables (N, mean, std, min, and max) */
proc means data=women;
run;

/* Detailed summary statistics, plus tests of normality,
   stem-and-leaf plots, and box plots */
proc univariate data=women normal plot;
   title "Summary Statistics for Women's Labor Force Participation";
run;

/* Use PROC INSIGHT to draw a box plot of each variable */
proc insight data=women;
   box wlfp yf ym educ ue mr dr urb wh;
run;

/* TESTS FOR CORRELATION */

/* Use PROC INSIGHT to draw a scatterplot matrix for a first look
   at the pairwise relationships among the variables */
proc insight data=women;
   scatter wlfp yf ym educ ue mr dr urb wh * wlfp yf ym educ ue mr dr urb wh;
run;

/* Correlation matrix with p-values */
proc corr data=women;
   title "Correlation Matrix for Women's Labor Force Participation";
   var wlfp yf ym educ ue mr dr urb wh;
run;

/* MULTIPLE LINEAR REGRESSION */

/* Select variables by forward selection and by backward elimination,
   with the entry (SLENTRY) and stay (SLSTAY) significance levels set to 0.05 */
proc reg data=women;
   title "Women's Labor Force Participation";
   model wlfp = yf ym educ ue mr dr urb wh / selection=forward slentry=0.05;
   model wlfp = yf ym educ ue mr dr urb wh / selection=backward slstay=0.05;
run;
quit;

/* Re-run the regression on the two candidate final models (the best from
   forward selection and the best from backward elimination) to decide
   which one is better */
proc reg data=women;
   title "Looking for the Best Model of Women's Labor Force Participation";
   model wlfp = educ ue yf ym;
run;
   model wlfp = yf educ ue urb wh;
run;
quit;

/* Including both yf and ym raises multicollinearity concerns because the
   two variables are highly correlated. We therefore choose model 2 as the
   best: it does not contain both yf and ym, and it has a higher R-square
   (0.7580 vs. 0.7252). */
proc reg data=women;
   title "Best Model of Women's Labor Force Participation";
   /* Print predicted values (P) and the residual analysis (R) */
   model wlfp = yf educ ue urb wh / p r;
   /* Plot residuals against predicted values to check model assumptions */
   plot residual.*predicted.='o';
run;
quit;
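
/* OPTIONAL DIAGNOSTICS (a sketch, not part of the original analysis)

   The multicollinearity argument for choosing model 2 can be checked
   numerically. The VIF and TOL options on the MODEL statement print the
   variance inflation factor and tolerance of each predictor; large VIFs
   for yf and ym in model 1 would support preferring model 2. A commonly
   cited rule of thumb treats a VIF above 10 as a sign of serious
   multicollinearity; that cutoff is an assumption, not a result of this study. */
proc reg data=women;
   title "Collinearity Diagnostics for the Two Candidate Models";
   model wlfp = educ ue yf ym / vif tol;
   model wlfp = yf educ ue urb wh / vif tol;
run;
quit;

/* The residual-versus-predicted plot checks for constant variance. The
   normality of the residuals can also be checked by saving them with an
   OUTPUT statement and testing them with PROC UNIVARIATE. The data set
   name resid_out and the variable names pred and resid are hypothetical,
   illustrative choices. */
proc reg data=women noprint;
   model wlfp = yf educ ue urb wh;
   /* Save predicted values and residuals of the best model */
   output out=resid_out p=pred r=resid;
run;
quit;

proc univariate data=resid_out normal;
   title "Normality Check for Residuals of the Best Model";
   var resid;
   /* Normal Q-Q plot with mean and standard deviation estimated from the data */
   qqplot resid / normal(mu=est sigma=est);
run;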