| | Alameda | Los Angeles | San Francisco |
|---|---|---|---|
| Total Census Tracts | 117 | 589 | 119 |
| Census Tracts in data set | 112 | 516 | 109 |
| Total Precincts | 906 | 3574 | 1052 |
| Precincts in data set | 906 | 3075 | 886 |
The decision to drop some of the precincts means that not all votes were counted in this procedure, and that in turn limits how we can interpret the data. For one thing, we do not know the number of nonvoters and thus cannot assess their characteristics. Because of that, I refrained from using regression coefficients to estimate the actual percentage of the vote associated with any of the social categories. This might have been possible if we had full vote counts and accurate population totals at the time of the vote.
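As a rough illustration of how much of each county the data set covers, the counts in the table above can be turned into retention percentages. This is only a sketch: the dictionary layout and the function name `share_retained` are my own, and only the counts themselves come from the table.

```python
# Counts copied from the coverage table: (total, in data set).
coverage = {
    "Alameda": {"tracts": (117, 112), "precincts": (906, 906)},
    "Los Angeles": {"tracts": (589, 516), "precincts": (3574, 3075)},
    "San Francisco": {"tracts": (119, 109), "precincts": (1052, 886)},
}


def share_retained(total, kept):
    """Percentage of units that made it into the data set, to one decimal."""
    return round(100 * kept / total, 1)


for county, units in coverage.items():
    for unit, (total, kept) in units.items():
        print(f"{county}: {share_retained(total, kept)}% of {unit} retained")
```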
A different dataset was created to analyze the vote outside the three largest counties. The units of analysis are municipalities and nonurban areas. Vote tallies and demographic information were derived from the sources listed above for 106 municipalities with populations of more than 2,500. By subtracting these urban totals from county-level data, nonurban vote and demographic numbers were obtained for 53 counties. Because of data problems, Sacramento and San Diego county figures were not available. The resulting dataset has 159 cases (106 municipalities plus 53 nonurban county remainders).
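The subtraction step can be sketched as follows. All figures, field names, and the function `nonurban_residual` are hypothetical and chosen only for illustration; they are not taken from the study's data.

```python
# Hypothetical figures for one county; none of these numbers come from the study.
county_total = {"votes_sinclair": 12000, "votes_total": 30000, "population": 80000}
municipalities = [
    {"name": "Town A", "votes_sinclair": 5000, "votes_total": 11000, "population": 30000},
    {"name": "Town B", "votes_sinclair": 2500, "votes_total": 7000, "population": 18000},
]


def nonurban_residual(county, towns):
    """Subtract the summed municipal figures from the county totals."""
    residual = {}
    for key, total in county.items():
        urban = sum(town[key] for town in towns)
        remainder = total - urban
        if remainder < 0:
            # A negative remainder signals inconsistent source data.
            raise ValueError(f"urban {key} exceeds county total")
        residual[key] = remainder
    return residual


nonurban = nonurban_residual(county_total, municipalities)
nonurban_pct_sinclair = 100 * nonurban["votes_sinclair"] / nonurban["votes_total"]
print(nonurban, nonurban_pct_sinclair)
```

The remainder then serves as one "nonurban" case for that county, alongside the municipal cases.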
Statisticians and methodologists have been arguing about the validity of various ecological inference methods for decades. Some of the newer approaches are either very complicated or very limited. Data limitations argued against trying most of them. For example, Gary King’s ecological inference (EI) method depends upon knowing the number of nonvoters and is difficult to use with more than one independent variable. I use multiple regression, realizing that some experts may be critical. I do so with appropriate caution, refraining from deriving probability estimates for the independent variables. If we had full counts of voters and nonvoters for each census tract and if the demographic information had been from 1934, it would have been reasonable to calculate estimates of the number and percentage of Sinclair’s voters who were blue-collar males, females, African Americans, etc.
Linear multiple regression involves testing equations with different mixes of variables, looking for an equation that most efficiently accounts for variations in the dependent variable, in this case the percentage of vote for Sinclair, Merriam, or Haight. After experimenting with various combinations in the three counties, I used seven independent variables in the equation.
Variables:
Blue-collar includes all jobs listed in the census occupational categories as craft, skilled, semi-skilled, labor, farm labor, and also service. White collar includes owners, managers, professional, technical, sales, and clerical positions.
I experimented with dividing adult females into those in the labor force and those not in the labor force. In one county there were small differences associated with that distinction, but it wasn’t consistent enough to warrant adding an eighth variable to the equation.
The ethnic variables have to be used cautiously. The census recorded the number of African Americans in each census tract but not the number of Black adults, and thus not the number of eligible voters. The use of the three foreign-born variables rests on the challengeable assumption that the residential distribution of voters from those ethnic populations followed the pattern of first-generation immigrants, whose numbers were recorded by census tract. The decision to use Russian as a proxy for Jews also introduces potential error. Finally, I should note that the San Francisco ethnicity data was incompletely recorded. For the other two counties, we recorded the numbers of foreign-born Mexicans, Italians, and Russians for each census tract. For San Francisco, we recorded those numbers only if the particular ethnic group comprised at least 2 percent of the tract population. This affects the statistical variation, and thus the coefficients for ethnicity in that county should be viewed cautiously. As it happens, there were very few Mexicans living in San Francisco, and the regression equations showed little significant effect for the other two groups.
Start at the bottom with R squared (R²). This statistic evaluates the goodness of fit of the entire equation, showing the fraction of the total variation in percent vote for the candidate explained by the seven independent variables working together. In table 2 (Sinclair's % of vote), R squared in the Los Angeles column is .68, which should be interpreted as showing that the equation is very effective. The .87 and .82 statistics for the other counties give us even more confidence in those equations. Now look at Haight’s % of vote. The R squared values are much lower, cautioning that the equations are less effective.
The Standardized Coefficients (Beta) tell us about the effects of each of the independent variables when the other six are held constant. Positive signs mean that an increase in that variable will be associated with an increase in the percent of the vote; negative signs mean that the effect will be reversed. Thus in Sinclair’s LA vote, the negative coefficient for % of female adults (-.18) means that, holding everything else constant, tracts with a higher percentage of adult women gave him a lower percentage of the vote. But -.18 is not as strong as the .67 coefficient for % blue-collar males. Because these are standardized coefficients, they are expressed in standard deviation units: the .67 means that an increase of one standard deviation in % blue-collar males is associated with an increase of .67 standard deviations in Sinclair's percent of the vote, holding all other variables constant. If we had complete data that accounted for nonvoters, we could go further and create probability estimates from the corresponding unstandardized coefficients, reading them as the change in Sinclair's percent of the vote for every one-point increase in the percentage of blue-collar males or adult females.
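For readers who want to see where R squared and standardized coefficients come from, here is a minimal sketch of a least-squares fit in plain Python. It is not the software used for the tables. The eight "tracts" and their values are invented, only two predictors are used instead of seven, and the helper names `solve` and `ols` are my own.

```python
from statistics import mean, pstdev


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def ols(columns, y):
    """Least-squares fit of y on the predictor columns plus an intercept.

    Returns (coefficients with the intercept first, R squared).
    """
    rows = [[1.0] + [col[i] for col in columns] for i in range(len(y))]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    coefs = solve(xtx, xty)
    fitted = [sum(c * v for c, v in zip(coefs, r)) for r in rows]
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean(y)) ** 2 for yi in y)
    return coefs, 1 - ss_res / ss_tot


# Invented tract-level percentages, for illustration only.
pct_blue = [20, 35, 50, 65, 30, 45, 60, 25]
pct_female = [52, 48, 50, 47, 55, 49, 46, 53]
pct_sinclair = [30, 41, 47, 58, 33, 45, 56, 32]

predictors = [pct_blue, pct_female]
coefs, r2 = ols(predictors, pct_sinclair)

# Standardized coefficient = unstandardized coefficient * sd(x) / sd(y).
betas = [b * pstdev(x) / pstdev(pct_sinclair) for b, x in zip(coefs[1:], predictors)]
print("unstandardized:", coefs[1:], "standardized:", betas, "R squared:", r2)
```

The rescaling in the last step is what makes standardized coefficients comparable across variables measured on different scales, which is why the tables report them.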
The Standard Errors (SE) and asterisks show whether the coefficients are statistically significant. A single asterisk means that the coefficient is significant at the .05 level. Still better is the .01 level, indicated by a double asterisk. Any value that does not have an asterisk should be regarded as unreliable.
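The asterisk convention can be sketched like this. The sketch assumes a two-sided test and uses the normal approximation to the t distribution, which is reasonable only when the number of cases is large. The function name `significance_stars` and the coefficient and standard error pairs are hypothetical, not values from the tables.

```python
from statistics import NormalDist


def significance_stars(coef, se):
    """Return '**' for p < .01, '*' for p < .05, '' otherwise.

    Uses a two-sided test with a normal approximation (an assumption
    that suits large samples, not a general-purpose t test).
    """
    z = coef / se
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    if p < 0.01:
        return "**"
    if p < 0.05:
        return "*"
    return ""


# Hypothetical (coefficient, standard error) pairs.
for coef, se in [(0.40, 0.05), (-0.18, 0.08), (0.05, 0.04)]:
    print(f"{coef:+.2f} (SE {se:.2f}) {significance_stars(coef, se)}")
```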
For more on reading regression tables: http://dss.princeton.edu/online_help/analysis/interpreting_regression.htm
Two leadership samples were created, one consisting of EPIC club chairs and assembly district secretaries, the other of EPIC legislative candidates. The first includes 167 men and women who were identified as club chairs in the first two issues of Upton Sinclair End Poverty Paper (Dec-January 1933/34, February 1934) and 72 others who were listed by the time of the August primary as assembly district leaders or leaders in charge of one of the eighteen campaign headquarters around the state. The second consists of 53 candidates for the state legislature who were identified in the same source.
We sought information about occupation and previous voting registrations in city directories and in the Great Registers of voters maintained by many California counties, finding occupational information for 159 (two-thirds) of the leadership sample and 44 out of 53 candidates. We found 1932 voter registration data for 113 of the leaders and 23 of the candidates.