Voter Data in California and San Diego

Introduction

This page summarizes what I know about voter data and the systems for accessing it. It was current as of April 2008; things may be different now.

In general, the data are compiled by the individual County Registrars of Voters, who then provide them to the California Secretary of State. I believe the State imposes some requirements for the minimum data that must be reported, and I know that individual County Registrars can, and sometimes do, maintain more than that minimum.

This page has three major sections:
Aggregate Data and Election Results
Samples of Raw Data
Working with the Detailed Data

Aggregate Data and Election Results

Consolidated information about voter turnout and election results is available online from the California Secretary of State and the San Diego County Registrar of Voters. There are no doubt other sources, but these come straight from the horse's mouth, so to speak.

California State Aggregate Data

The California Secretary of State website is at http://www.sos.ca.gov/elections/elections.htm
At the moment, that page looks like this:

CA Sec of State Voter Data Page on 4/6/2008

The pages you'll want to check out are Voter Registration Statistics, under the Voter Registration heading, and Election Results, under Candidates and Elections. Together they give the total number of registered voters by party, plus election results and turnout statistics.

San Diego Registrar of Voters Aggregate Data

The San Diego Registrar of Voters home page is here: http://www.sdvote.com
As of 4/6/2008, that page looks like this:

San Diego County Registrar of Voters home page on 4/6/2008

You'll want to check out the Past Elections link for detailed turnout and results reports, and the Reports on line link for general turnout and registration reports.

Samples of Raw Data

Most of us will never see what the data look like "under the hood" of a voter lookup system, but I think it's helpful to know what the underlying data look like.

San Diego County Registrar of Voters Raw Data

My entry in the San Diego County Registrar of Voters database looks like this:

status  Abbr  affidavit  last_voted  name_prefix  name_last  name_first  name_middle  name_suffix  house_number  house_fraction  pre_dir  street  type  post_dir  building_number  apartment_number  city  state  zip  precinct  portion  consolidation  alpha_split  party  reg_date  image_id  phone_1  phone_2  military  gender  PAV  source  birth_place  birth_date  care_of  mail_street  mail_city  mail_state  mail_zip  mail_country  ltd  language  drivers_license  email  reg_date_original  perm_category  confidential  IDRequired  Citizen  UnderAge  precinct_name  hDist  sDist  01 09/25/2007 special election 72  02 06/05/2007 city of vista special 71  03 03/06/2007 special election 70  04 11/07/2006 gubernatorial general 68  05 06/06/2006 gubernatorial primary 67  06 04/11/2006 special primary - 50th cong. district 66  07 01/10/2006 city of san diego special run-off 65  08 11/08/2005 special statewide 64  09 07/27/2005 sheriff reserve payroll 53  10 07/26/2005 city of san diego - spec muni election 63  11 06/07/2005 city of oceanside - spec muni election 61  12 05/03/2005 ramona mwd special election 62  13 03/08/2005 rainbow mwd #4 recall election 60  14 02/15/2005 city of santee special municipal election 59  15 01/04/2005 city of san diego special run-off election 58  16 11/16/2004 special municipal election 57  17 11/02/2004 presidential general 56  18 03/02/2004 presidential primary 51  19 10/07/2003 statewide special 50  20 11/05/2002 gubernatorial general 47
A  V-C29  BE083007  POWELL  STEVEN     9999 ANYSTREET ST 2 SAN DIEGO  CA  92116 255550 0 DEM  1/3/2006 1002571 8585512021 N  M  Y  CA  1/1/1901 9999 ANYSTREET 2  SAN DIEGO  CA  92116 7/27/2006 5/13/2001 PERM  N  HILLCREST  A  A(DEM)  A  A(DEM)  A(DEM)  V(DEM)  V  V

If you scroll to the right, you'll see voter history data for 20 past elections. You can also see party codes such as "DEM" in that history, so in THIS database you could identify swing voters by finding voters who have cast different parties' ballots in the past. The statewide data, described in more detail below, do NOT seem to contain this information.
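As a rough illustration of that swing-voter idea, the sketch below flags voters whose history shows more than one party's ballot. It assumes, hypothetically, that each history cell is blank, a bare letter code such as "V", or a code followed by a party in parentheses such as "A(DEM)", as in my record above. The letter codes themselves aren't explained in the data I have, so the sketch ignores them and looks only at the party in parentheses. The "mixed" history is made up for illustration.

```python
import re

# Assumed cell format: "", "V", "A", or e.g. "A(DEM)". Only the party inside
# the parentheses is used; the letter before it is ignored.
PARTY_IN_PARENS = re.compile(r"\(([A-Z]+)\)")


def ballot_parties(history_cells):
    """Return the set of party ballots seen in a voter's history cells."""
    parties = set()
    for cell in history_cells:
        match = PARTY_IN_PARENS.search(cell.strip())
        if match:
            parties.add(match.group(1))
    return parties


def is_possible_swing_voter(history_cells):
    """True if the voter has cast more than one party's ballot."""
    return len(ballot_parties(history_cells)) > 1


steady = ["A", "A(DEM)", "A", "A(DEM)", "A(DEM)", "V(DEM)", "V", "V"]
mixed = ["V(REP)", "", "A(DEM)", "V"]  # hypothetical voter
print(is_possible_swing_voter(steady))  # False
print(is_possible_swing_voter(mixed))   # True
```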

You'll also notice a field for email address. Mine happens to be blank, but about 15% of the entries DO have one. That's not a large percentage, and many of the addresses are no doubt wrong, but it does suggest a cheap way to contact a lot of people. I'm not sure about the legality of using this field. I know that it's legal to use the phone numbers, so it seems that using the email addresses would also be legal. Again, the statewide data do not contain this field.
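To check a figure like that 15% on your own copy of the data, a short script can count non-blank values in the email column. This sketch assumes the county extract is a tab-delimited text file with a header row that includes a column named email, as in the layout above; the actual delivery format may differ.

```python
import csv
import io


def email_share(lines, column="email"):
    """Fraction of records whose email field is non-blank.

    `lines` is any iterable of text lines (an open file works), assumed to be
    tab-delimited with a header row that includes `column`.
    """
    # QUOTE_NONE: treat stray double quotes in names/addresses as plain text.
    reader = csv.DictReader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    total = with_email = 0
    for row in reader:
        total += 1
        if (row.get(column) or "").strip():
            with_email += 1
    return with_email / total if total else 0.0


sample = io.StringIO(
    "name_last\tname_first\temail\n"
    "POWELL\tSTEVEN\t\n"
    "DOE\tJANE\tjane@example.com\n"
)
print(email_share(sample))  # 0.5
```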

Someone who works with these data a lot told me that the registrar doesn't often bother to update phone numbers, and my record bears this out: the phone number is a work phone from about 10 years ago. The rest of my data are current and correct.

California Secretary of State Raw Data

My entry in the California statewide voter database as of about December 2007 looks like this:

Locality Code Registrant ID Last Name First Name Middle Name Name Suffix Addr Num Addr Num Suff St Dir Prefix Street Name Street Type St Dir Suffix Unit Type Unit Number City State Zip Telephone (Area Code) Telephone Exchange Telephone Number Mailing Address1 Mailing Address2 Mailing City Mailing State Mailing Zip Language Date of Birth Gender Party Status Status Reason Registration Date Precinct Precinct Part Registration Method Code Assistance Flag Place of Birth Section Township Range Direction Previous Registration ID Previous County Code Previous Last Name Previous First Name Previous Middle name Previous Name Suffix Previous Residence Street Number Previous Residence Street Number Suffix Previous Residence Street Name Previous Residence Street Direction Prefix Previous Residence Street Direction Suffix Previous Residence Street Type Previous Residence Unit Type Previous Residence Unit Number Previous Residence City Previous Residence State Previous Residence Zip Name Prefix Previous Name Prefix Col058 Elec1 Elec2 Elec3 Elec4 Elec5 Elec6 Elec7 Elec8 Non Standard Address
37 1002571 POWELL STEVEN   999 ANYSTREET ST 2 SAN DIEGO CA 92116 858 551 2021 01/01/1901 M DEM A 01/03/2006 258000 0 M CA GG6 GP6 SS5 PG4 PP4 SS3 GG2 PG0

The statewide database contains less detail than the San Diego County database: there is no email field, and there are no data for local elections. If you scroll to the far right of the table above, you'll see the codes for eight elections in my voting history, in the columns headed Elec1, Elec2, ... Elec8. The election codes are decoded below.

Code   Election Description
PP     Presidential Primary
PG     Presidential General
GP     Gubernatorial Primary
GG     Gubernatorial General
SS     Special Statewide
CP     Congressional District Special Primary
CG     Congressional District Special General
SP     Other Legislative District Special Primary
SG     Other Legislative District Special General
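Judging from the examples in this article (GG6 for the 2006 Gubernatorial General, PG0 for 2000, GG98 for 1998), each history code appears to be a two-letter election type followed by the last one or two digits of the year. The sketch below decodes a code under that assumption. The rule that one digit means 200x and two digits mean 19xx is inferred from those samples, not from any official specification.

```python
ELECTION_TYPES = {
    "PP": "Presidential Primary",
    "PG": "Presidential General",
    "GP": "Gubernatorial Primary",
    "GG": "Gubernatorial General",
    "SS": "Special Statewide",
    "CP": "Congressional District Special Primary",
    "CG": "Congressional District Special General",
    "SP": "Other Legislative District Special Primary",
    "SG": "Other Legislative District Special General",
}


def decode_election(code):
    """Turn a code like 'GG6' or 'PG96' into (description, year).

    Assumed year rule, inferred from sample records: one digit -> 2000s,
    two digits -> 1900s.
    """
    code = code.strip().upper()
    kind, digits = code[:2], code[2:]
    if kind not in ELECTION_TYPES or not digits.isdigit() or len(digits) > 2:
        raise ValueError(f"unrecognized election code: {code!r}")
    year = 2000 + int(digits) if len(digits) == 1 else 1900 + int(digits)
    return ELECTION_TYPES[kind], year


print(decode_election("GG6"))   # ('Gubernatorial General', 2006)
print(decode_election("PG96"))  # ('Presidential General', 1996)
```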

It looks like there's no way to tell which party's ballot I pulled in the primaries. Since all 8 columns are filled, you can tell that I voted in at least 8 elections, but I've looked at other people's records and they put the codes for the same past election in DIFFERENT columns! For example, my Elec8 column says PG0 (Presidential General 2000), but for someone else it might say GP2 (Gubernatorial Primary 2002)! This is a crazy way to store data, but that's how they do it. From looking at other records, it appears that what they store is up to the last 8 times the person voted: Elec1 holds the most recent election in which they voted, Elec2 the next most recent, and so on across to Elec8. A sample is below:

Voter                    Elec1  Elec2  Elec3  Elec4  Elec5  Elec6  Elec7  Elec8
My Voter History         GG6    GP6    SS5    PG4    PP4    SS3    GG2    PG0
Someone Else's History   SS3    GP2    PG0    PP0    GG98   GP98   PG96
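Because the columns hold each voter's last eight votes in order rather than fixed elections, you can't tell whether someone voted in, say, the 2000 Presidential General by looking at a single column. A minimal sketch of the workaround, assuming each record's Elec1 through Elec8 values have already been read into a list with blanks as empty strings, is to search all eight positions:

```python
def voted_in(history, election_code):
    """True if election_code (e.g. 'PG0') appears in any Elec1..Elec8 slot."""
    return election_code in {cell.strip() for cell in history if cell.strip()}


def most_recent(history):
    """Elec1 holds the most recent election the person voted in, if any."""
    for cell in history:
        if cell.strip():
            return cell.strip()
    return None


mine = ["GG6", "GP6", "SS5", "PG4", "PP4", "SS3", "GG2", "PG0"]
theirs = ["SS3", "GP2", "PG0", "PP0", "GG98", "GP98", "PG96", ""]

print(voted_in(mine, "PG0"), voted_in(theirs, "PG0"))  # True True
print(voted_in(theirs, "GG2"))                         # False
print(most_recent(theirs))                             # SS3
```

The same election (PG0) sits in Elec8 for one voter and Elec3 for the other, yet both lookups succeed because position is ignored.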

Working with the Detailed Data

Most of us will want to buy or otherwise get access to a system that's already set up to query the data and produce reports by precinct or whatever else interests us. The reason, in a nutshell, is the size of the database. Since I do happen to know something about working with the raw data, I've included some details about that at the end of this article. For now, let's just talk about using an existing system.

Commercial Voter Databases

Commercial products exist that give access to voter data, letting you create precinct walk lists or phone lists, search for voters matching certain criteria, and so on. From what I understand at the moment, the national Democratic Party has chosen a system called VAN (Voter Activation Network), website at http://www.voteractivationnetwork.com/. Apparently, the party has negotiated discounted rates for state party organizations to get access to the system. I don't have details on how this works.

It seems that the California system is accessed through California VoterConnect, website at http://www.cavoterconnect.com/. This is a bit mysterious to me, since the site's home page makes it sound like a completely independent project, but the sign-on link says "Sign into the VAN," so the two must be related. I've talked to members of local Democratic clubs, and some of them do have access to some system or other, so I imagine it's this one. From what I gather, it's supposed to be made available at least to candidates endorsed by the party.

Growing Your Own System from the Raw Data

As mentioned above, the statewide data are assembled from reports from the individual counties, which means that the state can never have more information than a county, and a county MAY have more information than the state. In the case of San Diego, the county data are indeed more comprehensive: they contain an email address field, which the State database doesn't store at this time. The cost, however, is quite different. When I last checked, in about December 2007, you could get the database for the whole state for $30.00, whereas data for all of San Diego County cost over $400.00. The County data may also be slightly more up-to-date, since the county has to compile them before the state can incorporate them into the statewide database. However, I have it on good authority that it only takes a week or two for the state to incorporate new data from a county, so this shouldn't be a big problem.

If cost is an issue, keep in mind that buying data from the County gets you (you guessed it) County data. If what you're really interested in is a Congressional District, you'll have to contact the registrar for every county your district touches. Maps of Congressional Districts are at http://www.calvoter.org/voter/maps/index.html (Be sure to follow the link to Archived 1991 district maps at the top of that page and compare them with the current boundaries to see how the districts were gerrymandered. Apparently this was a joint effort by both Democrats and Republicans to protect their seats in Congress.)

OK, so you can purchase the raw data from either the Secretary of State or the San Diego County Registrar of Voters. If you go this route, the sheer number of voter records will be a challenge (unless, of course, you order only a smaller subset of the data, which you can do). San Diego County currently has about 1.3 million voters, and the California state database has over 15 million. Both are far too big for an Excel 2003 spreadsheet, which accepts only 65,536 rows, and even Excel 2007, which handles up to 1,048,576 rows, can't hold the County file. Microsoft Access, however, should be able to handle all 1.3 million records from the County database. The statewide database is harder, because Access 2003 has a file size limit of 2 GB. There MAY be some way to get most or all of the 15 million state records into it, since the compressed text file is only about 750 MB. However, the uncompressed text file is nearly 3 GB, and I can tell you that trying to import it directly into Access failed. When you buy the data from the state, they do warn you that you won't be able to do anything with it unless you have a real database program and some skill. Being a geek, I have tried to work with it, with some success. Here's what I know:
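The arithmetic behind those limits is easy to check. This sketch plugs in the figures quoted above (about 1.3 million County records, over 15 million state records, and an uncompressed state file of nearly 3 GB, treated here as 3 GB) along with the Excel and Access 2003 limits. The counts are rounded, so treat the output as a rough guide only.

```python
# Figures quoted in the text above (record counts and file size are rounded).
EXCEL_2003_ROWS = 65_536
EXCEL_2007_ROWS = 1_048_576
ACCESS_2003_MAX_BYTES = 2 * 1024**3   # 2 GB Access 2003 file-size limit

county_records = 1_300_000            # San Diego County, approx.
state_records = 15_000_000            # statewide, "over 15 million"
state_text_bytes = 3 * 1024**3        # uncompressed text, "nearly 3 GB"

for label, rows in [("County", county_records), ("State", state_records)]:
    print(f"{label}: fits Excel 2003? {rows <= EXCEL_2003_ROWS}; "
          f"fits Excel 2007? {rows <= EXCEL_2007_ROWS}")

bytes_per_record = state_text_bytes / state_records
print(f"Roughly {bytes_per_record:.0f} bytes of text per state record")

# The raw text alone exceeds the 2 GB ceiling; Access's own storage
# overhead (and any indexes) would make the real file size differ.
print("Raw text under Access 2003 limit?",
      state_text_bytes <= ACCESS_2003_MAX_BYTES)
```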

Skip this paragraph if you're not a computer geek, because I'm not going to try to make it entirely understandable. The California statewide database is supplied as a tab-delimited text file. I tried importing it with the import wizard in SQL Server 2005 and had nothing but problems. I WAS, however, able to import it easily with the wizard in SQL Server 2000. I think I used an evaluation version of the full product, not the stripped-down version that's free to use but limits database size and the number of concurrent users, so I'm not sure whether that one will work. THEN, I copied the table from the SQL Server 2000 database into a SQL Server 2005 database. Being a novice user, I had major performance problems until I learned to build some indexes. Once I had, it was no problem to query the data from Access via ODBC. (Before I built the indexes, Access could FIND the table, but any attempt to read or query it would, after a couple of minutes, produce nothing but an error window.) I haven't yet tried accessing the data in the original SQL Server 2000 database, so I can't say with authority whether building indexes there would also make the data accessible via ODBC from Access, but I'd guess it would. A final note: I have it on good authority that the Registrant ID field in the California database is not, on its own, a key field, since individual county registrars assign it. I was, however, able to create a composite key from Locality Code and Registrant ID.
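For readers without SQL Server, the same ideas (stream the tab-delimited file into a real database, key it on Locality Code plus Registrant ID, then add indexes) can be sketched with Python's built-in sqlite3 module. This swaps in SQLite for SQL Server; it is not the setup described above and hasn't been run against the full statewide file. It assumes the file has a header row and keeps only a handful of the columns listed earlier. The second sample record (DOE, locality 19) is made up to show the same Registrant ID appearing in two counties.

```python
import csv
import io
import sqlite3

# A subset of the statewide columns, named as in the (assumed) header row.
COLUMNS = ["Locality Code", "Registrant ID", "Last Name", "First Name",
           "Precinct", "Party"]


def load_voters(lines, conn, batch_size=50_000):
    """Stream tab-delimited records into SQLite in batches."""
    conn.execute(
        """CREATE TABLE IF NOT EXISTS voters (
               locality_code TEXT NOT NULL,
               registrant_id TEXT NOT NULL,
               last_name TEXT, first_name TEXT, precinct TEXT, party TEXT,
               -- Registrant ID is assigned per county, so the key pairs it
               -- with the county's Locality Code.
               PRIMARY KEY (locality_code, registrant_id))"""
    )
    reader = csv.DictReader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    # Plain INSERT: a duplicate (Locality Code, Registrant ID) pair raises
    # sqlite3.IntegrityError, a quick check that the composite key holds.
    insert = "INSERT INTO voters VALUES (?, ?, ?, ?, ?, ?)"
    batch = []
    for row in reader:
        batch.append(tuple((row.get(c) or "").strip() for c in COLUMNS))
        if len(batch) >= batch_size:
            conn.executemany(insert, batch)
            batch.clear()
    if batch:
        conn.executemany(insert, batch)
    # Index the columns you filter on, e.g. precinct for walk lists.
    conn.execute("CREATE INDEX IF NOT EXISTS idx_precinct ON voters (precinct)")
    conn.commit()


conn = sqlite3.connect(":memory:")
sample = io.StringIO(
    "\t".join(COLUMNS) + "\n"
    "37\t1002571\tPOWELL\tSTEVEN\t258000\tDEM\n"
    "19\t1002571\tDOE\tJANE\t100100\tREP\n"
)
load_voters(sample, conn)
print(conn.execute("SELECT COUNT(*) FROM voters").fetchone()[0])  # 2
```

For the real file you would pass an open file handle, e.g. `open(path, newline="", encoding=...)`, with the encoding left as an assumption to check against the data you receive.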