The Secret Links Between Political Candidates

Political spending has been steadily rising each election cycle and it's likely this trend will continue with the upcoming presidential elections; so each year it becomes more and more important to understand exactly how this money enters our democratic process and how it influences it.

There has been great work at places like MapLight to analyze issues such as how money source correlates with representative's votes, but there has been a notable lack of analysis of the ties between candidates that can be revealed by election year contributions. From this type of analysis we could potentially learn:

  • Which candidates have the closest contribution profiles?
  • Are there groups of candidates with close ties?
  • Are there significant across the aisle correlations in campaign contributions?

To answer these questions and more we will use Federal Election Commission (FEC) data from most expensive election in history, the 2012 US presidential election cycle.[1]

Following the Money

Above we have a binary heatmap (black is a contribution, grey is no contribution) of which committees made contributions to each candidate in the 2012 election cycle.

The first decision we have to make is how to quantify contribution associations between the candidates. As you can see the by far most common scenario is that a committee did not donate to a candidate (grey), so if we used a naive correlation measure like the Pearson correlation we would find all candidates are highly correlated simply because they share a large number of committees that did not donate to them!

So we need to choose a measure that gets at what we really care about when comparing two candidates - namely what committees donated to both candidates out of the committees that donated to either candidate (committees that did not donate to either do not affect our calculation). This type of correlation is called the Jaccard index.[2]

We've calculated this Jaccard coefficient between all the candidates and clustered the columns (and rows) of the binary heatmap based on them in the figure above. As you can see our associations naturally form four candidate groups (from left to right): Republicans, Democrats, Mixed, and Sparse.

Also, as expected, committees created to serve a single candidate (principal) do not link many candidates together (bottom of heatmap, yellow) whereas lobbyist PACs strongly link politicians together (top of heatmap, green).

Using permutations of our data to determine significance[3], we can also create a list of all significant associations between candidates, available in full here and the top ten shown below:

Click here (+) to expand table.[4]

Name 1 Name 2 Party 1 Party 2 Coef Pvalue Qvalue Cont 1 Cont 2 State 1 State 2 Type 1 Type 2
1 HATCH, ORRIN G CAMP, DAVID LEE REP REP 0.47 0.00 0.01 860 752 UT MI I I
2 WHITFIELD, WAYNE EDWARD SHIMKUS, JOHN M REP REP 0.46 0.00 0.01 392 473 KY IL I I
3 BIGGERT, JUDY DOLD, ROBERT JAMES MR JR REP REP 0.45 0.00 0.01 446 415 IL IL I C
4 UPTON, FREDERICK STEPHEN CAMP, DAVID LEE REP REP 0.43 0.00 0.01 691 752 MI MI I I
5 CLYBURN, JAMES E HOYER, STENY HAMILTON DEM DEM 0.44 0.00 0.01 445 585 SC MD I I
6 UPTON, FREDERICK STEPHEN SHIMKUS, JOHN M REP REP 0.43 0.00 0.01 691 473 MI IL I I
7 TIBERI, PATRICK J. CAMP, DAVID LEE REP REP 0.42 0.00 0.01 553 752 OH MI I I
8 MCCARTHY, KEVIN CANTOR, ERIC IVAN REP REP 0.42 0.00 0.01 538 565 CA VA I I
9 BARROW, JOHN J. MATHESON, JAMES D DEM DEM 0.42 0.00 0.01 508 512 GA UT I I
10 ROSKAM, PETER TIBERI, PATRICK J. REP REP 0.42 0.00 0.01 459 553 IL OH I I

Interestingly, the vast majority of candidates (82%) had at least one significant association with another candidate, with the strongest association being between the Republicans Orrin Hatch and David Camp:

And the top association between two Democrats is between James Clyburn and Steny Hoyer:

Anecdotally, the strongest associations appear to be between candidates who have held leadership positions in their parties, suggesting that such positions often bring with them the same set of committee contributions.

Across the Aisle

Perhaps most interesting though are the "mixed" or "across the aisle" links as we somewhat expect members of the same party to have the same contributors but we don't necessarily expect this across party lines. Here are the top ten such links:

Click here (+) to expand table.

Name 1 Name 2 Party 1 Party 2 Coef Pvalue Qvalue Cont 1 Cont 2 State 1 State 2 Type 1 Type 2
1 MCCARTHY, KEVIN HOYER, STENY HAMILTON REP DEM 0.37 0.00 0.01 538 585 CA MD I I
2 HOYER, STENY HAMILTON CANTOR, ERIC IVAN DEM REP 0.36 0.00 0.01 585 565 MD VA I I
3 HATCH, ORRIN G HOYER, STENY HAMILTON REP DEM 0.35 0.00 0.01 860 585 UT MD I I
4 CROWLEY, JOSEPH MCCARTHY, KEVIN DEM REP 0.36 0.00 0.01 411 538 NY CA I I
5 HOYER, STENY HAMILTON BOEHNER, JOHN A DEM REP 0.35 0.00 0.01 585 514 MD OH I I
6 BAUCUS, MAX CAMP, DAVID LEE DEM REP 0.35 0.00 0.01 474 752 MT MI I I
7 LUCAS, FRANK D. PETERSON, COLLIN CLARK REP DFL 0.42 0.00 0.01 300 260 OK MN I I
8 UPTON, FREDERICK STEPHEN HOYER, STENY HAMILTON REP DEM 0.34 0.00 0.01 691 585 MI MD I I
9 CROWLEY, JOSEPH TIBERI, PATRICK J. DEM REP 0.35 0.00 0.01 411 553 NY OH I I
10 NEAL, RICHARD E MR. TIBERI, PATRICK J. DEM REP 0.36 0.00 0.01 353 553 MA OH I I
The strongest across the aisle correlation is between Kevin McCarthy (House Majority Leader) and Steny Hoyer (former House Majority Leader):

One thing you may notice if you look at the full data is that across the aisle associations seem to be more common with incumbent candidates than challenger or open seat candidates - looking at the median Jaccard coefficient for each type of pairing further confirms this suspicion with a median of 0.11 for associations between incumbents and a median of 0.06 between incumbents and challengers.[5]

To evaluate this idea further, we can cluster the candidates by their Jaccard coefficients[6] and create a symmetric heatmap with the candidate's party on the columns and the candidate's incumbency status on the rows:

We see that there is extremely strong clustering for incumbents! Especially interesting is the mixed group of Democratic and Republican candidates that are more clustered with each other than with their main party groups and almost entirely consists of incumbents.

There is also a compelling case for "institutional challengers" (green) who group in funding with the incumbents (purple) of their party as opposed to the "outsider challengers" who receive separate funding.

Money Talks

The next question to ask is if these contributions can really tell us about how candidates will vote, and as it turns out this looks likely to be the case!

For example, there is a single Democrat, Henry Cuellar, grouped with the main block of republicans in the heatmap above (far left) - visualized with his strongest Republican association here:

We expect Cuellar to lean Republican ideologically - and in fact we see that is the case using data from GovTrack.us:

In addition to further pursuing these leads, there are a lot of other exciting avenues for exploration, like using techniques from graph theory to identify "node" politicians who link parties both internally and externally and predicting bill cosponsorship based on contribution similarity.

Evaluating these questions helps us better understand the forces influencing our democratic process and allows us to be more informed about a major hidden factor influencing the relationships between our politicians.


Have more ideas or suggestions about how to use this data? Drop a line at suggest@shorttails.io and maybe it can be added to this blog!


All analysis code available on Github here.


  1. Data from the FEC (2011-2012 cycle). ↩︎

  2. More formally, the Jaccard distance is the length of the intersect between two vectors divided by the length of the union. We could also have chosen measures like the mutual information but the Jaccard index has a nice intuitive rational here. ↩︎

  3. Here we use a false discovery rate (FDR) of 5% based on 1,000 permutations where the columns of the data are randomized separately and associations are recalculated. ↩︎

  4. Here Coef. is the Jaccard coefficient, Qvalue is the FDR (i.e. if this association is called significant what is the FDR), Cont. is number of contributions, and Type is the incumbency status. ↩︎

  5. A difference significant using a Wilcoxon test at p < 2.2e-16. Also note the median between challenger association was 0.07. ↩︎

  6. This heatmap displays a started log base 10 of the Jaccard coefficients for easier visualization (clustering was done on the untransformed values). ↩︎