Adventures in Data

A blog about playing with data

A Subreddit Interaction Map

I’m a redditor, so naturally I was thrilled to find a data set of voting patterns from reddit users who have made their votes public. In this post I show a visualization of that data. What I did was try to find groups of subreddits that are used by the same users. Specifically, for every pair of subreddits I asked whether users vote in both more often than would be expected by chance (a chi-squared test, with multiple-testing correction). Then I took the residual of the test (a measure of how many more users vote in both than expected by chance) and used it as the weight of a link between the two subreddits.
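For readers who want to try something similar, here is a minimal sketch of the pairwise test in plain Python. It is not the code used for this post. The function names, the toy `votes` data, the Bonferroni correction, and the use of the Pearson residual of the "both" cell as the link weight are all assumptions made for illustration; the post only says "chi-squared test, with multiple test correction".

```python
import math
from itertools import combinations


def pair_stats(users_a, users_b, n_users):
    """Chi-squared test of independence (1 degree of freedom) for one pair.

    users_a / users_b are sets of user ids that voted in each subreddit.
    Assumes each subreddit has at least one voter and not every user votes
    in it, so no expected count is zero.
    """
    a, b = len(users_a), len(users_b)
    both = len(users_a & users_b)
    # Observed 2x2 table: rows = in A / not in A, columns = in B / not in B.
    observed = [[both, a - both], [b - both, n_users - a - b + both]]
    row_totals = [a, n_users - a]
    col_totals = [b, n_users - b]
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n_users
            chi2 += (observed[i][j] - expected) ** 2 / expected
    # Pearson residual of the "votes in both" cell: positive means more
    # shared users than independence would predict.
    expected_both = a * b / n_users
    residual = (both - expected_both) / math.sqrt(expected_both)
    # Survival function of a chi-squared variable with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value, residual


def build_edges(votes, n_users, alpha=0.05):
    """Return {(sub1, sub2): residual} for significantly co-used pairs."""
    pairs = list(combinations(sorted(votes), 2))
    threshold = alpha / len(pairs)  # Bonferroni correction (an assumption)
    edges = {}
    for s1, s2 in pairs:
        _, p_value, residual = pair_stats(votes[s1], votes[s2], n_users)
        if p_value < threshold and residual > 0:
            edges[(s1, s2)] = residual
    return edges


# Toy data: 40 users, identified by the integers 0..39.
votes = {
    "python": set(range(20)),
    "programming": set(range(18)) | {20, 21},
    "aww": set(range(0, 40, 2)),
}
edges = build_edges(votes, n_users=40)
print(edges)
```

In this toy data only the "programming"/"python" pair shares more users than chance would predict, so it is the only link that survives; "aww" overlaps with both at exactly the chance level.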

Next I ran a clustering algorithm (Markov clustering) to break this graph up into manageable groups of subreddits. Below is the full graph. I’ve colored a subset of the clusters; it’s tough to see at this zoom level, but I go through the groups in more depth below. Subreddits in gray ended up in smaller clusters.
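As a rough illustration of how Markov clustering works, here is a small pure-Python sketch of the basic loop: add self-loops, normalize columns, then alternate expansion (squaring the matrix) and inflation (raising entries to a power and renormalizing) until the matrix stops changing. The function names, the inflation value of 2.0, and the toy graph are assumptions for the example, not the settings used for the map; for a graph of real size a dedicated MCL implementation would be the better choice.

```python
def normalize_columns(m):
    """Scale every column so that it sums to 1."""
    n = len(m)
    sums = [sum(m[i][j] for i in range(n)) for j in range(n)]
    return [[m[i][j] / sums[j] for j in range(n)] for i in range(n)]


def markov_cluster(weights, inflation=2.0, max_iter=50, tol=1e-9):
    """Cluster a graph given as a symmetric matrix of non-negative weights.

    Returns a sorted list of clusters, each a sorted list of node indices.
    """
    n = len(weights)
    # Self-loops keep every column sum positive and stabilize the iteration.
    m = [[weights[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
         for i in range(n)]
    m = normalize_columns(m)
    for _ in range(max_iter):
        # Expansion: one more step of the random walk (matrix squared).
        expanded = [[sum(m[i][k] * m[k][j] for k in range(n))
                     for j in range(n)] for i in range(n)]
        # Inflation: strengthen strong flows, weaken weak ones.
        inflated = normalize_columns(
            [[x ** inflation for x in row] for row in expanded])
        delta = max(abs(inflated[i][j] - m[i][j])
                    for i in range(n) for j in range(n))
        m = inflated
        if delta < tol:
            break
    # Rows that keep non-zero mass are attractors; the columns they
    # attract form one cluster. Identical rows collapse into one set.
    clusters = set()
    for i in range(n):
        members = frozenset(j for j in range(n) if m[i][j] > 1e-6)
        if members:
            clusters.add(members)
    return sorted(sorted(c) for c in clusters)


# Toy graph: two triangles with no links between them.
names = ["python", "programming", "compsci", "pics", "funny", "aww"]
weights = [
    [0, 1, 1, 0, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [1, 1, 0, 0, 0, 0],
    [0, 0, 0, 0, 1, 1],
    [0, 0, 0, 1, 0, 1],
    [0, 0, 0, 1, 1, 0],
]
clusters = markov_cluster(weights)
for cluster in clusters:
    print([names[i] for i in cluster])
```

On the real data the weights would be the residuals from the pairwise test, and a higher inflation value would generally produce more, smaller clusters.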

High Resolution Version

This view contains a number of the major subreddits, in particular subreddits relating to politics (GREEN) as well as “depth” topics (RED, PINK, YELLOW).

This group has things like “pics”, “funny”, and “aww”; I call it the lighter side of reddit (PURPLE).

This is a whole bunch of subreddits relating to computer programming (BLUE).

This is a group of pornographic subreddits (LIGHT BLUE).

I am going to call this group “safe for work porn” (ORANGE).

This is a group of computer game subreddits (YELLOW) as well as non-gaming role playing and fantasy (GREEN).

In this corner we have music-related subreddits (SHADES OF ORANGE).

That’s all I’m going to comment on. There are quite a few more groups that I don’t have enough space to go through, but I’ve attached the full list of clusters that came out of this analysis: Full list of clusters