Vancouver

Link (Zoomable)

Toronto

Link (Zoomable)

Ottawa

Link (Zoomable)

Montreal

Link (Zoomable)

What I did was use the GPS coordinates recorded for the accidents to map them to their closest intersection. The measurement for each intersection contains accidents at that intersection as well as on the street nearby. Since the intersections with the highest number of accidents are just those with the highest amount of traffic, I normalized with traffic data from the Toronto Traffic Safety Unit. In doing so we run into a well-known statistical phenomenon: intersections with the smallest number of accidents dominate, because of the noise in the observation.
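To make the mapping step concrete, here is a minimal sketch of assigning each accident to its closest intersection and counting per intersection. The coordinates, the intersection names and the helper names (`haversine_m`, `count_by_nearest_intersection`) are made up for illustration, and the distance formula is just one reasonable choice, not necessarily what I used for the maps.

```python
import math

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two (lat, lon) points given in degrees."""
    r = 6371000.0  # mean Earth radius in metres
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def count_by_nearest_intersection(accidents, intersections):
    """Assign each accident to its closest intersection and count accidents per intersection."""
    counts = {name: 0 for name in intersections}
    for lat, lon in accidents:
        nearest = min(
            intersections,
            key=lambda name: haversine_m(lat, lon, *intersections[name]),
        )
        counts[nearest] += 1
    return counts

# Hypothetical coordinates, for illustration only (not real accident data)
intersections = {
    "A and B": (43.6700, -79.3900),
    "C and D": (43.6600, -79.3500),
}
accidents = [(43.6701, -79.3899), (43.6695, -79.3905), (43.6602, -79.3510)]
counts = count_by_nearest_intersection(accidents, intersections)
```

With a real dataset the brute-force search over every intersection gets slow, so a spatial index would probably be the next step.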

In order to deal with this we can use Bayesian statistics: by putting a prior on what we think the observation should be, we can prevent the noise in small-sample-size intersections from dominating. I’m using a technique called empirical Bayes analysis, which calculates a prior from the average of all the intersections in the dataset.
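A minimal sketch of the shrinkage idea, under stated assumptions: the prior is the pooled accident rate across all intersections, and `prior_strength` is a hypothetical knob saying how many units of traffic the prior is worth. In a full empirical Bayes analysis that strength would also be estimated from the data; here it is fixed by hand, and the numbers are invented.

```python
def shrunk_rates(accidents, traffic, prior_strength):
    """Shrink each intersection's accidents-per-traffic rate toward the pooled rate.

    prior_strength is an assumed pseudo-count of traffic: the larger it is,
    the harder low-traffic intersections are pulled toward the overall average.
    """
    prior_rate = sum(accidents.values()) / sum(traffic.values())
    return {
        name: (accidents[name] + prior_strength * prior_rate)
        / (traffic[name] + prior_strength)
        for name in accidents
    }

# Invented numbers: one quiet intersection with a single accident, one busy one
accidents = {"quiet": 1, "busy": 30}
traffic = {"quiet": 10, "busy": 1000}

raw = {name: accidents[name] / traffic[name] for name in accidents}
shrunk = shrunk_rates(accidents, traffic, prior_strength=200)
```

The raw rate of the quiet intersection is more than three times that of the busy one on the strength of a single accident. After shrinkage the two are close to each other, and the busy intersection, which has plenty of data, barely moves.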

What do we end up with after doing this? The following is a list of the most dangerous intersections we recover with this technique.

Many of the intersections we see are on Bloor or Queen, which are significant routes that do not have bike lanes.

None of these intersections is straight.

Avenue and Lonsdale

Bloor and Parliament

Broadview and Gerrard

Stay safe. Here’s a list of the 50 most dangerous intersections that I determined using this method.

Top50 (Excel)

Top50 (csv)

As requested, a complete list.

Complete List (csv)

There was a request for the same map but with the absolute number of bikes, which is here.

Next I ran a clustering algorithm (Markov clustering) to break up this graph into manageable groups of subreddits. Below is the full graph. I’ve colored a subset of the clusters, and it’s tough to see at this zoom, so I’m going to go through the groups in more depth below. Subreddits in gray ended up in smaller clusters.
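For anyone curious what Markov clustering does, here is a minimal sketch of the basic algorithm on a toy graph: alternate "expansion" (squaring the column-stochastic matrix, i.e. taking random walk steps) and "inflation" (raising entries to a power and renormalizing, which strengthens strong connections and kills weak ones). The toy edge list, the inflation value of 2 and the function name `markov_cluster` are assumptions for illustration; this is not the code or the parameters used for the subreddit graph.

```python
import numpy as np

def markov_cluster(adjacency, inflation=2.0, iterations=50):
    """Basic Markov clustering on a symmetric adjacency matrix; returns sorted lists of node indices."""
    # Add self-loops, then make every column sum to 1 (column-stochastic)
    m = np.asarray(adjacency, dtype=float) + np.eye(len(adjacency))
    m /= m.sum(axis=0, keepdims=True)
    for _ in range(iterations):
        m = m @ m              # expansion: two steps of the random walk
        m = m ** inflation     # inflation: element-wise power
        m /= m.sum(axis=0, keepdims=True)  # renormalize the columns
    # Each row that still has mass is an attractor; its nonzero columns form a cluster
    clusters = set()
    for row in m:
        members = frozenset(np.nonzero(row > 1e-6)[0].tolist())
        if members:
            clusters.add(members)
    return sorted(sorted(c) for c in clusters)

# Toy graph: two triangles (0-1-2 and 3-4-5) joined by a single edge 2-3
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
adjacency = np.zeros((6, 6))
for a, b in edges:
    adjacency[a, b] = adjacency[b, a] = 1.0

clusters = markov_cluster(adjacency)
```

On this toy graph the single bridging edge loses out to the denser triangles, so the two triangles come out as separate clusters. A higher inflation value generally gives more, smaller clusters.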

This view contains a number of the major subreddits, in particular subreddits relating to politics (GREEN) as well as “depth” topics (RED, PINK, YELLOW).

This group has stuff like “pics”, “funny” and “aww”, and I call it the lighter side of reddit (PURPLE).

This is a whole bunch of subreddits relating to computer programming (BLUE)

This is a group of pornographic subreddits (LIGHT BLUE)

I am going to call this group “safe for work porn” (ORANGE)

This is a group of computer game subreddits (YELLOW) as well as non-gaming role playing and fantasy (GREEN).

In this corner we have music-related subreddits (SHADES OF ORANGE).

That’s all that I’m going to comment on. There are quite a few more groups that I don’t have enough space to go through, but I’ve attached the full list of clusters that come out of this: Full list of clusters
