The Top 25 Movies of All Time Chosen by an Algorithm

The most influential movie of all time, Citizen Kane.
Can an algorithm critique movies? When I wrote my last blog article on algorithmic university rankings, I promised a series of posts answering little questions like this one. The movie industry has a rich history of analysis, much of it framing the content of one movie in the context of others:

Like The Birth of a Nation and Citizen Kane, Star Wars was a technical watershed that influenced many of the movies that came after – Roger Ebert

Restricted to movies, Wikipedia is a structured map of comparative analysis. Does it hold an implicit movie ranking?

Film According to Wikipedia

Rank Name Relative (% of Citizen Kane's score)
1 Citizen Kane 100%
2 The Wizard of Oz 96.6%
3 Gone with the Wind 84.3%
4 Titanic 77.5%
5 Star Wars Episode IV: A New Hope 76.9%
6 The Godfather 66.4%
7 Blade Runner 56.8%
8 Star Wars Episode V: The Empire Strikes Back 52.0%
9 Avatar 49.4%
10 The Matrix 48.1%
11 Casablanca 47.4%
12 The Exorcist 46.8%
13 The Passion of the Christ 46.5%
14 Snow White and the Seven Dwarfs 46.3%
15 Star Wars Episode VI: Return of the Jedi 46.0%
16 The Birth of a Nation 45.1%
17 2001: A Space Odyssey 43.7%
18 The Lion King 43.4%
19 Raiders of the Lost Ark 42.9%
20 The Lord of the Rings: The Return of the King 42.5%
21 The Dark Knight 42.2%
22 Fantasia 42.2%
23 The Silence of the Lambs 39.3%
24 Schindler’s List 39.0%
25 Apocalypse Now 38.9%

The list is hard to argue with. Citizen Kane even appears at the top of the American Film Institute’s 100 YEARS…100 MOVIES. Every film here is a household name. Keep in mind that the rankings measure influence, not necessarily quality. The Passion of the Christ, in particular, strikes me as out of place: it ranks so highly because it was so controversial and had ramifications for the influential Mel Gibson.

Algorithmic Ranking

My ranking is purely algorithmic: I didn’t pick any of the movies or manipulate the rankings. The relative column measures the importance of a movie relative to Citizen Kane. Unlike the universities in my earlier rankings, movies are much closer together in influence. For example, The Wizard of Oz is almost as important as Citizen Kane (not to mention Gone with the Wind!).

The algorithm boils down to a bored surfer on Wikipedia. As someone clicks around randomly in Wikipedia articles, they will land on more influential films more often than on less influential ones. I compute a number, called the PageRank, which quantifies how often a surfer will visit a certain page by clicking randomly. PageRank is actually part of the algorithm Google uses to show you search results on the internet, only here it is applied to movies instead of websites. The ranking above is the top 25 movies by PageRank.
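To make the bored-surfer idea concrete, here is a minimal Python sketch of PageRank by repeated averaging (power iteration). It is not the code from pagerank.go: the function name, the damping factor of 0.85 (the conventional textbook value), the iteration count, and the four-page toy link graph are all assumptions made for illustration.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Estimate how often a random surfer visits each page.

    links maps a page title to the list of titles it links to.
    damping is the chance the surfer follows a link rather than
    jumping to a random page (0.85 is an assumed, conventional value).
    """
    # Collect every page, including ones that are only linked to.
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    n = len(pages)

    # Start with the surfer equally likely to be anywhere.
    rank = {page: 1.0 / n for page in pages}

    for _ in range(iterations):
        # Pages with no outgoing links hand their rank to everyone equally.
        dangling = sum(rank[p] for p in pages if not links.get(p))
        new_rank = {
            p: (1.0 - damping) / n + damping * dangling / n for p in pages
        }
        # Every other page splits its rank evenly among its outgoing links.
        for page, targets in links.items():
            if not targets:
                continue
            share = damping * rank[page] / len(targets)
            for target in targets:
                new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical toy graph, not real Wikipedia link data.
links = {
    "Citizen Kane": ["Orson Welles"],
    "Orson Welles": ["Citizen Kane"],
    "Star Wars": ["Citizen Kane"],
    "Blade Runner": ["Citizen Kane", "Star Wars"],
}
ranks = pagerank(links)
for title in sorted(ranks, key=ranks.get, reverse=True):
    print(f"{title}: {ranks[title]:.3f}")
```

In this toy graph nothing links to Blade Runner, so the surfer only reaches it by a random jump and it ends up at the bottom, while the pages that everything points to collect most of the visits.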

Try this at home

The code for this project is all available in the WikiRank repository on GitHub. Creating lists like these requires two steps: link analysis and selecting out the movie articles. The first step is a standard implementation of PageRank in pagerank.go. The second is a little more sophisticated: Wikipedia has an alphabetical listing of all movies ever made. By pulling down this list (see scrape_movies.py) and following the obvious links (i.e. not lists, etc.), I was able to get a master list of movies. I join the master list with my influence rankings and sort it to produce a global ranking of every movie ever made.
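The join-and-sort step can be sketched in a few lines of Python. This is only an illustration of the idea, not the code in the repository: the function name rank_movies, its parameters, and the placeholder scores are all hypothetical, and I am assuming the relative column is each movie's score divided by the top movie's score.

```python
def rank_movies(pagerank_scores, movie_titles, top=25):
    """Keep only the articles that are movies, sort them by score, and
    express each score as a percentage of the top movie's score."""
    # Join: keep scores only for titles that appear on the master movie list.
    movies = [
        (title, pagerank_scores[title])
        for title in set(movie_titles)
        if title in pagerank_scores
    ]
    # Highest score first; ties are broken alphabetically for stable output.
    movies.sort(key=lambda item: (-item[1], item[0]))
    if not movies:
        return []
    best = movies[0][1]
    return [
        (position, title, 100.0 * score / best)
        for position, (title, score) in enumerate(movies[:top], start=1)
    ]


# Placeholder scores in arbitrary units, invented for this example.
scores = {
    "Orson Welles": 9.0,  # not a movie, so the join should drop it
    "Citizen Kane": 8.0,
    "The Wizard of Oz": 4.0,
    "Ultrachrist!": 2.0,
}
master_list = ["Citizen Kane", "The Wizard of Oz", "Ultrachrist!"]

for position, title, relative in rank_movies(scores, master_list):
    print(f"{position} {title} {relative:.1f}%")
```

Filtering before sorting is what keeps heavily linked non-movie articles (people, countries, studios) out of the list, even when their scores are higher than any film's.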

What about data beyond the top 25? I’m making the raw list available as a JSON file. The least influential movies include such gems as Ultrachrist! and How the Sith Stole Christmas.

Follow up: I’ve broken the most influential movies down by year in a new post about a History of Film in Wikipedia.

Disagree? Bugs? Humbugs? Post in the comments below or fire me a tweet @cosbynator