Dataset: Tropes in 1030 movies you should see in your lifetime

What is a “Trope”?

Have you ever seen a movie or read a book in which a character needs to get information from someone, but torture would be excessive or ineffective, so instead they threaten to destroy something the subject values? Well, that’s called “Interrogation by Vandalism” and it’s a trope. The presence of an extremely valuable object that is clearly going to be destroyed to great comedic effect? That’s called “Priceless Ming Vase” and it is also a trope. A trope is basically a recurrent, somewhat stereotyped narrative element (but I’m sure you will find a better definition if you are really interested in this topic).

What’s in the dataset?

In this dataset you’ll find data about the tropes contained in 1030 movies listed in one of the editions of the book “1001 Movies You Must See Before You Die”. More specifically, for each movie you’ll find one boolean feature per trope (more than 14000 in total), named after the trope, with value 1 if the trope is present in the movie and 0 otherwise. Here is an example of what the dataset looks like:

The column names correspond to how the trope names appear in the url of the tvtropes.org entry describing it (example: the “TitleDrop” trope is described in https://tvtropes.org/pmwiki/pmwiki.php/Main/TitleDrop ).
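As a minimal sketch of that naming convention, a column name can be turned back into its tvtropes.org address. The helper name `trope_url` is made up for illustration, and the code assumes every trope page lives under the same `Main/` path as the “TitleDrop” example, which may not hold for every column.

```python
# Base URL copied from the TitleDrop example above.
# Assumption: every trope column maps to a page under "Main/".
TVTROPES_MAIN = "https://tvtropes.org/pmwiki/pmwiki.php/Main/"


def trope_url(column_name: str) -> str:
    """Return the tvtropes.org page URL for a trope column name (hypothetical helper)."""
    return TVTROPES_MAIN + column_name


print(trope_url("TitleDrop"))
```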

As you can imagine by looking at the numbers, a lot of tropes appear in just one or two movies. For this reason the features (tropes) are ordered from most frequent to least frequent, so you can just keep the first n trope columns and leave out the remaining ~14k – n less interesting ones. “ShoutOut” is the most frequent trope, while “ZippingUpTheBodybag” is the least frequent one. Here’s what the distribution looks like:

Figure: trope frequencies

Figure: frequencies of the first 1000 tropes
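Because the columns are already sorted by frequency, keeping the n most frequent tropes is just a slice. Here is a rough sketch using only the standard library. It assumes the CSV has a single leading identifier column (called `movie` here, a made-up name) followed by the trope columns. The sample rows are invented and only stand in for the real file.

```python
import csv
import io


def keep_top_tropes(rows, n, id_columns=1):
    """Keep the identifier column(s) plus the first n trope columns of each row."""
    return [row[: id_columns + n] for row in rows]


# Tiny invented sample standing in for the real CSV; values are illustrative only.
# Assumption: the first column identifies the movie, the rest are trope columns.
sample = io.StringIO(
    "movie,ShoutOut,TitleDrop,ZippingUpTheBodybag\n"
    "MovieA,1,1,0\n"
    "MovieB,1,0,1\n"
)

rows = list(csv.reader(sample))
top2 = keep_top_tropes(rows, n=2)
for row in top2:
    print(row)
```

With the real file you would pass an open file handle to `csv.reader` instead of the `io.StringIO` sample.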

Where does this data come from?

The data comes from the wiki tvtropes.org, which is an incredible sinkhole of personal time and a great place to find movies, books and any other kind of media you might like, along with an impressively detailed and dangerously interesting description of any trope you could find in a story.

To be more specific, I ran a scraping task (using Scrapy) starting from the page about the 1001 movies you must see before you die and following the link to each movie page (where one exists) to retrieve all the tropes listed there. After that I did a little bit of wrangling on the raw data to get it into a more readily usable form.
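The wrangling step is not described in detail, so the following is only a guess at what it might look like, not the code actually used. It assumes the raw data can be read as a mapping from movie title to a list of trope names (the real JSON layout may differ) and turns it into 0/1 rows with the tropes ordered from most to least frequent. The function name and the sample movies are made up.

```python
from collections import Counter


def to_boolean_rows(movie_tropes):
    """Turn {movie: [tropes]} into a header and 0/1 rows, most frequent trope first."""
    # Count in how many movies each trope appears (duplicates within a movie count once).
    counts = Counter(t for tropes in movie_tropes.values() for t in set(tropes))
    # Sort by descending frequency, breaking ties alphabetically for a stable order.
    ordered = sorted(counts, key=lambda t: (-counts[t], t))
    header = ["movie"] + ordered
    rows = []
    for movie, tropes in movie_tropes.items():
        present = set(tropes)
        rows.append([movie] + [1 if t in present else 0 for t in ordered])
    return header, rows


# Invented input; the structure of the real raw JSON is an assumption.
raw = {
    "MovieA": ["ShoutOut", "TitleDrop"],
    "MovieB": ["ShoutOut"],
    "MovieC": ["ShoutOut", "TitleDrop", "ZippingUpTheBodybag"],
}

header, rows = to_boolean_rows(raw)
print(header)
for row in rows:
    print(row)
```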

All the content from that website is released under a Creative Commons Attribution-NonCommercial-ShareAlike 3.0 license, so I’m releasing this dataset under the same license. I’m also required to make clear that the team at TV Tropes does not explicitly endorse this adaptation of their work.

Download

Here is the dataset (zipped license + csv) and the raw data (zipped license + json) if you think it might be useful.
