Thursday, January 02, 2014
A peek at the massive tagging project behind Netflix
Those of you watching a lot of Netflix over winter break might have noticed some odd genre recommendations popping up. They aren't genres by the typical definition – Western, thriller, etc. – but weird amalgamations like "Immigrant-Life Deadpan Action Movies" or "Heartfelt Ghost-Story Mysteries." Those might seem too specific to be useful, but they're a byproduct of Netflix's big mission: the most thorough tagging of films and television ever attempted.
The sleuths at The Atlantic outlined the shockingly intricate system Netflix uses to categorize its library of 10,000+ films. Each item in Netflix's collection has been tagged with a series of keywords; by combining these, Netflix has produced at least 76,000 "genres" that it can recommend. Most fascinatingly, Netflix can adjust its content based on audience viewing habits. If a lot of people are watching "Race Against Time Satires About Royalty," for instance, the company might try to secure the rights to similar movies.
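To make the idea of combining keywords into "genres" concrete, here is a minimal Python sketch. It is not Netflix's actual system, which is not public: the titles, the view counts, and the helper names `genre_label` and `views_by_genre` are all invented for illustration, and it assumes each title simply carries an ordered list of tags that are joined into a label.

```python
from collections import Counter

# Hypothetical data: titles, tags, and view counts are invented for illustration.
CATALOG = {
    "Title A": ["Immigrant-Life", "Deadpan", "Action Movies"],
    "Title B": ["Heartfelt", "Ghost-Story", "Mysteries"],
    "Title C": ["Heartfelt", "Ghost-Story", "Mysteries"],
}
VIEWS = {"Title A": 120, "Title B": 300, "Title C": 250}


def genre_label(tags):
    """Join a title's ordered tags into one descriptive genre name."""
    return " ".join(tags)


def views_by_genre(catalog, views):
    """Sum view counts over every title that shares the same combined genre."""
    totals = Counter()
    for title, tags in catalog.items():
        totals[genre_label(tags)] += views.get(title, 0)
    return totals


totals = views_by_genre(CATALOG, VIEWS)
top_genre, top_views = totals.most_common(1)[0]
print(top_genre, top_views)
```

With this toy data, the most-watched combined genre is the one shared by two titles, which is the kind of signal that could point toward acquiring similar movies.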
Netflix's dataset is of course inaccessible to the public, but it is one of the most ambitious tagging projects ever attempted, in both scale and accuracy. Netflix's use of this data suggests a future where studios can quantify and study film tastes in great detail. Big data is already revolutionizing other industries; media will soon follow suit in unexpected ways.