Best Bang for Your Buck

Sachin Naik
Jun 25, 2021

Note to reader: This is a past project that I have kept as is, though I plan to revise it. The GitHub repo for this project is being updated and will be posted soon!

The goal of this project is to perform Exploratory Data Analysis (EDA) on IMDb's top 1,000 movies and make recommendations that will help a new movie production company increase its rate of return.

Getting The Data: The first part of this project involved collecting data for analysis with a web scraper. We used the Beautiful Soup Python library to extract information from www.imdb.com. (Further details on web scraping coming soon.) The data was fairly structured and required only minor cleaning. Movies with multiple genres were duplicated so that each copy was listed under a single genre. We also had to remove a few outliers in terms of profitability, budget and income.
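The two cleaning steps can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the field names (`genres`, `budget`), the sample records, and the 1.5 × IQR outlier rule are all assumptions made for the example.

```python
from statistics import quantiles


def split_genres(movies):
    """Return one row per (movie, genre) pair, so each row has a single genre."""
    rows = []
    for movie in movies:
        for genre in movie["genres"]:
            row = {k: v for k, v in movie.items() if k != "genres"}
            row["genre"] = genre
            rows.append(row)
    return rows


def drop_outliers(rows, key, k=1.5):
    """Keep rows whose `key` value lies within k * IQR of the quartiles.

    The IQR rule is an assumed stand-in; the post does not say which
    outlier criterion was used.
    """
    values = [row[key] for row in rows]
    q1, _, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [row for row in rows if low <= row[key] <= high]


# Made-up records (budget in millions USD), for illustration only.
movies = [
    {"title": "Movie A", "genres": ["Drama", "Western"], "budget": 10},
    {"title": "Movie B", "genres": ["Music"], "budget": 12},
]
rows = split_genres(movies)  # "Movie A" now appears once per genre

budgets = [8, 10, 11, 12, 12, 13, 15, 500]
sample = [{"title": f"Movie {i}", "budget": b} for i, b in enumerate(budgets)]
kept = drop_outliers(sample, "budget")  # the 500 row is filtered out
```

One thing to keep in mind with the duplicated rows: any statistic computed across all rows counts a multi-genre movie more than once, so per-genre summaries are the natural use for that table.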

Our thought process: A lot of high-grossing movies have a high budget, which does not necessarily translate into a high profit percentage. With this in mind, the main theme of our recommendations was to assess rate of return (profit percent) instead of revenue. In other words, we want to give the movie production company the best bang for their buck.
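To make the distinction concrete, here is a small sketch of the metric. It assumes "profit percent" means profit divided by budget, which the post does not spell out; the function name and the two example movies are hypothetical.

```python
def profit_percent(budget, gross):
    """Profit as a percentage of budget: (gross - budget) / budget * 100."""
    if budget <= 0:
        raise ValueError("budget must be positive")
    return (gross - budget) / budget * 100


# Hypothetical movies, amounts in millions USD.
blockbuster = profit_percent(budget=200, gross=600)  # 400 profit on 200 spent
small_film = profit_percent(budget=25, gross=125)    # 100 profit on 25 spent
print(blockbuster, small_film)
```

The blockbuster earns four times the profit in absolute terms, but the small film returns twice as much per dollar spent.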

Question 1: What kind of budget?

To answer this question, we first wanted to see if there was any correlation between movie budget and the generated income.

Gross income vs Budget of top 1000 Movies on imdb in millions USD

Budget and income had a moderately strong correlation. This suggests that a high-budget movie tends to generate a high income. This makes sense because high-budget movies have more resources to help ensure success, but does a high budget also mean a high return rate?
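For readers who want to reproduce this kind of check, a Pearson correlation coefficient can be computed in a few lines. This is a sketch under assumptions: the post does not say which correlation measure was used, and the numbers below are made up rather than taken from the IMDb data.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Made-up budgets and gross incomes in millions USD.
budgets = [5, 25, 60, 120, 200]
gross = [30, 125, 150, 400, 520]
print(round(pearson(budgets, gross), 2))
```

A value near 1 means the two columns rise together; it says nothing about how profit relative to budget behaves, which is why the next plot looks at profit percent instead.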

Profit percent vs Budget of top 1000 Movies on imdb in millions USD

Profit percent and budget had a weak correlation, so we decided to look at the median return rate of movies. Small-budget movies (around $25 million) had a return of about $100 million, which is roughly 400% of the budget. This suggests small-budget movies were able to generate almost the same profit as higher-budget movies ($50 million to $200 million), but with a much higher return rate. Based on this, we recommend a budget of $25 million.
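One way to compare median return rates across budget levels is to group movies into budget bands. The sketch below is illustrative only: the band width of $25 million, the field names, and the sample records are assumptions, not the project's actual setup.

```python
from collections import defaultdict
from statistics import median


def median_return_by_band(movies, band_width=25):
    """Median profit percent per budget band (band keyed by its lower edge)."""
    bands = defaultdict(list)
    for movie in movies:
        band_start = int(movie["budget"] // band_width) * band_width
        pct = (movie["gross"] - movie["budget"]) / movie["budget"] * 100
        bands[band_start].append(pct)
    return {start: median(vals) for start, vals in sorted(bands.items())}


# Made-up records, amounts in millions USD.
sample = [
    {"budget": 20, "gross": 100},
    {"budget": 24, "gross": 120},
    {"budget": 100, "gross": 300},
]
print(median_return_by_band(sample))
```

The median is used here rather than the mean because a handful of runaway hits can pull the average far above what a typical movie in the band returns.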

Question 2: What about genre?

We looked at the return rates for movies based on their genre.

Genres with the highest Profit percent

Film noir, Western and music were the three most profitable genres; their return rates are shown in the graph above. Film noir had the lowest budget (around $2 million), followed by music ($5 million) and then Western ($10 million).
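A genre ranking like this can be sketched by grouping the one-row-per-genre table and sorting by a summary statistic. The use of the median, the field names, and the sample values are assumptions for illustration; the post does not state how the per-genre return rate was aggregated.

```python
from collections import defaultdict
from statistics import median


def top_genres(rows, n=3):
    """Return the n genres with the highest median profit percent."""
    by_genre = defaultdict(list)
    for row in rows:
        by_genre[row["genre"]].append(row["profit_percent"])
    medians = {genre: median(vals) for genre, vals in by_genre.items()}
    return sorted(medians.items(), key=lambda item: item[1], reverse=True)[:n]


# Made-up rows, one per (movie, genre) pair.
sample_rows = [
    {"genre": "Film-Noir", "profit_percent": 900},
    {"genre": "Film-Noir", "profit_percent": 700},
    {"genre": "Western", "profit_percent": 500},
    {"genre": "Action", "profit_percent": 150},
    {"genre": "Music", "profit_percent": 600},
]
for genre, pct in top_genres(sample_rows):
    print(genre, pct)
```

Genres with very few movies can top a ranking like this by chance, so it is worth checking how many rows sit behind each median before acting on it.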

Question 3: How long should the movie be?

Since each minute of movie production costs money, we wanted to see if there was a correlation between run time and return rate. Interestingly, there was almost no correlation, so we decided to use the median run time of the top 100 most profitable movies. Based on the graph below, we recommend a run time of about 116 minutes.
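The "median run time of the most profitable movies" step can be sketched as below. The field names and sample values are hypothetical, and the example uses the top 3 rather than the top 100 only to keep the data small.

```python
from statistics import median


def median_runtime_of_top(movies, n=100):
    """Median run time (minutes) of the n movies with the highest profit percent."""
    top = sorted(movies, key=lambda m: m["profit_percent"], reverse=True)[:n]
    return median(m["runtime"] for m in top)


# Made-up records for illustration.
sample = [
    {"profit_percent": 900, "runtime": 100},
    {"profit_percent": 800, "runtime": 120},
    {"profit_percent": 700, "runtime": 110},
    {"profit_percent": 10, "runtime": 200},
]
print(median_runtime_of_top(sample, n=3))
```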

Question 4: Which director?

Depending on the genre the movie production company decides to go with, here is a list of directors who have had the highest number of profitable movies:

What else can we do?

  • We can collect more data and use it to make profit predictions based on various factors, such as genre, budget, etc.
  • We can analyze time of release to determine whether it has an impact on profitability.
  • We can research movie studios that have had the highest success rate when it comes to generating high profit.

My experience: Exploring data is usually the first step in the data science process, and overall I had a lot of fun working on this project. The most fun part was web scraping; I found gathering data through automation very gratifying.

Thank you for reading!
