Introducing Movie Tycoon: A GCP-Hosted Product Idea for Hollywood

Leveraging NLP and Big Data on Google Cloud Platform to aid movie investments

Akshay Madar
Towards Data Science


Image by author: Product Logo

A product pitch to potential customers and investors in Hollywood…

Motivation behind the product

The relationship between movies and their audience is decades old. Movies are not only a reflection of the societies we live in; they also shape popular culture and aspirations. In the age of the internet, this relationship has gradually expanded beyond the box office, theaters and television into online streaming and movie stock trading.

Huh! Movie stock trading? What is that? Let me explain…

Hollywood Stock Exchange® is the world’s leading entertainment stock market. At HSX.com, visitors buy and sell virtual shares of celebrities and movies with a currency called the Hollywood Dollar®.

Clearly, people are engaging with movies in new ways. Hollywood, as an industry, is also increasingly turning to data analytics to inform both creative and business decisions, a trend highlighted in The Atlantic's story "Big Data and Hollywood: A Love Story":

“My dream is when Hollywood really starts looking at data, and using data in a big way, and it’s driving business value.” — Richard Maraschi, Global Leader of Advanced Analytics at IBM

Movie Tycoon

Movie Tycoon helps movie investors on HSX.com by providing insights into where, and in whom, to invest their money. It also aims to give creative personnel a tool for analyzing reviews and using them as feedback for future projects. This way, investors can identify the right price for investments in the cinema business, and theater owners can schedule movie shows based on box office predictions.

Data pipeline

Fig 1. Data streams from HSX and Rotten Tomatoes feeding into the Hub Engine

In addition to reviews from Rotten Tomatoes, the following data points were collected from HSX's weekend box office:

  • Movie Name
  • Symbol
  • Week Gross
  • Total Gross
  • Genre
  • Release Date
  • Theaters
  • MPAA rating
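To make the collected fields concrete, here is a minimal sketch of a typed record for one weekend box office row. The field names, the "$1,234,567" dollar format and the MM/DD/YYYY date format are assumptions for illustration, not HSX's actual page layout, and parse_record is a hypothetical helper rather than part of the original code.

```python
from dataclasses import dataclass
from datetime import date, datetime


@dataclass
class BoxOfficeRecord:
    """One row of weekend box office data (field names are hypothetical)."""
    movie_name: str
    symbol: str
    week_gross: int   # US dollars
    total_gross: int  # US dollars
    genre: str
    release_date: date
    theaters: int
    mpaa_rating: str


def parse_dollars(text: str) -> int:
    """Turn a display string such as '$1,234,567' into an integer."""
    return int(text.replace("$", "").replace(",", "").strip())


def parse_record(row: dict) -> BoxOfficeRecord:
    """Convert a scraped row of strings into a typed record.

    Assumes (for illustration) that the scraper yields a dict keyed by the
    column names listed above and that dates are formatted as MM/DD/YYYY.
    """
    return BoxOfficeRecord(
        movie_name=row["Movie Name"].strip(),
        symbol=row["Symbol"].strip(),
        week_gross=parse_dollars(row["Week Gross"]),
        total_gross=parse_dollars(row["Total Gross"]),
        genre=row["Genre"].strip(),
        release_date=datetime.strptime(row["Release Date"].strip(), "%m/%d/%Y").date(),
        theaters=int(row["Theaters"].replace(",", "")),
        mpaa_rating=row["MPAA rating"].strip(),
    )
```

Typing the fields at ingestion means downstream Hive tables and aggregations receive integers and dates instead of display strings.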

Data architecture

Fig 2. Data architecture and associated tools for data collection

Product pipeline

Data is first sourced into Hive, which makes the movie database queryable with SQL, while Python web scraping tools build a corpus of reviews for natural language processing. Movie Tycoon then uses Cloudera Hadoop on GCP to run mapper and reducer (MapReduce) jobs.
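The mapper and reducer pattern can be illustrated with the classic word count over a review corpus. The sketch below follows the Hadoop Streaming convention of tab-separated "key\tvalue" lines and simulates the map, shuffle/sort and reduce phases in one process. The word-count job itself is an assumed example, since the article does not say which MapReduce jobs Movie Tycoon runs.

```python
import re
from itertools import groupby


def mapper(lines):
    """Emit 'word\t1' for every token in every review line."""
    for line in lines:
        for word in re.findall(r"[a-z']+", line.lower()):
            yield f"{word}\t1"


def reducer(sorted_pairs):
    """Sum counts per word; input must be sorted by key, as Hadoop guarantees."""
    parsed = (pair.split("\t", 1) for pair in sorted_pairs)
    for word, group in groupby(parsed, key=lambda kv: kv[0]):
        yield word, sum(int(count) for _, count in group)


def run_locally(lines):
    """Simulate map -> shuffle/sort -> reduce in a single process."""
    return dict(reducer(sorted(mapper(lines))))
```

On a cluster, mapper and reducer would live in separate scripts that read stdin and write stdout. Hadoop performs the sort between the two phases, which is why the reducer can rely on groupby over adjacent keys.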

All data is then routed to Jupyter Notebook for further analysis. The resulting outputs are exported as CSV files to Tableau, where dashboards are built for end users.
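Here is a minimal sketch of the notebook-to-Tableau hand-off using only the standard library csv module. The column names and the output file name in the usage note are hypothetical.

```python
import csv


def write_tableau_csv(rows, fieldnames, out):
    """Write analysis results (a list of dicts) as CSV to a file-like object."""
    writer = csv.DictWriter(out, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(rows)
```

Usage, with a hypothetical file name: `with open("genre_summary.csv", "w", newline="") as f: write_tableau_csv(rows, ["genre", "total_gross"], f)`. Opening the file with `newline=""` follows the csv module's recommendation. Inside a notebook, pandas' `DataFrame.to_csv` serves the same purpose.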

Fig 3. Product pipeline for Movie Tycoon

Hive workflow

All data flows are made available through HiveQL compute jobs. To ensure that data on every new box office release is incorporated into the analysis, the entire Movie Tycoon workflow is automated with scheduler jobs, each of which runs every Monday at 8:00 am.

During pilot runs, the automated workflow took approximately 10 minutes to complete.
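The weekly trigger can be written as the cron expression `0 8 * * 1` (minute 0, hour 8, Monday). The sketch below computes the same "next Monday at 8:00 am" trigger time in plain Python. It is only an illustration of the schedule and does not reproduce the configuration of the scheduler the article used.

```python
from datetime import datetime, timedelta

RUN_WEEKDAY = 0  # Monday; datetime.weekday() numbers Monday as 0
RUN_HOUR = 8     # 8:00 am


def next_run(now: datetime) -> datetime:
    """Return the next Monday 8:00 am strictly after `now`."""
    candidate = now.replace(hour=RUN_HOUR, minute=0, second=0, microsecond=0)
    # Move forward to the coming Monday (0 days if today is Monday).
    candidate += timedelta(days=(RUN_WEEKDAY - now.weekday()) % 7)
    # If this week's slot has already passed, wait for next week's.
    if candidate <= now:
        candidate += timedelta(days=7)
    return candidate
```

With a run taking about 10 minutes, each Monday's refresh would finish well before the next run is due.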

Fig 4. Hive workflow

Visualizing product insights

To aid movie investors, script writers and production houses, the entire decision support system is visualized on Tableau, using real-time dashboards such as the one shown below.

Fig 5. Real-time Tableau dashboard

Top box-office returns are observed in the following genres:

  1. Action
  2. Musical
  3. Family
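A genre ranking like this one is a single GROUP BY query. The sketch below uses Python's built-in sqlite3 as a local stand-in for HiveQL, which accepts the same GROUP BY, ORDER BY and LIMIT clauses. Ranking by summed weekly gross is an assumed metric, since the article does not define "top box-office returns", and the table name box_office is hypothetical.

```python
import sqlite3


def build_demo_db(rows):
    """Create an in-memory box_office table from (symbol, genre, week_gross) rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE box_office (symbol TEXT, genre TEXT, week_gross INTEGER)")
    conn.executemany("INSERT INTO box_office VALUES (?, ?, ?)", rows)
    return conn


def top_genres(conn, limit=3):
    """Rank genres by summed weekly gross (the metric is an assumption)."""
    query = """
        SELECT genre, SUM(week_gross) AS gross
        FROM box_office
        GROUP BY genre
        ORDER BY gross DESC
        LIMIT ?
    """
    return conn.execute(query, (limit,)).fetchall()
```

Any rows passed to build_demo_db in a quick test are placeholders, not HSX data. In production the same query would run against the Hive table.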

NLP insights

Understanding movie reviews is crucial to the product's ability to produce strategic, insightful observations about how audiences perceive cinema. Content writers stand to gain a great deal from this largely untapped data source. A detailed analysis using various NLP techniques found that the following topics generate strong positive sentiment:

  • Story
  • Performance
  • Brilliant Drama
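The article does not specify which NLP techniques produced these topics. One simple way to attach sentiment to topics is to score each review sentence and credit that score to every topic keyword the sentence mentions. The sketch below does this with a tiny hand-made lexicon that is purely illustrative. A real system would use a trained model or a curated resource such as NLTK's VADER `SentimentIntensityAnalyzer`.

```python
import re
from collections import defaultdict

# Toy lexicon, for illustration only.
POSITIVE = {"brilliant", "great", "moving", "superb", "gripping"}
NEGATIVE = {"dull", "boring", "weak", "predictable", "flat"}


def tokenize(text):
    """Lowercase and split text into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def score_tokens(tokens):
    """Number of positive words minus number of negative words."""
    return sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)


def topic_sentiment(reviews, topics):
    """Average sentence sentiment for each single-word topic keyword."""
    totals = defaultdict(int)
    counts = defaultdict(int)
    for review in reviews:
        for sentence in re.split(r"[.!?]+", review):
            tokens = tokenize(sentence)
            score = score_tokens(tokens)
            for topic in topics:
                if topic in tokens:
                    totals[topic] += score
                    counts[topic] += 1
    # Topics that never appear are left out rather than scored as 0.
    return {t: totals[t] / counts[t] for t in topics if counts[t]}
```

Multi-word topics such as "Brilliant Drama" would need phrase matching or topic modeling rather than single-keyword lookup.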

Movie Theater GIF by James Curran

Conclusion

Returning to The Atlantic article…

It is the hope of many in Hollywood that by combining deep understandings of both content and audience, studios will be able to choose and tailor their movies from the very start, and perhaps even identify some kind of magic formula to screenwriting.

Movie Tycoon will allow you to do exactly that and much more, with scalability, cloud security and intelligent analytics to drive decision making. It brings Hollywood and its audience closer than ever before, using modern data science and big data tools to bring the synergy between these stakeholders to life.

For more technical documentation, you can have a look at the associated code files here.

💡 Product Manager ◼️ I write about data science + product ideas ◼️ Purdue MS ◼️ Let’s connect ➡️ www.linkedin.com/in/akshaymadar/