2019 - A Year in Bollywood

This post analyses Bollywood movies released in 2019. I start with some basic web scraping, then build on it to analyse a few interesting questions.

Posted by Vaibhav Singh on Wednesday, January 1, 2020


Before getting started

Source of data:

This is one of the first posts for which I scraped the dataset myself, from Wikipedia.

Based on the structure of the data, I set out to answer the following questions:

1. Which actor had the most movies in 2019?
2. Which genres work in Bollywood, and which don’t?
3. Which production houses collaborate the most to make movies?
4. How many movies get released on “Non-Friday” days?
5. Lastly, have English titles become the new normal in Bollywood movies?

Let's get started

I start by scraping the data on Bollywood movies from the Wikipedia page listing the films of 2019.

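library(purrr)      # map2()
library(htmltab)    # htmltab() reads an HTML table into a data frame
library(lubridate)  # dmy(), wday()
library(dplyr)      # %>% and mutate(), needed for the last step

url <- "https://en.wikipedia.org/wiki/List_of_Bollywood_films_of_2019"
# Read tables 4 to 7 of the page and stack them into one data frame
tbls <- map2(url, 4:7, htmltab)
tbls <- do.call(rbind, tbls)

# Insert a space before every capital letter in Cast,
# then replace each double space with a hyphen
tbls$Cast <- gsub('([[:upper:]])', ' \\1', tbls$Cast)
tbls$Cast <- gsub('  ', '-', tbls$Cast)

# Build the release date from the day (Opening.1) and month (Opening) columns,
# then derive the weekday label
tbls <- tbls %>%
  mutate(release_date = dmy(paste0(Opening.1, "-", Opening, "-", 2019)),
         day = wday(release_date, label = TRUE))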
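
To make the two gsub() steps on Cast easier to follow, here is a small sketch on a made-up cell. It assumes the scraped cast names arrive run together with no separator between actors, which is my reading of why the cleaning is needed; the names below are invented for illustration.

```r
# Made-up Cast cell: two names run together (an assumption about the scraped text)
raw <- "Asha RaoVikram Mehta"

# Step 1: put a space before every capital letter
step1 <- gsub('([[:upper:]])', ' \\1', raw)

# Step 2: a double space now only occurs inside a name, so turn it into a hyphen
cleaned <- gsub('  ', '-', step1)

# Single spaces are left between actors, so they can be split apart
actors <- strsplit(trimws(cleaned), " ", fixed = TRUE)[[1]]
print(actors)
```

After these steps each actor is a single hyphenated token, so splitting on a space gives one element per actor.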

Viewing scraped data

1. Which actor made the most movies?

The actor who made the most movies in 2019: Jimmy Shergill, all the way!
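
A count like this can be sketched in base R as below. This is not necessarily how the chart was produced; it assumes the Cast column has already been cleaned so that actors are separated by single spaces, and the cast values are placeholders rather than real data.

```r
# Placeholder Cast values in the cleaned "First-Last First-Last" form (made up)
cast <- c(" Actor-One Actor-Two",
          " Actor-One Actor-Three",
          " Actor-Two Actor-One")

# Split every cell into individual actors and pool them into one vector
actors <- unlist(strsplit(trimws(cast), " ", fixed = TRUE))

# Count appearances per actor, most frequent first
actor_counts <- sort(table(actors), decreasing = TRUE)
print(head(actor_counts, 10))
```

On the real data, `cast` would be `tbls$Cast`.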


2. Which genres work in Bollywood?

Which genres work in Bollywood: Drama and Comedy are by far the most common genres.
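
A rough sketch of a genre tally with dplyr follows. It assumes a column named Genre holding one genre per film (both the column name and the one-genre-per-row layout are assumptions), and the rows are made up.

```r
library(dplyr)

# Made-up rows; the real data would come from the scraped table
movies <- data.frame(
  Genre = c("Drama", "Comedy", "Drama", "Action", "Comedy", "Drama"),
  stringsAsFactors = FALSE
)

# Number of films per genre, largest first, plus each genre's share of all films
genre_counts <- movies %>%
  count(Genre, sort = TRUE) %>%
  mutate(share = n / sum(n))

print(genre_counts)
```

If a film lists several genres in one cell, the cell would need to be split first, much like the Cast column.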


3. Which production houses make the most movies?

Which production houses make the most movies: although RSVP made the most movies on its own, it is interesting to note that banners such as T-Series and Dharma made more movies overall, but in collaboration with various other banners.
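
The solo versus collaboration split can be sketched as follows. The column name Studio, the comma separator between banners, and all the rows are assumptions made for illustration; the scraped table may be laid out differently.

```r
# Made-up rows: a film with more than one banner counts as a collaboration
movies <- data.frame(
  Title  = c("Film A", "Film B", "Film C", "Film D"),
  Studio = c("Banner X", "Banner Y, Banner Z", "Banner Y, Banner X", "Banner X"),
  stringsAsFactors = FALSE
)

# One vector of banners per film
banners <- strsplit(movies$Studio, ",\\s*")

# One row per (film, banner) pair, flagged by whether the film had a single banner
long <- data.frame(
  banner = unlist(banners),
  type   = rep(ifelse(lengths(banners) == 1, "solo", "collaboration"),
               lengths(banners)),
  stringsAsFactors = FALSE
)

# Films per banner, split into solo and collaborative productions
counts <- table(long$banner, long$type)
print(counts)
```

Reading across a row shows whether a banner mostly works alone or with partners, which is the distinction drawn above between RSVP and banners like T-Series and Dharma.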


4. How many movies get released on “Non-Friday” days?


Figure 1: Around 10% of movies were released on special (non-Friday) days, mostly movies from big banners or with big stars
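
The non-Friday share can be computed from the release_date column built during scraping. The sketch below uses a handful of made-up dates and a numeric weekday (Monday = 1, so Friday = 5) instead of the text label, since the labels depend on the locale.

```r
library(lubridate)

# Made-up release dates for illustration (three Fridays, one Wednesday, one Thursday)
release_date <- as.Date(c("2019-01-11", "2019-01-18", "2019-01-25",
                          "2019-06-05", "2019-08-15"))

# Numeric weekday with Monday = 1, so Friday = 5
weekday_num <- wday(release_date, week_start = 1)

# Share of releases that did not fall on a Friday
non_friday_share <- mean(weekday_num != 5)
print(non_friday_share)
```

On the real data, `release_date` would be `tbls$release_date`.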

5. Are English titles the new normal in Bollywood?

Overall, 36% of the 102 Bollywood movies released in 2019 had English titles.


Genre-wise, Comedy movies are more likely to have a Hindi title than Drama movies.
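
How a title gets classified as English or Hindi is not shown here, so the sketch below simply assumes a hand-labelled logical column, english_title (a hypothetical name), and made-up rows. It shows how the overall share and the genre-wise share could then be computed.

```r
# Made-up rows; english_title is a hypothetical hand-labelled flag
movies <- data.frame(
  Genre         = c("Comedy", "Comedy", "Comedy", "Drama", "Drama", "Drama"),
  english_title = c(FALSE, FALSE, TRUE, TRUE, TRUE, FALSE),
  stringsAsFactors = FALSE
)

# Overall share of films with an English title
overall_share <- mean(movies$english_title)

# Share of English titles within each genre
share_by_genre <- tapply(movies$english_title, movies$Genre, mean)

print(overall_share)
print(share_by_genre)
```

One minus the genre-wise share gives the share of Hindi titles, which is the quantity compared between Comedy and Drama above.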


