2019 - A Year in Bollywood
Before getting started
Source of data:
This is one of the first posts for which I scraped the dataset myself, from Wikipedia.
Based on the structure of the data, I want to answer the following questions:
1. Which actor had the most movies in 2019?
2. Which genres work in Bollywood and which don't?
3. Which production houses collaborate the most to make movies?
4. How many movies get released on "non-Friday" days?
5. Lastly, have English titles become the new normal for Bollywood movies?
Let's get started
We start by scraping the tables of 2019 Bollywood movies from the Wikipedia page mentioned earlier.
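library(purrr)
library(htmltab)
library(lubridate)
library(dplyr)  # provides %>% and mutate(), both used below
url <- "https://en.wikipedia.org/wiki/List_of_Bollywood_films_of_2019"
# Read tables 4 to 7 from the page and stack them into one data frame
tbls <- map2(url, 4:7, htmltab)
tbls <- do.call(rbind, tbls)
# Put a space before every capital letter, then turn every space into a hyphen
tbls$Cast <- gsub('([[:upper:]])', ' \\1', tbls$Cast)
tbls$Cast <- gsub(' ', '-', tbls$Cast)
# Build the release date from the day (Opening.1) and month (Opening) columns, then derive the weekday
tbls <- tbls %>%
  mutate(release_date = dmy(paste0(Opening.1, "-", Opening, "-", 2019)),
         day = wday(release_date, label = TRUE))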
Viewing scraped data
1. Which actor made the most movies?

The actor who made the most movies in 2019: Jimmy Shergill, all the way!
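The counting behind a ranking like this can be sketched in base R. This is only an illustration on invented rows, not the code used for the chart: it assumes the two `gsub` calls in the scraping step leave `Cast` strings with `--` inside a name and a single `-` between actors (which is what they produce for an input such as "Asha RaoRavi Mehta"), and the toy data frame `films` and every name in it are made up.

```r
# Toy stand-in for the scraped table; titles and names are invented.
films <- data.frame(
  Title = c("Film A", "Film B", "Film C"),
  Cast  = c("-Asha--Rao-Ravi--Mehta",
            "-Ravi--Mehta",
            "-Asha--Rao-Ravi--Mehta-Neel--Shah"),
  stringsAsFactors = FALSE
)

# Assumption: "--" sits inside a name, a single "-" separates actors.
cast_clean <- gsub("--", " ", films$Cast, fixed = TRUE)
actors <- unlist(strsplit(cast_clean, "-", fixed = TRUE))
actors <- actors[nzchar(actors)]  # drop the empty piece before the leading "-"

# One row per (film, actor) appearance, tallied and sorted.
actor_counts <- sort(table(actors), decreasing = TRUE)
print(head(actor_counts, 10))
```

On the real table the same steps would run on `tbls$Cast`, provided the cast strings follow the assumed pattern; names containing initials or lowercase particles may need extra cleaning.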
2. Which genres work in Bollywood?
Which genres work in Bollywood: Drama and Comedy are by far the most common genres.
3. Which production houses make the most movies?
Which production houses make the most movies: RSVP made the most movies on its own, but banners such as T-Series and Dharma made more movies overall once their collaborations with other banners are counted.
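To separate solo releases from collaborations, one option is to split the studio field per film and count both banners and banner pairs. The sketch below is a base R illustration on invented rows; the column name `Studio`, the `"; "` separator and the banner names are assumptions, and the scraped table may store co-producers differently.

```r
# Toy stand-in for the scraped table; banners and titles are invented.
films <- data.frame(
  Title  = c("Film A", "Film B", "Film C", "Film D"),
  Studio = c("Banner X", "Banner X; Banner Y",
             "Banner Y; Banner Z", "Banner X; Banner Y"),
  stringsAsFactors = FALSE
)

# One character vector of banners per film (assumed separator: ";").
banners <- strsplit(films$Studio, ";\\s*")

# Films per banner, split into solo and collaborative releases.
per_banner <- data.frame(
  banner = unlist(banners),
  collab = rep(lengths(banners) > 1, lengths(banners))
)
banner_table <- table(per_banner$banner,
                      ifelse(per_banner$collab, "collab", "solo"))

# Count each unordered pair of banners that share a film.
pairs <- unlist(lapply(banners[lengths(banners) > 1], function(b) {
  combn(sort(b), 2, FUN = paste, collapse = " + ")
}))
pair_counts <- sort(table(pairs), decreasing = TRUE)

print(banner_table)
print(pair_counts)
```

Sorting the banners inside each film before pairing means "X + Y" and "Y + X" are counted as the same collaboration.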
4. How many movies get released on "non-Friday" days?

Figure 1: Around 10% of movies were released on days other than Friday, mostly movies from big banners or with big stars
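The non-Friday share can be derived from the release dates alone. The sketch below rebuilds the date in base R instead of `lubridate`, assuming `Opening` holds a three-letter English month abbreviation and `Opening.1` the day of the month, as the scraping code suggests. The four rows are illustrative and the resulting share is not the figure reported above.

```r
# Toy rows; the column names follow the scraping step, the values are invented.
films <- data.frame(
  Title     = c("Film A", "Film B", "Film C", "Film D"),
  Opening   = c("JAN", "JAN", "AUG", "OCT"),
  Opening.1 = c("11", "25", "15", "2"),
  stringsAsFactors = FALSE
)

# Map the month abbreviation to its number, then build an ISO date string.
month_num <- match(toupper(films$Opening), toupper(month.abb))
release_date <- as.Date(sprintf("2019-%02d-%02d",
                                month_num, as.integer(films$Opening.1)))

# "%u" gives the ISO weekday as "1" (Monday) to "7" (Sunday), whatever the locale.
is_friday <- format(release_date, "%u") == "5"
non_friday_share <- mean(!is_friday)

print(data.frame(films$Title, release_date, is_friday))
print(non_friday_share)
```

Using the numeric weekday avoids comparing against weekday names, which change with the system language.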
5. Are English titles the new normal in Bollywood?
Overall, 36% of the 102 Bollywood movies released in 2019 had English titles.
Genre-wise, Comedy movies are more likely to have a Hindi title than Drama movies.
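Both the overall share and the genre split come down to averaging a yes/no flag. The sketch below assumes a hypothetical logical column `english_title` that marks each film by hand. How titles were actually classified is not shown in this post, and the rows are invented, so the numbers it prints are not the post's results.

```r
# Toy rows; `english_title` is a hypothetical hand-labelled flag.
films <- data.frame(
  Genre = c("Comedy", "Comedy", "Comedy", "Drama", "Drama", "Drama", "Drama"),
  english_title = c(FALSE, FALSE, TRUE, TRUE, TRUE, FALSE, TRUE),
  stringsAsFactors = FALSE
)

# The mean of a logical vector is the share of TRUE values.
overall_share <- mean(films$english_title)

# Same calculation within each genre.
share_by_genre <- tapply(films$english_title, films$Genre, mean)

print(round(overall_share, 2))
print(round(share_by_genre, 2))
```

With only a handful of films in some genres, a difference between two shares can easily be noise, so the per-genre counts are worth reporting next to the percentages.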