Starting from the The Movie DataBase (TMDB) sample datasets,
All groups and individual must do the following:
- For each movie, compute the number of cast members
- How many movies do not have a homepage?
- For each year, how many movies do not have a homepage?
- Extract the domain of each homepage.
- Extract a set of normalized tables. That is, each entry of a normalized table must
contain exactly one value (not a list or a dictionary). There is no need to use SQL
for this point.
- For each movie, compute the gross margin (difference between revenue and budget)
- For each movie, compute the number of crew members
- For each movie, compute the number of directors
- For each language, compute the number of movies where such language is spoken.
- For each company and each decade, compute the overall revenue
- For each decade, compute the company with maximum revenue
- In each year, how many movies have revenue smaller than the budget?
The following part of the exercise must be done only by groups of two or three people
- Distribute the revenue according to the order of appearance in a movie. Assume that
the i-th actor contributes twice as much as the (i+1)-th actor to the revenue.
- For each actor find the total revenue attributed to him/her.
- Find the actor that is responsible for the most overall revenue.
The following part of the exercise must be done only by groups of three people
- For each movie, compute the ratio between males and females in the cast
- For each movie, compute the ratio between the attributed revenue of males and females
in the cast
- Find the director that has the highest average ratio computed in the previous point.
Notes
- It is better to use GitHub for developing the project. It is
not mandatory, though.
- The project must be a jupyter notebook.
- There is no restriction on the libraries that can be used, nor on the Python version.