Let me find you your next favourite Korean actors based on your current ones

A few months ago, Squid Game, a drama from South Korea, went viral and captured the attention of people worldwide. According to FlixPatrol, it has been in the Top 10 Most-Watched Worldwide on Netflix. Not only that, some of its iconic items, and even the games themselves, are now being recreated in real life (but not in a cutthroat way).
Thanks to this drama's popularity, the world now seems more familiar with Korean culture and dramas. Korean dramas are also easier to access on various platforms because of their growing popularity, which means that Korean actors gain more exposure globally.
Sometimes, after a lot of binge-watching, you may wonder which drama you should watch next. Or maybe you want to know other actors whose track records are similar to the ones you have watched. Lo and behold, let me share how you can find similar actors based on the characteristics of the dramas they starred in 😉
How The Data Was Derived
Getting List of K-Dramas
Firstly, to get as many K-Dramas as possible, I referred to Wikipedia. From that page, all drama titles and release years were retrieved using BeautifulSoup.
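A minimal sketch of this scraping step is below. It assumes, hypothetically, that each drama appears on the page as a list item of the form "Title (Year)"; the real Wikipedia markup may differ. To stay dependency-free, it swaps BeautifulSoup for the standard library's `html.parser`; with BeautifulSoup, `soup.find_all("li")` plus `.get_text()` would play the role of `ListItemCollector`. The names `ListItemCollector`, `ENTRY_PATTERN`, and `extract_dramas` are illustrative.

```python
import re
from html.parser import HTMLParser

# Hypothetical assumption: each drama is an <li> whose text looks like "Title (2021)".
ENTRY_PATTERN = re.compile(r"^(?P<title>.+?)\s*\((?P<year>\d{4})\)\s*$")


class ListItemCollector(HTMLParser):
    """Collects the plain text of every top-level <li> element."""

    def __init__(self):
        super().__init__()
        self.items = []
        self._depth = 0
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        if tag == "li":
            self._depth += 1
            if self._depth == 1:
                self._buffer = []  # start collecting a fresh list item

    def handle_endtag(self, tag):
        if tag == "li" and self._depth > 0:
            self._depth -= 1
            if self._depth == 0:
                self.items.append("".join(self._buffer).strip())

    def handle_data(self, data):
        if self._depth > 0:
            self._buffer.append(data)


def extract_dramas(html):
    """Return (title, year) pairs for every list item that ends in a 4-digit year."""
    collector = ListItemCollector()
    collector.feed(html)
    collector.close()
    dramas = []
    for text in collector.items:
        match = ENTRY_PATTERN.match(text)
        if match:  # skip items without a "(YYYY)" suffix
            dramas.append((match.group("title"), int(match.group("year"))))
    return dramas
```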
After that, the next step is to get the movie ID for each drama. For this, we use IMDbPy to search for each drama and match the results against the title and year collected previously. The tricky part is that a drama's title may be identical or similar to other titles. In that case, we collect all the matching movie IDs anyway.
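The matching logic could look roughly like the sketch below. It assumes the search results have already been fetched (for example with IMDbPy's `IMDb().search_movie(title)`) and simplified into dicts with `movieID`, `title`, and `year` keys; `normalise` and `collect_movie_ids` are hypothetical helpers. Every result matching both title and year is kept, so an ambiguous title yields several IDs.

```python
def normalise(title):
    """Lower-case and collapse whitespace so minor formatting differences don't block a match."""
    return " ".join(title.lower().split())


def collect_movie_ids(search_results, title, year):
    """Return every movie ID whose title and year match the scraped entry.

    Assumption: each search result is a dict with 'movieID', 'title' and 'year'
    keys, a simplified stand-in for the Movie objects IMDbPy returns.
    """
    wanted = normalise(title)
    return [
        result["movieID"]
        for result in search_results
        if normalise(result.get("title", "")) == wanted and result.get("year") == year
    ]
```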
Filtering and Collecting Information
After collecting the movie IDs, the next step is to drop dramas for which no movie ID was found. We also have to check whether each drama came from South Korea, in case non-Korean dramas with similar names were retrieved. These checks were done for every drama and every movie ID found (when multiple movie IDs were collected). After these processes, information related to each drama, such as cast members, rating, and genres, was retrieved.
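A sketch of the filtering step follows, under the hypothetical assumption that the candidates for each drama have been reduced to dicts with `movieID` and `countries` keys, and that IMDb labels the country as "South Korea"; `keep_korean` is an illustrative name.

```python
def keep_korean(dramas):
    """Drop dramas without a South Korean movie ID; keep only the South Korean candidates.

    Assumption: `dramas` maps (title, year) to a list of candidate records, each a
    dict with 'movieID' and 'countries' keys (a simplified view of IMDbPy data).
    """
    filtered = {}
    for key, candidates in dramas.items():
        korean = [c for c in candidates if "South Korea" in c.get("countries", [])]
        if korean:  # dramas with no ID at all, or only non-Korean IDs, are dropped
            filtered[key] = korean
    return filtered
```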
For any drama that does not have a rating, the missing rating is replaced with an average of genre ratings. First, for each genre, we calculated the average rating across all dramas belonging to that genre. Then, since a drama can have multiple genres, a missing rating is filled with the mean of the average ratings of that drama's genres.
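The two-step imputation might be sketched as follows, assuming (hypothetically) that each drama is a dict with a `genres` list and a `rating` that is `None` when missing; `impute_ratings` is an illustrative name. Dramas whose genres have no rated examples at all stay `None` in this sketch, a case the text does not cover.

```python
from collections import defaultdict
from statistics import mean


def impute_ratings(dramas):
    """Fill missing ratings with the mean of the drama's per-genre average ratings.

    Assumption: each drama is a dict with 'genres' (list of str) and 'rating'
    (float or None). Dramas are updated in place and also returned.
    """
    # Step 1: average rating of every genre, using only dramas that have a rating.
    genre_ratings = defaultdict(list)
    for drama in dramas:
        if drama["rating"] is not None:
            for genre in drama["genres"]:
                genre_ratings[genre].append(drama["rating"])
    genre_avg = {genre: mean(values) for genre, values in genre_ratings.items()}

    # Step 2: impute by averaging the genre averages of the drama's own genres.
    for drama in dramas:
        if drama["rating"] is None:
            known = [genre_avg[g] for g in drama["genres"] if g in genre_avg]
            if known:
                drama["rating"] = mean(known)
    return dramas
```

For instance, if Drama titles average 7.0 and Romance titles average 7.5, an unrated Drama/Romance title is filled with 7.25.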

Besides the K-Drama dataset, a data frame of actor names and IDs is also produced for feature processing. Only the first ten actors listed in each drama are included, on the assumption that they play the characters who appear most often. This way, actors who appear only briefly are excluded, which keeps the actor selection focused.
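Keeping only the top ten credits could be sketched like this. It assumes each drama's `cast` is a list of `(actor_id, name)` tuples in IMDb credit order, and it returns plain tuples rather than a pandas data frame to stay self-contained; `build_actor_table` and `TOP_N_CAST` are hypothetical names.

```python
TOP_N_CAST = 10  # only the first ten credited actors of each drama are kept


def build_actor_table(dramas, top_n=TOP_N_CAST):
    """Return a sorted, deduplicated list of (actor_id, actor_name) pairs.

    Assumption: each drama is a dict whose 'cast' is a list of
    (actor_id, name) tuples in credit order.
    """
    seen = {}
    for drama in dramas:
        for actor_id, name in drama["cast"][:top_n]:
            seen.setdefault(actor_id, name)  # first name seen for an ID wins
    return sorted(seen.items())
```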
How Did You Figure Out Who’s Closest to Who?
As mentioned above, IMDbPy provides detailed descriptions of each drama and its cast. Therefore, we can use these details to create features.
Count of Drama and Averaged Rating
For each actor, we count how many dramas he/she has starred in and sum up his/her drama ratings. The accumulated rating is then divided by the number of dramas, giving the actor's average drama rating.
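A minimal sketch of these two features, assuming each drama is a dict with an already-imputed numeric `rating` and a `cast` list of actor IDs (hypothetical structure; `count_and_average` is an illustrative name):

```python
from collections import defaultdict


def count_and_average(dramas):
    """Return {actor_id: (drama_count, average_rating)}.

    Assumption: each drama is a dict with a numeric 'rating' and a 'cast'
    list of actor IDs already limited to the first ten credits.
    """
    counts = defaultdict(int)
    rating_sums = defaultdict(float)
    for drama in dramas:
        for actor_id in drama["cast"]:
            counts[actor_id] += 1
            rating_sums[actor_id] += drama["rating"]
    # Accumulated rating divided by the number of dramas gives the average rating.
    return {actor: (counts[actor], rating_sums[actor] / counts[actor]) for actor in counts}
```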
Recency of Actors Starring in Drama
Each actor has their own pace and preferences when deciding to star in dramas. Some actors appear on screen frequently. Others perhaps appear once a year, or return only several years after their last drama.
Given this, we need a feature that measures how frequently and recently an actor stars in dramas. For this, we calculate the difference between the current year (2021) and the year each drama was released. These differences are summed per actor and stored in the ‘recency’ variable.
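The recency accumulation could be sketched as below, assuming each drama is a dict with an integer `year` and a `cast` list of actor IDs; `recency` and `CURRENT_YEAR` are illustrative names.

```python
from collections import defaultdict

CURRENT_YEAR = 2021  # the reference year used in this analysis


def recency(dramas, current_year=CURRENT_YEAR):
    """Sum, per actor, the years elapsed since each of their dramas was released.

    Assumption: each drama is a dict with an integer 'year' and a 'cast' list of actor IDs.
    """
    totals = defaultdict(int)
    for drama in dramas:
        for actor_id in drama["cast"]:
            totals[actor_id] += current_year - drama["year"]
    return dict(totals)
```

Because the per-drama differences are summed, the resulting value reflects both how long ago an actor's dramas aired and how many of them there are.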
Lead or Not?
I also noticed that some actors frequently star as supporting actors. Others play supporting roles in some dramas but lead in others. To account for this, we use the cast order listed on IMDb: in each drama, the first three credited actors are assigned a weight of two, and the rest of the cast is assigned a weight of one.
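The text does not say how these weights are combined per actor; the sketch below assumes they are simply summed across dramas. Each drama's `cast` is assumed to list actor IDs in IMDb credit order, and `lead_score` is a hypothetical name.

```python
from collections import defaultdict

LEAD_SLOTS = 3      # the first three credited actors count as leads
LEAD_WEIGHT = 2
SUPPORT_WEIGHT = 1


def lead_score(dramas):
    """Sum the cast-order weights per actor across all of their dramas.

    Assumption: summing is an illustrative aggregation choice, and 'cast'
    lists actor IDs in credit order.
    """
    scores = defaultdict(int)
    for drama in dramas:
        for position, actor_id in enumerate(drama["cast"]):
            weight = LEAD_WEIGHT if position < LEAD_SLOTS else SUPPORT_WEIGHT
            scores[actor_id] += weight
    return dict(scores)
```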
Averaged Rating per Genre
Looking at the dataset, dramas may have multiple genres, and, of course, actors have appeared in numerous drama titles. Hence, for each actor, we accumulate the total number of dramas and the total rating belonging to each genre.
For example, if actor X starred in a drama with genres Y and Z, the drama count is 1 for genre Y and 1 for genre Z. If the same actor also starred in another drama with genres W and Z, the totals for that actor become 1 for W, 1 for Y, and 2 for Z. The total rating per genre is accumulated the same way.
After that, each actor's accumulated rating per genre is divided by the total number of unique dramas he/she has starred in, as a form of normalization.
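A sketch of the per-genre features, assuming each drama is a dict with a numeric `rating`, a `genres` list, and a `cast` list of actor IDs (hypothetical structure; `genre_features` is an illustrative name). Following the text, the accumulated rating per genre is divided by the actor's total number of unique dramas, not by the per-genre count.

```python
from collections import defaultdict


def genre_features(dramas):
    """Return (counts, normalised) per actor: drama count and normalised rating per genre.

    Assumption: each drama is a dict with 'rating', 'genres' and 'cast' keys.
    """
    genre_counts = defaultdict(lambda: defaultdict(int))
    genre_ratings = defaultdict(lambda: defaultdict(float))
    unique_dramas = defaultdict(set)
    for index, drama in enumerate(dramas):
        for actor_id in drama["cast"]:
            unique_dramas[actor_id].add(index)
            for genre in drama["genres"]:
                genre_counts[actor_id][genre] += 1
                genre_ratings[actor_id][genre] += drama["rating"]
    # Normalise each genre's accumulated rating by the actor's number of unique dramas.
    normalised = {
        actor: {genre: total / len(unique_dramas[actor]) for genre, total in ratings.items()}
        for actor, ratings in genre_ratings.items()
    }
    counts = {actor: dict(per_genre) for actor, per_genre in genre_counts.items()}
    return counts, normalised
```

On the actor X example above (one Y/Z drama and one W/Z drama), `counts` comes out as 1 for W, 1 for Y, and 2 for Z.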
Find The Closest Actors by Cosine Similarity
To find which actors are closest to the input actor, we used cosine similarity from scikit-learn. Cosine similarity is widely used in Natural Language Processing to find similar documents or information. It measures the cosine of the angle between two vectors, so its value lies between -1 and 1. If two vectors point in the same direction (i.e. they are very similar or identical), the value is 1; if they are orthogonal, it is 0; and if they point in opposite directions, it is -1. Since all the features built here are non-negative, the similarities between actors in practice fall between 0 and 1. Formally, cosine similarity is the dot product of two vectors divided by the product of the vectors’ lengths.
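Written out, with a and b denoting two actors' feature vectors of length n (notation introduced here for illustration), followed by a small worked example:

```latex
\cos(\theta)
  = \frac{\mathbf{a} \cdot \mathbf{b}}{\lVert \mathbf{a} \rVert \, \lVert \mathbf{b} \rVert}
  = \frac{\sum_{i=1}^{n} a_i b_i}{\sqrt{\sum_{i=1}^{n} a_i^2} \, \sqrt{\sum_{i=1}^{n} b_i^2}}

% Example: a = (1, 2), b = (2, 4)
\cos(\theta) = \frac{1 \cdot 2 + 2 \cdot 4}{\sqrt{5} \, \sqrt{20}} = \frac{10}{10} = 1
```

The two example vectors point in the same direction, so they count as maximally similar even though b is twice as long as a.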

To implement this, we use the pairwise cosine similarity function from scikit-learn (`cosine_similarity` in `sklearn.metrics.pairwise`). This function returns a matrix of pairwise similarities between all samples in the input. The input is the actorFeatures data frame produced by the pre-processing and feature creation steps above.
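To make the computation concrete, the dependency-free sketch below builds the same kind of n × n matrix that scikit-learn's `cosine_similarity(X)` returns for a single input `X`. The feature rows are hypothetical; in the actual pipeline, the numeric columns of actorFeatures would be passed to scikit-learn directly. `cosine` and `pairwise_cosine` are illustrative names.

```python
import math


def cosine(a, b):
    """Cosine similarity: dot product divided by the product of the Euclidean norms."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # convention used here for all-zero vectors
    return dot / (norm_a * norm_b)


def pairwise_cosine(rows):
    """Return the n x n matrix of cosine similarities between every pair of rows."""
    return [[cosine(a, b) for b in rows] for a in rows]
```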
Let’s test this with two Korean actors! Below, you can find the three closest actors to each of them. For this case, let’s test with Lee Min-ho (_The Heirs_, _Boys Over Flowers_, _The King: Eternal Monarch_) and Bae Suzy (_Start-Up_, _Vagabond_, _While You Were Sleeping_).
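Looking up the closest actors from these similarities could be sketched like this. It assumes, hypothetically, that `features` maps each actor's name to an equal-length list of numeric features, and it excludes the input actor from its own results; `closest_actors` is an illustrative name.

```python
import math


def cosine(a, b):
    """Cosine similarity: dot product divided by the product of the Euclidean norms."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def closest_actors(name, features, k=3):
    """Return the k actors whose feature vectors are most cosine-similar to `name`'s."""
    target = features[name]
    scored = [
        (other, cosine(target, vector))
        for other, vector in features.items()
        if other != name  # an actor is trivially most similar to themselves
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return [other for other, _ in scored[:k]]
```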

Based on the results above, if you are a fan of Lee Min-ho, you can check out the dramas starring:
- Kim Tae-hee (_Hi Bye, Mama!_, _Stairway to Heaven_)
- Shin Min-a (_Hometown Cha-Cha-Cha_, _Oh My Venus_)
- Lee Joon-gi (_Flower of Evil_, _Moon Lovers: Scarlet Heart Ryeo_)
And if you want to watch dramas similar to the ones Bae Suzy starred in, you can look up these actors’ dramas:
- Shin Sung-rok ([Kairos](https://en.wikipedia.org/wiki/Kairos(TV_series)), _The Last Empress_)
- Jun Ji-hyun ([Mount Jiri](https://en.wikipedia.org/wiki/Jirisan(TV_series)), _Legend of the Blue Sea_)
- Park Shin-hye (_Sisyphus: The Myth, The Heirs_)
For further reference, you can check my GitHub repository below.
GitHub – intandeay/K-Actors: Exploration about K-Dramas & Actors