Using AI to Make Better Teams

An Easy Walk-Through Using BERT

Daniel Fein
Towards Data Science

--

Photo by Andrew Moca on Unsplash

“I have meticulously decided your pairings such that you and your partners’ skills complement each other,” said Dr. Jill Helms, my professor for a writing class I’m taking at Stanford. At first, I didn’t believe that such a thing was possible: even if she perfectly understood all of our skills, how could she have optimized every single pairing? But then I got to thinking about it and came up with an extremely over-engineered, but interesting, solution.

If we could represent each person’s skillset, along with something they wish to get better at, as vectors in space, all we would need to do is match people according to the similarity between these vectors. Believe it or not, this is actually not too hard using a few standard Python packages. Let’s get started!
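To make the idea of “similarity between vectors” concrete, here is a minimal sketch using made-up two-dimensional vectors. Real sentence embeddings have hundreds of dimensions, and the numbers below are invented purely for illustration.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Hypothetical toy vectors: axis 0 stands for "water", axis 1 for "math".
wants_water = [1.0, 0.0]
knows_ocean = [0.9, 0.1]
knows_math = [0.0, 1.0]

print(cosine_similarity(wants_water, knows_ocean))  # close to 1: a good match
print(cosine_similarity(wants_water, knows_math))   # 0.0: unrelated
```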

To clarify exactly what it is we’re doing here, we will write code that asks a group of students to each give their name, a sentence about their skillset, and a sentence about what they wish to learn. The code will then output the ideal matchings for the students such that students are matched with others who have the skills that they desire.

Organizing the Project

I chose to split the program into two files for readability and reusability (if I wanted to spin this off as a Flask API, for example). We’ll call the one that handles data retrieval main.py, and the one that does the work of making the pairings get_pairings.py. We’ll also need to install some packages using pip to get the code working. Specifically, we’ll be relying on three main packages:

  • sentence-transformers offers state-of-the-art pre-trained transformer architectures for producing sentence embeddings. In layman's terms, this package gives us the function that will take sentences and turn them into vectors.
  • scikit-learn (imported in code as sklearn) is a broadly useful machine learning and data analysis library. In this project, we’ll be using it to get the function that calculates the similarity between vectors.
  • numpy is a very popular library that helps Python deal with long arrays of numbers. We’ll use it to make our pairings.

These packages can all be installed by running the line pip install sentence-transformers scikit-learn numpy in a terminal. (The package is published on PyPI as scikit-learn; the old sklearn name is deprecated there.) Once we’ve done this, we’re ready to write some code!

Getting Data

Our first step is to ask each student for their name and sentences. For our purposes, it works well to store an array of names, an array of the ‘skillset’ sentences, and an array of the ‘desired skillset’ sentences. This will be handled by the main.py file. To make your lives easier, I’ll paste in all the code here and explain after.

First, we import our code in from get_pairings.py, which I’ll explain more below. Then, we write a function called pair that uses Python’s built-in input function to get user input from the command line as strings. We loop through each student, building up the arrays described above (which we’ll call knows, wants, and names). We then pass these arrays into our imported run function in order to get back a list of lists of students’ names. Each inner list in this return value contains a pairing of students. We then take this list and print out each of the pairs. We now have the main structure of our program down; the only thing left to do is to actually figure out how to make the pairings!
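For readers who want something to type along with, here is a minimal sketch of what a pair function along these lines might look like. It is a reconstruction from the description above and the sample session at the end of this article, not the original file. The run, ask, and show parameters are hypothetical additions that let the function be tried without a terminal, and the argument order run(knows, wants, names) is an assumption.

```python
def pair(run, ask=input, show=print):
    """Collect each student's details, then print the pairings returned by run."""
    count = int(ask("Enter how many people you would like to pair off: "))
    names, knows, wants = [], [], []
    for i in range(1, count + 1):
        name = ask(f"Enter the name of student {i}: ")
        names.append(name)
        knows.append(ask(f"Enter one or more sentences describing {name}'s skillset: "))
        wants.append(ask(f"Enter one or more sentences describing {name}'s desired skillset: "))

    # Assumed signature: run(knows, wants, names) -> list of lists of names.
    pairings = run(knows, wants, names)

    show("Here are the pairings:")
    for number, group in enumerate(pairings, start=1):
        show(f"{number}: {' and '.join(group)}")
    return pairings

# In main.py this would be wired up roughly as:
#     from get_pairings import run
#     pair(run)
```

Passing run in as an argument is only a convenience for this sketch; importing it at the top of main.py, as described above, works just as well.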

Making Pairings

We need to write a function that takes in the three lists that we received above and turns them into optimized pairings. If we were doing this all from scratch, it would probably require hundreds upon hundreds of lines of code. Luckily, we’re able to build on other people’s code using the packages described above. I hope you’ll find that the result is relatively straightforward. The code for my get_pairings.py file is pasted here:

We have four functions to break down. Let’s start with the last (yet most important) one: the run function. It takes the three arrays representing the students and returns a list of lists containing the best pairings. It does this by first loading in a model (specifically BERT, a state-of-the-art transformer model created by researchers at Google). It then runs the list of sentences to analyze through the model to receive ‘embeddings,’ which are functionally vector representations of each of the sentences. It then computes a table containing the cosine similarity of each of these vectors to one another. Cosine similarity is the cosine of the angle between two vectors if you were to plot them out in space: it is close to 1 when the vectors point in nearly the same direction and close to 0 when they are perpendicular (unrelated).
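The embedding and similarity steps might be sketched as follows. The function names embed and similarity_table are hypothetical, and the model name bert-base-nli-mean-tokens is an assumed choice of BERT-based sentence-transformers model, not necessarily the one used here. The similarity table is computed directly with numpy in this sketch; sklearn.metrics.pairwise.cosine_similarity(knows_vecs, wants_vecs) would give the same table.

```python
import numpy as np

def embed(sentences, model_name="bert-base-nli-mean-tokens"):
    """Turn a list of sentences into an array of vectors (one row per sentence)."""
    # Imported lazily so the rest of this sketch runs without the package installed.
    from sentence_transformers import SentenceTransformer
    model = SentenceTransformer(model_name)  # assumed model choice
    return np.asarray(model.encode(sentences))

def similarity_table(knows_vecs, wants_vecs):
    """table[i][j] = cosine similarity of student i's skills and student j's wishes."""
    knows_vecs = np.asarray(knows_vecs, dtype=float)
    wants_vecs = np.asarray(wants_vecs, dtype=float)
    # Scale every row to unit length so a dot product equals the cosine.
    knows_unit = knows_vecs / np.linalg.norm(knows_vecs, axis=1, keepdims=True)
    wants_unit = wants_vecs / np.linalg.norm(wants_vecs, axis=1, keepdims=True)
    return knows_unit @ wants_unit.T

# Intended use (downloads the model on first call):
#     table = similarity_table(embed(knows), embed(wants))
```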

The other three functions each work on processing this table in order to turn these similarities into good pairings. First, the table is ‘averaged’ by the avg_scores function. Say Billy’s skills match perfectly with Bob’s desired skills, but Billy’s desired skills are very different from Bob’s actual skills. This average function would replace the respective table entries for these similarities with the average of both. So, if the Billy-skills and Bob-desires similarity is 1, but the Bob-skills and Billy-desires similarity is 0, then each would become 0.5.
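One way to write such an averaging step, assuming the table is a square numpy array where entry [i][j] compares student i’s skills with student j’s desired skills, is to average the table with its own transpose. This is a sketch of the idea rather than the original avg_scores.

```python
import numpy as np

def avg_scores(table):
    """Replace entries [i][j] and [j][i] with the average of the two."""
    table = np.asarray(table, dtype=float)
    return (table + table.T) / 2.0

# Row/column 0 is Billy, row/column 1 is Bob (numbers invented for illustration).
# Billy's skills vs Bob's wishes = 1.0; Bob's skills vs Billy's wishes = 0.0.
billy_bob = np.array([[0.3, 1.0],
                      [0.0, 0.3]])
print(avg_scores(billy_bob))  # both off-diagonal entries become 0.5
```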

Next, the get_pairs function reads the table into pairs. It does this by finding the highest average similarity scores in the table, storing them into an array, and then setting all of the similarity scores for each person in that pairing equal to negative infinity, so that the members of that group aren’t put into another group. It does this until all the pairs that can be made are found. Finally, it takes the indices of these pairs and converts them into the names given by the names array, and returns this result to be handled by main.py, or whatever other client it is used by.
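The greedy matching described above might look like the sketch below. It assumes the averaged table is a square numpy-compatible array and folds the index-to-name conversion into the same function for brevity. Masking the diagonal so nobody is paired with themselves is an added assumption, and the scores in the example are invented.

```python
import numpy as np

def get_pairs(avg_table, names):
    """Repeatedly take the highest remaining score and pair those two students."""
    scores = np.array(avg_table, dtype=float)  # copy so the caller's table is untouched
    np.fill_diagonal(scores, -np.inf)          # assumption: no one pairs with themselves
    pairs = []
    for _ in range(len(names) // 2):
        i, j = np.unravel_index(np.argmax(scores), scores.shape)
        if not np.isfinite(scores[i, j]):
            break  # nothing left to pair
        pairs.append([names[i], names[j]])
        # Remove both students from further consideration.
        scores[[i, j], :] = -np.inf
        scores[:, [i, j]] = -np.inf
    return pairs

toy_scores = [[0.0, 0.6, 0.1, 0.2],
              [0.6, 0.0, 0.3, 0.1],
              [0.1, 0.3, 0.0, 0.9],
              [0.2, 0.1, 0.9, 0.0]]
print(get_pairs(toy_scores, ["Daniel", "Gaby", "Jade", "Ali"]))
# [['Jade', 'Ali'], ['Daniel', 'Gaby']]
```

Note that picking the best remaining pair at each step is a greedy strategy: it is simple and fast, but it does not guarantee the best possible total score across all pairs.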

Result

In the end, the working function should be able to do this.

$ python3 main.py
Enter how many people you would like to pair off: 4
Enter the name of student 1: Daniel
Enter one or more sentences describing Daniel’s skillset: I like art.
Enter one or more sentences describing Daniel’s desired skillset: Something relating to water, boats, or swimming.
Enter the name of student 2: Gaby
Enter one or more sentences describing Gaby’s skillset: I know a lot about the ocean and am a diver.
Enter one or more sentences describing Gaby’s desired skillset: I would like to know about history.
Enter the name of student 3: Jade
Enter one or more sentences describing Jade’s skillset: Computer Science and math.
Enter one or more sentences describing Jade’s desired skillset: Economics is very cool to me.
Enter the name of student 4: Ali
Enter one or more sentences describing Ali’s skillset: I love studying the economy and monetary policy.
Enter one or more sentences describing Ali’s desired skillset: Quantitative fields like science and math.
Here are the pairings:
1: Jade and Ali
2: Daniel and Gaby

I love stuff like this because it shows just how easy it is to use state-of-the-art AI for everyday projects and use cases. I hope that some people can learn from this and start bringing AI to more people everywhere.

I’m an undergrad at Stanford trying to learn more about AI and Venture Capital. I record my most interesting thoughts on Medium. On twitter @DanielFein7