Creative Ways to Build a Better Agency through Data Science

Using topic modeling and network graphs to visualize company data

Joe Zeoli
Towards Data Science


Photo by Clint Adair on Unsplash

At my agency, 20nine, we’re constantly leveraging data to help our clients find solutions to their problems and track the effectiveness of those solutions when it comes to achieving their goals. But until recently, we hadn’t considered just how useful data science could be when it comes to looking inward and improving our own operations, employee experiences and effectiveness in fulfilling our agency’s purpose.

So we changed all that — and in the process, we completely revamped how we thought about employee performance reviews and goal setting. Here’s a look behind the scenes in terms of how we did it.

The Platform

To get started, we built ourselves an internal platform, called The Forge, to help employees create, manage, and track their personal priorities related to their growth from a personal, professional, and business perspective. The Forge is also a place where employees can go to give recognition, aka Kudos, to their coworkers via points that can be redeemed for a monetary reward. By building a platform to enable these actions and capture the associated data, we had two goals in mind:

1. Get a 30,000-foot view of what employees were passionate about to see how we could best help them towards their goals

2. Understand how our organization worked together and how we might be able to switch things up so everyone has a chance to collaborate more

Employee Priorities

Our first initiative was focused on getting a big-picture idea of what employees were working towards within their roles and careers. Our hope was that, by identifying overlap, we could structure our benefits and coaching sessions to support the major shared priorities of our team.

From a data science perspective, the plan was to aggregate all stated priorities to date and then cluster them into topics. Using this method, we looked to spot trends among the 600 goals identified by our employees.

Topic Modeling

Topic modeling is a wonderful way to organize, understand, and summarize a large amount of data contained within content. In this case, we could load in 600 data points and easily summarize what they’re about in one simple output.

In The Forge, inputted priorities have a few data points we can pull from. For this initiative, we decided to grab both the title and the description content and merge them together. (After some initial review, we determined that the title alone didn’t always contain enough information to determine what a given entry was about.)
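As a rough illustration of that merge step, here is a minimal sketch in Python. The record layout (the `title` and `description` keys) and the simple tokenizer are assumptions for the example, not The Forge's actual schema or preprocessing.

```python
import re

# Hypothetical priority records; the "title" and "description" field names
# are assumptions, not The Forge's real data model.
priorities = [
    {"title": "Grow as a manager", "description": "Complete a leadership course this year."},
    {"title": "Video skills", "description": ""},
]

def to_document(priority):
    """Join title and description into one text, skipping empty fields."""
    parts = [priority.get("title") or "", priority.get("description") or ""]
    return " ".join(p.strip() for p in parts if p.strip())

def tokenize(text):
    """Lowercase the text and keep alphabetic tokens of three or more letters."""
    return re.findall(r"[a-z]{3,}", text.lower())

documents = [to_document(p) for p in priorities]
tokens = [tokenize(d) for d in documents]
```

Each entry in `tokens` is then one "document" for the topic model. A real pipeline would likely also remove stop words and lemmatize, which this sketch leaves out.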

To perform our topic modeling, we used the Latent Dirichlet Allocation (LDA) Mallet model. The basic idea is that each document (or priority, in this case) is made up of various words, and each topic also has words associated with it. We set how many topics we want to distill; the model then determines the words in each topic and matches the documents to those topics.
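To make the idea concrete, here is a small from-scratch sketch of LDA trained with collapsed Gibbs sampling, the family of sampler MALLET is built around. It is not the Mallet model itself and not our production code: the toy documents, the hyperparameters `alpha` and `beta`, and the function names are all assumptions chosen for illustration.

```python
import random

def lda_gibbs(docs, num_topics, iterations=200, alpha=0.1, beta=0.01, seed=0):
    """Collapsed Gibbs sampling for LDA over tokenized documents.

    Returns (doc_topic_counts, topic_word_counts, vocab).
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    V = len(vocab)

    doc_topic = [[0] * num_topics for _ in docs]        # tokens of doc d in topic k
    topic_word = [[0] * V for _ in range(num_topics)]   # times word w is in topic k
    topic_total = [0] * num_topics                      # tokens assigned to topic k
    assignments = []

    # Start by giving every token a random topic.
    for d, doc in enumerate(docs):
        zs = []
        for w in doc:
            k = rng.randrange(num_topics)
            zs.append(k)
            doc_topic[d][k] += 1
            topic_word[k][word_id[w]] += 1
            topic_total[k] += 1
        assignments.append(zs)

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                wid, k = word_id[w], assignments[d][i]
                # Take the token out of the counts before resampling it.
                doc_topic[d][k] -= 1
                topic_word[k][wid] -= 1
                topic_total[k] -= 1
                # Weight of topic t is proportional to
                # (doc-topic count + alpha) * (topic-word count + beta) / (topic size + V * beta).
                weights = [
                    (doc_topic[d][t] + alpha) * (topic_word[t][wid] + beta)
                    / (topic_total[t] + V * beta)
                    for t in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][wid] += 1
                topic_total[k] += 1
    return doc_topic, topic_word, vocab

def top_words(topic_word, vocab, n=4):
    """The n highest-count words for each topic."""
    return [
        [vocab[i] for i in sorted(range(len(vocab)), key=lambda i: -row[i])[:n]]
        for row in topic_word
    ]

# Toy tokenized priorities (hypothetical).
docs = [
    ["client", "relationship", "communication", "client"],
    ["client", "communication", "knowledge"],
    ["video", "content", "social", "video"],
    ["content", "social", "create"],
]
doc_topic, topic_word, vocab = lda_gibbs(docs, num_topics=2)
keywords = top_words(topic_word, vocab)
# Match each document to the topic that holds most of its tokens.
dominant = [max(range(2), key=lambda k: row[k]) for row in doc_topic]
```

`keywords` holds the top words per topic and `dominant` the best-matching topic per document, which mirrors the two outputs described above. On a corpus this small the topics are noisy; the point is only to show the mechanics.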

This model does not programmatically determine the best number of topics, so we need to find the optimal number ourselves. To do this, we can run the model over the data set many times with different topic numbers and measure the Topic Coherence score, which indicates the degree of semantic similarity between high-scoring words in the topic. Essentially, do all the top words in this topic make sense together?
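There are several coherence measures, and the specific one is not important for the idea. The sketch below uses a UMass-style score, chosen here only because it can be computed directly from how often words appear together in documents. The function name and toy data are assumptions for illustration.

```python
import math
from itertools import combinations

def umass_coherence(top_words, docs):
    """Average UMass-style coherence for one topic.

    top_words: the topic's words, ordered from most to least probable.
    docs: tokenized documents (lists of words).
    """
    doc_sets = [set(doc) for doc in docs]

    def doc_freq(*words):
        # Number of documents that contain every one of the given words.
        return sum(1 for s in doc_sets if all(w in s for w in words))

    scores = []
    # combinations() keeps list order, so `earlier` always outranks `later`.
    for earlier, later in combinations(top_words, 2):
        denom = doc_freq(earlier)
        if denom == 0:
            continue  # the word never occurs in the corpus; skip this pair
        scores.append(math.log((doc_freq(earlier, later) + 1) / denom))
    return sum(scores) / len(scores) if scores else float("nan")

# Toy corpus (hypothetical) with two obvious themes.
docs = [
    ["client", "communication", "relationship"],
    ["client", "relationship", "knowledge"],
    ["video", "content", "social"],
    ["video", "social", "create"],
]
coherent = umass_coherence(["client", "relationship", "communication"], docs)
mixed = umass_coherence(["client", "video", "social"], docs)
```

Here `coherent` comes out higher than `mixed`, because "client" and "video" never share a document. Averaging this score over all topics of a model, for each candidate topic count, gives one point per count to compare.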

We can then plot the score for each topic count to see where it drops off.

Image by Author

In this case, around 14 topics appears to be the sweet spot.

From here we run the LDA Mallet model, setting this number as our topic count to get the 14 topics with corresponding keywords. In this case we get:

1. end, build, growth

2. business, develop, plan, pm

3. design, professional, personal, task

4. team, work, continue, meeting

5. project, creative, collaboration, track

6. brand, strategy, time, venture

7. improve, skill, management, set

8. process, increase, run, future

9. day, grow, priority, account

10. client, knowledge, communication, relationship

11. make, lead, role, review

12. learn, year, time, agency

13. create, content, video, social

14. work, focus, great, culture

Putting it all together

Having these topics with keywords is great because we can start to get a good idea about the areas where our employees are really interested in growing. For example, topic number 7 seems to revolve around improving skills related to management. A great way to help employees looking to develop their management chops might be to offer financial help towards a management program or lunch-and-learns from our executive team.

To be able to visualize this data even better, we want to reduce the topics into a 2-dimensional space. For this, we used t-Distributed Stochastic Neighbor Embedding (or t-SNE) and plotted the values onto a scatter plot.
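For readers curious about what t-SNE does under the hood, here is a bare-bones sketch in plain Python. It assumes each priority is represented by its vector of topic weights (an assumption about the input, shown with made-up numbers), and it omits the momentum and early-exaggeration tricks that library implementations add. In practice a library implementation would be the usual choice.

```python
import math
import random

def tsne(points, perplexity=2.0, iterations=300, learning_rate=10.0, seed=0):
    """Bare-bones t-SNE: returns one [x, y] pair per input row."""
    n = len(points)
    sq = [[sum((a - b) ** 2 for a, b in zip(p, q)) for q in points] for p in points]

    # Step 1: conditional similarities p(j|i). For each row, bisect the
    # Gaussian precision until the row's entropy is close to log(perplexity).
    target = math.log(perplexity)
    cond = []
    for i in range(n):
        lo, hi = 0.0, 1e4
        for _ in range(50):
            prec = (lo + hi) / 2
            row = [0.0 if j == i else math.exp(-sq[i][j] * prec) for j in range(n)]
            total = max(sum(row), 1e-12)
            row = [v / total for v in row]
            entropy = -sum(v * math.log(v) for v in row if v > 0)
            if entropy > target:
                lo = prec  # distribution too flat, so sharpen it
            else:
                hi = prec
        cond.append(row)

    # Step 2: symmetrize the conditionals into joint probabilities.
    P = [[(cond[i][j] + cond[j][i]) / (2 * n) for j in range(n)] for i in range(n)]

    # Step 3: plain gradient descent on the 2-D layout, using Student-t
    # similarities between the low-dimensional points.
    rng = random.Random(seed)
    Y = [[rng.gauss(0, 1e-2), rng.gauss(0, 1e-2)] for _ in range(n)]
    for _ in range(iterations):
        inv = [[0.0 if i == j else
                1.0 / (1.0 + (Y[i][0] - Y[j][0]) ** 2 + (Y[i][1] - Y[j][1]) ** 2)
                for j in range(n)] for i in range(n)]
        z = sum(map(sum, inv))
        new_Y = []
        for i in range(n):
            gx = gy = 0.0
            for j in range(n):
                coeff = 4.0 * (P[i][j] - inv[i][j] / z) * inv[i][j]
                gx += coeff * (Y[i][0] - Y[j][0])
                gy += coeff * (Y[i][1] - Y[j][1])
            new_Y.append([Y[i][0] - learning_rate * gx, Y[i][1] - learning_rate * gy])
        Y = new_Y
    return Y

# Hypothetical topic-weight vectors for nine priorities over three topics.
topic_weights = [
    [0.8, 0.1, 0.1], [0.7, 0.2, 0.1], [0.9, 0.05, 0.05],
    [0.1, 0.8, 0.1], [0.2, 0.7, 0.1], [0.05, 0.9, 0.05],
    [0.1, 0.1, 0.8], [0.1, 0.2, 0.7], [0.05, 0.05, 0.9],
]
coords = tsne(topic_weights)
```

Each pair in `coords` can be drawn as one point on a scatter plot and colored by its dominant topic. The perplexity must stay below the number of points minus one, which is why it is set so low for this tiny example.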

Now, we can easily visualize the clusters to get an idea of the size of the topics, and we can also click into an individual node to see its specific data points (in this case, the title and description of the priority).

Image by Author

Employee Collaboration

Our next challenge was a bit less intensive from a data science standpoint, but super interesting nonetheless. Looking at The Forge’s employee recognition (aka Kudos) data, our hypothesis is that employees give the most Kudos to the people they work with the most. So, using this data, we can create a network graph of 20nine employees, with the connections among them representing the Kudos they receive from each other. The more they receive from a single person, the stronger the weight of the connection between them. We can also programmatically cluster this information using the weights to help us visually see the different “cliques” within the organization.
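A minimal sketch of that graph might look like the following. The Kudos log format, the placeholder names, the choice to add up Kudos in both directions into a single undirected weight, and the use of label propagation for the clustering are all assumptions made for the example; the clustering method we actually used is not shown here.

```python
from collections import defaultdict

# Hypothetical Kudos log as (giver, receiver) pairs; names are placeholders.
kudos = [
    ("ana", "ben"), ("ben", "ana"), ("ana", "cat"), ("cat", "ben"),
    ("dev", "eli"), ("eli", "dev"), ("eli", "fay"), ("fay", "dev"),
    ("cat", "dev"),
]

def build_graph(kudos):
    """Undirected weighted graph: weight = Kudos exchanged between two people."""
    graph = defaultdict(lambda: defaultdict(int))
    for giver, receiver in kudos:
        if giver != receiver:
            graph[giver][receiver] += 1
            graph[receiver][giver] += 1
    return graph

def label_propagation(graph, max_rounds=20):
    """Cluster nodes by weighted label propagation.

    Each node repeatedly adopts the label that carries the most edge weight
    among its neighbors; ties go to the alphabetically smallest label.
    """
    labels = {node: node for node in graph}
    for _ in range(max_rounds):
        changed = False
        for node in sorted(graph):
            weight_by_label = defaultdict(int)
            for neighbor, weight in graph[node].items():
                weight_by_label[labels[neighbor]] += weight
            best = min(weight_by_label, key=lambda lab: (-weight_by_label[lab], lab))
            if best != labels[node]:
                labels[node] = best
                changed = True
        if not changed:
            break
    return labels

graph = build_graph(kudos)
labels = label_propagation(graph)
```

With this toy data, ana, ben, and cat end up sharing one label while dev, eli, and fay share another, even though cat and dev exchanged a single Kudos across the two groups.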

Image by Author

Above is the output network graph of our company (with the names removed) based on three months of Kudos data. It is a great representation of the projects we had at the time and how they were divided up in the company.

In the future, we can animate this data from quarter to quarter to make sure that there is solid collaboration within the organization while also seeing how it evolves over time. Going forward, this data-driven approach to understanding employee collaboration can help us ensure no employee or “node” drifts too far out of the network, so that everyone feels included and part of the company, especially at a time when we are all working remotely.
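One simple way to spot a node drifting out of the network is to total the Kudos each person gave and received in a quarter and flag anyone at or below a cutoff. The sketch below assumes a hypothetical roster and Kudos log, and the `threshold` value is arbitrary.

```python
from collections import Counter

def connection_strength(kudos, employees):
    """Total Kudos given plus received per employee (weighted degree)."""
    strength = Counter({name: 0 for name in employees})
    for giver, receiver in kudos:
        strength[giver] += 1
        strength[receiver] += 1
    return strength

def peripheral(strength, threshold=1):
    """Employees whose connection strength is at or below the threshold."""
    return sorted(name for name, total in strength.items() if total <= threshold)

# Hypothetical roster and one quarter of Kudos as (giver, receiver) pairs.
employees = ["ana", "ben", "cat", "dev"]
quarter_kudos = [("ana", "ben"), ("ben", "ana"), ("ana", "cat")]

strength = connection_strength(quarter_kudos, employees)
at_risk = peripheral(strength)
```

Starting from the full roster matters: someone with no Kudos at all never appears in the log, so they would be invisible if the counts were built from the log alone.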

The Future

We plan to continue using both quantitative and qualitative data to optimize how we run as a company and make decisions that help employees feel included and achieve their own professional goals. As an agency, this is an important part of practicing what we preach to clients every day: By unlocking the power of your data, you can unlock the full potential of your team and help them on the path to fulfilling your organization’s purpose.
