
Introduction
I replicated a Google-style search engine from scratch in Java during my senior year of college. Over the course of the semester, I spent about 30 hours on each of five individual assignments, then more than 100 hours during the last two weeks of classes working in a group of four on the final project for the course.
I am currently in my final semester before receiving my master’s degree in Data Science from the University of Pennsylvania. Despite the incredible amount of stress, sleep deprivation, and exhaustion it caused me, this Web Systems course was by far the most rewarding course I have taken during my four years at Penn. I cannot think of anything more rewarding than knowing, after four months of hard work and grinding, that you have built your own version of Google, a system that billions of people use every day.
For the individual assignments, we each built a web server and framework similar to Spark Java, a web crawler, and a system that could be distributed across multiple nodes. We finished the semester working in groups of four to build the full search engine from these components and a few others.
By the end of the semester, we had crawled over 1 million documents from the internet (respecting the robots exclusion protocol on each website), indexed over 140 million words from these documents in a lexicon and inverted index, and developed a PageRank algorithm to rank all of the documents we had crawled. We learned how to run all of these components on AWS using less than the $400 budgeted to us through AWS Educate account credits. Finally, we created a user interface that accepts search queries and responds with our ranked results.
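For readers who have not seen PageRank before, the core idea is small enough to sketch. The following is an illustrative single-machine version, not the project's actual code (ours ran as distributed jobs). The class name `PageRankSketch`, the damping factor of 0.85, and the choice to spread the rank of pages without outgoing links evenly across all pages are assumptions made for this example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustrative sketch only: a tiny in-memory PageRank using power iteration.
final class PageRankSketch {

    /**
     * links.get(i) holds the indices of the pages that page i links to.
     * Returns one rank per page; the ranks always sum to 1.
     */
    static double[] pageRank(List<List<Integer>> links, double damping, int iterations) {
        int n = links.size();
        double[] rank = new double[n];
        Arrays.fill(rank, 1.0 / n); // start with every page equally important

        for (int iter = 0; iter < iterations; iter++) {
            double[] next = new double[n];
            double danglingMass = 0.0; // rank held by pages with no outgoing links

            for (int page = 0; page < n; page++) {
                List<Integer> outLinks = links.get(page);
                if (outLinks.isEmpty()) {
                    danglingMass += rank[page];
                } else {
                    // A page splits its current rank evenly among the pages it links to.
                    double share = rank[page] / outLinks.size();
                    for (int target : outLinks) {
                        next[target] += share;
                    }
                }
            }

            for (int page = 0; page < n; page++) {
                // Assumption: dangling rank is redistributed evenly to every page.
                next[page] = (1.0 - damping) / n
                        + damping * (next[page] + danglingMass / n);
            }
            rank = next;
        }
        return rank;
    }

    public static void main(String[] args) {
        List<List<Integer>> links = new ArrayList<>();
        links.add(Arrays.asList(1, 2)); // page 0 links to pages 1 and 2
        links.add(Arrays.asList(2));    // page 1 links to page 2
        links.add(Arrays.asList(0));    // page 2 links to page 0
        links.add(new ArrayList<>());   // page 3 has no outgoing links
        System.out.println(Arrays.toString(pageRank(links, 0.85, 50)));
    }
}
```

In this toy graph, page 2 receives links from two pages and page 3 from none, so page 2 ends up ranked above page 3. At the scale of a million documents the same update is typically expressed as a distributed job rather than a loop over in-memory lists.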

The goal of this article isn’t to flex all of the things I was able to accomplish through this class. I listed all of those components of our project to show you what I was able to accomplish in four months of programming. The biggest project I had worked on before this was creating Chess in Java. Don’t doubt what you are able to accomplish if you are willing to work hard on it.
My goal is to encourage you to consider working on a project that interests you but feels immensely difficult, maybe even impossible. By the end, you will have created something incredible and it will be worth it. This type of opportunity for growth is not available without feeling uncomfortable and setting high expectations for yourself.
What I Learned
How to Read Documentation
As you develop as a programmer, one of the most useful skills is the ability to read the documentation of a library and implement code with it.
The quality of documentation across libraries within a given language varies greatly. When you work on a massive project, you will be using a ton of different libraries. Working with that many libraries exposes you to this variety and develops your ability to read documentation in all its forms.
By the end of the project, we had used over 20 different Java libraries. Entire group meetings and even days were spent getting familiar with how to write efficient code in some of the more complex libraries. I now feel very confident that I can effectively learn any library I would like to use.
How to Google Coding Questions
Similar to reading documentation, a lot of time was spent googling errors we faced and how certain things worked. Learning how to write an effective search query is a necessary skill as a programmer.
These questions ranged from error messages we could not figure out how to fix to how to run our code more efficiently. The second kind of question, code efficiency, is not one you typically run into when working on smaller-scale projects.

When you are working on a massive project, like scraping 1 million documents or indexing over 140 million words, understanding code efficiency is a necessity to finish the project on time. Diagnosing what parts of the code are not running as fast as you would like and then finding solutions online is a critical skill that massive projects allow you to improve on.
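A simple way to start diagnosing slow code is to time each stage before reaching for a full profiler. The sketch below is one hedged way to do that with `System.nanoTime()`; the `StageTimer` class and its `timed` helper are hypothetical names invented for this example, not part of any library or of our project.

```java
import java.util.function.Supplier;

// Illustrative sketch only: wrap a piece of work and report how long it took.
final class StageTimer {

    /** Runs the given work, prints the elapsed wall-clock time, and returns the result. */
    static <T> T timed(String label, Supplier<T> work) {
        long start = System.nanoTime();
        T result = work.get();
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        System.out.println(label + " took " + elapsedMs + " ms");
        return result;
    }

    public static void main(String[] args) {
        // Stand-in workload; in a real project this would be a crawl or index step.
        Long total = timed("sum loop", () -> {
            long sum = 0;
            for (int i = 0; i < 50_000_000; i++) {
                sum += i;
            }
            return sum;
        });
        System.out.println("result = " + total);
    }
}
```

Wrapping each stage this way shows which one dominates the runtime, which in turn tells you what to search for. Single measurements on the JVM are noisy, so treat the numbers as rough guidance rather than a benchmark.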
How to Work With a Team
Another crucial skill to develop is your ability to work on a team when coding. If you want to be a software engineer or data scientist, you have to be able to work effectively on teams. While a personal project does not require a team, I strongly recommend forming one if you have a few people who would like to join you. There are a few added benefits to working in a team on these large projects.
The first benefit is to your overall interviewing experience. During behavioral interviews, most companies want to know how well you work on teams (as you will be working on teams when you work for them). Having an impressive experience like working on a challenging project in a team is very helpful in these conversations.
Another benefit is that you will learn how to effectively communicate technical ideas to people. When working on large projects, there are many design decisions that need to be made. During meetings with your group, you have the opportunity to express your vision for how to implement aspects of the project. Effectively discussing technical ideas is a very useful tool both in the context of advocating for your own ideas and discussing what you think about your teammates’ ideas.
You also have the opportunity to use version control tools like Git. Version control is a necessary skill for most programmers. Learning how to write code for the same project as your teammates is not something you can pick up naturally by working on your own.
Finally, the work is more enjoyable when working with a team of people. Spending hundreds of hours working on a project by yourself is not as fun as communicating with others and collectively working towards a significant goal.

How to Budget Resources on a Project
Not all massive projects have to have a monetary cost. If they do, there is an added benefit of understanding how to manage these costs. If there aren’t any monetary costs to the project, you can also learn how to effectively budget your time.
AWS Educate gives a free $100 credit to every student at a partner university to spend on its cloud computing platform. While we were working on the Google project, our group of four understood that we had a combined budget of $400 to spend intelligently on AWS to complete the project.
All of our data was stored on AWS, which also hosted the parallel EC2 nodes we used for crawling and indexing and the EMR clusters that ran our MapReduce jobs for indexing and PageRank calculations.
We held multiple meetings during the two weeks we worked on the project together to budget our resources effectively. In these meetings, we calculated which machines we could use to finish the project on time while staying within our budget. This type of cost vs. efficiency planning rarely comes up in smaller-scale projects.
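The arithmetic behind those meetings is simple: number of machines, times price per machine-hour, times hours, compared against what is left of the budget. Here is a small sketch of that calculation. The hourly price, node count, and amount already spent are made-up numbers for illustration, not real AWS prices or our actual usage, and `BudgetSketch` is a hypothetical class name.

```java
// Illustrative sketch only: check whether a planned cluster run fits the remaining budget.
final class BudgetSketch {

    /** Cost of running `nodes` machines for `hours` at a flat per-node hourly price. */
    static double clusterCost(int nodes, double dollarsPerNodeHour, double hours) {
        return nodes * dollarsPerNodeHour * hours;
    }

    /** True if the planned cost still fits once previous spending is counted. */
    static boolean fitsBudget(double plannedCost, double alreadySpent, double budget) {
        return alreadySpent + plannedCost <= budget;
    }

    public static void main(String[] args) {
        double budget = 400.0;       // combined credits for a group of four
        double alreadySpent = 150.0; // hypothetical spending so far
        double hourlyPrice = 0.10;   // hypothetical price per node-hour, not a real AWS rate

        // Compare two hypothetical plans: fewer nodes for longer vs. more nodes for less time.
        double slowPlan = clusterCost(5, hourlyPrice, 96);
        double fastPlan = clusterCost(20, hourlyPrice, 30);

        System.out.printf("slow plan: $%.2f, fits: %b%n",
                slowPlan, fitsBudget(slowPlan, alreadySpent, budget));
        System.out.printf("fast plan: $%.2f, fits: %b%n",
                fastPlan, fitsBudget(fastPlan, alreadySpent, budget));
    }
}
```

Real pricing varies by instance type and region and may include storage and data transfer, so a sketch like this is only a first pass before checking the actual billing console.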

As a student, I also had to budget how I spent my time. I was taking five other courses during the semester (including three other graduate-level computer science courses). The time commitment required for this course made me rethink how I spent all of my time. I completely changed how I scheduled my time and how I dealt with my urges to procrastinate in order to make time for this project every week.
I Love Coding
Before this class, I had suspected that I enjoyed coding. I had never been able to test this theory. I enjoyed working on my computer science homework assignments and was pretty sure that I wanted to work in data science after I graduated, but I was not sure that it was necessarily the correct path.
After working on this project, I realized that I not only enjoy coding, I love it. Working on this project for hundreds of hours over the course of four months showed me that I loved coding even when it was stressful, even when I had to code for extremely long sessions, even when it caused me extreme sleep deprivation, even when problems felt too difficult to overcome, and even when I didn’t want to keep coding. The next day, I would always wake up excited to continue tackling the problems I couldn’t solve the day before.
I am now confident that I will love my data science job after graduation. Without this type of experience, you cannot really know that you will enjoy something during tough times (which I believe is a key to enjoying your work). I can now call programming a passion and am so happy that I can also call my passion a job.

Conclusion
During the first few weeks of the course, I kept telling myself, "I should really drop this course. I am not going to be able to do all of this work." Our professor kept encouraging us that although it was a lot of work, we would not regret taking the class and that we would be able to get through it. By the end of the semester, I realized that he was right.
We doubt ourselves too much as programmers. We think that we have to know everything before we get started with a new programming project. If we sit down and just start working on it, we find, often to our own surprise, that we are able to get through the challenges and grow.
This project completely changed how I view myself as a programmer. I still get that thought that I will never be able to finish that really cool idea I just had. Then I just think to myself, "I made Google." And I start working. I want you to have that same fuel. What’s that really ambitious idea that you have been holding onto for the past few weeks? Get started on it. You can do it. You’ll come out a better version of yourself.
Thank you for reading this article and good luck with your ambitious project.