BigQuery: the unlikely birth of a cloud juggernaut

How 10 engineers transformed cloud data analytics

Ju-kay Kwek
Towards Data Science


Man welding with sparks flying furiously
Photo by gilaxia on iStock

The BigQuery team recently marked the 10th anniversary of its launch with a fun commemorative video. As BigQuery’s founding product manager, I felt a tremendous sense of pride watching the celebration and reflecting on what the team accomplished. Used by thousands of companies large and small, BigQuery has become one of the most widely used cloud data warehouses in the world. And, with many petabytes of data under management, it has arguably done the most to bring big data analytics into the mainstream.

So it might come as a surprise to learn about BigQuery’s humble beginnings inside Google — fewer than 10 engineers for the first few years. To get BigQuery to where it is today, this tiny team relied on a single-minded focus on customers to navigate multiple potential pitfalls and swiftly bring a ground-breaking product to market. I thought I’d share the inside story on those early days — an incredible journey with a great team, and what led me to eventually start Switchboard.

The vision for cloud analytics

In 2010, a group of forward-thinking engineers in Google’s Seattle office took stock of Google’s groundbreaking internal storage, compute, and analytics capabilities — the same technologies used to crawl and index the entire internet. They imagined the transformative benefits these technologies could provide for companies going digital, and started to develop an external service layer that would eventually become Google Cloud Platform (GCP). A key pillar of this strategy was unleashing the power of web-scale analytics for regular companies without the need to build or manage their own massive data centers. The project codename…BigQuery.

The analytics core engineering team, Jordan Tigani, Siddartha Naidu, Jeremy Condit, Michael Sheldon, and Craig Citro, created the early proof of concept — a REST API — that outside users could use to tap into Google’s vast analytics power. I joined as the PM, using my pre-Google background in enterprise software to help the team define the product roadmap and path to market.

Early “science experiment”

BigQuery’s DNA originates from Dremel, a groundbreaking internal Google service described in a 2010 white paper and used by engineers and PMs across the company for ad hoc log analysis on web-scale products. Think about Google Search or Gmail, where the interactions of billions of user accounts are logged and stored on vast arrays of storage servers for debugging or analyzing usage patterns. Traditionally, boiling those raw logs down into manageable aggregates required hours of MapReduce job execution. Dremel turned this notion on its head with the ability to use SQL to query across trillions of rows of data interactively and return results in seconds.
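To make that concrete, here is a minimal sketch of the kind of ad hoc log aggregation Dremel made interactive, written against today's google-cloud-bigquery Python client rather than the 2010-era REST API. The project, dataset, and column names are hypothetical.

```python
# Ad hoc aggregation over a (hypothetical) event-log table.
# Requires: pip install google-cloud-bigquery, plus application-default credentials.
from google.cloud import bigquery

client = bigquery.Client()

sql = """
    SELECT
      country,
      COUNT(*) AS events,
      COUNT(DISTINCT user_id) AS users
    FROM `my_project.app_logs.events`
    WHERE DATE(event_time) = '2024-01-01'
    GROUP BY country
    ORDER BY events DESC
    LIMIT 10
"""

# The scan and aggregation run server-side; only the rolled-up rows come back.
for row in client.query(sql).result():
    print(row.country, row.events, row.users)
```

The point is the workflow, not the syntax: no cluster to provision, no MapReduce job to babysit, just a query and an answer within seconds.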

Data center racks of hardware
Photo by imgix on Unsplash

It’s worth remembering the broader industry context at the time: corporate business intelligence and analytics were dominated by companies like Teradata, IBM, and Oracle, whose systems required a company to own and operate racks and racks of servers and hard drives, at up to $100,000 per terabyte of storage! “The Cloud” was still a very new concept to many businesses, as was the notion of storing all data for later analysis. At the time, many CIOs declared, “We will never store our company’s sensitive data on someone else’s computers!”

Amit Aggarwal and I debuted the Alpha release of BigQuery at Google’s I/O conference in May 2010. The service was so new that at the time, BigQuery didn’t even have an ingestion API! Working with external data required one of our engineers to manually onboard data into the system. But that willingness to push forward and work closely with external early adopters was one of the key strengths of the team that would enable us to get to market quickly.

Toy train engine
Photo by Gerold Hinzen on Unsplash

The little engine that could

Over the next year, the BigQuery team faced critical resource challenges. There was so much to build. The Dremel engine was created with internal experts in mind — insiders with CS degrees who could hop in and out of a command line, debug error stack traces, or ping an internal team for assistance. You even had to provision your own data servers to use the service. And of course, for internal users there were no “costs,” and therefore no pricing or billing. So the rest of the cloud service had to be built from scratch: the programming and security model, access control, metering and billing, a front-end UI, and solid error handling, to name just a few crucial elements.

With just 10 engineers, we had to be laser-focused on the MVP (Minimum Viable Product). The team got a big boost when Jim Caputo joined as the engineering manager. His no-nonsense approach kept things moving rapidly while staying aligned. Jim quickly became a thought leader and a critical partner for me — we were joined at the hip through most of BigQuery’s development.

Dodging and weaving to launch

With rapid progress and positive customer feedback, the BigQuery team started to become known within Google. But being small meant that, in Google’s somewhat Darwinian internal environment, the BigQuery team had to constantly be on the lookout for other, much larger Google product teams with overlapping mandates.

Several well-funded teams in the Ads Product Area harbored similar ambitions to build out query, visualization, and data catalog capabilities for their customers. At one point a team with five times as many engineers attempted to fold BigQuery into their plan. Others claimed they would simply replicate BigQuery’s capabilities in their product roadmap.

We countered this with a steadfast focus on our (external) customers. By understanding their specific needs and pain points, we could make it clear to stakeholders why this mattered to Google’s public cloud efforts. This enabled us to keep the early product focused and drive steadily towards launch.

Motorcycles racing each other on a racetrack
Photo by Joe Neric on Unsplash

Countering the naysayers

There was no guarantee that BigQuery would launch just because things were progressing quickly. The project faced some powerful internal skeptics who opposed the very concept. The opposition came in many forms — one senior exec insisted that BigQuery (which aimed to turn the economics of data analytics on its head by charging per query) should instead charge vast sums to a small group of companies. Others argued that the cloud APIs and web user interface would never work.

Most memorably, one senior engineer on the launch review committee scornfully (and very publicly) told us, “This project makes no sense. There is no way anyone is going to use such a service.”

We persevered, sharing what early customers were telling us — that there was an important need to be served.

The early adopters

By October 2011, BigQuery was approved for Limited Preview, and we were able to enlist the help of Google’s Developer Advocate team. Think of these folks as hardcore engineers who build developer communities. Leading this effort was Michael Manoochehri (who later became Switchboard’s Co-Founder). Giving talks at developer events, writing reference apps and API code samples, and finding compelling partners and applications, Michael, joined by teammates Ryan Boyd and Felipe Hoffa, did the heavy lifting to jump-start the early developer community around BigQuery.

We were also fortunate to work with visionary companies including Gamesys, Claritics, and the early Tableau and Looker teams. These teams shared the need to process and understand large amounts of data quickly, at a speed and cost they could not match by building out their own Vertica or Hadoop clusters.

Enterprise momentum

We also worked closely with Google’s early Cloud Enterprise Sales team, led by Eric Morse and Matt McNeil. We found partners in crime with intrepid Sales Engineers Brian Squibb and Tom Grey. They’d had their ears to the ground with customers whose data was exponentially growing and evolving into new formats like nested JSON — large enterprises like State Farm, Booking.com, and Omnicomm.

In 2011 we partnered with Omnicomm and the DoubleClick customer engineering team in an early demonstration of BigQuery’s unique commercial applicability. We onboarded 25 large DCM (DoubleClick Campaign Manager) advertiser networks into BigQuery — many terabytes of fine-grained ad server logs. These logs were enriched at the individual event level with demographic data, which enabled real-time campaign audience and frequency/reach analytics. This was groundbreaking, as the alternative would have been many months of engineering to build and tune a multi-million-dollar parallel-processing Vertica cluster, or wrestling with Hadoop MapReduce. With BigQuery, Omnicomm paid only for storage and for the queries it ran, at hundredths of the cost of traditional systems. Omnicomm eventually built several products around this new strategic data asset.
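As an illustration only (the table and column names below are hypothetical, not the actual DCM schema), a frequency/reach rollup over event-level impression logs boils down to a single aggregate query in today's standard SQL, run here through the google-cloud-bigquery Python client.

```python
# Reach = distinct users exposed to a campaign; frequency = impressions per exposed user.
from google.cloud import bigquery

client = bigquery.Client()

sql = """
    SELECT
      campaign_id,
      COUNT(*) AS impressions,
      COUNT(DISTINCT user_id) AS reach,
      SAFE_DIVIDE(COUNT(*), COUNT(DISTINCT user_id)) AS avg_frequency
    FROM `my_project.ad_logs.impressions`
    WHERE DATE(event_time) BETWEEN '2024-06-01' AND '2024-06-30'
    GROUP BY campaign_id
    ORDER BY impressions DESC
"""

for row in client.query(sql).result():
    print(row.campaign_id, row.reach, round(row.avg_frequency, 2))
```

Running this kind of rollup on demand, and paying only for storage and the queries actually run, was exactly the economic inversion the project was betting on.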

Rocket launch
Photo by SpaceX on Unsplash

General Availability and developer adoption

BigQuery launched into General Availability on May 1, 2012. Since its start as a “science experiment,” we had iterated continually and listened to customers, and in the process built a system that early customers loved for its power and simplicity. Now we had to tell their story to the world.

I remember the excitement of writing the launch blog post with our marketing lead Raj Sarkar and Liz Markman, our PR whiz. They assembled a launch campaign with a conference roadshow and days of journalist and analyst briefings. The concept of Big Data was new and shiny at the time, filled with the promise of all sorts of data coming online waiting to be unlocked. Professionals of all stripes were keenly interested in BigQuery as a way to accelerate access to data insights. Wired and GigaOm were among the technology press covering that trend, and analyst firms like Gartner were tracking it too.

By 2013, BigQuery was fast becoming a standard for cloud-based analytics, particularly with technical startups and digital-native companies. Jordan and Siddartha wrote the definitive BigQuery reference, which rapidly boosted know-how in this community. These teams had the coding chops to leverage APIs and knew how to work with data in BigQuery.

Speaker badges from our conference road show (Photo by author)

Unaddressed need in the Enterprise

Even with this brisk adoption, challenges remained on the enterprise side. While technical organizations were adopting BigQuery, the gap between those teams and their line-of-business counterparts started to widen. We spoke with Revenue Operations and Marketing Analytics leaders who complained about the increasing number of external data sources. Their business teams were grappling with disparate and often complex feeds as they adopted SaaS services like Salesforce, Marketo, or NetSuite to run their businesses. This made it very difficult to combine the data into the right “shape” needed to derive insights, often requiring (and burdening) precious dev or BI resources.

Increasingly, our team was approached by more traditional, non-technology enterprises, such as airlines, insurance companies, and retailers. And interestingly, it wasn’t CIOs or CTOs coming forward; it was CMOs and business unit heads. Consistently, their message was: “I have all this new data, it’s large and complex, and my tech team doesn’t have the bandwidth or domain expertise to help my team manage it…can you help us?”

Michael and I worked to find system integrator (SI) partners to help these stakeholders. But those SIs invariably ended up building bespoke apps or cobbling together legacy BI and ETL systems. Those CMOs and business units still didn’t have control over their data. They still had to rely on a layer of IT that, in many cases, simply didn’t understand this new third-party data. We would eventually leave Google to form Switchboard to help enterprises tackle what has become a defining problem for many organizations. But that’s a story for another day.

Photo by Javier Allegue Barros on Unsplash

The future of data in the cloud

Since its launch, BigQuery has continued to push the envelope for cloud-based analytics. The team has added powerful new capabilities such as streaming ingestion, support for standard SQL, and machine learning functions, and new reservation pricing makes it much more cost-effective to take advantage of all this power. Perhaps that’s because the innovative DNA of the BigQuery team remains intact. For example, the team recently announced the beta of BigQuery Omni, which makes it possible to use BigQuery to analyze data living in other cloud systems like AWS or Azure.
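To give a feel for one of those capabilities, here is a minimal, hedged sketch of streaming ingestion using the google-cloud-bigquery Python client; the destination table and row fields are made up for illustration.

```python
# Streaming rows into a (hypothetical) table; inserted rows are typically
# queryable within a few seconds.
from google.cloud import bigquery

client = bigquery.Client()

rows = [
    {"user_id": "u123", "event": "page_view", "event_time": "2024-01-01T12:00:00Z"},
    {"user_id": "u456", "event": "purchase", "event_time": "2024-01-01T12:00:05Z"},
]

# insert_rows_json streams JSON rows via the tabledata.insertAll API.
errors = client.insert_rows_json("my_project.web_events.stream", rows)
if errors:
    print("Insert errors:", errors)
```

The same table can then be queried with standard SQL or used as training input for BigQuery ML, without any separate loading pipeline.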

Reflecting on this journey, I learned how great innovation arises from within tech juggernauts. From the outside, it may appear that, with nearly unlimited resources, these companies hire hordes of engineers to constantly churn out thousands of ideas on an assembly line. Quite the opposite: it takes incredible focus, a dedicated team, and phenomenal technology to emerge from the crucible of internal competition, feature creep, resource constraints, and vocal naysayers. It’s not enough to have a good idea. To give birth to a tech product, you need to address a pressing customer problem and have the tenacity to solve it for them. But the rewards of successfully launching a product and seeing it become an industry standard are incalculable, and I am personally grateful to have been part of it. It has certainly shaped my approach to my post-Google professional life.

10 years is a long time, especially in tech. It’s been great to acknowledge that milestone and reflect on an incredible journey. Congratulations to the BigQuery team — and its growing community — for their sustained innovation and evolution. I can’t wait to see what the next 10 years bring!


CEO & Co-Founder at Switchboard. Launched Google BigQuery. Data/Analytics, Enterprise Software, Entrepreneurship, Startups. http://www.switchboard-software.com