
Nebula SDK


  • analyze big data in JS

Unleash the power to turn big data into simple intelligence in a second

A dummy illustration (Screenshot by Author)

(Disclaimer: This article was written by Shawn Cao as part of a knowledge-sharing series on Nebula Engine.)

In this post, I would like to talk about the Nebula SDK, which is delivered in JavaScript.

Why is the SDK in JavaScript?

As a holistic data engine, Nebula ships with a native UI that runs in web browsers. Through this UI, users can either compose queries from UI elements or use the built-in IDE to write simple JavaScript code to query Nebula.

So it is natural for us to ship the SDK in JavaScript, which lets users get the most out of the Nebula UI. That said, we may release SDKs in other languages, such as Python, Go, and Java, in the future.

Backend JavaScript

Backend user-defined functions allow users to define an instant JavaScript function (or lambda) that adds a new runtime column to an existing table schema. The function is sent to the Nebula core engine and executed along with the query.

Let’s look at an example:

If you have successfully brought up Nebula services on your local machine, there is a single test data set you can play with: nebula.test. This data set has columns of all kinds of types; the column value holds short integers.

const f = () => {
  // read column "value" of the current row
  const v = nebula.column("value");
  // label the row by the parity of its value
  return v % 2 === 0 ? "even" : "odd";
};

This lambda simply reads the value of column value (through the nebula.column API), takes it modulo 2, and returns the literal "even" or "odd" depending on the result.
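Since value is a short integer, it may hold negative numbers (assuming a signed type). JavaScript's % operator keeps the sign of the dividend, so -3 % 2 is -1, not 1. Comparing the remainder with 0, as the lambda does, therefore classifies negative values correctly, whereas a check against 1 would not. The function below is a hypothetical stand-in for the lambda body, with nebula.column replaced by a plain argument so the logic can run outside Nebula:

```javascript
// Hypothetical stand-in for the lambda body: nebula.column("value") is
// replaced by a plain argument so the logic can run outside Nebula.
function parity(v) {
  // % keeps the sign of the dividend: -3 % 2 === -1 and -4 % 2 === -0,
  // so comparing with 0 (not 1) is what handles negative values correctly.
  return v % 2 === 0 ? "even" : "odd";
}

const samples = [4, 7, -3, -4, 0];
const labels = samples.map(parity);
// labels: ["even", "odd", "odd", "even", "even"]
```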

Next, we use the nebula.apply API to register this lambda as a new column named "even" of type string (its values are literally "even" or "odd").

// first param - column name
// second param - column type 
// third param - instant function / lambda
nebula.apply("even", nebula.Type.STRING, f); 
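To make the apply contract concrete, here is a toy, in-memory mock. It is entirely hypothetical (createMockNebula and materialize are not SDK functions) and only mimics what the text describes: the lambda reads the current row through column(), and apply registers it as a named, typed runtime column that is evaluated for every row.

```javascript
// Toy mock of the two SDK calls used above; NOT the real Nebula runtime.
// createMockNebula and materialize() are hypothetical names for illustration.
function createMockNebula(rows) {
  let current = null; // the row currently being evaluated
  const runtimeColumns = []; // columns registered through apply()
  return {
    Type: { STRING: "STRING" }, // the mock records the type but does not enforce it
    column(name) {
      return current[name];
    },
    apply(name, type, fn) {
      runtimeColumns.push({ name, type, fn });
    },
    // Evaluate every registered lambda once per row and append its result.
    materialize() {
      return rows.map((row) => {
        current = row;
        const out = { ...row };
        for (const { name, fn } of runtimeColumns) out[name] = fn();
        current = null;
        return out;
      });
    },
  };
}

const nebula = createMockNebula([
  { id: 1, value: 3 },
  { id: 2, value: 8 },
]);
const f = () => {
  const v = nebula.column("value");
  return v % 2 === 0 ? "even" : "odd";
};
nebula.apply("even", nebula.Type.STRING, f);
const table = nebula.materialize();
// table: [{ id: 1, value: 3, even: "odd" }, { id: 2, value: 8, even: "even" }]
```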

Now we can use this newly defined column when constructing queries. For example, suppose we want the total number of even and odd values in the time range "2019-08-16" to "2019-08-26", which covers the whole data set (time macros such as -5d, -10h, -30m, and now are supported too).

nebula.source("nebula.test")
  .time("2019-08-16", "2019-08-26")
  .select("even", count("id"))
  .run();

Note that we are using the new column even in the query above. If you have Nebula running on your computer and execute these lines in the web IDE, you will see something like this:

after aggregating count by new column (Screenshot by Author)
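Conceptually, select("even", count("id")) groups rows by the even column and counts the rows in each group. The sketch below mimics that on an in-memory array; selectCount and the output field name count are assumptions for illustration (Nebula's actual result column name may differ), and unlike a real engine it does not deal with nulls.

```javascript
// Hypothetical helper that mimics select(key, count(...)) on in-memory rows.
// Not part of the Nebula SDK; it simply counts rows per distinct key value.
function selectCount(rows, key) {
  const counts = new Map(); // key value -> number of rows
  for (const row of rows) {
    counts.set(row[key], (counts.get(row[key]) ?? 0) + 1);
  }
  // One output row per group, e.g. { even: "odd", count: 3 }.
  return [...counts].map(([value, n]) => ({ [key]: value, count: n }));
}

const rows = [1, 2, 3, 4, 5].map((value, i) => ({
  id: i + 1,
  value,
  even: value % 2 === 0 ? "even" : "odd",
}));
const result = selectCount(rows, "even");
// result: [{ even: "odd", count: 3 }, { even: "even", count: 2 }]
```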

In the visualization pane on the right side of the IDE, you can choose a different visual type from the options list in the upper-right corner once the query is done.
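The time() call also accepts macros such as -5d, -10h, -30m, and now. The exact grammar Nebula supports is not spelled out in this post, so the resolver below is a hypothetical sketch that handles only those forms plus plain ISO dates, converting each to epoch milliseconds relative to a given "now":

```javascript
// Hypothetical resolver for the time macros mentioned in the post.
// Supports "now", "-<n>d", "-<n>h", "-<n>m", and ISO dates like "2019-08-16".
const UNIT_MS = { d: 86400000, h: 3600000, m: 60000 };

function resolveTime(spec, nowMs = Date.now()) {
  if (spec === "now") return nowMs;
  const match = /^-(\d+)([dhm])$/.exec(spec);
  if (match) {
    const [, amount, unit] = match;
    return nowMs - Number(amount) * UNIT_MS[unit];
  }
  const parsed = Date.parse(spec); // date-only ISO strings are parsed as UTC
  if (Number.isNaN(parsed)) throw new Error(`Unrecognized time spec: ${spec}`);
  return parsed;
}

const now = Date.UTC(2019, 7, 26); // months are 0-based: 7 = August
const fiveDaysAgo = resolveTime("-5d", now); // same instant as Date.UTC(2019, 7, 21)
```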

Client-Side JavaScript

Besides backend JavaScript, which transforms existing data into whatever form you want to analyze, Nebula lets users write client-side JavaScript to shape the final data for presentation.

The first client-side API to introduce is pivot. The pivot function accepts a single column and pivots the whole result set by that column.

Let's continue the example above by appending a pivot statement:

// new column logic
const even = () => {
  const v = nebula.column("value");
  return v % 2 === 0 ? "even" : "odd";
};
// create new column with the lambda
nebula.apply("even", nebula.Type.STRING, even);
// build the query
nebula.source("nebula.test")
  .time("2019-08-16", "2019-08-26")
  .select("even", count("id"))
  .pivot("even")
  .run();

The result will now look like

after pivot column even (Screenshot by Author)

The distinct values of the original "even" column have now been pivoted into their own columns, "even" and "odd".
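The pivot step can be thought of as turning the distinct values of one column into column names. A hypothetical plain-JavaScript version (not the SDK's code) that treats every remaining non-metric column as the row key might look like this; with only the even dimension, the two aggregated rows collapse into a single row:

```javascript
// Hypothetical pivot over in-memory rows; not the Nebula SDK implementation.
// pivotCol: column whose values become new column names.
// valueCol: metric column whose values fill those new columns.
function pivot(rows, pivotCol, valueCol) {
  const groups = new Map(); // serialized remaining columns -> output row
  for (const row of rows) {
    // Columns other than pivotCol/valueCol identify which output row to fill.
    const rest = Object.fromEntries(
      Object.entries(row).filter(([k]) => k !== pivotCol && k !== valueCol)
    );
    const key = JSON.stringify(rest);
    if (!groups.has(key)) groups.set(key, { ...rest });
    groups.get(key)[row[pivotCol]] = row[valueCol];
  }
  return [...groups.values()];
}

const aggregated = [
  { even: "even", count: 2 },
  { even: "odd", count: 3 },
];
const pivoted = pivot(aggregated, "even", "count");
// pivoted: [{ even: 2, odd: 3 }]
```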

The next API to introduce is map, which gives us the flexibility to add new columns and to remove existing ones we don't want to keep in the final visual. For example, to define a new column ratio that holds the number of even values divided by the number of odd values, we can add a map statement like this:

const even = () => {
  const v = nebula.column("value");
  return v % 2 === 0 ? "even" : "odd";
};
nebula.apply("even", nebula.Type.STRING, even);
nebula.source("nebula.test")
  .time("2019-08-16", "2019-08-26")
  .select("even", count("id"))
  .pivot("even")
  // add a "ratio" column to each result row
  .map(r => (r["ratio"] = r["even"] / r["odd"]))
  .run();

This change gives us a new result that looks like this:

after adding new column ratio (Screenshot by Author)
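The map callback in the query, r => r["ratio"]=r["even"]/r["odd"], works by assigning a new field on each result row. A hypothetical in-memory equivalent (mapRows is an illustrative name, not an SDK function) copies each row, lets the callback mutate the copy, and returns the new rows. One thing worth keeping in mind: if a row has zero odd values, JavaScript division yields Infinity (or NaN when both counts are zero) rather than throwing an error.

```javascript
// Hypothetical equivalent of the client-side map step on in-memory rows.
// The callback mutates the row it receives, e.g. r => r["ratio"] = r["even"] / r["odd"].
function mapRows(rows, fn) {
  return rows.map((row) => {
    const copy = { ...row }; // leave the input rows untouched
    fn(copy);
    return copy;
  });
}

const pivotedRows = [{ even: 2, odd: 4 }];
const withRatio = mapRows(pivotedRows, (r) => (r["ratio"] = r["even"] / r["odd"]));
// withRatio: [{ even: 2, odd: 4, ratio: 0.5 }]
```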

We now have the new column ratio alongside the raw columns from the previous steps. To highlight the ratio itself, we no longer need the raw even/odd counts, so we can pass these two column names to the same map function to remove them at the same time. Now it looks like this:

const even = () => {
  const v = nebula.column("value");
  return v % 2 === 0 ? "even" : "odd";
};
nebula.apply("even", nebula.Type.STRING, even);
nebula.source("nebula.test")
  .time("2019-08-16", "2019-08-26")
  .select("even", count("id"))
  .pivot("even")
  // add "ratio", then drop the raw "even" and "odd" columns
  .map(r => (r["ratio"] = r["even"] / r["odd"]), "even", "odd")
  .run();
after removing even/odd columns (Screenshot by Author)
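Passing extra column names to map removes those columns from the final result. Sketched in plain JavaScript with a hypothetical helper (dropColumns is not an SDK function), the removal step is just a filter over each row's entries:

```javascript
// Hypothetical helper mirroring the extra arguments of map(fn, "even", "odd"):
// it returns new rows without the listed columns.
function dropColumns(rows, ...columns) {
  const drop = new Set(columns);
  return rows.map((row) =>
    Object.fromEntries(Object.entries(row).filter(([name]) => !drop.has(name)))
  );
}

const finalRows = dropColumns([{ even: 2, odd: 4, ratio: 0.5 }], "even", "odd");
// finalRows: [{ ratio: 0.5 }]
```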

Aggregation & Timeline

Through the Nebula SDK, users can switch their results between an aggregation view and a timeline view. The timeline view simply asks the Nebula engine to run the same aggregation in separate time buckets.

Continuing the previous example, suppose we now want to see the time series of the ratio value in 30-minute windows (1800 s). The code change is tiny: pass two parameters (true, 1800) to the run API, or use the shortcut API timeline directly.

const even = () => {
  const v = nebula.column("value");
  return v % 2 === 0 ? "even" : "odd";
};
nebula.apply("even", nebula.Type.STRING, even);
nebula.source("nebula.test")
  .time("2019-08-16", "2019-08-26")
  .select("even", count("id"))
  .pivot("even")
  .map(r => (r["ratio"] = r["even"] / r["odd"]), "even", "odd")
  .run(true, 1800); // timeline with 1800-second windows, or .timeline(1800)

Executing this change gives us a completely different, and quite exciting, view:

timeline of ratio value (Screenshot by Author)
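Conceptually, the timeline view repeats the same aggregation inside fixed time windows. The sketch below is a hypothetical in-memory version of the whole pipeline (count even and odd values per window, then keep only the ratio); it assumes windows aligned to the Unix epoch, which may not match how Nebula aligns its buckets.

```javascript
// Hypothetical in-memory timeline; not Nebula's implementation.
// events: [{ time: epochMillis, value: number }], windowSec: bucket size in seconds.
function timelineRatio(events, windowSec) {
  const windowMs = windowSec * 1000;
  const buckets = new Map(); // bucket start (ms) -> { even, odd }
  for (const { time, value } of events) {
    // Assumption: buckets are aligned to the Unix epoch.
    const start = Math.floor(time / windowMs) * windowMs;
    const counts = buckets.get(start) ?? { even: 0, odd: 0 };
    counts[value % 2 === 0 ? "even" : "odd"] += 1;
    buckets.set(start, counts);
  }
  // One row per window, in time order, keeping only the ratio column.
  return [...buckets]
    .sort(([x], [y]) => x - y)
    .map(([start, { even, odd }]) => ({ time: start, ratio: even / odd }));
}

const t0 = Date.UTC(2019, 7, 16);
const series = timelineRatio(
  [
    { time: t0, value: 2 },
    { time: t0 + 60000, value: 3 },
    { time: t0 + 120000, value: 5 },
    { time: t0 + 1800000, value: 4 },
    { time: t0 + 1900000, value: 7 },
  ],
  1800
);
// series: [{ time: t0, ratio: 0.5 }, { time: t0 + 1800000, ratio: 1 }]
```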

By changing the time window, you can view the data at any time granularity, and you can choose among at least three timeline visualizations: line, area, and bar.

For example, I would like to see a 10-minute granularity (600 s) in bar view:

...
.timeline(600);
timeline of ratio value every 10 minutes - bar (Screenshot by Author)
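The window size also bounds how many points the chart has to draw. As a quick check of the arithmetic, assuming the range runs from midnight UTC on 2019-08-16 up to (but excluding) midnight UTC on 2019-08-26, the query spans 864,000 seconds, which is at most 480 windows at 1800 s and 1,440 windows at 600 s:

```javascript
// Number of windows needed to cover [start, end) with windows of windowSec seconds.
// Assumes both bounds are ISO dates, which Date.parse treats as UTC midnight.
function windowCount(startIso, endIso, windowSec) {
  const spanSec = (Date.parse(endIso) - Date.parse(startIso)) / 1000;
  return Math.ceil(spanSec / windowSec);
}

const at30min = windowCount("2019-08-16", "2019-08-26", 1800); // 480
const at10min = windowCount("2019-08-16", "2019-08-26", 600); // 1440
```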

Replay The Exercise

Thanks for reading this far! If you would like to reproduce the example in this post, run a single-node Nebula on your computer (macOS or Linux). It is as simple as:

  1. git clone https://github.com/varchar-io/nebula
  2. cd nebula && ./run.sh (requires yarn to be installed)
  3. Open http://localhost:8088 in your browser.

If you want to practice other features of the Nebula SDK after reading this post, please refer to the full SDK documentation at https://nebula.bz/sdk.html for more details.

Summary

The Nebula SDK allows users to interact with big data sets using simple JavaScript code, which, in my opinion, opens the door to any advanced data analysis we may need.

By combining backend and client-side JavaScript functions, we can turn large amounts of data into whatever view we would like to present. The Nebula engine can process terabytes of data in under a second, and together with its SDK, it makes data analytics easy, engaging, and a lot of fun.

Thank you for reading this post to the end. If you have any questions, please send me a message; I would love to chat more.

Have a good one!

