Creating chRistmas caRds

A data scientist’s guide to creating Christmas cards in R

Greta Gasparac
Towards Data Science


Christmas is near,
but lockdown’s still here.
How can I share Christmas cheer,
if my friends are not near?

Zoom meetings are not
proper Christmas greetings,
so I think to myself: “What can I make
to lift their spirits during this Christmas break?”

“I can still send them handmade Christmas cards!”, I say,
but sadly, 2020 is, again, in the way.
The postal service is breaking down,
and who knows when my card is leaving town!

So what else remains for a data scientist like me,
other than firing up RStudio and plotting the iconic tree?

Diverging bar plot

Image by Author.

Let’s start with an easy one. The key idea of a diverging bar plot is to compare values against a midpoint or baseline. We will use this idea to create symmetric bars that represent our Christmas tree.

This is the most cumbersome of the approaches we present, since we need to create the data frame manually. For this five-level tree we need to:

  • Specify each of the 10 bars (only 5 unique values, however, since we specify both the left and the right side of the diverging bar plot). This is shown in the wish column, where we can be a bit sneaky and include a secret message!
  • Set divergence values. Values at the same level need to be the same in the absolute sense, with one of them being positive and the other negative.
  • Add labels for different parts of the tree (so we can color it later).

That’s it, we are ready for ggplot2 to do the magic. Create the bars, select your colors, throw in some ornaments at the end and your first card is ready!
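To make the recipe concrete, here is a minimal sketch of how such a data frame and plot could look. The words in the wish column, the bar widths, the colors and the ornament positions are all made up for illustration, and the plotting part only runs if ggplot2 is installed.

```r
# Hypothetical labels, read top to bottom; the bottom level is the trunk.
levels_top_down <- c("Have", "a", "very", "merry", "Christmas")
half_width <- c(1, 2, 3, 4, 0.5)   # assumed half-width of each level
part <- c("leaves", "leaves", "leaves", "leaves", "trunk")

# Ten rows: every level appears once with a negative value (left bar)
# and once with a positive value (right bar).
tree <- data.frame(
  wish = factor(rep(levels_top_down, times = 2), levels = rev(levels_top_down)),
  divergence = c(-half_width, half_width),
  part = rep(part, times = 2)
)

# A few hand-placed ornaments (positions are arbitrary).
ornaments <- data.frame(
  wish = factor(c("a", "very", "merry"), levels = levels(tree$wish)),
  divergence = c(-1, 1.5, -2.5)
)

if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(tree, aes(x = wish, y = divergence, fill = part)) +
    geom_col(width = 1) +   # negative and positive values stack on opposite sides of zero
    geom_point(data = ornaments, aes(x = wish, y = divergence),
               inherit.aes = FALSE, colour = "gold", size = 4) +
    coord_flip() +          # flip so the levels run bottom to top
    scale_fill_manual(values = c(leaves = "forestgreen", trunk = "saddlebrown")) +
    theme_minimal() +
    theme(legend.position = "none")
  print(p)
}
```

Reversing the factor levels puts the first word at the top of the flipped plot, so the message reads from the tip of the tree down to the trunk.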

Dirichlet sampling

Image by Author.

Now let us dip our toes into statistics.

The Dirichlet distribution is a multivariate generalization of the beta distribution, parameterized by a vector α of K positive reals. The support of a Dirichlet distribution of order K is the open standard (K-1)-simplex, which is a generalization of a triangle. For example, for K=3 we get an equilateral triangle.

A triangle? Hmm … a Christmas tree is sort of a triangle, isn’t it?

All we have to do is sample uniformly from an equilateral triangle (a Dirichlet distribution with α = (1, 1, 1) does exactly that) to get the points for the tree and the ornaments. Then we map the values from ℝ³ to ℝ² and plot them using ggplot2. To top it off, we add a nice star on top and wish everyone a merry Christmas!
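One possible way to do this in base R is sketched below. The helper rdirichlet_sketch is a hypothetical name: it draws Dirichlet samples by normalising independent gamma draws, a standard construction that is not necessarily the one used for the card above. The triangle vertices, sample sizes and colors are assumptions as well.

```r
set.seed(2020)

# Draw n samples from Dirichlet(alpha) by normalising independent gamma draws.
rdirichlet_sketch <- function(n, alpha) {
  g <- matrix(rgamma(n * length(alpha), shape = rep(alpha, each = n)), nrow = n)
  g / rowSums(g)   # every row now sums to 1, i.e. lies on the simplex
}

# Map the weights from R^3 to R^2: each point is a weighted average
# of the three corners of an equilateral triangle.
vertices <- rbind(c(0, 0), c(1, 0), c(0.5, sqrt(3) / 2))
to_2d <- function(w) {
  xy <- as.data.frame(w %*% vertices)
  names(xy) <- c("x", "y")
  xy
}

tree <- to_2d(rdirichlet_sketch(5000, c(1, 1, 1)))     # alpha = (1, 1, 1) is uniform
ornaments <- to_2d(rdirichlet_sketch(40, c(1, 1, 1)))

if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(tree, aes(x, y)) +
    geom_point(colour = "forestgreen", size = 0.8) +
    geom_point(data = ornaments, colour = "red", size = 3) +
    annotate("point", x = 0.5, y = sqrt(3) / 2, shape = 8, size = 8, colour = "gold") +
    annotate("text", x = 0.5, y = -0.08, label = "Merry Christmas!", size = 6) +
    coord_fixed() +
    theme_void()
  print(p)
}
```

Changing α away from (1, 1, 1) would concentrate the points near a corner or the centre of the triangle, so it is a handy knob if you want a denser tree top or a fuller base.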

3D spiral

Image by Author.

For the last one we move from 2D to 3D. We draw a simple spiral and decorate it with some bigger spheres for the ornaments, and some smaller ones for the twinkly Christmas lights.

In the 2D plane a spiral has the following parametric representation:
x = r(φ)cos(φ) and y = r(φ)sin(φ).

We set r(φ) = φ and φ = i/30, where i is the iteration index. To stretch the spiral out in 3D we also need a third coordinate z, which can simply be z = n_tree - i, where n_tree is the number of points we want.
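Written out in R, these formulas could look like the following sketch. The value of n_tree is an assumption; any reasonably large number of points gives a smooth spiral.

```r
n_tree <- 1000         # number of points on the spiral (assumed value)
i <- seq_len(n_tree)   # iteration index
phi <- i / 30          # angle
r <- phi               # radius grows with the angle: r(phi) = phi

spiral <- data.frame(
  x = r * cos(phi),
  y = r * sin(phi),
  z = n_tree - i       # early points sit at the top, late points at the bottom
)
```

Since the radius grows while z shrinks, the spiral is narrow at the top and wide at the bottom, which is exactly the cone shape a tree needs.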

We generate the data and then sample from it to get the coordinates for the ornaments and the lights. We reduce the z coordinate of the ornaments by a constant, so that they hang below the line, and add some Gaussian noise to the lights, so that they spread out around the tree.

We plug the data into plotly, specify the colors, size and other details … and here it is! Our very first 3D Christmas tree.
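A rough sketch of the decorating and plotting steps follows. The number of ornaments and lights, the offset of 20, the noise level, the colors and the marker sizes are all assumptions chosen for illustration, and the plotly part only runs if the package is installed.

```r
set.seed(2020)

# The spiral: r(phi) = phi, phi = i / 30, z = n_tree - i.
n_tree <- 1000
i <- seq_len(n_tree)
spiral <- data.frame(x = i / 30 * cos(i / 30), y = i / 30 * sin(i / 30), z = n_tree - i)

# Ornaments: a random subset of spiral points, lowered so they hang below the line.
orn_idx <- sample(n_tree, 30)
ornaments <- spiral[orn_idx, ]
ornaments$z <- ornaments$z - 20   # assumed offset

# Lights: a larger subset, jittered with Gaussian noise in all three coordinates.
light_idx <- sample(n_tree, 200)
lights <- spiral[light_idx, ]
lights$x <- lights$x + rnorm(200, sd = 1)
lights$y <- lights$y + rnorm(200, sd = 1)
lights$z <- lights$z + rnorm(200, sd = 10)

if (requireNamespace("plotly", quietly = TRUE)) {
  library(plotly)
  fig <- plot_ly() %>%
    add_trace(data = spiral, x = ~x, y = ~y, z = ~z,
              type = "scatter3d", mode = "lines",
              line = list(color = "darkgreen", width = 6)) %>%
    add_trace(data = ornaments, x = ~x, y = ~y, z = ~z,
              type = "scatter3d", mode = "markers",
              marker = list(color = "red", size = 6)) %>%
    add_trace(data = lights, x = ~x, y = ~y, z = ~z,
              type = "scatter3d", mode = "markers",
              marker = list(color = "gold", size = 2)) %>%
    layout(showlegend = FALSE)
  if (interactive()) print(fig)   # opens the tree in the viewer
}
```

The noise on z is larger than on x and y because the z axis spans a much wider range than the other two.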

And thus we paved this new Christmas path,
with programming, statistics and math.

There’s one thing left for me to write:
“Merry Christmas to all, and to all a good night!”
