Exploring Highcharts in R

Visualizing trends & patterns using data from ‘How I Met Your Mother’

Manasi Mahadik
Towards Data Science


This article started out as an exploratory quest to understand data visually in R. What followed was a scramble of experimenting, testing and leaping into the dark. During this period I also re-watched my favorite show, How I Met Your Mother, this time looking at some recurring patterns & the body of data sets it has left behind. The urge to connect the two led me to document & share what I learned, and that remains the inspiration for this article.

The R viz space is pretty exciting as it is, with ggplot2 dominating the game through Hadley Wickham’s trademark knack for making complicated operations remarkably lithe. Its efficiency in quickly visualizing trends, and in customizing just about anything you’d want, makes it a solid & dependable tool. Plotly, on the other hand, lets you build remarkably beautiful D3-based plots, particularly for websites and dashboards. It lets you hover the mouse over a point to see its data labels and zoom in on specific details, making it an exemplary go-to for R Shiny apps & R Markdown documents.

Entering the data viz space in 2009 was ‘Highcharts’, a JavaScript charting library whose current clientele includes Facebook, Microsoft & Stack Overflow.

The story of Highcharts is perhaps one for the ages. Its maker, Torstein Hønsi, based in a small Norwegian town surrounded by fjords, mountains and streams, was on a quest for a modest charting tool to update his homepage with snow depth measurements from Vikjafjellet, the local mountain where his family keeps a cabin. Irked by the constant plug-ins he encountered, he took the leap and kick-started Highcharts. Since 2009 it has remained the company Highsoft’s best-selling product, and it is known around the world as a premier charting tool.

Joshua Kunst’s highcharter package in R is a wrapper around the original Highcharts JavaScript library. This article explores some of its visual functionality using data about a show known for its fondness for charts and viz: How I Met Your Mother.

HIMYM makes use of many aesthetic charts

There are two main functions in the package, namely hchart() and highchart(), similar to ggplot2’s qplot() and ggplot(). The former is a one-call shortcut, while the latter starts from an empty htmlwidget that you build up piece by piece. Since this article draws some pretty elementary charts, the hchart() function suffices for most of them. I have used the more elaborate highchart() function in the final chart to combine the polar & line chart.
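As a rough illustration of the difference, the sketch below draws the same column chart both ways. The `drinks` data frame and its column names are made up for this example and are not the article’s data.

```r
library(highcharter)

# Hypothetical toy data, invented for this sketch
drinks <- data.frame(
  character = c("Ted", "Barney", "Robin"),
  count     = c(10, 15, 12)
)

# Shortcut: a single call maps data frame columns onto a column chart
quick <- hchart(drinks, type = "column", hcaes(x = character, y = count))

# Step by step: start from an empty chart and add each piece explicitly
built <- highchart() %>%
  hc_chart(type = "column") %>%
  hc_xAxis(categories = drinks$character) %>%
  hc_add_series(data = drinks$count, name = "count")
```

Both objects are htmlwidgets, so printing either one in the console or in an R Markdown chunk should render the chart.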

Let’s jump in!

The grouped column chart

The good old column chart is perhaps the most perennial charting option for displaying categorical data, and for good reason. It compares and contrasts data effectively in an easy-to-read fashion.

Below I have plotted the number of drinks the HIMYM gang consumes through the series.

The syntax is fairly easy to navigate. With the hchart() shortcut, the first call alone creates the raw column chart, while the subsequent calls add layers of aesthetics & information: here, the title, subtitle, credits & a custom theme.

Another aspect to note is the extensive use of magrittr’s %>%, the ‘pipe’ (its co-developer Stefan Bache insists in the vignette that it be pronounced with a sophisticated French accent), which makes the syntax amicable for tidyverse users.

library(highcharter)

Number_of_drinks %>%
  # the first call alone draws the grouped column chart
  hchart(type = 'column', hcaes(x = `HIMYM Character`, y = `Number of Drinks`, group = Type)) %>%
  # the remaining calls layer on title, subtitle, credits and theme
  hc_title(text = "How much did the gang really drink?",
           style = list(fontWeight = "bold", fontSize = "30px"),
           align = "center") %>%
  hc_subtitle(text = "'All I want was to have a regular beer at my regular bar with my regular friends in my regular city.'-Ted Mosby",
              style = list(fontWeight = "bold"),
              align = "center") %>%
  hc_credits(enabled = TRUE,
             text = "Data Source: HIMYM; https://imgur.com/user/haaaaaaaveyoumetted",
             style = list(fontSize = "10px")) %>%
  hc_add_theme(hc_theme_ffx())

Number of drinks by types of drinks & show characters.
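To see what the group mapping in hcaes() does, here is a minimal sketch on a made-up data frame (the names and counts are hypothetical, not the Number_of_drinks data used above). Each distinct value of the grouping column should end up as its own series, which is what produces the side-by-side columns.

```r
library(highcharter)

# Hypothetical toy data: two characters, two drink types
toy_drinks <- data.frame(
  character = c("Ted", "Ted", "Barney", "Barney"),
  type      = c("Beer", "Scotch", "Beer", "Scotch"),
  count     = c(5, 2, 3, 7)
)

# group = type splits the rows into one series per drink type
grouped <- hchart(toy_drinks, type = "column",
                  hcaes(x = character, y = count, group = type))

# Number of series stored in the chart options
n_series <- length(grouped$x$hc_opts$series)
```

Dropping the group argument would instead put all four rows into a single series.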

The pie chart & colored area chart

Luckily, the author of the highcharter package was presumably a fan of the show himself and included some data sets based on the show in the package. For these two charts I have used those very data sets: Marshall’s favorite bars and pies. In the show, Marshall presents his favorite bars & pies to the gang in the form of a pie chart & a bar chart. Here, his preferences are visualized as a pie chart & a colored area chart. The syntax is fairly similar to that of the first chart. I have added custom colors to these charts via their hex codes. Although custom themes can be coded into the charts, the package itself has a vast repository of beguiling themes. Throughout this article I have used the ‘ffx’ theme, inspired by the Mozilla Firefox browser.

library(dplyr)  # for mutate()

# pie chart
# (column names are those used in the original article; check names(favorite_bars)
# if they differ in your version of the package)
favorite_bars %>%
  hchart(type = "pie", hcaes(bars, percent)) %>%
  hc_title(text = "Marshall's Favorite bars",
           align = "center",
           style = list(fontWeight = "bold", fontSize = "30px")) %>%
  hc_tooltip(enabled = TRUE) %>%
  hc_subtitle(text = "In Percentage of Awesomeness!",
              align = "center",
              style = list(fontWeight = "bold")) %>%
  hc_add_theme(hc_theme_ffx()) %>%
  hc_credits(enabled = TRUE, text = "Data source: HIMYM")

# colored area graph
favorite_pies %>%
  # one hex color per segment
  mutate(segmentColor = c("#000004", "#3B0F70", "#8C2981", "#DE4968", "#FE9F6D")) %>%
  hchart(type = 'coloredarea', hcaes(x = pies, y = percent)) %>%
  hc_title(text = "Marshall's favorite pies",
           style = list(fontWeight = "bold", fontSize = "30px"),
           align = "center") %>%
  hc_subtitle(text = "In Percentage Of Tastiness!",
              style = list(fontWeight = "bold"),
              align = "center") %>%
  hc_add_theme(hc_theme_ffx())

Marshall’s favorite bars & pies, in a pie chart & colored area graph respectively
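The custom themes mentioned above can be sketched with hc_theme(), which takes the same kind of nested option lists as the chart itself. The data frame, the theme name `my_theme` and the chosen colors below are all assumptions made for illustration; only hc_theme() and hc_add_theme() come from the package.

```r
library(highcharter)

# Hypothetical toy data, invented for this sketch
toy_pies <- data.frame(
  pie     = c("Apple", "Cherry", "Pumpkin"),
  percent = c(50, 30, 20)
)

# A small hand-rolled theme: slice colors, background and title style
my_theme <- hc_theme(
  colors = c("#3B0F70", "#DE4968", "#FE9F6D"),
  chart  = list(backgroundColor = "#FAFAFA"),
  title  = list(style = list(fontWeight = "bold"))
)

# Apply it exactly where hc_theme_ffx() would otherwise go
themed <- hchart(toy_pies, type = "pie", hcaes(name = pie, y = percent)) %>%
  hc_title(text = "A custom-themed pie") %>%
  hc_add_theme(my_theme)
```

Swapping `my_theme` for one of the bundled themes, such as hc_theme_ffx(), changes the look without touching the rest of the pipeline.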

The bubble chart

The penultimate chart is a bubble chart visualizing the top rated & most voted episodes of the series. Unlike the scatter plot, the bubble chart allows the plotting of three-dimensional data, with the size of the bubble adding the third dimension. Here, the X axis represents the name of the episode, the Y axis represents its rating, and the bubble size represents the number of votes on IMDB. This helps us infer that although ‘The Three Days Rule’ is rated among the top 10 episodes, it has minuscule votes and hence must be interpreted with caution. Call-outs and annotations like this can also be added to the chart to point out certain facts, using the hc_annotations() function.

HighestRated %>%
  hchart(type = "bubble",
         hcaes(x = Episode, y = Rating, size = Votes, color = Votes),
         maxSize = "20%") %>%
  hc_title(text = "Top Rated Episodes of HIMYM",
           style = list(fontWeight = "bold", fontSize = "30px"),
           align = "center") %>%
  hc_subtitle(text = "What was your favorite episode?",
              align = "center",
              style = list(fontWeight = "bold")) %>%
  hc_credits(enabled = TRUE,
             text = "Size by Number of Votes, Colored by Rating. | Data Source- IMDB") %>%
  hc_add_theme(hc_theme_ffx())

Top Rated episodes of the series, by Ratings & Votes on IMDB
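A call-out of the kind described above might look like the following sketch. The ratings and vote counts are invented placeholders rather than IMDB figures, and a numeric `index` column stands in for episode names so that the annotation’s coordinates are unambiguous. The nested labels/point structure follows the Highcharts annotations options as I understand them, so treat it as an assumption to verify against your installed version.

```r
library(highcharter)

# Placeholder numbers, not real IMDB data
toy_episodes <- data.frame(
  index  = c(0, 1, 2),
  rating = c(9.5, 9.4, 9.3),
  votes  = c(4000, 3500, 900)
)

annotated <- hchart(toy_episodes, type = "bubble",
                    hcaes(x = index, y = rating, size = votes)) %>%
  hc_annotations(
    list(
      labels = list(
        list(
          # anchor the label to the low-vote bubble
          point = list(x = 2, y = 9.3, xAxis = 0, yAxis = 0),
          text  = "Few votes: interpret with caution"
        )
      )
    )
  )
```

The xAxis = 0 and yAxis = 0 entries tell the label to read x and y as axis values rather than pixel positions.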

The polar line chart

Finally, a polar line chart is brought into play to gauge reactions to the much debated HIMYM finale.

This data set had an interesting spin to it. The HIMYM finale was subject to some serious jeering after it aired (read here, here, here & here). In fact, even researching for this article fired up some of the fury I first felt after watching the finale, more than 5 years on. But the data, as it is sometimes known to do, tells a startlingly different story.

The majority of the folks studied by the social analytics platform Canvs felt buoyant about the finale. The Canvs algorithm uses sentiment analysis of people’s Twitter reactions to figure out how they feel about forms of entertainment.

The chart below visualizes this data set in the form of a polar line chart, coloring the most commonly shared ‘opinions’ in pink, that is, reactions at or above the mean share of reactions, while the rest are highlighted in yellow.

highchart() %>%
  hc_chart(polar = TRUE) %>%
  hc_title(text = "HIMYM Finale Reactions",
           style = list(fontWeight = "bold", fontSize = "30px"),
           align = "center") %>%
  hc_subtitle(text = "'And that kids is the story of how I met your mother'",
              style = list(fontWeight = "bold"),
              align = "center") %>%
  hc_xAxis(categories = fans_react$Fans,
           style = list(fontWeight = "bold")) %>%
  hc_credits(enabled = TRUE,
             text = "Data Source: Based on a study done by Canvs as reported by The Atlantic") %>%
  hc_add_theme(hc_theme_ffx()) %>%
  hc_legend(enabled = FALSE) %>%
  hc_series(
    list(
      name = "Bars",
      data = fans_react$Contribution,
      colorByPoint = TRUE,
      type = "column",
      colors = ifelse(fans_react$Contribution < mean(fans_react$Contribution), "#ffeda0", "#c51b8a")
    ),
    list(
      name = "line",
      data = fans_react$Contribution,
      pointPlacement = "on",
      type = "line"
    )
  )

HIMYM finale reactions on Twitter, bucketed and visualized
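The coloring rule in the chart above is plain vectorized R and can be checked on its own. The sketch below applies the same ifelse() test to made-up shares (hypothetical values, not the Canvs figures) so the result is easy to verify by eye.

```r
# Hypothetical reaction shares, invented for this sketch
toy_react <- data.frame(
  Fans         = c("love", "sad", "hate", "funny"),
  Contribution = c(40, 30, 20, 10)
)

# Mean share acts as the threshold between the two colors
threshold <- mean(toy_react$Contribution)

# Below the mean: yellow; at or above the mean: pink
toy_react$color <- ifelse(toy_react$Contribution < threshold,
                          "#ffeda0", "#c51b8a")
```

With a mean of 25, the first two reactions come out pink and the last two yellow, which is the vector that would be passed as `colors` together with colorByPoint = TRUE.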

Key takeaways

  1. The highcharter package in R, with its navigable nature, graspable syntax and polished interactive Highcharts visuals, is a great addition to the R viz space, currently dominated by the grammar-of-graphics ggplot2 package.
  2. It can be used to create a wide range of charts, such as bubble charts, column charts, tree maps and time series graphs, using similar syntax and arguments.
  3. Not only does it support R objects, it also employs the well-loved %>% ‘pipe’ operator, making it feel incredibly familiar to tidyverse users. This homely syntax combined with stunning visuals makes it a go-to tool in R Shiny apps and R Markdown documents.

The code, along with all the data sets used, can be found on GitHub here.

Thanks for reading! You can reach out to me here.
