A primer on visual overviews of a data frame

Data visualization is not only a means of analysing data and communicating results after the data has been cleaned, but also a way to make sense of an entire data frame at the initial stage. Here I illustrate three simple methods (two lines of code each) for visually exploring a data frame's composition, correlation and distribution.


ggpairs

ggpairs plots a matrix of variables and chooses a suitable plot type for each pair depending on whether the variables are categorical or numerical.

library(GGally)
ggpairs(iris, aes(colour = Species, alpha = 0.4))
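If the matrix gets crowded, it can help to restrict the plot to a subset of columns and to set transparency as a fixed value rather than inside aes(). The following is a minimal sketch of one way to do that, assuming GGally's columns argument and wrap() helper; the choice of the first four columns and the 0.4 alpha value are only illustrative.

```r
library(ggplot2)
library(GGally)

# Plot only the four numeric columns, coloured by Species.
# wrap() passes a fixed alpha to the panel functions instead of mapping it in aes().
p <- ggpairs(
  iris,
  columns = 1:4,
  mapping = aes(colour = Species),
  lower = list(continuous = wrap("points", alpha = 0.4)),
  diag = list(continuous = wrap("densityDiag", alpha = 0.4))
)
print(p)
```

Species still drives the colouring even though it is not among the plotted columns.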

Featureplot

featurePlot in caret plots a grid of each variable individually as box or density plots, or pairwise as a scatterplot matrix.

library(caret)
featurePlot(x=iris[,1:4], y=iris[,5], plot="box", scales=list(x=list(relation="free"), y=list(relation="free")), auto.key=list(columns=3))
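The same call can produce the other views mentioned above by changing the plot argument. As a rough sketch, assuming the "density" and "pairs" options of featurePlot and reusing the iris columns from the box plot example:

```r
library(caret)

# One density curve per species for each feature, with free axis scales.
density_plot <- featurePlot(
  x = iris[, 1:4],
  y = iris$Species,
  plot = "density",
  scales = list(x = list(relation = "free"), y = list(relation = "free")),
  auto.key = list(columns = 3)
)
print(density_plot)

# Pairwise scatterplot matrix of the four features, coloured by species.
pairs_plot <- featurePlot(
  x = iris[, 1:4],
  y = iris$Species,
  plot = "pairs",
  auto.key = list(columns = 3)
)
print(pairs_plot)
```

Storing the result and calling print() explicitly makes the plots render when the code is run from a script rather than typed at the console.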

Tableplot

tableplot plots every variable as a column in a table-like chart. For categorical variables it is a good way to explore their composition. It also shows missing values when present.

library(tabplot)
tableplot(iris)

These methods work best for data frames with fewer than 10 dimensions whose columns aren't primarily free text, as they can visualize numerical and categorical variables relatively easily. Methods that suit other kinds of data frames (text-heavy, high-dimensional) remain to be explored.
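Before reaching for any of these plots, a quick check of column count and column types can tell you whether a data frame fits that description. Here is a minimal base R sketch; summarise_columns is a hypothetical helper written for illustration, not a function from any of the packages above.

```r
# Hypothetical helper: one row per column with its type,
# number of missing values and number of distinct values.
summarise_columns <- function(df) {
  data.frame(
    column = names(df),
    type = vapply(df, function(col) class(col)[1], character(1)),
    n_missing = vapply(df, function(col) sum(is.na(col)), integer(1)),
    n_unique = vapply(df, function(col) length(unique(col)), integer(1)),
    row.names = NULL,
    stringsAsFactors = FALSE
  )
}

overview <- summarise_columns(iris)
print(overview)

# Rough rule of thumb from the text: fewer than 10 columns, none of them character.
fits_overview_plots <- ncol(iris) < 10 && !any(overview$type == "character")
print(fits_overview_plots)
```

A character column with almost as many distinct values as rows is a likely sign of free text, which these plots handle poorly.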

This is #day6 of my #100dayprojects.
