Data visualization is not only a means of analysing data and communicating results after the data has been cleaned, but also a way to make sense of the entire data frame at the initial stage. Here I illustrate 3 simple methods (2 lines of code each) for visually exploring an entire data frame’s composition, correlation and distribution.


ggpairs (from the GGally package) plots a matrix of pairwise plots and picks a suitable plot type for each pair depending on whether the variables are categorical or numerical.

library(GGally)
ggpairs(iris, aes(colour = Species, alpha = 0.4))
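The panel types ggpairs picks can also be set explicitly through its `upper`, `lower` and `diag` arguments, one entry per combination of variable types. The sketch below assumes a reasonably recent GGally release and spells out one possible choice: correlations above the diagonal for numeric pairs, box plots for numeric-vs-categorical pairs, and scatter plots and faceted histograms below.

```r
library(GGally)  # attaches ggplot2, which provides aes()

# Panel types are chosen per combination of variable types:
#   continuous = numeric vs numeric, combo = numeric vs factor
p <- ggpairs(
  iris,
  mapping = aes(colour = Species, alpha = 0.4),
  upper = list(continuous = "cor", combo = "box_no_facet"),
  lower = list(continuous = "points", combo = "facethist"),
  diag  = list(continuous = "densityDiag")
)

print(p)  # ggpairs returns a ggmatrix object; printing draws it
```

Leaving these arguments out gives GGally's defaults, which is usually enough for a first look.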


featurePlot in the caret package can plot a grid of each variable individually as box or density plots, or pairwise as a scatterplot matrix.

library(caret)
featurePlot(x = iris[, 1:4], y = iris[, 5], plot = "box", scales = list(x = list(relation = "free"), y = list(relation = "free")), auto.key = list(columns = 3))
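The box-plot call only covers one of the views featurePlot offers; the density and pairwise views come from changing the `plot` argument. A minimal sketch with iris, where extra arguments such as `adjust`, `pch` and `layout` are passed through to the underlying lattice functions:

```r
library(caret)  # featurePlot() draws with lattice under the hood

# One density panel per feature, one curve per class
p_density <- featurePlot(
  x = iris[, 1:4], y = iris$Species,
  plot = "density",
  scales = list(x = list(relation = "free"), y = list(relation = "free")),
  adjust = 1.5,      # multiplies the default smoothing bandwidth
  pch = "|",         # rug-like marks for individual observations
  layout = c(4, 1),  # four panels in a single row
  auto.key = list(columns = 3)
)

# Pairwise scatterplot matrix, coloured by class
p_pairs <- featurePlot(
  x = iris[, 1:4], y = iris$Species,
  plot = "pairs",
  auto.key = list(columns = 3)
)

print(p_density)
print(p_pairs)
```

Free scales matter here because the four iris measurements span quite different ranges.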


tableplot (from the tabplot package) draws the data frame as a table in which each variable is a column. For categorical variables it is a good way to explore their composition, and it also shows missing values when present.
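A minimal sketch of the tableplot step, assuming the tabplot package is installed (it may no longer be available from CRAN, so the call is guarded). A few values in a copy of iris are blanked out so that the missing-value display has something to show; the number of removed values is an arbitrary illustration.

```r
# Copy iris and blank out some values to make missing data visible
set.seed(42)
iris_na <- iris
iris_na$Sepal.Width[sample(nrow(iris_na), 10)] <- NA
iris_na$Species[sample(nrow(iris_na), 5)] <- NA

if (requireNamespace("tabplot", quietly = TRUE)) {
  # Rows are sorted on one column and grouped into bins; each variable is a column
  tabplot::tableplot(iris_na)
} else {
  message("Package 'tabplot' is not installed; skipping the table plot.")
}
```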


These methods work best for data frames with fewer than 10 columns that are not primarily free text, since they visualize numerical and categorical variables relatively easily. Methods suited to other kinds of data frames (text-heavy or high-dimensional) remain to be explored.
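One way to apply this rule of thumb before plotting is to keep only numeric, factor and logical columns and warn when too many remain. The helper below, `select_plottable()`, is hypothetical and not part of any package; it treats every character column as free text, which is an assumption that will not hold for every dataset.

```r
# Hypothetical helper: keep columns these plots handle well and drop character
# columns (assumed to be free text); warn if 10 or more columns remain.
select_plottable <- function(df, max_cols = 10) {
  keep <- vapply(df, function(col) is.numeric(col) || is.factor(col) || is.logical(col),
                 logical(1))
  out <- df[, keep, drop = FALSE]
  if (ncol(out) >= max_cols) {
    warning(sprintf("%d plottable columns; consider selecting fewer than %d.",
                    ncol(out), max_cols))
  }
  out
}

# Usage on a small toy data frame with a free-text column
toy <- data.frame(
  id = 1:3,
  group = factor(c("a", "b", "a")),
  comment = c("fine", "too long to plot", "n/a"),
  stringsAsFactors = FALSE
)
plottable <- select_plottable(toy)
names(plottable)  # "id" "group"
```

The result can then be passed straight to ggpairs, featurePlot or tableplot.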

