GUIs or coding: Production vs. Operation

As we move deeper into the era of data and computer science, a debate arises about the tools we use: GUI software or coding. Which one should we use after all?

Valeria Fonseca Diaz
Towards Data Science

--

Many researchers and professionals who never expected to deal with data and analytics have found themselves, in recent years, diving into the expanding discipline of data science. Even in the last century, scientific research required many academic researchers to analyze data, albeit at a much smaller scale. For many of them, building even simple linear models was not an intuitive task, and they relied on a wide range of GUI software to perform it.

Our knowledge of data science has never been as solid as it is right now, so we also find ourselves clarifying our mindset for performing analytics tasks and our culture for developing data science projects. Here is where the choice between building our analyses with GUIs and coding them comes into play.

User-friendly software finds its merit in its own name: it is designed to let researchers without a solid background in programming perform analysis tasks, such as producing a nice visualization or a summary table. In data science, however, we no longer mean to provide only a figure or a table; we mean building pipelines that perform a series of different tasks on input observations. In this regard, our final product is complex, and so the tools we use to build it need to be robust as well.

There’s nothing intrinsically wrong with using GUI software, but we need to understand when it can be used and when it is not robust enough.

Production vs. Operation for Data Science. (Image by author)

User-friendly software is not a robust production environment

The main distinction to keep in mind when debating GUI software vs. coding is the difference between developing (producing) an analytical pipeline and operating or launching a pipeline that has already been built. This distinction has historically been very clear to computer scientists and software developers under DevOps practices, and it is now time to clarify it for data scientists, statisticians, and any researchers who find themselves performing data analysis.

To take one data analytics example, we can think of building a prediction model. Classical statisticians or researchers who have dealt with linear models to publish p-values can place themselves here, as calculating p-values is a by-product of building linear models. Building a model requires several steps, such as cleaning the data, filtering information, testing different model architectures, and possibly optimizing under different criteria. Many GUI software packages contain the tools to carry out this task, with panels for the different aspects and drop-down menus for the different options. In the end, it is up to the user to test (many) different alternatives and save the output to make a final decision.

The scenario described above is equivalent to building a car next to the highway, in a tent where all the necessary tools can be stored. It is also similar to building a computer on an office desk. It could be done, but should it? In these two examples, it is easy to tell the two environments apart: there is a production environment, which is generally divided into different environments depending on the task (construction of pieces, assembly, etc.), and there is an operating environment, which is the highway or the office desk.

Programming is for everyone

Even though the production of a model could very well take place in GUI software, it cannot match the robustness and accuracy we gain when we build our analytical tools by programming. Programming or coding is becoming a necessary skill for the current century, given how fast computing and data science are growing. The building of our models or pipelines for data analysis needs its own place, separate from the environment where they operate. The more we dedicate a clear and robust environment to producing our models, the more accurate the machines we build for later deployment will be.
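To make the idea of a coded production environment concrete, here is a minimal sketch in Python of the model-building steps mentioned earlier: cleaning the data, testing alternative models, and selecting one. The data, the two candidate models, and every function name are hypothetical and chosen only for illustration; a real project would likely rely on a dedicated modeling library.

```python
from statistics import fmean

def clean(rows):
    """Drop observations with a missing value in either column."""
    return [(x, y) for x, y in rows if x is not None and y is not None]

def fit_mean(train):
    """Candidate 1: always predict the mean of y."""
    mean_y = fmean([y for _, y in train])
    return lambda x: mean_y

def fit_line(train):
    """Candidate 2: least-squares line y = intercept + slope * x."""
    mean_x = fmean([x for x, _ in train])
    mean_y = fmean([y for _, y in train])
    sxx = sum((x - mean_x) ** 2 for x, _ in train)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in train)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return lambda x: intercept + slope * x

def mse(model, data):
    """Mean squared prediction error of a fitted model on (x, y) pairs."""
    return fmean([(model(x) - y) ** 2 for x, y in data])

def build_model(rows, candidates):
    """Clean, split, fit every candidate, and pick the lowest test error."""
    data = clean(rows)
    split = int(0.7 * len(data))  # fixed split: first ~70% for training
    train, test = data[:split], data[split:]
    scores = {name: mse(fit(train), test) for name, fit in candidates.items()}
    best = min(scores, key=scores.get)
    return best, scores

# Hypothetical raw observations, including two incomplete rows.
RAW = [(1, 2.1), (2, 3.9), (3, None), (4, 8.2), (5, 9.8), (None, 1.0),
       (6, 12.1), (7, 13.8), (8, 16.2), (9, 18.1), (10, 19.9)]
CANDIDATES = {"mean": fit_mean, "line": fit_line}

best_name, scores = build_model(RAW, CANDIDATES)
print(best_name, scores)
```

Every decision that a GUI would hide behind a panel or a drop-down menu (which rows were dropped, how the data was split, which alternatives were compared) is written down here and can be rerun or reviewed.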

Producing a model is not a user-friendly task, but operating it certainly is.

Even though programming is still a challenging concept for plenty of professionals to digest, it is much more user-friendly nowadays than it has ever been. Programming languages like Python and R have highly intuitive syntax and paradigms, not to mention that they are free and open source. They cannot be learned with a click, but they do take significantly less effort to learn than lower-level languages like C. Note here that “significantly” comes without a p-value to prove it.

GUIs: An environment for operation

It is therefore not the case that GUIs are useless. User-friendly software, as its name declares, is meant for user-friendly tasks. Producing a model is not a user-friendly task, but operating it certainly is. In this view, we can, and probably still should, rely on GUIs as a mode of deployment for our models or data science pipelines.
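One way to picture this separation is a sketch in which the production side writes a fitted model to a file and the operation side only loads it and asks for predictions. The file name, the JSON layout, and the parameter values below are assumptions made for illustration, not a prescribed format. A GUI for operation would simply call the loaded `predict` function behind a button or a form.

```python
import json
import tempfile
from pathlib import Path

def save_model(params, path):
    """Production side: persist the fitted parameters as JSON."""
    Path(path).write_text(json.dumps(params))

def load_model(path):
    """Operation side: rebuild a prediction function from the saved file."""
    params = json.loads(Path(path).read_text())
    return lambda x: params["intercept"] + params["slope"] * x

# Production: parameters would come from a coded pipeline (hypothetical values).
model_file = Path(tempfile.mkdtemp()) / "model.json"
save_model({"intercept": 0.5, "slope": 2.0}, model_file)

# Operation: a front end only needs the file and the predict function.
predict = load_model(model_file)
print(predict(3.0))
```

The operating user never touches the fitting code, and the person producing the model never needs to know what the interface looks like.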

The one aspect where GUIs cannot beat coding: Reproducibility

An extra aspect that cannot be left out of the discussion is the principle of reproducibility, which is of special importance in sciences that are carried out on a computer. Reproducibility is an emerging topic of discussion across many fields in both academia and industry. Among the different discussions, we find debates on what reproducibility means for AI and for science, and on the value it represents. A coding culture does not by itself guarantee smooth reproducibility, since, as has been debated, much more is needed. It does, however, overcome the drawbacks that GUIs pose, such as being fully user-dependent.
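As a small illustration of what a script offers here, the sketch below runs a bootstrap interval for a mean with an explicit random seed. The data, the seed, and the function name are hypothetical. Calling the function twice with the same inputs returns the same interval, which a sequence of manual clicks cannot promise. A fixed seed is only one ingredient, though: software versions and the data itself also have to be pinned down.

```python
import random

def bootstrap_interval(values, seed, n_resamples=200):
    """Approximate 95% bootstrap interval for the mean, with a fixed seed."""
    rng = random.Random(seed)  # local generator, no hidden global state
    means = []
    for _ in range(n_resamples):
        sample = [rng.choice(values) for _ in values]
        means.append(sum(sample) / len(sample))
    means.sort()
    low = means[int(0.025 * n_resamples)]
    high = means[int(0.975 * n_resamples)]
    return low, high

# Hypothetical measurements.
DATA = [4.1, 5.3, 3.8, 6.0, 5.1, 4.7, 5.9, 4.4]

first_run = bootstrap_interval(DATA, seed=42)
second_run = bootstrap_interval(DATA, seed=42)
print(first_run == second_run)
```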

As we strive to become better data scientists and the field continues to grow at such a high rate, it becomes necessary to make room for developing a new skill and to shift our working culture from fast production results to a robust production environment for data science.


Data science researcher. Co-creator of MV Learn. Enthusiastic writer, technology ethics reader