I have found Docker to be a powerful tool for both BI and data science development workflows. Mitigating the "it works on my machine" problem is valuable in itself, so why not start from a development standpoint when building robust business solutions? Striking a balance between scalability, workflow efficiency, and minimizing software dependency and shared-library conflicts can be difficult, but Docker has proven to be an incredibly effective solution in this respect. Through Docker, I have been able to use development tools, frameworks, and languages without installing them on the host machine, which provides reproducible environments and keeps individual processes contained. Being able to pull a specific Python or Julia image to program and develop against, without worrying about installation or dependency compatibility on my host machine, has been an incredible experience (not to mention no longer needing to manage virtual Python environments).
Although this article presents a simple example demonstrating the power of integrating Docker into BI/data science development workflows, it does assume some foundational knowledge of Python and Docker.
Tutorial Overview
The example is a straightforward interactive and dynamic ML web application that pulls its parameter dependencies from an external database server, MS SQL Server. The function to predict is the logarithmic spiral in cartesian coordinates (a vector-valued function), and the web app itself is built on the Dash framework (Plotly). Docker Compose makes it easy to manage the multi-container environment: one container runs a development MS SQL Server that simulates the final production environment, and another runs the primary application (Docker Compose handles the internal networking between the two). Each container references an environment file, which keeps the environments dynamic and reproducible while avoiding static values hard-coded across arbitrary files. These values are accessed through a Python script that reads a YAML file to determine the environment variable names. This may come across as adding superfluous layers of logic, but I have found this template to make life substantially easier in the end, especially as a project grows in complexity.
For managing code structure, separate Python files are used for running the app, serving the app, holding the prediction function, assisting with the app's GUI structure, and automating the seeding/scaffolding of the development database. For the latter, I take the approach of a Python script that initiates the process and procedurally calls a bash file, which handles the scaffolding with the help of two SQL files.
All that’s necessary for the example project to run on a given machine is for Docker and Docker Compose to be installed. To have the web application run automatically, uncomment the bottom section of the Dockerfile before running the docker-compose up command. Alternatively, execute "python _dash_appserver.py" from within the running web application container (the name should be _interactive-vector-valued-function-app_dash-appx). Once the web application is running, it is available on port 8080 (this can be changed in the Docker Compose file if needed). Pull a copy of the project code here:
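Once the code is pulled, a typical run looks like the following (the container name is shortened here; check docker ps for the exact name on your machine):

```bash
# Build and start both containers (development MS SQL Server + application)
docker-compose up --build

# Alternative: start the web app manually from inside the running application container
docker exec -it <app-container-name> python _dash_appserver.py

# The app should then be reachable at http://localhost:8080
```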
The abbreviations DEV, PRD, and ENV will be used in code to represent development, production, and environment, respectively. Below is a snapshot of the project structure:
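In outline, it looks roughly like this (a sketch reconstructed from the files discussed later; exact names and nesting may differ):

```text
.
├── docker-compose.yml
├── Dockerfile
├── _devmssql.env
├── _dev_app_envvars.env
└── _app/
    ├── _dashapp.py              # main Dash application
    ├── _dash_appserver.py       # serves the app via waitress
    ├── _appconfig.py            # reads env var names from config.yml
    ├── config.yml
    └── helpers/
        ├── _guisetup.py         # builds the tabbed parameter GUI
        └── seeddb/
            ├── _mssql_db_datagenerator.py
            ├── _mssql_db_seedscript.sh
            └── (two .sql files for scaffolding/seeding)
```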

Dockerfile for the Main Application
The main application is built from a Dockerfile to create a reproducible and controlled application environment (for both DEV and PRD). Starting from the Python 3.8.6-slim image, the project code is copied in and the working directory is defined (alternatively, a Git reference could be used here instead of a local project directory). A RUN command then installs the dependencies and applies the environment configuration necessary for the application to function as expected (e.g., making sqlcmd available from the shell). The bottom section of the Dockerfile, which automates running the application as a non-root user as soon as the container starts, is commented out to support a development and testing workflow.
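As a rough illustration of that structure (not the project's exact Dockerfile; the requirements file, user name, and sqlcmd installation step are assumptions):

```dockerfile
# Base image pinned for reproducibility
FROM python:3.8.6-slim

# Copy the project code into the image and define the working directory
COPY . /app
WORKDIR /app

# Install Python dependencies plus any environment configuration the app expects
# (the project's RUN step also makes sqlcmd available from the shell, e.g. via mssql-tools)
RUN pip install --no-cache-dir -r requirements.txt

# Bottom section left commented out for development/testing; uncomment to run the
# app automatically as a non-root user when the container starts
# RUN useradd -m appuser
# USER appuser
# CMD ["python", "_dash_appserver.py"]
```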
Docker Compose Setup
In addition to establishing the container orchestration configuration, when the docker-compose up command is executed, the .env files holding the environment variables used by their respective containers are passed in. For example, the variable ‘MSSQL_PID’ in _devmssql.env determines which edition of MS SQL Server to spin up (in this case, Developer). In a production environment running an official process, ensure that your organization holds a valid license unless you are attaching to an already existing production SQL Server (either way, this example replicates a production environment to support effective debugging and code development). The image below shows the mapping of .env files to their corresponding Docker containers from the docker-compose.yml file.
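In outline, that mapping might look like the following sketch (service names, image tag, and port choices are assumptions, not the project's exact compose file):

```yaml
version: "3.8"
services:
  dev-mssql:
    image: mcr.microsoft.com/mssql/server:2019-latest
    env_file:
      - _devmssql.env          # e.g. MSSQL_PID=Developer, SA password, EULA acceptance
    ports:
      - "1433:1433"
  dash-app:
    build: .
    env_file:
      - _dev_app_envvars.env   # app-side variables, including the dev-DB seeding flag
    ports:
      - "8080:8080"
    depends_on:
      - dev-mssql
```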

Web Application Files
There is a main web application file (_dashapp.py) holding the primary functionality of the web application, with package imports for Dash and the custom packages. At a high level, after the imports and the initial instantiation of the required objects, the application layout is defined, giving the app its overall structure. Below is an overview of the expected visual presentation (straightforward, with a plot and, below it, a tabbed list of parameters that serve as inputs to the prediction function).

Everything necessary for the layout can be defined in-line here except for one call to a custom import: the tabbed parameter list for the prediction function. The horizontal tab list is expected to be an array, so the custom import, _guisetup.py, must produce an output matching this input requirement. This keeps potential dynamic changes manageable without heavy maintenance of the main application Python file. For each parameter tab, a slider determines the corresponding parameter value.
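A minimal sketch of what such a helper in _guisetup.py could look like (the function name and the shape of the parameter dictionary are assumptions):

```python
import dash_core_components as dcc

def build_parameter_tabs(params):
    """Return a list of dcc.Tab objects, one per model parameter, each holding a slider."""
    tabs = []
    for name, cfg in params.items():
        tabs.append(
            dcc.Tab(
                label=name,
                children=[
                    dcc.Slider(
                        id=f"slider-{name}",
                        min=cfg["min"],
                        max=cfg["max"],
                        step=cfg["step"],
                        value=cfg["default"],
                    )
                ],
            )
        )
    return tabs

# In the main layout, the returned array can be dropped straight into dcc.Tabs:
# dcc.Tabs(children=build_parameter_tabs(params))
```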
Due to the simplicity of the application’s user interface, only two app callbacks are required for the Dash framework: one for updating the parameter sliders from a GUI perspective, and another for passing the required input values to the prediction function and returning the plot object. Updating the GUI sliders is straightforward, as can be seen below.
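A hedged sketch of such a callback, with element ids assumed for illustration:

```python
import dash
from dash.dependencies import Input, Output

app = dash.Dash(__name__)  # in the project, this is the app object defined in _dashapp.py

@app.callback(
    Output("slider-a-label", "children"),
    [Input("slider-a", "value")],
)
def update_slider_label(value):
    # Echo the currently selected parameter value back to the GUI next to its slider.
    return f"a = {value}"
```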
The second callback, which calls the prediction function and returns a plot, is a little more involved than the first, so wrapping the prediction function itself in a separate file helps keep the code here compact. After the input arguments have been passed to the prediction function, the expected output is a Pandas DataFrame ready for plotting with Plotly via the JSON format.
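A hedged sketch of that second callback (element ids, the prediction module name, and the DataFrame columns are assumptions):

```python
import plotly.express as px
from dash.dependencies import Input, Output

# Hypothetical import of the prediction helper, which returns a Pandas DataFrame
# of predicted x/y coordinates along the curve.
from predictions import predict_log_spiral

# `app` is the same Dash application object shown in the previous sketch.
@app.callback(
    Output("spiral-plot", "figure"),
    [Input("slider-a", "value"), Input("slider-b", "value")],
)
def update_spiral_plot(a, b):
    df = predict_log_spiral(a=a, b=b)  # DataFrame assumed to hold 'x' and 'y' columns
    # Dash serializes the returned figure to JSON for the dcc.Graph component.
    return px.line(df, x="x", y="y", title="Predicted logarithmic spiral")
```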
The code below is what’s used for serving the app. In this case, waitress is used (Gunicorn is another solid option, among others, for serving Flask-based applications).
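A minimal sketch of that server file, assuming the Dash app object is exposed as app in _dashapp.py:

```python
# _dash_appserver.py (sketch): waitress serves the Flask server underlying the Dash app.
from waitress import serve
from _dashapp import app  # module/attribute names are assumptions

serve(app.server, host="0.0.0.0", port=8080)
```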
Helper Classes and Files
Before an attempt to seed and scaffold the development database can be made, it’s important to know the structure of the corresponding production database. This project uses a helper GUI file, _guisetup.py, to assist with the main app’s functionality and presentation; it depends on the _appconfig.py file. The _appconfig.py file in turn depends on the config.yml file to determine the names of the environment variables from which the required values are retrieved. For brevity, not all project code is shown explicitly in this article.
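A hedged sketch of the _appconfig.py idea (the key names and file path are illustrative): config.yml maps logical settings to the names of environment variables, and the values themselves come from the container's environment populated by the .env files.

```python
import os
import yaml

def load_app_config(path="config.yml"):
    """Resolve application settings from env vars whose names are listed in config.yml."""
    with open(path) as f:
        env_var_names = yaml.safe_load(f)
    # e.g. env_var_names might look like {"db_host": "DEV_DB_HOST", "db_user": "DEV_DB_USER"}
    return {key: os.environ[var_name] for key, var_name in env_var_names.items()}
```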


Seeding the Development Database Server
For setting up the development database server, we’ll create and seed the expected production database, schema, and tables. The start of this process is included in the docker-compose.yml file under the command section (executed when the container starts). Here, an environment variable from the _dev_app_envvars.env file is checked to confirm that we should proceed with seeding a development database. If the check passes, the Python generator file, _mssql_db_datagenerator.py (file location: _app/helpers/seeddb), is called. The SQL files used for seeding/scaffolding the development database require variables to be passed in when called (these variables appear as "$()" within each .sql file).
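A rough sketch of the hand-off from Python to the scaffolding script (the flag name and paths are illustrative, not the project's exact values):

```python
import os
import subprocess

def scaffold_dev_database():
    # Only proceed when seeding is requested; in the project this flag comes from
    # _dev_app_envvars.env and is checked in the docker-compose command section.
    if os.environ.get("SEED_DEV_DB", "false").lower() != "true":
        return
    # Delegate scaffolding to the bash script, which runs sqlcmd against the .sql files.
    subprocess.run(
        ["bash", "helpers/seeddb/_mssql_db_seedscript.sh"],
        check=True,       # raise if scaffolding fails
        env=os.environ,   # connection details and sqlcmd variables come from the environment
    )
```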


Referencing the _get_default_gui_clsvalues() function in the same file, _mssql_db_datagenerator.py, below is the code for inserting the generated data into the development database after _mssql_db_seedscript.sh has been executed (line 26) for the database scaffolding.
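A rough sketch of that insertion step, assuming pyodbc as the driver and illustrative table, column, and environment variable names:

```python
import os
import pyodbc

# In the project, these rows come from _get_default_gui_clsvalues(); shown inline here.
rows = [
    ("a", 0.1, 1.0, 0.1, 0.5),
    ("b", 0.05, 0.5, 0.05, 0.2),
]

conn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};"
    f"SERVER={os.environ['DEV_DB_HOST']};"
    f"DATABASE={os.environ['DEV_DB_NAME']};"
    f"UID={os.environ['DEV_DB_USER']};"
    f"PWD={os.environ['DEV_DB_PASSWORD']}"
)
cursor = conn.cursor()
cursor.executemany(
    "INSERT INTO dash.gui_parameters (name, min_value, max_value, step_value, default_value) "
    "VALUES (?, ?, ?, ?, ?)",
    rows,
)
conn.commit()
conn.close()
```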
Below is the SQL code called by _mssql_db_seedscript.sh:
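The exact statements will vary, but with sqlcmd scripting variables the scaffolding typically takes a shape like this (database and schema names are the "$()" placeholders mentioned above, supplied by the bash script):

```sql
-- Hedged sketch of the scaffolding step; $(DBName) and $(SchemaName) are sqlcmd
-- scripting variables supplied by _mssql_db_seedscript.sh, not hard-coded values.
IF DB_ID('$(DBName)') IS NULL
    CREATE DATABASE [$(DBName)];
GO

USE [$(DBName)];
GO

IF SCHEMA_ID('$(SchemaName)') IS NULL
    EXEC('CREATE SCHEMA [$(SchemaName)]');
GO

-- Table creation for the seeded parameter data would follow the same pattern.
```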
Prediction Function
For the application’s prediction function, two discrete support vector regression models from scikit-learn are used to predict the x and y components of the vector-valued function curve (the logarithmic spiral), respectively. The code required for training and returning the prediction results as a Pandas DataFrame is mostly straightforward. Because the function to predict is not a scalar function but a vector-valued one (a curve parametrized by two components), it’s critical to keep the steps along the curve consistent for both the x and y coordinates.
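A hedged sketch of the idea, fitting one SVR per coordinate over a shared set of steps t (the parameter names and ranges are illustrative, not the project's exact signature):

```python
import numpy as np
import pandas as pd
from sklearn.svm import SVR

def predict_log_spiral(a=0.5, b=0.2, n_points=200):
    """Fit one SVR per component of the curve (x(t), y(t)) and predict both over the same steps."""
    t = np.linspace(0, 4 * np.pi, n_points).reshape(-1, 1)
    # Ground-truth logarithmic spiral in cartesian coordinates (the vector-valued function).
    x = a * np.exp(b * t.ravel()) * np.cos(t.ravel())
    y = a * np.exp(b * t.ravel()) * np.sin(t.ravel())
    # One model per component, both trained against the same parameter steps t,
    # so the predicted x and y values line up point-for-point along the curve.
    svr_x = SVR(kernel="rbf", C=100, gamma="scale").fit(t, x)
    svr_y = SVR(kernel="rbf", C=100, gamma="scale").fit(t, y)
    return pd.DataFrame({"x": svr_x.predict(t), "y": svr_y.predict(t)})
```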
The end result of the running web application, with tabbed interactive parameter sliders for the model, will look like this:

Conclusion
This working example demonstrates how easy it is to orchestrate multiple processes in an automated way that works across any machine running Docker. Although some parts of this example are arguably over-the-top for a simple curve-fitting function, such as using a database to hold values for an interactive dashboard app when they are already generated by the app itself, the underlying approach demonstrates how well it scales and supports Agile development when applied to business solutions.
I wanted to share this approach to leveraging Docker for BI/data science projects through a working example, especially because Docker can be used directly in the development phase, reducing the need to install extraneous software tools or debug host-machine issues caused by runtime incompatibilities. Personally, I have found Docker to be a powerful tool for business innovation.