Software-dictated hardware
Software is eating the world, and Artificial Intelligence (AI)/Machine Learning (ML) is eating the software.
More and more business leaders are reaching the same conclusion: every business, irrespective of its end product or service, must integrate AI tools and techniques into its processes and roadmap planning. This also changes the Information Technology (IT) workload profile of the organization, as a growing share of the computing infrastructure, whether on-premise or in the cloud, is consumed by AI workloads and tasks.
The interesting observation here is that these so-called "AI workloads" run best on particular hardware configurations. They can, of course, run on the traditional general-purpose computing infrastructure an organization may have planned and put in place ten years ago. But the best return on investment and the fastest time to market come only when the hardware is thought out and optimized for the specific AI workload the organization plans to run.
The last few decades of the 20th century (and perhaps the first decade of the 21st) saw Moore's-law-fueled growth in general-purpose computing machinery, and software was tailored to suit the needs and limitations of the hardware.
We are deep into the 21st century now, and the software stack (increasingly dominated by AI) has started dictating the hardware requirements and specifics.
There are plenty of high-quality articles written about this topic. In this essay, I want to point out a few less-mentioned and underrated features and inter-relationships that we may want to keep in mind while designing the best hardware combination for our AI workload.
CPU is not dead
The choice and configuration of the CPU often get second-class treatment in discussions of AI-specific hardware. That is because of the dominance of the GPU in the most popular AI/ML tasks: Deep Learning (DL) for Computer Vision (CV), Natural Language Processing (NLP), and Recommender Systems.
But that scenario is evolving.
A vast array of non-GPU-optimized tasks
A great many startups and established tech giants have started going beyond straightforward DL workloads, mixing interesting pre-processing or parallel computing blocks around the DL platforms that power their business.
Almost none of these ideas are brand new. A majority of them have been used only in very specific situations before or have not found widespread use beyond the academic world or their niche application area. What is interesting is that they are now being fused with the AI/ML workloads and playing an important role in the performance and robustness of the entire business platform of global tech giants.
Some examples are listed below:
- Molecular dynamics
- Physics-based simulation
- Large-scale reinforcement learning
- Game-theoretic frameworks
- Evolutionary computing
Because of their historical development, a majority of the code and algorithms for these types of computing tasks are not yet well optimized to run on a GPU. That makes the choice of CPU for these tasks highly critical for a startup or an IT organization when it designs and specifies a particular server system.
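To make this concrete, here is a minimal sketch of the kind of workload that rewards CPU cores rather than GPU horsepower: a toy evolutionary search parallelized with Python's standard multiprocessing module. The fitness function, population size, and mutation scheme are illustrative assumptions, not taken from any real system.

```python
# Minimal sketch: a toy evolutionary search parallelized across CPU cores.
# The fitness function and hyperparameters are illustrative assumptions;
# throughput scales with the number of worker processes (i.e., CPU cores).
import random
from multiprocessing import Pool

def fitness(candidate):
    # Toy objective: maximize the negative squared distance from 0.5 in every dimension.
    return -sum((x - 0.5) ** 2 for x in candidate)

def mutate(candidate, rate=0.1):
    # Add small Gaussian noise to every gene of the candidate.
    return [x + random.gauss(0, rate) for x in candidate]

if __name__ == "__main__":
    dim, pop_size, generations = 50, 512, 20
    population = [[random.random() for _ in range(dim)] for _ in range(pop_size)]

    with Pool() as pool:                                    # one worker per available CPU core
        for _ in range(generations):
            scores = pool.map(fitness, population)          # CPU-bound parallel evaluation
            ranked = [c for _, c in sorted(zip(scores, population), reverse=True)]
            parents = ranked[: pop_size // 4]               # keep the top quarter
            population = parents + [mutate(random.choice(parents))
                                    for _ in range(pop_size - len(parents))]

    print("best fitness:", max(map(fitness, population)))
```

On a many-core machine, the parallel evaluation step dominates the runtime, which is exactly why core count and memory bandwidth, not the GPU, decide the performance of this class of workload.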
Omniverse example
Nvidia recently launched the Omniverse platform to revolutionize the way product and process designers collaborate across multiple fields: architecture, media, game design, manufacturing, and more. The goal is to converge high-performance computing (HPC), AI/ML models, physics-based simulation, and design automation onto a single platform.
Clearly, such a confluence of disparate fields, which have traditionally used disparate computing paradigms and styles, needs careful optimization on the hardware side, balancing the CPU, GPU, and memory requirements.
Another AI-task example
A typical example is one where the CPU has to do a lot of custom feature processing on missing or messy data up front and only sends a small fraction of good data on for training a DL model. It is entirely possible that the total computing time and energy expenditure are bottlenecked by this data wrangling and the data I/O between the CPU and onboard storage, not by the DL model training or inference.
Or there could be a multitude of application-specific signal-processing steps to run at the front end to extract features before a pre-trained model is run on a GPU for inference. Again, the bottleneck (and the opportunity to improve) is on the front-end CPU side.
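A minimal sketch of how such a bottleneck shows up in practice is given below. The data shape, missing-value rate, cleaning steps, and the stand-in "inference" step are all assumptions made for illustration; the point is simply to time the CPU-side wrangling separately from the model step.

```python
# Minimal sketch: time the CPU-side data wrangling separately from the model step.
# The dataset, cleaning logic, and stand-in "model" are illustrative assumptions.
import time
import numpy as np

rng = np.random.default_rng(0)

# Messy input: 500k rows x 64 features with ~2% missing values.
raw = rng.normal(size=(500_000, 64))
raw[rng.random(raw.shape) < 0.02] = np.nan

t0 = time.perf_counter()
# CPU-bound wrangling: drop rows with missing values, then standardize columns.
clean = raw[~np.isnan(raw).any(axis=1)]
clean = (clean - clean.mean(axis=0)) / clean.std(axis=0)
t1 = time.perf_counter()

# Stand-in for DL inference on the surviving rows (a single dense projection).
weights = rng.normal(size=(64, 8))
_ = clean @ weights
t2 = time.perf_counter()

print(f"wrangling: {t1 - t0:.2f}s on {len(raw):,} rows")
print(f"'inference': {t2 - t1:.2f}s on {len(clean):,} surviving rows")
```

In a profile like this, the wrangling stage dwarfs the model step, and no GPU upgrade will shorten it.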
The choice of core count, boost clock, and L1/L2 cache size (all the considerations pertaining to a CPU family) is far more critical for such a server design than spending time and money on the best GPU.
Example of a critical CPU generation choice
Sometimes the choice of CPU generation may hinge on subtle considerations, such as whether the AI workload demands fully asynchronous communication between main memory and the CPU.
Some types of AI and data-analytics code take advantage of such communication and some do not. When it is not needed, the upgrade can be overkill: the CPU generation that enables it may be costlier and more power-hungry than the previous one. As in any engineering decision, optimization is critical here.
Everybody knows about the concept of TCO – Total Cost of Ownership – of a server system. But there is an increasingly important concept of TCE – Total Cost to Environment – that factors into this kind of optimization study.
Mixing and matching hardware configurations also helps with this optimization. Again, the mixing strategy needs to be tuned to the application workload.
Underrated considerations for GPU
Even when the AI workload consists largely of DL model training and inference, a hardware provider or designer must have the following conversations with potential customers.
CPU-to-GPU progression
Mainstream DL frameworks such as TensorFlow and PyTorch have, only in the past one or two years, become flexible enough to be used as generalized ML problem-solving tools. This paves the way to turn conventionally CPU-bound problems into tasks suitable for GPUs.
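As a minimal sketch (assuming PyTorch is installed and, optionally, a CUDA GPU is visible), here is a conventionally CPU-bound task, brute-force nearest-neighbor search, written as plain tensor operations so that the same code runs unchanged on either device. The sizes and random data are placeholders.

```python
# Minimal sketch: a classic CPU-bound task (brute-force nearest neighbors)
# expressed as tensor ops, so the same code runs on CPU or GPU unchanged.
# Sizes and data are illustrative assumptions.
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"

queries = torch.randn(2_000, 128, device=device)
corpus = torch.randn(50_000, 128, device=device)

# Pairwise Euclidean distances (2,000 x 50,000), then the nearest index per query.
dists = torch.cdist(queries, corpus)
nearest = dists.argmin(dim=1)

print(f"ran on {device}, first few neighbors: {nearest[:5].tolist()}")
```

The framework, not the hardware, decides where the arithmetic lands, which is precisely the flexibility the newer releases have added.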
However, not all data science/ML teams are equipped (or have the bandwidth) to take advantage of this progress. For them, spending a lot of money on the latest GPU without a matching codebase is still a waste of resources. In those situations, a mixed server deployment is a good starting point.
Support for lower precision numbers
A great many DL workloads can get sufficient accuracy – or, more importantly, business value – using lower precision arithmetic in their models and parameters. This, effectively, allows organizations to run bigger models on the same hardware.
However, not all GPU families support this feature natively. Most older-generation GPUs need a lot of software and programming tweaks to make it work. If a customer can take advantage of lower precision, hardware providers should advise accordingly and promote the GPUs that support it natively.
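A minimal sketch of what exploiting this looks like with PyTorch's automatic mixed precision (AMP) API is shown below; the tiny model and random data are placeholders, and the speed and memory benefits only materialize on GPUs whose hardware supports reduced-precision math natively (for example, via Tensor Cores).

```python
# Minimal sketch: mixed-precision training with PyTorch AMP.
# The model and data are placeholders; real gains require a GPU whose
# hardware supports reduced-precision math natively.
import torch
import torch.nn as nn

device = "cuda" if torch.cuda.is_available() else "cpu"
use_amp = device == "cuda"

model = nn.Sequential(nn.Linear(512, 256), nn.ReLU(), nn.Linear(256, 10)).to(device)
optimizer = torch.optim.SGD(model.parameters(), lr=1e-2)
scaler = torch.cuda.amp.GradScaler(enabled=use_amp)
loss_fn = nn.CrossEntropyLoss()

x = torch.randn(1024, 512, device=device)
y = torch.randint(0, 10, (1024,), device=device)

for step in range(10):
    optimizer.zero_grad()
    with torch.cuda.amp.autocast(enabled=use_amp):   # run the forward pass in reduced precision where safe
        loss = loss_fn(model(x), y)
    scaler.scale(loss).backward()                    # scale the loss to avoid FP16 gradient underflow
    scaler.step(optimizer)
    scaler.update()

print("final loss:", loss.item())
```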
Multi-GPU cluster mapped to the right software/AI stack
A lot of multi-GPU systems are built without proper planning or thought about the AI/software stack that can truly take advantage of such a configuration. Apart from traditional DL tasks, many distributed data-analytics tasks can also take advantage of multi-GPU systems (check out libraries such as Dask, RAPIDS, or Ray).
Therefore, mapping the exact software stack to the multi-GPU system configuration is also a critical consideration, and one that will only grow in importance.
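As a minimal sketch of that mapping (assuming the dask-cuda and CuPy packages from the RAPIDS ecosystem are installed and one or more NVIDIA GPUs are visible), the snippet below pins one Dask worker to each GPU and runs a chunked reduction whose blocks live in GPU memory.

```python
# Minimal sketch: one Dask worker per GPU, with CuPy-backed array chunks,
# so a distributed analytics reduction actually exercises every GPU.
# Assumes the RAPIDS dask-cuda and cupy packages are installed.
import cupy
import dask.array as da
from dask_cuda import LocalCUDACluster
from dask.distributed import Client

if __name__ == "__main__":
    cluster = LocalCUDACluster()          # spawns one worker pinned to each visible GPU
    client = Client(cluster)

    # CuPy-backed random chunks are created directly on the GPUs and reduced in parallel.
    rs = da.random.RandomState(RandomState=cupy.random.RandomState)
    x = rs.normal(size=(200_000, 1_000), chunks=(20_000, 1_000))
    print("std of 200M GPU-resident samples:", float(x.std().compute()))

    client.close()
    cluster.close()
```

Whether a team standardizes on Dask, RAPIDS, or Ray, the point is the same: extra GPUs per node pay off only if the chosen stack can actually schedule work onto each of them.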
Edge – the new dark horse
The future evolution of AI-optimized hardware cannot be complete without a mention of the dark horse: edge and hybrid cloud.
Form factors to change
Hardware designers must be prepared for unusual demands on the form factor (even a departure from traditional 1U/2U/4U designs). True edge devices often come with unusual installation constraints: height, space, weight, and so on. If you imagine edge servers installed next to a large machine in a manufacturing plant, you get the idea.
Inference-focused or analytics-focused?
Automotive, manufacturing, and IoT systems are going to be the main drivers for edge-focused servers and computing infrastructure. It is easy to imagine that most of the AI workload on such servers will be inference-focused.
But increasingly, sophisticated visualization and business-analytics tasks (often involving large database extractions) are being envisioned for edge-intelligence platforms. That kind of workload needs a somewhat different hardware-configuration optimization than one focused purely on AI inference.
Object-storage, video analytics, security considerations
New and exciting edge-computing and edge-analytics applications present fresh challenges for arriving at an optimal hardware configuration. Some questions worth asking:
- Is there a need for object storage on the edge?
- Is there a continuous video analytics task running on the edge?
- What are the data retention and security considerations for the edge data (raw and processed)?
Summary and conclusions
Overall, we see that the design and installation of enterprise-scale computing infrastructure is no longer a one-way street. With the advent and rise of AI/ML-focused workloads in typical business IT and engineering environments, mapping the exact needs and limitations of those AI workloads to the hardware configuration is more critical than ever.
A multi-pronged strategy, an optimization-focused mindset, close relationships with the software teams, and in-depth knowledge of modern AI/ML tools and techniques are a must to succeed in this scenario.