Artificial Intelligence (AI) is everywhere. It has slowly crept away from its original definition and has become a buzzword for most automated algorithms. In this post, I don’t argue what AI is or isn’t – that’s a highly subjective argument at this point. However, I’d like to highlight Computational Intelligence – a well-defined topic.

Motivation
What is Artificial Intelligence? Who knows. What is or isn't AI is an ever-moving target. So, I'd like to dive into a science that's a little more concrete – Computational Intelligence (CI). CI is a three-branched set of theories, along with their design and applications. These methods are more mathematically rigorous, and adding them to your Data Science toolbox can separate you from the pack. You may be familiar with the branches – Neural Networks, Evolutionary Computation, and Fuzzy Systems. Diving into CI, we can talk about sophisticated algorithms that solve more complex problems.
A large community exists around CI. Within the IEEE specifically, there is a large CI community, with a yearly conference for each branch. I've published at and volunteered for the FUZZ-IEEE conference over the last few years, and it's always an excellent opportunity to learn about emerging mathematics and algorithms. Each community drives innovation in the CI space, which trickles from academia into industry; many CI methods began in academia and evolved into real-world applications.
One of the most common questions I've received when talking about CI is, "What problems does each branch solve?" While I can appreciate this question, the branches are not segmented by the problems they solve; they are segmented by the inspiration behind their theories. So, it's impossible to divide them cleanly by application. "But Bryce, what is a CI theory?" In a nutshell, each theory begins as a mathematical representation and is then implemented as an algorithm (something a computer can run). Each of the branches deserves many articles in its own right. In this post, I give a high-level overview and an example of each branch, working together to solve a problem. As you read, remember that it's impossible to do more than scratch the surface of the methods contained in each branch. I'll be writing more in-depth posts about specific instances of each branch, but here I want to describe each at a high level so you can get a taste of what's possible.
Neural Networks

Inspiration: "Using the human brain as a source of inspiration, artificial neural networks (NNs) are massively parallel distributed networks that have the ability to learn and generalize from examples." [1]
Each NN is composed of neurons, and their organization defines the network's architecture. The width and depth of a NN define that architecture; this is where "deep learning" originated – from having deep NNs. In the natural language processing (NLP) realm, the GPT-4 architecture is receiving much attention. For computer vision (CV), I've always been a fan of the GoogLeNet architecture. No architecture is perfect for every situation, which is why there are so many different ones.
Neurons are the building blocks of NNs, so it's essential to understand the neuron first. A neuron is visualized as follows,

Two essential steps occur to compute the mathematics at each neuron.
- A dot product – multiplication & addition (inside the circle)
- A pre-defined math function ("f(y)")
The first step is straightforward: dot the weights (w) with the inputs (x). Once you have that result, you feed it to the math function, f(y). Each step plays a role in learning from data. The dot product is a linear equation, so on its own it can only capture linear relationships, and most of the time the data isn't linear. To give the NN the freedom to learn more complex patterns, we apply a pre-defined, non-linear function (an activation function), which allows the NN to learn non-linear relationships.
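To make these two steps concrete, here's a minimal sketch of a single neuron in Python with NumPy; the particular weights, bias, and sigmoid activation are illustrative choices, not part of any specific architecture.

```python
import numpy as np

def sigmoid(y):
    # Pre-defined, non-linear activation function f(y)
    return 1.0 / (1.0 + np.exp(-y))

def neuron(x, w, b):
    # Step 1: dot product of weights and inputs (plus a bias term)
    y = np.dot(w, x) + b
    # Step 2: feed the result through the activation function
    return sigmoid(y)

# Illustrative values only
x = np.array([0.2, 0.7, 0.1])   # inputs
w = np.array([0.5, -1.2, 0.8])  # weights
print(neuron(x, w, b=0.1))
```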
In the previous image, we viewed a single neuron, but to create a deeper network, we may have something that looks like this.

Architectures become increasingly complex with each neuron. I suggest looking into how many parameters GPT-4 has ;). Now, you can imagine how many different architectures are possible given the effectively infinite number of configurations. Of course, hardware limits our architecture size, but NVIDIA (and others) are scaling the hardware at an impressive pace.
So far, we've only examined the computations that occur inside the network with established weights. Finding suitable weights is a difficult task, but luckily, math tricks exist to optimize them. If you're interested in the details, I encourage you to look up backpropagation. Backpropagation exploits the chain rule (from calculus) to optimize the weights. For the sake of this post, it's not essential to understand how the weights are learned, but it's necessary to know that backpropagation does it very well. But it's not without its caveats. As NNs learn, they optimize all of the weights relative to the data. However, the weights must first be defined – they must have some value. This begs the question: where do we start? We commonly assign random values to the weights. Unfortunately, poor initialization may lead to a suboptimal solution. So, common practice relies on training the NN many times with different initializations, hoping to find the best one. This problem is a direct result of using backpropagation – finding a locally optimal solution does not mean we have the BEST solution. There are plenty of keywords in this paragraph to Google for a more in-depth look.
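To see why initialization matters, here's a hedged sketch: gradient descent on a made-up, non-convex function, restarted from several random starting points, keeping the best result. The function, learning rate, and number of restarts are all illustrative.

```python
import numpy as np

def loss(w):
    # Toy non-convex function with multiple local minima (illustrative only)
    return np.sin(3 * w) + 0.1 * w**2

def grad(w):
    # Derivative of the toy loss
    return 3 * np.cos(3 * w) + 0.2 * w

def gradient_descent(w0, lr=0.01, steps=500):
    w = w0
    for _ in range(steps):
        w -= lr * grad(w)
    return w

rng = np.random.default_rng(0)
# Different random starts land in different local minima; keep the best
restarts = [gradient_descent(rng.uniform(-5, 5)) for _ in range(10)]
best = min(restarts, key=loss)
print(f"best w = {best:.3f}, loss = {loss(best):.3f}")
```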
The point is that NNs are a mathematical framework representing complex functions, inspired by the brain's neural structure. NNs can learn many functions, but the most common task is regression (classification is technically a type of regression – think about it). In future posts, I will go more in-depth on complex architectures and their use cases. But consider an example where a network classifies wine as either good or bad. Example features (inputs) for the wine are its sweetness, acidity, and alcohol level (there are many more possible inputs, but let's limit it to three). Let's also restrict this problem to a single neuron. Given enough data samples of good and bad wine (relative to these inputs), the NN can learn to classify a wine as good or bad from these features.
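Here's a minimal sketch of that single-neuron wine classifier, trained with gradient descent on made-up data; the features, labels, and hyperparameters are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(42)

# Made-up training data: columns are sweetness, acidity, alcohol (scaled 0-1)
X = rng.uniform(0, 1, size=(200, 3))
# Made-up labeling rule: "good" wines are sweeter and more acidic
y = ((0.6 * X[:, 0] + 0.4 * X[:, 1]) > 0.5).astype(float)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Random initialization of the single neuron's weights and bias
w = rng.normal(size=3)
b = 0.0

for _ in range(2000):
    p = sigmoid(X @ w + b)   # forward pass
    error = p - y            # gradient of the log loss w.r.t. the pre-activation
    w -= 0.1 * (X.T @ error) / len(y)
    b -= 0.1 * error.mean()

preds = (sigmoid(X @ w + b) > 0.5).astype(float)
print("training accuracy:", (preds == y).mean())
```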

Remember what our problem was? Random initialization. Fortunately, Evolutionary Computation offers a solution (note: I'm focusing on this issue so that I can tie all three branches together). For a more in-depth look at NNs, I recommend this article.
Evolutionary Computation

Inspiration: "Using the biological evolution as a source of inspiration, evolutionary computation (EC) solves optimization problems by generating, evaluating and modifying a population of possible solutions." [1]
Genetic Algorithms (GAs) are likely the most popular algorithms belonging to EC. Particle Swarm Optimization, Ant Colony Optimization, and Genetic Programming (among others) also belong to evolutionary computation, but we'll limit the scope to GAs.
The evolutionary process inspired the creation of GAs. GAs simulate many generations of candidate solutions to the problem at hand to find the best one. Like NNs, GAs attempt to optimize a cost function, but in GAs it's called a fitness function. Fitness functions are flexible in what they can model, but every GA works with the same components – chromosomes and genes. Genes are the building blocks of each chromosome and, similar to the weights in a NN, they are the values the GA optimizes. The steps of the algorithm, sketched in code after this list, are as follows,
- Generate N chromosomes (random)
- Evaluate fitness of each chromosome (fitness function)
- Select parent chromosomes for the next generation (selection)
- Create children chromosomes from most fit parents (crossover)
- Randomly alter some genes (mutation)
- Repeat from step 2 until happy (convergence)
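Here is a minimal sketch of those steps in Python. The chromosome is a vector of four genes (think of the three weights plus a bias of our wine neuron), and the fitness function, population size, crossover, and mutation scheme are all illustrative assumptions rather than the one true GA.

```python
import numpy as np

rng = np.random.default_rng(7)

TARGET = np.array([0.5, -1.2, 0.8, 0.1])  # made-up "ideal" genes, a stand-in for real fitness

def fitness(chromosome):
    # Higher is better; here, simply closeness to the made-up target
    return -np.sum((chromosome - TARGET) ** 2)

def crossover(parent_a, parent_b):
    # Single-point crossover of the parents' genes
    point = rng.integers(1, len(parent_a))
    return np.concatenate([parent_a[:point], parent_b[point:]])

def mutate(chromosome, rate=0.1):
    # Occasionally nudge a gene with random noise
    mask = rng.random(len(chromosome)) < rate
    return chromosome + mask * rng.normal(scale=0.5, size=len(chromosome))

# Step 1: generate N random chromosomes
population = [rng.normal(size=4) for _ in range(30)]

for generation in range(100):
    # Step 2: evaluate fitness of each chromosome
    scores = [fitness(c) for c in population]
    # Step 3: select the most fit chromosomes as parents
    order = np.argsort(scores)[::-1]
    parents = [population[i] for i in order[:10]]
    # Steps 4-5: create children via crossover, then mutate
    children = []
    while len(children) < len(population):
        a, b = rng.choice(len(parents), size=2, replace=False)
        children.append(mutate(crossover(parents[a], parents[b])))
    population = children
    # Step 6: repeat until satisfied (here, a fixed number of generations)

best = max(population, key=fitness)
print("best chromosome:", np.round(best, 2))
```

In the tandem setup described next, the best chromosome would simply become the neuron's initial weights before backpropagation takes over.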
For a deeper explanation, check out this post. GAs can find a better solution than a randomly initialized NN, but there's no guarantee they will. A chromosome may be close to the perfect solution, yet the algorithm has no way of recognizing that it is. No perfect optimization algorithm exists, which is why there are so many. However, GAs can find great answers when appropriately used. Consider our previous wine example. We can use a GA and a NN in tandem: the GA first identifies the initial set of weights (each candidate set is a chromosome), and the NN then optimizes them using backpropagation. Using these algorithms together isn't necessary (we can probably find a good solution with either), but I want to highlight how we can combine them to solve the same problem.

Fuzzy Systems

Inspiration: "Using the human language as a source of inspiration, fuzzy systems (FS) model linguistic imprecision and solve uncertain problems based on a generalization of traditional logic, which enables us to perform approximate reasoning." [1]
Full disclosure – I'm biased towards Fuzzy Systems, so I'll try to stick to the facts. There are many ways to explain this topic, but I like to start with Fuzzy Sets. Traditional set theory forces elements to belong to one set or another. For example, a horse belongs to the set of mammals, and a frog belongs to the set of amphibians. This framework (aka crisp set theory) works for situations with precise segmentation; however, the world isn't this precise. Consider representing the color gray. When does it belong to the white set? When does it belong to the black set? In crisp set theory, we have to pick one. But in fuzzy set theory, it can belong to both, each with a degree of membership. The membership function (or characteristic function) computes the membership degree, which quantifies how much an element belongs to a set. The following example shows what a membership function may look like when plotted against the full range of raw values. From the plot, we can estimate that a raw value of 50 has a membership degree of 1; however, as the raw values approach 100, the membership degree quickly approaches 0.

Many different functions can represent a membership function; I've just shown one example. These functions form the basis of the theory because with them we begin to describe the world mathematically. Once they are defined, we can build more complex representations like fuzzy rules.
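As a hedged sketch of what one such membership function might look like in code, here's a simple piecewise-linear function with made-up breakpoints chosen to roughly match the plot described above (full membership at 50, falling to 0 by 100).

```python
def membership(raw, full=50.0, zero=100.0):
    # Degree of membership: 1 at or below `full`, linearly falling to 0 at `zero`
    # (one simple shape; many other shapes are equally valid)
    if raw <= full:
        return 1.0
    if raw >= zero:
        return 0.0
    return (zero - raw) / (zero - full)

for raw in [25, 50, 75, 99, 120]:
    print(raw, round(membership(raw), 2))
```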
Fuzzy rules are similar to traditional logic rules; however, they are FUZZY. Let's consider the following rule about the acidity in our wine example (I know the language isn't perfect, but stick with me):
If acidic, then we will like it.
With crisp logic, we define a threshold. This threshold draws the boundary between what we consider acidic or not. If we draw this line at an acidity level of 4, everything below 4 becomes not acidic (including 3.99). We know this doesn't make sense. We will probably still like the wine if it's at 3.99.
Enter Fuzzy Logic. Applying fuzzy logic to this rule, we use membership functions to capture both "acidic" and how much we will like the wine. Without the hard threshold, we can interpret the statement with more human-like intuition. The mathematical output may look like this:

In this example, we assume only one feature matters – acidity. But as we add other conditions, the rules become more complex, and we must aggregate the membership values. For example, what if we only like sweet, acidic wines? Then we must create a rule that models both features, not just one. For a more in-depth look at fuzzy logic systems, check out this post.
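To illustrate, here's a hedged sketch of how the two-condition rule "if sweet and acidic, then we will like it" could be evaluated, using made-up triangular membership functions and the minimum as the fuzzy AND (one common choice of t-norm); the shapes and numbers are purely illustrative.

```python
def triangular(x, left, peak, right):
    # Simple triangular membership function (illustrative shape only)
    if x <= left or x >= right:
        return 0.0
    if x <= peak:
        return (x - left) / (peak - left)
    return (right - x) / (right - peak)

def is_sweet(sweetness):
    return triangular(sweetness, left=3.0, peak=7.0, right=11.0)

def is_acidic(acidity):
    return triangular(acidity, left=2.0, peak=5.0, right=8.0)

# Rule: IF sweet AND acidic THEN we will like it
# Aggregate the antecedent memberships with min (a common fuzzy AND)
def will_like(sweetness, acidity):
    return min(is_sweet(sweetness), is_acidic(acidity))

# A wine just below the crisp acidity threshold of 4 still fires the rule partially
print(will_like(sweetness=6.5, acidity=3.99))
```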
Conclusion
AI is ubiquitous. The term has flooded our lives, and it has lost its flavor. But as data scientists / ML engineers / AI engineers (whatever you call yourself), we can hold the community to a higher standard. We can be specific with our algorithms and showcase that our work is more than a series of pre-defined if-then statements. Granted, I know there are intelligent algorithms outside of this framework, but this is a realistic way to discuss our work and highlight the uniqueness of our methods (if you're using these). If you're new to CI, I challenge you to find applications, extend the theory, and become a part of the CI community (I'll be the first to welcome you 😉).
References: [1] https://cis.ieee.org/about/what-is-ci
Bryce Murray, PhD, is an Applied AI Scientist at Two Story, where he builds algorithms to operationalize emerging technology and scale people analytics. His work on eXplainable AI data fusion has been published in IEEE Transactions on Emerging Topics in Computational Intelligence and elsewhere. Bryce's areas of expertise include data fusion, deep learning, machine learning, and fuzzy logic. He earned his PhD in Electrical and Computer Engineering at the University of Missouri.