This article suggests a novel way of simulating something that might resemble general intelligence. It argues that agents with fixed, weak physical attributes and flexible ‘brain’ architectures, placed in a fixed natural framework, can be evolved with a genetic algorithm following simple rules. Additionally, embedded in the genes of the agents are networks that determine their own reward and punishment functions, which each agent uses individually to learn. A fractal memory system that can be written to and read from is incorporated into the brain architecture of the agents, a design that also mimics some characteristics of Long Short-Term Memory Recurrent Neural Networks (LSTM RNNs).
Table of Contents
- Introduction
- Definition
- Simulation
- Framework
- Physical Attributes
- Simulation Constraints
- Brain
- Concluding Remarks
1 Introduction
Simulating artificial general intelligence has turned out to be a harder problem than previously thought [1]: progress in the field of machine learning has so far proven insufficient to meet this challenge. This article suggests a way in which ‘intelligence’ might be simulated, arguing that an evolutionary approach is at least one option.
2 Definition
What we as humans define as intelligence is hard to put into words. If one asked around to see how people define the term, one would end up with varying answers, as is the case for probably all concepts. Still, ‘intelligence’ is a relatively broad concept compared to most others.
Without agreeing on one definition, one cannot easily simulate intelligence artificially in a way that all spectators would accept as intelligence: we each attribute a different, if similar, set of facets of human behaviour to ‘intelligence’ [2]. Notice, however, that it is human behaviour that embeds what we usually associate with intelligence. When we call other creatures intelligent, like elephants or dolphins, it is usually because we recognise human-like behaviour in theirs. So it seems people generally attribute the term ‘intelligence’ to what humans have and other creatures have considerably less of. It is clear that humans, who do not have phenomenal physical abilities, were able to survive for a large part because of their intelligence [3, p. 23].
3 Simulation
If this definition is adopted, it follows that intelligence could be simulated by giving digital creatures this same aspect. In nature, a creature acquires its characteristics through natural selection, sexual selection, or both; these are processes that are easy to replicate artificially.
To generate such agents, one could in principle code them from the ground up. Nevertheless, this is equivalent to attempting to build intelligence from the ground up, which presupposes that you understand how it works (and somewhat defeats the purpose of trying to simulate it).
Next, one could let evolution do the work and use a nature simulation to try to develop intelligent digital ‘life’. This option, too, has complications. Firstly, evolving intelligent life digitally in an exact copy of our natural system is scientifically impossible, as it would require a complete understanding of nature, and humankind simply is not that far yet. Secondly, reaching a simulation state in which intelligent life could have developed would take a huge number of time steps and consequently astonishing amounts of computing power. Even using a slight abstraction of the part of nature that we do understand and can express in code, this is without a doubt a computationally infeasible task.
One could instead attempt a relatively high-level representation of nature that is computationally feasible, but then it does not seem possible to develop intelligent life as defined in this article, because the neural processes at the root of it can only be built from small building blocks found exclusively in low-level simulations. Unless certain adjustments are made to parts of the simulation process, such experiments would not have an acceptable probability of success.
One adjustment proposed by this article is to separate the artificial neural evolution from the evolution of the natural framework that exists around the agents, while also fixing the physical properties of the agents. There would be no utility in having the natural framework around the agents evolve as well, at least not enough to make the trade-off with computational costs worthwhile. The same goes for the physical properties of the agents; more on this later.
4 Framework
As mentioned before, a framework capable of simulating this should, for reasons of computational burden, consist of a relatively high-level representation of nature. Second, it should allow physical attributes important for survival to be substituted by mental ones. To save time, anyone conducting the proposed experiment would profit from using an existing framework.
An environment that suits these prerequisites is the game Minecraft. Both conditions are met by the workings of the game: an important part of Minecraft is that players augment their weak in-game base capabilities by crafting tools, building shelter and gathering food, thereby improving chances of survival that would otherwise be poor if only physical attributes supported them [4]. The standard game mode is literally called ‘survival’ mode. Other candidates, like the game Rust, would also meet the requirements, but Minecraft is suggested because a plethora of aspects of the game facilitate a sizable amount of creativity and complexity in players’ behaviour.
5 Physical Attributes
Thus, the agents should be able to evolve mentally but keep their weak physical attributes fixed from the outset: it is not the point of the experiment to see which physical combinations, subject to some constraint function, would be locally optimal, and searching for them carries computational costs. As these properties are fixed, a framework now has to be set up for everything that has to do with the physical characteristics of the agents (leaving regular player controls and their results alone, for now).
Regarding reproduction, let us say the model is kept closest to the human species in the sense that reproduction requires two agents of opposite gender. Note that this is not necessarily the optimal choice. For example, one could instead define any number n of genders, all needed to reproduce under some genetic reproduction rule. It is not clear how well these options, or any of the infinitely many possible rules, would work, so the choice of rule should be considered carefully.
Regarding the appearance of the agents, inheritance of looks along with visible age and gender cues can be incorporated by giving every agent a skin of one RGB colour determined by parental genes, with coloured bands around the limbs corresponding to age and gender (the bands being outlined in a standard colour so they do not blend in with the colour of the agent’s skin). Hopefully, these cues can give rise to a deeper level of social interaction and constructs formed by the agents.
6 Simulation Constraints
As the title suggests, this article is mainly about the loss function of intelligence. More will follow on this, but for now, a good summary is that the loss function of intelligence is determined by the agents themselves in the simulation. By choice, simulations are selected on how long they run before the agent species becomes extinct. As the agents have weak physical capabilities, mental ones are crucial for survival: the better these mental capabilities, the longer the simulation runs and the more likely its starting genes are to be used for the following batch of simulations. Nevertheless, there should be flexibility in this aspect of gene propagation, so this selection rule can be changed. A rough sketch of the selection loop is given below.
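The following Python sketch illustrates the proposed outer loop under stated assumptions: a genome is a flat list of floats, and a hypothetical `run_simulation(genome)` returns the number of time steps before extinction. All names and hyperparameters are illustrative, not prescriptive.

```python
# A rough sketch of the selection rule: longer-running simulations
# propagate their genes. run_simulation and the genome encoding are
# hypothetical placeholders.
import random

ELITE_FRACTION = 0.25   # share of longest-running genomes kept each batch
MUTATION_STD = 0.02     # std. dev. of Gaussian mutations on gene values

def evolve(genomes, run_simulation, generations=100):
    for _ in range(generations):
        # Score every genome by survival time and keep the longest runners.
        scored = sorted(genomes, key=run_simulation, reverse=True)
        elites = scored[:max(2, int(ELITE_FRACTION * len(scored)))]
        # Refill the batch by recombining and mutating elite genomes.
        children = []
        while len(elites) + len(children) < len(genomes):
            a, b = random.sample(elites, 2)
            child = [x if random.random() < 0.5 else y for x, y in zip(a, b)]
            children.append([w + random.gauss(0.0, MUTATION_STD) for w in child])
        genomes = elites + children
    return genomes
```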
The simulation environment itself can also be adjusted to be harsher for the agents if most simulations do not end in extinction within the running time. One option is to set the agents free in vanilla Minecraft, but this might cause problems, as the agents might be able to survive without forming the complex mental abilities mentioned above. Therefore, the environment should be made such that survival is almost impossible for the agents without mental capabilities helping them.
After all, humans would not have come so far without their mental capabilities [3, p. 23]. It is also thought that the Neanderthals, a physically stronger species from another branch of the Homo genetic tree, became extinct because their better physical traits meant natural selection did not push them as hard to develop the mental constructs that Homo sapiens developed. Humankind may even have used this advantage over its counterpart species and driven them to extinction [3, pp. 19–21]. Thus, the digital species should be made progressively more feeble and hungry, and given more mobs to defend itself against.
Furthermore, to limit the computational burden, an old and relatively simple version of the game could be used, perhaps with some aspects removed or added (to stimulate certain behaviour). One logical compromise would be to limit the world size to a relatively small number of game chunks.
In each simulation, a fixed number of agents should be spawned at the beginning: a number that limits the computational burden but still allows for natural and sexual selection together with social interaction. This article further suggests that agents have a fixed, limited lifespan and that the simulation starts over when the population dies out. As mentioned earlier, bands around the limbs on the skin will dynamically display a colour pattern encoding age so that agents can retrieve this information.
7 Brain
7.1 Modular Architecture
For the ‘brains’ of the agents, one possibility would be not to fix a single design at the beginning of the simulation, but to use a randomised construction made up of several artificial neural network components, conforming to some constraints such as having an input and an output vector. For example, components could be taken from CNNs, LSTM RNNs and Transformers. It would be nice if evolution could decide (via natural selection and the genetic algorithm) how these modular pieces are combined and ordered, but this is not an easy task to express in code. Randomly generating the brain of any particular agent is possible, but how would one combine the two parental ‘brains’ to create the neural network of a child? Furthermore, this method could take a very long time to approach a local minimum, as obviously weak combinations are tried as well, and there are a great many possible combinations. Thus, this method would require enormous computing power, and so this article suggests a humanly designed model. This might take a toll on the success of the simulations, as a modular implementation is one of the factors with the potential to produce surprising results.
7.2 Senses
This article suggests a novel architecture structured to mirror some characteristics of the human brain. First, it has to be established which characteristics of a human brain might be desirable to incorporate. To begin, consider the inputs the brain receives: usually, the five basic senses of sight, hearing, taste, smell and touch come quickly to mind.
In Minecraft, the first two definitely make sense to incorporate. The world that can be observed could be rendered less crisply than in the regular game to spare computational costs. Hearing could be fed to the agents in stereo and possibly compressed, for the same reasons as for sight.
In real life, taste and smell mainly give cues about the healthiness of what is (to be) eaten. In Minecraft, the probability that something eaten at random is unhealthy is slim. Except for a few items, the trade-off between accuracy and computational burden could therefore be struck at a point where one signal is given when eating or drinking something healthy and another when eating or drinking something unhealthy, rather than giving every item a unique ‘taste’. Perhaps there is no need to strike such a trade-off at all, as there is no obvious reason to think this aspect is crucial to the development of mental capabilities.
Regarding touch, it seems hard to implement in Minecraft something that closely mimics how we humans receive signals from touch. Nevertheless, a computation- and time-saving compromise could be to give the agents information on where the surfaces of blocks are in the area of the four x-y block coordinates closest to the agent. This aspect could also be excluded, for the same reasons given for taste and smell. Internal signals, like pain, hunger, temperature and the feeling of happiness, can for a large part also be incorporated.
Between two instances of receiving inputs, the drop in health can be fed forward as a pain signal to the brain. The same holds for hunger, though the hunger level itself also contains important information, as does the health level, so both levels should be used as inputs to the model. Temperature, by contrast, seems hard to implement, as there is no such thing in Minecraft, and it does not seem necessary for the development of mental capabilities.
Lastly, the vestibular and proprioceptive senses, relating to movement and body position respectively, can mostly be incorporated by using the outputs of the network as inputs; more on this follows in the discussion of model outputs. A sketch of how these senses could be packed into one input vector is given below.
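As a minimal illustration, the sketch below assembles the senses discussed above into a single flat vector each time step. The observation fields and their encodings are assumptions made for illustration, not part of any real Minecraft API.

```python
# Hypothetical sense encoding: vision and hearing as compressed arrays,
# taste as two healthy/unhealthy flags, touch as nearby block surfaces,
# pain as the health drop since the last step, plus raw internal levels.
import numpy as np

def build_input_vector(obs, prev_health, prev_output):
    vision = obs["vision"].ravel()                # downsampled pixel frame
    hearing = obs["audio_stereo"].ravel()         # compressed L/R channels
    taste = np.array([obs["ate_healthy"], obs["ate_unhealthy"]], dtype=float)
    touch = obs["nearby_surfaces"].ravel()        # 4 closest x-y block columns
    pain = np.array([max(0.0, prev_health - obs["health"])])  # health drop
    internal = np.array([obs["health"], obs["hunger"]])       # raw levels
    # Previous outputs stand in for the vestibular/proprioceptive senses.
    return np.concatenate([vision, hearing, taste, touch, pain,
                           internal, prev_output])
```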
7.3 Outputs
The output of the model should include not only all regular player controls but sound, too, perhaps facilitating the development of sonic communication. Furthermore, it should include the option to mate with a nearby agent that is being looked at. If the other agent also outputs this signal for some time while looking back, reproduction should take place, but not ‘for free’: as agents perish after a fixed amount of time, all other things being equal, the best simulations according to the rule at hand would simply be the ones where all agents spam the reproduction ‘button’. Therefore, reproducing should cost something like a portion of the hunger bar, with offspring starting out with insufficient hunger to reproduce. A sketch of such a mating rule follows.
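The sketch below is one possible reading of this rule, under stated assumptions: both agents must hold the mate signal while looking at each other for some ticks, and each parent pays part of its hunger bar. The `Agent` fields, thresholds and `spawn_child` helper are all hypothetical.

```python
# Hypothetical mating rule: mutual gaze plus a held mate signal, paid for
# with hunger points so spamming the reproduction 'button' cannot pay off.
MATE_HOLD_STEPS = 20    # ticks both signals must be held
HUNGER_COST = 6         # hunger points each parent spends

def try_reproduce(a, b, spawn_child):
    mutual = a.is_looking_at(b) and b.is_looking_at(a)
    signalling = a.outputs["mate"] > 0.5 and b.outputs["mate"] > 0.5
    if mutual and signalling:
        a.mate_timer += 1
        b.mate_timer += 1
    else:
        a.mate_timer = b.mate_timer = 0
    can_afford = a.hunger >= HUNGER_COST and b.hunger >= HUNGER_COST
    if a.mate_timer >= MATE_HOLD_STEPS and can_afford:
        a.hunger -= HUNGER_COST
        b.hunger -= HUNGER_COST
        # Offspring start too hungry to reproduce immediately.
        spawn_child(a, b, start_hunger=HUNGER_COST - 1)
        a.mate_timer = b.mate_timer = 0
```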
Finally, the level of happiness determined by the model should be both an input and the signal used for learning. Suppose there is a network that uses the outputs and another input vector to determine the reward or punishment for the time step, for example with a hyperbolic tangent activation function. Notice, though, that naively propagating this reward backwards would cause the network leading from the output nodes to the reward and punishment node to have its weights and biases approach infinity, without adjusting the other weights and biases in the brain in any way meaningful to survival. Therefore, it is suggested that the value of this reward-punishment node still be determined by feeding the input and output vector through a network, but that the weights and biases of this network cannot be nudged at all. Instead, only the weights and biases of the networks before the output nodes are nudged, by stochastic gradient descent.
The genetic algorithm over separate simulations, selecting by time before extinction, will serve to optimise this fixed network in a way that improves the chances of survival. As previously mentioned, the happiness (here, the reward or punishment at any time step) of the agent should also be taken as an input, because humans seem to have this ability too, feeling happy or sad to a certain extent. This is the gist of the loss function of intelligence. After many simulations and their selection, it is expected that the determination of happiness will come to be structured so as to maximise the chances of survival by fostering strong mental capabilities. A minimal sketch of such a frozen evaluation head is given below.
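The numpy sketch below shows the key property described above: the evaluation network's parameters come from the genome and are never touched by gradient descent. Shapes and names are illustrative assumptions.

```python
# Frozen evaluation head: tanh keeps the reward/punishment in (-1, 1).
# Only the genetic algorithm ever changes these parameters.
import numpy as np

class EvaluationHead:
    def __init__(self, genome_weights, genome_bias):
        # Frozen: set by the genetic algorithm, never nudged by SGD.
        self.w = np.asarray(genome_weights, dtype=float)
        self.b = float(genome_bias)

    def reward(self, inputs, outputs, interpretation):
        v = np.concatenate([inputs, outputs, interpretation])
        return float(np.tanh(v @ self.w + self.b))
```

The scalar this head returns would then be fed back as the happiness input of the next time step and used as the learning signal for the trainable networks upstream of the outputs.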
7.4 Memory
Now that the inputs and outputs have been discussed, one finalising part before discussing the links between them is memory. This article suggests that a fractal, tree-like memory, read from and written to by neural networks, encompasses all capabilities of a short-term memory system, a long-term one, and one in between. An exemplary fractal of m layers with splits of size n at every joint starts at the entry point and can arrive at any of the n^m stored memory vectors. At every step down the fractal, a null route can also be taken, which returns a null vector.
The reason for this fractal tree-like memory system is that it resembles a system of categories, in which looking up specific items is quick and structured. In a single-word dictionary in the Roman alphabet, for example, looking up words can also be displayed in a fractal tree-like way, with 26 letters at every layer and one route indicating that the word formed so far ends there. The number of layers of this tree corresponds to the longest word. Intuitively, dictionaries are constructed this way because a random ordering of the words would be immensely less time-efficient for finding the relevant word. Similarly, one huge neural network that serves to pick one stored vector out of many would take more computing power to use practically.
A similar construction might hold for the brain, too. The memory system suggested submits the input vector it is provided with to the neural network (unique in its weights and biases) at the joint currently arrived at, which then classifies the vector by outputting a number for each of the n+1 categories to proceed in. The way to go is determined by the highest output value among all options, including the null route. Finally, the system arrives at the designated memory vector, which is read. A sketch of this traversal follows.
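The sketch below is one way to realise this read traversal, assuming a heap-style n-ary tree where each joint owns a small linear routing network scoring the n child routes plus the null route. All data structures and shapes are illustrative assumptions.

```python
# Fractal memory read: argmax over n+1 sigmoid scores at each of m levels;
# the null route aborts the lookup and returns a zero vector.
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def read_memory(query, routers, leaves, n, m):
    """routers[node_id] -> (W, b) scoring n+1 routes; leaves holds n**m vectors."""
    node_id, path = 0, []
    for _ in range(m):
        W, b = routers[node_id]
        scores = sigmoid(W @ query + b)     # n real routes + 1 null route
        choice = int(np.argmax(scores))
        if choice == n:                     # null route: read nothing
            return np.zeros_like(leaves[0]), path
        path.append(choice)
        node_id = node_id * n + choice + 1  # descend to the chosen joint
    leaf = sum(c * n**i for i, c in enumerate(reversed(path)))
    return leaves[leaf], path
```

The writing pass discussed below would traverse a tree of the same shape, but with its own routing networks at every joint.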
Now, when the reward or punishment of the model is propagated backwards, the rewards or punishments relating to all elements of the memory vector are added up to serve as an aid in determining a substitute for a real gradient of all weights and biases that were used to pick the memory, because this memory construct is not differentiable. For all weights and biases not used, the gradient is simply zero. Given that, on average, every level higher in the memory fractal is passed roughly n times as often (if null routes are not often taken), the reward or punishment from the vector is divided by n for every layer higher in the tree before determining the ‘gradients’ of the relevant weights and biases. This scaling could look as follows.
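A small sketch of that per-level scaling, under the assumption that the leaf level receives the full summed reward and each level above it receives one further division by n:

```python
# Surrogate credit assignment for the non-differentiable routing choices:
# joints nearer the root are traversed roughly n times as often, so their
# pseudo-gradient scale is divided by n per level above the leaf.
def route_credit(memory_reward_sum, n, m):
    """Pseudo-gradient scale per level, from the root (0) to the leaf (m-1)."""
    return [memory_reward_sum / n ** (m - 1 - level) for level in range(m)]
```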
When the brain has processed all input signals and memory, the input signals and output vector can be written to the system by finding a path through a fractal tree of the same shape but with a different network at each node. These memories are the ones that the fractal memory reading network discussed earlier is able to read. Notice that the memory system can also choose not to write or read any vector at all.
Clearly, some endpoints in the fractal memory tree might be written to more often than others. For humans, this could be compared to remembering some things for a long time (endpoints that are rarely written to) and some for a shorter period (endpoints that are overwritten more often). As the writing network is adjusted not by backpropagation but only by the genetic algorithm, memories that are recalled often are not necessarily frequently changed.
7.5 Neural Architecture
A graphical representation of the brain network will now follow, but first, the icons used and their meaning are introduced:
With these established, the total agent network is given below:
At 1, the current input vector is concatenated with the output vector of the previous time step and a vector that will be named the interpretation vector, so that all external signals comparable to those a human brain receives are fed through the network, along with something that serves roughly as the cell state found in LSTM RNNs. After this concatenation, the vector is copied and sent two ways.
At 2, one of those copies arrives at the fractal memory reading system, where at each joint a separate simple neural network classifies the direction in which the vector is to be propagated. ReLU could be chosen as the activation function for its lower computational cost, but the sigmoid function is chosen here: as the highest output determines the direction in which the memory reading system continues, a sigmoid or hyperbolic tangent activation is more likely to yield a strictly ordered list of preferences, rather than forcing a random choice between elements that all receive an activation of 0 from ReLU. The position the memory system ultimately arrives at contains a stored vector of equal size to the concatenated vector at 1.
At 3, the memory vector is concatenated to the input signals vector; a network then converts this long vector into a single vector of equal size to the senses vector and output vector together. This vector is called the interpretation vector and is copied for use in the next time step.
The hope in using this interpretation vector is that the network will interpret the input signals and memory vector the way we humans do. When one is concentrating on a particular task, like watching a circus performer breathing fire, much of the neural information coming from our senses feels prominent in the determination of our outputs. When one closes one's eyes and tries to sleep, though, memories of the man breathing fire might arise. Although less spectacular than in real life, memories can be very vivid, sometimes allowing you to recall what somebody you met looks like and to visualise them in your mind. Occasions in between occur too, probably a lot more often. Suppose you are drawing a person breathing fire. You will probably think back to the man who spat fire for a few moments, then continue drawing. While you are recalling the man, your eyes still feed your brain all the signals relating to the environment around you, because you never closed your eyes. But were you really aware of your surroundings at the moment of recollection? Was it not as if a man spitting fire had blended into your periphery and taken up some of the neural space that relates to your brain's visual perception? Many more examples like this could be given relating to sound, smell, feeling and taste, but the point is that we do not react directly to the literal external signals our brains receive, but to their interpretation.
Thus, at step 3, before the determination of outputs, an interpretation of all external signals is constructed by means of the sigmoid activation function. The outputs of the model will also be determined with the sigmoid activation function.
At step 4, the input, output and interpretation vectors at the current time step are concatenated. This vector is then used to determine the total reward or punishment of the network, denoted here as E for evaluation. The weights and biases of this network cannot be nudged, for reasons mentioned before; this network is thus also optimised solely by the genetic algorithm over simulations.
At 5, before this vector is fed forward into the memory writing network, a neural network selects the properties that it deems helpful to save. The memory vector is then saved or not, depending on whether it is kicked out at any joint of the fractal tree and discarded, as previously explained. The reason this fractal memory tree sits outside of the green box is that it is not subject to optimisation by the reward or punishment of the network; all networks inside the box, though, are.
Why this network is not subject to such optimisation was highlighted previously, but one can elaborate by supposing for a moment that it were. If the agent puts out a reward signal that is backpropagated along the same route through both the fractal memory reading and writing networks, not only will that particular path be chosen more often when retrieving memories, it will also be chosen more often to save memories in. Now, it can be the case that the memory vector saved at that position is perfect for helping the agent make decisions in certain situations; retrieving the memory more often when it has benefited the output values seems fine. However, when new vectors become more likely to be saved in that particular position, that precious memory might get lost. A similar argument can be made for a punishment that is backpropagated through the memory network.
Thus, it is chosen that the fractal memory writing network is optimised by the genetic algorithm of the simulation. It should be noted, though, that the fractal memory writing network could also include a network that decides how to combine the submitted vector with the vector previously saved at the endpoint in question, so this option should be considered.
As a visual aid, the recurrent form of this network could be displayed like this:
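To complement the diagram, here is a compact code sketch of one recurrent time step, tying steps 1 through 5 together. Every network is a placeholder callable on a hypothetical `brain` object; `read_memory` is the function from the earlier memory sketch, and `write_memory` mirrors it with the writing tree's own routers.

```python
# One time step of the proposed brain: concatenate, read memory, interpret,
# output, evaluate, write memory. All attribute names are assumptions.
import numpy as np

def write_memory(vec, routers, leaves, n, m):
    # Same traversal as read_memory, but with the writing tree's routers;
    # the chosen leaf is overwritten, and a null route discards the vector.
    node_id, path = 0, []
    for _ in range(m):
        W, b = routers[node_id]
        scores = 1.0 / (1.0 + np.exp(-(W @ vec + b)))
        choice = int(np.argmax(scores))
        if choice == n:
            return                          # kicked out: nothing is saved
        path.append(choice)
        node_id = node_id * n + choice + 1
    leaves[sum(c * n**i for i, c in enumerate(reversed(path)))] = vec

def brain_step(senses, prev_output, prev_interp, brain, state):
    # 1. Concatenate senses, previous outputs and the interpretation vector
    #    (the LSTM-cell-state-like carry-over), then send copies two ways.
    x = np.concatenate([senses, prev_output, prev_interp])
    # 2. Route one copy through the fractal memory reading tree.
    memory, _ = read_memory(x, brain.read_routers, state.leaves, brain.n, brain.m)
    # 3. Interpret memory plus signals, then decide the outputs (both sigmoid).
    interp = brain.interpret_net(np.concatenate([x, memory]))
    output = brain.output_net(interp)
    # 4. The frozen evaluation head scores the step (reward or punishment).
    reward = brain.eval_head.reward(senses, output, interp)
    # 5. Select what seems worth saving and route it through the writing tree.
    candidate = brain.select_net(np.concatenate([x, output]))
    write_memory(candidate, brain.write_routers, state.leaves, brain.n, brain.m)
    return output, interp, reward
```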
To illustrate why it is argued that this type of network could develop intelligent behaviour, consider a couple of examples of how this model could in theory perform the following tasks.
- Regular verbal communication: suppose an agent in the simulation receives input signals such that it feels the urge to say something. How will this urge result in the corresponding sequence of outputs that makes up an understandable sonic message to other agents? To answer this question, it is best to start at what serves as the urge to communicate the message. As an agent processes the input signals, its brain might be structured in such a way that it responds by creating a mental map representing the actions to be performed. As a simple abstraction, the agent might visualise a black grid with one white pixel, to which the network responds by outputting a sonic frequency. This could be contained as information in the interpretation vector used in the architecture. The next time step, as the previous output vector and interpretation vector are used to determine the new interpretation vector, the black grid with one white pixel might be converted to a completely black grid, to which the agent responds by outputting another sonic frequency. Even though only two tones are used to communicate in this example, it is easy to imagine what more advanced interpretation vectors could make the agent do. This notion of mental maps embedded in the interpretation vector has many uses; next, performing arithmetic is discussed.
- Arithmetic: suppose an agent feels the need to calculate how much food it would have if it had twice as much. A mental map corresponding to a number of vertical sticks may be embedded in the constructed interpretation vector. The next time step (with unexciting new input and output vectors fed forward), the network processing the interpretation vector might recognise this pattern as one to be duplicated, so that twice as many sticks are now visualised in the interpretation vector. This vector can then be used in the next time step or written to the fractal memory network. One might think the agent cannot possibly know whether it completed the action, but this information might also be included in the mental map: for example, there could be a red pixel that turns green upon the transformation of the mental map of sticks into its doubled representation.
Still, this architecture should not be seen as the only possible one. For example, trials could also be performed in which the memory vector used at the particular time step is concatenated to the vector submitted to the memory writing network, as it might contain useful information, or experiments could be run to see how attention and other implementations compare to the one proposed in this article.
8 Concluding Remarks
In the end, experiments will need to be run to show whether some of the results this article suggests might be observed can actually be achieved. Before the full experiment proposed in this article is carried out, it will perhaps be best to experiment with the fractal memory system, or even the full brain architecture and genetic algorithm, in environments requiring less computing power. As the ideas laid out in this article are not set in stone, these experiments might improve on them so that better results are achieved in the end.
[1] Nick Bostrom. Superintelligence. Oxford University Press, pp. 19–20, 2014.
[2] Shane Legg and Marcus Hutter. A Collection of Definitions of Intelligence. pp. 17–24, 2007.
[3] Yuval Noah Harari. Sapiens: A Brief History of Humankind. Vintage Books, London, 2014.
[4] Josh Miller-Watt. Minecraft beginner's guide. GamesRadar, 2012.