Neuromorphic Computing

3 days ago
11 min read

Artificial intelligence is arguably the most profound technological revolution to date, allowing us to leapfrog from deterministic computing to stochastic intelligence. While artificial intelligence can increasingly match, and at times exceed, human-level capabilities, its architecture is very different from that of biological intelligence, which results in massive differences in energy efficiency, computational efficiency and dynamic learning capability. The foundational element of an artificial intelligence model is a large set of numerical parameters, or weights, which can be viewed as a massive mathematical filter: each unit of input signal passes through a large set of calculations to produce one unit of intelligent output signal. The capability of a model is therefore largely decided by the scope and scale of the weights contained within this filter. A highly capable model can have over a trillion weights, so each unit of input may have to undergo on the order of a trillion calculations just to produce a single unit of output. This creates massive computation requirements and mandates specialized hardware such as advanced graphics processing units (GPUs), which can perform a very large number of calculations in a short amount of time. GPUs have many identical processing cores, which lets them carry out in parallel the vast number of independent calculations that artificial intelligence models need.
While advanced GPUs enable production-scale deployment of artificial intelligence models across the world, there are a few catches. GPUs consume a massive amount of energy, partly because of the sheer computation needed to produce a single unit of output, and largely because of an architectural bottleneck: all the weights must be repeatedly streamed from a separate memory space into the computational cores for each cycle of computation. Modern computing as we know it is built around the von Neumann architecture, named after the famous mathematician and computer scientist John von Neumann. In this architecture, the active memory needed for performing computations is kept physically separate from the place where the computation actually happens. In each computation cycle, a small set of information waiting in a large queue in memory is loaded into the computation space, the result is written back to memory, and the next small set of information in the queue is loaded for the next cycle. The von Neumann architecture therefore demands energy over and above the computation itself, just to move information back and forth between the computation space and the active memory space. This is especially costly for artificial intelligence models, which constantly require high-bandwidth streaming of weights and contextual memory from the active memory space (VRAM) to the computational cores within the GPU.
The output of an artificial intelligence model at the inference stage is largely constrained by the storage capacity and data transfer bandwidth of the active memory space rather than by the speed and capacity of the computation space. A trillion-parameter model whose weights alone require 2 TB of active memory (2 bytes per weight) has to be split across a cluster of GPUs, because a single high-end GPU has only about 180 GB of usable VRAM even though it can perform around 4 quadrillion calculations per second. The cluster must also reserve active memory for contextual memory alongside the weights so that the model can produce context-specific intelligence. As a result, a single copy of a trillion-parameter model ends up requiring 20+ high-end GPUs. Each GPU draws over 1 kW, so the total comes to 30+ kW once net energy efficiency and cooling-related overhead are factored in. That single copy can serve multiple users at once to increase utilization, depending on the complexity of their tasks and the contextual memory each user needs. This highlights the massive hardware and energy requirement just to host a single highly capable artificial intelligence brain.
Shifting to biological intelligence, we enter a very different paradigm. The human brain has roughly 86 billion neurons with, as a very rough average for the sake of calculation, 1000+ synaptic connections per neuron. This loosely translates into a 100-trillion-parameter intelligence model that runs on only about 20 W of power. The difference in both parameter count and energy consumption between artificial and biological intelligence is drastic, which makes it a very interesting area of study.
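The memory and power arithmetic for the trillion-parameter model can be reproduced with a short back-of-the-envelope sketch. The figures for bytes per weight, usable VRAM and per-GPU power come from the text; the function names and the 1.5x overhead factor (chosen so that 20 GPUs at 1 kW land on the quoted 30 kW) are assumptions made for illustration.

```python
import math

# Figures taken from the text above.
PARAMS = 1_000_000_000_000   # one trillion weights
BYTES_PER_WEIGHT = 2         # 16-bit weights
VRAM_PER_GPU_GB = 180        # usable VRAM on one high-end GPU
GPU_POWER_KW = 1.0           # power draw per GPU

# Assumption for illustration: cooling and efficiency losses add 50%.
OVERHEAD_FACTOR = 1.5


def weight_memory_gb(params: int, bytes_per_weight: int) -> float:
    """Memory needed to hold the weights alone, in GB."""
    return params * bytes_per_weight / 1e9


def gpus_for_weights(params: int, bytes_per_weight: int, vram_gb: float) -> int:
    """Minimum GPU count needed just to store the weights (no context memory)."""
    return math.ceil(weight_memory_gb(params, bytes_per_weight) / vram_gb)


def cluster_power_kw(num_gpus: int, gpu_power_kw: float, overhead: float) -> float:
    """Total cluster power including the assumed overhead factor."""
    return num_gpus * gpu_power_kw * overhead


weights_gb = weight_memory_gb(PARAMS, BYTES_PER_WEIGHT)
min_gpus = gpus_for_weights(PARAMS, BYTES_PER_WEIGHT, VRAM_PER_GPU_GB)
power_kw = cluster_power_kw(20, GPU_POWER_KW, OVERHEAD_FACTOR)

print(f"weights: {weights_gb:.0f} GB, GPUs for weights alone: {min_gpus}")
print(f"20-GPU cluster with overhead: {power_kw:.0f} kW")
```

The weights alone need 12 GPUs under these assumptions; the gap between that and the 20+ GPUs quoted in the text is the room left for contextual memory.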
Current artificial intelligence models peak at only a few trillion parameters. If we were to host a single copy of a hypothetical 100-trillion-parameter model with context-specific intelligence capabilities, we would need a cluster of 2300+ high-end GPUs with a total power requirement of around 5 MW. Equating parameter counts is not a completely appropriate way to compare artificial and biological intelligence like for like, but it still gives some sense of the difference in energy requirements. By this rough extrapolation, a single copy of a 100-trillion-parameter artificial intelligence needs about 250,000 times more power than a single human brain. This gap is an unfortunate downside of artificial intelligence, which provides high utility at the cost of being extremely power hungry. Usable energy is a scarce resource that many critical utilities outside artificial intelligence also depend on, which makes energy a massive bottleneck in scaling artificial intelligence. As a result, present supply is disproportionately low compared to demand, preventing us from realizing the full potential of artificial intelligence. Engineers are trying to solve this bottleneck from two different vantage points: energy supply and compute efficiency.
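The brain-versus-cluster comparison above is simple enough to check directly. The sketch below uses only the figures quoted in the text (86 billion neurons, roughly 1000 synapses per neuron, 20 W, 5 MW, 180 GB of VRAM, 2 bytes per weight); the variable names are illustrative.

```python
import math

# Figures quoted in the text.
BRAIN_NEURONS = 86_000_000_000
SYNAPSES_PER_NEURON = 1_000          # very rough average
BRAIN_POWER_W = 20
CLUSTER_POWER_W = 5_000_000          # about 5 MW for the hypothetical cluster
HYPOTHETICAL_PARAMS = 100_000_000_000_000
BYTES_PER_WEIGHT = 2
VRAM_PER_GPU_GB = 180

# Rough "parameter count" of the brain: one weight per synapse.
brain_synapses = BRAIN_NEURONS * SYNAPSES_PER_NEURON

# How many times more power the cluster draws than one brain.
power_ratio = CLUSTER_POWER_W / BRAIN_POWER_W

# GPUs needed just to hold the weights of the hypothetical model.
weights_gb = HYPOTHETICAL_PARAMS * BYTES_PER_WEIGHT / 1e9
gpus_for_weights = math.ceil(weights_gb / VRAM_PER_GPU_GB)

print(f"brain synapses: {brain_synapses:.2e}")
print(f"power ratio: {power_ratio:,.0f}x")
print(f"GPUs for weights alone: {gpus_for_weights}")
```

The synapse estimate comes out at 86 trillion, which the text rounds loosely to 100 trillion, and the weights alone fill about half of the 2300+ GPU cluster, with the rest left for contextual memory.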
Tackling energy supply means building systems in which usable energy can be easily harnessed for artificial intelligence computation. This is why orbital AI data centers are being explored: direct solar energy is extremely abundant in space and can be harnessed at very little cost, which alleviates the energy bottleneck. The idea brings new engineering and economic challenges, however. Heat conducts poorly in space, which makes the data centers harder to cool; launch costs are high; and hardware cannot be serviced if it physically fails in orbit. Tackling compute efficiency is the other crucial vantage point: harnessed energy is used as efficiently as possible so that more useful intelligence is delivered per unit of energy consumed. Engineers have devised many ways to increase compute efficiency: quantizing model weights, which reduces the active memory and bandwidth needed to store weights and transfer them to the compute space; intelligent routing of user tasks, where simple tasks are routed to small-parameter models and complex tasks to large-parameter models; mixture of experts (MoE) architectures, which route work within a large model by breaking it into a set of smaller experts and sending each task to the relevant expert; and hardware improvements such as increased VRAM capacity and data transfer bandwidth within GPUs. All of these steps, however, remain within the von Neumann computing paradigm, whose inherent architectural constraints put a ceiling on the compute efficiency that can be delivered.
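To make the quantization step concrete, here is a minimal sketch of symmetric 8-bit quantization, one common scheme among several and not necessarily the one any particular model uses. The helper names `quantize_int8` and `dequantize` are hypothetical. Each weight shrinks from 2 bytes to 1 byte, at the cost of a small rounding error bounded by half the scale.

```python
def quantize_int8(weights: list[float]) -> tuple[list[int], float]:
    """Map floats to integers in [-127, 127] using one shared scale."""
    max_abs = max(abs(w) for w in weights)
    scale = (max_abs / 127) or 1.0      # fall back to 1.0 if all weights are zero
    return [round(w / scale) for w in weights], scale


def dequantize(q_weights: list[int], scale: float) -> list[float]:
    """Recover approximate float weights from the integer codes."""
    return [q * scale for q in q_weights]


weights = [0.82, -1.27, 0.05, 0.4, -0.63]
q_weights, scale = quantize_int8(weights)
restored = dequantize(q_weights, scale)

# Worst-case rounding error introduced by quantization.
max_error = max(abs(w - r) for w, r in zip(weights, restored))

# Memory for a trillion weights at 2 bytes versus 1 byte each, in TB.
fp16_tb = 1_000_000_000_000 * 2 / 1e12
int8_tb = 1_000_000_000_000 * 1 / 1e12

print(q_weights, f"scale={scale:.4f}", f"max error={max_error:.4f}")
print(f"fp16: {fp16_tb} TB, int8: {int8_tb} TB")
```

Halving the bytes per weight halves both the storage and the bandwidth needed to stream the weights into the compute cores, which is where the energy saving comes from.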
To bypass the constraints of the von Neumann architecture, scientists and engineers are working on an entirely new paradigm called neuromorphic computing, which is modelled after the human brain. The architecture of the human brain is a fascinating area of study and a massive source of engineering inspiration for unlocking powerful artificial intelligence capabilities at peak energy efficiency. Neurons are the fundamental cellular units of any biological brain, and intelligence is largely a function of neuron density and neuron connectivity. A single neuron can connect to many other neurons, and a synapse is the junction where two neurons connect. Biological intelligence is a high-level emergent phenomenon derived from continuous low-level neuron firing and inter-neuron communication. Neurons communicate by passing weak electrical signals, and how well a signal conducts from one neuron to the next depends on the synaptic weight of the synapse between them. Each synapse is characterized by its synaptic weight, which can be conceptually understood as its electrical conductance. Synaptic weights can therefore be viewed as the most fundamental unit of biological intelligence, much as numerical weights are the fundamental unit of artificial intelligence. The electrical signals themselves arise from the flow of positively charged sodium and potassium ions rather than the flow of electrons seen in metallic conductors. Every neuron has sodium-potassium pumps that consume ATP (the biological unit of energy) to expel 3 sodium ions (Na+) from the cell and take in 2 potassium ions (K+). This creates a negative potential difference (-70 mV) inside the neuron, along with a gate lock that prevents sodium ions from flowing back across the cell membrane.
If an adjacent neuron starts firing, the voltage lock is lifted to a degree that depends on the synaptic weights of the inter-neuron connections, allowing sodium ions to enter the receiving neurons and reducing the potential difference. When the potential difference crosses a certain voltage threshold (-55 mV), these interconnected neurons also start firing and transmit the electrical signal on to each of their own interconnected neurons, and the process repeats for the next set of neurons. The synaptic weight of each connection, which is an indicator of conductance, decides which connections can cross the threshold required for firing, as well as the frequency of firing once the threshold is crossed. Synapses with high conductance cross the potential threshold more easily, resulting in seamless, higher-frequency inter-neuron firing. High-frequency firing in turn enables a real-time learning effect that strengthens the respective synapses and thereby increases their synaptic weights, or conductance. This is a very beautiful aspect of biological intelligence: the fundamental unit of intelligence, the weight, can be updated in real time, whereas in artificial intelligence the weights are frozen at inference and can only be updated through a separate, long and energy-intensive training process. The human brain has roughly 100 trillion synaptic weights which can be continuously updated in real time. The capabilities and intelligence levels of different humans can be analyzed through the nature of their respective inter-neuron connections and the corresponding synaptic weights.
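The firing mechanism described above can be sketched as a toy leaky integrate-and-fire neuron. This is a heavily simplified illustration, not a biophysical model: the -70 mV resting potential and -55 mV threshold come from the text, while the leak rate, the per-spike depolarization in millivolts and the additive strengthening rule are assumptions chosen to keep the example small.

```python
REST_MV = -70.0        # resting potential from the text
THRESHOLD_MV = -55.0   # firing threshold from the text


def simulate(weight_mv, input_spikes, leak=0.1, learn_mv=0.0):
    """Drive one neuron with a train of incoming spikes.

    weight_mv: depolarization per incoming spike (stands in for synaptic weight).
    leak:      fraction of the gap to rest that decays each step (assumed).
    learn_mv:  how much the weight grows each time the neuron fires (assumed).
    Returns the list of output spikes and the final weight.
    """
    v = REST_MV
    fired = []
    for spike in input_spikes:
        v += leak * (REST_MV - v)      # membrane leaks back toward rest
        if spike:
            v += weight_mv             # incoming spike depolarizes the neuron
        if v >= THRESHOLD_MV:
            fired.append(1)
            v = REST_MV                # reset after firing
            weight_mv += learn_mv      # firing strengthens the synapse
        else:
            fired.append(0)
    return fired, weight_mv


inputs = [1] * 10
weak, _ = simulate(1.0, inputs)                  # never reaches threshold
medium, _ = simulate(4.0, inputs)                # fires occasionally
strong, _ = simulate(16.0, inputs)               # fires on every input spike
learned, final_w = simulate(4.0, inputs, learn_mv=2.0)

print(sum(weak), sum(medium), sum(strong))       # 0 2 10
print(sum(learned), final_w)                     # 3 10.0
```

A stronger weight crosses the threshold more easily and fires more often, and with the learning term switched on the same input train produces more output spikes over time because each firing raises the weight.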
A highly focused human brain possesses few but highly efficient and powerful neuron connections, with large synaptic weights, high inter-neuron connection density and robust long-range, low-leakage signal transfer within the prefrontal cortex. Together these enable deep processing of signal, i.e. necessary information. Unnecessary neuron connections, which correspond to processing noise, i.e. unnecessary information, are pruned by reducing their synaptic weights, which prevents unsynchronized neuron firing across the brain from needlessly draining energy. A highly focused brain operates through extremely sparse, synchronous firing of a few powerful neuron connections, while the disparate unnecessary firing typically observed in an overstimulated or distracted brain is silenced. This makes complex information processing far less energy intensive and ensures that the limited energy budget allocated to the brain is spent on processing signal and not noise. Dopamine plays an important role here: a high baseline level of dopamine and dopamine receptors catalyzes synaptic fine-tuning of necessary neuron connections and synaptic pruning of unnecessary ones, and it also signals the deep synchronous firing of the necessary sparse connections while inhibiting unnecessary firing. High connection density and long-range, low-leakage signal transfer within the necessary sparse connections not only provide high computational efficiency but also enable deep lateral processing capabilities.
Sparse neuron firing is a key differentiating property of the human brain and a core driver of the energy efficiency of biological intelligence. Only an extremely small subset of the brain's synaptic weights is used at any point in time, depending on the nature of the input signal being processed. In a conventional dense artificial intelligence model, by contrast, all weights are used to process every input signal irrespective of its nature. The mixture of experts (MoE) architecture used in large-parameter models to improve compute efficiency is loosely inspired by the brain's sparse firing mechanism. Another key differentiating property of biological intelligence is the co-location of memory and computation. The current paradigm of artificial intelligence uses the von Neumann architecture, which physically separates the memory space from the computation space and loses energy on the back-and-forth transfer of electrical signals between them, whereas in biological intelligence the computation happens within the memory space itself. Information in biological intelligence is encoded in the properties of neuron firing: a particular pattern, such as high-frequency firing of a single neuron or collective firing of multiple neurons at a certain frequency, corresponds to a particular information signal. In artificial intelligence, by contrast, an information signal, whether it be text, audio or images, is broken into a set of tokens, and each token is projected into a high-dimensional vector space where its vector carries a unique mathematical meaning that differentiates it from other tokens.
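The sparse activation idea behind mixture of experts can be illustrated with a toy top-k router. This is a minimal sketch under simplifying assumptions: the experts are plain scalar functions, the router scores are supplied by hand instead of being learned, and the names `route_top_k` and `moe_forward` are hypothetical.

```python
import math


def softmax(scores):
    """Normalize scores into positive weights that sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def route_top_k(router_scores, k):
    """Pick the k highest-scoring experts and their mixing weights."""
    ranked = sorted(range(len(router_scores)),
                    key=lambda i: router_scores[i], reverse=True)
    chosen = ranked[:k]
    gates = softmax([router_scores[i] for i in chosen])
    return list(zip(chosen, gates))


def moe_forward(x, experts, router_scores, k=2):
    """Run only the selected experts and blend their outputs."""
    return sum(gate * experts[i](x) for i, gate in route_top_k(router_scores, k))


# Four toy experts; in a real model each would be a large sub-network.
experts = [lambda x: 1 * x, lambda x: 2 * x, lambda x: 3 * x, lambda x: 4 * x]
router_scores = [0.1, 2.0, -1.0, 2.0]   # hand-picked for illustration

selected = route_top_k(router_scores, k=2)
output = moe_forward(10.0, experts, router_scores, k=2)
active_fraction = 2 / len(experts)

print(selected)          # experts 1 and 3, each with gate 0.5
print(output)            # 30.0
print(active_fraction)   # 0.5 of the experts ran for this input
```

Only the selected experts do any work for a given input, so the compute per input scales with k and not with the total number of experts, which loosely mirrors how only a small subset of synapses is active in the brain at any moment.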
In artificial intelligence, intelligence arises from performing a large set of fixed mathematical calculations on an information token, which is essentially a long mathematical vector, using all the frozen weights developed during a separate training process. The result is a different mathematical vector, which translates into a different information token. In biological intelligence, the computation is more complex and not as straightforward. Information signals, encoded as varied neuron firing patterns, pass through the different synapses of a neuron, and depending on the synaptic weight of each inter-neuron connection the signal propagates into the respective connected neurons. The synaptic weights alter the firing pattern as it transfers from one neuron to the next, thereby performing a computation of sorts that changes the information signal along the way. This pattern of signal propagation and information alteration repeats across multiple synapses, where each synapse holds a synaptic weight, i.e. memory, and performs a computation within the memory space itself, ultimately producing a useful output signal. Neuromorphic computing is a new paradigm of intelligent computing that mimics this architecture: engineers use an electrical component known as a memristor to emulate the synaptic weights of neurons. Memristors, like neurons, require a threshold activation potential to enable current flow, and their conductance can be modified depending on the magnitude and direction of the electric charge that flows through them, much like a synaptic weight. This allows for dynamic artificial intelligence where weights are not frozen but can change in real time depending on the information signals flowing through them.
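One way to picture computation happening inside the memory itself is an idealized memristor crossbar, sketched below. The crossbar layout is an assumption brought in for illustration, since the text only speaks of a mesh of memristors, and the sketch ignores thresholds, noise and wire resistance. Each memristor stores a weight as a conductance; applying input voltages to the rows yields column currents equal to the weighted sums, by Ohm's law per device and Kirchhoff's current law per column.

```python
def crossbar_currents(voltages, conductances):
    """Column currents of an ideal crossbar.

    voltages:     input voltage on each row, in volts.
    conductances: conductances[i][j] is the memristor joining row i to
                  column j, in siemens (this is the stored weight).
    Returns I[j] = sum over i of voltages[i] * conductances[i][j], in amperes.
    """
    num_cols = len(conductances[0])
    return [
        sum(v * row[j] for v, row in zip(voltages, conductances))
        for j in range(num_cols)
    ]


# Two input rows, two output columns; values picked for easy arithmetic.
voltages = [1.0, 0.5]
conductances = [
    [2.0, 0.0],
    [4.0, 2.0],
]

currents = crossbar_currents(voltages, conductances)
print(currents)   # [4.0, 1.0]
```

The multiply-and-add happens where the weights are stored, so no weight has to be streamed from a separate memory into a compute core, which is the contrast with the von Neumann architecture drawn in the text.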
Information is encoded in the voltage frequency of an analog electrical signal: as signals flow across a mesh of memristors, they produce an information signal of a certain voltage frequency, which is the intelligent output of the neuromorphic computing mesh. Neuromorphic computing, however, faces engineering challenges that make it hard to bring into mainstream intelligence computing. Conventional computing and artificial intelligence operate purely on digital electrical signals, whereas neuromorphic computing requires precise analog signals to encode information, and present-day digital-to-analog and analog-to-digital conversion leaks power, significantly reducing the energy efficiency gains. Charge leakage within memristors caused by external environmental conditions also makes precise conductance values hard to maintain. This reduces the precision of the overall output of the neuromorphic computing mesh, which is not acceptable for language-based intelligence, where precise mathematical outputs are a fundamental necessity. Neuromorphic computing is nevertheless already useful in other domains, such as sensory intelligence, where the precision levels of language-based intelligence are not required. Addressing the current engineering bottlenecks could be a massive step toward wide-scale, energy-efficient artificial intelligence.