Researchers have developed a new artificial intelligence (AI) chip that mimics the human brain for more energy-efficient speech recognition.
The analog AI chip uses phase-change memory devices to perform computations in parallel, dramatically reducing power consumption.
In tests, the chip achieved up to 12.4 trillion operations per second per watt (TOPS/W), roughly 14 times more energy-efficient than comparable conventional AI accelerators.
The brain-inspired chip could enable speech recognition with very low latency on small, portable devices.
Key facts:
- The analog AI chip has 35 million phase-change memory devices that perform parallel computations.
- It achieved software-equivalent accuracy on a small keyword-spotting task and near-software accuracy on a large, 45-million-parameter speech recognition model.
- The chip delivered up to 12.4 tera-operations per second per watt (TOPS/W), 14x more efficient than the best systems on the MLPerf benchmark.
- It processed speech 2,500 times faster than real-time, enabling low-latency recognition.
- The technology mimics neural connections in the brain for massively parallel computing.
Source: Nature 620, 768-775 (2023)
Brain-Inspired Computing
The human brain is extremely power-efficient at complex computations such as visual and speech processing, so researchers have long tried to mimic its computing principles in AI chips.
Brains have a massively parallel structure, with neurons and synapses performing many simple computations simultaneously.
New “neuromorphic” chips aim to replicate this parallel architecture using advanced hardware materials.
One approach is using phase-change memory, which can mimic the analog conductance states of biological synapses.
When arranged in crossbar arrays, phase-change devices can perform parallel matrix operations just like neural networks.
This “in-memory computing” avoids moving data back and forth from separate processors, saving huge amounts of energy.
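The physics behind in-memory computing can be sketched with a simple model: each weight is stored as a device conductance, input activations are applied as row voltages, and the currents that sum along each column implement the dot products via Ohm's and Kirchhoff's laws. A minimal NumPy illustration (the array size and values are hypothetical, not from the paper):

```python
import numpy as np

# Hypothetical 4x3 crossbar: each entry is a device conductance (siemens)
# encoding one neural-network weight.
G = np.array([[1.0, 0.5, 0.2],
              [0.3, 0.9, 0.4],
              [0.7, 0.1, 0.6],
              [0.2, 0.8, 0.3]]) * 1e-6  # microsiemens scale

# Input activations applied as voltages on the rows (volts).
v = np.array([0.1, 0.2, 0.0, 0.3])

# Kirchhoff's current law: the current on each column line is the sum of
# v_i * G_ij over the rows -- a full matrix-vector product in one
# parallel analog step, with no data movement to a separate processor.
i_out = v @ G

print(i_out)  # column currents, proportional to the layer's pre-activations
```

In the real hardware the multiply-accumulate happens in the analog domain all at once; the NumPy `@` here just plays the role of the physics.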
The new analog AI chip developed by researchers at IBM and partners contains 35 million phase-change memory devices.
The devices are integrated into 34 tiles, each with a 512×2048 crossbar array for parallel computing.
The tiles are interconnected by a 2D mesh network for efficient data transfer, mimicking connections between neurons.
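Mapping a large weight matrix onto fixed-size 512×2048 tiles amounts to partitioning it into blocks. A rough sketch of that bookkeeping (the tile shape comes from the article; the example layer size and the zero-padding scheme are illustrative assumptions):

```python
import math

TILE_ROWS, TILE_COLS = 512, 2048  # crossbar dimensions per tile

def tiles_needed(n_in, n_out):
    """Number of 512x2048 tiles to hold an n_in x n_out weight matrix,
    assuming simple block partitioning with zero-padding at the edges."""
    return math.ceil(n_in / TILE_ROWS) * math.ceil(n_out / TILE_COLS)

# Hypothetical layer: 1024 inputs, 3000 outputs.
print(tiles_needed(1024, 3000))  # 2 row-blocks x 2 col-blocks = 4 tiles
```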
Massively Parallel Speech Recognition
The researchers tested the brain-inspired chip on speech recognition tasks.
Speech recognition involves converting audio signals of human speech into text transcriptions.
State-of-the-art speech recognition systems use deep neural networks with billions of parameters to achieve high accuracy.
But these huge neural networks are computationally demanding, requiring power-hungry GPUs or CPUs.
The analog AI chip aims to achieve the same accuracy levels much more efficiently using parallel phase-change memory computing.
The researchers first tested a small keyword-spotting network that detects 12 keywords such as “yes” and “no.”
The model had 129,000 parameters mapped to phase-change devices.
The chip achieved equivalent accuracy to software implementations and processed speech 700x faster than real-time.
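Mapping a trained model's parameters to devices raises one wrinkle: conductances are non-negative, while weights are signed. One common scheme in analog AI hardware (a simplification, not necessarily the exact encoding used on this chip) stores each weight as the difference of a positive and a negative conductance:

```python
import numpy as np

def encode_differential(W, g_max=25e-6):
    """Encode signed weights as (G_plus, G_minus) conductance pairs.
    g_max is an assumed maximum device conductance; weights are scaled
    so the largest magnitude maps to g_max."""
    scale = g_max / np.abs(W).max()
    G_plus = np.clip(W, 0, None) * scale      # positive parts
    G_minus = np.clip(-W, 0, None) * scale    # negative parts
    return G_plus, G_minus, scale

def decode(G_plus, G_minus, scale):
    """Recover the signed weights from the conductance pair."""
    return (G_plus - G_minus) / scale

W = np.array([[0.5, -1.0], [2.0, -0.25]])
Gp, Gm, s = encode_differential(W)
print(np.allclose(decode(Gp, Gm, s), W))  # True
```

Under this kind of scheme, a 129,000-parameter model consumes at least two devices per weight, which is part of why device counts exceed parameter counts.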
Next the researchers implemented a more complex 45 million parameter recurrent neural network transducer (RNN-T) for speech-to-text transcription.
They mapped the model's weights onto 140 million phase-change devices across five separate chips.
The large RNN-T model achieved 98.1% of the accuracy of the original software-only network.
And the multi-chip system processed speech 2,500x faster than real-time, enabling low-latency speech recognition.
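A “times faster than real-time” figure translates directly into latency for a given clip. A quick back-of-the-envelope check of what the reported factors imply (the audio durations are arbitrary examples):

```python
def processing_time(audio_seconds, realtime_factor):
    """Seconds of compute needed for a clip, given how many times
    faster than real-time the system runs."""
    return audio_seconds / realtime_factor

# Keyword spotting at 700x real-time: a 1-second utterance.
print(processing_time(1.0, 700))     # ~0.0014 s (about 1.4 ms)

# RNN-T transcription at 2,500x real-time: an hour of speech.
print(processing_time(3600, 2500))   # 1.44 s
```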
Ultra-Efficient Parallel Computing
The analog AI chip delivered unprecedented energy efficiency for neural network inference.
The tile arrays achieved up to 20 trillion operations per second per watt for matrix computations.
Accounting for data transfers between tiles and conversions between analog and digital, the overall chip efficiency reached 12.4 TOPS/W.
That’s 14x better efficiency than the most advanced AI accelerators on the MLPerf benchmark.
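The gap between the 20 TOPS/W tile-level figure and the 12.4 TOPS/W chip-level figure reflects overhead energy for inter-tile transfers and analog-digital conversion. Simple energy bookkeeping shows the relationship (the ~38% overhead share is inferred by working backward from the two reported numbers, not taken from the paper's measurements):

```python
def overall_tops_per_watt(tile_tops_per_watt, overhead_fraction):
    """Effective efficiency when overhead_fraction of total chip energy
    goes to data movement and conversion rather than matrix operations."""
    # Energy per operation grows by 1 / (1 - overhead_fraction),
    # so operations per joule shrink by (1 - overhead_fraction).
    return tile_tops_per_watt * (1 - overhead_fraction)

# 12.4 / 20 = 0.62, implying roughly 38% of chip energy is overhead.
print(overall_tops_per_watt(20.0, 0.38))  # 12.4
```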
The efficiency gains come from computing in place with phase-change memory.
There’s no need to shuttle data back and forth from separate processors and memory.
The massively parallel architecture minimizes wasted computation.
And because phase-change devices both encode and store the weights, there are no costly weight reloads from external memory: the tiles need only a one-time weight programming before execution.
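Because the weights persist in the devices, programming energy is paid once and amortized over every subsequent inference. A toy amortization model makes the shape of the trade-off concrete (all energy numbers here are made-up placeholders, not measurements):

```python
def energy_per_inference(program_energy_j, inference_energy_j, n_inferences):
    """Average energy per inference once one-time programming is amortized."""
    return inference_energy_j + program_energy_j / n_inferences

# Hypothetical numbers: 1 J to program the tiles, 1 mJ per inference.
print(energy_per_inference(1.0, 1e-3, 10))         # 0.101 J -- programming dominates
print(energy_per_inference(1.0, 1e-3, 1_000_000))  # ~0.001 J -- fully amortized
```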
Together these advantages add up to huge efficiency gains over conventional systems.
Next Steps Towards Commercialization
The experimental results demonstrate that brain-inspired computing principles can deliver major efficiency gains in a working AI chip.
But there are still challenges to address for commercial products.
One is supporting larger neural networks. The current five-chip system encodes 45 million parameters, but the most advanced AI models today have billions.
Larger capacity will require further integration, potentially stacking phase-change layers or using multi-chip packages.
Another need is on-chip digital processing, which currently happens off-chip.
Adding digital cores next to the analog tiles could minimize data transfers and further boost efficiency.
Mixed-signal chips with analog and digital domains on the same silicon die are already standard in commercial products.
Finally, the team must ensure sufficient model accuracy, speed, and energy gains are maintained as the technology scales up.
There are always challenges moving from lab prototypes to high-volume manufacturing.
But the brain-inspired computing principles demonstrated have strong commercialization potential.
The efficiency breakthroughs could enable speech interfaces with extremely low latency for applications like voice assistants and real-time translation.
And tiny, low-power devices like hearing aids and wearables could benefit from on-device speech recognition.
By mimicking the energy-efficient computing in our own brains, neuromorphic chips promise to transform the way AI processes data on local devices.
References
- S. Ambrogio et al., “An analog AI chip for energy-efficient speech recognition and transcription,” Nature 620, 768–775 (2023).