IBM’s NorthPole chip – nearly a decade in the making – has reached a new milestone: researchers have published a series of impressive benchmark results in the journal Science.
The 12nm chip, which builds on the TrueNorth architecture, is 25 times more energy efficient than commonly used 12nm GPUs and 14nm CPUs. That figure comes from tests on the ResNet-50 model, measured as the number of frames interpreted per joule of energy.
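To make the frames-per-joule metric concrete, here is a minimal sketch of how such a comparison can be computed. Because one watt is one joule per second, frames per joule is simply throughput divided by power draw. The throughput and power numbers below are hypothetical placeholders chosen for round arithmetic; they are not measurements from the Science paper.

```python
def frames_per_joule(frames_per_second: float, watts: float) -> float:
    """Energy efficiency: frames processed per joule consumed.

    One watt is one joule per second, so (frames/s) / (J/s) = frames/J.
    """
    if watts <= 0:
        raise ValueError("power draw must be positive")
    return frames_per_second / watts


# Hypothetical figures for illustration only (not from the paper).
baseline = frames_per_joule(frames_per_second=1000.0, watts=100.0)
candidate = frames_per_joule(frames_per_second=5000.0, watts=20.0)

# How many times more energy efficient the candidate is than the baseline.
efficiency_ratio = candidate / baseline
print(f"baseline:  {baseline:.1f} frames/J")
print(f"candidate: {candidate:.1f} frames/J")
print(f"ratio:     {efficiency_ratio:.1f}x")
```

With these made-up inputs the ratio works out to 25, which shows how a chip can win on this metric through higher throughput, lower power draw, or both.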
NorthPole is also much better in terms of latency and space required to compute, outperforming all major architectures, including a GPU implemented with a 4nm process, according to IBM.
Combining computing power with memory
How does it manage to achieve such results? The memory sits on the chip itself rather than being connected separately: it is embedded in each of the chip's 256 cores. NorthPole also contains 22 billion transistors, and each core can perform 2,048 operations per cycle.
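As a rough back-of-the-envelope sketch, the core count and per-core figure can be combined into a chip-wide total. This assumes the 2,048 figure is a per-core, per-cycle count. The clock frequency is a hypothetical parameter, since the article does not give one.

```python
CORES = 256            # core count stated in the article
OPS_PER_CORE = 2048    # assumed to be operations per core per cycle


def ops_per_cycle(cores: int = CORES, ops_per_core: int = OPS_PER_CORE) -> int:
    """Peak operations the whole chip could issue in a single cycle."""
    return cores * ops_per_core


def peak_ops_per_second(clock_hz: int) -> int:
    """Peak operations per second at a given clock (hypothetical input)."""
    return ops_per_cycle() * clock_hz


print(ops_per_cycle())                  # chip-wide operations per cycle
print(peak_ops_per_second(1_000_000))   # at a placeholder 1 MHz clock
```

These are theoretical peaks under the stated assumptions. Real throughput depends on precision, utilization and the actual clock rate.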
According to the company, the architecture eliminates the von Neumann bottleneck: the delay caused by data having to travel back and forth between the CPU and RAM in most systems. As a result, IBM says, the chip can run much faster than the best GPUs available, including Nvidia’s top AI-focused graphics cards.
“Architecturally, NorthPole blurs the boundary between compute and memory,” said Dharmendra Modha of IBM Research. “At the level of individual cores, NorthPole appears as memory-near-compute and from outside the chip, at the level of input-output, it appears as an active memory.”
AMD has also explored the concept of combining memory and computing power on one component. Continuing the processor-in-memory (PIM) theme, AMD-owned Xilinx last month presented its Virtex XCVU7P card, which had eight accelerator-in-memory (AiM) modules.
IBM, which embeds memory in each computing core of NorthPole, sees the chip as well suited to emerging AI use cases, including computer vision. It was also tested on natural language processing and speech recognition. NorthPole is also suitable for edge applications that require massive amounts of data to be processed in real time.