The software market will be nearly three times the size of the hardware market by 2026, and Intel CEO Pat Gelsinger made sure that point landed at this month’s Hot Chips conference.
Software will drive hardware development, especially chips, as complex systems drive the insatiable demand for computing power, Gelsinger said during his keynote address at the conference.
“Software has become much more important. We have to treat those as the sacred interfaces that we have, rather than how we house hardware, and that’s where silicon has to fit in,” Gelsinger said.
The importance of software in silicon development saw a resurgence at the Hot Chips conference. Many of the chip designs presented follow the concept of hardware-software co-design, which emerged in the 1990s to ensure the “attainment of system-level objectives by exploiting the synergy of hardware and software through their simultaneous design,” according to a paper published by IEEE in 1997.
Software is moving the industry forward with new computing styles such as AI, and chipmakers are now taking a software-first approach to hardware development to support the new applications.
The idea that software drives hardware development isn’t new, but it has been revived in the age of workload accelerators, said Kevin Krewell, an analyst at Tirias Research.
“We’ve had FPGAs since the 1980s, and that’s software-defined hardware. The more modern interpretation is that the hardware is an amorphous collection of hardware blocks that are orchestrated by a compiler to run some workloads efficiently, without a lot of external control hardware,” Krewell said.
Chip designers are co-optimizing hardware and software to break down the walls between software tasks and the hardware they run on, with the goal of achieving greater efficiency.
“It’s popular again today because of the slowdown of Moore’s Law, improvements in transistor speed and efficiency, and improved software compiler technologies,” Krewell said.
Intel is trying to keep up with software’s insatiable computing demand by developing new types of chips that can scale computing in the future.
“People are developing software and silicon has to get under it,” Gelsinger said.
He added that chip designers “must also consider the composition of the critical software components that go with them, and that combination, that co-optimization of software [and] hardware, essentially becomes the path to being able to bring such complex systems.”
Gelsinger said software indirectly defines Intel’s foundry strategy and the capabilities for the factories to develop newer types of chips that cram multiple accelerators into one package.
For example, Intel has packed 47 compute tiles — also known as chiplets — into a GPU codenamed Ponte Vecchio, which is aimed at high-performance computing applications. Intel supports the Universal Chiplet Interconnect Express (UCIe) protocol for die-to-die communication between chiplets.
“We will have to do co-optimizations in the hardware and software domain, also about multiple chiplets – how they play together,” Gelsinger said.
A new class of EDA tools is needed to build chips for systems at scale, Gelsinger said.
Intel also shed some light on its “software-defined, silicon-enhanced” strategy, linking it closely to its long-term plan to become a contract chip manufacturer. The goal is to plug silicon-enhanced middleware into the cloud, with subscription features that unlock the middleware and the speed-boosting silicon underneath.
Software can make data center infrastructure flexible and intelligent through a new generation of smartNICs and DPUs, which are compute-intensive chips with network and storage components.
The hardware architecture of networking is at a watershed, with software-defined networking and storage features shaping hardware design, said Jaideep Dastidar, senior fellow at AMD, presenting at the Hot Chips conference.
AMD talked about the 400G Adaptive smartNIC, which includes software-defined cores and fixed function logic such as ASICs to process and transfer data.
Software elements help these chips handle a diverse range of workloads, including on-chip computing that is offloaded from CPUs. The software also gives these chips the flexibility to adapt to new standards and applications.
“We decided to take the traditional hardware-software co-design paradigm and extend it to hardware-software programmable-logic co-design,” Dastidar said.
The chip has ASIC-to-programmable-logic adapters, where one can make adjustments such as custom header extensions, or add and remove accelerator functions. The programmable logic adapters – which can be FPGAs that extend the ASIC’s functions – can also perform full custom data-plane offload.
The 400G Adaptive smartNIC also has programmable logic agents for interacting with the embedded processing subsystem. Software can program the logic-adapter interfaces to create coherent I/O agents for the embedded processor subsystem, which can be modified to run the network control plane. Software also allows data-plane applications to run entirely in the ASIC, entirely in the programmable logic, or in both.
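The adapter idea can be sketched in software terms: fixed stages standing in for ASIC logic, and pluggable stages standing in for programmable-logic adapters that add or remove functions. The following Python sketch is purely illustrative; the class and stage names are hypothetical and not part of AMD's toolchain.

```python
# Illustrative sketch of a packet pipeline with pluggable accelerator
# stages, loosely mirroring programmable-logic adapters that add or
# remove functions around fixed ASIC logic. All names are hypothetical;
# this is not AMD's API.

from typing import Callable, Dict

Packet = Dict[str, object]
Stage = Callable[[Packet], Packet]

class Pipeline:
    def __init__(self) -> None:
        self.adapters: Dict[str, Stage] = {}  # pluggable stages

    def parse(self, pkt: Packet) -> Packet:
        # Fixed "ASIC" stage: always runs first.
        pkt["parsed"] = True
        return pkt

    def forward(self, pkt: Packet) -> Packet:
        # Fixed "ASIC" stage: always runs last.
        pkt["out_port"] = hash(str(pkt.get("dst"))) % 4
        return pkt

    def add_adapter(self, name: str, fn: Stage) -> None:
        self.adapters[name] = fn   # e.g. a custom header extension

    def remove_adapter(self, name: str) -> None:
        self.adapters.pop(name, None)

    def process(self, pkt: Packet) -> Packet:
        pkt = self.parse(pkt)
        for fn in self.adapters.values():  # programmable-logic stages
            pkt = fn(pkt)
        return self.forward(pkt)

# Plug in a custom header extension, then remove it again.
pipe = Pipeline()
pipe.add_adapter("vlan_tag", lambda p: {**p, "vlan": 42})
tagged = pipe.process({"dst": "10.0.0.1"})
pipe.remove_adapter("vlan_tag")
plain = pipe.process({"dst": "10.0.0.1"})
```

The design choice mirrored here is that the fixed stages never change, while the set of adapter stages can be swapped at any time without touching the rest of the pipeline.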
AI chip company Groq has designed a chip in which software takes control: chip management is handed over to the compiler, which controls hardware functions, code execution, data movement and other tasks.
The Tensor Streaming Processor Architecture includes integrated software control units at strategic points to send instructions to hardware.
Groq has uprooted conventional chip design, re-examining hardware-software interfaces and handing control of chip operations to software.
“We explicitly give control to the software, especially the compiler, so that it can reason about the correctness and plan instructions on the hardware,” said Dennis Abts, Groq’s chief architect.
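The principle Abts describes, where a compiler plans every operation onto the hardware ahead of time instead of leaving arbitration to runtime control logic, can be illustrated with a toy static scheduler. This Python sketch uses invented operation and unit names and is not Groq's compiler:

```python
# Toy static scheduler: the "compiler" assigns each operation to an
# exact cycle on a functional unit ahead of time, so nothing is
# decided dynamically at run time. Purely illustrative of the idea;
# not Groq's toolchain.

from typing import Dict, List, Tuple

# op -> (functional unit, dependencies)
GRAPH: Dict[str, Tuple[str, List[str]]] = {
    "load_a": ("mem",    []),
    "load_b": ("mem",    []),
    "mul":    ("vector", ["load_a", "load_b"]),
    "add":    ("vector", ["mul"]),
    "store":  ("mem",    ["add"]),
}

def schedule(graph: Dict[str, Tuple[str, List[str]]]) -> List[Tuple[int, str, str]]:
    """Assign each op the earliest cycle where its dependencies have
    finished and its unit is free (one op per unit per cycle)."""
    finish: Dict[str, int] = {}               # op -> cycle it completes
    busy: Dict[Tuple[str, int], bool] = {}    # (unit, cycle) occupancy
    plan: List[Tuple[int, str, str]] = []
    pending = dict(graph)
    while pending:
        for op, (unit, deps) in list(pending.items()):
            if all(d in finish for d in deps):
                cycle = max([finish[d] for d in deps], default=0)
                while busy.get((unit, cycle)):
                    cycle += 1                # unit conflict: slide later
                busy[(unit, cycle)] = True
                finish[op] = cycle + 1        # each op takes one cycle
                plan.append((cycle, unit, op))
                del pending[op]
    return sorted(plan)

for cycle, unit, op in schedule(GRAPH):
    print(f"cycle {cycle}: {unit:6s} {op}")
```

Because the plan is fixed at compile time, the "hardware" in this model never needs arbitration logic; the compiler knows exactly which unit does what in every cycle, which is the property the quote points at.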
Groq used AI techniques — making decisions based on patterns identified in data, using probabilities and associations — to determine hardware functionality. This differs from conventional computing, where decisions are made by rigid logic, which can lead to waste.
“It wasn’t about abstracting the details of the hardware. It’s about explicitly checking the underlying hardware, and the compiler has a regular view of what the hardware is doing in a given cycle,” Abts said.
Systems are becoming ever more complex, with tens of thousands of CPUs, GPUs, smartNICs and FPGAs connected in heterogeneous computing environments. Each of these chips has a different profile of response time, latency and variability, which can slow down large-scale applications.
“Anything that requires a coordinated effort across the machine is ultimately limited by worst-case latency across the network. What we’ve done is try to prevent some of this waste, fraud and abuse that shows up at the system level,” Abts said.
Abts gave the example of a traditional RDMA request, where issuing a read to a destination triggers a memory transaction, whose response then streams back across the network to be used later.
“A much simplified version of this is where the compiler knows the address being read. And the data is simply pushed across the network when it’s needed, so it can be consumed at the source. This makes for a much more efficient network transaction with fewer messages on the network and less overhead,” said Abts.
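The saving Abts describes can be modeled as message counts: a pull-style read costs a request plus a response per address, while a push scheduled by a compiler that already knows the address costs only the data message. A minimal Python model follows (hypothetical function names, not an RDMA implementation):

```python
# Minimal model contrasting pull-style RDMA reads (request + response)
# with compiler-scheduled pushes (data only). A hypothetical sketch,
# not an implementation of RDMA or Groq's network protocol.

def pull_read(addresses, network):
    """Consumer sends a read request per address, then receives data."""
    for addr in addresses:
        network.append(("read_req", addr))   # request crosses the network
        network.append(("read_resp", addr))  # data streams back
    return network

def push_send(addresses, network):
    """Producer pushes data when the compiler scheduled it;
    no request message is ever sent."""
    for addr in addresses:
        network.append(("push", addr))       # data only
    return network

addrs = list(range(8))
pull_msgs = pull_read(addrs, [])
push_msgs = push_send(addrs, [])
print(len(pull_msgs), len(push_msgs))  # → 16 8: pull needs twice the messages
```

The point of the model is only the ratio: removing the request leg halves the traffic, which is the "fewer messages on the network and less overhead" in the quote above.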
Spatial awareness, meaning reducing the distance that data has to travel, was a theme in many presentations. Proximity between chips and memory or storage has been a common thread in AI chip designs.
Groq has made fine-grained changes to the basic chip design, decoupling the primary compute units found in CPUs, such as integer and vector execution cores, and bundling like units into separate groups. The proximity speeds up the integer and vector processing used for basic computing and AI tasks.
The reshuffle has a lot to do with how data travels between processors in AI, Abts said.