Cerebras Systems and AWS are preparing to launch a new, more powerful AI inference offering. At its core is a massive chip whose development is partly tied to an EPFL graduate.
The Cerebras Wafer Scale Engine is over fifty times larger than the largest graphics chips on the market. (Source: Cerebras Systems)
AWS and Cerebras Systems have announced a collaboration to deliver a faster generative AI inference solution in the coming months. It will be integrated into Amazon Bedrock and will be based on Cerebras' CS-3 systems.
The partnership comes amid growing use of agentic AI, as Cerebras highlights in its statement. The company notes that these workloads, particularly in software development, generate up to fifteen times more tokens than conversational interactions. This shift increases the demand for fast inference, which has become critical in some production environments.
Cerebras states that its systems can reach up to 3,000 tokens per second. This performance rests on its Wafer-Scale Engine (WSE) architecture, whose WSE-3 version is touted as the largest AI processor available. The company claims the chip is 56 times larger than the biggest GPUs and delivers training and inference performance over 20 times higher than theirs, with lower energy consumption per unit of computation.
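To put these figures in context, here is a minimal back-of-envelope sketch in Python. Only the 3,000 tokens-per-second rate and the fifteen-fold multiplier come from the companies' claims; the chat-exchange size and the 100 tokens-per-second baseline are illustrative assumptions:

```python
# Back-of-envelope: time to serve a workload at a given decode speed.
# The workload sizes below are illustrative assumptions; only the
# "up to 3,000 tokens per second" figure comes from Cerebras' claim.

CHAT_TOKENS = 2_000        # assumed size of a typical chat exchange
AGENT_MULTIPLIER = 15      # Cerebras: agentic use yields up to 15x more tokens
AGENT_TOKENS = CHAT_TOKENS * AGENT_MULTIPLIER  # 30,000 tokens

def seconds_to_generate(tokens: int, tokens_per_second: float) -> float:
    """Idealized generation time, ignoring queueing and prompt-processing latency."""
    return tokens / tokens_per_second

for label, tps in [("assumed GPU-based service", 100), ("Cerebras' claimed rate", 3_000)]:
    t = seconds_to_generate(AGENT_TOKENS, tps)
    print(f"{label}: {AGENT_TOKENS:,} tokens at {tps:,} tok/s -> {t:,.0f} s")
```

Under these assumptions, an agentic task of 30,000 tokens takes five minutes at 100 tokens per second but ten seconds at the claimed rate, which is what makes throughput the headline metric for these workloads.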
The WSE partly owes its origin to Switzerland: Jean-Philippe Fricker, co-founder and Chief System Architect of Cerebras Systems, is an EPFL graduate.
The partnership with AWS also covers the development of a so-called "disaggregated" inference architecture, which splits workloads between AWS's Trainium chips, dedicated to preprocessing, and Cerebras systems.
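As a rough illustration of what "disaggregated" means here, the pattern routes the compute-heavy prompt-ingestion stage and the latency-sensitive token-generation stage to separate hardware pools. The sketch below is hypothetical: the class names, methods, and data shapes are assumptions, not AWS or Cerebras APIs.

```python
from dataclasses import dataclass

# Hypothetical sketch of disaggregated inference. Neither class maps to a
# real AWS or Cerebras API; only the preprocessing/generation split mirrors
# the architecture described above.

@dataclass
class PreprocessPool:
    """Stands in for the Trainium side: turns the prompt into model state."""
    name: str = "trainium-pool"

    def prefill(self, prompt: str) -> dict:
        # A real system would run the model over the full prompt and hand
        # back the resulting attention state (the "KV cache").
        return {"kv_cache": f"<state for {len(prompt)}-char prompt>"}

@dataclass
class GenerationPool:
    """Stands in for the Cerebras side: produces tokens from that state."""
    name: str = "cerebras-pool"

    def decode(self, state: dict, max_tokens: int) -> str:
        # Placeholder loop; a real decoder emits one token per step,
        # reading from the state produced during preprocessing.
        return " ".join(f"tok{i}" for i in range(max_tokens))

def infer(prompt: str, pre: PreprocessPool, gen: GenerationPool) -> str:
    state = pre.prefill(prompt)              # stage 1: preprocessing pool
    return gen.decode(state, max_tokens=8)   # stage 2: generation pool

print(infer("Explain wafer-scale chips.", PreprocessPool(), GenerationPool()))
```

The appeal of such a split is that each pool can be sized and scheduled independently for its own bottleneck, rather than forcing one type of hardware to handle both stages.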
The financial terms of the partnership have not been disclosed. Cerebras did, however, recently strike a deal with OpenAI potentially worth more than 10 billion dollars, according to Reuters.