
The future of fast inference | Morgan Rockett | TEDxBoston

This talk covers the specialized hardware that Large Language Models (LLMs) run on, as well as the current limits of context windows and inference speeds. It then introduces a selective chunking method for fast summarization of files of unlimited size. A demo summarizes tens of thousands of pages from the JFK files in seconds, highlighting the power of performant algorithms and the blazing inference speed of Cerebras. The talk concludes with a discussion of societal implications.

Topics: AI, Algorithms, Big Data, Computers, Engineering, Future, Intelligence, Internet, Machine Learning, Math, Software, Technology, Web

Tags: Algorithms • Optimization • Parallel Programming • Python (Programming Language) • Artificial Intelligence (AI)

This talk was given at a TEDx event using the TED conference format but independently organized by a local community. Learn more at https://www.ted.com/tedx
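The talk does not publish its code, but the selective chunking idea it describes can be sketched as follows: split a large document into fixed-size chunks, score each chunk with a cheap heuristic, and pass only the top-scoring chunks to the model, so summarization cost stays bounded regardless of file size. Everything below (the keyword-count scoring rule, the chunk size, and the `summarize_stub` placeholder standing in for an LLM call) is an illustrative assumption, not the speaker's actual method.

```python
# Hypothetical sketch of a selective-chunking summarization pipeline.
# This is NOT the talk's implementation; scoring and sizes are assumptions.

def chunk_text(text: str, chunk_chars: int = 2000) -> list[str]:
    """Split text into fixed-size character chunks."""
    return [text[i:i + chunk_chars] for i in range(0, len(text), chunk_chars)]

def score_chunk(chunk: str, keywords: list[str]) -> int:
    """Cheap relevance heuristic: count keyword occurrences (assumed rule)."""
    lower = chunk.lower()
    return sum(lower.count(k) for k in keywords)

def select_chunks(chunks: list[str], keywords: list[str], top_k: int = 3) -> list[str]:
    """Keep only the top_k highest-scoring chunks, in document order."""
    ranked = sorted(range(len(chunks)),
                    key=lambda i: score_chunk(chunks[i], keywords),
                    reverse=True)
    keep = sorted(ranked[:top_k])
    return [chunks[i] for i in keep]

def summarize_stub(chunk: str) -> str:
    """Placeholder for a fast-inference LLM call (e.g. an API request)."""
    return chunk[:80]

def selective_summary(text: str, keywords: list[str],
                      chunk_chars: int = 2000, top_k: int = 3) -> str:
    """Summarize only the most relevant chunks of an arbitrarily large text."""
    chunks = chunk_text(text, chunk_chars)
    selected = select_chunks(chunks, keywords, top_k)
    return "\n".join(summarize_stub(c) for c in selected)
```

Because only `top_k` chunks ever reach the model, the number of inference calls is constant even as the input grows, which is the property that lets a fast backend summarize tens of thousands of pages quickly.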
