As AI models grow in complexity and hardware evolves to meet the demand, the software layer connecting the two must also adapt. We recently sat down with Stephen Jones, a Distinguished Engineer at NVIDIA and one of the original architects of CUDA.
Jones, whose background spans fluid mechanics and aerospace engineering, offered deep insights into NVIDIA’s latest software innovations, including the shift toward tile-based programming, the introduction of “Green Contexts,” and how AI is rewriting the rules of code development.
Here are the key takeaways from our conversation.
The Shift to Tile-Based Abstraction
For years, CUDA programming has revolved around a hierarchy of grids, blocks, and threads. With the latest updates, NVIDIA is introducing a higher level of abstraction: CUDA Tile.
According to Jones, this new approach allows developers to program directly to arrays and tensors rather than managing individual threads. “It extends the existing CUDA,” Jones explained. “What we’ve done is we’ve added a way to talk about and program directly to arrays, tensors, vectors of data… allowing the language and the compiler to see what the high-level data was that you’re operating on opened up a whole realm of new optimizations.”
This shift is partly a response to the rapid evolution of hardware. As Tensor Cores become larger and denser to combat the slowing of Moore’s Law, the mapping of code to silicon becomes increasingly complex.
- Future-Proofing: Jones noted that by expressing programs as vector operations (e.g., Tensor A times Tensor B), the compiler takes on the heavy lifting of mapping data to the specific hardware generation.
- Stability: This ensures that program structure remains stable even as the underlying GPU architecture changes from Ampere to Hopper to Blackwell.
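The idea can be illustrated with ordinary NumPy, whose array-level style Jones says the tile model resembles. This is an analogy only, not the CUDA Tile API: the point is the difference between spelling out a per-element loop (which fixes how work is decomposed) and stating the whole-tensor operation (which leaves the mapping to the compiler and runtime).

```python
import numpy as np

# Thread-style thinking: the programmer writes the per-element loop,
# implicitly deciding how the computation is broken up.
def scaled_add_elementwise(a, b, alpha):
    out = np.empty_like(a)
    for i in range(a.shape[0]):
        for j in range(a.shape[1]):
            out[i, j] = alpha * a[i, j] + b[i, j]
    return out

# Tile/array-style thinking: the program states *what* is computed on
# whole tensors; how that maps onto a given hardware generation is
# left to the library or compiler.
def scaled_add_array(a, b, alpha):
    return alpha * a + b

a = np.arange(6.0).reshape(2, 3)
b = np.ones((2, 3))
assert np.allclose(scaled_add_elementwise(a, b, 2.0),
                   scaled_add_array(a, b, 2.0))
```

Both functions compute the same result, but only the second one keeps its structure stable when the underlying mapping of data to silicon changes.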
Python First, But Not Python Only
Recognizing that Python has become the lingua franca of Artificial Intelligence, NVIDIA launched CUDA Tile support with Python first. “Python’s the language of AI,” Jones stated, adding that an array-based representation is “much more natural to Python programmers” who are accustomed to NumPy.
However, performance purists need not worry. C++ support is arriving next year, maintaining NVIDIA’s philosophy that developers should be able to accelerate their code regardless of the language they choose.
“Green Contexts” and Reducing Latency
For engineers deploying Large Language Models (LLMs) in production, latency and jitter are critical concerns. Jones highlighted a new feature called Green Contexts, which allows for precise partitioning of the GPU.
“Green contexts lets you partition the GPU… into different sections,” Jones said. This allows developers to dedicate specific fractions of the GPU to different tasks, such as running pre-fill and decode operations simultaneously without them competing for resources. This micro-level specialization within a single GPU mirrors the disaggregation seen at the data center scale.
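The partitioning idea can be sketched with a small accounting model. This is an illustration of the concept only, not the CUDA driver API: the SM count and the 70/30 split below are invented for the example.

```python
# Illustrative sketch: models the *idea* behind Green Contexts, i.e.
# carving one GPU's streaming multiprocessors (SMs) into dedicated
# partitions so workloads don't compete for resources. The SM total
# and fractions here are made up; this is not the CUDA driver API.
def split_sms(total_sms, fractions):
    """Assign whole SMs to each named partition from fractional shares."""
    counts = {name: int(total_sms * frac) for name, frac in fractions.items()}
    # Hand any remainder left over from rounding down to the largest partition.
    remainder = total_sms - sum(counts.values())
    largest = max(counts, key=counts.get)
    counts[largest] += remainder
    return counts

# Dedicate separate slices of a hypothetical 132-SM GPU to the
# prefill and decode phases of LLM serving.
parts = split_sms(132, {"prefill": 0.7, "decode": 0.3})
# The latency-sensitive decode path now runs on its own slice,
# so bursts of prefill work cannot starve it.
```

The real feature performs this split at the driver level, but the payoff is the same: predictable latency because each workload owns its fraction of the GPU.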
No Black Boxes: The Importance of Tooling
One of the pervasive fears regarding high-level abstractions is the loss of control. Jones, drawing on his experience as a CUDA user in the aerospace industry, emphasized that NVIDIA tools will never be black boxes.
“I really believe that the most important part of CUDA is the developer tools,” Jones affirmed. He assured developers that even when using tile-based abstractions, tools like Nsight Compute will allow inspection down to the individual machine language instructions and registers. “You’ve got to be able to tune and debug and optimize… it cannot be a black box,” he added.
Accelerating Time-to-Result
Ultimately, the goal of these updates is productivity. Jones described the objective as “left shifting” the performance curve, enabling developers to reach 80% of potential performance in a fraction of the time.
“If you can come to market [with] 80% of performance in a week instead of a month… then you’re spending the rest of your time just optimizing,” Jones explained. Crucially, this ease of use does not come at the cost of power; the new model still provides a path to 100% of the peak performance the silicon can offer.
Conclusion
As AI algorithms and scientific computing converge, NVIDIA is positioning CUDA not just as a low-level tool for hardware experts, but as a flexible platform that adapts to the needs of Python developers and HPC researchers alike. With support extending from Ampere and Hopper through Blackwell and the upcoming Rubin architecture, these updates promise to streamline development across the entire GPU ecosystem.
For the full technical details on CUDA Tile and Green Contexts, visit the NVIDIA developer portal.

