As AI models grow in complexity and hardware evolves to meet the demand, the software layer connecting the two must also adapt. We recently sat down with Stephen Jones, a Distinguished Engineer at NVIDIA and one of the original architects of CUDA.
Jones, whose background spans fluid mechanics and aerospace engineering, offered deep insights into NVIDIA’s latest software innovations, including the shift toward tile-based programming, the introduction of “Green Contexts,” and how AI is rewriting the rules of code development.
Here are the key takeaways from our conversation.
The Shift to Tile-Based Abstraction
For years, CUDA programming has revolved around a hierarchy of grids, blocks, and threads. With the latest updates, NVIDIA is introducing a higher level of abstraction: CUDA Tile.
According to Jones, this new approach allows developers to program directly to arrays and tensors rather than managing individual threads. “It extends the existing CUDA,” Jones explained. “What we’ve done is we’ve added a way to talk about and program directly to arrays, tensors, vectors of data… allowing the language and the compiler to see what the high-level data was that you’re operating on opened up a whole realm of new optimizations.”
This shift is partly a response to the rapid evolution of hardware. As Tensor Cores become larger and denser to combat the slowing of Moore’s Law, the mapping of code to silicon becomes increasingly complex.
- Future-Proofing: Jones noted that by expressing programs as vector operations (e.g., Tensor A times Tensor B), the compiler takes on the heavy lifting of mapping data to the specific hardware generation.
- Stability: This ensures that program structure remains stable even as the underlying GPU architecture changes from Ampere to Hopper to Blackwell.
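The idea can be illustrated with ordinary NumPy, whose array-level style Jones says the tile model resembles. This is an analogy only, not the CUDA Tile API: the point is the difference between spelling out a per-element loop (which fixes how work is decomposed) and stating the whole-tensor operation (which leaves the mapping to the compiler and runtime).

```python
import numpy as np

# Thread-style thinking: the programmer writes the per-element loop,
# implicitly deciding how the computation is broken up.
def scaled_add_elementwise(a, b, alpha):
    out = np.empty_like(a)
    for i in range(a.shape[0]):
        for j in range(a.shape[1]):
            out[i, j] = alpha * a[i, j] + b[i, j]
    return out

# Tile/array-style thinking: the program states *what* is computed on
# whole tensors; how that maps onto a given hardware generation is
# left to the library or compiler.
def scaled_add_array(a, b, alpha):
    return alpha * a + b

a = np.arange(6.0).reshape(2, 3)
b = np.ones((2, 3))
assert np.allclose(scaled_add_elementwise(a, b, 2.0),
                   scaled_add_array(a, b, 2.0))
```

Both functions compute the same result, but only the second one keeps its structure stable when the underlying mapping of data to silicon changes.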
Python First, But Not Python Only
Recognizing that Python has become the lingua franca of Artificial Intelligence, NVIDIA launched CUDA Tile support with Python first. “Python’s the language of AI,” Jones stated, adding that an array-based representation is “much more natural to Python programmers” who are accustomed to NumPy.
However, performance purists need not worry. C++ support is arriving next year, maintaining NVIDIA’s philosophy that developers should be able to accelerate their code regardless of the language they choose.
“Green Contexts” and Reducing Latency
For engineers deploying Large Language Models (LLMs) in production, latency and jitter are critical concerns. Jones highlighted a new feature called Green Contexts, which allows for precise partitioning of the GPU.
“Green contexts lets you partition the GPU… into different sections,” Jones said. This allows developers to dedicate specific fractions of the GPU to different tasks, such as running pre-fill and decode operations simultaneously without them competing for resources. This micro-level specialization within a single GPU mirrors the disaggregation seen at the data center scale.
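The partitioning idea can be sketched with a small accounting model. This is an illustration of the concept only, not the CUDA driver API: the SM count and the 70/30 split below are invented for the example.

```python
# Illustrative sketch: models the *idea* behind Green Contexts, i.e.
# carving one GPU's streaming multiprocessors (SMs) into dedicated
# partitions so workloads don't compete for resources. The SM total
# and fractions here are made up; this is not the CUDA driver API.
def split_sms(total_sms, fractions):
    """Assign whole SMs to each named partition from fractional shares."""
    counts = {name: int(total_sms * frac) for name, frac in fractions.items()}
    # Hand any remainder left over from rounding down to the largest partition.
    remainder = total_sms - sum(counts.values())
    largest = max(counts, key=counts.get)
    counts[largest] += remainder
    return counts

# Dedicate separate slices of a hypothetical 132-SM GPU to the
# prefill and decode phases of LLM serving.
parts = split_sms(132, {"prefill": 0.7, "decode": 0.3})
# The latency-sensitive decode path now runs on its own slice,
# so bursts of prefill work cannot starve it.
```

The real feature performs this split at the driver level, but the payoff is the same: predictable latency because each workload owns its fraction of the GPU.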
No Black Boxes: The Importance of Tooling
One of the pervasive fears regarding high-level abstractions is the loss of control. Jones, drawing on his experience as a CUDA user in the aerospace industry, emphasized that NVIDIA tools will never be black boxes.
“I really believe that the most important part of CUDA is the developer tools,” Jones affirmed. He assured developers that even when using tile-based abstractions, tools like Nsight Compute will allow inspection down to the individual machine language instructions and registers. “You’ve got to be able to tune and debug and optimize… it cannot be a black box,” he added.
Accelerating Time-to-Result
Ultimately, the goal of these updates is productivity. Jones described the objective as “left shifting” the performance curve, enabling developers to reach 80% of potential performance in a fraction of the time.
“If you can come to market [with] 80% of performance in a week instead of a month… then you’re spending the rest of your time just optimizing,” Jones explained. Crucially, this ease of use does not come at the cost of power; the new model still provides a path to 100% of the peak performance the silicon can offer.
Conclusion
As AI algorithms and scientific computing converge, NVIDIA is positioning CUDA not just as a low-level tool for hardware experts, but as a flexible platform that adapts to the needs of Python developers and HPC researchers alike. With support extending from Ampere and Hopper through Blackwell and the upcoming Rubin architecture, these updates promise to streamline development across the entire GPU ecosystem.
For the full technical details on CUDA Tile and Green Contexts, visit the NVIDIA developer portal.

