Research Interests
Thematic Questions
- Do current programming paradigms align with the reality of distributed memories (system RAM, GPU RAM, networked storage)?
- Do we exploit spatial and temporal locality as well as we could?
- Are the higher levels of the system stack a limiting factor?
- Has locality-aware thinking drifted too far out of everyday programming practice?
- Should the programmer bear more or different responsibility?
- If we had a different interface and compiler, could we find a natural way for programmers to optimize for locality?
- Are existing cluster resource managers (such as SLURM, Kubernetes, or general cloud infrastructure) efficient, portable, and friendly enough to naturally support AI workloads?
- Accelerator-hungry workloads typically require substantial communication and synchronization to ensure correct computation. Shouldn't there be a natural way for independent workloads to time- and space-share expensive accelerators...?
- The underlying mathematical formulas for many ML workloads reduce to GEMM / GEMV plus a small set of well-defined functions (normalization, attention, etc.). It would be nice for end-users to be unburdened by the make and model of the number-crunching machine, and simply request "computation" rather than a specific piece of hardware. Shouldn't there be better abstractions to completely hide the backend, hardware-vendor-facing details...?
- Is the fear of parallel programming inflated?
- With more and more data, and hungrier and hungrier algorithms, will knowing parallel programming techniques become necessary?
- If people were exposed to parallel programming earlier on in CS education, would the concepts be easier? (i.e. do people get stuck in their sequential mindset?)
- Does having a small circle of expert developers publish parallel computing libraries (e.g. cuBLAS, cuDNN, MIOpen, oneDNN, etc. => PyTorch, TensorFlow, and the like) have long-term negative externalities for effective programming practice?
- Don't we want people to understand how the machines operate (i.e., what the software is actually doing) rather than just guess...?
- How much more efficient would the world's software be if the parallel programming paradigm were natural? How many cycles are wasted?
- Would an efficient data-flow system render all these questions moot?