Howard Shan

Building scarce technical expertise in AI infrastructure.

I focus on LLM inference systems, CUDA kernels, reading the source code of inference engines, and high-performance computing.

Current Focus

  • LLM inference systems
  • SGLang / vLLM source code reading
  • CUDA / Triton kernel optimization
  • FlashAttention and attention optimization
  • Distributed inference and serving systems

Latest Notes