Howard Shan

Building scarce technical expertise in AI infrastructure.

I focus on LLM inference systems, CUDA kernels, reading the source code of inference engines, and high-performance computing.

Current Focus

  • LLM inference systems
  • SGLang / vLLM source code reading
  • CUDA / Triton kernel optimization
  • FlashAttention and attention optimization
  • Distributed inference and serving systems

Latest Notes