We develop architectures and system-level optimization techniques for AI workloads, including large language models, hybrid Transformer–SSM models, mixture-of-experts models, and edge AI applications. Our research explores workload-driven accelerator design, runtime scheduling, memory-system optimization, and hardware/software co-design for efficient inference and training.