Overview
Capacity for this event is limited. Participants who have successfully registered will receive a confirmation of registration sent directly to their email. (Note: Only participants who have received this confirmation will be able to attend this event.)
Join us for the inaugural vLLM Asia Developer Day in Singapore, a premier gathering of AI developers, researchers, and industry experts shaping the future of large language model (LLM) inference. Co-organized by SGInnovate, AMD, Embedded LLM, and vLLM, this event brings together the brightest minds in AI, offering deep technical insights, hands-on learning, and unparalleled networking opportunities.
This afternoon session for the vLLM Asia Developer Day will include Technical Talks, Deep Dive Sessions, and a Hands-On Workshop.
If you plan to attend multiple sessions at the vLLM Asia Developer Day, please use the Full-Day Registration Form instead.
Course Description & Learning Outcomes
Target audience:
Software Developers, Data Scientists, AI Engineers and AI Enthusiasts focused on high-performance computing and model deployment.
It is particularly relevant for professionals working with LLM infrastructure, inference optimization, PyTorch, vLLM, GPU kernels, OpenAI Triton, ROCm, quantization, RDMA, parallel file systems, and related technologies.
By attending this afternoon session, you will learn to:
Demystify Advanced LLM Inference
Whether you are new to LLMs or an experienced practitioner, explore optimization techniques from fundamentals to cutting-edge implementations like MLA decoding and efficient memory management.
Bridge Theory and Practice
Connect theoretical concepts with real-world applications through interactive demonstrations and hands-on sessions suitable for all experience levels.
Scale Your LLM Infrastructure
Learn practical approaches to deploying and scaling LLM systems, from single-GPU desktop setups to distributed multi-node architectures using modern parallelism strategies.
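As an illustration of that spectrum, the commands below sketch serving a model with vLLM on a single GPU and then sharding a larger model across four GPUs with tensor parallelism. The model names and GPU count are hypothetical placeholders, not part of the event material:

```shell
# Single-GPU serving: start an OpenAI-compatible server (default port 8000).
# The model ID is only an example; substitute any model you have access to.
vllm serve meta-llama/Llama-3.1-8B-Instruct

# Multi-GPU serving on one node: shard the model's weights across 4 GPUs
# with tensor parallelism to fit larger models or raise throughput.
vllm serve meta-llama/Llama-3.1-70B-Instruct --tensor-parallel-size 4
```

Multi-node deployments typically combine tensor parallelism within a node with pipeline parallelism across nodes via `--pipeline-parallel-size`.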
Optimize for Production
Understand the performance-cost tradeoffs in LLM deployment and discover techniques to improve throughput, reduce latency, and maximize hardware efficiency in real-world scenarios.
Recommended Prerequisites
GitHub Account (github.com)
Discord Account
Basic Python Knowledge
Pre-course instructions
Bring a laptop with an SSH client for the hands-on workshop session:
Windows: PuTTY (putty.org)
macOS/Linux: built-in ssh client
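On macOS or Linux, connecting to a workshop machine looks like the following. The hostname, username, and key path are hypothetical placeholders; the actual credentials will be provided at the workshop:

```shell
# Connect to a remote workshop machine over SSH.
# Replace the key path, user, and host with the details handed out on the day.
ssh -i ~/.ssh/workshop_key workshop-user@gpu.example.com

# Optionally forward a remote port (e.g. a vLLM server on 8000) to your laptop.
ssh -L 8000:localhost:8000 workshop-user@gpu.example.com
```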
Schedule
Date: 03 Apr 2025, Thursday
Time: 1:30 PM – 6:30 PM (GMT+8:00, Kuala Lumpur/Singapore)
Location: 32 Carpenter Street, Singapore 059911
Agenda
| Time | Activity/Description |
| --- | --- |
| 1:00pm – 1:30pm | Doors open and check-in registration |
| 1:30pm – 3:30pm | Technical Talks and Deep Dives: AMD AI SW Introduction & LLMs Optimization with vLLM for AMD GPUs (George Wang); DeepSeek R1 Inference Optimizations Case Study (Bruce Xue); vLLM ROCm New Features (Haichen Zhang) |
| 3:30pm – 4:00pm | Networking Break: continue conversations with peers and speakers over food and refreshments |
| 4:00pm – 6:30pm | Developer's Hands-On Technical Workshop: From Zero to Production (Tun Jian, Tan and Pin Siang, Tan (Dr)): deploy optimized LLMs with vLLM on AMD (free access to AMD GPUs); build GenAI use cases on JamAI Base |
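As a preview of the workshop's deployment step: once a vLLM server is running, it exposes an OpenAI-compatible HTTP API that can be queried as sketched below. The localhost URL and model name are hypothetical placeholders for whatever is set up at the workshop:

```shell
# Query a locally running vLLM OpenAI-compatible server.
# Assumes a server was started with e.g. `vllm serve <model>` on port 8000.
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "meta-llama/Llama-3.1-8B-Instruct",
        "messages": [{"role": "user", "content": "What is vLLM?"}]
      }'
```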
Skills Covered
PROFICIENCY LEVEL GUIDE
Beginner: Introduces the subject; no prerequisites are required.
Proficient: Requires prior knowledge of the subject.
Expert: Involves an advanced, more complex understanding of the subject.
- AI (Proficiency level: Proficient)
- LLM (Proficiency level: Proficient)
Speakers
Trainer's Profile:
Tun Jian, Tan, LLM Principal Engineer, vLLM Committer, vLLM
Tun Jian, Tan is a key contributor to the advancement of open-source LLM inference on AMD platforms. His work on the vLLM project, including leading the first PR to extend support to AMD and integrating PTPC-FP8 quantization, has dramatically improved performance on AMD ROCm. He is also a key contributor to the vLLM blog, sharing best practices for maximizing efficiency on the AMD MI300X. Beyond vLLM, Tun Jian collaborated with LinkedIn's Liger-Kernel team to bring significant performance improvements to LLM training on AMD GPUs.
Trainer's Profile:
George, Wang, Director, AI Software Product Engineering, AMD
George Wang is Director of AI Software Product Engineering in the AI Group at AMD, where he leads a talented team covering AI software solutions, product management, and end-to-end performance optimization across Data Center, Client, and Edge/Endpoint applications, driving cutting-edge AI capabilities for AMD's customers, developers, and the broader community. George has been issued two U.S. technical patents, holds a master's degree, and has over 20 years of experience in the technology industry.
Trainer's Profile:
Bruce, Xue, AI Product Application Engineer, AMD
Bruce Xue is an AI Product Application Engineer at AMD with expertise in developing system-level applications on AMD Instinct GPUs and deploying inference solutions for large language models, specifically optimizing high-performance libraries and integrating essential operators within the ROCm ecosystem. He is proficient in adapting inference frameworks, including vLLM and SGLang, to harness the full computational power of AMD GPUs for the seamless deployment of large language models.
Trainer's Profile:
Pin Siang, Tan (Dr), Co-Founder and CTO, Embedded LLM
Dr. Tan Pin Siang, Co-Founder and Chief Technology Officer of Embedded LLM, has over 14 years of in-depth, hands-on experience, spanning deep learning, Generative AI, computer vision, and geospatial analysis. He leads Embedded LLM's technical direction, focusing on optimizing Large Language Model (LLM) inference for unparalleled performance and scalability. His extensive project portfolio showcases his ability to leverage cutting-edge AI to solve complex, real-world problems for government agencies and major corporations across diverse sectors. Dr. Tan is deeply committed to advancing the field, making sophisticated AI solutions practical and accessible, and actively shaping the future of how AI is deployed and utilized.
Trainer's Profile:
Haichen, Zhang, Senior PM AI Product Marketing, AMD
Haichen is the Senior PM for AMD AI Product Marketing, specializing in accelerating training and inference for large language models, recommender systems, computer vision (CV), and natural language processing (NLP) tailored to internet customers. Prior to joining AMD, Haichen worked at NVIDIA as a Machine Learning Architect for four years.
Partners