
[Full Day] Inaugural vLLM Asia Developer Day

 

03 Apr 2025, Thursday, 9:00 AM - 9:00 PM (GMT +8:00) Kuala Lumpur, Singapore

 

32 Carpenter Street, 059911


Overview

Due to the limited capacity for this event, participants who have successfully registered will receive a confirmation of registration sent directly to their email. 

Please look out for this confirmation. (Note: only participants who have received it will be able to attend this event.) 

Join us for the Inaugural vLLM Asia Developer Day in Singapore, a premier gathering of AI developers, researchers, and industry experts shaping the future of large language model (LLM) inference. Co-organized by SGInnovate, AMD, Embedded LLM, and the vLLM team, this event is designed to bring together the brightest minds in AI, providing deep technical insights, hands-on learning, and unparalleled networking opportunities. 

The vLLM Asia Developer Day includes three sessions: 

  • Morning: vLLM Asia Community Launch, vLLM Updates & Technical Talks, plus a Lunch & Social Hour 

  • Afternoon: Technical Talks, Deep Dive Sessions, and a Hands-On Workshop 

  • Evening: Social Mixer & Networking 

Course Description & Learning Outcomes

Target audience:  

  • Software Developers, Data Scientists, AI Engineers, and AI Enthusiasts focused on high-performance computing and model deployment. 

  • Particularly relevant for professionals working with LLM infrastructure, inference optimization, PyTorch, vLLM, GPU kernels, OpenAI Triton, ROCm, quantization, RDMA, parallel file systems, and related technologies. 

By attending this full day event, you will learn to:

Morning Session

  • Demystify Advanced LLM Inference  

    Whether you are new to LLMs or an experienced practitioner, explore optimization techniques from fundamentals to cutting-edge implementations like MLA decoding and efficient memory management.  

Afternoon Session

  • Bridge Theory and Practice  

    Connect theoretical concepts with real-world applications through interactive demonstrations and hands-on sessions suitable for all experience levels.  

  • Scale Your LLM Infrastructure  

    Learn practical approaches to deploying and scaling LLM systems, from single-GPU desktop setups to distributed multi-node architectures using modern parallelism strategies.  

  • Optimize for Production  

    Understand the performance-cost tradeoffs in LLM deployment and discover techniques to improve throughput, reduce latency, and maximize hardware efficiency in real-world scenarios.  

  • Unlock the Future of LLMs with Expert Insights 

    Gain insights from industry experts on the current state and future of vLLM, production deployment strategies, and the evolution of AI inference infrastructure, and build a comprehensive understanding of scaling LLMs efficiently. 
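As a taste of the scaling topics above, vLLM's OpenAI-compatible server can be launched with tensor parallelism directly from the command line. A minimal sketch follows; the model name and GPU count are illustrative placeholders, not specifics from the event:

```shell
# Serve a model across 2 GPUs with tensor parallelism
# (model name and GPU count are placeholders).
vllm serve meta-llama/Llama-3.1-8B-Instruct \
    --tensor-parallel-size 2
```

The same `--tensor-parallel-size` idea extends to the multi-node setups discussed in the talks, where tensor and pipeline parallelism are combined.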

Evening Session

  • Join the Open-Source AI Community  

    Whether you're taking your first steps in AI or looking to contribute advanced optimizations, connect with peers and mentors who share your interests and can support your journey. 

Pre-course instructions

For the afternoon session: 

  • Bring a laptop for the hands-on workshop session 

  • SSH client:

    • Windows: PuTTY (putty.org)

    • macOS/Linux: built-in (OpenSSH) 
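Once you receive the workshop server details, connecting from a terminal looks like the sketch below. The hostname and username here are placeholders; the real credentials will be provided at the event:

```shell
# Placeholder host and user — actual details provided at the workshop
ssh workshop-user@gpu-lab.example.com
```

On Windows, enter the same hostname and username into PuTTY's session dialog instead.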

Schedule

Date: 03 Apr 2025, Thursday
Time: 9:00 AM - 9:00 PM (GMT +8:00) Kuala Lumpur, Singapore
Location: 32 Carpenter Street, 059911

Agenda

Time | Activity/Description
9:00am – 9:30am | Doors Open and Check-In Registration. Morning Networking: connect with fellow developers over coffee before we begin
9:30am – 10:00am | vLLM Asia Community Launch – Welcome Address: introduction from the core vLLM team with key announcements about the vLLM Asia Community initiative and its direction
10:00am – 10:30am | Networking Break: continue conversations with peers and speakers over food and refreshments
10:30am – 12:00pm | vLLM Updates and Technical Talks:
  • State of vLLM – Current Status & Future Roadmap by Chen Zhang and Cyrus Leung (30 mins)
  • Running vLLM in Production by Tun Jian Tan (30 mins)
  • Open Q&A with vLLM Experts: Chen Zhang, Cyrus Leung, and Tun Jian Tan (10 mins)
  • Roundtable Panel Discussion: Infrastructure Evolution for AI Inference (20 mins)
12:00pm – 1:00pm | Lunch and Social Hour: a chance to mingle, share experiences, ask questions, and get to know the vLLM community on a personal level
1:00pm – 1:30pm | Doors Open and Check-In Registration
1:30pm – 3:30pm | Technical Talks and Deep-Dive:
  • AMD AI SW Introduction & LLM Optimization with vLLM for AMD GPUs by George Wang
  • DeepSeek R1 Inference Optimizations Case Study by Bruce Xue
  • vLLM ROCm New Features by Haichen Zhang
3:30pm – 4:00pm | Networking Break: continue conversations with peers and speakers over food and refreshments
4:00pm – 6:30pm | Developer's Hands-On Technical Workshop: From Zero to Production by Tun Jian Tan and Dr Pin Siang Tan:
  • Deploy Optimized LLMs with vLLM on AMD (free access to AMD GPUs)
  • Build GenAI use cases on JamAI Base
6:30pm – 9:00pm | Dark Mode Devs: a chance to mingle, share experiences, ask questions, and get to know fellow event attendees on a personal level
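For attendees curious what "Running vLLM in Production" looks like from the client side, vLLM exposes an OpenAI-compatible HTTP API. A minimal sketch of a chat-completions request body is shown below; the model name is a placeholder, not one confirmed for the workshop:

```python
import json

# Build a chat-completions request body for vLLM's
# OpenAI-compatible server (model name is a placeholder).
payload = {
    "model": "meta-llama/Llama-3.1-8B-Instruct",
    "messages": [
        {"role": "user", "content": "Summarize what vLLM does."}
    ],
    "max_tokens": 128,
    "temperature": 0.7,
}

# Serialize to the JSON body that would be POSTed to the server.
body = json.dumps(payload)
print(body)
```

At the workshop, a body like this would be POSTed to the running server's /v1/chat/completions endpoint.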

Skills Covered

PROFICIENCY LEVEL GUIDE
Beginner: Introduce the subject matter without the need to have any prerequisites.
Proficient: Requires learners to have prior knowledge of the subject.
Expert: Involves advanced and more complex understanding of the subject.

  • LLM (Proficiency level: Proficient)
  • AI (Proficiency level: Proficient)

Speakers

Trainer's Profile:

Chen Zhang, PhD Student at Tsinghua University, vLLM Committer, vLLM

Chen Zhang is a PhD student at Tsinghua University and a former visiting scholar at UC Berkeley. Her research focuses on deep learning systems. As a significant contributor to the vLLM project, Chen mainly works on a more flexible KV cache manager. Attend her talk for first-hand information on recent updates to vLLM and insights into its future development.

Trainer's Profile:

Cyrus Leung, PhD Candidate at Hong Kong University of Science and Technology, vLLM Maintainer, vLLM

Cyrus Leung is a PhD candidate at HKUST and a key contributor to the vLLM project, authoring 300+ PRs and over 100k lines of code, and triaging more than 950 GitHub issues. As co-lead of the multi-modality workstream, Cyrus has integrated numerous generative and embedding MLLMs into vLLM and helped extend the OpenAI-compatible server with multi-modal capabilities. He currently heads the refactoring effort for the V1-compatible multi-modal processor.

Trainer's Profile:

Tun Jian Tan, LLM Principal Engineer, vLLM Committer, vLLM

Tun Jian Tan is a key contributor to the advancement of open-source LLM inference on AMD platforms. His work on the vLLM project, including leading the very first PR to expand support to AMD and integrating PTPC-FP8 quantization, has dramatically improved performance on AMD ROCm. He is also a key contributor to the vLLM blog, sharing best practices for maximizing efficiency on the AMD MI300X. Beyond vLLM, Tun Jian collaborated with the LinkedIn Liger-Kernel team to bring significant performance improvements to LLM training on AMD GPUs.

Trainer's Profile:

George Wang, Director, AI Software Product Engineering, AMD

George Wang is Director of AI Software Product Engineering in the AI Group at AMD, where he leads a talented team for AI software solutions, product management, and end-to-end performance optimizations across Data Center, Client, and Edge/Endpoint applications, driving cutting-edge AI capabilities for AMD's customers, developers, and the broader community. George has been issued two U.S. technical patents, holds a master's degree, and has over 20 years of experience in the technology industry.

Trainer's Profile:

Bruce Xue, AI Product Application Engineer, AMD

Bruce Xue is an AI Product Application Engineer at AMD with expertise in developing system-level applications on AMD Instinct GPUs and deploying inference solutions for large language models, specifically optimizing high-performance libraries and integrating essential operators within the ROCm ecosystem. He is proficient in adapting inference frameworks, including vLLM and SGLang, to harness the full computational power of AMD GPUs for the seamless deployment of large language models.

Trainer's Profile:

Dr Pin Siang Tan, Co-Founder and CTO, Embedded LLM

Dr. Tan Pin Siang, Co-Founder and Chief Technology Officer of Embedded LLM, has over 14 years of in-depth, hands-on experience, spanning deep learning, Generative AI, computer vision, and geospatial analysis. He leads Embedded LLM's technical direction, focusing on optimizing Large Language Model (LLM) inference for unparalleled performance and scalability. His extensive project portfolio showcases his ability to leverage cutting-edge AI to solve complex, real-world problems for government agencies and major corporations across diverse sectors. Dr. Tan is deeply committed to advancing the field, making sophisticated AI solutions practical and accessible, and actively shaping the future of how AI is deployed and utilized.

Trainer's Profile:

Haichen Zhang, Senior PM, AI Product Marketing, AMD

Haichen Zhang is the Senior PM for AMD AI Product Marketing, specializing in accelerating training and inference for large language models, recommender systems, computer vision (CV), and natural language processing (NLP) tailored to internet customers. Prior to joining AMD, Haichen worked at NVIDIA as a Machine Learning Architect for four years.

Partners

AMD, Embedded LLM, vLLM