
[Full Day] Inaugural vLLM Asia Developer Day

 

03 Apr 2025, Thursday, 9:00 AM - 9:00 PM (GMT +8:00) Kuala Lumpur, Singapore

 

32 Carpenter Street, 059911


Overview

Due to the limited capacity for this event, participants who have successfully registered will receive a confirmation of registration sent directly to their email. 

Please look out for this confirmation. (Note: only participants who have received it will be able to attend this event.) 

Join us for the Inaugural vLLM Asia Developer Day in Singapore, a premier gathering of AI developers, researchers, and industry experts shaping the future of large language model (LLM) inference. Co-organized by SGInnovate, AMD, Embedded LLM, and the vLLM team, this event is designed to bring together the brightest minds in AI, providing deep technical insights, hands-on learning, and unparalleled networking opportunities. 

The vLLM Asia Developer Day includes three sessions: 

  • Morning: vLLM Asia Community Launch, vLLM Updates & Technical Talks, plus a Lunch & Social Hour 

  • Afternoon: Technical Talks, Deep Dive Sessions, and a Hands-On Workshop 

  • Evening: Social Mixer & Networking 

Course Description & Learning Outcomes

Target audience:  

  • Software Developers, Data Scientists, AI Engineers, and AI Enthusiasts focused on high-performance computing and model deployment. 

  • Particularly relevant for professionals working with LLM infrastructure, inference optimization, PyTorch, vLLM, GPU kernels, OpenAI Triton, ROCm, quantization, RDMA, parallel file systems, and related technologies. 

By attending this full day event, you will learn to:

Morning Session

  • Demystify Advanced LLM Inference  

    Whether you are new to LLMs or an experienced practitioner, explore optimization techniques from fundamentals to cutting-edge implementations like MLA decoding and efficient memory management.  

Afternoon Session

  • Bridge Theory and Practice  

    Connect theoretical concepts with real-world applications through interactive demonstrations and hands-on sessions suitable for all experience levels.  

  • Scale Your LLM Infrastructure  

    Learn practical approaches to deploying and scaling LLM systems, from single-GPU desktop setups to distributed multi-node architectures using modern parallelism strategies.  

  • Optimize for Production  

    Understand the performance-cost tradeoffs in LLM deployment and discover techniques to improve throughput, reduce latency, and maximize hardware efficiency in real-world scenarios.  

  • Unlock the Future of LLMs with Expert Insights 

    Gain insights from industry experts on the current state and future of vLLM, production deployment strategies, and the evolution of AI inference infrastructure, and build a comprehensive understanding of scaling LLMs efficiently. 
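As a taste of the scaling topics above, vLLM's OpenAI-compatible server can be launched with tensor parallelism directly from the command line. A minimal sketch follows; the model name and GPU count are illustrative placeholders, not specifics from the event:

```shell
# Serve a model across 2 GPUs with tensor parallelism
# (model name and GPU count are placeholders).
vllm serve meta-llama/Llama-3.1-8B-Instruct \
    --tensor-parallel-size 2
```

The same `--tensor-parallel-size` idea extends to the multi-node setups discussed in the talks, where tensor and pipeline parallelism are combined.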

Evening Session

  • Join the Open-Source AI Community  

    Whether you're taking your first steps in AI or looking to contribute advanced optimizations, connect with peers and mentors who share your interests and can support your journey. 

Pre-course instructions

For the afternoon session: 

  • Bring a laptop for the hands-on workshop session 

  • SSH client:

    • Windows: PuTTY (putty.org)

    • macOS/Linux: built-in (OpenSSH) 
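Once you receive the workshop server details, connecting from a terminal looks like the sketch below. The hostname and username here are placeholders; the real credentials will be provided at the event:

```shell
# Placeholder host and user — actual details provided at the workshop
ssh workshop-user@gpu-lab.example.com
```

On Windows, enter the same hostname and username into PuTTY's session dialog instead.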

Schedule

Date: 03 Apr 2025, Thursday
Time: 9:00 AM - 9:00 PM (GMT +8:00) Kuala Lumpur, Singapore
Location: 32 Carpenter Street, 059911

Agenda

Time | Activity/Description
9:00am – 9:30am | Doors Open and Check-In Registration. Morning Networking: connect with fellow developers over coffee before we begin
9:30am – 10:00am | vLLM Asia Community Launch – Welcome Address: introduction from the core vLLM team with key announcements about the vLLM Asia Community initiative and its direction
10:00am – 10:30am | Networking Break: continue conversations with peers and speakers over food and refreshments
10:30am – 12:00pm | vLLM Updates and Technical Talks:
  • State of vLLM – Current Status & Future Roadmap by Chen Zhang and Cyrus Leung (30 mins)
  • Running vLLM in Production by Tun Jian Tan (30 mins)
  • Open Q&A with vLLM Experts: Chen Zhang, Cyrus Leung, and Tun Jian Tan (10 mins)
  • Roundtable Panel Discussion: Infrastructure Evolution for AI Inference (20 mins)
12:00pm – 1:00pm | Lunch and Social Hour: a chance to mingle, share experiences, ask questions, and get to know the vLLM community on a personal level
1:00pm – 1:30pm | Doors Open and Check-In Registration
1:30pm – 3:30pm | Technical Talks and Deep-Dive:
  • AMD AI SW Introduction & LLM Optimization with vLLM for AMD GPUs by George Wang
  • DeepSeek R1 Inference Optimizations Case Study by Bruce Xue
  • vLLM ROCm New Features by Haichen Zhang
3:30pm – 4:00pm | Networking Break: continue conversations with peers and speakers over food and refreshments
4:00pm – 6:30pm | Developer's Hands-On Technical Workshop: From Zero to Production by Tun Jian Tan and Dr Pin Siang Tan:
  • Deploy Optimized LLMs with vLLM on AMD (free access to AMD GPUs)
  • Build GenAI use cases on JamAI Base
6:30pm – 9:00pm | Dark Mode Devs: a chance to mingle, share experiences, ask questions, and get to know fellow event attendees on a personal level
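For attendees curious what "Running vLLM in Production" looks like from the client side, vLLM exposes an OpenAI-compatible HTTP API. A minimal sketch of a chat-completions request body is shown below; the model name is a placeholder, not one confirmed for the workshop:

```python
import json

# Build a chat-completions request body for vLLM's
# OpenAI-compatible server (model name is a placeholder).
payload = {
    "model": "meta-llama/Llama-3.1-8B-Instruct",
    "messages": [
        {"role": "user", "content": "Summarize what vLLM does."}
    ],
    "max_tokens": 128,
    "temperature": 0.7,
}

# Serialize to the JSON body that would be POSTed to the server.
body = json.dumps(payload)
print(body)
```

At the workshop, a body like this would be POSTed to the running server's /v1/chat/completions endpoint.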

Skills Covered

PROFICIENCY LEVEL GUIDE
Beginner: Introduce the subject matter without the need to have any prerequisites.
Proficient: Requires learners to have prior knowledge of the subject.
Expert: Involves advanced and more complex understanding of the subject.

  • LLM (Proficiency level: Proficient)
  • AI (Proficiency level: Proficient)

Speakers

Trainer's Profile:

Chen Zhang, PhD Student at Tsinghua University, vLLM Committer, vLLM

Chen Zhang is a PhD student at Tsinghua University and a former visiting scholar at UC Berkeley. Her research focuses on deep learning systems. As a significant contributor to the vLLM project, Chen mainly works on a more flexible KV cache manager. Attend her talk for first-hand information on recent updates to vLLM and insights into its future development.

Trainer's Profile:

Cyrus Leung, PhD Candidate at Hong Kong University of Science and Technology, vLLM Maintainer, vLLM

Cyrus Leung is a PhD candidate at HKUST and a key contributor to the vLLM project, authoring 300+ PRs and over 100k lines of code, and triaging more than 950 GitHub issues. As co-lead of the multi-modality workstream, Cyrus has integrated numerous generative and embedding MLLMs into vLLM and helped extend the OpenAI-compatible server with multi-modal capabilities. He currently heads the refactoring effort for the V1-compatible multi-modal processor.

Trainer's Profile:

Tun Jian Tan, LLM Principal Engineer, vLLM Committer, vLLM

Tun Jian Tan is a key contributor to the advancement of open-source LLM inference on AMD platforms. His work on the vLLM project, including leading the very first PR to expand support to AMD and integrating PTPC-FP8 quantization, has dramatically improved performance on AMD ROCm. He is also a key contributor to the vLLM blog, sharing best practices for maximizing efficiency on the AMD MI300X. Beyond vLLM, Tun Jian collaborated with the LinkedIn Liger-Kernel team to bring significant performance improvements to LLM training on AMD GPUs.

Trainer's Profile:

George Wang, Director, AI Software Product Engineering, AMD

George Wang is Director of AI Software Product Engineering in the AI Group at AMD, where he leads a talented team for AI software solutions, product management, and end-to-end performance optimizations across Data Center, Client, and Edge/Endpoint applications, driving cutting-edge AI capabilities for AMD's customers, developers, and the broader community. George has been issued two U.S. technical patents, holds a master's degree, and has over 20 years of experience in the technology industry.

Trainer's Profile:

Bruce Xue, AI Product Application Engineer, AMD

Bruce Xue is an AI Product Application Engineer at AMD with expertise in developing system-level applications on AMD Instinct GPUs and deploying inference solutions for large language models, specifically optimizing high-performance libraries and integrating essential operators within the ROCm ecosystem. He is proficient in adapting inference frameworks, including vLLM and SGLang, to harness the full computational power of AMD GPUs for the seamless deployment of large language models.

Trainer's Profile:

Dr Pin Siang Tan, Co-Founder and CTO, Embedded LLM

Dr. Tan Pin Siang, Co-Founder and Chief Technology Officer of Embedded LLM, has over 14 years of in-depth, hands-on experience, spanning deep learning, Generative AI, computer vision, and geospatial analysis. He leads Embedded LLM's technical direction, focusing on optimizing Large Language Model (LLM) inference for unparalleled performance and scalability. His extensive project portfolio showcases his ability to leverage cutting-edge AI to solve complex, real-world problems for government agencies and major corporations across diverse sectors. Dr. Tan is deeply committed to advancing the field, making sophisticated AI solutions practical and accessible, and actively shaping the future of how AI is deployed and utilized.

Trainer's Profile:

Haichen Zhang, Senior PM, AI Product Marketing, AMD

Haichen Zhang is the Senior PM for AMD AI Product Marketing, specializing in accelerating training and inference for large language models, recommender systems, computer vision (CV), and natural language processing (NLP) tailored to internet customers. Prior to joining AMD, Haichen worked at NVIDIA as a Machine Learning Architect for four years.

Partners

AMD, Embedded LLM, vLLM