Building and Deploying Large Language Model Applications Efficiently and Verifiably | Ying Sheng
Computer Science & Engineering Building (CSE), Room 1242, 3234 Matthews Ln, La Jolla, CA, United States

The applications of large language models (LLMs) are increasingly complex and diverse, necessitating efficient and reliable frameworks for building and deploying them. In this talk, I will begin with algorithms and systems for serving LLMs for everyone (FlexGen, S-LoRA, VTC), highlighting the growing trend toward personalized LLM services. My work addresses the need to run LLMs locally for isolated, individual use, and it tackles efficiency and service fairness when resources must be shared among many users. Once deployment is efficient, a primary concern is the reliability of generation. The second part of this talk addresses this issue through verifiable code generation: I adopt tools from formal verification to help LLMs generate correctness certificates alongside other artifacts (Clover). Finally, I will touch on future research avenues, such as integrating formal methods with LLMs and developing programming systems for generative AI.