AI Developer
Senior Gen AI Engineer
Senior Generative AI Engineer with 6 years of experience in NLP, including LLM post-training, reward modeling, and production AI agent systems.
Currently technical owner of the full post-training pipeline and agent architecture for an emotional-companion AI serving 12M+ users, encompassing SFT data generation, reward model development, and reinforcement learning. Background includes domain pre-training at scale (Xiaomi, 100M+ users) and end-to-end dialogue systems for automotive AI (XPeng Inc.).
Experience: 6 years
Yearly salary: $110,000
Hourly rate: $60
Nationality: 🇨🇳 China
Residency: 🇨🇦 Canada
Experience
Senior Generative AI Engineer
Cylingo group 2024 - 2026
Emotional-Companion AI Agent
Solely led and implemented the full post-training pipeline, agent architecture, and infrastructure for a production emotional-companion AI serving 12M+ users within 10 months of launch. Improved Day-1 user retention from 25% to 40% (+60% relative).
Post-Training Pipeline (Qwen3-235B-A22B: MoE, 22B active parameters)
Independently designed and executed the complete post-training pipeline: supervised fine-tuning (SFT) for emotional-companion dialogue, followed by reinforcement learning (GSPO) targeting instruction following, format adherence, preference alignment, and agentic capabilities.
Built a three-stage SFT data generation pipeline that produced 2B+ high-quality tokens for SFT: (1) off-policy distillation from Claude Sonnet 4 for cold start; (2) a User Simulation Model trained on large-scale online chat logs to synthetically generate diverse and edge-case user queries, covering long-tail emotional scenarios that were underrepresented in the distilled data; (3) rejection sampling guided by a hybrid reward model. The synthetic data generation and rejection sampling stages improved Day-1 retention from 25% to 40%.
Developed a hybrid reward signal combining a rule-based reward model (LLM-as-judge with structured prompt engineering) and an SFT-trained reward model, achieving 90% agreement with human annotators and 93% accuracy on online preference data. The hybrid approach mitigated each model's blind spots: the LLM-as-judge excelled at format and instruction-following scoring, while the SFT-trained model was aligned to online user preferences. The hybrid signal served as the reward for both rejection sampling in SFT and the GSPO training loop.
Selected GSPO over GRPO and PPO for training stability and simplicity at MoE scale.
Agent Architecture
Designed a memory-augmented multi-session system persisting user preferences, emotional context, and conversation summaries across sessions. Improved the user Like-to-Dislike ratio from 1.05 to 1.5 (+43% relative) and enabled long-term relationship continuity without retraining.
Implemented a routing layer using a lightweight 3B LLM as an intent classifier, dispatching across emotional-companion, tool-use, and memory-retrieval pathways with 97% routing accuracy.
Built a RAG-based memory retrieval pipeline that retrieves relevant user memories (preferences, emotional history, past conversation context) based on real-time user input, providing the companion model with personalized long-term context for each conversation turn.
Context Management
System prompt caching: designed a dynamic/static boundary, with static content (persona, hard constraints) placed at the prompt prefix for KV cache reuse, achieving a 20% reduction in time to first token (TTFT).
Context structure: engineered to exploit LLM positional bias, with critical information positioned at the context boundaries (beginning and end), where attention is strongest; tool definitions are loaded dynamically based on task phase to minimize token waste.
Rolling summarization: long-horizon dialogue compressed via rolling summaries.
Memory offloading: per-user episodic memories stored externally and retrieved via hybrid retrieval (embedding similarity + LLM-based reranking).
Senior Machine Learning Engineer
XPeng 2021 - 2024
LLM Integration for Digital Cockpit System (deployed across more than 400,000 vehicles)
Conducted systematic SFT experiments across multiple base models and scales (Llama, Alpaca, GLM, Baichuan, Qwen) during the preliminary phase of LLM adoption for the digital cockpit system, evaluating model performance, latency, and task completion accuracy. The final model achieved a 96% task completion rate, a 10% improvement over the previous rule-based system.
Task-Oriented End-to-End Dialogue System
Migrated the in-car dialogue system from a traditional pipeline architecture (separate NLU → DM → NLG) to an end-to-end framework, reducing new business onboarding time by 30% and pipeline maintenance overhead by 15%.
Built end-to-end action prediction and named entity recognition (NER) modules, achieving 95% accuracy and 90% macro-F1 on in-car dialogue benchmarks.
Designed a nested NER model using multi-head labeling with a biaffine transformation, resolving overlapping entity extraction that the previous BIO-tagging approach could not handle. Improved NER macro-F1 from 50% to 80% on nested entity cases, with the largest gains on minority classes.
NLP Engineer
Xiaomi 2020 - 2021
Domain Pre-Training for Xiaomi AI Assistant
Conducted domain-adaptive pre-training on BERT using 120M+ real user query samples from Xiaoai (Xiaomi's AI assistant, serving 100M+ users), applying MLM and Whole Word Masking with dynamic masking strategies. The domain pre-trained model delivered consistent improvements across downstream tasks, improving intent classification accuracy by 3% and NER F1 score by 2%, and was adopted as the base model for the Xiaoai NLU pipeline.
Skills
machine-learning
nlp
pytorch
tensorflow
ai
english
chinese-mandarin
french