Lightweight LLM Reproduction and Multimodal Extension

This project reproduces the full pipeline of a compact decoder-only large language model (LLM), from data preparation through training to deployment. It covers tokenizer construction, pretraining data organization, supervised fine-tuning (SFT), LoRA adaptation, reinforcement learning (RL) training, sampling-based decoding, KV caching, and inference deployment.
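KV caching is listed above without further detail. The following is a minimal sketch of the idea in pure Python. It makes several simplifying assumptions for illustration: a single attention head, and queries, keys, and values supplied directly rather than computed by learned projections. At each decoding step, the step's key and value are appended to a cache once, and the new query attends over everything cached so far. The cached path gives the same outputs as recomputing causal attention over the full prefix at every step. The names `KVCache`, `attend`, `decode_with_cache`, and `decode_without_cache` are hypothetical and are not taken from this project.

```python
import math
from typing import List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def softmax(xs: Vector) -> Vector:
    # Subtract the max before exponentiating for numerical stability.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attend(q: Vector, keys: List[Vector], values: List[Vector]) -> Vector:
    # Scaled dot-product attention of one query over the given keys/values.
    scale = 1.0 / math.sqrt(len(q))
    weights = softmax([dot(q, k) * scale for k in keys])
    dim = len(values[0])
    return [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]


class KVCache:
    """Append-only store of past keys and values for one attention head."""

    def __init__(self) -> None:
        self.keys: List[Vector] = []
        self.values: List[Vector] = []

    def append(self, k: Vector, v: Vector) -> None:
        self.keys.append(k)
        self.values.append(v)

    def __len__(self) -> int:
        return len(self.keys)


def decode_without_cache(qs, ks, vs) -> List[Vector]:
    # Step t re-gathers keys/values for positions 0..t (causal mask),
    # as a decoder would if it re-ran the whole prefix every step.
    return [attend(qs[t], ks[: t + 1], vs[: t + 1]) for t in range(len(qs))]


def decode_with_cache(qs, ks, vs) -> List[Vector]:
    cache = KVCache()
    outputs = []
    for q, k, v in zip(qs, ks, vs):
        cache.append(k, v)  # each key/value is stored exactly once
        outputs.append(attend(q, cache.keys, cache.values))
    return outputs


if __name__ == "__main__":
    qs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    ks = [[0.5, 0.1], [0.2, 0.9], [0.7, 0.7]]
    vs = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
    print(decode_without_cache(qs, ks, vs) == decode_with_cache(qs, ks, vs))  # prints True
```

In a real decoder, the saving comes from not re-running the key/value projections for the prefix at every step. The attention computation at each step still grows linearly with the number of cached positions.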

The project also extends the text-only pipeline toward speech-vision-language multimodal modeling, including alignment of a visual encoder with the LLM, organization of image-text instruction data, and prototypes for visual question answering (VQA) and image captioning.
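The visual-encoder-to-LLM alignment and the instruction-data organization mentioned above are not specified further here. One common design works in three steps. First, a learned layer projects each visual patch feature into the LLM embedding space. Second, the resulting visual tokens are placed before the text embeddings. Third, loss is computed only on answer tokens. This LLaVA-style layout is an assumption, not a description of this project's method.

The sketch below uses a single linear projector and plain Python lists. The functions `build_example` and `linear`, and the toy shapes, are hypothetical.

```python
from typing import List, Tuple

Vector = List[float]
# -100 matches the default ignore_index of PyTorch's CrossEntropyLoss.
IGNORE_INDEX = -100


def linear(x: Vector, weight: List[Vector], bias: Vector) -> Vector:
    # weight has shape [out_dim][in_dim]; returns weight @ x + bias.
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weight, bias)]


def build_example(
    patch_features: List[Vector],
    weight: List[Vector],
    bias: Vector,
    embed_table: List[Vector],
    prompt_ids: List[int],
    answer_ids: List[int],
) -> Tuple[List[Vector], List[int]]:
    # 1) Project each visual-encoder patch feature into the LLM embedding space.
    visual_tokens = [linear(p, weight, bias) for p in patch_features]
    # 2) Look up text embeddings for the prompt and answer tokens.
    text_tokens = [embed_table[t] for t in prompt_ids + answer_ids]
    # 3) Image tokens first, then text (an assumed layout).
    inputs = visual_tokens + text_tokens
    # 4) Supervise only answer tokens; image and prompt positions are ignored.
    #    Labels stay aligned with inputs; the next-token shift is assumed to
    #    happen inside the loss, as Hugging Face causal LM models do.
    labels = [IGNORE_INDEX] * (len(visual_tokens) + len(prompt_ids)) + list(answer_ids)
    return inputs, labels


if __name__ == "__main__":
    patches = [[1.0, 0.0, 2.0], [0.0, 1.0, 1.0]]  # 2 patches, vision dim 3
    W = [[1.0, 0.0, 0.0], [0.0, 1.0, 1.0]]        # projects 3 -> 2 (LLM dim)
    b = [0.0, 0.5]
    embed = [[0.1, 0.1], [0.2, 0.2], [0.3, 0.3], [0.4, 0.4]]  # vocab of 4
    inputs, labels = build_example(patches, W, b, embed, prompt_ids=[1, 2], answer_ids=[3])
    print(labels)  # [-100, -100, -100, -100, 3]
```

A linear projector is the simplest choice. A small MLP or a query-based resampler are common alternatives when more capacity or fewer visual tokens are needed.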