NanoGPT Speedrun Living Worklog
Reducing VRAM Footprint in PPO and GRPO Using Selective Log-Softmax
An Extension to BADGE Active Learning for Variable-Sized Batches
Direct Preference Optimization Explained In-depth