Lequn Chen || abcdabcd987
Journey to 2-second Inter-node RL Weight Transfer
Harnessing 3200Gbps Network (15): Lazy Posting
Harnessing 3200Gbps Network (14): Batch Posting
Harnessing 3200Gbps Network (13): State Sharding
Harnessing 3200Gbps Network (12): CPU Core Pinning
Harnessing 3200Gbps Network (11): Multi-threading
Harnessing 3200Gbps Network (10): Pre-benchmark Warmup
Harnessing 3200Gbps Network (9): Using 32 Network Cards
Harnessing 3200Gbps Network (8): Bus Topology
Harnessing 3200Gbps Network (7): Queuing and Benchmark
Harnessing 3200Gbps Network (6): GPUDirect RDMA WRITE
Harnessing 3200Gbps Network (5): Bidirectional SEND and RECV
Harnessing 3200Gbps Network (4): Unidirectional SEND and RECV
Harnessing 3200Gbps Network (3): libfabric
Harnessing 3200Gbps Network (2): High-Performance Network System Design Philosophy
Harnessing 3200 Gbps Network (1): RDMA and EFA
Harnessing 3200 Gbps Network: A Journey with RDMA, EFA, and libfabric
Potentials of Multitenancy Fine-Tuned LLM Serving
Dissecting Batching Effects in GPT Inference
How we discovered why C++ exceptions disappear in stack trace
Coq Tricks for Beginners with Too Many Examples