RSS.Social

Eugene Yan

Posts

Evaluating Long-Context Question & Answer Systems

AI Engineer 2025 - Improving RecSys & Search with LLM techniques

Exceptional Leadership: Some Qualities, Behaviors, and Styles

Building News Agents for Daily News Recaps with MCP, Q, and tmux

An LLM-as-Judge Won't Save The Product—Fixing Your Process Will

Frequently Asked Questions about My Writing Process

NVIDIA GTC 2025 - Building LLM-Powered Applications

Improving Recommendation Systems & Search in the Age of LLMs

Building AI Reading Club: Features & Behind the Scenes

2024 Year in Review

A Spark of the Anti-AI Butlerian Jihad (on Bluesky)

Seemingly Paradoxical Rules of Writing

How to Run a Weekly Paper Club (and Build a Learning Community)

My Minimal MacBook Pro Setup Guide

39 Lessons on Building ML Systems, Scaling, Execution, and More

AlignEval: Building an App to Make Evals Easy, Fun, and Automated

Weights & Biases LLM-Evaluator Hackathon - Hackathon Judge

Building the Same App Using Various Web Frameworks

Evaluating the Effectiveness of LLM-Evaluators (aka LLM-as-Judge)

How to Interview and Hire ML/AI Engineers

AI Engineer 2024 Keynote - What We Learned from a Year of LLMs

Netflix PRS 2024 - Applying LLMs to Recommendation Experiences

Prompting Fundamentals and How to Apply them Effectively

What We've Learned From A Year of Building with LLMs

Building an AI Coach to Help Tame My Monkey Mind

Task-Specific LLM Evals that Do & Don't Work

Don't Mock Machine Learning Models In Unit Tests

How to Generate and Use Synthetic Data for Finetuning

Language Modeling Reading List (to Start Your Paper Club)

2023 Year in Review

Push Notifications: What to Push, What Not to Push, and How Often

Out-of-Domain Finetuning to Bootstrap Hallucination Detection

Reflections on AI Engineer Summit 2023

AI Engineer 2023 Keynote - Building Blocks for LLM Systems

Evaluation & Hallucination Detection for Abstractive Summaries

How to Match LLM Patterns to Problems

Patterns for Building LLM-based Systems & Products

Obsidian-Copilot: An Assistant for Writing & Reflecting

Some Intuition on Attention and the Transformer

Open-LLMs - A list of LLMs for Commercial Use

Interacting with LLMs with Minimal Chat

More Design Patterns For Machine Learning Systems

Raspberry-LLM - Making My Raspberry Pico a Little Smarter

Experimenting with LLMs to Research, Reflect, and Plan

LLM-powered Biographies

How to Write Data Labeling/Annotation Guidelines

Content Moderation & Fraud Detection - Patterns in Industry

Mechanisms for Effective Technical Teams

Mechanisms for Effective Machine Learning Projects

Goodbye Roam Research, Hello Obsidian

What To Do If Dependency Teams Can’t Help

2022 in Review & 2023 Goals

Autoencoders and Diffusers: A Brief Comparison

Text-to-Image: Diffusion, Text Conditioning, Guidance, Latent Space

RecSys 2022: Recap, Favorite Papers, and Lessons

RecSys 2022 Keynote - Is the Juice Worth the Squeeze?

Writing Robust Tests for Data & Machine Learning Pipelines

Simplicity is An Advantage but Sadly Complexity Sells Better

Uncommon Uses of Python in Commonly Used Libraries

Why You Should Write Weekly 15-5s

Design Patterns in Machine Learning Code and Systems

What I Wish I Knew About Onboarding Effectively

Bandits for Recommender Systems

How to Measure and Mitigate Position Bias

Counterfactual Evaluation for Recommendation Systems

Traversing High-Level Intent and Low-Level Requirements

Data Science Project Quick-Start

Mailbag: How to Define a Data Team's Vision and Roadmap

Red Flags to Look Out for When Joining a Data Team

How to Keep Learning about Machine Learning

The Data Scientist Show - Building end-to-end ML systems

2021 Year in Review

Informal Mentors Grew into ApplyingML.com!

5 Lessons I Learned from Writing Online (Guest post by Susan Shu)

What I Learned from Writing Online - For Fellow Non-Writers

RecSys 2021 - Papers and Talks to Chew on

The First Rule of Machine Learning: Start without Machine Learning

MLOps Community - System Design for RecSys & Search

Reinforcement Learning for Recommendations and Search

Amazon Science - Eugene Yan and the Art of Writing about Science

Bootstrapping Labels via ___ Supervision & Human-In-The-Loop

Mailbag: How to Bootstrap Labels for Relevant Docs in Search

SF Big Analytics - System Design for RecSys & Search

Influencing without Authority for Data Scientists

System Design for Recommendations and Search

Patterns for Personalization in Recommendations and Search

Towards Data Science - Author Spotlight with Eugene Yan

The Metagame of Applying Machine Learning

Search: Query Matching via Lexical, Graph, and Embedding Methods

My Impostor Syndrome Stories (Guest Post by Susan Shu)

How to Live with Chronic Imposter Syndrome

Planning Your Career: Values and Superpowers

Bukalapak - Fireside Chat with the Data Science team

TalkPython - What ML can Teach Us About Life

Choosing Problems in Data Science and Machine Learning

Seven Habits that Shaped My Last Decade

How to Write Design Docs for Machine Learning Systems

How to Write Better with The Why, What, How Framework

Feature Stores: A Hierarchy of Needs

How to Win a Data Hackathon (Hacklytics 2021)

DataTalksClub - Building an ML System; Behind the Scenes

Growing and Running Your Data Science Team

You Don't Really Need Another MOOC

DataTalksClub - The Importance Of Writing In A Tech Career

Mailbag: How to Get Experienced DS Resume Noticed by Recruiters?

Real-time Machine Learning For Recommendations

2021 Roadmap: Sharing, Helping, and Living More

2020 Retrospective: New Country, New Role, New Habit

Catch the Flying Daggers

How I’m Reflecting on 2020 and Planning for 2021

Alexey Grigorev on His Career, Data Science, and Writing

Mailbag: What's the Architecture for your Blog?

What Machine Learning Can Teach Us About Life - 7 Lessons

How to Prevent or Deal with a Data Science Role or Title Mismatch

Applied / Research Scientist, ML Engineer: What’s the Difference?

Chip Huyen on Her Career, Writing, and Machine Learning

Data Discovery Platforms and Their Open Source Solutions

Why I switched from Netlify back to GitHub Pages

Why Have a Data Science Portfolio and What It Shows

How to Install Google Scalable Nearest Neighbors (ScaNN) on Mac

How Prototyping Can Help You to Get Buy-In

Is Writing as Important as Coding?

RecSys 2020: Takeaways and Notable Papers

Appreciating the Present

CareerFair - Day-to-day as an Applied Scientist at Amazon

Routines and Tools to Optimize My Day (Guest Post by Susan Shu)

How to Accomplish More with Less - Useful Tools & Routines

Migrating Site Comments to Utterances

How to Test Machine Learning Code and Systems

Mailbag: Parsing Fields from PDFs—When to Use Machine Learning?

Datacast Podcast - Effective Data Science with Eugene Yan

How Reading Papers Helps You Be a More Effective Data Scientist

Mailbag: I'm Now a Senior DS—How should I Approach this?

Embrace Beginner's Mind; Avoid The Wrong Way To Be An Expert

NLP for Supervised Learning - A Brief Survey

Unpopular Opinion: Data Scientists Should be More End-to-End

Adding a Checkbox & Download Button to a FastAPI-HTML app

What I Did Not Learn About Writing In School

Georgia Tech's OMSCS FAQ (based on my experience)

How to Set Up a HTML App with FastAPI, Jinja, Forms & Templates

Why You Need to Follow Up After Your Data Science Project

What I Do During A Data Science Project To Deliver Success

How to Update a GitHub Profile README Automatically

The 85% Rule: When Giving It Your 100% Gets You Less than 85%

My Notes From Spark+AI Summit 2020 (Application-Specific Talks)

My Notes From Spark+AI Summit 2020 (Application-Agnostic Talks)

How to Set Up a Python Project For Automation and Collaboration

Mailbag: Qns on the Intersection of Data Science and Business

Why Are My Airflow Jobs Running “One Day Late”?

What I Do Before a Data Science Project to Ensure Success

What I Love about Scrum for Data Science

How to Apply Crocker's Law for Feedback and Growth

A Practical Guide to Maintaining Machine Learning in Production

6 Little-Known Challenges After Deploying Machine Learning

How to Write: Advice from David Perell and Sahil Lavingia

A Hackathon Where the Dinkiest Idea Won. Why?

Serendipity: Accuracy’s Unpopular Best Friend in Recommenders

How to Give a Kick-Ass Data Science Talk

Commando, Soldier, Police and Your Career Choices

Stop Taking Regular Notes; Use a Zettelkasten Instead

Writing is Learning: How I Learned an Easier Way to Write

Simpler Experimentation with Jupyter, Papermill, and MLflow

My Journey from Psych Grad to Leading Data Science at Lazada

DataScience SG Meetup - RecSys, Beyond the Baseline

Beating the Baseline Recommender with Graph & NLP in Pytorch

Building a Strong Baseline Recommender in PyTorch, on a Laptop

OMSCS CS6200 (Introduction to OS) Review and Tips

DataScience SG x ODSC Meetup - Applying ML to Healthcare

OLX Prod Tech 2019 Keynote - Asia's Tech Giants & SuperApps

OMSCS CS6750 (Human Computer Interaction) Review and Tips

Goodbye Wordpress, Hello Jekyll!

OMSCS CS6440 (Intro to Health Informatics) Review and Tips

OMSCS CS7646 (Machine Learning for Trading) Review and Tips

What does a Data Scientist really do?

DATAx - A Production ML system for SEA's Biggest Hospital Group

Data Science and Agile (Frameworks for Effectiveness)

Data Science and Agile (What Works, and What Doesn't)

DataScience SG Meetup - Panel On the Different Roles in Data

OMSCS CS6601 (Artificial Intelligence) Review and Tips

GovTech Conference - Data Science and Agile—Can or Not?

OMSCS CS6460 (Education Technology) Review and Tips

OMSCS CS7642 (Reinforcement Learning) Review and Tips

Big Data & Analytics Summit - Data Science Challenges @ Lazada

Building a Strong Data Science Team Culture

INSEAD Lunchtime Talks - How Lazada uses Data

OMSCS CS7641 (Machine Learning) Review and Tips

My first 100 days as Data Science Lead

SMU - What is Data Analytics and How do I get into it?

OMSCS CS6300 (Software Development Process) Review and Tips

Tech in Asia - My Journey in Data Science and Advice for others

SMU Masters in IT - How to get started in Data Science

How to get started in Data Science

OMSCS CS6476 (Computer Vision) Review and Tips

One way to help a data science team innovate successfully

Product Categorization API Part 3: Creating an API

Image search is now live!

Product Classification API Part 2: Data Preparation

Strata x Hadoop 2016 - How Lazada Ranks Products

Image classification API is now live!

I'm going back to school

SortMySkills is now live!

Product Classification API Part 1: Data Acquisition

Thoughts on Functional Programming in Scala Course (Coursera)

First post!

DataKind Singapore’s Latest Project Accelerator

DataScience SG Meetup - How we got top 3% in Kaggle