RSS.Social

LessWrong

follow: @[email protected]

Posts

Monthly Roundup #37: December 2025

Information in circulation is self-organised critical. Small changes in environment can make large, discontinuous changes in the information space.

The Fly Farm

New 80k problem profile: extreme power concentration

The point of view of the universe

The Fantastic Piece of Tinfoil in my Wallet

AISN #66: Evaluating Frontier Models, New Gemini and Claude, Preemption is Back

Annals of Counterfactual Han

Does dissolving Newcomb's paradox matter?

AI performance has surpassed a human baseline on ARC-AGI-2

Designing the World’s Safest AI based on Morality Models

ASI Already Knows About Torture - In Defense of Talking Openly About S-Risks

Cognitive Tech from Algorithmic Information Theory

Announcing Progress in Medicine, a high school summer career exploration program

Weird Generalization & Inductive Backdoors

The tree, the fly, the ant, the dog, the farmer and the businessman

Ships in the Night – A Short Story

If Anyone Builds It Everyone Dies, another semi-outsider review

North Sentinelese Post-Singularity

Flock – work in public with friends (beta testers wanted)

AI #146: Chipping In

Sea snails in a cocaine vaccine

Resources for parents

Steganographic Chains of Thought Are Low-Probability but High-Stakes: Evidence and Arguments

Brain-inspired LLM alignment

Seven Perspectives on LLMs

MIRI Comms is hiring

Some evidence against the idea strange CoT stems from incentives to compress language

Follow-through on Bay Solstice

Rock Paper Scissors is Not Solved, In Practice

Childhood and Education #15: Got To Get Out

Apply to ESPR & PAIR 2026, Rationality and AI Camps for Ages 16-21

Evaluation as a (Cooperation-Enabling?) Tool

Consider calling the NY governor about the RAISE Act

No ghost in the machine

Most Algorithmic Progress is Data Progress [Linkpost]

Fibonacci Holds Information

Register for SPAR Demo Day on Saturday, Dec 13

We don't know what most microbial genes do. Can genomic language models help?

Artifacts I'd like to try

AI Safety – Analyse Affordances

Caesar Derangement Syndrome

How to have a debate on this platform?

Living on a ball of hair

The funding conversation we left unfinished

Do you expect the first AI to cross NY's RAISE Act's "Critical Harm" threshold to be contained?

TT Self Study Journal # 5

Lorxus Does Halfhaven: 11/29, 11/30, Highlights, Postmortem

Tristan's list of things to write

Tate Modern 2150

Selling H200s to China Is Unwise and Unpopular

Non-optimized beauty

Localizing Finetuned Information in Transformers with Dynamic Weight Grafting

[paper] Auditing Games for Sandbagging

Insights into Claude Opus 4.5 from Pokémon

Gradual Disempowerment Monthly Roundup #3

Every house has a chemistry lab

Ways we can fail to answer

Do you take joy in effective altruism?

My experience running a 100k

Seriously, use text expansions

The reverse sear as a worthwhile life skill

Every point of intervention

D&D Sci Thanksgiving: the Festival Feast Evaluation & Ruleset

Towards a Categorization of Adlerian Excuses

A Falsifiable Causal Argument for Substrate Independence

Gödel's Ontological Proof

Prompting Models to Obfuscate Their CoT

High-level approaches to rigor in interpretability

Human Dignity: a review

A few quick thoughts on measuring disempowerment

How Stealth Works

I have hope

"Carve Reality at Its Joints" is poetry

The Possibility of an Ongoing Moral Catastrophe

Zen Wisdom, Diffused

[Paper] Does Self-Evaluation Enable Wireheading in Language Models?

Algorithmic thermodynamics and three types of optimization

Little Echo

From Barriers to Alignment to the First Formal Corrigibility Guarantees

Scaling what used not to scale

The effectiveness of systematic thinking

I said hello and greeted 1,000 people at 5am this morning

Your Digital Footprint Could Make You Unemployable

2025 Unofficial LessWrong Census/Survey

AI in 2025: gestalt

[Linkpost] Theory and AI Alignment (Scott Aaronson)

Lawyers are uniquely well-placed to resist AI job automation

How a bug of AI hardware may become a feature for AI governance

Karlsruhe - If Anyone Builds It, Everyone Dies

Eliezer's Unteachable Methods of Sanity

Ordering Pizza Ahead While Driving

Existential despair, with hope

I Need Your Help

Crazy ideas in AI Safety part 1: Narrow-Defined Communication

The corrigibility basin of attraction is a misleading gloss

LW Transcendence

The Adequacy of Class Separation

Answering a child's questions

AI Mood Ring: A Window Into LLM Emotions

Critical Meditation Theory

Tools, Agents, and Sycophantic Things

What Happens When You Train Models on False Facts?

why america can't build ships

Reasons to care about Canary Strings

An AI-2027-like analysis of humans' goals and ethics with conservative results

Management of Substrate-Sensitive AI Capabilities (MoSSAIC) Part 3: Resolution

Announcing: Agent Foundations 2026 at CMU

DeepSeek v3.2 Is Okay And Cheap But Slow

Who is AGI for, and who benefits from AGI?

Journalist's inquiry into a core organiser breaking his nonviolence commitment and leaving Stop AI

Try Training SAEs with RLAIF

On the Aesthetic of Wizard Power

Will misaligned AIs know that they're misaligned?

Thresholding

An Abstract Arsenal: Future Tokens in Claude Skills

Cross Layer Transcoders for the Qwen3 LLM Family

Management of Substrate-Sensitive AI Capabilities (MoSSAIC) Part 2: Conflict

Livestream for Bay Secular Solstice

Center on Long-Term Risk: Annual Review & Fundraiser 2025

Power Overwhelming: dissecting the $1.5T AI revenue shortfall

AI #145: You’ve Got Soul

Is Friendly AI an Attractor? Self-Reports from 22 Models Say Probably Not

Modelling Trajectories - Interim results

Emergent Machine Ethics: A Foundational Research Framework for the Intelligence Symbiosis Paradigm

Help us find founders for new AI safety projects

Do we have terminology for "heuristic utilitarianism" as opposed to classical act utilitarianism or formal rule utilitarianism?

What is the most impressive game an LLM can implement from scratch?

Epistemology of Romance, Part 2

Front-Load Giving Because of Anthropic Donors?

Center for Reducing Suffering (CRS) S-Risk Introductory Fellowship applications are open!

An AI Capability Threshold for Funding a UBI (Even If No New Jobs Are Created)

Categorizing Selection Effects

Blog post: how important is the model spec if alignment fails?

Beating China to ASI

Management of Substrate-Sensitive AI Capabilities (MoSSAIC) Part 1: Exposition

Embedded Universal Predictive Intelligence

Human-AI identity coupling is emergent

On Dwarkesh Patel’s Second Interview With Ilya Sutskever

A Critique of Yudkowsky’s Protein Folding Heuristic

Recollection of a Dinner Party

Formalizing Newcombian problems with fuzzy infra-Bayesianism

Proof section to formalizing Newcombian problems with fuzzy infra-Bayesianism

Human art in a post-AI world should be strange

It's tricky to tell what % of the economy the state controls

I'm Skeptical of and Confused About The Multiplier in Macroeconomics

The AI Society Lens

Effective Pizzaism

LLMs Don't Have Superhuman Research Taste: Tastybench

AI Safety at the Frontier: Paper Highlights of November 2025

Five ways AI can tell you're testing it

Why Moloch is actually the God of Evolutionary Prisoner’s Dilemmas

Reward Mismatches in RL Cause Emergent Misalignment

Sci.STEPS invites mentee applications

How Claude Opus 4.5 describes its experience of various concepts

Safety Cases Explained: How to Argue an AI is Safe

The Hidden Asymmetry in Personal Preparedness: Early Costs, Late Losses

Halfhaven Digest 6 + Retrospective

Metric-haven (quick stats on how Inkhaven impacted LessWrong)

MIRI’s 2025 Fundraiser

Everyone Can Be High Status In Utopia

How to Write Fast, Weird, and Well

The Overthinking Problem in Stroke Recovery

Future Proofing Solstice

High learning rate can lead to depression

Announcing: OpenAI's Alignment Research Blog

AI Mental Health Chatbots for Low-Resource Settings: A Prioritization Framework

Which planet is closest to the Earth, and why is it Mercury?

How middle powers may prevent the development of artificial superintelligence

Becoming a Chinese Room

Well, Seasons Greatings Everyone! [Short Fiction]

23 thoughts on Artificial Intelligence (2025)

Lorxus Does Halfhaven: 11/22~11/28

Would ASI development in non-party states undermine a nonproliferation agreement?

Claude Opus 4.5 Is The Best Model Available

HERMES: Towards Efficient and Verifiable Mathematical Reasoning in LLMs

Alignment as an Evaluation Problem

Interview: What it's like to be a bat

A Statistical Analysis of Inkhaven

How To Deploy an AI

10 Aphorisms from 𝘛𝘩𝘦 𝘉𝘦𝘥 𝘰𝘧 𝘗𝘳𝘰𝘤𝘳𝘶𝘴𝘵𝘦𝘴

Inkhaven: 30 Days, 30 Memories

November Retrospective

Video quality is mainly not determined by resolution

Models of Mind: Are You Mono- or Multi-Threaded?

Inverting qualia with group theory

What makes a good egg?

Soul-Whore

Choose Your Failure Modes

Human Agency in a Superintelligent World

Inkhaven Retrospective

Hyperstition

Explosive Skill Acquisition

The Comet King (solstice reading)

My Three Years Wasted by University

A Blogger's Guide To The 21st Century

Alphabetical Conundra Vol 2.

Unconferences: A Better Way to Run Meetups

Ben's 10 Tips for Event Feedback Forms

Does SI Disfavor Computationalism?

14 Concerns About Immortality

Flashcards are worth it

Insulin Resistance and Glycemic Index

The Moonrise Problem

I wrote a blog post every day for a month, and all I got was this lousy collection of incoherent ramblings

Uncertain Updates: November 2025

Reflections on my woo-aversion

Manifest x DC After Action Report

Why do some people prefer gifts to money?

Silicon Morality Plays: The Hyperstition Progress Report

Slop and Beauty and Infinite Power

Scientists make sense of shapes in the minds of the models

Can We Secure AI With Formal Methods? November-December 2025

The Joke

I gave LLMs emotional damage

A Harried Meeting

Change My Mind: The Rationalist Community is a Gift Economy

Epistemology of Romance, Part 1

Unless its governance changes, Anthropic is untrustworthy

Claude 4.5 Opus' Soul Document

El Dorado

The Missing Genre: Heroic Parenthood - You can have kids and still punch the sun

The NPC → MC Spectrum

Symbolic Regression, Sparsification, and Kolmogorov-Arnold Networks

Not A Love Letter, But A Thank You Letter

College life with short AGI timelines

Why Abortion Prohibition is Dysgenic

Opus 4.5 is funny

Some notes on supply, demand, cost and utility

Drugs Aren't A Moral Category

Tests of LLM introspection need to rule out causal bypassing

‘Safe by Design’: A Speculative Paradigm for Positive AI Development

Claude Opus 4.5: Model Card, Alignment and Safety

The Best Lack All Conviction: A Confusing Day in the AI Village

Should you work with evil people?

Seemingly Irrational Voting

Where I Am Donating in 2025

The Responder

A Taxonomy of Bugs (Lists)

You Are Much More Salient To Yourself Than To Everyone Else

The Power of Purple: DnD Design Lessons from a 4-Year-Old

Bitter Lessons from Distillation Robustifies Unlearning

A Thanksgiving Memory

Legitimate Deliberation

Incoherent rambling about preference utilitarianism

Tristan Harris Interview on AI Safety by Steven Bartlett (aka Diary Of A CEO)

AI #144: Thanks For the Models

Robust Software Isn't About Error Handling

The First Thanksgiving

P-hacking as focusing a microscope

Will We Get Alignment by Default? — with Adrià Garriga-Alonso

Alignment Bootstrapping Is Dangerous

Is there an analogue of Riemann's mapping theorem for split complex numbers, or otherwise?

The Big Nonprofits Post 2025

How Reducing Cognitive Interference Could Revolutionize Stroke Recovery

The crux on consciousness

Model Weight Preservation is not enough

AI Teddy Bears: A Brief Investigation

Despair, Serenity, Song and Nobility in "Hollow Knight: Silksong"

Information Hygiene

On Impact Certificates

Why Wouldn't A Rationalist Be Rational?

To write well; first, experience.

What it feels like to be enthusiastically time blind

A Technical Introduction to Solomonoff Induction without K-Complexity

Just explain it to someone

Architectures for Increased Externalisation of Reasoning

Courtship Confusions Post-Slutcon

Making Sense of Consciousness Part 8: Summing Up

Management of Substrate-Sensitive AI Capabilities (MoSSAIC) Part 0: Overture

Subliminal Learning Across Models

ARENA 6.0 Impact Report

Postmodernism for STEM Types: A Clear-Language Guide to Conflict Theory

Training Models to Detect Activation Steering: Results and Implications

For A Short Period Of Time, You Can Save 21,000 Shrimp Per Dollar

AI discourse analyzed (we looked at essays, Twitter, Bluesky, Truth Social)

Quotes on AI and wisdom

Preference gaps as a safeguard against AI self-replication

Minor Wording Changes Produce Major Shifts in AI Behavior

A Clarifying Characterization of the Hard Problem of Consciousness

[Linkpost] "Higher glass transition temperatures reduce thermal stress cracking in aqueous solutions relevant to cryopreservation"

Neural Annealing: Directing Psychedelic Trips Towards Healing

Autostructures: Craftsmanship in the Age of Vibes.

Meetup Tip: The Second Organizer

Principles and Generators of a Rationality Dojo

Well-Being & Well-Working: 8 Ways I Get Things Done

Spatially distributed consciousness is not an abstract thought experiment if AI is conscious

Snippets on Living In Reality

Evolution & Freedom

Reasons Why I Cannot Sleep

Should we extend moral patienthood to LLMs?

Training PhD Students to be Fat Newts (Part 2)

Things I wish I knew to save GPU minutes on Llama 405b model (and other beasts)

Three positive updates I made about technical grantmaking at Coefficient Giving (fka Open Phil)

Want a single job to serve many AI safety projects? Ashgro is hiring an Operations Associate

Beware boolean disagreements

EA ITT: An attempt

The Economics of Replacing Call Center Workers With AIs

Evaluating honesty and lie detection techniques on a diverse suite of dishonest models

HTTP402: musings about an ad-free internet

ChatGPT 5.1 Codex Max

Studying Mechanisms of Alignment Faking in Llama-3.1-405B

Security Complacency Meets Frontier AI: The Coming Collapse of ‘Secure by Apathy’

Avoid Fooling Yourself By Believing Two Opposing Things At Once

Alignment will happen by default. What’s next?

Ruby's Inkhaven Retrospective

Against Making the Same Mistake Twice

Training PhD Students to be Fat Newts (Part 1)

How to love Battlefield as much as I do

The reasonable effectiveness of mathematics

Toy Models of Superposition in the dense regime

The Ease Disease

Takeaways from the Eleos Conference on AI Consciousness and Welfare

Is Rationalism a Religion

Market Truth

Making Sense of Consciousness Part 7: Volition

A One-Minute ADHD Test

Where is Online?

AI Safety Interventions

Thou art rainbow: Consciousness as a Self-Referential Physical Process

On morality, defection-robustness, and legibility incentives

What would an IRB-like policy for AI experiments look like?

The Coalition

OpenAI finetuning metrics: What is going on with the loss curves?

Alignment may be localized: a short (and admittedly limited) experiment

Maybe Insensitive Functions are a Natural Ontology Generator?

ACX Atlanta December Meetup

The Penicillin Myth

Gemini 3 Pro Is a Vast Intelligence With No Spine

Formal confinement prototype

On negotiated settlements vs conflict with misaligned AGI

NATO is dangerously unaware of its military vulnerability

The LessWrong Team Was Selling Dollars For 86 Cents

I am a rationalist

Continuity

Inkhaven Retrospective

Androgenetic haploid selection

How to Talk to Journalists

Formality

Why Talk to Journalists

I made a tool for learning absolute pitch as an adult

"Self-esteem" is distortionary

Rationalist Techno-Buddhist Jargon 1.0

Finding the uncertainty vector in GPT2-scale transformers

Stop Applying And Get To Work

Halfhaven Digest #5

Emotions, Fabricated

I'll be sad to lose the puzzles

Show Review: Masquerade

If you cannot be good, at least be bad correctly

Please Measure Verification Burden

Solstice Singalong Watch Party

Busking Practice

The Enemy Gets The Last Hit

A list of people who could’ve started a nuclear war, but chose not to

Traditional Food

Memories of a British Boarding School #2.5

Dipole Nature

What kind of person is DeepSeek's founder, Liang Wenfeng? An answer from his old university classmate.

Comment on Natural Emergent Misalignment Paper by Anthropic

How to throw parties

Stream of Consciousness as a Scaffolding Skill

Literacy is Decreasing Among the Intellectual Class

Market Logic II

You can just do things

Easy vs Hard Emotional Vulnerability

Why your sports car isn't a racecar (tradeoffs everywhere)

Assorted Thoughts on "Pivoting" to AI

OpenAI Locks Down San Francisco Offices Following Alleged Threat From Activist

Sorry, I still think kidney donation makes no sense for an EA

Automatic alt text generation

My frustrations: AI doom

Introspection in LLMs: A Proposal For How To Think About It, And Test For It

AI Red Lines: A Research Agenda

Book Review: Wizard's Hall

Market Logic I

Animal welfare concerns are dominated by post-ASI futures

Habitual mental motions might explain why people are content to get old and die

D&D.Sci Thanksgiving: the Festival Feast

Diplomacy during AI takeoff

Easy Opportunity to Help Many Animals

Why Not Just Train For Interpretability?

Complaining about my inability to focus on uninteresting things

Models not making it clear when they're roleplaying seems like a fairly big issue

Natural Emergent Misalignment from Reward Hacking

We won't solve non-alignment problems by doing research

Can Artificial Intelligence Be Conscious?

Gemini 3: Model Card and Safety Framework Report

Lorxus Does Halfhaven: 11/15~11/21

URGENT @everyone - help us kill AI preemption (again) before this Friday

EA Hotel Solstice

Why Does Empathy Have an Off-Switch?

What Do We Tell the Humans? Errors, Hallucinations, and Lies in the AI Village

Let Us Stop the Train Before it Crashes! 🙏

Should I Apply to a 3.5% Acceptance-Rate Fellowship? A Simple EV Calculator

Towards Humanist Superintelligence

16 Writing Tips from Inkhaven

Reading My Diary: 10 Years Since CFAR

10 Key Insights from the “Frontier AI Risk Monitoring Platform”

Contra Collisteru: You Get About One Carthage

Rescuing truth in mathematics from the Liar's Paradox using fuzzy logic

Preferences are confusing

Suicide Prevention Ought To Be Illegal

PSA: For Chronic Infections, Check Teeth

[Paper] Output Supervision Can Obfuscate the CoT

The Boring Part of Bell Labs

What the term "Mass Communication" gestures at

Dominance: The Standard Everyday Solution To Akrasia

Do One Neat Thing vs. Get Work Done

Gemini 3 is Evaluation-Paranoid and Contaminated

Evaluating and Understanding Scheming Propensity

Try seeing art

AI #143: Everything, Everywhere, All At Once

Thinking about reasoning models made me less worried about scheming

July 2024 Welcome Edition

August 2024 Progress in Guaranteed Safe AI

September 2024 Progress in Guaranteed Safe AI

October 2024 Progress in Guaranteed Safe AI

November-December 2024 Progress in Guaranteed Safe AI

January-February 2025 Progress in Guaranteed Safe AI

March-April 2025 Progress in Guaranteed Safe AI

May-June 2025 Progress in Guaranteed Safe AI

Restricting Dangerous Research: Has It Worked Before, and Could It Work for AI?

Should we shun the legibly evil?

Rumored Trump EO

What would adults in the room know about AI risk?

10 Wrong and Dumb Grammar Rules

My burnout journey

What Is The Basin Of Convergence For Kelly Betting?

Exclusive: Here's the draft Trump executive order on AI preemption

How critical is ASML to GPU progress?

In Defense of Goodness

Preventing covert ASI development in countries within our agreement

A review of Red Heart, the new alignment novel by Max Harms

Monthly Roundup #36: November 2025

MLSN #17: Measuring General AI Abilities and Mitigating Deception

Review: The Most Dangerous Writing App

Dense reconstruction is the scaffold of machine learning

Better Writing Through Claude

I give up.

The Bughouse Effect

Memories of a British Boarding School #2

On Wanting

Automate, automate it all

My Ethical Conundrum Around Writing About Meditation

A day in the life of a LW developer

An antibiotic for parasitic AI

Against Money Maximalism

Error Bars as Degrees of Belief

How the aliens next door shower

Anthropic is (probably) not meeting its RSP security commitments

Considerations for setting the FLOP thresholds in our example international AI agreement

Jailbreaking AI models to Phish Elderly Victims

Beren's Essay on Obedience and Alignment

Why are FICO scores effective?

The Aura of a Dark Lord

New Report: An International Agreement to Prevent the Premature Creation of Artificial Superintelligence

Sign language as a generally-useful means of communication (even if you have good hearing)

Victor Taelin's notes on Gemini 3

On Writing #2

GPT 5.1 Follows Custom Instructions and Glazes

ARC progress update: Competing with sampling

Status Is The Game Of The Losers' Bracket

Reconstellation: construct a flywheel for personal change

Kairos is the new home for the Global Challenges Project, and we’re hiring for a GCP Director

The Illegible Chain-of-Thought Menagerie

An urgent call for better Risk Modelling

Eat The Richtext

Memories of a British Boarding School #1

Preference Weighting and the Abilene Paradox

Continuity

How Colds Spread

Small batches and the mythical single piece flow

I store some memories spatially and I don't know why

An Analogue Of Set Relationships For Distribution

No One Reads the Original Work

Thoughts and experiences on using AI for learning

Middlemen Are Eating the World (And That's Good, Actually)

Are there examples of communities where AI is making epistemics better now?

Generalisation Hacking: a first look at adversarial generalisation failures in deliberative alignment

Varieties Of Doom

Omniscience one bit at a time: Chapter 5

AI 2025 - Last Shipmas

A Market of Whispering Earrings

Knowing Whether AI Alignment Is a One-Shot Problem Is a One-Shot Problem

Lobsang's Children

50 Shades of Red

75 and 750 Words on Legal Personhood

Considerations regarding being nice to AIs

Human behavior is an intuition-pump for AI risk

On Comparative Advantage & AGI

Sunnyvale EA/LW/ACX meetup

Approximating arbitrary complex-valued continuous functions

Mnemonic Exposition

In-Context Writing with Sonnet 4.5

Mediators: a different route through conflict

Video games are philosophy's playground

Why is American mass-market tea so terrible?

Mixed Feelings on Social Munchkinry

What the new generation of AI believers sees

What would my 12-year-old self think of agent foundations?

A year-long journey of getting fast internet in Paris

The Control System Going Out of Control

Why So Much Moloch?

Rethinking everything

Where is the Capital? An Overview

Don't defer to people when they're clearly lying

[Part 1] Why are Silicon Valley people so optimistic about AI?

Process Crimes and Pedantic Rules

Omniscience one bit at a time: Chapter 4

Arrows of time and space

I Spent 30 Days Learning to Smile More Charismatically

The new Pluribus TV show is a great and unusual analogy for AI.

Considering the Relevance of Computational Uncertainty for AI Safety

The Badness of Death in Different Metaethical Theories

Brand New Experience Salesman

7 Vicious Vices of Rationalists

Support the Movement against AI extinction risk

Finding My Internal Compass, Literally

Now, “Tethering” is a Bigger Educational Problem Than “Involution”

The Ambiguity Of "Human Values" Is A Feature, Not A Bug

Racing For AI Safety™ was always a bad idea, right?

AI safety undervalues founders

Sharpening Your Map: Introducing Calibrate

Why does ChatGPT think mammoths were alive in December?

The skills and physics of high-performance driving, Pt. 1

Your Clone Wants to Kill You Because You Assumed Too Much

Writing Hack: Write It Just Like That

AI loves octopuses

Punctuation & Quotation Conventions

Matrices map between biproducts

Don't use the phrase "human values"

Generative AI and the P=NP problem

Halfway there; on desperation management

"Middlemarch" is inane and also one of my favorite books

Just Another Five Minutes

Same cognitive paints, exceedingly different mental pictures

A Love Song to Nicotine

Private Latent Notation and AI-Human Alignment

On Battle-Short: What, How, and Why Not To

The Flaw in the Paperclip Maximizer Thought Experiment

"But You'd Like To Feel Companionate Love, Right? ... Right?"

Generation Ship: A Protest Song For PauseAI

Will AI systems drift into misalignment?

Everyday Clean Air

Some Sun Tsu quotes sound like they're actually about debates/epistemics

What are your impossible problems?

Prediction markets for social deduction games without weird incentives

List of great filk songs

a sketch of how we might go about getting basins of corrigibility from RL

Lambda Calculus Prior

AI Craziness: Additional Suicide Lawsuits and The Fate of GPT-4o

Understanding and Controlling LLM Generalization

Lorxus Does Halfhaven: 11/08~11/14

From Anthony: Control Inversion

LLM would have said this better, and without all these typos too

The Charge of the Hobby Horse

The Eightfold Path To Enlightened Disagreement

10 Types of LessWrong Post

Creditworthiness should not be for sale

Everyone has a plan until they get lied to the face

Notes on the book "Talent"

How do you read Less Wrong?

Halfhaven Digest #4

AI Corrigibility Debate: Max Harms vs. Jeremy Gillen

Types of systems that could be useful for agent foundations

The rare, deadly virus lurking in the Southwest US, and the bigger picture

Tell people as early as possible it's not going to work out

Questioning Computationalism

Orient Speed in the 21st Century

Evaluation Avoidance: How Humans and AIs Hack Reward by Disabling Evaluation Instead of Gaming Metrics

Self-interpretability: LLMs can describe complex internal processes that drive their decisions

(Fantasy) -> (Planning): A Core Mental Move For Agentic Humans?

How does one tell apart results in ethics and decision theory?

Handover to AI R&D Agents - relevant research?

Supervised fine-tuning as a method for training-based AI control

Perhaps you should suspect me as well

The Transformer and the Hash

just another potential man

Low-Temperature Evaluations Can Mask Critical AI Behaviors

Epistemic Spot Check: Expected Value of Donating to Alex Bores's Congressional Campaign

Weight-sparse transformers have interpretable circuits

AI #142: Common Ground

Mortgage houses not land?

ClaudoBiography: The Unauthorized Autobiography of Claude, or: The Life of Claude and of His Fortunes and Adversities

Paranoia rules everything around me

8 Questions for the Future of Inkhaven

Strategically Procrastinate as an Anti-Rabbit-Hole Strategy

Favorite quotes from "High Output Management"

Turing-Complete vs Turing-Universal

Are AI time horizons inherently superexponential?

Meetup Tip: Food

Two can keep a secret if one is dead. So please share everything with at least one person.

Utilitarian inequality metrics

Being The Target Demographic

Lorxus Favors: An Experiment in Self-Backed Giftlike Macroeconomics (+ Extra Bits)

A bad review != a bad book

The Pope Offers Wisdom

Why Truth First?

Bitcoin Halvings and the Trisolaran Mistake: When External Actors Masquerade as Natural Laws

Social drives 2: “Approval Reward”, from norm-enforcement to status-seeking

OpenAI Releases GPT 5.1

Is SGD capabilities research positive?

Lighthaven-ish Ticket Strategy: Three Pillars of FOMO

Personal Account: To the Muck and the Mire

We live in the luckiest timeline

AI for Safety & Science Nodes in Berlin & the Bay Area

Reflections on being Sorted

Lorxus Does Halfhaven: 11/01~11/07

Undissolvable Problems: things that still confuse me

Introducing faruvc.org

Warning Aliens About the Dangerous AI We Might Create

9+ weeks of mentored AI safety research in London | Pivotal Research Fellowship

I Read Red Heart and I Heart It

Miscellaneous observations about board games

Why to Commit to a Writing and Publishing Schedule

5 Things I Learned After 10 Days of Inkhaven

Response to "Taking AI Welfare Seriously": The Indirect Approach to Moral Patienthood

Do not hand off what you cannot pick up

Better than Baseline

How human-like do safe AI motivations need to be?

Teleosemantics & Swampman

How I Learned That I Don't Feel Companionate Love

Conceptual reasoning dataset v0.1 available (AI for AI safety/AI for philosophy)

Flirt like it’s predetermined

“Wait, feelings are supposed to be IN THE BODY?”

Fairly Breaking Ties Without Fair Coins

Kimi K2 Thinking

Dulling the Bite of Mere Addition

Not-A-Book Review: The Attractive Man (Dating Coach Service)

Don't Get One-Shotted

Learnings from the Zurich AI Safety Day

Announcing the Society of Teen Scientists

What is Happening in AI Governance?

Human Agency at Stake

Omniscience one bit at a time: Chapter 3

Evolution's Alignment Solution: Why Burnout Prevents Monsters

Thick practices for AI tools

The problem of graceful deference

See Your Word Count While You Write

On Stance

Breaking the Hedonic Rubber Band

Rejecting "Goodness" Does Not Mean Hammering The Defect Button

On the Normativity of Debate: A Discussion With Said Achmiz

Questioning the Requirements

France is ready to stand alone

Love is Willingness to do Violence

Turning Grey

The AI bubble covered in the Atlantic

A Simple Sing-along Solstice

Universal Basic Income in an AGI Future

Ternary plots are underrated

How likely is dangerous AI in the short term?

[Linkpost] Galaxy brain resistance

A pencil is not a pencil is not a pencil

The Open Strategy Dictator Game: An Experiment in Transparent Cooperation

DC/Maryland Secular Solstice

What I learned building a language-learning app

Andrej Karpathy on LLM cognitive deficits

Consciousness as a Distributed Ponzi Scheme

Maat - Intro Post

Variously Effective Altruism

Why does everything feel so urgent?

Omniscience one bit at a time: Chapter 2

Science Fiction Trail: The Compressed Universe

Social drives 1: “Sympathy Reward”, from compassion to dehumanization

Ontology for AI Cults and Cyber Egregores

From Vitalik: Galaxy brain resistance

The jailbreak argument against LLM values

The grapefruit juice effect

Against Powerful Text Editors

Duncan Sabien and Politics

The only important ASI timeline

Book Announcement: The Gentle Romance

Three Kinds Of Ontological Foundations

When does Claude sabotage code? An Agentic Misalignment follow-up

Manifest X DC Opening Benediction - Making Friends Along the Way

Relearning how to be human

Condensation

Omniscience one bit at a time: Chapter 1

We're Not The Center of the Moral Universe

Gradual Disempowerment Monthly Roundup #2

We're Already Living in a Sci-Fi World

AI hasn't seen widespread adoption because the labs are focusing on automating AI R&D

Heroic responsibility is morally neutral

The General Social Survey and the ACX Survey

There should be unicorns

One Shot Singalonging is an attitude, not a skill or song-difficulty-level

Where Our Engineering Education Went Wrong

A sonnet, a sestina, a villanelle

n-ary Huffman coding

Liouville's Theorem and the Second Law

Insofar As I Think LLMs "Don't Really Understand Things", What Do I Mean By That?

Why AC is cheap, but AC repair is a luxury

Myopia Mythology

Omniscaling to MNIST

Why can planarians regrow their bodies?

Unexpected Things that are People

A humanist critique of technological determinism

Bottom-Up: Principled Compression to Shrink LLMs

Five very good reasons to not write down literally every single thought you have

Review: Parsifal at the SF Opera

Escalation and perception

The Snaw

Augustine of Hippo's Handbook on Faith, Hope, and Love in Latin (or: Claude as Pandoc++)

Mourning a life without AI

Two Times I Was Surprised By My Own Values

On slack and akrasia

Anthropic & Dario’s dream

Against “You can just do things”

Agent Foundations: Paradigmatizing in Math and Science

Entity Review: Pythia

Announcing “Computational Functionalism Debate” (soliciting paid feedback): Test your intuitions about consciousness

AI Safety's Berkeley Bubble and the Allies We're Not Even Trying to Recruit

Start an AI safety group with the Pathfinder Fellowship

AI is not inevitable.

The Hawley-Blumenthal AI Risk Evaluation Act

Secular Solstice Roundup 2025

A country of alien idiots in a datacenter: AI progress and public alarm

On Sam Altman’s Second Conversation with Tyler Cowen

Plans to build AGI with nuclear reactor-like safety lack 'systematic thinking,' say researchers

13 Arguments About a Transition to Neuralese AIs

Open Letter to Ohio House Reps

Two easy digital intentionality practices

Is it really paranoia if I'm really Out to Get Me?

Did you know you can just buy blackbelts?

GPTF-8: A tokenizer-based character encoding

Cancer; A Crime Story (and other tales of optimization gone wrong)

Liberation Clippy

Minimizing Loss ≠ Maximizing Intelligence

Solstice Season Megameetups

My new nonprofit Evitable is hiring.

Willpower is exhausting, use content blockers

A review of MSUM's AI Innovation Summit: Day Two

Brutalist Prose

Can we do useful meta-analysis? Unjournal evaluations of "Meaningfully reducing consumption of meat... is an unsolved problem..."

Toward Statistical Mechanics Of Interfaces Under Selection Pressure

Sex, Drugs, and the Future of American Politics

What is the (LW) consensus on jump from qualia to self-awareness in AI?

OpenAI Does Not Appear to be Applying Watermarks Honestly

Genetic Enhancements of Color Qualia

Anticheat: a non-technical look without psychoanalysis

Hyperstition Progress Report, and, Any Ideas for Making AI-Generated Fiction Actually Good?

It is our responsibility to develop a healthy relationship with our technology

Debunking “When Prophecy Fails”

[Linkpost] How to Win Board Games

SPAR Spring ‘26 mentor apps open—now accepting biosecurity, AI welfare, and more!

AI #141: Give Us The Money

Fake media seems to be a fact of life now

Our ancestors didn't know their faces

Review: K-Pop Demon Hunters (2025)

A memo on Takeoff

announcing my modular coal startup

Halloween Tombstone Simulacra

Halfway to Anywhere

People Seem Funny In The Head About Subtle Signals

Why I think consciousness is fundamental rather than structural/neural.

Technical AI Safety Roles at Open Philanthropy

Career Advice for High Schoolers and College Students

A 2032 Takeoff Story

Continuous takeoff is a bad name

An ARENA 6.0 Capstone: Model Organism of Encoded Reasoning

Breaking Books: A tool to bring books to the social sphere

Digital minimalism is out, digital intentionality is in

Anthropic Commits To Model Weight Preservation

Living in the Shadow of The Sort

Hardening against AI takeover is difficult, but we should try

AI Safety at the Frontier: Paper Highlights of October 2025

New homepage for AI safety resources – AISafety.com redesign

An atheist's guide to prayer

Theory of Change for US Govt Whistleblower Database and Guide

Suffering is what makes it special

Maxwell's Demon and the Arrow of Time

A Guide To Being Persuasive About AI Dangers

Why Safety Constraints in LLMs Are Easily Breakable? Knowledge as a Network of Gated Circuits

Sable and Able: A Tale of Two ASIs

Dario Amodei's "Machines of Loving Grace" sounds incredibly dangerous, for Humans

What are you excited about doing?

Intentionality

Food-related things that have made my life a little better

Gerrymandering California

How to survive until AGI

Heroic Responsibility

[Linkpost] Competing Motivations: When More Incentives Lead To Less Effort

Not Over Or Under Indexed

Parsing Validation

A/B testing could lead LLMs to retain users instead of helping them

OpenAI: The Battle of the Board: Ilya’s Testimony

Berkeley Solstice Weekend

Modeling the geopolitics of AI development

Thoughts by a non-economist on AI and economics

AI Safety Camp 11

Keeping Ants and Spotting Queens

Letter to a close friend

Open-weight training practices and implications for CoT monitorability

Free Learning in Today’s Society: Some Personal Experiences and Reflections

A prayer for engaging in conflict

Rainbows, fractals, and crumpled paper: Hölder continuity

On tasting things

More notes on US govt whistleblower guide and DB

US Govt Whistleblower Guide

US Govt Whistleblower Database

The Mortifying Ordeal of Knowing Thyself

Build the life you actually want

Research Reflections

I ate bear fat with honey and salt flakes, to prove a point

Parleying with the Principled

The Zen Of Maxent As A Generalization Of Bayes Updates

Sam Altman's track record of manipulation: some quotes from Karen Hao's "Empire of AI"

Comparative advantage & AI

Just complaining about LLM sycophancy (filler episode)

The Tale of the Top-Tier Intellect

High-Resistance Systems to Change: Can a Political Strategy Apply to Personal Change?

Leaving Open Philanthropy, going to Anthropic

Red Heart

How Powerful AI Gets Cheap

The EU could hold AI capabilities development hostage if they wanted to

What's up with Anthropic predicting AGI by early 2027?

For Synthetic Situations

The Unreasonable Effectiveness of Fiction

Crime and Punishment #1

Solving a problem with mindware

Publishing academic papers on transformative AI is a nightmare

Trying to understand my own cognitive edge

Why you shouldn't write a blog post every day for a month

There's some chance oral herpes is pretty bad for you?

Spending Less by Doing More

Lack of Social Grace is a Lack of Skill

Live Conversational Threads: Not an AI Notetaker

A glimpse of the other side

Things I've Become More Confident About

How and why you should make your home smart (it's cheap and secure!)

Body Time and Daylight Savings Apologetics

Introducing the Hauntoloscope: a Counterfactual Timeline Generation Tool

Midwit-Savants Are Common

Are apples made of cells?

Halfhaven halftime

Ilya Sutskever Deposition Transcript

Donut Theory of Consciousness

Reason About Intelligence, Not AI

Human Values ≠ Goodness

A toy model of corrigibility

Doom from a Solution to the Alignment Problem

Ohio House Bill 469

I Wondered Why I Procrastinate Even On Things I Am "Passionate" About

Quick take: Economics of Intelligence 101

25 Que

My YC Pitch

Me consuming five different forms of media at once to minimize the chance of a thought occurring

A review of MSUM's AI Innovation Summit: Day One

FTL travel and scientific realism

Reflections on 4 years of meta-honesty

Better ways to do grayscale screens

2025 Unofficial LW Community Census, Request for Comments

Metformin 1000mg/day upon symptom onset may reduce your risk of long covid by 10-30%

Weak-To-Strong Generalization

Model welfare and open source

Why I Transitioned: A Case Study

Economics and Transformative AI (by Tom Cunningham)

Decision theory when you can't make decisions

You’re always stressed, your mind is always busy, you never have enough time

Re-rolling environment

Why Is Printing So Bad?

Some Meetups I Ran (2025 Q2)

Shouldn't taking over the world be easier than recursively self-improving, as an AI?

ACX Atlanta November Meetup

Seattle Secular Solstice 2025 – Dec 20th

Fermi Paradox, Ethics and Astronomical waste

LLM-generated text is not testimony

Apply to the Cooperative AI PhD Fellowship by November 16th!

Vaccination against ASI

Strategy-Stealing Argument Against AI Dealmaking

Evidence on language model consciousness

Asking Paul Fussell for Writing Advice

Freewriting in my head, and overcoming the “twinge of starting”

2025 NYC Secular Solstice & East Coast Rationalist Megameetup

Supervillain Monologues Are Unrealistic

Secretly Loyal AIs: Threat Vectors and Mitigation Strategies

Ink without haven

Apply to the Cambridge ERA:AI Winter 2026 Fellowship

FAQ: Expert Survey on Progress in AI methodology

Social media feeds 'misaligned' when viewed through AI safety framework, show researchers

Crossword Halloween 2025: Manmade Horrors

Debugging Despair ~> A bet about Satisfaction and Values

Halfhaven Digest #3

OpenAI Moves To Complete Potentially The Largest Theft In Human History

A (bad) Definition of AGI

Resampling Conserves Redundancy & Mediation (Approximately) Under the Jensen-Shannon Divergence

Centralization begets stagnation

Summary and Comments on Anthropic's Pilot Sabotage Risk Report

AI #140: Trying To Hold The Line

AISLE discovered three new OpenSSL vulnerabilities

Sonnet 4.5's eval gaming seriously undermines alignment evals, and this seems caused by training on alignment evals

On The Conservation of Rights

When “HDMI-1” Lies To You

Why is there still one instance of Eliezer Yudkowsky?

What is LessWrong explicitly trying to optimize?(And what percentage of its structure currently serves that goal?)

Interview on the Hengshui Model High School

Emergent Introspective Awareness in Large Language Models

Genius is Not About Genius

Quotes on OpenAI's timelines to automated research, safety research, and safety collaborations before recursive self improvement

Why you shouldn't eat meat if you hate factory farming

Unsureism: The Rational Approach to Religious Uncertainty

The End of OpenAI’s Nonprofit Era

An intro to the Tensor Economics blog

Uncertain Updates: October 2025

AI Doomers Should Raise Hell

AISN #65: Measuring Automation and Superintelligence Moratorium Letter

TBC Episode with Max Harms - Red Heart and If Anyone Builds It, Everyone Dies

Thresholds for Pascal's Mugging?

Please Do Not Sell B30A Chips to China

Why Civilizations Are Unstable (And What This Means for AI Alignment)

Personal Debugging for Survival: How Can I Prevent Despair Without Imminent AI Death?

What can we learn from parent-child-alignment for AI?

Some data from LeelaPieceOdds

How can I prevent despair, without imminent AI death

A Very Simple Model of AI Dealmaking

300 por basil - And if I'm already in the shit, what's the most beautiful fart I can leave behind?

Workshop on Post-AGI Economics, Culture, and Governance

When Will AI Transform the Economy?

AI Craziness Mitigation Efforts

Introducing the Epoch Capabilities Index (ECI)

Mottes and Baileys in AI discourse

Temporarily Losing My Ego

The Memetics of AI Successionism

New 80,000 Hours problem profile on the risks of power-seeking AI

LLM robots can't pass butter (and they are having an existential crisis about it)

Call for mentors from AI Safety and academia. Sci.STEPS mentorship program

Heuristics for assessing how much of a bubble AI is in/will be

Q2 AI Benchmark Results: Pros Maintain Clear Lead

A Sketch of Helpfulness Theory With Equivocal Principals

All the lab's AI safety Plans: 2025 Edition

A Bayesian Explanation of Causal Models

Brainstorming Food on the Cheap+Healthy+Convenient+Edible Frontier

How Important is Inverting LLMs?

Would concentration of power be bad, given offense-dominant weapons?

Asking (Some Of) The Right Questions

life lessons from trading

Agentic Monitoring for AI Control

Model Parameters as a Steganographic Private Channel

Major survey on the HS/TS spectrum and gAyGP

Death of the Author

Exploring the multi-dimensional refusal subspace in reasoning models

AIs should also refuse to work on capabilities research

List of lists of project ideas in AI Safety

Uncommon Utilitarianism #3: Bounded Utility Functions

How valuable is money-in-market?

Credit goes to the presenter, not the inventor

On Fleshling Safety: A Debate by Klurl and Trapaucius.

Results of "Experiment on Bernoulli processes"

certain exotic neurotransmitters as SMART PILLS: or compounds that increase the capacity for mental work in humans

Cancer has a surprising amount of detail

Stability of natural latents in information theoretic terms

Lessons from Teaching Rationality to EAs in the Netherlands

Are We Their Chimps?

FWIW: What I noticed at a (Goenka) Vipassana retreat

Why are there no helpful infectious diseases?

Brightline is Actually Pretty Dangerous

Seven-ish Words from My Thought-Language

Remembrancy

Pygmalion's Wafer

Despair and information

Why is OpenAI releasing products like Sora and Atlas?

Origins and dangers of future AI capability denial

Do you completely trust that you are completely in the shit? - despair and information -

Assessing Far UVC Positioning

Musings on Reported Cost of Compute (Oct 2025)

Regardless of X, you can still just sign superintelligence-statement.org if you agree

The Future of Interpretability is Geometric

New Statement Calls For Not Building Superintelligence For Now

Highlights from Explaining AI Explainability

Can Reasoning Models Obfuscate Reasoning? Stress-Testing Chain-of-Thought Monitorability

Dollars in political giving are less fungible than you might think

LW Reacts pack for Discord/Slack/etc

AI Timelines and Points of no return

Can we steer AI models toward safer actions by making these instrumentally useful?

Introducing ControlArena: A library for running AI control experiments

Plan 1 and Plan 2

Why I Don't Believe in True AGI

Guys I might be an e/acc

How transformers can compute distances along a curve locally.

How an AI company CEO could quietly take over the world

Worlds Where Iterative Design Succeeds?

Automated real time monitoring and orchestration of coding agents

Reminder: Morality is unsolved

The main way I've seen people turn ideologically crazy [Linkpost]

Empirical Partial Derivatives

Differences in Alignment Behaviour between Single-Agent and Multi-Agent AI Systems

Building a different kind of personal intelligence

Beliefs about formal methods and AI safety

Depression and information - What is my confidence level that I'm in absolute shit?

AI #139: The Overreach Machines

Should AI Developers Remove Discussion of AI Misalignment from AI Training Data?

Software Engineering at SecureBio

Is terminal lucidity real?

A Concrete Roadmap towards Safety Cases based on Chain-of-Thought Monitoring

LW Psychosis

Learning to Interpret Weight Differences in Language Models

AGI's Last Bottlenecks

Statement on Superintelligence - FLI Open Letter

Why AI alignment matters today

Any corrigibility naysayers outside of MIRI?

Which side of the AI safety community are you in?

Homomorphically encrypted consciousness and its implications

Consider donating to AI safety champion Scott Wiener

Postrationality: An Oral History

Penny's Hands

Is 90% of code at Anthropic being written by AIs?

LLM Self-Reference Language in Multilingual vs English-Centric Models

How Well Does RL Scale?

The Perpetual Technological Cage

Utopiography Interview

White House OSTP AI Deregulation Public Comment Period Ends Oct. 27

July-October 2025 Progress in Guaranteed Safe AI

In remembrance of Sonnet '3.6'

Stratified Utopia

Early stage goal-directedness

On Dwarkesh Patel’s Podcast With Andrej Karpathy

Relearning Human Signaling — Experiments in Real-World Connection (Stockholm + virtual)

Samuel x Bhishma - Superintelligence by 2030?

Remarks on Bayesian studies from 1963

How the Human Lens Shapes Machine Minds

Symbiogenesis vs. Convergent Consequentialism

21st Century Civilization curriculum

Ramblings on the Self Indication Assumption

An epistemic theory of populism [link post to Joseph Heath]

EU explained in 10 minutes

"Tilakkhana", Gwern [poem]

Attending Your First Contra Dance in a Fragrance-Compliant Manner

How low can we go? Modern School Curricula

The If Anyone Builds It, Everyone Dies march assurance contract should indicate how many signatures it has received

A Mathematical Model of Alcor’s Economic Survival

Can you find the steganographically hidden message?

How cause-area specific conferences can strengthen the EA community

Secular Solstice: Bremen (Dec 13)

How Stuart Buck funded the replication crisis

Contra-Zombies? Contra-Zombies!: Chalmers as a parallel to Hume

Consider donating to Alex Bores, author of the RAISE Act

Bubble, Bubble, Toil and Trouble

Considerations around career costs of political donations

A Cup of Blue Tea

A Bayesian nightmare: Instagram and Sampling bias

Uncommon Utilitarianism #2: Positive Utilitarianism

Can you transfer a backdoor with subliminal learning?

Final-Exam-Tier Medical Problem With Handwavy Reasons We Can't Just Call A Licensed M.D.

Humanity Learned Almost Nothing From COVID-19

Frontier LLM Race/Sex Exchange Rates

Scenes, cliques and teams - a high level ontology of groups

Learnable Skills

Give Me Your Data: The Rationalist Mind Meld

The IABIED statement is not literally true

Libraries need more books

In defense of the goodness of ideas

Sample Interesting First

Comma v0.1 converted to GGUF

Using Bayes' Theorem to determine Optimal Protein Intake

Selected Graphics Showing Progress towards AGI

Networking for Spies: Translating a Cyrillic Text with Claude Code

How to survive in the storm of uncertainty (and shit)?

Space colonization and scientific discovery could be mandatory for successful defensive AI

Memory Decoding Journal Club: Functional connectomics reveals general wiring rule in mouse visual cortex

Meditation is dangerous

I handbound a book of Janus's essays for my girlfriend

The Dark Arts of Tokenization or: How I learned to start worrying and love LLMs' undecoded outputs

How To Vastly Increase Your Charitable Impact

Nontrivial pillars of IABIED

What Success Might Look Like

I’m an EA who benefitted from rationality

AI #138 Part 2: Watch Out For Documents

Mess AI – deliberate corruption of the training data to prevent superintelligence

Activation Plateaus: Where and How They Emerge

Can We Simulate Meiosis to Create Digital Gametes — and Are the Results Your Biological Offspring?

Steven Adler reports that NVIDIA is attempting to stifle pro-export-control speech

Book Review: To Explain the World

AISN#64: New AGI Definition and Senate Bill Would Establish Liability for AI Harms

Finding Features in Neural Networks with the Empirical NTK

Learning from the Luddites: Implications for a modern AI labour movement

Job Openings: SWE, PM, and Grants Coordinator to help improve grant-making

AI #138 Part 1: The People Demand Erotic Sycophants

Cheap Labour Everywhere

Quantum immortality and AI risk – the fate of a lonely survivor

Halfhaven Digest #2

Fragrance Free Confusion

The Three Levels of Agency

Electronics Mechanic -> AI Safety Researcher: A 30-Month Journey to Model Welfare

Some astral energy extraction methods

AI-202X-slowdown: can CoT-based AIs become capable of aligning the ASI?

Chemical Telescopes And The Process Of Science

Monthly Roundup #35: October 2025

Updating the name of Open Philanthropy's AI program

Open Global Investment: Comparisons and Critique

We are too comfortable with AI "magic"

Are calm introverts (like East Asians) uniquely suited for space travel & Mars missions?

Until the stars burn out? Assessing the stakes of AGI lock-in

It will cost you nothing to "bribe" a Utilitarian

How I Became a 5x Engineer with Claude Code

That Mad Olympiad

A New Global Risk: Large Comet’s Impact on Sun Could Cause Fires on Earth

Can LLMs Coordinate? A Simple Schelling Point Experiment

Minimal Prompt Induction of Self-Talk in Base LLMs

Humans Are Spiky (In an LLM World)

Gnashing of Teeth

Geometric Structure of Emergent Misalignment: Evidence for Multiple Independent Directions

Communism By Another Name

Situational Awareness as a Prompt for LLM Parasitism

The sum of its parts: composing AI control protocols

Why your boss isn’t worried about AI

Humanity AI Commits $500 million to AI and Democracy Protection, AI x Security, and more

Thinking Partners: Building AI-Powered Knowledge Management Systems

SS26 Color Stats

The Biochemical Beauty of Retatrutide: How GLP-1s Actually Work

My views on Lesswrong

Trade Escalation, Supply Chain Vulnerabilities and Rare Earth Metals

12 Angry Agents, or: A Plan for AI Empathy

The "Length" of "Horizons"

My Software Setup

Narcissism, Echoism, and Sovereignism: A 4-D Model of Personality

A personal take on why you should work at Forethought (maybe)

Discrete Generative Models

Survey Results: Far UVC and Glycol Vapors

How AI Manipulates—A Case Study

Recontextualization Mitigates Specification Gaming Without Modifying the Specification

AI Psychosis, with Tim Hua and Adele Lopez

What is Lesswrong good for?

Predictability is Underrated

The Mom Test for AI Extinction Scenarios

Is There a Sound Argument for Generality in AI?

Reasons to sign a statement to ban superintelligence (+ FAQ for those on the fence)

Water Above the Ocean

OpenAI #15: More on OpenAI’s Paranoid Lawfare Against Advocates of SB 53

The Thirteen-Circle Paradox

Pause House, Blackpool

Global vs. Local feedback

Sublinear Utility in Population and other Uncommon Utilitarianism

RiskiPedia

Don't Mock Yourself

Experiment: Test your priors on Bernoulli processes.

The Problem of Consciousness and AI as an Ethical Subject

Dr Evil & Realpolitik

How do we know when something is deserving of welfare?

The Narcissistic Spectrum

Non-copyability as a security feature

The Alignment Problem Isn't Theoretical

International Programme on AI Evaluations

If a Lioness Could Speak

How long do AI companies have to achieve significant capability gains before funding collapses?

You're not confused by thermodynamics

Subscribe to my Inkhaven feed!

The Most Common Bad Argument In These Parts

Experiments With Sonnet 4.5's Fiction

Letter to Heads of AI labs

Emil the Moose

Using complex polynomials to approximate arbitrary continuous functions

What does it feel like to understand?

The 5 Obstacles I Had to Overcome to Become Vegan

2025 State of AI Report and Predictions

Applications Open for a Weekend Exploring Civilisational Sanity

Maybe Use BioLMs To Mitigate Pre-ASI Biorisk?

The statement "IABIED" is true even if the book IABIED is mostly false

AI and Biological Risk: Forecasting Key Capability Thresholds

Why Future AIs will Require New Alignment Methods

Materialist Semiotics and the Nature of Qualia

Patience and Willingness to Be Slow

You should probably prepare for your chats to be fed to AI

We won’t get docile, brilliant AIs before we solve alignment

Labs lack the tools to course-correct

The Liberty Tractor

At odds with the unavoidable meta-message

Stars are a rounding error

Towards a Typology of Strange LLM Chains-of-Thought

Training Qwen-1.5B with a CoT legibility penalty

Interview with a drone expert on the future of AI warfare

Investigating Neural Scaling Laws Emerging from Deep Data Structure

I take antidepressants. You’re welcome

Training fails to elicit subtle reasoning in current language models

Realistic Reward Hacking Induces Different and Deeper Misalignment

Why am I not currently starting a religion around AI or similar topics?

THE WISE, THE GOOD, THE POWERFUL

Hubris & Control

"Yes, and—" Requires the Possibility of "No, Because—"

Four Questions to Refine Your Policy Proposal

A Snippet On The Epistemically Hygienic Containment Of Faith-In-Reason-Itself

Alignment progress doesn’t compensate for higher capabilities

The Necronomicon Is Already Read

Biouploading: Preserving My Living Neurons and Connectome as a Spatially Distributed Mesh

self reflections of a striver

Hospitalization: A Review

AI #137: An OpenAI App For That

Are We Leaving Literature To The Psychotic?

Lessons from the Mountains

Probabilistic Societies

Inverting the Most Forbidden Technique: What happens when we train LLMs to lie detectably?

NEPA, Permitting and Energy Roundup #2

What shapes does reasoning take but circular?

The Oracle's Gift

Thinking Mathematically - Convergent Sequences

The Relationship Between Social Punishment and Shared Maps

IABIED: Paradigm Confusion and Overconfidence

The Wise Baboon of Loyalty

Spooky Collusion at a Distance with Superrational AI

The Architecture of the Narcissistic False Self

Reflections on The Curve 2025

2025-10-12 - London rationalish meetup - Periscope

Plans A, B, C, and D for misalignment risk

Three Paths Through Manifold

Halfhaven Digest #1

The "cool idea" bias

Irresponsible Companies Can Be Made of Responsible Employees

Heaven, Hell, and Mechanics

10 Ways to Waste a Decade

You Should Get a Reusable Mask

Replacing RL w/ Parameter-based Evolutionary Strategies

Intent alignment seems incoherent

The Tower of Babel in Reverse

Petri: An open-source auditing tool to accelerate AI safety research

Bending The Curve

Kairos is hiring: Founding Generalist & SPAR Contractor

Messy on Purpose: Part 2 of A Conservative Vision for the Future

Going Phoneless

The Alignment Paradox: Why Transparency Can Breed Deception

Notes on "Homology, Genes and Evolutionary Innovation"

Research Robots: When AIs Experiment on Us

Top Warning Signs Your Friends are Being Oneshotted By AI

LLMs as a limiter of social intercourse

Generalization and the Multiple Stage Fallacy?

Telling the Difference Between Memories & Logical Guesses

Notes from European Progress Conference

"Intelligence" -> "Relentless, Creative Resourcefulness"

Chaos Alone is No Bar to Superintelligence

We won’t get AIs smart enough to solve alignment but too dumb to rebel

Notes on the need to lose

Excerpts from my neuroscience to-do list

Experience Report - ML4Good Bootcamp Singapore, Sep'25

Gradual Disempowerment Monthly Roundup

Subliminal Learning, the Lottery-Ticket Hypothesis, and Mode Connectivity

The Origami Men

Medical Roundup #5

Why I think ECL shouldn’t make you update your cause prio

Did Tyler Robinson carry his rifle as claimed by the government?

AI Science Companies: Evidence AGI Is Near

LLMs one-box when in a "hostile telepath" version of Newcomb's Paradox, except for the one that beat the predictor

Alignment Faking Demo for Congressional Staffers

Do Things for as Many Reasons as Possible

One Does Not Simply Walk Away from Omelas

The quotation mark

The Sadism Spectrum and How to Access It

Maybe social media algorithms don't suck

Base64Bench: How good are LLMs at base64, and why care about it?

Behavior Best-of-N achieves Near Human Performance on Computer Tasks

How likely are “s-risks” (large-scale suffering outcomes) from unaligned AI compared to extinction risks?

LLMs are badly misaligned

The Counterfactual Quiet AGI Timeline

AISafety.com Reading Group session 328

Open Philanthropy’s Biosecurity and Pandemic Preparedness Team Is Hiring and Seeking New Grantees

$250 bounties for the best short stories set in our near future world & Brooklyn event to select them

How the NanoGPT Speedrun WR dropped by 20% in 3 months

a quick thought about AI alignment

Making Your Pain Worse can Get You What You Want

Markets in Democracy: What happens when you can sell your vote?

What I've Learnt About How to Sleep

The 'Magic' of LLMs: The Function of Language

To my client having a sudden ‘bad day’:

Consider Doing Small Walks at Work

Where does Sonnet 4.5's desire to "not get too comfortable" come from?

Goodness is harder to achieve than competence

Good is a smaller target than smart

Making Sense of Consciousness Part 6: Perceptions of Disembodiment

Recent AI Experiences