LessWrong
Monthly Roundup #37: December 2025
Information in circulation is self-organised critical. Small changes in environment can make large, discontinuous changes in the information space.
The Fly Farm
New 80k problem profile: extreme power concentration
The point of view of the universe
The Fantastic Piece of Tinfoil in my Wallet
AISN #66: Evaluating Frontier Models, New Gemini and Claude, Preemption is Back
Annals of Counterfactual Han
Does dissolving Newcomb's paradox matter?
AI performance has surpassed a human baseline on ARC-AGI-2
Designing the World’s Safest AI based on Morality Models
ASI Already Knows About Torture - In Defense of Talking Openly About S-Risks
Cognitive Tech from Algorithmic Information Theory
Announcing Progress in Medicine, a high school summer career exploration program
Weird Generalization & Inductive Backdoors
The tree, the fly, the ant, the dog, the farmer and the businessman
Ships in the Night – A Short Story
If Anyone Builds It Everyone Dies, another semi-outsider review
North Sentinelese Post-Singularity
Flock – work in public with friends (beta testers wanted)
AI #146: Chipping In
Sea snails in a cocaine vaccine
Resources for parents
Steganographic Chains of Thought Are Low-Probability but High-Stakes: Evidence and Arguments
Brain-inspired LLM alignment
Seven Perspectives on LLMs
MIRI Comms is hiring
Some evidence against the idea that strange CoT stems from incentives to compress language
Follow-through on Bay Solstice
Rock Paper Scissors is Not Solved, In Practice
Childhood and Education #15: Got To Get Out
Apply to ESPR & PAIR 2026, Rationality and AI Camps for Ages 16-21
Evaluation as a (Cooperation-Enabling?) Tool
Consider calling the NY governor about the RAISE Act
No ghost in the machine
Most Algorithmic Progress is Data Progress [Linkpost]
Fibonacci Holds Information
Register for SPAR Demo Day on Saturday, Dec 13
We don't know what most microbial genes do. Can genomic language models help?
Artifacts I'd like to try
AI Safety – Analyse Affordances
Caesar Derangement Syndrome
How to have a debate on this platform?
Living on a ball of hair
The funding conversation we left unfinished
Do you expect the first AI to cross NY's RAISE Act's "Critical Harm" threshold to be contained?
TT Self Study Journal #5
Lorxus Does Halfhaven: 11/29, 11/30, Highlights, Postmortem
Tristan's list of things to write
Tate Modern 2150
Selling H200s to China Is Unwise and Unpopular
Non-optimized beauty
Localizing Finetuned Information in Transformers with Dynamic Weight Grafting
[paper] Auditing Games for Sandbagging
Insights into Claude Opus 4.5 from Pokémon
Gradual Disempowerment Monthly Roundup #3
Every house has a chemistry lab
Ways we can fail to answer
Do you take joy in effective altruism?
My experience running a 100k
Seriously, use text expansions
The reverse sear as a worthwhile life skill
Every point of intervention
D&D Sci Thanksgiving: the Festival Feast Evaluation & Ruleset
Towards a Categorization of Adlerian Excuses
A Falsifiable Causal Argument for Substrate Independence
Gödel's Ontological Proof
Prompting Models to Obfuscate Their CoT
High-level approaches to rigor in interpretability
Human Dignity: a review
A few quick thoughts on measuring disempowerment
How Stealth Works
I have hope
"Carve Reality at Its Joints" is poetry
The Possibility of an Ongoing Moral Catastrophe
Zen Wisdom, Diffused
[Paper] Does Self-Evaluation Enable Wireheading in Language Models?
Algorithmic thermodynamics and three types of optimization
Little Echo
From Barriers to Alignment to the First Formal Corrigibility Guarantees
Scaling what used not to scale
The effectiveness of systematic thinking
I said hello and greeted 1,000 people at 5am this morning
Your Digital Footprint Could Make You Unemployable
2025 Unofficial LessWrong Census/Survey
AI in 2025: gestalt
[Linkpost] Theory and AI Alignment (Scott Aaronson)
Lawyers are uniquely well-placed to resist AI job automation
How a bug of AI hardware may become a feature for AI governance
Karlsruhe - If Anyone Builds It, Everyone Dies
Eliezer's Unteachable Methods of Sanity
Ordering Pizza Ahead While Driving
Existential despair, with hope
I Need Your Help
Crazy ideas in AI Safety part 1: Narrow-Defined Communication
The corrigibility basin of attraction is a misleading gloss
LW Transcendence
The Adequacy of Class Separation
Answering a child's questions
AI Mood Ring: A Window Into LLM Emotions
Critical Meditation Theory
Tools, Agents, and Sycophantic Things
What Happens When You Train Models on False Facts?
why america can't build ships
Reasons to care about Canary Strings
An AI-2027-like analysis of humans' goals and ethics with conservative results
Management of Substrate-Sensitive AI Capabilities (MoSSAIC) Part 3: Resolution
Announcing: Agent Foundations 2026 at CMU
DeepSeek v3.2 Is Okay And Cheap But Slow
Who is AGI for, and who benefits from AGI?
Journalist's inquiry into a core organiser breaking his nonviolence commitment and leaving Stop AI
Try Training SAEs with RLAIF
On the Aesthetic of Wizard Power
Will misaligned AIs know that they're misaligned?
Thresholding
An Abstract Arsenal: Future Tokens in Claude Skills
Cross Layer Transcoders for the Qwen3 LLM Family
Management of Substrate-Sensitive AI Capabilities (MoSSAIC) Part 2: Conflict
Livestream for Bay Secular Solstice
Center on Long-Term Risk: Annual Review & Fundraiser 2025
Power Overwhelming: dissecting the $1.5T AI revenue shortfall
AI #145: You’ve Got Soul
Is Friendly AI an Attractor? Self-Reports from 22 Models Say Probably Not
Modelling Trajectories - Interim results
Emergent Machine Ethics: A Foundational Research Framework for the Intelligence Symbiosis Paradigm
Help us find founders for new AI safety projects
Do we have terminology for "heuristic utilitarianism" as opposed to classical act utilitarianism or formal rule utilitarianism?
What is the most impressive game an LLM can implement from scratch?
Epistemology of Romance, Part 2
Front-Load Giving Because of Anthropic Donors?
Center for Reducing Suffering (CRS) S-Risk Introductory Fellowship applications are open!
An AI Capability Threshold for Funding a UBI (Even If No New Jobs Are Created)
Categorizing Selection Effects
Blog post: how important is the model spec if alignment fails?
Beating China to ASI
Management of Substrate-Sensitive AI Capabilities (MoSSAIC) Part 1: Exposition
Embedded Universal Predictive Intelligence
Human-AI identity coupling is emergent
On Dwarkesh Patel’s Second Interview With Ilya Sutskever
A Critique of Yudkowsky’s Protein Folding Heuristic
Recollection of a Dinner Party
Formalizing Newcombian problems with fuzzy infra-Bayesianism
Proof section to formalizing Newcombian problems with fuzzy infra-Bayesianism
Human art in a post-AI world should be strange
It's tricky to tell what % of the economy the state controls
I'm Skeptical of and Confused About The Multiplier in Macroeconomics
The AI Society Lens
Effective Pizzaism
LLMs Don't Have Superhuman Research Taste: Tastybench
AI Safety at the Frontier: Paper Highlights of November 2025
Five ways AI can tell you're testing it
Why Moloch is actually the God of Evolutionary Prisoner’s Dilemmas
Reward Mismatches in RL Cause Emergent Misalignment
Sci.STEPS invites mentee applications
How Claude Opus 4.5 describes its experience of various concepts
Safety Cases Explained: How to Argue an AI is Safe
The Hidden Asymmetry in Personal Preparedness: Early Costs, Late Losses
Halfhaven Digest 6 + Retrospective
Metric-haven (quick stats on how Inkhaven impacted LessWrong)
MIRI’s 2025 Fundraiser
Everyone Can Be High Status In Utopia
How to Write Fast, Weird, and Well
The Overthinking Problem in Stroke Recovery
Future Proofing Solstice
High learning rate can lead to depression
Announcing: OpenAI's Alignment Research Blog
AI Mental Health Chatbots for Low-Resource Settings: A Prioritization Framework
Which planet is closest to the Earth, and why is it Mercury?
How middle powers may prevent the development of artificial superintelligence
Becoming a Chinese Room
Well, Seasons Greatings Everyone! [Short Fiction]
23 thoughts on Artificial Intelligence (2025)
Lorxus Does Halfhaven: 11/22~11/28
Would ASI development in non-party states undermine a nonproliferation agreement?
Claude Opus 4.5 Is The Best Model Available
HERMES: Towards Efficient and Verifiable Mathematical Reasoning in LLMs
Alignment as an Evaluation Problem
Interview: What it's like to be a bat
A Statistical Analysis of Inkhaven
How To Deploy an AI
10 Aphorisms from 𝘛𝘩𝘦 𝘉𝘦𝘥 𝘰𝘧 𝘗𝘳𝘰𝘤𝘳𝘶𝘴𝘵𝘦𝘴
Inkhaven: 30 Days, 30 Memories
November Retrospective
Video quality is mainly not determined by resolution
Models of Mind: Are You Mono- or Multi-Threaded?
Inverting qualia with group theory
What makes a good egg?
Soul-Whore
Choose Your Failure Modes
Human Agency in a Superintelligent World
Inkhaven Retrospective
Hyperstition
Explosive Skill Acquisition
The Comet King (solstice reading)
My Three Years Wasted by University
A Blogger's Guide To The 21st Century
Alphabetical Conundra Vol 2.
Unconferences: A Better Way to Run Meetups
Ben's 10 Tips for Event Feedback Forms
Does SI Disfavor Computationalism?
14 Concerns About Immortality
Flashcards are worth it
Insulin Resistance and Glycemic Index
The Moonrise Problem
I wrote a blog post every day for a month, and all I got was this lousy collection of incoherent ramblings
Uncertain Updates: November 2025
Reflections on my woo-aversion
Manifest x DC After Action Report
Why do some people prefer gifts to money?
Silicon Morality Plays: The Hyperstition Progress Report
Slop and Beauty and Infinite Power
Scientists make sense of shapes in the minds of the models
Can We Secure AI With Formal Methods? November-December 2025
The Joke
I gave LLMs emotional damage
A Harried Meeting
Change My Mind: The Rationalist Community is a Gift Economy
Epistemology of Romance, Part 1
Unless its governance changes, Anthropic is untrustworthy
Claude 4.5 Opus' Soul Document
El Dorado
The Missing Genre: Heroic Parenthood - You can have kids and still punch the sun
The NPC → MC Spectrum
Symbolic Regression, Sparsification, and Kolmogorov-Arnold Networks
Not A Love Letter, But A Thank You Letter
College life with short AGI timelines
Why Abortion Prohibition is Dysgenic
Opus 4.5 is funny
Some notes on supply, demand, cost and utility
Drugs Aren't A Moral Category
Tests of LLM introspection need to rule out causal bypassing
‘Safe by Design’: A Speculative Paradigm for Positive AI Development
Claude Opus 4.5: Model Card, Alignment and Safety
The Best Lack All Conviction: A Confusing Day in the AI Village
Should you work with evil people?
Seemingly Irrational Voting
Where I Am Donating in 2025
The Responder
A Taxonomy of Bugs (Lists)
You Are Much More Salient To Yourself Than To Everyone Else
The Power of Purple: DnD Design Lessons from a 4-Year-Old
Bitter Lessons from
Distillation Robustifies Unlearning
A Thanksgiving Memory
Legitimate Deliberation
Incoherent rambling about preference utilitarianism
Tristan Harris Interview on AI Safety by Steven Bartlett (aka Diary Of A CEO)
AI #144: Thanks For the Models
Robust Software Isn't About Error Handling
The First Thanksgiving
P-hacking as focusing a microscope
Will We Get Alignment by Default? — with Adrià Garriga-Alonso
Alignment Bootstrapping Is Dangerous
Is there an analogue of Riemann's mapping theorem for split complex numbers, or otherwise?
The Big Nonprofits Post 2025
How Reducing Cognitive Interference Could Revolutionize Stroke Recovery
The crux on consciousness
Model Weight Preservation is not enough
AI Teddy Bears: A Brief Investigation
Despair, Serenity, Song and Nobility in "Hollow Knight: Silksong"
Information Hygiene
On Impact Certificates
Why Wouldn't A Rationalist Be Rational?
To write well; first, experience.
What it feels like to be enthusiastically time blind
A Technical Introduction to Solomonoff Induction without K-Complexity
Just explain it to someone
Architectures for Increased Externalisation of Reasoning
Courtship Confusions Post-Slutcon
Making Sense of Consciousness Part 8: Summing Up
Management of Substrate-Sensitive AI Capabilities (MoSSAIC) Part 0: Overture
Subliminal Learning Across Models
ARENA 6.0 Impact Report
Postmodernism for STEM Types: A Clear-Language Guide to Conflict Theory
Training Models to Detect Activation Steering: Results and Implications
For A Short Period Of Time, You Can Save 21,000 Shrimp Per Dollar
AI discourse analyzed (we looked at essays, Twitter, Bluesky, Truth Social)
Quotes on AI and wisdom
Preference gaps as a safeguard against AI self-replication
Minor Wording Changes Produce Major Shifts in AI Behavior
A Clarifying Characterization of the Hard Problem of Consciousness
[Linkpost] "Higher glass transition temperatures reduce thermal stress cracking in aqueous solutions relevant to cryopreservation"
Neural Annealing: Directing Psychedelic Trips Towards Healing
Autostructures: Craftsmanship in the Age of Vibes.
Meetup Tip: The Second Organizer
Principles and Generators of a Rationality Dojo
Well-Being & Well-Working: 8 Ways I Get Things Done
Spatially distributed consciousness is not an abstract thought experiment if AI is conscious
Snippets on Living In Reality
Evolution & Freedom
Reasons Why I Cannot Sleep
Should we extend moral patienthood to LLMs?
Training PhD Students to be Fat Newts (Part 2)
Things I wish I knew to save GPU minutes on Llama 405b model (and other beasts)
Three positive updates I made about technical grantmaking at Coefficient Giving (fka Open Phil)
Want a single job to serve many AI safety projects? Ashgro is hiring an Operations Associate
Beware boolean disagreements
EA ITT: An attempt
The Economics of Replacing Call Center Workers With AIs
Evaluating honesty and lie detection techniques on a diverse suite of dishonest models
HTTP402: musings about an ad-free internet
ChatGPT 5.1 Codex Max
Studying Mechanisms of Alignment Faking in Llama-3.1-405B
Security Complacency Meets Frontier AI: The Coming Collapse of ‘Secure by Apathy’
Avoid Fooling Yourself By Believing Two Opposing Things At Once
Alignment will happen by default. What’s next?
Ruby's Inkhaven Retrospective
Against Making the Same Mistake Twice
Training PhD Students to be Fat Newts (Part 1)
How to love Battlefield as much as I do
The reasonable effectiveness of mathematics
Toy Models of Superposition in the dense regime
The Ease Disease
Takeaways from the Eleos Conference on AI Consciousness and Welfare
Is Rationalism a Religion
Market Truth
Making Sense of Consciousness Part 7: Volition
A One-Minute ADHD Test
Where is Online?
AI Safety Interventions
Thou art rainbow: Consciousness as a Self-Referential Physical Process
On morality, defection-robustness, and legibility incentives
What would an IRB-like policy for AI experiments look like?
The Coalition
OpenAI finetuning metrics: What is going on with the loss curves?
Alignment may be localized: a short (and admittedly limited) experiment
Maybe Insensitive Functions are a Natural Ontology Generator?
ACX Atlanta December Meetup
The Penicillin Myth
Gemini 3 Pro Is a Vast Intelligence With No Spine
Formal confinement prototype
On negotiated settlements vs conflict with misaligned AGI
NATO is dangerously unaware of its military vulnerability
The LessWrong Team Was Selling Dollars For 86 Cents
I am a rationalist
Continuity
Inkhaven Retrospective
Androgenetic haploid selection
How to Talk to Journalists
Formality
Why Talk to Journalists
I made a tool for learning absolute pitch as an adult
"Self-esteem" is distortionary
Rationalist Techno-Buddhist Jargon 1.0
Finding the uncertainty vector in GPT2-scale transformers
Stop Applying And Get To Work
Halfhaven Digest #5
Emotions, Fabricated
I'll be sad to lose the puzzles
Show Review: Masquerade
If you cannot be good, at least be bad correctly
Please Measure Verification Burden
Solstice Singalong Watch Party
Busking Practice
The Enemy Gets The Last Hit
A list of people who could’ve started a nuclear war, but chose not to
Traditional Food
Memories of a British Boarding School #2.5
Dipole Nature
What kind of person is DeepSeek's founder, Liang Wenfeng? An answer from his old university classmate.
Comment on Natural Emergent Misalignment Paper by Anthropic
How to throw parties
Stream of Consciousness as a Scaffolding Skill
Literacy is Decreasing Among the Intellectual Class
Market Logic II
You can just do things
Easy vs Hard Emotional Vulnerability
Why your sports car isn't a racecar (tradeoffs everywhere)
Assorted Thoughts on "Pivoting" to AI
OpenAI Locks Down San Francisco Offices Following Alleged Threat From Activist
Sorry, I still think kidney donation makes no sense for an EA
Automatic alt text generation
My frustrations: AI doom
Introspection in LLMs: A Proposal For How To Think About It, And Test For It
AI Red Lines: A Research Agenda
Book Review: Wizard's Hall
Market Logic I
Animal welfare concerns are dominated by post-ASI futures
Habitual mental motions might explain why people are content to get old and die
D&D.Sci Thanksgiving: the Festival Feast
Diplomacy during AI takeoff
Easy Opportunity to Help Many Animals
Why Not Just Train For Interpretability?
Complaining about my inability to focus on uninteresting things
Models not making it clear when they're roleplaying seems like a fairly big issue
Natural Emergent Misalignment from Reward Hacking
We won't solve non-alignment problems by doing research
Can Artificial Intelligence Be Conscious?
Gemini 3: Model Card and Safety Framework Report
Lorxus Does Halfhaven: 11/15~11/21
URGENT @everyone - help us kill AI preemption (again) before this Friday
EA Hotel Solstice
Why Does Empathy Have an Off-Switch?
What Do We Tell the Humans? Errors, Hallucinations, and Lies in the AI Village
Let Us Stop the Train Before it Crashes! 🙏
Should I Apply to a 3.5% Acceptance-Rate Fellowship? A Simple EV Calculator
Towards Humanist Superintelligence
16 Writing Tips from Inkhaven
Reading My Diary: 10 Years Since CFAR
10 Key Insights from the “Frontier AI Risk Monitoring Platform”
Contra Collisteru: You Get About One Carthage
Rescuing truth in mathematics from the Liar's Paradox using fuzzy logic
Preferences are confusing
Suicide Prevention Ought To Be Illegal
PSA: For Chronic Infections, Check Teeth
[Paper] Output Supervision Can Obfuscate the CoT
The Boring Part of Bell Labs
What the term "Mass Communication" gestures at
Dominance: The Standard Everyday Solution To Akrasia
Do One Neat Thing vs. Get Work Done
Gemini 3 is Evaluation-Paranoid and Contaminated
Evaluating and Understanding Scheming Propensity
Try seeing art
AI #143: Everything, Everywhere, All At Once
Thinking about reasoning models made me less worried about scheming
Restricting Dangerous Research: Has It Worked Before, and Could It Work for AI?
Should we shun the legibly evil?
Rumored Trump EO
What would adults in the room know about AI risk?
10 Wrong and Dumb Grammar Rules
My burnout journey
What Is The Basin Of Convergence For Kelly Betting?
Exclusive: Here's the draft Trump executive order on AI preemption
How critical is ASML to GPU progress?
In Defense of Goodness
Preventing covert ASI development in countries within our agreement
A review of Red Heart, the new alignment novel by Max Harms
Monthly Roundup #36: November 2025
MLSN #17: Measuring General AI Abilities and Mitigating Deception
Review: The Most Dangerous Writing App
Dense reconstruction is the scaffold of machine learning
Better Writing Through Claude
I give up.
The Bughouse Effect
Memories of a British Boarding School #2
On Wanting
Automate, automate it all
My Ethical Conundrum Around Writing About Meditation
A day in the life of a LW developer
An antibiotic for parasitic AI
Against Money Maximalism
Error Bars as Degrees of Belief
How the aliens next door shower
Anthropic is (probably) not meeting its RSP security commitments
Considerations for setting the FLOP thresholds in our example international AI agreement
Jailbreaking AI models to Phish Elderly Victims
Beren's Essay on Obedience and Alignment
Why are FICO scores effective?
The Aura of a Dark Lord
New Report: An International Agreement to Prevent the Premature Creation of Artificial Superintelligence
Sign language as a generally-useful means of communication (even if you have good hearing)
Victor Taelin's notes on Gemini 3
On Writing #2
GPT 5.1 Follows Custom Instructions and Glazes
ARC progress update: Competing with sampling
Status Is The Game Of The Losers' Bracket
Reconstellation: construct a flywheel for personal change
Kairos is the new home for the Global Challenges Project, and we’re hiring for a GCP Director
The Illegible Chain-of-Thought Menagerie
An urgent call for better Risk Modelling
Eat The Richtext
Memories of a British Boarding School #1
Preference Weighting and the Abilene Paradox
How Colds Spread
Small batches and the mythical single piece flow
I store some memories spatially and I don't know why
An Analogue Of Set Relationships For Distribution
No One Reads the Original Work
Thoughts and experiences on using AI for learning
Middlemen Are Eating the World (And That's Good, Actually)
Are there examples of communities where AI is making epistemics better now?
Generalisation Hacking: a first look at adversarial generalisation failures in deliberative alignment
Varieties Of Doom
Omniscience one bit at a time: Chapter 5
AI 2025 - Last Shipmas
A Market of Whispering Earrings
Knowing Whether AI Alignment Is a One-Shot Problem Is a One-Shot Problem
Lobsang's Children
50 Shades of Red
75 and 750 Words on Legal Personhood
Considerations regarding being nice to AIs
Human behavior is an intuition-pump for AI risk
On Comparative Advantage & AGI
Sunnyvale EA/LW/ACX meetup
Approximating arbitrary complex-valued continuous functions
Mnemonic Exposition
In-Context Writing with Sonnet 4.5
Mediators: a different route through conflict
Video games are philosophy's playground
Why is American mass-market tea so terrible?
Mixed Feelings on Social Munchkinry
What the new generation of AI believers sees
What would my 12-year-old self think of agent foundations?
A year-long journey of getting fast internet in Paris
The Control System Going Out of Control
Why So Much Moloch?
Rethinking everything
Where is the Capital? An Overview
Don't defer to people when they're clearly lying
[Part 1] Why are Silicon Valley people so optimistic about AI?
Process Crimes and Pedantic Rules
Omniscience one bit at a time: Chapter 4
Arrows of time and space
I Spent 30 Days Learning to Smile More Charismatically
The new Pluribus TV show is a great and unusual analogy for AI.
Considering the Relevance of Computational Uncertainty for AI Safety
The Badness of Death in Different Metaethical Theories
Brand New Experience Salesman
7 Vicious Vices of Rationalists
Support the Movement against AI extinction risk
Finding My Internal Compass, Literally
Now, “Tethering” is a Bigger Educational Problem Than “Involution”
The Ambiguity Of "Human Values" Is A Feature, Not A Bug
Racing For AI Safety™ was always a bad idea, right?
AI safety undervalues founders
Sharpening Your Map: Introducing Calibrate
Why does ChatGPT think mammoths were alive in December?
The skills and physics of high-performance driving, Pt. 1
Your Clone Wants to Kill You Because You Assumed Too Much
Writing Hack: Write It Just Like That
AI loves octopuses
Punctuation & Quotation Conventions
Matrices map between biproducts
Don't use the phrase "human values"
Generative AI and the P=NP problem
Halfway there; on desperation management
"Middlemarch" is inane and also one of my favorite books
Just Another Five Minutes
Same cognitive paints, exceedingly different mental pictures
A Love Song to Nicotine
Private Latent Notation and AI-Human Alignment
On Battle-Short: What, How, and Why Not To
The Flaw in the Paperclip Maximizer Thought Experiment
"But You'd Like To Feel Companionate Love, Right? ... Right?"
Generation Ship: A Protest Song For PauseAI
Will AI systems drift into misalignment?
Everyday Clean Air
Some Sun Tsu quotes sound like they're actually about debates/epistemics
What are your impossible problems?
Prediction markets for social deduction games without weird incentives
List of great filk songs
a sketch of how we might go about getting basins of corrigibility from RL
Lambda Calculus Prior
AI Craziness: Additional Suicide Lawsuits and The Fate of GPT-4o
Understanding and Controlling LLM Generalization
Lorxus Does Halfhaven: 11/08~11/14
From Anthony: Control Inversion
LLM would have said this better, and without all these typos too
The Charge of the Hobby Horse
The Eightfold Path To Enlightened Disagreement
10 Types of LessWrong Post
Creditworthiness should not be for sale
Everyone has a plan until they get lied to the face
Notes on the book "Talent"
How do you read Less Wrong?
Halfhaven Digest #4
AI Corrigibility Debate: Max Harms vs. Jeremy Gillen
Types of systems that could be useful for agent foundations
The rare, deadly virus lurking in the Southwest US, and the bigger picture
Tell people as early as possible it's not going to work out
Questioning Computationalism
Orient Speed in the 21st Century
Evaluation Avoidance: How Humans and AIs Hack Reward by Disabling Evaluation Instead of Gaming Metrics
Self-interpretability: LLMs can describe complex internal processes that drive their decisions
(Fantasy) -> (Planning): A Core Mental Move For Agentic Humans?
How does one tell apart results in ethics and decision theory?
Handover to AI R&D Agents - relevant research?
Supervised fine-tuning as a method for training-based AI control
Perhaps you should suspect me as well
The Transformer and the Hash
just another potential man
Low-Temperature Evaluations Can Mask Critical AI Behaviors
Epistemic Spot Check: Expected Value of Donating to Alex Bores's Congressional Campaign
Weight-sparse transformers have interpretable circuits
AI #142: Common Ground
Mortgage houses not land?
ClaudoBiography: The Unauthorized Autobiography of Claude, or: The Life of Claude and of His Fortunes and Adversities
Paranoia rules everything around me
8 Questions for the Future of Inkhaven
Strategically Procrastinate as an Anti-Rabbit-Hole Strategy
Favorite quotes from "High Output Management"
Turing-Complete vs Turing-Universal
Are AI time horizons inherently superexponential?
Meetup Tip: Food
Two can keep a secret if one is dead. So please share everything with at least one person.
Utilitarian inequality metrics
Being The Target Demographic
Lorxus Favors: An Experiment in Self-Backed Giftlike Macroeconomics (+ Extra Bits)
A bad review != a bad book
The Pope Offers Wisdom
Why Truth First?
Bitcoin Halvings and the Trisolaran Mistake: When External Actors Masquerade as Natural Laws
Social drives 2: “Approval Reward”, from norm-enforcement to status-seeking
OpenAI Releases GPT 5.1
Is SGD capabilities research positive?
Lighthaven-ish Ticket Strategy: Three Pillars of FOMO
Personal Account: To the Muck and the Mire
We live in the luckiest timeline
AI for Safety & Science Nodes in Berlin & the Bay Area
Reflections on being Sorted
Lorxus Does Halfhaven: 11/01~11/07
Undissolvable Problems: things that still confuse me
Introducing faruvc.org
Warning Aliens About the Dangerous AI We Might Create
9+ weeks of mentored AI safety research in London | Pivotal Research Fellowship
I Read Red Heart and I Heart It
Miscellaneous observations about board games
Why to Commit to a Writing and Publishing Schedule
5 Things I Learned After 10 Days of Inkhaven
Response to "Taking AI Welfare Seriously": The Indirect Approach to Moral Patienthood
Do not hand off what you cannot pick up
Better than Baseline
How human-like do safe AI motivations need to be?
Teleosemantics & Swampman
How I Learned That I Don't Feel Companionate Love
Conceptual reasoning dataset v0.1 available (AI for AI safety/AI for philosophy)
Flirt like it’s predetermined
“Wait, feelings are supposed to be IN THE BODY?”
Fairly Breaking Ties Without Fair Coins
Kimi K2 Thinking
Dulling the Bite of Mere Addition
Not-A-Book Review: The Attractive Man (Dating Coach Service)
Don't Get One-Shotted
Learnings from the Zurich AI Safety Day
Announcing the Society of Teen Scientists
What is Happening in AI Governance?
Human Agency at Stake
Omniscience one bit at a time: Chapter 3
Evolution's Alignment Solution: Why Burnout Prevents Monsters
Thick practices for AI tools
The problem of graceful deference
See Your Word Count While You Write
On Stance
Breaking the Hedonic Rubber Band
Rejecting "Goodness" Does Not Mean Hammering The Defect Button
On the Normativity of Debate: A Discussion With Said Achmiz
Questioning the Requirements
France is ready to stand alone
Love is Willingness to do Violence
Turning Grey
The AI bubble covered in the Atlantic
A Simple Sing-along Solstice
Universal Basic Income in an AGI Future
Ternary plots are underrated
How likely is dangerous AI in the short term?
[Linkpost] Galaxy brain resistance
A pencil is not a pencil is not a pencil
The Open Strategy Dictator Game: An Experiment in Transparent Cooperation
DC/Maryland Secular Solstice
What I learned building a language-learning app
Andrej Karpathy on LLM cognitive deficits
Consciousness as a Distributed Ponzi Scheme
Maat - Intro Post
Variously Effective Altruism
Why does everything feel so urgent?
Omniscience one bit at a time: Chapter 2
Science Fiction Trail: The Compressed Universe
Social drives 1: “Sympathy Reward”, from compassion to dehumanization
Ontology for AI Cults and Cyber Egregores
From Vitalik: Galaxy brain resistance
The jailbreak argument against LLM values
The grapefruit juice effect
Against Powerful Text Editors
Duncan Sabien and Politics
The only important ASI timeline
Book Announcement: The Gentle Romance
Three Kinds Of Ontological Foundations
When does Claude sabotage code? An Agentic Misalignment follow-up
Manifest X DC Opening Benediction -
Making Friends Along the Way
Relearning how to be human
Condensation
Omniscience one bit at a time: Chapter 1
We're Not The Center of the Moral Universe
Gradual Disempowerment Monthly Roundup #2
We're Already Living in a Sci-Fi World
AI hasn't seen widespread adoption because the labs are focusing on automating AI R&D
Heroic responsibility is morally neutral
The General Social Survey and the ACX Survey
There should be unicorns
One Shot Singalonging is an attitude, not a skill or song-difficulty-level
Where Our Engineering Education Went Wrong
A sonnet, a sestina, a villanelle
n-ary Huffman coding
Liouville's Theorem and the Second Law
Insofar As I Think LLMs "Don't Really Understand Things", What Do I Mean By That?
Why AC is cheap, but AC repair is a luxury
Myopia Mythology
Omniscaling to MNIST
Why can planarians regrow their bodies?
Unexpected Things that are People
A humanist critique of technological determinism
Bottom-Up: Principled Compression to Shrink LLMs
Five very good reasons to not write down literally every single thought you have
Review: Parsifal at the SF Opera
Escalation and perception
The Snaw
Augustine of Hippo's Handbook on Faith, Hope, and Love in Latin (or: Claude as Pandoc++)
Mourning a life without AI
Two Times I Was Surprised By My Own Values
On slack and akrasia
Anthropic & Dario’s dream
Against “You can just do things”
Agent Foundations: Paradigmatizing in Math and Science
Entity Review: Pythia
Announcing “Computational Functionalism Debate” (soliciting paid feedback): Test your intuitions about consciousness
AI Safety's Berkeley Bubble and the Allies We're Not Even Trying to Recruit
Start an AI safety group with the Pathfinder Fellowship
AI is not inevitable.
The Hawley-Blumenthal AI Risk Evaluation Act
Secular Solstice Roundup 2025
A country of alien idiots in a datacenter: AI progress and public alarm
On Sam Altman’s Second Conversation with Tyler Cowen
Plans to build AGI with nuclear reactor-like safety lack 'systematic thinking,' say researchers
13 Arguments About a Transition to Neuralese AIs
Open Letter to Ohio House Reps
Two easy digital intentionality practices
Is it really paranoia if I'm really Out to Get Me?
Did you know you can just buy blackbelts?
GPTF-8: A tokenizer-based character encoding
Cancer; A Crime Story (and other tales of optimization gone wrong)
Liberation Clippy
Minimizing Loss ≠ Maximizing Intelligence
Solstice Season Megameetups
My new nonprofit Evitable is hiring.
Willpower is exhausting, use content blockers
A review of MSUM's AI Innovation Summit: Day Two
Brutalist Prose
Can we do useful meta-analysis? Unjournal evaluations of "Meaningfully reducing consumption of meat... is an unsolved problem..."
Toward Statistical Mechanics Of Interfaces Under Selection Pressure
Sex, Drugs, and the Future of American Politics
What is the (LW) consensus on jump from qualia to self-awareness in AI?
OpenAI Does Not Appear to be Applying Watermarks Honestly
Genetic Enhancements of Color Qualia
Anticheat: a non-technical look without psychoanalysis
Hyperstition Progress Report, and, Any Ideas for Making AI-Generated Fiction Actually Good?
It is our responsibility to develop a healthy relationship with our technology
Debunking “When Prophecy Fails”
[Linkpost] How to Win Board Games
SPAR Spring ‘26 mentor apps open—now accepting biosecurity, AI welfare, and more!
AI #141: Give Us The Money
Fake media seems to be a fact of life now
Our ancestors didn't know their faces
Review: K-Pop Demon Hunters (2025)
A memo on Takeoff
announcing my modular coal startup
Halloween Tombstone Simulacra
Halfway to Anywhere
People Seem Funny In The Head About Subtle Signals
Why I think consciousness is fundamental rather than structural/neural.
Technical AI Safety Roles at Open Philanthropy
Career Advice for High Schoolers and College Students
A 2032 Takeoff Story
Continuous takeoff is a bad name
An ARENA 6.0 Capstone: Model Organism of Encoded Reasoning
Breaking Books: A tool to bring books to the social sphere
Digital minimalism is out, digital intentionality is in
Anthropic Commits To Model Weight Preservation
Living in the Shadow of The Sort
Hardening against AI takeover is difficult, but we should try
AI Safety at the Frontier: Paper Highlights of October 2025
New homepage for AI safety resources – AISafety.com redesign
An atheist's guide to prayer
Theory of Change for US Govt Whistleblower Database and Guide
Suffering is what makes it special
Maxwell's Demon and the Arrow of Time
A Guide To Being Persuasive About AI Dangers
Why Are Safety Constraints in LLMs Easily Breakable? Knowledge as a Network of Gated Circuits
Sable and Able: A Tale of Two ASIs
Dario Amodei's "Machines of Loving Grace" sounds incredibly dangerous, for Humans
What are you excited about doing?
Intentionality
Food-related things that have made my life a little better
Gerrymandering California
How to survive until AGI
Heroic Responsibility
[Linkpost] Competing Motivations: When More Incentives Lead To Less Effort
Not Over Or Under Indexed
Parsing Validation
A/B testing could lead LLMs to retain users instead of helping them
OpenAI: The Battle of the Board: Ilya’s Testimony
Berkeley Solstice Weekend
Modeling the geopolitics of AI development
Thoughts by a non-economist on AI and economics
AI Safety Camp 11
Keeping Ants and Spotting Queens
Letter to a close friend
Open-weight training practices and implications for CoT monitorability
Free Learning in Today’s Society: Some Personal Experiences and Reflections
A prayer for engaging in conflict
Rainbows, fractals, and crumpled paper: Hölder continuity
On tasting things
More notes on US govt whistleblower guide and DB
US Govt Whistleblower Guide
US Govt Whistleblower Database
The Mortifying Ordeal of Knowing Thyself
Build the life you actually want
Research Reflections
I ate bear fat with honey and salt flakes, to prove a point
Parleying with the Principled
The Zen Of Maxent As A Generalization Of Bayes Updates
Sam Altman's track record of manipulation: some quotes from Karen Hao's "Empire of AI"
Comparative advantage & AI
Just complaining about LLM sycophancy (filler episode)
The Tale of the Top-Tier Intellect
High-Resistance Systems to Change: Can a Political Strategy Apply to Personal Change?
Leaving Open Philanthropy, going to Anthropic
Red Heart
How Powerful AI Gets Cheap
The EU could hold AI capabilities development hostage if they wanted to
What's up with Anthropic predicting AGI by early 2027?
For Synthetic Situations
The Unreasonable Effectiveness of Fiction
Crime and Punishment #1
Solving a problem with mindware
Publishing academic papers on transformative AI is a nightmare
Trying to understand my own cognitive edge
Why you shouldn't write a blog post every day for a month
There's some chance oral herpes is pretty bad for you?
Spending Less by Doing More
Lack of Social Grace is a Lack of Skill
Live Conversational Threads: Not an AI Notetaker
A glimpse of the other side
Things I've Become More Confident About
How and why you should make your home smart (it's cheap and secure!)
Body Time and Daylight Savings Apologetics
Introducing the Hauntoloscope: a Counterfactual Timeline Generation Tool
Midwit-Savants Are Common
Are apples made of cells?
Halfhaven halftime
Ilya Sutskever Deposition Transcript
Donut Theory of Consciousness
Reason About Intelligence, Not AI
Human Values ≠ Goodness
A toy model of corrigibility
Doom from a Solution to the Alignment Problem
Ohio House Bill 469
I Wondered Why I Procrastinate Even On Things I Am "Passionate" About
Quick take: Economics of Intelligence 101
25 Que
My YC Pitch
Me consuming five different forms of media at once to minimize the chance of a thought occurring
A review of MSUM's AI Innovation Summit: Day One
FTL travel and scientific realism
Reflections on 4 years of meta-honesty
Better ways to do grayscale screens
2025 Unofficial LW Community Census, Request for Comments
Metformin 1000mg/day upon symptom onset may reduce your risk of long covid by 10-30%
Weak-To-Strong Generalization
Model welfare and open source
Why I Transitioned: A Case Study
Economics and Transformative AI (by Tom Cunningham)
Decision theory when you can't make decisions
You’re always stressed, your mind is always busy, you never have enough time
Re-rolling environment
Why Is Printing So Bad?
Some Meetups I Ran (2025 Q2)
Shouldn't taking over the world be easier than recursively self-improving, as an AI?
ACX Atlanta November Meetup
Seattle Secular Solstice 2025 – Dec 20th
Fermi Paradox, Ethics and Astronomical waste
LLM-generated text is not testimony
Apply to the Cooperative AI PhD Fellowship by November 16th!
Vaccination against ASI
Strategy-Stealing Argument Against AI Dealmaking
Evidence on language model consciousness
Asking Paul Fussell for Writing Advice
Freewriting in my head, and overcoming the “twinge of starting”
2025 NYC Secular Solstice & East Coast Rationalist Megameetup
Supervillain Monologues Are Unrealistic
Secretly Loyal AIs: Threat Vectors and Mitigation Strategies
Ink without haven
Apply to the Cambridge ERA:AI Winter 2026 Fellowship
FAQ: Expert Survey on Progress in AI methodology
Social media feeds 'misaligned' when viewed through AI safety framework, show researchers
Crossword Halloween 2025: Manmade Horrors
Debugging Despair ~> A bet about Satisfaction and Values
Halfhaven Digest #3
OpenAI Moves To Complete Potentially The Largest Theft In Human History
A (bad) Definition of AGI
Resampling Conserves Redundancy & Mediation (Approximately) Under the Jensen-Shannon Divergence
Centralization begets stagnation
Summary and Comments on Anthropic's Pilot Sabotage Risk Report
AI #140: Trying To Hold The Line
AISLE discovered three new OpenSSL vulnerabilities
Sonnet 4.5's eval gaming seriously undermines alignment evals, and this seems caused by training on alignment evals
On The Conservation of Rights
When “HDMI-1” Lies To You
Why is there still only one instance of Eliezer Yudkowsky?
What is LessWrong explicitly trying to optimize?(And what percentage of its structure currently serves that goal?)
Interview on the Hengshui Model High School
Emergent Introspective Awareness in Large Language Models
Genius is Not About Genius
Quotes on OpenAI's timelines to automated research, safety research, and safety collaborations before recursive self improvement
Why you shouldn't eat meat if you hate factory farming
Unsureism: The Rational Approach to Religious Uncertainty
The End of OpenAI’s Nonprofit Era
An intro to the Tensor Economics blog
Uncertain Updates: October 2025
AI Doomers Should Raise Hell
AISN #65: Measuring Automation and Superintelligence Moratorium Letter
TBC Episode with Max Harms - Red Heart and If Anyone Builds It, Everyone Dies
Thresholds for Pascal's Mugging?
Please Do Not Sell B30A Chips to China
Why Civilizations Are Unstable (And What This Means for AI Alignment)
Personal Debugging for Survival: How Can I Prevent Despair Without Imminent AI Death?
What can we learn from parent-child-alignment for AI?
Some data from LeelaPieceOdds
How can I prevent despair, without imminent AI death
A Very Simple Model of AI Dealmaking
300 por basil - And if I'm already in the shit, what's the most beautiful fart I can leave behind?
Workshop on Post-AGI Economics, Culture, and Governance
When Will AI Transform the Economy?
AI Craziness Mitigation Efforts
Introducing the Epoch Capabilities Index (ECI)
Mottes and Baileys in AI discourse
Temporarily Losing My Ego
The Memetics of AI Successionism
New 80,000 Hours problem profile on the risks of power-seeking AI
LLM robots can't pass butter (and they are having an existential crisis about it)
Call for mentors from AI Safety and academia. Sci.STEPS mentorship program
Heuristics for assessing how much of a bubble AI is in/will be
Q2 AI Benchmark Results: Pros Maintain Clear Lead
A Sketch of Helpfulness Theory With Equivocal Principals
All the lab's AI safety Plans: 2025 Edition
A Bayesian Explanation of Causal Models
Brainstorming Food on the Cheap+Healthy+Convenient+Edible Frontier
How Important is Inverting LLMs?
Would concentration of power be bad, given offense-dominant weapons?
Asking (Some Of) The Right Questions
life lessons from trading
Agentic Monitoring for AI Control
Model Parameters as a Steganographic Private Channel
Major survey on the HS/TS spectrum and gAyGP
Death of the Author
Exploring the multi-dimensional refusal subspace in reasoning models
AIs should also refuse to work on capabilities research
List of lists of project ideas in AI Safety
Uncommon Utilitarianism #3: Bounded Utility Functions
How valuable is money-in-market?
Credit goes to the presenter, not the inventor
On Fleshling Safety: A Debate by Klurl and Trapaucius.
Results of "Experiment on Bernoulli processes"
certain exotic neurotransmitters as SMART PILLS: or compounds that increase the capacity for mental work in humans
Cancer has a surprising amount of detail
Stability of natural latents in information theoretic terms
Lessons from Teaching Rationality to EAs in the Netherlands
Are We Their Chimps?
FWIW: What I noticed at a (Goenka) Vipassana retreat
Why are there no helpful infectious diseases?
Brightline is Actually Pretty Dangerous
Seven-ish Words from My Thought-Language
Remembrancy
Pygmalion's Wafer
Despair and information
Why is OpenAI releasing products like Sora and Atlas?
Origins and dangers of future AI capability denial
Do you completely trust that you are completely in the shit? - despair and information -
Assessing Far UVC Positioning
Musings on Reported Cost of Compute (Oct 2025)
Regardless of X, you can still just sign superintelligence-statement.org if you agree
The Future of Interpretability is Geometric
New Statement Calls For Not Building Superintelligence For Now
Highlights from Explaining AI Explainability
Can Reasoning Models Obfuscate Reasoning? Stress-Testing Chain-of-Thought Monitorability
Dollars in political giving are less fungible than you might think
LW Reacts pack for Discord/Slack/etc
AI Timelines and Points of no return
Can we steer AI models toward safer actions by making these instrumentally useful?
Introducing ControlArena: A library for running AI control experiments
Plan 1 and Plan 2
Why I Don't Believe in True AGI
Guys I might be an e/acc
How transformers can compute distances along a curve locally.
How an AI company CEO could quietly take over the world
Worlds Where Iterative Design Succeeds?
Automated real time monitoring and orchestration of coding agents
Reminder: Morality is unsolved
The main way I've seen people turn ideologically crazy [Linkpost]
Empirical Partial Derivatives
Differences in Alignment Behaviour between Single-Agent and Multi-Agent AI Systems
Building a different kind of personal intelligence
Beliefs about formal methods and AI safety
Depression and information - What is my confidence level that I'm in absolute shit?
AI #139: The Overreach Machines
Should AI Developers Remove Discussion of AI Misalignment from AI Training Data?
Software Engineering at SecureBio
Is terminal lucidity real?
A Concrete Roadmap towards Safety Cases based on Chain-of-Thought Monitoring
LW Psychosis
Learning to Interpret Weight Differences in Language Models
AGI's Last Bottlenecks
Statement on Superintelligence - FLI Open Letter
Why AI alignment matters today
Any corrigibility naysayers outside of MIRI?
Which side of the AI safety community are you in?
Homomorphically encrypted consciousness and its implications
Consider donating to AI safety champion Scott Wiener
Postrationality: An Oral History
Penny's Hands
Is 90% of code at Anthropic being written by AIs?
LLM Self-Reference Language in Multilingual vs English-Centric Models
How Well Does RL Scale?
The Perpetual Technological Cage
Utopiography Interview
White House OSTP AI Deregulation Public Comment Period Ends Oct. 27
July-October 2025 Progress in Guaranteed Safe AI
In remembrance of Sonnet '3.6'
Stratified Utopia
Early stage goal-directednesss
On Dwarkesh Patel’s Podcast With Andrej Karpathy
Relearning Human Signaling — Experiments in Real-World Connection (Stockholm + virtual)
Samuel x Bhishma - Superintelligence by 2030?
Remarks on Bayesian studies from 1963
How the Human Lens Shapes Machine Minds
Symbiogenesis vs. Convergent Consequentialism
21st Century Civilization curriculum
Ramblings on the Self Indication Assumption
An epistemic theory of populism [link post to Joseph Heath]
EU explained in 10 minutes
"Tilakkhana", Gwern [poem]
Attending Your First Contra Dance in a Fragrance-Compliant Manner
How low can we go? Modern School Curricula
The If Anyone Builds It, Everyone Dies march assurance contract should indicate how many signatures it has received
A Mathematical Model of Alcor’s Economic Survival
Can you find the steganographically hidden message?
How cause-area specific conferences can strengthen the EA community
Secular Solstice: Bremen (Dec 13)
How Stuart Buck funded the replication crisis
Contra-Zombies? Contra-Zombies!: Chalmers as a parallel to Hume
Consider donating to Alex Bores, author of the RAISE Act
Bubble, Bubble, Toil and Trouble
Considerations around career costs of political donations
A Cup of Blue Tea
A Bayesian nightmare: Instagram and Sampling bias
Uncommon Utilitarianism #2: Positive Utilitarianism
Can you transfer a backdoor with subliminal learning?
Final-Exam-Tier Medical Problem With Handwavy Reasons We Can't Just Call A Licensed M.D.
Humanity Learned Almost Nothing From COVID-19
Frontier LLM Race/Sex Exchange Rates
Scenes, cliques and teams - a high level ontology of groups
Learnable Skills
Give Me Your Data: The Rationalist Mind Meld
The IABIED statement is not literally true
Libraries need more books
In defense of the goodness of ideas
Sample Interesting First
Comma v0.1 converted to GGUF
Using Bayes' Theorem to determine Optimal Protein Intake
Selected Graphics Showing Progress towards AGI
Networking for Spies: Translating a Cyrillic Text with Claude Code
How to survive in the storm of uncertainty (and shit)?
Space colonization and scientific discovery could be mandatory for successful defensive AI
Memory Decoding Journal Club: Functional connectomics reveals general wiring rule in mouse visual cortex
Meditation is dangerous
I handbound a book of Janus's essays for my girlfriend
The Dark Arts of Tokenization or: How I learned to start worrying and love LLMs' undecoded outputs
How To Vastly Increase Your Charitable Impact
Nontrivial pillars of IABIED
What Success Might Look Like
I’m an EA who benefitted from rationality
AI #138 Part 2: Watch Out For Documents
Mess AI – deliberate corruption of the training data to prevent superintelligence
Activation Plateaus: Where and How They Emerge
Can We Simulate Meiosis to Create Digital Gametes — and Are the Results Your Biological Offspring?
Steven Adler reports that NVIDIA is attempting to stifle pro-export-control speech
Book Review: To Explain the World
AISN #64: New AGI Definition and Senate Bill Would Establish Liability for AI Harms
Finding Features in Neural Networks with the Empirical NTK
Learning from the Luddites: Implications for a modern AI labour movement
Job Openings: SWE, PM, and Grants Coordinator to help improve grant-making
AI #138 Part 1: The People Demand Erotic Sycophants
Cheap Labour Everywhere
Quantum immortality and AI risk – the fate of a lonely survivor
Halfhaven Digest #2
Fragrance Free Confusion
The Three Levels of Agency
Electronics Mechanic -> AI Safety Researcher: A 30-Month Journey to Model Welfare
Some astral energy extraction methods
AI-202X-slowdown: can CoT-based AIs become capable of aligning the ASI?
Chemical Telescopes And The Process Of Science
Monthly Roundup #35: October 2025
Updating the name of Open Philanthropy's AI program
Open Global Investment: Comparisons and Critique
We are too comfortable with AI "magic"
Are calm introverts (like East Asians) uniquely suited for space travel & Mars missions?
Until the stars burn out? Assessing the stakes of AGI lock-in
It will cost you nothing to "bribe" a Utilitarian
How I Became a 5x Engineer with Claude Code
That Mad Olympiad
A New Global Risk: Large Comet’s Impact on Sun Could Cause Fires on Earth
Can LLMs Coordinate? A Simple Schelling Point Experiment
Minimal Prompt Induction of Self-Talk in Base LLMs
Humans Are Spiky (In an LLM World)
Gnashing of Teeth
Geometric Structure of Emergent Misalignment: Evidence for Multiple Independent Directions
Communism By Another Name
Situational Awareness as a Prompt for LLM Parasitism
The sum of its parts: composing AI control protocols
Why your boss isn’t worried about AI
Humanity AI Commits $500 million to AI and Democracy Protection, AI x Security, and more
Thinking Partners: Building AI-Powered Knowledge Management Systems
SS26 Color Stats
The Biochemical Beauty of Retatrutide: How GLP-1s Actually Work
My views on Lesswrong
Trade Escalation, Supply Chain Vulnerabilities and Rare Earth Metals
12 Angry Agents, or: A Plan for AI Empathy
The "Length" of "Horizons"
My Software Setup
Narcissism, Echoism, and Sovereignism: A 4-D Model of Personality
A personal take on why you should work at Forethought (maybe)
Discrete Generative Models
Survey Results: Far UVC and Glycol Vapors
How AI Manipulates—A Case Study
Recontextualization Mitigates Specification Gaming Without Modifying the Specification
AI Psychosis, with Tim Hua and Adele Lopez
What is Lesswrong good for?
Predictability is Underrated
The Mom Test for AI Extinction Scenarios
Is There a Sound Argument for Generality in AI?
Reasons to sign a statement to ban superintelligence (+ FAQ for those on the fence)
Water Above the Ocean
OpenAI #15: More on OpenAI’s Paranoid Lawfare Against Advocates of SB 53
The Thirteen-Circle Paradox
Pause House, Blackpool
Global vs. Local feedback
Sublinear Utility in Population and other Uncommon Utilitarianism
RiskiPedia
Don't Mock Yourself
Experiment: Test your priors on Bernoulli processes.
The Problem of Consciousness and AI as an Ethical Subject
Dr Evil & Realpolitik
How do we know when something is deserving of welfare?
The Narcissistic Spectrum
Non-copyability as a security feature
The Alignment Problem Isn't Theoretical
International Programme on AI Evaluations
If a Lioness Could Speak
How long do AI companies have to achieve significant capability gains before funding collapses?
You're not confused by thermodynamics
Subscribe to my Inkhaven feed!
The Most Common Bad Argument In These Parts
Experiments With Sonnet 4.5's Fiction
Letter to Heads of AI labs
Emil the Moose
Using complex polynomials to approximate arbitrary continuous functions
What does it feel like to understand?
The 5 Obstacles I Had to Overcome to Become Vegan
2025 State of AI Report and Predictions
Applications Open for a Weekend Exploring Civilisational Sanity
Maybe Use BioLMs To Mitigate Pre-ASI Biorisk?
The statement "IABIED" is true even if the book IABIED is mostly false
AI and Biological Risk: Forecasting Key Capability Thresholds
Why Future AIs will Require New Alignment Methods
Materialist Semiotics and the Nature of Qualia
Patience and Willingness to Be Slow
You should probably prepare for your chats to be fed to AI
We won’t get docile, brilliant AIs before we solve alignment
Labs lack the tools to course-correct
The Liberty Tractor
At odds with the unavoidable meta-message
Stars are a rounding error
Towards a Typology of Strange LLM Chains-of-Thought
Training Qwen-1.5B with a CoT legibility penalty
Interview with a drone expert on the future of AI warfare
Investigating Neural Scaling Laws Emerging from Deep Data Structure
I take antidepressants. You’re welcome
Training fails to elicit subtle reasoning in current language models
Realistic Reward Hacking Induces Different and Deeper Misalignment
Why am I not currently starting a religion around AI or similar topics?
THE WISE, THE GOOD, THE POWERFUL
Hubris & Control
"Yes, and—" Requires the Possibility of "No, Because—"
Four Questions to Refine Your Policy Proposal
A Snippet On The Epistemically Hygienic Containment Of Faith-In-Reason-Itself
Alignment progress doesn’t compensate for higher capabilities
The Necronomicon Is Already Read
Biouploading: Preserving My Living Neurons and Connectome as a Spatially Distributed Mesh
self reflections of a striver
Hospitalization: A Review
AI #137: An OpenAI App For That
Are We Leaving Literature To The Psychotic?
Lessons from the Mountains
Probabilistic Societies
Inverting the Most Forbidden Technique: What happens when we train LLMs to lie detectably?
NEPA, Permitting and Energy Roundup #2
What shapes does reasoning take but circular?
The Oracle's Gift
Thinking Mathematically - Convergent Sequences
The Relationship Between Social Punishment and Shared Maps
IABIED: Paradigm Confusion and Overconfidence
The Wise Baboon of Loyalty
Spooky Collusion at a Distance with Superrational AI
The Architecture of the Narcissistic False Self
Reflections on The Curve 2025
2025-10-12 - London rationalish meetup - Periscope
Plans A, B, C, and D for misalignment risk
Three Paths Through Manifold
Halfhaven Digest #1
The "cool idea" bias
Irresponsible Companies Can Be Made of Responsible Employees
Heaven, Hell, and Mechanics
10 Ways to Waste a Decade
You Should Get a Reusable Mask
Replacing RL w/ Parameter-based Evolutionary Strategies
Intent alignment seems incoherent
The Tower of Babel in Reverse
Petri: An open-source auditing tool to accelerate AI safety research
Bending The Curve
Kairos is hiring: Founding Generalist & SPAR Contractor
Messy on Purpose: Part 2 of A Conservative Vision for the Future
Going Phoneless
The Alignment Paradox: Why Transparency Can Breed Deception
Notes on "Homology, Genes and Evolutionary Innovation"
Research Robots: When AIs Experiment on Us
Top Warning Signs Your Friends are Being Oneshotted By AI
LLMs as a limiter of social intercourse
Generalization and the Multiple Stage Fallacy?
Telling the Difference Between Memories & Logical Guesses
Notes from European Progress Conference
"Intelligence" -> "Relentless, Creative Resourcefulness"
Chaos Alone is No Bar to Superintelligence
We won’t get AIs smart enough to solve alignment but too dumb to rebel
Notes on the need to lose
Excerpts from my neuroscience to-do list
Experience Report - ML4Good Bootcamp Singapore, Sep'25
Gradual Disempowerment Monthly Roundup
Subliminal Learning, the Lottery-Ticket Hypothesis, and Mode Connectivity
The Origami Men
Medical Roundup #5
Why I think ECL shouldn’t make you update your cause prio
Did Tyler Robinson carry his rifle as claimed by the government?
AI Science Companies: Evidence AGI Is Near
LLMs one-box when in a "hostile telepath" version of Newcomb's Paradox, except for the one that beat the predictor
Alignment Faking Demo for Congressional Staffers
Do Things for as Many Reasons as Possible
One Does Not Simply Walk Away from Omelas
The quotation mark
The Sadism Spectrum and How to Access It
Maybe social media algorithms don't suck
Base64Bench: How good are LLMs at base64, and why care about it?
Behavior Best-of-N achieves Near Human Performance on Computer Tasks
How likely are “s-risks” (large-scale suffering outcomes) from unaligned AI compared to extinction risks?
LLMs are badly misaligned
The Counterfactual Quiet AGI Timeline
AISafety.com Reading Group session 328
Open Philanthropy’s Biosecurity and Pandemic Preparedness Team Is Hiring and Seeking New Grantees
$250 bounties for the best short stories set in our near future world & Brooklyn event to select them
How the NanoGPT Speedrun WR dropped by 20% in 3 months
a quick thought about AI alignment
Making Your Pain Worse can Get You What You Want
Markets in Democracy: What happens when you can sell your vote?
What I've Learnt About How to Sleep
The 'Magic' of LLMs: The Function of Language
To my client having a sudden ‘bad day’:
Consider Doing Small Walks at Work
Where does Sonnet 4.5's desire to "not get too comfortable" come from?
Goodness is harder to achieve than competence
Good is a smaller target than smart
Making Sense of Consciousness Part 6: Perceptions of Disembodiment
Recent AI Experiences