LessWrong
Death of the Author
Exploring the multi-dimensional refusal subspace in reasoning models
AIs should also refuse to work on capabilities research
List of lists of project ideas in AI Safety
Uncommon Utilitarianism #3: Bounded Utility Functions
How valuable is money-in-market?
Credit goes to the presenter, not the inventor
On Fleshling Safety: A Debate by Klurl and Trapaucius.
Results of "Experiment on Bernoulli processes"
certain exotic neurotransmitters as SMART PILLS: or compounds that increase the capacity for mental work in humans
Cancer has a surprising amount of detail
Stability of natural latents in information theoretic terms
Lessons from Teaching Rationality to EAs in the Netherlands
Are We Their Chimps?
FWIW: What I noticed at a (Goenka) Vipassana retreat
Why are there no helpful infectious diseases?
Brightline is Actually Pretty Dangerous
Seven-ish Words from My Thought-Language
Remembrancy
Pygmalion's Wafer
Despair and information
Why is OpenAI releasing products like Sora and Atlas?
Origins and dangers of future AI capability denial
Do you completely trust that you are completely in the shit? - despair and information
Assessing Far UVC Positioning
Musings on Reported Cost of Compute (Oct 2025)
Regardless of X, you can still just sign superintelligence-statement.org if you agree
The Future of Interpretability is Geometric
New Statement Calls For Not Building Superintelligence For Now
Highlights from Explaining AI Explainability
Can Reasoning Models Obfuscate Reasoning? Stress-Testing Chain-of-Thought Monitorability
Dollars in political giving are less fungible than you might think
LW Reacts pack for Discord/Slack/etc
AI Timelines and Points of no return
Can we steer AI models toward safer actions by making these instrumentally useful?
Introducing ControlArena: A library for running AI control experiments
Plan 1 and Plan 2
Why I Don't Believe in True AGI
Guys I might be an e/acc
How transformers can compute distances along a curve locally.
How an AI company CEO could quietly take over the world
Worlds Where Iterative Design Succeeds?
Automated real time monitoring and orchestration of coding agents
Reminder: Morality is unsolved
The main way I've seen people turn ideologically crazy [Linkpost]
Empirical Partial Derivatives
Differences in Alignment Behaviour between Single-Agent and Multi-Agent AI Systems
Building a different kind of personal intelligence
Beliefs about formal methods and AI safety
Depression and information - What is my confidence level that I'm in absolute shit?
AI #139: The Overreach Machines
Should AI Developers Remove Discussion of AI Misalignment from AI Training Data?
Software Engineering at SecureBio
Is terminal lucidity real?
A Concrete Roadmap towards Safety Cases based on Chain-of-Thought Monitoring
LW Psychosis
Learning to Interpret Weight Differences in Language Models
AGI's Last Bottlenecks
Statement on Superintelligence - FLI Open Letter
Why AI alignment matters today
Any corrigibility naysayers outside of MIRI?
Which side of the AI safety community are you in?
Homomorphically encrypted consciousness and its implications
Consider donating to AI safety champion Scott Wiener
Postrationality: An Oral History
Penny's Hands
Is 90% of code at Anthropic being written by AIs?
LLM Self-Reference Language in Multilingual vs English-Centric Models
How Well Does RL Scale?
The Perpetual Technological Cage
Utopiography Interview
White House OSTP AI Deregulation Public Comment Period Ends Oct. 27
July-October 2025 Progress in Guaranteed Safe AI
In remembrance of Sonnet '3.6'
Stratified Utopia
Early stage goal-directedness
On Dwarkesh Patel’s Podcast With Andrej Karpathy
Relearning Human Signaling — Experiments in Real-World Connection (Stockholm + virtual)
Samuel x Bhishma - Superintelligence by 2030?
Remarks on Bayesian studies from 1963
How the Human Lens Shapes Machine Minds
Symbiogenesis vs. Convergent Consequentialism
21st Century Civilization curriculum
Ramblings on the Self Indication Assumption
An epistemic theory of populism [link post to Joseph Heath]
EU explained in 10 minutes
"Tilakkhana", Gwern [poem]
Attending Your First Contra Dance in a Fragrance-Compliant Manner
How low can we go? Modern School Curricula
The If Anyone Builds It, Everyone Dies march assurance contract should indicate how many signatures it has received
A Mathematical Model of Alcor’s Economic Survival
Can you find the steganographically hidden message?
How cause-area specific conferences can strengthen the EA community
Secular Solstice: Bremen (Dec 13)
How Stuart Buck funded the replication crisis
Contra-Zombies? Contra-Zombies!: Chalmers as a parallel to Hume
Consider donating to Alex Bores, author of the RAISE Act
Bubble, Bubble, Toil and Trouble
Considerations around career costs of political donations
A Cup of Blue Tea
A Bayesian nightmare: Instagram and Sampling bias
Uncommon Utilitarianism #2: Positive Utilitarianism
Can you transfer a backdoor with subliminal learning?
Final-Exam-Tier Medical Problem With Handwavy Reasons We Can't Just Call A Licensed M.D.
Humanity Learned Almost Nothing From COVID-19
Frontier LLM Race/Sex Exchange Rates
Scenes, cliques and teams - a high level ontology of groups
Learnable Skills
Give Me Your Data: The Rationalist Mind Meld
The IABIED statement is not literally true
Libraries need more books
In defense of the goodness of ideas
Sample Interesting First
Comma v0.1 converted to GGUF
Using Bayes' Theorem to determine Optimal Protein Intake
Selected Graphics Showing Progress towards AGI
Networking for Spies: Translating a Cyrillic Text with Claude Code
How to survive in the storm of uncertainty (and shit)?
Space colonization and scientific discovery could be mandatory for successful defensive AI
Memory Decoding Journal Club: Functional connectomics reveals general wiring rule in mouse visual cortex
Meditation is dangerous
I handbound a book of Janus's essays for my girlfriend
The Dark Arts of Tokenization or: How I learned to start worrying and love LLMs' undecoded outputs
How To Vastly Increase Your Charitable Impact
Nontrivial pillars of IABIED
What Success Might Look Like
I’m an EA who benefitted from rationality
AI #138 Part 2: Watch Out For Documents
Mess AI – deliberate corruption of the training data to prevent superintelligence
Activation Plateaus: Where and How They Emerge
Can We Simulate Meiosis to Create Digital Gametes — and Are the Results Your Biological Offspring?
Steven Adler reports that NVIDIA is attempting to stifle pro-export-control speech
Book Review: To Explain the World
AISN#64: New AGI Definition and Senate Bill Would Establish Liability for AI Harms
Finding Features in Neural Networks with the Empirical NTK
Learning from the Luddites: Implications for a modern AI labour movement
Job Openings: SWE, PM, and Grants Coordinator to help improve grant-making
AI #138 Part 1: The People Demand Erotic Sycophants
Cheap Labour Everywhere
Quantum immortality and AI risk – the fate of a lonely survivor
Halfhaven Digest #2
Fragrance Free Confusion
The Three Levels of Agency
Electronics Mechanic -> AI Safety Researcher: A 30-Month Journey to Model Welfare
Some astral energy extraction methods
AI-202X-slowdown: can CoT-based AIs become capable of aligning the ASI?
Chemical Telescopes And The Process Of Science
Monthly Roundup #35: October 2025
Updating the name of Open Philanthropy's AI program
Open Global Investment: Comparisons and Critique
We are too comfortable with AI "magic"
Are calm introverts (like East Asians) uniquely suited for space travel & Mars missions?
Until the stars burn out? Assessing the stakes of AGI lock-in
It will cost you nothing to "bribe" a Utilitarian
How I Became a 5x Engineer with Claude Code
That Mad Olympiad
A New Global Risk: Large Comet’s Impact on Sun Could Cause Fires on Earth
Can LLMs Coordinate? A Simple Schelling Point Experiment
Minimal Prompt Induction of Self-Talk in Base LLMs
Humans Are Spiky (In an LLM World)
Gnashing of Teeth
Geometric Structure of Emergent Misalignment: Evidence for Multiple Independent Directions
Communism By Another Name
Situational Awareness as a Prompt for LLM Parasitism
The sum of its parts: composing AI control protocols
Why your boss isn’t worried about AI
Humanity AI Commits $500 million to AI and Democracy Protection, AI x Security, and more
Thinking Partners: Building AI-Powered Knowledge Management Systems
SS26 Color Stats
The Biochemical Beauty of Retatrutide: How GLP-1s Actually Work
My views on Lesswrong
Trade Escalation, Supply Chain Vulnerabilities and Rare Earth Metals
12 Angry Agents, or: A Plan for AI Empathy
The "Length" of "Horizons"
My Software Setup
Narcissism, Echoism, and Sovereignism: A 4-D Model of Personality
A personal take on why you should work at Forethought (maybe)
Discrete Generative Models
Survey Results: Far UVC and Glycol Vapors
How AI Manipulates—A Case Study
Recontextualization Mitigates Specification Gaming Without Modifying the Specification
AI Psychosis, with Tim Hua and Adele Lopez
What is Lesswrong good for?
Predictability is Underrated
The Mom Test for AI Extinction Scenarios
Is There a Sound Argument for Generality in AI?
Reasons to sign a statement to ban superintelligence (+ FAQ for those on the fence)
Water Above the Ocean
OpenAI #15: More on OpenAI’s Paranoid Lawfare Against Advocates of SB 53
The Thirteen-Circle Paradox
Pause House, Blackpool
Global vs. Local feedback
Sublinear Utility in Population and other Uncommon Utilitarianism
RiskiPedia
Don't Mock Yourself
Experiment: Test your priors on Bernoulli processes.
The Problem of Consciousness and AI as an Ethical Subject
Dr Evil & Realpolitik
How do we know when something is deserving of welfare?
The Narcissistic Spectrum
Non-copyability as a security feature
The Alignment Problem Isn't Theoretical
International Programme on AI Evaluations
If a Lioness Could Speak
How long do AI companies have to achieve significant capability gains before funding collapses?
You're not confused by thermodynamics
Subscribe to my Inkhaven feed!
The Most Common Bad Argument In These Parts
Experiments With Sonnet 4.5's Fiction
Letter to Heads of AI labs
Emil the Moose
Using complex polynomials to approximate arbitrary continuous functions
What does it feel like to understand?
The 5 Obstacles I Had to Overcome to Become Vegan
2025 State of AI Report and Predictions
Applications Open for a Weekend Exploring Civilisational Sanity
Maybe Use BioLMs To Mitigate Pre-ASI Biorisk?
The statement "IABIED" is true even if the book IABIED is mostly false
AI and Biological Risk: Forecasting Key Capability Thresholds
Why Future AIs will Require New Alignment Methods
Materialist Semiotics and the Nature of Qualia
Patience and Willingness to Be Slow
You should probably prepare for your chats to be fed to AI
We won’t get docile, brilliant AIs before we solve alignment
Labs lack the tools to course-correct
The Liberty Tractor
At odds with the unavoidable meta-message
Stars are a rounding error
Towards a Typology of Strange LLM Chains-of-Thought
Training Qwen-1.5B with a CoT legibility penalty
Would concentration of power be bad, given offense-dominant weapons?
Interview with a drone expert on the future of AI warfare
Investigating Neural Scaling Laws Emerging from Deep Data Structure
I take antidepressants. You’re welcome
Training fails to elicit subtle reasoning in current language models
Realistic Reward Hacking Induces Different and Deeper Misalignment
Why am I not currently starting a religion around AI or similar topics?
THE WISE, THE GOOD, THE POWERFUL
Hubris & Control
"Yes, and—" Requires the Possibility of "No, Because—"
Four Questions to Refine Your Policy Proposal
A Snippet On The Epistemically Hygienic Containment Of Faith-In-Reason-Itself
Alignment progress doesn’t compensate for higher capabilities
The Necronomicon Is Already Read
Biouploading: Preserving My Living Neurons and Connectome as a Spatially Distributed Mesh
self reflections of a striver
Hospitalization: A Review
AI #137: An OpenAI App For That
Are We Leaving Literature To The Psychotic?
Lessons from the Mountains
Probabilistic Societies
Inverting the Most Forbidden Technique: What happens when we train LLMs to lie detectably?
NEPA, Permitting and Energy Roundup #2
What shapes does reasoning take but circular?
The Oracle's Gift
Thinking Mathematically - Convergent Sequences
The Relationship Between Social Punishment and Shared Maps
IABIED: Paradigm Confusion and Overconfidence
The Wise Baboon of Loyalty
Spooky Collusion at a Distance with Superrational AI
The Architecture of the Narcissistic False Self
Reflections on The Curve 2025
2025-10-12 - London rationalish meetup - Periscope
Plans A, B, C, and D for misalignment risk
Three Paths Through Manifold
Halfhaven Digest #1
The "cool idea" bias
Irresponsible Companies Can Be Made of Responsible Employees
Heaven, Hell, and Mechanics
10 Ways to Waste a Decade
You Should Get a Reusable Mask
Replacing RL w/ Parameter-based Evolutionary Strategies
Intent alignment seems incoherent
The Tower of Babel in Reverse
Petri: An open-source auditing tool to accelerate AI safety research
Bending The Curve
Kairos is hiring: Founding Generalist & SPAR Contractor
Messy on Purpose: Part 2 of A Conservative Vision for the Future
Going Phoneless
The Alignment Paradox: Why Transparency Can Breed Deception
Notes on "Homology, Genes and Evolutionary Innovation"
Research Robots: When AIs Experiment on Us
Top Warning Signs Your Friends are Being Oneshotted By AI
LLMs as a limiter of social intercourse
Generalization and the Multiple Stage Fallacy?
Telling the Difference Between Memories & Logical Guesses
Notes from European Progress Conference
"Intelligence" -> "Relentless, Creative Resourcefulness"
Chaos Alone is No Bar to Superintelligence
We won’t get AIs smart enough to solve alignment but too dumb to rebel
Notes on the need to lose
Excerpts from my neuroscience to-do list
Experience Report - ML4Good Bootcamp Singapore, Sep'25
Gradual Disempowerment Monthly Roundup
Subliminal Learning, the Lottery-Ticket Hypothesis, and Mode Connectivity
The Origami Men
Medical Roundup #5
Why I think ECL shouldn’t make you update your cause prio
Did Tyler Robinson carry his rifle as claimed by the government?
AI Science Companies: Evidence AGI Is Near
LLMs one-box when in a "hostile telepath" version of Newcomb's Paradox, except for the one that beat the predictor
Alignment Faking Demo for Congressional Staffers
Do Things for as Many Reasons as Possible
One Does Not Simply Walk Away from Omelas
The quotation mark
The Sadism Spectrum and How to Access It
Maybe social media algorithms don't suck
Base64Bench: How good are LLMs at base64, and why care about it?
Behavior Best-of-N achieves Near Human Performance on Computer Tasks
How likely are “s-risks” (large-scale suffering outcomes) from unaligned AI compared to extinction risks?
LLMs are badly misaligned
The Counterfactual Quiet AGI Timeline
AISafety.com Reading Group session 328
Open Philanthropy’s Biosecurity and Pandemic Preparedness Team Is Hiring and Seeking New Grantees
$250 bounties for the best short stories set in our near future world & Brooklyn event to select them
How the NanoGPT Speedrun WR dropped by 20% in 3 months
a quick thought about AI alignment
Making Your Pain Worse can Get You What You Want
Markets in Democracy: What happens when you can sell your vote?
What I've Learnt About How to Sleep
The 'Magic' of LLMs: The Function of Language
To my client having a sudden ‘bad day’:
Consider Doing Small Walks at Work
Where does Sonnet 4.5's desire to "not get too comfortable" come from?
Goodness is harder to achieve than competence
Good is a smaller target than smart
Making Sense of Consciousness Part 6: Perceptions of Disembodiment
Recent AI Experiences