Frontpage posts - LessWrong 2.0 viewer
Anhedoniapolis by Alex Beyman
Thinking through a lens of physiology by Vadim Golub
An Approach for Evaluating Self-Boundary Consistency in AI Systems by Anurag
A Catalog of AI Evaluations by Anurag
If It Can Learn It, It Can Unlearn It: AI Safety as Architecture, Not Training by Timothy Danforth
Thinking in Predictions by Julius
About Natural & Synthetic Beings (Interactive Typology) by Anurag
Have there been any rational analyses of mindbody techniques for chronic pain/illness? by Liface
Eval-unawareness ≠ Eval-invariance by Mo Baker
Arch-anarchy, the end of state and digital anarchism by Peter lawless
on self-knowledge by Vadim Golub
Shaping Model Cognition Through Reflective Dialogue—Experiment & Findings by Anurag
42, and expanding human consciousness by Daniel Jacobson
Adding Empathy as a Tool for LLMs by RobinHa
GiveCalc: Open-source tool to calculate the true cost of charitable giving by Max Ghenis
Open Thread Winter 2025/26 by kave
Practical AI risk II: Training transparency by Gustavo Ramires
GRPO is terrible by RobinHa
The 2024 LessWrong Review by RobertM
Slack Observability by Alice Blair
Do Language Models Really Learn to Mislead Humans via RLHF? by Aaryan Chandna
The Glasses on Your Face by Casey_
Sutskever Refuses to Answer the Q: How Will AGI Be Built? He Has No Answer by Oscar Davies
Why people like your quick bullshit takes better than your high-effort posts by eukaryote
Ruby’s Ultimate Guide to Thoughtful Gifts by Ruby
MIT Lincoln Laboratory & The Dual-Use Frontier by Austin Morrissey
Anti-Foom Anti-Doom by homosapien97
Books cover a larger idea-space than movies by beyarkay
Paper Review: Must Rhodes Fall? Differing responses to contentious monumental public art by soycarts
Bringing organ-scale cryopreservation into existence by Abhishaike Mahajan
AI Sentience and Welfare Misalignment Risk by ejk64
Be Naughty by habryka
Eight Heuristics of Anti-Epistemology by Ben Pace
The Worrying Nature of Akrasia by Notelrac
Can questions rigidly designate intentions? File under: Rolling my own metaethics by Mason Broxham
Week 3: Adversarial Robustness by Ely Hahami
Informed Consent as the Sole Criterion for Medical Treatment by Chara#2736
How you got RL’d into your idiosyncratic cognition by Ruby
Defining AI Truth-Seeking by What It Is Not by Tianyi (Alex) Qiu
The Moss Fractal: How Care Regulates Functional Awareness from Microbes to AI by Lcofa
One King Upon The Chessboard by Screwtape
Evrart Claire: A Case Study in Anti-Epistemology by Ben Pace
Out-paternalizing the government (getting oxygen for my baby) by Ruby
KPD is a weak obstruction by JustinSheek
Habermasian Alignment by kenneth myers
Reading LLM chain of thought makes me more rational by Michael Steele
Don’t grow your org fast by Ruby
The Barriers to Your Unemployment by claywren
Cooling the way is the obvious way to boost human IQ by Michael Steele
Close open loops by habryka
Toddler Shoggoth Has Plenty Of Raw Material (The Memetic Cocoon Threat Model) by KAP
The skills and physics of high-performance driving, Pt. 2 by Ruby
Wiki AI by abramdemski
An introduction to the invariants of Special Relativity and the geometry of Spacetime by Horosphere
Darwin’s LLMs—Natural Selection is Already Shaping AI by Ben Turtel
No one understands that tomorrow will not be today and I don’t know what to do. by AdamLacerdo
Biology of the Living—A Conversation with two generations of Google AI by matthew allen
Diagonalization: A (slightly) more rigorous model of paranoia by habryka
Put numbers on stuff, all the time, otherwise scope insensitivity will eat you by habryka
Increasing returns to marginal effort are common by habryka
Thoughts are surprisingly detailed and remarkably autonomous by Ruby
Tools for deferring gracefully by TsviBT
What’s so hard about...? A question worth asking by Ruby
A Timeless Universe Viewed From the Inside by 0xA
Strengthening Red Teams: A Modular Scaffold for Control Evaluations by Chloe Loughridge
Don’t cancel out your rewards! by Sneha Bangalore
On model weight preservation: Anthropic’s new initiative by Olle Häggström
Pause from Behind / Losing Heroically by enterthewoods
Learning information which is full of spiders by Screwtape
Introspection or confusion? by Victor Godet
Structural Estimates of Human Computation by Nicolas Villarreal
Can Models be Evaluation Aware Without Explicit Verbalization? by gersonkroiz
Cake vs Lack of Cake by Notelrac
Cross-Model Semantic Convergence Across Independent LLM Architectures (Preliminary Data + Replication Request) by viniburilux
The Decalogue For Aligned AI. by theophilus tabuke
Analytical Validation of Biomarkers is Not the Full Story by mnarayan
Using the probabilistic method to bound the performance of toy transformers by Alex Gibson
Control by Committee by Alexander Bistagne
Using math to foster acceptance and equality by jackoda
Being “Usefully Concrete” by Raemon
Questions About Outperforming Common Wisdom by Notelrac
Metaphors for Biology: Sizes by Niko McCarty
You think you are in control? by Recurrented
“What’s hard about this? What can I do about that?” (Recursive) by Raemon
Erasmus: Social Engineering at Scale by Martin Sustrik
Time, Panpsychism, and Substrate Independence by kihara.sofia
Introducing Project Telos by Mario Giulianelli
Critical Fallibilism and Theory of Constraints in One Analyzed Paragraph by Elliot Temple
Transcendental Argumentation and the Epistemics of Discourse by 0xA
ImpossibleBench: Measuring Reward Hacking in LLM Coding Agents by Ziqian Zhong
LLM Hallucinations: An Internal Tug of War by violazhong
An Opinionated Guide to Privacy Despite Authoritarianism by TurnTrout
How Do We Evaluate the Quality of LLMs’ Mathematical Responses? by Miguel Angel
Visualizing a Platform for Live World Models by Kuil
Why Would we get Inner Misalignment by Default? by Coil
Rational Emotivism by Notelrac
by CMDiamond
Paper: Take Goodhart Seriously: Principled Limit on General-Purpose AI Optimization by antmaier
What were mistakes of AI Safety field-building? How can we avoid them while we build the AI Welfare? by Güney Türker
Resolving Newcomb’s Problem Perfect Predictor Case by Praphull Kabtiyal
[CS 2881r] Can We Prompt Our Way to Safety? Comparing System Prompt Styles and Post-Training Effects on Safety Benchmarks by hughvd
Transactional method for non-transactional relationship: Relationship as a Common-pool Resource problem by Source Wishes
When a reasoning model uses CoT as a tool to produce a refusal through invented policy violations by Gabriella Baris
I will not sign up for cryonics by Syd Lonreiro_
Can AI Agents with Divergent Interests Learn To Prevent Civilizational Failures? by joao_abrantes
The Doomers Were Right by Algon
Moving Faster on AI Safety: Summary and Key Insights from 10/10/25 Safety Acceleration Symposium by Martin Leitgab
Dead-switches as AI safety tools by Jesper L.
The Cloud industry architecture [Infra-Platform-App] is unlikely to replicate for AI by Armchair Descending
Why deep space programs select for calm agreeable introverted candidates by David Sun
[Thought Experiment] If Human Extinction “Improves the World,” Should We Oppose It? Species Bias and the Utilitarian Challenge by satopi
The idea of paradigm testing of LLMs by Daniel Fenge
Conjecture: Emergent φ is provable in Large Language Models by BarnicleBarn
The Complex Universe Theory of AI Psychology by Andrew Tomazos
Enhancing Genomic Foundation Model Robustness through Iterative Black-Box Adversarial Training by Jeyashree Krishnan
The increase in the number of Everett branches could be very important. by Crazy philosopher
If Anyone Builds It Everyone Dies, a semi-outsider review by dvd
Words make us Dumb #1: The “Point”lessness of Knowledge by Enmai.MCimbu
Live Governance: AI tools for coordination without centralisation by mbuch
[CS 2881r] [Week 6] Recursive Self-Improvement by Joshua Qin
Designing for perpetual control by Remmelt
“Naive Consequentialism” as a Thought-Terminating cliche by Jacob Goldsmith
The Moral Infrastructure for Tomorrow by sdeture
Assuring Agent Safety Evaluations By Analysing Transcripts by bigmouthmodels
How we’ll make all world leaders work together to make the world better by Wes R
Sandbagging: distinguishing detection of underperformance from incrimination, and the implications for downstream interventions. by lennie
What can Canadians do to help end the AI arms race? by Tom938
17 years old, self-taught state control—looking for people who actually get this by Cornelius Caspian