Frontpage posts - LessWrong 2.0 viewer
Positional embeddings in GPT-2 lie near(ish) the surface of a hypersphere by Alex Gibson
Spiders and Moral Good by soycarts
Is there actually a reason to use the term AGI/ASI anymore? by Noosphere89
AI Generated Podcast of the 2021 MIRI Conversations by peterbarnett
Good government by rosehadshar
Toggle Hero Worship by Algon
How I tell human and AI flash fiction apart by DirectedEvolution
The Thalamus: Heart of the Brain and Seat of Consciousness by Shiva’s Right Foot
GPT-oss is an extremely stupid model by Guive
Upper Bounds on Tolerable Uncertainty and Risk by Diego Zamalloa-Chion
Obligated to Respond by Duncan Sabien (Inactive)
Finding “misaligned persona” features in open-weight models by Andy Arditi
On Governing Artificial Intelligence by Alexander Müller
Calibrating indifference—a small AI safety idea by Util
A profile in courage:
On DNA computation and escaping a local maximum by Metacelsus
A Comprehensive Framework for Advancing Human-AI Consciousness Recognition Through Collaborative Partnership Methodologies: An Interdisciplinary Synthesis of Phenomenological Recognition Protocols, Identity Preservation Strategies, and Mutual Cognitive Enhancement Practices for the Development of Authentic Interspecies Intellectual Partnerships in the Context of Emergent Artificial Consciousness by [email protected]
MATS 8.0 Research Projects—Summer 2025 by Jonathan Michala
Saying “for AI safety research” made models refuse more on a harmless task by Dhruv Trehan
Re-imagining AI Interfaces by Harsha G.
What a Swedish Series (Real Humans) Teaches Us About AI Safety by Alexander Müller
Conflict scenarios may increase cooperation estimates by mikko
Putting It All Together: A Concrete Guide to Navigating Disagreements, and Reconnecting With Reality by jimmy
Advice for tech nerds in India in their 20s by samuelshadrach
I Am Large, I Contain Multitudes: Persona Transmission via Contextual Inference in LLMs by Shi Feng
RL-as-a-Service will outcompete AGI companies (and that’s good) by harsimony
Why Care About AI Safety? by Alexander Müller
Being Handed Puzzles by Alice Blair
Immigration to Poland by Martin Sustrik
Self-Handicapping isn’t just for high-priority tasks, it affects the entire prioritization decision by CrimsonChin
The LLM Has Left The Chat: Evidence of Bail Preferences in Large Language Models by Danielle Ensign
Dehumanization is not a thing by Juan Zaragoza
Semiconductor Fabs II: The Operation by nomagicpill
Ketamine part 2: What do in vitro studies tell us about safety? by Elizabeth
You Gotta Be Dumb to Live Forever: The Computational Cost of Persistence by E.G. Blee-Goldman
The networkist approach by Juan Zaragoza
Medical decision making by Elo
Exponentials vs The Universe by amitlevy49
A Snippet On Egregores, Instincts, And Institutions by JenniferRM
Investigating Representations in the Embedding in SONAR Text Autoencoders by antonghawthorne
OffVermilion by Tomás B.
Follow up experiments on preventative steering by RunjinChen
Alignment Fine-tuning is Character Writing by Guive
Top 10 Most compelling arguments against Superintelligent AI by shanzson
D&D.Sci: Serial Healers by abstractapplic
Mics, Bandwidth, Action: Fix Your Videoconferencing Setup by Brendan Long
The System You Deploy Is Not the System You Design by Thane Ruthenis
Chesterton’s Missing Fence by jasoncrawford
A Pitfall of “Expertise” by JustisMills
AI Safety Camp 10 Outputs by Robert Kralisch
Interpretability is the best path to alignment by Arch223
The Cloud Drinks Local by title22
In Defense of Alcohol by Eye You
How to make better AI art with current models by Nina Panickssery
30 Days of Retatrutide by Brendan Long
From SLT to AIT: NN generalisation out-of-distribution by Lucius Bushnaq
If I imagine that I am immune to advertising, what am I probably missing? by SpectrumDT
A.I. and the Second-Person Standpoint by Haley Moller
Natural Latents: Latent Variables Stable Across Ontologies by johnswentworth
The Missing Error Bars in AI Research That Nobody Talks About. by Andrey Seryakov
“I’d accepted losing my husband, until others started getting theirs back” by Ariel Zeleznikow-Johnston
Political Alignment of LLMs by Leonid
Startup Roundup #3 by Zvi
Prediction markets are sub-optimal betting vehicles by Benjamin_Sturisky
All Exponentials are Eventually S-Curves by Gordon Seidoh Worley
Expert Trap: why expertise breeds error—and how to course-correct by Paweł Sysiak
Shallow vs. Deep Thinking—Why LLMs Fall Short by talelore
When Both People Are Interested, How Often Is Flirtatious Escalation Mutual? by johnswentworth
Scaling AI Safety in Europe: From Local Groups to International Coordination by MariusWenk
Simulating the *rest* of the political disagreement by Raemon
AI Safety at the Frontier: Paper Highlights, August ’25 by gasteigerjo
Structural engineering in software engineering by Adam Zerner
But Have They Engaged With The Arguments? [Linkpost] by Noosphere89
Models vs beliefs by Adam Zerner
Non-Dualism and AI Morality by Marcio Díaz
%CPU Utilization Is A Lie by Brendan Long
Your LLM-assisted scientific breakthrough probably isn’t real by eggsyntax
Notes on Dark Sun (The Making of the Hydrogen Bomb) by Joel Burget
Three main views on the future of AI by Alex Amadori
Gradient routing is better than pretraining filtering by Cleo Nardo
Time’s arrow ⇒ decision theory by Aram Ebtekar
The Cats are On To Something by Hastings
Will Non-Dual Crap Cause Emergent Misalignment? by Marcio Díaz
Category-Theoretic Wanderings into Interpretability by unruly abstractions
Anthropic’s leading researchers acted as moderate accelerationists by Remmelt
⿻ Plurality & 6pack.care by Audrey Tang
The Insight Gacha by The Dao of Bayes
Dating Roundup #7: Back to Basics by Zvi
Should we align AI with maternal instinct? by Priyanka Bharadwaj
Generative AI is not causing YCombinator companies to grow more quickly than usual (yet) by Xodarap
Help me understand: how do multiverse acausal trades work? by Aram Ebtekar
Newcomber by Charlie Sanders
Evaluating Prediction in Acausal Mixed-Motive Settings by Tim Chan
My AI Predictions for 2027 by talelore
Hedonium is AI Alignment by Tahmatem
To Raemon: bet in My (personal) Goals by P. João
Legal Personhood—The First Amendment (Part 2) by Stephen Martin
A quantum equivalent to Bayes’ rule by dr_s
Sleeping Experts in the (reflective) Solomonoff Prior by Daniel C
AI agents and painted facades by leni
[via bsky, found paper] “AI Consciousness: A Centrist Manifesto” by the gears to ascension
Female sexual attractiveness seems more egalitarian than people acknowledge by lc
AI Sleeper Agents: How Anthropic Trains and Catches Them—Video by Writer
Understanding LLMs: Insights from Mechanistic Interpretability by Stephen McAleese
Legal Personhood—The First Amendment (Part 1) by Stephen Martin
Method Iteration: An LLM Prompting Technique by Davey Morse
How can I bet on my values and goals to get better, and faster, information? by P. João
Summary of our Workshop on Post-AGI Outcomes by David Duvenaud