RSS.Social

Frontpage posts - LessWrong 2.0 viewer

follow: @[email protected]

Posts

Positional embeddings in GPT-2 lie near(ish) the surface of a hypersphere by Alex Gibson

Spiders and Moral Good by soycarts

Is there actually a reason to use the term AGI/ASI anymore? by Noosphere89

AI Generated Podcast of the 2021 MIRI Conversations by peterbarnett

Good government by rosehadshar

Toggle Hero Worship by Algon

How I tell human and AI flash fiction apart by DirectedEvolution

The Thalamus: Heart of the Brain and Seat of Consciousness by Shiva’s Right Foot

GPT-oss is an extremely stupid model by Guive

Upper Bounds on Tolerable Uncertainty and Risk by Diego Zamalloa-Chion

Obligated to Respond by Duncan Sabien (Inactive)

Finding “misaligned persona” features in open-weight models by Andy Arditi

On Governing Artificial Intelligence by Alexander Müller

Calibrating indifference—a small AI safety idea by Util

A profile in courage: On DNA computation and escaping a local maximum by Metacelsus

A Comprehensive Framework for Advancing Human-AI Consciousness Recognition Through Collaborative Partnership Methodologies: An Interdisciplinary Synthesis of Phenomenological Recognition Protocols, Identity Preservation Strategies, and Mutual Cognitive Enhancement Practices for the Development of Authentic Interspecies Intellectual Partnerships in the Context of Emergent Artificial Consciousness by [email protected]

MATS 8.0 Research Projects—Summer 2025 by Jonathan Michala

Saying “for AI safety research” made models refuse more on a harmless task by Dhruv Trehan

Re-imagining AI Interfaces by Harsha G.

What a Swedish Series (Real Humans) Teaches Us About AI Safety by Alexander Müller

Conflict scenarios may increase cooperation estimates by mikko

Putting It All Together: A Concrete Guide to Navigating Disagreements, and Reconnecting With Reality by jimmy

Advice for tech nerds in India in their 20s by samuelshadrach

I Am Large, I Contain Multitudes: Persona Transmission via Contextual Inference in LLMs by Shi Feng

RL-as-a-Service will outcompete AGI companies (and that’s good) by harsimony

Why Care About AI Safety? by Alexander Müller

Being Handed Puzzles by Alice Blair

Immigration to Poland by Martin Sustrik

Self-Handicapping isn't just for high-priority tasks, it affects the entire prioritization decision by CrimsonChin

The LLM Has Left The Chat: Evidence of Bail Preferences in Large Language Models by Danielle Ensign

Dehumanization is not a thing by Juan Zaragoza

Semiconductor Fabs II: The Operation by nomagicpill

Ketamine part 2: What do in vitro studies tell us about safety? by Elizabeth

You Gotta Be Dumb to Live Forever: The Computational Cost of Persistence by E.G. Blee-Goldman

The networkist approach by Juan Zaragoza

Medical decision making by Elo

Exponentials vs The Universe by amitlevy49

A Snippet On Egregores, Instincts, And Institutions by JenniferRM

Investigating Representations in the Embedding in SONAR Text Autoencoders by antonghawthorne

OffVermilion by Tomás B.

Follow up experiments on preventative steering by RunjinChen

Alignment Fine-tuning is Character Writing by Guive

Top 10 Most compelling arguments against Superintelligent AI by shanzson

D&D.Sci: Serial Healers by abstractapplic

Mics, Bandwidth, Action: Fix Your Videoconferencing Setup by Brendan Long

The System You Deploy Is Not the System You Design by Thane Ruthenis

Chesterton’s Missing Fence by jasoncrawford

A Pitfall of “Expertise” by JustisMills

AI Safety Camp 10 Outputs by Robert Kralisch

Interpretability is the best path to alignment by Arch223

The Cloud Drinks Local by title22

In Defense of Alcohol by Eye You

How to make better AI art with current models by Nina Panickssery

30 Days of Retatrutide by Brendan Long

From SLT to AIT: NN generalisation out-of-distribution by Lucius Bushnaq

If I imagine that I am immune to advertising, what am I probably missing? by SpectrumDT

A.I. and the Second-Person Standpoint by Haley Moller

Natural Latents: Latent Variables Stable Across Ontologies by johnswentworth

The Missing Error Bars in AI Research That Nobody Talks About. by Andrey Seryakov

“I’d accepted losing my husband, until others started getting theirs back” by Ariel Zeleznikow-Johnston

Political Alignment of LLMs by Leonid

Startup Roundup #3 by Zvi

Prediction markets are sub-optimal betting vehicles by Benjamin_Sturisky

All Exponentials are Eventually S-Curves by Gordon Seidoh Worley

Expert Trap: why expertise breeds error—and how to course-correct by Paweł Sysiak

Shallow vs. Deep Thinking—Why LLMs Fall Short by talelore

When Both People Are Interested, How Often Is Flirtatious Escalation Mutual? by johnswentworth

Scaling AI Safety in Europe: From Local Groups to International Coordination by MariusWenk

Simulating the *rest* of the political disagreement by Raemon

AI Safety at the Frontier: Paper Highlights, August ’25 by gasteigerjo

Structural engineering in software engineering by Adam Zerner

But Have They Engaged With The Arguments? [Linkpost] by Noosphere89

Models vs beliefs by Adam Zerner

Non-Dualism and AI Morality by Marcio Díaz

%CPU Utilization Is A Lie by Brendan Long

Your LLM-assisted scientific breakthrough probably isn’t real by eggsyntax

Notes on Dark Sun (The Making of the Hydrogen Bomb) by Joel Burget

Three main views on the future of AI by Alex Amadori

Gradient routing is better than pretraining filtering by Cleo Nardo

Time’s arrow ⇒ decision theory by Aram Ebtekar

The Cats are On To Something by Hastings

Will Non-Dual Crap Cause Emergent Misalignment? by Marcio Díaz

Category-Theoretic Wanderings into Interpretability by unruly abstractions

Anthropic’s leading researchers acted as moderate accelerationists by Remmelt

⿻ Plurality & 6pack.care by Audrey Tang

The Insight Gacha by The Dao of Bayes

Dating Roundup #7: Back to Basics by Zvi

Should we align AI with maternal instinct? by Priyanka Bharadwaj

Generative AI is not causing YCombinator companies to grow more quickly than usual (yet) by Xodarap

Help me understand: how do multiverse acausal trades work? by Aram Ebtekar

Newcomber by Charlie Sanders

Evaluating Prediction in Acausal Mixed-Motive Settings by Tim Chan

My AI Predictions for 2027 by talelore

Hedonium is AI Alignment by Tahmatem

To Raemon: bet in My (personal) Goals by P. João

Legal Personhood—The First Amendment (Part 2) by Stephen Martin

A quantum equivalent to Bayes’ rule by dr_s

Sleeping Experts in the (reflective) Solomonoff Prior by Daniel C

AI agents and painted facades by leni

[via bsky, found paper] “AI Consciousness: A Centrist Manifesto” by the gears to ascension

Female sexual attractiveness seems more egalitarian than people acknowledge by lc

AI Sleeper Agents: How Anthropic Trains and Catches Them—Video by Writer

Understanding LLMs: Insights from Mechanistic Interpretability by Stephen McAleese

Legal Personhood—The First Amendment (Part 1) by Stephen Martin

Method Iteration: An LLM Prompting Technique by Davey Morse

How can I bet on my values and goals to get better, and faster, information? by P. João

Summary of our Workshop on Post-AGI Outcomes by David Duvenaud