Joseph Carlsmith

Posts

On restraining AI development for the sake of safety

Building AIs that do human-like philosophy

Video and transcript of talk on human-like-ness in AI safety

How human-like do safe AI motivations need to be?

Leaving Open Philanthropy, going to Anthropic

Controlling the options AIs can pursue

Video and transcript of talk on giving AIs safe motivations

Giving AIs safe motivations

Video and transcript of talk on “Can goodness compete?”

Video and transcript of talk on AI welfare

The stakes of AI moral status

Video and transcript of talk on automating alignment research

Can we safely automate alignment research?

AI for AI safety

Paths and waystations in AI safety

When should we worry about AI power-seeking?

How do we solve the alignment problem?

What is it to solve the alignment problem?

Fake thinking and real thinking

Takes on “Alignment Faking in Large Language Models”

Video and transcript of presentation on Otherness and control in the age of AGI

(Part 2, AI takeover) Extended audio/transcript from my conversation with Dwarkesh Patel

(Part 1, Otherness) Extended audio/transcript from my conversation with Dwarkesh Patel

Loving a world you don’t trust

On attunement

Video and transcript of presentation on Scheming AIs

On green

On the abolition of man

Being nicer than Clippy

An even deeper atheism

Does AI risk “other” the AIs?

When “yang” goes wrong

Deep atheism and AI risk

Gentleness and the artificial Other

Otherness and control in the age of AGI

New report: “Scheming AIs: Will AIs fake alignment during training in order to get power?”

Superforecasting the premises in “Is power-seeking AI an existential risk?”

In memory of Louise Glück

Predictable updating about AI risk

Existential Risk from Power-Seeking AI (shorter version)

A Stranger Priority? Topics at the Outer Reaches of Effective Altruism

Seeing more whole

Why should ethical anti-realists do ethics?

On sincerity

Against meta-ethical hedonism

Against the normative realist’s wager

Is Power-Seeking AI an Existential Risk?

Video and Transcript of Presentation on Existential Risk from Power-Seeking AI

Dutch books, Cox, and Complete Class

VNM, separability, and more

Why it can be OK to predictably lose

Skyscrapers and madmen

Simulation arguments

On infinite ethics

The ignorance of normative realism bot

Morality and constrained maximization, part 2

Morality and constrained maximization, part 1

Anthropics and the Universal Distribution

On the Universal Distribution

In defense of the presumptuous philosopher

An aside on betting in anthropics

Telekinesis, reference classes, and other scandals

Learning from the fact that you exist

Can you control the past?

In search of benevolence (or: what should you get Clippy for Christmas?)

On the limits of idealized values

Problems of evil

The innocent gene

The importance of how you weigh it

On future people, looking back at 21st century longtermism

Against neutrality about creating happy lives

Care and demandingness

Subjectivism and moral authority

Two types of deference

Contact with reality

Killing the ants

Believing in things you cannot see

On clinging

Actually possible: thoughts on Utopia

Shouldn’t it matter to the victim?

The despair of normative realism bot

A ghost

Alienation and meta-ethics (or: is it possible you should maximize helium?)

Wholehearted choices and “morality as taxes”

Thoughts on being mortal

Grokking illusionism

The impact merge

Thoughts on personal identity

How core is confusion about consciousness?

To light a candle

The gestures of trees

Mistaking the plot

How much computational power does it take to match the human brain?