Joseph Carlsmith RSS Feed
Giving AIs safe motivations
Video and transcript of talk on “Can goodness compete?”
Video and transcript of talk on AI welfare
The stakes of AI moral status
Video and transcript of talk on automating alignment research
Can we safely automate alignment research?
AI for AI safety
Paths and waystations in AI safety
When should we worry about AI power-seeking?
How do we solve the alignment problem?
What is it to solve the alignment problem?
Fake thinking and real thinking
Takes on “Alignment Faking in Large Language Models”
Video and transcript of presentation on Otherness and control in the age of AGI
(Part 2, AI takeover) Extended audio/transcript from my conversation with Dwarkesh Patel
(Part 1, Otherness) Extended audio/transcript from my conversation with Dwarkesh Patel
Loving a world you don’t trust
On attunement
Video and transcript of presentation on Scheming AIs
On green
On the abolition of man
Being nicer than Clippy
An even deeper atheism
Does AI risk “other” the AIs?
When “yang” goes wrong
Deep atheism and AI risk
Gentleness and the artificial Other
Otherness and control in the age of AGI
New report: “Scheming AIs: Will AIs fake alignment during training in order to get power?”
Superforecasting the premises in “Is power-seeking AI an existential risk?”
In memory of Louise Glück
Predictable updating about AI risk
Existential Risk from Power-Seeking AI (shorter version)
A Stranger Priority? Topics at the Outer Reaches of Effective Altruism
Seeing more whole
Why should ethical anti-realists do ethics?
On sincerity
Against meta-ethical hedonism
Against the normative realist’s wager
Is Power-Seeking AI an Existential Risk?
Video and Transcript of Presentation on Existential Risk from Power-Seeking AI
Dutch books, Cox, and Complete Class
VNM, separability, and more
Why it can be OK to predictably lose
Skyscrapers and madmen
Simulation arguments
On infinite ethics
The ignorance of normative realism bot
Morality and constrained maximization, part 2
Morality and constrained maximization, part 1
Anthropics and the Universal Distribution
On the Universal Distribution
In defense of the presumptuous philosopher
An aside on betting in anthropics
Telekinesis, reference classes, and other scandals
Learning from the fact that you exist
Can you control the past?
In search of benevolence (or: what should you get Clippy for Christmas?)
On the limits of idealized values
Problems of evil
The innocent gene
The importance of how you weigh it
On future people, looking back at 21st century longtermism
Against neutrality about creating happy lives
Care and demandingness
Subjectivism and moral authority
Two types of deference
Contact with reality
Killing the ants
Believing in things you cannot see
On clinging
Actually possible: thoughts on Utopia
Shouldn’t it matter to the victim?
The despair of normative realism bot
A ghost
Alienation and meta-ethics (or: is it possible you should maximize helium?)
Wholehearted choices and “morality as taxes”
Thoughts on being mortal
Grokking illusionism
The impact merge
Thoughts on personal identity
How core is confusion about consciousness?
To light a candle
The gestures of trees
Mistaking the plot
How much computational power does it take to match the human brain?