What conclusions can you draw from the fact that you’re here?
The Anthropic Principle – 5 short examples
Anthropics and Biased Models – “we can’t use anthropic reasoning on a single side of a model”
An Anthropic Principle Fairy Tale
Hanson’s “Great Filter” and does the Anthropic Principle affect it?
Max Tegmark’s mathematical universe levels
New Death Note one-shot chapter
Key & Peele get tone confused over texts
Hey look, we have a Discord! What could possibly go wrong?
Rationality: From AI to Zombies, The Podcast… and the other podcast
LessWrong posts Discussed in this Episode:
We didn’t get around to discussing any LessWrong posts in this episode. Whoops!
Next Episode’s Sequence Posts:
Big thanks to David for our intro music! Check out his music and VFX here!
We’d like to thank the creators of our new outro music from the Sumerki Project! Check out their stuff here!
I am a Bayesian statistician by training, and when I listened to this episode, the vocabulary tripped me up. Let’s talk about the statement, “we can’t know for sure because we don’t have any priors”. I think we should be clear about what the prior is, what the dataset is, and what we are predicting.
Prior distributions are not the same thing as data or evidence. We can always come up with a prior, but we cannot magic up data. I think that last point is what you were trying to say.
Here is one way to look at the anthropic principle. Suppose we observe n universes and find that y of them could support life. The finite quantities n and y are the data, and as Bayesians, we treat them as fixed and known. In the real world, y = 1 and n = 1 because we only know about our own universe, but we can pursue thought experiments with n > 1. Define theta as the unknown fraction of all universes that could support life. Theta is the model parameter, and it captures what we want to know.
Now let’s talk about the prior distribution of theta. The prior should have a density function on the interval from 0 to 1 that integrates to 1. A convenient choice (but by no means the only choice) is a beta distribution Beta(a, b) with shape parameters a and b. There are many choices for a and b, and there is no objectively correct answer. Setting a = b = 1 gives the uniform distribution, and a = b = 1/2 gives the Jeffreys prior; both are common choices for non-informative/uninformative/diffuse/“objective” priors. Alternatively, we could elicit a and b from expert physicists, or we could assert a skeptical prior like Beta(1, 100) or an optimistic prior like Beta(100, 1). In practice, professional Bayesian statisticians run the same analysis with a variety of different priors to assess how much the conclusions change, a technique called sensitivity analysis.
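To make the idea of a sensitivity analysis concrete, here is a rough Python sketch. It assumes a binomial model for y given n and theta (an assumption the text above does not spell out), in which case a Beta(a, b) prior updates to a Beta(a + y, b + n - y) posterior. The function names and the dictionary of priors are made up for illustration.

```python
# Sensitivity analysis sketch: same data, several Beta(a, b) priors.
# ASSUMPTION: y out of n is binomial given theta, so by conjugacy the
# posterior is Beta(a + y, b + n - y). All names here are illustrative.

def posterior_params(a, b, y, n):
    """Return the shape parameters of the Beta posterior."""
    return a + y, b + (n - y)


def beta_mean(a, b):
    """Mean of a Beta(a, b) distribution."""
    return a / (a + b)


# The four example priors mentioned in the text.
priors = {
    "uniform Beta(1, 1)": (1.0, 1.0),
    "Jeffreys Beta(1/2, 1/2)": (0.5, 0.5),
    "skeptical Beta(1, 100)": (1.0, 100.0),
    "optimistic Beta(100, 1)": (100.0, 1.0),
}

# Real-world data: one universe observed, and it supports life.
y, n = 1, 1

for name, (a, b) in priors.items():
    post_a, post_b = posterior_params(a, b, y, n)
    print(
        f"{name}: prior mean {beta_mean(a, b):.3f}, "
        f"posterior mean {beta_mean(post_a, post_b):.3f}"
    )
```

With y = 1 and n = 1, the posterior mean of theta runs from about 0.02 under the skeptical prior to about 0.99 under the optimistic one, so a single observation barely moves either strong prior.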
However we pick the prior, we do not use the data to do it. The prior distribution is literally prior to any data, and we should come up with several alternatives before we ever see the observed values of y or n. (Let’s ignore empirical Bayes, which is a computational shortcut that would not help us with such a simple model anyway.)
For inference, we use the posterior distribution of theta given y and n. If we integrate the posterior density from 0 to 0.1, we get the posterior probability that fewer than 10% of all universes could support life. Prediction is another matter entirely. To predict the status y~ of a set of n~ unobserved universes, we use the posterior predictive distribution, which is different from the ordinary posterior distribution.
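The difference between the two distributions can be sketched in a few lines of Python. This is only an illustration under stated assumptions: a uniform Beta(1, 1) prior, a binomial model for y given n and theta, and therefore a Beta(a + y, b + n - y) posterior and a beta-binomial posterior predictive. The helper names, the midpoint-rule integration, and the choice of 5 unobserved universes are mine, not part of the original argument.

```python
import math

# ASSUMPTIONS: uniform Beta(1, 1) prior and a binomial likelihood, so the
# posterior is Beta(a + y, b + n - y). All helper names are illustrative.


def beta_pdf(x, a, b):
    """Density of Beta(a, b) at a point x strictly between 0 and 1."""
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return math.exp(
        log_norm + (a - 1) * math.log(x) + (b - 1) * math.log(1 - x)
    )


def beta_cdf(upper, a, b, steps=100_000):
    """Midpoint-rule approximation of P(theta < upper) under Beta(a, b)."""
    width = upper / steps
    total = sum(beta_pdf((i + 0.5) * width, a, b) for i in range(steps))
    return total * width


def log_beta(p, q):
    """Logarithm of the beta function B(p, q)."""
    return math.lgamma(p) + math.lgamma(q) - math.lgamma(p + q)


def beta_binomial_pmf(k, m, a, b):
    """P(k of m new universes support life) when theta ~ Beta(a, b)."""
    return math.comb(m, k) * math.exp(log_beta(k + a, m - k + b) - log_beta(a, b))


a, b = 1.0, 1.0                       # uniform prior
y, n = 1, 1                           # observed data
post_a, post_b = a + y, b + (n - y)   # posterior shape parameters

# Inference: a statement about the parameter theta.
p_small = beta_cdf(0.1, post_a, post_b)
print(f"P(theta < 0.1 | y, n) is about {p_small:.4f}")

# Prediction: a statement about future data, here 5 unobserved universes.
n_new = 5
predictive = [beta_binomial_pmf(k, n_new, post_a, post_b) for k in range(n_new + 1)]
for k, prob in enumerate(predictive):
    print(f"P(y_new = {k} of {n_new}) is about {prob:.4f}")
```

Under these assumptions the posterior is Beta(2, 1), whose distribution function is theta squared, so the posterior probability that theta is below 0.1 is 0.01. The predictive probabilities work out to (k + 1)/21 for k = 0 through 5. They describe counts of universes rather than the parameter theta.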
A correction: sometimes it is appropriate to use historical data to construct the prior. In practice, however, this is only done when it does not make sense to pool the historical data and current data together for the main analysis. For example, if you are doing a survey among US citizens and you define the parameters of interest for the US only, you might still borrow equivalent survey results from Australia just to build the prior.