Bayesian Data Analysis for Speech Sciences

# Bayesian Data Analysis for Speech Sciences
## Priors and Bayesian updating
### Timo Roettger, Stefano Coretta and Joseph Casillas
### LabPhon workshop
### 2021/07/05 (updated: 2021-07-02)

---

???

We are faced every day with probabilities. Just think about the weather forecast.

We say things like t"here is a 70% prob that it will rain today". In this sense, probability is the probability of an event occurring.

But what about more complex situations that are not a flip-of-coin kinda situation? For example what about rolling two dice?

Here is where probability distributions come in.

---

# Grubabilities

&nbsp;

???

A probability distribution is a list of values and their corresponding probability.

---

# Discrete and continuous

???

Depending on the nature of the values a variable can take, there are 2 types of probs.

---

# Discrete probability distributions

???

A discrete probability distributions is like counting how many ways you can get a particular value.

For example, if you roll a white and a black dice, there are 3 ways to get a 4 or a 10, but 6 ways to get a 7.

---

# Discrete probability distributions

???

The Poisson is the probability distribution of counts (like counting eyes, number of fillers in speech, etc).

---

# Continuous probability distributions

???

With continuous probabilities we cannot make a list of all the possible values (0.0, 0.00, 0.000, 0.0001...), because there is an infinite number of possible values. So we cannot assign a probability to a specific value.

Instead, we assign probabilities to a range of values.

---

# Continuous probability distributions

???

In this example, we want to know the probability of getting a mean f0 between 0 and 160 Hz.

We simply calculate the area under the curve between those two values (the total area under the curve is 1).

The probability of f0 being less than 160 Hz is 0.212.

---

# Continuous probability distributions

???

The probability of f0 being greater than 220 Hz is 0.345.

---

# Continuous probability distributions

???

The probability of f0 being between 120 and 210 Hz is 0.524

---

# Continuous probability distributions

???

There are many other types of continuous probability distributions

This is the beta distribution. It's bounded between 0 and 1, and it's used for example with proportions and percentages (which can take on any value between 0-1 and 0-100%).

---

???

But how do we describe probability distributions? We can't make a list of all values and probabilities, especially for continuous probabilities.

Instead, we specify the value of a few parameters that describe the distribution in a succinct way.

---
class: middle

$$y_i \sim Normal(\mu, \sigma)$$

???

Let's look at some formulas.

This is the formula of a variable `$y_1$` that is distributed according to (~) a Normal probability distribution.

As we have seen in the example above, a Normal distribution can be described with two parameters: the mean and the standard deviation.

---
class: middle

$$\text{f0}_i \sim Normal(200, 50)$$

???

Remember the example above of a Gaussian/Normal distribution of f0?

We can describe that distribution with this formula (much easier than listing all the values and their probability).

---

???

Nothing new here. Just the distribution as we've seen before.

---

class: inverse center bottom
background-image: url("img/chris-robert-unsplash.jpg")

# BREAK

---

# Memory loss

https://thinking.umwblogs.org/2020/02/26/goldfish-memory/

???

Frequentist statistics suffers from "memory loss": results of past studies are "forgotten".

One important aspect of Bayesian analysis is prior knowledge (you've seen that with the Bayes theorem).

We can take prior knowledge, or belief, into consideration thanks to "Bayesian belief updating".

---

# Bayesian belief update

???

How do we ensure that prior knowledge is not lost?

You've seen in Session 01 that Bayesian analysis is about estimating the posterior probability distribution of a variable of interest.

Roughly speaking, the posterior probability distribution is the combination of the prior belief and the evidence derived from the data.

Prior and evidence are, of course, probability distributions.

---

---

# Prior belief as probability distributions

&nbsp;

$$\text{articulation_rate}_i \sim Normal(\mu, \sigma)$$

???

Our prior belief about articulation rate is that it is distributed according to a Normal (aka Gaussian) distribution.

The Normal distribution has two parameters: mean `$\mu$` and standard deviation `$\sigma$`.

Now, we want to estimate these two parameters from the data.

---

# Prior belief as probability distributions

$$\text{articulation_rate}_i \sim Normal(\mu, \sigma)$$

$$\mu = ...?$$

???

We do have an idea of what the mean articulation rate could be like but we are not certain.

When you are not certain, you make a list of values and their probability, i.e. a probability distribution!

---

# Prior belief as probability distributions

$$\text{articulation_rate}_i \sim Normal(\mu, \sigma)$$

$$\mu \sim Normal(\mu_1, \sigma_1)$$

???

Usually, we assume the mean to be a value taken from another Normal distribution (with its own mean and SD).

This Normal distribution is the **prior probability distribution** (or simply prior) of the mean.

---

# Prior belief as probability distributions

$$\text{articulation_rate}_i \sim Normal(\mu, \sigma)$$

$$\mu \sim Normal(0, \sigma_1)$$

???

We will talk about different types of priors later. For now it's sufficient to remember that a conservative approach (which is what we want to do) is to set `$\mu_1$` to 0.

$$\sigma_1 = ...?$$

???

What about `$\sigma_1$`?

---

# The empirical rule

???

As a general rule, `$\pm2\sigma_1$` covers 95% of the Normal distribution, which means we are 95% confident that the value lies within that range.

Let's be generous and assume that the mean articulation rate will definitely not be greater than 30 syllables per second. (This might seem like a very high number, but we will see tomorrow how it can help estimation)

30/2 = 15 (remember, two times the SD), so we can set `$\sigma_1 = 15$`.

---

# Seeing is believing

???

Visualising priors is important, because it's easier to grasp its meaning when you can actually see the shape of the distribution.

Are you wondering about the negative values? (Articulation rate cannot be negative!) I will tell more about this and how to "fix" it tomorrow.

---

# LIVE CODING

---

# And `$\sigma$`?

$$\text{articulation_rate}_i \sim Normal(\mu, \sigma)$$

$$\mu \sim Normal(0, 15)$$

$$\sigma = ...?$$

???

Now, what about `$\sigma$`? (Be careful not to mix up `$\sigma$` and `$\sigma_1$`! This is `$\sigma$` from the first line, not `$sigma_1$` from the second line).

---

# EXERCISE

???

Find out in the exercise!

Run `open_exercise(3)` in the console to open the exercise.