class: center, middle, inverse, title-slide

# Bayesian Data Analysis for Speech Sciences
## Bayesian inference
### Timo Roettger, Stefano Coretta and Joseph Casillas
### LabPhon workshop
### 2021/07/06 (updated: 2021-07-03)

---
background-image: url(https://images.unsplash.com/reserve/oIpwxeeSPy1cnwYpqJ1w_Dufer%20Collateral%20test.jpg?ixlib=rb-1.2.1&ixid=MnwxMjA3fDB8MHxwaG90by1wYWdlfHx8fGVufDB8fHx8&auto=format&fit=crop&w=1516&q=80)
background-size: 1200px

???

We have been motivating a framework that emphasizes describing our statistical inferences by quantifying uncertainty.

This is in contrast to the null hypothesis significance testing framework that was laid out by Timo yesterday and mentioned briefly today.

Now we will turn our attention to making statistical inferences under a Bayesian framework.

Currently, there are many methods to choose from, and I will describe those that we believe to be the most common, though this overview will not be exhaustive.

You can think of these methods as new tools that we will be putting in our toolbox. I don't believe that any single tool is better than the others, though one may be more appropriate for the types of questions you are interested in answering.

---
class: title-slide-section-grey, middle

# .RUred[Credible intervals]

---

# Credible intervals

<img src="index_files/figure-html/credible-intervals-1-1.png" width="1008" style="display: block; margin: auto;" />

???

We'll start by talking about credible intervals. If you have heard of Bayesian inference before, there is a good chance you have also heard of credible intervals.

We could describe them as the Bayesian counterpart to confidence intervals under a frequentist framework. Some might say the difference is merely philosophical, but I won't go into that now.

Crucially, credible intervals are what most people think confidence intervals represent: that is, the interval within which the parameter value falls with a particular probability.

When we describe a posterior distribution, we typically say that a specified % of the posterior falls within a certain range.

In this plot I am using color to highlight the part of the posterior that falls within a certain range.

---
class: middle

# Credible intervals

<img src="index_files/figure-html/credible-intervals-2-1.png" width="1008" style="display: block; margin: auto;" />

???

To give a more concrete example, you might hear somebody say something like: the probability that B ("a value") is between 10 and 20 is .95 (or "we're 95% sure that the value is between 10 and 20").

This is an informative way to quantify uncertainty when describing a posterior distribution.

This is the case with the plot you see here... we're 95% certain that the value of beta falls between 10 and 20.

As I alluded to before, you don't get this same type of interpretation with a traditional confidence interval.

---
class: middle

# Credible intervals

<img src="index_files/figure-html/credible-intervals-3-1.png" width="1008" style="display: block; margin: auto;" />

???

Some researchers use a 95% CI to make statistical inferences by saying something like "if the 95% credible interval of the posterior distribution for a given parameter does not include 0, then we will consider that to be compelling evidence for an effect".

That would certainly be the case in the plot we see here, as the entire posterior falls far from 0, but obviously this is not always the case.
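As a minimal sketch of how such an interval might be extracted in R, assuming a fitted brms model: the model object `fit` and the predictor name `x` below are hypothetical placeholders, not objects from this deck.

```r
# Sketch: credible intervals from posterior draws of a hypothetical
# brms model `fit` with a hypothetical predictor `x`.
library(brms)
library(bayestestR)

# Posterior draws for the coefficient of x
# (posterior_samples(fit)$b_x in older brms versions)
draws <- as_draws_df(fit)$b_x

quantile(draws, probs = c(0.025, 0.975))  # 95% equal-tailed interval, by hand
eti(draws, ci = 0.95)                     # same interval via bayestestR
hdi(draws, ci = 0.95)                     # 95% highest density interval
```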
---
class: middle

# Credible intervals

<img src="index_files/figure-html/credible-intervals-4-1.png" width="1008" style="display: block; margin: auto;" />

???

In this plot, for example, the 95% CI of the posterior overlaps 0 by a pretty large amount. Thus some researchers would conclude that there is not compelling evidence for this particular effect, or that the evidence is inconclusive.

Notice that I've used 95% for the range here and in the previous examples. This is, without a doubt, the most common value, which is carryover from NHST (p < 0.05) and completely arbitrary. You can use any interval you want to describe the posterior.

Let's practice a couple of examples.

---
class: title-slide-section-blue, middle, center

# Live coding

<!-- short practical -->

<!--
CI Practical: Describe posteriors as a way to do inference
USE MODEL FROM STEFs EXAMPLE FROM PREVIOUS DAY
-->

---
class: title-slide-section-grey, middle
background-image: url(https://upload.wikimedia.org/wikipedia/commons/thumb/1/1a/Brosen_windrose.svg/1200px-Brosen_windrose.svg.png)
background-size: 500px
background-position: 100% 50%

# .RUred[Probability of direction]

<!-- Image of compass -->

---
background-image: url(https://upload.wikimedia.org/wikipedia/commons/thumb/1/1a/Brosen_windrose.svg/1200px-Brosen_windrose.svg.png)
background-size: 200px
background-position: 95% 15%

# Probability of direction

???

Another tool we have for decision making is the probability of direction, also known as the maximum probability of effect.

This value can range from 50% to 100% (0.5 - 1.0).

--

<br><br><br><br>

.middle[
.center[

## .RUred[The proportion of the posterior that has the same sign as the median of the distribution]

]
]

???

It is the proportion of the posterior that has the same sign (that is, whether it is + or -) as the median of the distribution.

--

<br>

<b4ss-blockquote>
The probability that a given parameter estimate is positive or negative
</b4ss-blockquote>

???

In simple terms, it tells us the probability that a given parameter estimate is positive or negative.

To illustrate, if you have a probability of direction of 100%, then you are 100% certain that a given parameter is positive (or negative). On the other hand, if you obtain a probability of direction of 50%, then the parameter is equally likely to be positive as it is to be negative (that is a lot of uncertainty!).

The pd is likely the closest equivalent to a p-value.

It is quite simple to calculate, so let's look at a few examples...
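For instance, a quick sketch of the calculation, reusing the hypothetical `draws` vector from the earlier credible-interval sketch:

```r
# Sketch: probability of direction from a vector of posterior draws `draws`
# (hypothetical; e.g. extracted from a brms model as shown earlier).
library(bayestestR)

# By hand: proportion of draws that share the sign of the median
pd_manual <- mean(sign(draws) == sign(median(draws)))

# Via bayestestR
p_direction(draws)
```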
---
class: title-slide-section-blue, middle, center

# Live coding

<!-- short practical -->

---
exclude: true
class: center, middle

<img src="index_files/figure-html/pd-p-comparison-1.png" width="1008" />

---
class: title-slide-section-grey, middle
background-image: url(https://cdn10.bigcommerce.com/s-sqvei3ff9f/product_images/theme_images/rope-background.jpg?t=1507721858)
background-size: 1300px
background-position: 50% 0%

<br><br><br><br><br><br><br><br><br><br><br><br><br><br>

# .RUred[Region of practical equivalence]
### (ROPE)

<!-- img of a rope -->

???

The next tool we are going to talk about combines the highest density credible interval with a region of practical equivalence, or ROPE.

---
background-image: url(https://encrypted-tbn0.gstatic.com/images?q=tbn:ANd9GcR36meNrDaFPK1X3XSpGOg8w5yLiljw46jZ5ZVdnmMecvmCUg2gTyX8j-UE0HNIALZ6OqU&usqp=CAU)
background-size: 535px
background-position: 110% 50%

# ROPE

???

The idea behind the ROPE is that in many cases it is rather pointless to test whether an effect is exactly 0 (effects are rarely exactly 0).

Thus a ROPE allows the researcher to define a range of values which they consider to be practically equivalent to a null effect.

In other words...

--

.pull-left[

### .RUred[The ROPE represents the proportion of the HDI of a posterior distribution that lies within a region of practical equivalence]

.footnote[See Kruschke 2010, 2011, 2014]

]

???

In many cases we establish this region around a point null of 0.

---
class: center, middle

<img src="index_files/figure-html/rope0-1.png" width="1008" />

???

For example, consider this plot that illustrates a region of practical equivalence around a point null of 0. In this case I established a ROPE of +/- 0.1.

Imagine we fit a model and we want to see how much of the 95% credible interval overlaps with our ROPE.

---
class: center, middle

<img src="index_files/figure-html/rope1-1.png" width="1008" />

???

Here we can see that the posterior mean is 0.3 and the 95% credible interval ranges from 0.16 to 0.77. Furthermore, we see that 15.67% of the HDI falls within the ROPE. This is visible in the plot as the orange vertical slab.

---
class: center, middle

<img src="index_files/figure-html/rope2-1.png" width="1008" />

???

If the posterior falls further from the ROPE, as is the case here, it follows that the % of the HDI within the ROPE also decreases.

---
class: center, middle

<img src="index_files/figure-html/rope3-1.png" width="1008" />

???

We could consider an HDI completely outside our ROPE as compelling evidence for a given effect.

It is important to note that the researcher is responsible for determining the range of the ROPE and the width of the credible interval.

---
background-image: url(https://raw.githubusercontent.com/jvcasillas/media/master/teaching/img/confused.png)
background-size: 400px
background-position: 95% 50%

# How to establish a ROPE

<br>

.Large[
- Domain expertise/previous research
]

???

So how does one go about doing that?

In many cases we have domain expertise regarding what a meaningful effect size might be (e.g., RTs), or we can examine related research.

--

.Large[
- Pilot data
]

???

It is also possible to collect pilot data.

--

.Large[
- Power analysis
]

???

Which can be used in conjunction with a power analysis.

--

.Large[
- Rules of thumb, e.g., ±0.1 for standardized <br>variables<sup>1</sup>
]

.footnote[
<sup>1</sup> Half of what is considered a small effect using Cohen's d (Cohen, 1988, 2013)
]

???

Some researchers also offer rules of thumb, like +/- 0.1 for a standardized effect, which is half of what Cohen considered a "small effect".

---
class: middle

# `$$\color{black}{\frac{\mu_{1} - \mu_{2}}{\sqrt{\frac{\sigma^2_{1} + \sigma^2_{2}}{2}}}}$$`

.footnote[
Kruschke 2018
]

???

For A/B testing type designs, Kruschke (2018) offers this formula for a standardized effect size, to which a rule of thumb like ±0.1 can then be applied.

The takeaway here, and this is one of the overarching themes of this workshop, is that this requires thinking and planning on the part of the researcher, and all decisions should be justified.

So let's give this a try in R...
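One possible sketch of this in R, again using the hypothetical `draws` vector from the earlier examples and a ±0.1 ROPE:

```r
# Sketch: how much of the 95% HDI falls inside a ROPE of [-0.1, 0.1]?
# `draws` is the hypothetical vector of posterior draws used above.
library(bayestestR)

rope(draws, range = c(-0.1, 0.1), ci = 0.95, ci_method = "HDI")

# Kruschke's (2018) standardized effect size for two groups, to which
# the +/- 0.1 rule of thumb can be applied:
#   (mu1 - mu2) / sqrt((sigma1^2 + sigma2^2) / 2)
```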
---
class: title-slide-section-blue, middle, center

# Live coding

<!-- short practical -->

---
class: title-slide-section-grey, middle

# Other methods

???

There are other methods for making statistical inferences, and many more are being actively developed. I'll briefly mention a few so that you are aware of them and provide links so that you can get more information if you are interested.

---

# Other methods

- Define your hypothesis (`brms::hypothesis()`)

--

- Bayes factors [info](https://easystats.github.io/bayestestR/articles/bayes_factors.html)

- Support intervals [info](https://easystats.github.io/bayestestR/reference/si.html)

???

Bayes factors represent another method for decision making. I won't talk about them much, but I'll mention that their use is rather controversial... see LINK for a primer and tutorial.

--

- Equivalence test [info](https://easystats.github.io/parameters/reference/equivalence_test.lm.html)

- Practical significance [info](https://www.frontiersin.org/articles/10.3389/fpsyg.2019.02767/full)

???

Finally, I also wanted to mention equivalence testing, which allows one to check whether an estimate and its uncertainty (confidence interval) fall within a ROPE, and practical significance, which is similar (conceptualized as a unidirectional equivalence test).

These types of tests allow one to provide evidence for a null effect, e.g., that two groups behave similarly.

---

# Summary

.Large[
- Credible intervals

- Probability of direction (maximum probability of effect)

- Region of practical equivalence (ROPE)

- Other methods
]

.footnote[
Highly recommended reading: https://www.frontiersin.org/articles/10.3389/fpsyg.2019.02767/full
]

???

We have touched on some of the many ways in which one can use the posterior for statistical inference.

The main focus has been on emphasizing different ways we can use these tools to quantify uncertainty and help us in making decisions. They provide a rich set of tools at the researcher's disposal that contrasts with the sole p-value available under a frequentist/NHST framework.

Now we will do one last coding demo, in which I will show you an easy way to conduct an array of tests with a single function call, and you will practice testing your own hypotheses.

---
class: title-slide-section-blue, middle, center

# Live coding
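???

As one possible sketch of the "array of tests with a single function call" idea, `bayestestR::describe_posterior()` could be used; the model object `fit` and the predictor `x` below are hypothetical placeholders, not the workshop model.

```r
# Sketch: several of the indices covered today (CI, pd, % in ROPE)
# from a single call, applied to a hypothetical brms model `fit`.
library(bayestestR)

describe_posterior(
  fit,
  centrality = "median",
  ci         = 0.95,
  ci_method  = "HDI",
  test       = c("p_direction", "rope"),
  rope_range = c(-0.1, 0.1)
)

# A directional hypothesis could also be tested in brms, e.g. that the
# (hypothetical) coefficient of x is greater than zero:
# brms::hypothesis(fit, "x > 0")
```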