r plot simple slopes

"# Don't want def = TRUE for factors even though they are character#### Fit models ############################################################### Since output columns are conditional, I call summ here to see what they will# be. the data frame has columns for the slope of the test variable, the standard For more information on customizing the embed code, read Comprehensive, User-Friendly Toolkit for Probing Interactions#' \code{sim_slopes} conducts a simple slopes analysis for the purposes of#' understanding two- and three-way interaction effects in linear regression.#' @param centered A vector of quoted variable names that are to be#' mean-centered. If the model includes interactions at different levels (e.g., three two-way simple_slopes calculates all the simple effects of an interactionin a fitted model (linear, generalized linear, hierarchical linear, or ANOVA). Default#' @param jnalpha What should the alpha level be for the Johnson-Neyman#' interval? Default is the same as `format`.#' For more on what you can do with a `huxtable`, see \pkg{huxtable}.# This is a non-obvious way of seeing if we have a sim_margins or slopes# Get the number of moderator values per 2nd moderator values# Get the row I'll be inserting to; it's *2 because there are two row# tab <- huxtable::set_number_format(tab, value = digits)#' @title Plot coefficients from simple slopes analysis#' @description This creates a coefficient plot to visually summarize the#' @param ... arguments passed to [jtools::plot_coefs()]# If there's a second moderator, format as appropriate And John has another great way to do simple slopes in ggplot2! rdrr.io Find an R package R language docs Run R in your browser R Notebooks. slope for this variable was being tested. for continuous variables, and at each level of categorical variables. However, with the new package, I can't figure out how to plot the individual slopes, as in the figure for the probabilities of fixed effects by (random) group level, located here. models, the degrees of freedom and p-values will not appear, as these are I have X and Y data and want to put 95 % confidence interval in my R plot. (2005). In all but the linear model#' case, be aware that not all the assumptions applied to simple slopes#' \item{slopes}{A table of coefficients for the focal predictor at each#' \item{ints}{A table of coefficients for the intercept at each value of the#' \item{modx.values}{The values of the moderator used in the analysis}#' \item{mods}{A list containing each regression model created to estimate#' \item{jn}{If \code{johnson_neyman = TRUE}, a list of `johnson_neyman`#' objects from \code{\link{johnson_neyman}}. Plot coefficients from simple slopes analysis. By default, this function will provide slopes at -1 SD, the mean, and +1 SD can be overridden with the If a categorical variable with more than two levels is being tested, you may 11. Probing interactions in fixed and#' multilevel regression: Inferential and graphical techniques.#' \emph{Multivariate Behavioral Research}, \emph{40}(3), 373-400.#' \url{http://dx.doi.org/10.1207/s15327906mbr4003_5}#' Cohen, J., Cohen, P., West, S. G., & Aiken, L. S. (2003). ---
title: 'Interactions and Simple Slopes'
output:
  html_document:
    code_download: yes
    fontsize: 8pt
    highlight: textmate
    number_sections: no
    theme: flatly
    toc: yes
    toc_float:
      collapsed: no
---

```{r setup, include=FALSE}
knitr::opts_chunk$set(cache=TRUE)
knitr::opts_chunk$set(echo = TRUE) #Show all script by default
knitr::opts_chunk$set(message = FALSE) #hide messages 
knitr::opts_chunk$set(warning =  FALSE) #hide package warnings 
knitr::opts_chunk$set(fig.width=5) #Set default figure sizes
knitr::opts_chunk$set(fig.height=4.5) #Set default figure sizes
knitr::opts_chunk$set(fig.align='center') #Set default figure
knitr::opts_chunk$set(fig.show = "hold") #Set default figure
knitr::opts_chunk$set(results = "hold") #Set default figure
```

# Interactions
- Interactions have the same meaning they did in ANOVA
    - there is a synergistic effect between two variables
    - However now we can examine interactions between continuous variables
- Additive Effects:

$$Y = B_1X + B_2Z + B_0$$

- Interactive Effects: 
$$Y = B_1X + B_2Z + B_3XZ + B_0$$


## Example of Interaction
- DV: Reading Speed, IVs: Age + IQ 
- Simulation: $Y = 3X + .2Z + .4XZ +100 +\epsilon$
- So what we mean is: $ReadingSpeed = 3*Age + .2*IQ + .4*Age*IQ +100 +Noise$
    - Set age to be uniform distribution between 7-17 
    - Set IQ to be normal, $M$=100 and $S$=15
    - Set $\epsilon$ = 15 (normal distribution of noise)
    - Note: For our regression simulation to match our equation, we should center our X and Z variables first, but we will save out the raw score 
```{r, echo=TRUE, warning=FALSE}
set.seed(42)
n <- 200
# Uniform distribution of Ages 
X <- runif(n, 7, 17)
Xc<-scale(X,scale=F)
# normal distrobution of IQ (100 +-15) 
Z <- rnorm(n, 100, 15)
Zc<-scale(Z,scale=F)
# Our equation to  create Y
Y <- 3*Xc + .2*Zc + .4*Xc*Zc +100 + rnorm(n, sd=15)
#Built our data frame
Reading.Data<-data.frame(Age=X,IQ=Z,ReadingSpeed=Y)
```

## Scatterplot & correlate our predictors
- We will use a useful diagnostic tool (GGally)
- We will visualize the density, scatterplot and correlation all at once

```{r,fig.width=4.5,fig.height=4.5}
library(GGally)
DiagPlot <- ggpairs(Reading.Data,  
              lower = list(continuous = "smooth"))
DiagPlot
```

- Predictors have some heteroskedastic issues, but the correlations seem to make sense


## Let's test for an interaction
1. Center your variables
2. Model 1: Examine a main-effects model (X + Z) [not necessary but useful]
3. Model 2: Examine a main-effects + Interaction model (X + Z + X:Z)

- Note: You can simply code it as X*Z in r, as it will automatically do (X + Z + X:Z)

```{r, echo=TRUE, results='asis'}
#Center
Reading.Data$Age.C<-scale(Reading.Data$Age, center = TRUE, scale = FALSE)[,]
Reading.Data$IQ.C<-scale(Reading.Data$IQ, center = TRUE, scale = FALSE)[,]
# Regressions
Centered.Read.1<-lm(ReadingSpeed~Age.C+IQ.C,Reading.Data)
Centered.Read.2<-lm(ReadingSpeed~Age.C*IQ.C,Reading.Data)
# Show results
library(stargazer)
stargazer(Centered.Read.1,Centered.Read.2,type="html",
          column.labels = c("Main Effects", "Interaction"),
          intercept.bottom = FALSE, single.row=TRUE, 
          notes.append = FALSE, header=FALSE)
```

- Also, change in $R^2$ is significant, as we might expect

```{r}
ChangeInR<-anova(Centered.Read.1,Centered.Read.2)
knitr::kable(ChangeInR, digits=4)
```

### Examine residuals
- Main Effects Model
```{r, fig.width=6}
library(ggfortify)
autoplot(Centered.Read.1, which = 1:2, label.size = 1) + theme_bw()
```

- Interactions Model

```{r, fig.width=6}
autoplot(Centered.Read.2, which = 1:2, label.size = 1) + theme_bw()
```

## Understanding the Interaction 
- Let's examine a 3d scatter plot of the raw data and plot against it the predicted results of each model (additive and interaction model)
- Once we have a "model" of the data, we can generate predictions for **all possible Ages** at **every possible IQ** (within the absolute boundary limits for our data)
    - With 2 predictors that generates a "3D surface plane."
        - You go past 2 predictors with these types of plots
- Let's examine Model 1 (additive effects) as a 3D scatter/surface plot
    - Scatter dots = Raw data points
    - Surface = Model 1 Predicted Results
        - How well does the surface fit the dots? 

```{r, echo = FALSE}
library(plotly)
library(dplyr)
Read.1<-lm(ReadingSpeed~Age+IQ,Reading.Data)
Read.2<-lm(ReadingSpeed~Age*IQ,Reading.Data)
# predict over sensible grid of values
graph_reso <- .5
Age.PL <- seq(min(Reading.Data$Age), max(Reading.Data$Age), by = graph_reso)
IQ.PL <- seq(min(Reading.Data$IQ), max(Reading.Data$IQ), by = graph_reso)
d <- setNames(data.frame(expand.grid(Age.PL, IQ.PL)),c("Age", "IQ"))
vals1 <- predict(Read.1, newdata = d)
vals2 <- predict(Read.2, newdata = d)
# form matrix and give to plotly
m1 <- matrix(vals1, nrow = length(unique(d$Age)), ncol = length(unique(d$IQ)))
m2 <- matrix(vals2, nrow = length(unique(d$Age)), ncol = length(unique(d$IQ)))
```

```{r, echo = FALSE, fig.width=5.5,fig.height=5.5}
p1 <- plot_ly(Reading.Data, x = ~IQ, y = ~Age, z = ~ReadingSpeed) %>%
  add_markers(marker = list(size = 4))  %>% 
  add_surface(x = ~IQ.PL, y = ~Age.PL, z = ~m1,showscale = FALSE, opacity=.75)
p1
```

- Let's examine Model 2 (interaction effects) as a 3d scatter/surface plot
    - Scatter dots = Raw data points
    - Surface = Model 2 Predicted Results
        - How well does the surface fit the dots? 

```{r, echo = FALSE, fig.width=5.5,fig.height=5.5}
p2 <- plot_ly(Reading.Data, x = ~IQ, y = ~Age, z = ~ReadingSpeed) %>%
  add_markers(marker = list(size = 4))  %>% 
  add_surface(x = ~IQ.PL, y = ~Age.PL, z = ~m2,showscale = FALSE, opacity=.75)
p2
```


## How visualize this for Papers?
- Well, there are some options
- We can do our fancy-pants surface plot, but that is hard to put into a paper
- More common this is to examine slope of one factor at different levels of the other (Simple Slopes) 
- What we need to decide is at which levels

### Hand picking
- Manually select the what levels of each you are going to examine
- For tracking values of interest
- here I picked -3 to 3 for age (mix and max) and -15, 0 15 for IQ (theoretical 1 SD and mean)

```{r}
library(effects)
Inter.1a<-effect(c("Age.C*IQ.C"), Centered.Read.2,
                           xlevels=list(Age.C=seq(-3,3, 1), 
                                        IQ.C=c(-15,0,15)))
```


```{r}
knitr::kable(summary(Inter.1a)$effect, digits=4)
plot(Inter.1a, multiline = TRUE)
```

### Quantile
- Examine the levels based on quantiles (bins based on probability)
- We will do this into 5 equal bins based on probability cut-offs 
- Does not assume normality for IV

```{r}
IQ.Quantile<-quantile(Reading.Data$IQ.C,probs=c(0,.25,.50,.75,1))
IQ.Quantile<-round(IQ.Quantile,2)

Inter.1b<-effect(c("Age.C*IQ.C"), Centered.Read.2,
                 xlevels=list(Age.C=seq(-3,3, 1), 
                              IQ.C=IQ.Quantile))
plot(Inter.1b, multiline = TRUE)
```

### Based on the SD 
- We select 3 values: $M-1SD$, $M$, $M+1SD$
- Not it does not have to be 1 SD, it can be 1.5 ,2 or 3
- Assumes normality for IV

```{r}
IQ.SD<-c(mean(Reading.Data$IQ.C)-sd(Reading.Data$IQ.C),
         mean(Reading.Data$IQ.C),
         mean(Reading.Data$IQ.C)+sd(Reading.Data$IQ.C))
IQ.SD<-round(IQ.SD,2)

Inter.1c<-effect(c("Age.C*IQ.C"), Centered.Read.2,
                 xlevels=list(Age.C=seq(-3,3, 1), 
                              IQ.C=IQ.SD))
plot(Inter.1c, multiline = TRUE)
```

## Why center your IVs?
- The center of each IV represents the mean of that IV
- Thus when you interact them $X*Z$, that means $0*0 = 0$
- Also, this can reduce multicollinearity issues
- This is because if $X*Z$ creates a line, it means you have added a new predictor (XZ) that strongly correlates with X and Z

```{r}
X<-c(0,2,4,6,8,10)
Z<-c(0,2,4,6,8,10)
XZ<-X*Z
plot(XZ)
```

- Correlations: $r_{x,z}$ = `r round(cor(X,Z),2)`, $r_{x,xz}$ = `r round(cor(X,XZ),2)`, $r_{z,xz}$ = `r round(cor(Z,XZ),2)`
-But if you center them, now you will make a U-shaped interaction which will be orthogonal to X and Z!

```{r, echo=TRUE, warning=FALSE}
X.C<-scale(X, center = TRUE, scale = FALSE)[,]
Z.C<-scale(Z, center = TRUE, scale = FALSE)[,]
XZ.C<-X.C*Z.C
plot(XZ.C)
```

- Correlations: $r_{x_c,z_c}$ = `r round(cor(X.C,Z.C),2)`, $r_{x_c,xz_c}$ = `r round(cor(X.C,XZ.C),2)`, $r_{z_c,xz_c}$ = `r round(cor(Z.C,XZ.C),2)`

- Let's apply this to our class example
- See below where I manually multiply the values and you can see a very strong positive slope
- Left = Raw Score, Right plot = Centered Score

```{r, echo=FALSE, fig.width=3.25, fig.height=3.25,fig.show='hold',fig.align='center'}
library(car)
Reading.Data$Age.X.IQ<-Reading.Data$Age*Reading.Data$IQ
Reading.Data$Age.X.IQ.C<-Reading.Data$Age.C*Reading.Data$IQ.C
scatterplot(ReadingSpeed~Age.X.IQ, data= Reading.Data, reg.line=FALSE, smoother=loessLine)
scatterplot(ReadingSpeed~Age.X.IQ.C, data= Reading.Data, reg.line=FALSE, smoother=loessLine)
```

### Let's run an uncentered regression
- as you can see the terms have changed from centered models

```{r, echo=TRUE}
Uncentered.Read.1<-lm(ReadingSpeed~IQ+Age,Reading.Data)
Uncentered.Read.2<-lm(ReadingSpeed~IQ*Age,Reading.Data)
```

```{r, echo=FALSE, results='asis'}
stargazer(Uncentered.Read.1,Uncentered.Read.2,type="html",
          column.labels = c("Main Effects", "Interaction"),
          intercept.bottom = FALSE, single.row=TRUE, 
          notes.append = FALSE, header=FALSE)
```

#### Multicollinearity of the Interaction
- Multicollinearity of the uncentered model

```{r, echo=TRUE, warning=FALSE}
vif(Uncentered.Read.2)
```

- We lost our main effect of **Age** as the variances got all inflated
- We did not have this problem in the centered model

```{r, echo=TRUE, warning=FALSE}
vif(Centered.Read.2)
```

- Note: Its OK to plot the uncentered version, but run the analysis on the centered data

# Testing Simple Slopes 
- The goal here is to figure out when the slope at a given level of another variable is different from zero
- We chop up the interaction at specific places as we did with the interactions plots (-1 SD, M, +1 SD) on the *moderating* variable (a third variable that affects the strength of the relationship between a dependent and independent variable) 
- Next, we test the slopes on the *predictor* variable (IV of main interest)
- Example: Performance on a task, predicted by Level of training, moderated by how hard the challenge is

## Example of Interaction
- DV: Performance, IVs: Training (X) + Challenge (Z)
- Simulation: $Y = 2.2X + 2.46Z + .75XZ +16 +\epsilon$
    - Set Training to be uniform distribution between -10 to 10 
    - Set Challenge to be normal, $M$=0 and $S$=15
    - Set $\epsilon$ = 50 (normal distribution of noise)
    - Note: For our regression simulation we will set means to about zero (so are not forced to center them)
    
```{r}
set.seed(42)
# 250 people
n <- 250
#Training (low to high)
X <- runif(n, -10, 10)
# normal distribution of Challenge Difficulty
Z <- rnorm(n, 0, 15)
# Our equation to  create Y
Y <- 2.2*X + 2.46*Z + .75*X*Z + 16 + rnorm(n, sd=50)
#Built our data frame
Skill.Data<-data.frame(Training=X,Challenge=Z,Performance=Y)
#run our regression
Skill.Model.1<-lm(Performance~Training+Challenge,Skill.Data)
Skill.Model.2<-lm(Performance~Training*Challenge,Skill.Data)
```

```{r,echo=FALSE,fig.width=4.5,fig.height=4.5}
DiagPlot2 <- ggpairs(Skill.Data,  
              lower = list(continuous = "smooth"))
DiagPlot2
```

```{r,echo=FALSE, results='asis'}
stargazer(Skill.Model.1,Skill.Model.2,type="html",
          column.labels = c("Main Effects", "Interaction"),
          intercept.bottom = FALSE, single.row=TRUE, 
          notes.append = FALSE, header=FALSE)
```


## Plotting 
- We will use the `rockchalk` library to visualize our interaction and test our simple slopes

```{r}
library(rockchalk)
m1ps <- plotSlopes(Skill.Model.2, modx = "Challenge", plotx = "Training", n=3, modxVals="std.dev")
m1psts <- testSlopes(m1ps)
round(m1psts$hypotests,4)
```

- We see the slope at the mean is not significant, 
- The +1SD and -1SD are significant, this is what is driving the interaction
- Well trained people at low levels of challenge get worse with more training, while high levels of challenge with more training get better
- The bonus using rockchalk is that you can change the number of lines (change $n=3$) you want to see or you can test quantiles as well (change modxVals="quantile")

```{r}
m1pQ <- plotSlopes(Skill.Model.2, modx = "Challenge", plotx = "Training", n=5, modxVals="quantile")
m1pQts <- testSlopes(m1pQ)
round(m1pQts$hypotests,4)
```

### Notes
- You often don't need to run the *t*-tests on the simple slopes, but they can useful to test very specific hypotheses
- You can run simple slopes analysis on three-way interactions, but let's leave that aside for now as you would have to use a different R-package

# Non-Linear interactions
- You can have non-linear terms interacting with other linear and non-linear terms
- Example: Quit smoking, X = Fear of your health, Z = moderated by Self-Efficacy for quitting

## Example of Polynomial Interaction
- DV: Quit smoking, IVs: Fear (X) + Self-Efficacy (Z)
- Simulation: $Y = -.18X - .15X^2 + Z + .16XZ + .07X^2Z  +20 +\epsilon$
    - Set Fear of your health to be uniform distribution between 1 to 9 (centered) 
    - Set Self-Efficacy to be normal, $M$=0 and $S$=15
    - Set $\epsilon$ = 10 (normal distribution of noise)
    
```{r}
set.seed(42)
n <- 250
#Fear of Health
X <- scale(runif(n, 1, 9), scale=F)
#Centered Self-Efficacy
Z <- scale(runif(n, 0, 15), scale=F)
# Our equation to  create Y
Y <- -.05*X - .20*X^2 + .15*X*Z + .22*(X^2)*Z+ 35 + rnorm(n, sd=10)
#Built our data frame
Smoke.Data<-data.frame(Smoking=Y,Health=X,SelfEfficacy=Z)
#run our regression
Smoke.Model.1<-lm(Smoking~poly(Health,2)+SelfEfficacy,Smoke.Data)
Smoke.Model.2<-lm(Smoking~poly(Health,2)*SelfEfficacy,Smoke.Data)
Smoke.Model.1.R<-lm(Smoking~poly(Health,2, raw=T)+SelfEfficacy,Smoke.Data)
Smoke.Model.2.R<-lm(Smoking~poly(Health,2, raw=T)*SelfEfficacy,Smoke.Data)
```

- Orthogonal Polynomials
```{r, echo=FALSE, results='asis'}
stargazer(Smoke.Model.1,Smoke.Model.2,type="html",
          column.labels = c("Main Effects", "Interaction"),
          intercept.bottom = FALSE, single.row=TRUE, 
          notes.append = FALSE, header=FALSE)
```

- Power Polynomials

```{r, echo=FALSE, results='asis'}
stargazer(Smoke.Model.1.R,Smoke.Model.2.R,type="html",
          column.labels = c("Main Effects Non-Ortho", "Interaction Non-Ortho"),
          intercept.bottom = FALSE, single.row=TRUE, 
          notes.append = FALSE, header=FALSE)
```

- Note how different the results might look when you examine that as power polynomials


## Plotting the Simple Slopes when there are Curves
- These can be done with the `effects` package as I showed you above and last week
- We will work with the orthogonal polynomial models
    - or we can simply use the plotCurves function from the 'rockchalk' package

```{r}
SmokeInterPlot <- plotCurves(Smoke.Model.2, 
                             modx = "SelfEfficacy", plotx = "Health",
                             n=3,modxVals="std.dev")
```

- Lets plot just the main effect (and think about what it means relative the power)

```{r, echo=TRUE, warning=FALSE}
plotCurves(Smoke.Model.1, plotx = "Health")
```

- Does not look very quadratic, does it? In other words, you cannot see the higher order effects as they can be hidden by the moderating variable
- How to find these hidden things? 
- You have to test for interactions and changes in $R^2$
- You have to try to add higher order terms, but they should be motivated by some theory
- What if you did not know the second order term was in the model above. 

```{r, echo=TRUE, warning=FALSE}
Smoke.Model.1.L<-lm(Smoking~Health+SelfEfficacy,Smoke.Data)
Smoke.Model.2.L<-lm(Smoking~Health*SelfEfficacy,Smoke.Data)
```

```{r, echo=FALSE, results='asis'}
stargazer(Smoke.Model.1.L,Smoke.Model.2.L,type="html",
          column.labels = c("Main Effects", "Interaction"),
          intercept.bottom = FALSE,
          single.row=TRUE, 
          notes.append = FALSE,
          header=FALSE)

```

```{r}
ChangeInR.Smoke<-anova(Smoke.Model.1.L,Smoke.Model.2.L)
knitr::kable(ChangeInR.Smoke, digits=4)
```

- There is no significant interaction, but let's plot it anyway.
- Replotting as linear interaction

```{r}
SmokeInterPlot.Linear <- plotSlopes(Smoke.Model.2.L, modx = "SelfEfficacy", plotx = "Health", n=3, modxVals="std.dev")
```


- Let's check the change in $R^2$ from Linear interaction model to poly interaction model now

```{r, echo=TRUE, results='asis'}
stargazer(Smoke.Model.2.L,Smoke.Model.2,type="html",
          column.labels = c("Linear", "Poly"),
          intercept.bottom = FALSE,
          single.row=TRUE, 
          notes.append = FALSE,
          header=FALSE)
```

```{r}
ChangeInR.Smoke.2<-anova(Smoke.Model.2,Smoke.Model.2.L)
knitr::kable(ChangeInR.Smoke.2, digits=4)
```

## Notes
- So, model with poly is a significantly better fit
- Would it change the story by much? In this case YES, but not always! 
    - Sometimes the improvement in fit is minor and does not change the story
        - Often ignoring the higher order term can make it easier to test simple slopes and tell a story
    - However, ignoring the higher order terms can violate your assumptions when the results, let's compare our linear vs poly models residuals

- Linear Model
```{r, fig.width=6}
autoplot(Smoke.Model.2, which = 1:2, label.size = 1) + theme_bw()
```

- Quadtradic Model
```{r, fig.width=6}
autoplot(Smoke.Model.2.L, which = 1:2, label.size = 1) + theme_bw()
```

<script>
  (function(i,s,o,g,r,a,m){i['GoogleAnalyticsObject']=r;i[r]=i[r]||function(){
  (i[r].q=i[r].q||[]).push(arguments)},i[r].l=1*new Date();a=s.createElement(o),
  m=s.getElementsByTagName(o)[0];a.async=1;a.src=g;m.parentNode.insertBefore(a,m)
  })(window,document,'script','https://www.google-analytics.com/analytics.js','ga');

  ga('create', 'UA-90415160-1', 'auto');
  ga('send', 'pageview');

</script>
\(ReadingSpeed = 3*Age + .2*IQ + .4*Age*IQ +100 +Noise\)\(Y = -.18X - .15X^2 + Z + .16XZ + .07X^2Z +20 +\epsilon\)

If you are using SPSS, this can be done by selecting "Covariance matrix" in the "Regression Coefficients" section of the "Statistics" dialog box. The program can be configured to evaluate the simple slopes for any value of the moderator. "# Saving key arguments as attributes of return object### Centering ################################################################## Use utility function shared by all interaction functions### Getting moderator values ##################################################"Johnson-Neyman intervals are not available for factor# Now specify def or not (for labeling w/ print method)# Don't want def = TRUE for factors even though they are character"mod2.values argument must include only levels of the factor. Using all factor levels instead. A 'sstest' value in a particular column indicates that the simple If you wish to test simple effects for a different interaction, simply switch Percentile.

140 0 obj <>stream