modelr::add_predictions

Oct 04, 2024

Understanding and Using modelr::add_predictions in R

The modelr package provides helper functions for working with models inside tidyverse-style data pipelines. One of them is add_predictions() (the exported name is plural; modelr has no add_prediction), which adds a model's predicted values to a data frame as a new column.

Why Use add_predictions?

Suppose you've fitted a model and want to see how well it predicts the data it was trained on. add_predictions() puts the predicted values in a new column of the original data frame, so predicted and actual outcomes sit side by side and are easy to compare.

Understanding the Basics

The add_predictions function takes the following arguments:

  • data: The data frame you want to generate predictions for. It comes first, so the function works with the pipe. It can be the data used to fit the model or a different data frame that contains the model's predictor columns.
  • model: Your fitted model object, typically created using functions like lm(), glm(), or gam().
  • var: Optional. The name of the column that stores the predicted values. If not provided, it defaults to "pred".
  • type: Optional. Passed on to predict() to choose the prediction type, for example type = "response" for a GLM. The available types depend on the model class.
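As a minimal sketch of these arguments, the snippet below fits a small model and adds predictions for a separate data frame under a custom column name. It uses the function's exported name, add_predictions(). The names cars_fit, new_cars, and mpg_hat are made up for illustration; mtcars ships with R.

```r
library(modelr)

# Fit a simple model on a built-in dataset
cars_fit <- lm(mpg ~ wt, data = mtcars)

# Hypothetical new observations to predict for (illustrative values)
new_cars <- data.frame(wt = c(2.5, 3.0, 3.5))

# data comes first, then the model; var names the new column
new_cars_pred <- add_predictions(new_cars, cars_fit, var = "mpg_hat")

new_cars_pred
```

The result is new_cars with one extra column, mpg_hat, holding the predictions from predict(cars_fit, new_cars).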

Example

Imagine you want to predict the price of houses based on their size. You've created a linear model using lm() and want to see how well it predicts house prices for the same data. Here's a basic example:

library(modelr)
library(dplyr)

# Sample data 
houses <- data.frame(
  size = c(1000, 1500, 2000, 2500, 3000),
  price = c(200000, 300000, 400000, 500000, 600000)
)

# Fit a linear model
house_model <- lm(price ~ size, data = houses)

# Use add_predictions to add predictions to the original data (data first, then model)
houses_with_predictions <- add_predictions(houses, house_model)

# View the results
houses_with_predictions

What is add_predictions Doing?

Behind the scenes, add_predictions() calls predict() on your model with the data you supply. Instead of returning a bare vector of predictions, it stores them in a new column of the data frame (named pred unless you set var) and returns the data frame.
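To make that concrete, here is a rough base-R equivalent. It is a sketch of the idea rather than modelr's actual source, and add_pred_sketch is a hypothetical helper name used only for illustration.

```r
# Same toy data as the house price example
houses <- data.frame(
  size = c(1000, 1500, 2000, 2500, 3000),
  price = c(200000, 300000, 400000, 500000, 600000)
)
house_model <- lm(price ~ size, data = houses)

# Hypothetical helper: call predict() and store the result as a new column
add_pred_sketch <- function(data, model, var = "pred") {
  data[[var]] <- stats::predict(model, data)
  data
}

houses_manual <- add_pred_sketch(houses, house_model)
houses_manual
```

Because the helper takes a data frame and returns a data frame, it can be chained with other data frame operations, which is the main convenience the modelr function offers.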

Key Benefits of add_predictions

  • Simplicity: You don't need to call predict() yourself and then attach the result to your data. add_predictions() does both in one step.
  • Clean Code: Because it takes and returns a data frame, it slots into a pipeline without intermediate variables.
  • Enhanced Analysis: With predictions stored next to the observed values, you can compute residuals, compare actual vs. predicted values, and summarize model performance within a single data frame.
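The residual analysis mentioned in the last point can be sketched as follows. modelr also exports add_residuals(), which adds a resid column (observed minus predicted). The data values below are made up so that the residuals are not all zero, and houses_noisy, noisy_model, and houses_checked are illustrative names.

```r
library(modelr)

# Toy data with some noise (illustrative values, not real prices)
houses_noisy <- data.frame(
  size = c(1000, 1500, 2000, 2500, 3000),
  price = c(210000, 290000, 405000, 480000, 615000)
)
noisy_model <- lm(price ~ size, data = houses_noisy)

# Add predictions, then residuals, as new columns
houses_checked <- add_predictions(houses_noisy, noisy_model)
houses_checked <- add_residuals(houses_checked, noisy_model)

# Root mean squared error computed from the residual column
sqrt(mean(houses_checked$resid^2))
```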

Beyond Basic Usage

add_predictions() works with any model class that has a predict() method, including:

  • Generalized linear models (GLMs): For a logistic regression, predict() returns values on the link (log-odds) scale by default, so pass type = "response" to get predicted probabilities.
  • Generalized additive models (GAMs): Models fitted with gam() work the same way, which makes it easy to inspect and plot their predicted values.

Example with a Generalized Linear Model

Let's say you have a dataset about customer purchases and want to predict the probability of a customer making a purchase based on their age.

library(modelr)
library(dplyr)

# Sample data
customers <- data.frame(
  age = c(25, 30, 35, 40, 45),
  purchase = c(1, 1, 0, 0, 1)
)

# Fit a logistic regression model
purchase_model <- glm(purchase ~ age, data = customers, family = binomial)

# Use add_predictions to get predicted probabilities
# (type = "response" is required; the default is the log-odds scale)
customers_with_predictions <- add_predictions(customers, purchase_model, var = "prob_purchase", type = "response")

# View the results
customers_with_predictions

Analyzing the Results

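You can now compare the predicted probability of purchase (prob_purchase) with the actual outcome (purchase) in the customers_with_predictions data frame. Keep in mind that the column holds probabilities only when the predictions are requested with type = "response"; without it, predict() returns log-odds for a binomial GLM. With probabilities in hand, you can check how often the model's predictions agree with what customers actually did.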
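One way to make that comparison is to turn the probabilities into yes/no predictions and count how often they match the observed outcome. The sketch below assumes a 0.5 cutoff, which is a common default rather than anything the model prescribes; scored and predicted_purchase are illustrative names.

```r
library(modelr)

customers <- data.frame(
  age = c(25, 30, 35, 40, 45),
  purchase = c(1, 1, 0, 0, 1)
)
purchase_model <- glm(purchase ~ age, data = customers, family = binomial)

# type = "response" asks predict() for probabilities instead of log-odds
scored <- add_predictions(customers, purchase_model,
                          var = "prob_purchase", type = "response")

# Classify with an assumed 0.5 cutoff and compare against the observed outcome
scored$predicted_purchase <- as.integer(scored$prob_purchase > 0.5)

# Share of rows where the predicted class matches the actual purchase
mean(scored$predicted_purchase == scored$purchase)
```

With only five rows, the resulting accuracy says little about the model; the point is the workflow, which carries over unchanged to larger data.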

Conclusion

modelr::add_predictions() adds predictions from your R models directly to your data frames. It wraps predict() in a pipe-friendly function, so you can compare actual and predicted values, compute residuals, and examine model performance without leaving the data frame you started with.