How to Calculate Summary Statistics in an R DataFrame

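The summarize() function from the dplyr package can be used to calculate summary statistics (mean, median, minimum, maximum, standard deviation, and variance) for the columns of an R DataFrame.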

Here are the steps to derive the summary statistics for a given DataFrame.

Steps to calculate summary statistics in an R DataFrame

Step 1: Install the dplyr package

To start, install the dplyr package if you haven’t already done so:

install.packages("dplyr")
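If you rerun the same script often, a small guard avoids reinstalling dplyr every time. The sketch below uses base R's requireNamespace() to check whether the package is already available, installs it from the CRAN cloud mirror only when it is missing (the mirror URL is an assumption; any CRAN mirror works), and then attaches it:

```r
# Install dplyr only if it is not already available
if (!requireNamespace("dplyr", quietly = TRUE)) {
  # Assumed mirror; needed when running non-interactively (e.g. via Rscript)
  install.packages("dplyr", repos = "https://cloud.r-project.org")
}

# Attach the package so summarize() and the %>% pipe are available
library(dplyr)
```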

Step 2: Create a DataFrame

Next, create a DataFrame in R as follows:

# Create a DataFrame
df <- data.frame(
  StudentID = 1:5,
  Name = c("Alice", "Bob", "Charlie", "David", "Eve"),
  Score = c(85, 92, 78, 89, 95)
)

# Print the DataFrame
print(df)

The result:

  StudentID    Name Score
1         1   Alice    85
2         2     Bob    92
3         3 Charlie    78
4         4   David    89
5         5     Eve    95
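Functions such as mean(), sd(), and var() expect numeric input, so it can help to confirm the column types before summarizing. A minimal check, assuming the same DataFrame as above, prints the class of every column with sapply() and stops early if Score is not numeric:

```r
# Rebuild the example DataFrame from Step 2
df <- data.frame(
  StudentID = 1:5,
  Name = c("Alice", "Bob", "Charlie", "David", "Eve"),
  Score = c(85, 92, 78, 89, 95)
)

# Show the class of each column
print(sapply(df, class))

# Stop with an error if Score cannot be summarized numerically
stopifnot(is.numeric(df$Score))
```

Note that the class reported for Name depends on your R version: since R 4.0.0, data.frame() keeps strings as character by default rather than converting them to factors.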

Step 3: Calculate summary statistics

Finally, calculate the summary statistics using the summarize() function:

# Load the dplyr package
library(dplyr)

# Create a DataFrame
df <- data.frame(
  StudentID = 1:5,
  Name = c("Alice", "Bob", "Charlie", "David", "Eve"),
  Score = c(85, 92, 78, 89, 95)
)

# Calculate summary statistics for the Score column
summary_stats <- df %>%
  summarize(
    Mean_Score = mean(Score),
    Median_Score = median(Score),
    Min_Score = min(Score),
    Max_Score = max(Score),
    StdDev_Score = sd(Score),     # sample standard deviation (n - 1 denominator)
    Variance_Score = var(Score)   # sample variance (n - 1 denominator)
  )

# View the summary statistics
print(summary_stats)

The result:

  Mean_Score Median_Score Min_Score Max_Score StdDev_Score Variance_Score
1       87.8           89        78        95     6.610598           43.7
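One common pitfall: if the column contains missing values, each of these functions returns NA. The sketch below uses a hypothetical copy of the data (df_na) in which Charlie's score is missing, and passes na.rm = TRUE so the missing value is dropped before each statistic is computed. The extra N_Missing column is an illustrative addition that counts how many values were dropped:

```r
library(dplyr)

# Hypothetical variant of the example data with one missing score
df_na <- data.frame(
  StudentID = 1:5,
  Name = c("Alice", "Bob", "Charlie", "David", "Eve"),
  Score = c(85, 92, NA, 89, 95)
)

# Without na.rm, mean() propagates the missing value and returns NA
print(df_na %>% summarize(Mean_Score = mean(Score)))

# With na.rm = TRUE, the NA is removed before each statistic is computed
summary_stats_na <- df_na %>%
  summarize(
    Mean_Score = mean(Score, na.rm = TRUE),
    Median_Score = median(Score, na.rm = TRUE),
    Min_Score = min(Score, na.rm = TRUE),
    Max_Score = max(Score, na.rm = TRUE),
    StdDev_Score = sd(Score, na.rm = TRUE),
    Variance_Score = var(Score, na.rm = TRUE),
    N_Missing = sum(is.na(Score))  # how many values were excluded
  )

print(summary_stats_na)
```

With Charlie's score removed, the statistics are computed over the remaining four scores (85, 92, 89, 95), so the mean becomes 90.25 and the median becomes 90.5.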
