The summarize() function from the dplyr package can be used to calculate summary statistics for an R DataFrame.
Here are the steps to derive the summary statistics for a given DataFrame.
Steps to calculate summary statistics in an R DataFrame
Step 1: Install the dplyr package
To start, install the dplyr package if you haven’t already done so:
```r
install.packages("dplyr")
```
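If you run the same script repeatedly, you may prefer to install dplyr only when it is missing. Below is a minimal sketch that uses base R's requireNamespace() for the check. The repos URL (the CRAN cloud mirror) is an assumption. It is included so that install.packages() does not fail for lack of a selected mirror in a non-interactive session. You can drop it, or point it at your own mirror.

```r
# Install dplyr only if it cannot already be loaded from the R library.
# The repos argument (assumption) points to the CRAN cloud mirror so that a
# non-interactive session does not fail for lack of a selected mirror.
if (!requireNamespace("dplyr", quietly = TRUE)) {
  install.packages("dplyr", repos = "https://cloud.r-project.org")
}
```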
Step 2: Create a DataFrame
Next, create a DataFrame in R as follows:
```r
# Create a DataFrame
df <- data.frame(
  StudentID = 1:5,
  Name = c("Alice", "Bob", "Charlie", "David", "Eve"),
  Score = c(85, 92, 78, 89, 95)
)

# Print the DataFrame
print(df)
```
The result:
```
  StudentID    Name Score
1         1   Alice    85
2         2     Bob    92
3         3 Charlie    78
4         4   David    89
5         5     Eve    95
```
Step 3: Calculate the summary statistics
Finally, calculate the summary statistics using the summarize() function:
```r
# Load the dplyr package
library(dplyr)

# Create a DataFrame
df <- data.frame(
  StudentID = 1:5,
  Name = c("Alice", "Bob", "Charlie", "David", "Eve"),
  Score = c(85, 92, 78, 89, 95)
)

# Calculate summary statistics for the Score column
summary_stats <- df %>%
  summarize(
    Mean_Score = mean(Score),
    Median_Score = median(Score),
    Min_Score = min(Score),
    Max_Score = max(Score),
    StdDev_Score = sd(Score),
    Variance_Score = var(Score)
  )

# View the summary statistics
print(summary_stats)
```
The result:
```
  Mean_Score Median_Score Min_Score Max_Score StdDev_Score Variance_Score
1       87.8           89        78        95     6.610598           43.7
```
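In R, var() and sd() return the sample variance and the sample standard deviation. Both divide the sum of squared deviations by n − 1, not by n. That is why Variance_Score is 43.7 for these five scores. The short sketch below reproduces that value by hand and, for comparison, also computes the population variance. The variable names (scores, dev_sq, sample_var, population_var) are illustrative only and are not part of the original example.

```r
# Scores from the example DataFrame
scores <- c(85, 92, 78, 89, 95)

n <- length(scores)
dev_sq <- (scores - mean(scores))^2  # squared deviations from the mean (87.8)

sample_var <- sum(dev_sq) / (n - 1)  # what var() computes: 174.8 / 4
population_var <- sum(dev_sq) / n    # dividing by n instead: 174.8 / 5

print(sample_var)        # 43.7
print(sqrt(sample_var))  # same value as sd(scores)
print(population_var)    # 34.96
```

If your analysis calls for the population variance, you would need to compute it along these lines, because var() always uses the n − 1 denominator.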