6  Module 1.3: Basic Operations in R

Now that we know about data types and structures, let’s see how to manipulate them.

6.1 Arithmetic Operations (Review)

# Basic arithmetic
2 + 5
[1] 7
10 - 3
[1] 7
4 * 8
[1] 32
100 / 4
[1] 25
# Order of operations (like standard math)
5 + 2 * 3   # Multiplication first
[1] 11
(5 + 2) * 3 # Parentheses first
[1] 21
# Built-in mathematical functions
sqrt(16)    # Square root
[1] 4
log(10)     # Natural logarithm
[1] 2.302585
log10(100)  # Base-10 logarithm
[1] 2
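
A few more base R operators round out this review. The following is a minimal sketch, not part of the original examples: the variable names (power, quotient, remainder, log_base2, round_trip) are illustrative assumptions, while ^, %/%, %%, log() with a base argument, and exp() are standard base R.

```r
# Exponentiation, integer division, and remainder (modulo)
power     <- 2^3       # 2 raised to the power 3
quotient  <- 17 %/% 5  # how many whole times 5 fits into 17
remainder <- 17 %% 5   # what is left over after that division

# log() accepts an explicit base as a second argument
log_base2 <- log(8, base = 2)

# exp() is the inverse of the natural logarithm
round_trip <- exp(log(10))

print(power)
print(quotient)
print(remainder)
print(log_base2)
print(round_trip)
```

Results of log() and exp() are floating-point numbers, so it is safer to compare them with all.equal() than with ==.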

6.2 Logical Comparisons and Operators

Comparison and logical operators ask TRUE/FALSE questions about our data. They are essential for filtering.

  • Comparison Operators:
    • > : Greater than
    • < : Less than
    • >=: Greater than or equal to
    • <=: Less than or equal to
    • ==: Exactly equal to (TWO equal signs! A single = assigns a value instead of comparing, which is a very common mistake)
    • !=: Not equal to
  • Logical Operators (Combine TRUE/FALSE):
    • & : AND (both sides must be TRUE)
    • | : OR (at least one side must be TRUE)
    • ! : NOT (reverses TRUE to FALSE, FALSE to TRUE)
yield <- 5.2
min_acceptable_yield <- 5.0
variety <- "ICARDA_Gold"

# Comparisons
yield > min_acceptable_yield # Is yield acceptable? TRUE
[1] TRUE
variety == "Local_Check"    # Is it the local check? FALSE
[1] FALSE
variety != "Local_Check"    # Is it NOT the local check? TRUE
[1] TRUE
# On vectors
plot_yields <- c(5.2, 4.5, 6.1, 5.5)
plot_yields > 5.0 # Which plots yielded above 5.0? [TRUE FALSE TRUE TRUE]
[1]  TRUE FALSE  TRUE  TRUE
plot_varieties <- c("ICARDA_Gold", "Local_Check", "ICARDA_RustResist", 
                    "ICARDA_Gold")
# Which plots are ICARDA_Gold? [TRUE FALSE FALSE TRUE]
plot_varieties == "ICARDA_Gold"
[1]  TRUE FALSE FALSE  TRUE
# Combining conditions
# Find plots where yield > 5.0 AND variety is ICARDA_Gold
(plot_yields > 5.0) & (plot_varieties == "ICARDA_Gold") # [TRUE FALSE FALSE TRUE]
[1]  TRUE FALSE FALSE  TRUE
# Find plots where yield > 6.0 OR variety is Local_Check
(plot_yields > 6.0) | (plot_varieties == "Local_Check") # [FALSE TRUE TRUE FALSE]
[1] FALSE  TRUE  TRUE FALSE
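
The NOT operator and the single-= pitfall from the list above can be sketched as follows. This is an illustrative example under stated assumptions: the names not_gold, also_not_gold, x, and comparison_result are hypothetical and not part of the original material.

```r
plot_varieties <- c("ICARDA_Gold", "Local_Check", "ICARDA_RustResist",
                    "ICARDA_Gold")

# NOT: flip every TRUE/FALSE in a comparison
not_gold <- !(plot_varieties == "ICARDA_Gold")
print(not_gold)

# The same question asked directly with !=
also_not_gold <- plot_varieties != "ICARDA_Gold"
print(identical(not_gold, also_not_gold))

# The single-= pitfall: == compares, = assigns
x <- 5
comparison_result <- (x == 3)  # asks a question; x is unchanged
print(comparison_result)
x = 3                          # silently overwrites x with 3
print(x)
```

Because = at the top level is a valid assignment, R raises no error when it is used by mistake; the value is simply overwritten.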

6.3 Vectorization: R’s Superpower

Many R operations are vectorized, meaning they automatically apply to each element of a vector without needing you to write a loop. This makes R code concise and efficient. We’ve already seen this with arithmetic (plot_yields + 0.5) and comparisons (plot_yields > 5.0).
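
To make the contrast concrete, here is a small sketch that adds 0.5 to every yield twice: once with an explicit loop and once with vectorized arithmetic. The names adjusted_loop and adjusted_vec are illustrative assumptions.

```r
plot_yields <- c(5.2, 4.5, 6.1, 5.5)

# Loop version: visit each element one at a time
adjusted_loop <- numeric(length(plot_yields))
for (i in seq_along(plot_yields)) {
  adjusted_loop[i] <- plot_yields[i] + 0.5
}

# Vectorized version: one expression handles every element
adjusted_vec <- plot_yields + 0.5

# Both approaches produce the same values
print(all.equal(adjusted_loop, adjusted_vec))
```

The vectorized form is shorter and is usually the idiomatic choice in R.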

Summary functions such as mean(), sum(), min(), max(), sd() (standard deviation), and length() take a whole vector as input and return a single value:

plot_yields <- c(5.2, 4.5, 6.1, 5.5)

mean(plot_yields)
[1] 5.325
sd(plot_yields)
[1] 0.6652067
sum(plot_yields > 5.0) # How many plots yielded > 5.0? (TRUE=1, FALSE=0)
[1] 3
length(plot_yields) # How many plots?
[1] 4
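
Because TRUE counts as 1 and FALSE as 0, other summary functions also work on logical vectors. The sketch below is an illustration only; the names above_target, n_above, share_above, and positions are assumptions, while mean() and which() are standard base R functions.

```r
plot_yields <- c(5.2, 4.5, 6.1, 5.5)

# Logical vector: which plots yielded more than 5.0?
above_target <- plot_yields > 5.0

# Count of TRUE values
n_above <- sum(above_target)

# Proportion of TRUE values (the mean of 1s and 0s)
share_above <- mean(above_target)

# Positions (indices) of the TRUE values
positions <- which(above_target)

print(n_above)
print(share_above)
print(positions)
```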

6.4 Working with Data Frames (Indexing and Filtering)

Indexing and filtering are crucial for selecting specific data from your tables.

Let’s use the trial_data data frame from the previous section:

trial_data <- data.frame(
  PlotID = c("A101", "A102", "B101", "B102"),
  Variety = factor(c("ICARDA_Gold", "Local_Check", "ICARDA_RustResist", 
                     "ICARDA_Gold")),
  Yield_kg_plot = c(5.2, 4.5, 6.1, 5.5),
  Is_Resistant = c(TRUE, FALSE, TRUE, TRUE)
)
  1. Accessing Columns: Use $ (most common) or [[ ]], for example trial_data$Variety or trial_data[["Variety"]]. Both return the column as a vector (here, a factor).

  2. Accessing Rows/Columns/Cells using [row, column]:

# Get the value in Row 2, Column 3
trial_data[2, 3] # Should be 4.5
[1] 4.5
# Get the entire Row 1 (returns a data frame)
trial_data[1, ]
  PlotID     Variety Yield_kg_plot Is_Resistant
1   A101 ICARDA_Gold           5.2         TRUE
# Get the entire Column 2 (Variety column, returns a vector/factor)
trial_data[, 2]
[1] ICARDA_Gold       Local_Check       ICARDA_RustResist ICARDA_Gold      
Levels: ICARDA_Gold ICARDA_RustResist Local_Check
# Get Columns 1 and 3 (PlotID and Yield)
trial_data[, c(1, 3)] # Use c() for multiple column indices
  PlotID Yield_kg_plot
1   A101           5.2
2   A102           4.5
3   B101           6.1
4   B102           5.5
trial_data[, c("PlotID", "Yield_kg_plot")] # Can also use column names
  PlotID Yield_kg_plot
1   A101           5.2
2   A102           4.5
3   B101           6.1
4   B102           5.5
  3. Filtering Rows Based on Conditions (VERY IMPORTANT): Use a logical condition inside the row part of the square brackets.

# Select rows where Yield_kg_plot is greater than 5.0
high_yield_plots <- trial_data[trial_data$Yield_kg_plot > 5.0, ]
print(high_yield_plots)
  PlotID           Variety Yield_kg_plot Is_Resistant
1   A101       ICARDA_Gold           5.2         TRUE
3   B101 ICARDA_RustResist           6.1         TRUE
4   B102       ICARDA_Gold           5.5         TRUE
# Select rows where Variety is "ICARDA_Gold"
icarda_gold_plots <- trial_data[trial_data$Variety == "ICARDA_Gold", ]
print(icarda_gold_plots)
  PlotID     Variety Yield_kg_plot Is_Resistant
1   A101 ICARDA_Gold           5.2         TRUE
4   B102 ICARDA_Gold           5.5         TRUE
# Select rows where Variety is "ICARDA_Gold" AND yield > 5.0
# (This is the same logical vector we built from plot_yields and plot_varieties in Section 6.2)
condition <- (trial_data$Variety == "ICARDA_Gold") & 
             (trial_data$Yield_kg_plot > 5.0)
print(condition) # Shows [TRUE FALSE FALSE TRUE]
[1]  TRUE FALSE FALSE  TRUE
selected_plots <- trial_data[condition, ]
print(selected_plots)
  PlotID     Variety Yield_kg_plot Is_Resistant
1   A101 ICARDA_Gold           5.2         TRUE
4   B102 ICARDA_Gold           5.5         TRUE
# Select rows where the variety is resistant
resistant_plots <- trial_data[trial_data$Is_Resistant == TRUE, ] 
# Or just trial_data[trial_data$Is_Resistant, ]
print(resistant_plots)
  PlotID           Variety Yield_kg_plot Is_Resistant
1   A101       ICARDA_Gold           5.2         TRUE
3   B101 ICARDA_RustResist           6.1         TRUE
4   B102       ICARDA_Gold           5.5         TRUE
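
The column access and filtering steps above can be combined, as in the following sketch. It is an illustration under stated assumptions: the names yield_dollar, yield_brackets, column_name, yield_by_name, resistant_yields, and mean_resistant_yield are hypothetical and do not come from the original material.

```r
trial_data <- data.frame(
  PlotID = c("A101", "A102", "B101", "B102"),
  Variety = factor(c("ICARDA_Gold", "Local_Check", "ICARDA_RustResist",
                     "ICARDA_Gold")),
  Yield_kg_plot = c(5.2, 4.5, 6.1, 5.5),
  Is_Resistant = c(TRUE, FALSE, TRUE, TRUE)
)

# $ and [[ ]] return the same column
yield_dollar   <- trial_data$Yield_kg_plot
yield_brackets <- trial_data[["Yield_kg_plot"]]
print(identical(yield_dollar, yield_brackets))

# [[ ]] also works when the column name is stored in a variable
column_name   <- "Yield_kg_plot"
yield_by_name <- trial_data[[column_name]]

# Filter rows and pick one column in a single step,
# then summarize the result
resistant_yields <- trial_data[trial_data$Is_Resistant, "Yield_kg_plot"]
mean_resistant_yield <- mean(resistant_yields)
print(mean_resistant_yield)
```

Selecting a single column this way returns a plain vector, which is why mean() can be applied to it directly.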

6.5 The apply() Family

The apply() family of functions lets us apply a function to the rows or columns of a matrix or data frame, or to each element of a list or vector.

# Imagine we have a matrix of plot yield values for different varieties.
# Each row represents a variety and each column represents a trial.
# Note: matrix() fills the values column by column by default.
plot_trials <- matrix(c(5.2, 4.5, 6.1, 5.5, 4, 6.6, 7, 5.1, 5.3), 
                      nrow = 3, ncol = 3)

# Calculate the mean yield for each variety (each row)
# The second argument is the margin: 1 runs the function on rows, 2 on columns
apply(plot_trials, 1, mean)
[1] 5.900000 4.533333 6.000000
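
The same idea extends to columns and to lists. The sketch below is illustrative only: the names trial_means and yields_by_variety, and the values in the list, are assumptions made for the example, while apply(), colMeans(), and sapply() are standard base R functions.

```r
plot_trials <- matrix(c(5.2, 4.5, 6.1, 5.5, 4, 6.6, 7, 5.1, 5.3),
                      nrow = 3, ncol = 3)

# Margin 2: calculate the mean for each trial (each column)
trial_means <- apply(plot_trials, 2, mean)
print(trial_means)

# colMeans() is a built-in shortcut that gives the same result
print(all.equal(trial_means, colMeans(plot_trials)))

# sapply() applies a function to each element of a list
# (hypothetical yields, with a different number of plots per variety)
yields_by_variety <- list(
  ICARDA_Gold = c(5.2, 5.5),
  Local_Check = c(4.5)
)
variety_means <- sapply(yields_by_variety, mean)
print(variety_means)
```

sapply() returns a named vector here, so each mean stays labeled with its variety.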

Exercise: Select the data for the ‘Local_Check’ variety from the trial_data data frame.

trial_data[trial_data$Variety == "Local_Check", ]
  PlotID     Variety Yield_kg_plot Is_Resistant
2   A102 Local_Check           4.5        FALSE