# Load the necessary libraries
library(readr)
library(readxl)
library(dplyr) # for glimpse
# --- Reading a CSV file ---
# Assume you have a file called 'sample_phenotypes.csv' in your 'data' folder
# relative to your project root.
pheno_file_path <- "data/sample_phenotypes.csv"
# Check if file exists before trying to read (good habit)
stopifnot(file.exists(pheno_file_path))
phenotype_data <- read_csv(pheno_file_path)
# Note: Base R has read.csv() - it works but readr::read_csv() is often faster
# and handles data types more consistently (e.g., it never converts strings to
# factors, which read.csv() did by default before R 4.0).
# --- Reading an Excel file ---
# Assume you have 'sample_trial.xlsx' in your 'data' folder
excel_file_path <- "data/sample_trial.xlsx"
# See what sheets are in the workbook
excel_sheets(excel_file_path)
# Read data from a specific sheet (e.g., "YieldData")
yield_data_excel <- read_excel(excel_file_path, sheet = "YieldData")
# Or read by sheet number (first sheet is 1)
yield_data_excel <- read_excel(excel_file_path, sheet = 1)
7 Module 1.4: Reading and Writing Data
So far, we’ve created data inside R. But usually, your breeding data exists in external files, like Excel spreadsheets or CSV files. We need to get this data into R and save our results out of R.
7.1 Common Data File Formats
- CSV (Comma Separated Values, .csv): Plain text file where columns are separated by commas. Very common, easily readable by many programs (including R and Excel). Often the best format for sharing data.
- TSV (Tab Separated Values, .tsv): Similar to CSV, but uses tabs to separate columns.
- Excel Files (.xls, .xlsx): Native Microsoft Excel format. Can contain multiple sheets, formatting, formulas. Requires specific R packages to read/write.
- Text Files (.txt): Additionally, data can be saved as a simple text file. This file type can support comma or tab separated values. You would simply need to specify your separator when reading the file.
- RDS Files (.rds): A specific binary file format used in R for saving and loading single R objects.
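As a small illustration of specifying the separator for a plain text file, here is a minimal sketch that assumes the readr package is installed. The file contents, the semicolon separator, and the names txt_path and plots are made up for the example; the file is written to a temporary location so the snippet runs on its own.

```r
library(readr)

# Hypothetical example data, written to a temporary .txt file
txt_path <- tempfile(fileext = ".txt")
writeLines(
  c("plot_id;variety;yield",
    "101;A;4.2",
    "102;B;3.9"),
  txt_path
)

# Tell read_delim() which character separates the columns
plots <- read_delim(txt_path, delim = ";")
plots
```

For a tab-separated file the same call would use delim = "\t".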
7.2 Paths, Working Directory, and RStudio Projects (Best Practice!)
R needs to know where to find your files.
- Working Directory: The default folder location R looks in. You can see it with getwd() and set it with setwd("path/to/folder"), but setting it manually is usually bad practice because it makes your code non-portable.
- Absolute Path: The full path from the root of your computer (e.g., "C:/Users/YourName/Documents/BreedingData/trial1.csv"). Avoid this! It breaks if you move folders or share your code.
- Relative Path & RStudio Projects (RECOMMENDED):
  - Organize your work using an RStudio Project. Create one via File -> New Project -> Existing Directory... and select your main course folder (course_project_baku).
  - When you open the .Rproj file, RStudio automatically sets the working directory to that project folder.
  - Keep your data files inside the project folder, ideally in subdirectories like data/raw (original data) or data/example (cleaned data for examples).
  - Refer to files using relative paths starting from the project root, like "data/example/phenotypes.csv". This makes your analysis reproducible and easy to share!
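The difference between the working directory and a relative path can be sketched with base R alone. The path below reuses the example path from the text; whether the file actually exists depends on your own project, so the snippet only checks for it rather than reading it.

```r
# Show the current working directory
# (the project folder when the .Rproj file was opened in RStudio)
getwd()

# Build a relative path from its parts; file.path() inserts the "/" separators
rel_path <- file.path("data", "example", "phenotypes.csv")
rel_path

# TRUE only if the file exists relative to the working directory
file.exists(rel_path)

# The absolute path this relative path currently resolves to
normalizePath(rel_path, mustWork = FALSE)
```

Because the code only contains the relative path, it keeps working when the whole project folder is moved or shared.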
7.3 Reading Data into R
We’ll use functions from the readr (for CSV/TSV) and readxl (for Excel) packages. Make sure they are installed (see Module 1.1).
- Always inspect your data after loading! Use head(), str(), glimpse(), summary(). Did R read the column names correctly? Are the data types what you expected (numeric, character, etc.)?
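A minimal sketch of such an inspection, assuming readr and dplyr are installed. The data, the column names, and the object names csv_path and trial are invented for illustration, and the file is written to a temporary location so the example is self-contained.

```r
library(readr)
library(dplyr) # for glimpse()

# Hypothetical trial data with one missing yield value
csv_path <- tempfile(fileext = ".csv")
writeLines(
  c("plot_id,variety,yield_t_ha",
    "101,A,4.2",
    "102,B,NA",
    "103,A,3.9"),
  csv_path
)

trial <- read_csv(csv_path)

head(trial)     # first rows: are the column names right?
str(trial)      # structure: type of every column
glimpse(trial)  # compact overview, one column per line
summary(trial)  # ranges and NA counts for numeric columns

# A quick programmatic check of the column types
sapply(trial, class)
```

If a column that should be numeric shows up as character, that usually points to stray text or an unexpected missing-value code in the file.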
7.4 Writing Data out of R
After cleaning data or performing analysis, you’ll want to save results.
# Load the necessary libraries
library(readr)
library(readxl)
library(dplyr) # for glimpse
# --- Writing a CSV file ---
# Assume you have a data frame called "df" in your environment
# row.names = FALSE avoids writing an extra column of row numbers
write.csv(df, "data/sample_phenotypes.csv", row.names = FALSE)
# --- Writing a TSV file ---
# Assume you have a data frame called "df" in your environment
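# sep = "\t" is required; write.table() separates columns with spaces by default
write.table(df, "data/sample_phenotypes.tsv", sep = "\t", row.names = FALSE)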
# --- Writing an RDS file ---
# Assume you have a data frame called "df" in your environment
saveRDS(df, "data/sample_phenotypes.rds")
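To see what each format keeps, it can help to write a small data frame and read it back. The sketch below uses only base R; the data frame contents and the temporary file paths are hypothetical and stand in for your own results and your project's data folder.

```r
# Hypothetical results table with a factor column
df <- data.frame(
  plot_id = c(101, 102, 103),
  variety = factor(c("A", "B", "A")),
  yield   = c(4.2, 3.9, 4.5)
)

# Temporary files keep the example from touching real project data
csv_path <- tempfile(fileext = ".csv")
rds_path <- tempfile(fileext = ".rds")

write.csv(df, csv_path, row.names = FALSE)
saveRDS(df, rds_path)

# Read both files back in
from_csv <- read.csv(csv_path)
from_rds <- readRDS(rds_path)

# CSV is plain text, so R has to guess the column types again on reading;
# RDS stores the R object itself, so the factor column comes back as a factor
class(from_csv$variety)
class(from_rds$variety)
```

CSV is the safer choice for sharing with people who do not use R, while RDS is convenient for passing intermediate objects between your own R scripts.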
Exercise: If you have a simple Excel file with some breeding data (e.g., Plot ID, Variety, Yield), try reading it into R using read_excel(). Inspect the loaded data frame using glimpse().
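If you do not have an Excel file at hand, the same steps can be practiced on an example workbook that ships with readxl. This is only a sketch, assuming readxl and dplyr are installed; readxl_example() returns the path to a bundled file, and the object names example_path, sheets, and cars_data are illustrative.

```r
library(readxl)
library(dplyr) # for glimpse()

# Path to an example workbook bundled with the readxl package
example_path <- readxl_example("datasets.xlsx")

# List the sheets, then read one of them by name
sheets <- excel_sheets(example_path)
sheets

cars_data <- read_excel(example_path, sheet = "mtcars")

# Inspect the loaded data frame
glimpse(cars_data)
```

With your own file, replace example_path with a relative path inside your project and the sheet name with one reported by excel_sheets().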