R Bioinformatics Cookbook - Second Edition

By: Dan MacLean

Overview of this book

The updated second edition of R Bioinformatics Cookbook takes a recipe-based approach to show you how to conduct practical research and analysis in computational biology with R. You’ll learn how to create a useful and modular R working environment, along with loading, cleaning, and analyzing data using the most up-to-date Bioconductor, ggplot2, and tidyverse tools. This book will walk you through the Bioconductor tools necessary for you to understand and carry out protocols in RNA-seq and ChIP-seq, phylogenetics, genomics, gene search, gene annotation, statistical analysis, and sequence analysis. As you advance, you'll find out how to use Quarto to create data-rich reports, presentations, and websites, as well as get a clear understanding of how machine learning techniques can be applied in the bioinformatics domain. The concluding chapters will help you develop proficiency in key skills, such as gene annotation analysis and functional programming in purrr and base R. Finally, you'll discover how to use the latest AI tools, including ChatGPT, to generate, edit, and understand R code and draft workflows for complex analyses. By the end of this book, you'll have gained a solid understanding of the skills and techniques needed to become a bioinformatics specialist and efficiently work with large and complex bioinformatics datasets.

Tidying a long format table into a tidy table with tidyr

In this recipe, we look at the complementary operation to that of the Tidying a wide format table into a tidy table with tidyr recipe. We’ll take a long table and split one of its columns out to make multiple new columns. Initially, this might seem like we’re now violating our tidy data frame requirement, but we do occasionally come across data frames that have more than one variable squeezed into a single column. As in the previous recipe, tidyr has a specification-based function to allow us to correct our data frame.

Getting ready

We’ll use the tidyr package and the treatments data frame in the rbioinfcookbook package. This data frame has four columns, one of which, measurement, holds two variable names that need splitting into columns of their own.

How to do it…

In stark contrast to the Tidying a wide format table into a tidy table with tidyr recipe, this expression is extremely terse; we can tidy the long table very easily:

library(rbioinfcookbook)
library(tidyr)
treatments |> 
  pivot_wider(
    names_from = measurement,
    values_from = value
  )

This is so simple because all the data we need is already in the data frame.

How it works…

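In this very simple-looking recipe, the specification is gloriously clear: simply take the measurement column and create new column names from its values, moving each value into the matching new column. The names_from argument specifies the column whose values become the new column names, and values_from specifies the column whose values fill those new columns. Every other column is kept as an identifier, so rows that share the same identifiers collapse into a single row.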
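To see the reshaping without the rbioinfcookbook data, here is a minimal sketch on a small made-up data frame. The toy_long object and its sample and treatment columns are assumptions for illustration only; they are not the actual contents of treatments. Only the measurement and value column names follow the recipe.

```r
library(tidyr)

# Hypothetical stand-in for the treatments data frame; the sample and
# treatment columns and all values are invented for illustration.
toy_long <- data.frame(
  sample      = c("s1", "s1", "s2", "s2"),
  treatment   = c("control", "control", "drug", "drug"),
  measurement = c("height", "weight", "height", "weight"),
  value       = c(10.2, 3.1, 12.5, 3.8)
)

# Each distinct value in measurement becomes a column name, and the
# matching entries from value fill those columns.
toy_wide <- toy_long |>
  pivot_wider(
    names_from = measurement,
    values_from = value
  )

# Four long rows collapse to two rows, one per sample/treatment pair,
# with columns sample, treatment, height, and weight.
toy_wide
```

The columns not named in names_from or values_from (here sample and treatment) act as the row identifiers, which is why two long rows per sample fold into one wide row.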

There’s more…

It is quite possible to incorporate values from more than one column at a time; just pass a vector of columns to the names_from argument, and you can format the computed column names in the output with names_glue.
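As a rough sketch of that idea, the example below builds column names from two columns at once. The toy_long2 data frame, its timepoint column, and the "_at_" naming pattern are all hypothetical and chosen only to illustrate the arguments; names_glue takes a glue-style template that refers to the names_from columns.

```r
library(tidyr)

# Hypothetical data with two columns that together define the new names.
toy_long2 <- data.frame(
  sample      = c("s1", "s1", "s1", "s1"),
  measurement = c("height", "height", "weight", "weight"),
  timepoint   = c("t1", "t2", "t1", "t2"),
  value       = c(10.2, 11.0, 3.1, 3.4)
)

# Pass both columns to names_from and describe the output names
# with a glue template in names_glue.
toy_wide2 <- toy_long2 |>
  pivot_wider(
    names_from = c(measurement, timepoint),
    values_from = value,
    names_glue = "{measurement}_at_{timepoint}"
  )

# Resulting columns: sample, height_at_t1, height_at_t2,
# weight_at_t1, weight_at_t2
names(toy_wide2)
```

Without names_glue, the parts would be joined with the default separator given by names_sep; the template is useful when you want a more readable or differently ordered name.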