R Bioinformatics Cookbook - Second Edition

By: Dan MacLean

Overview of this book

The updated second edition of R Bioinformatics Cookbook takes a recipe-based approach to show you how to conduct practical research and analysis in computational biology with R. You’ll learn how to create a useful and modular R working environment, along with loading, cleaning, and analyzing data using the most up-to-date Bioconductor, ggplot2, and tidyverse tools. This book will walk you through the Bioconductor tools necessary for you to understand and carry out protocols in RNA-seq and ChIP-seq, phylogenetics, genomics, gene search, gene annotation, statistical analysis, and sequence analysis. As you advance, you'll find out how to use Quarto to create data-rich reports, presentations, and websites, as well as get a clear understanding of how machine learning techniques can be applied in the bioinformatics domain. The concluding chapters will help you develop proficiency in key skills, such as gene annotation analysis and functional programming in purrr and base R. Finally, you'll discover how to use the latest AI tools, including ChatGPT, to generate, edit, and understand R code and draft workflows for complex analyses. By the end of this book, you'll have gained a solid understanding of the skills and techniques needed to become a bioinformatics specialist and efficiently work with large and complex bioinformatics datasets.

Using datapasta to create R objects from cut-and-paste data

Being able to paste data into source code documents is useful for all sorts of reasons, not least because it allows for a reproducible example, also known as a reprex: a minimal, self-contained example that demonstrates a problem or a concept. By including data in the source code, others can run the code and see the results for themselves, making it easier to understand and replicate the results.

The R datapasta package makes it easy to paste data into R source code documents. It provides a set of functions for converting data to and from R definitions and is extremely useful when creating static data objects in code examples, in tests, or when sharing code. In this recipe, you will learn how to use datapasta to bring external data into your source code documents without having to type them in longhand.

Getting ready

We will use the datapasta package, though installing it is non-standard. Use this command:

renv::install("datapasta", repos = c(mm = "https://milesmcbain.r-universe.dev", getOption("repos")))

This should install the package using renv and make us ready to go. Remember to install renv the usual way if you don’t already have it.
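
Before moving on, it can help to confirm that both packages are actually available. The following is a minimal sketch using only base R; the variable names are ours, and the commented-out install.packages() line assumes that "the usual way" means installing renv from CRAN:

```r
# Packages this recipe relies on
needed <- c("renv", "datapasta")

# requireNamespace() returns TRUE/FALSE without attaching the package
is_installed <- vapply(needed, requireNamespace, logical(1), quietly = TRUE)
print(is_installed)

# Report anything that still has to be installed
missing_pkgs <- needed[!is_installed]
if (length(missing_pkgs) > 0) {
  message("Still to install: ", paste(missing_pkgs, collapse = ", "))
}

# Assumption: renv comes from CRAN in the usual way
# install.packages("renv")
```

The check only reports what is missing; it does not install anything, so it is safe to rerun at any point.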

How to do it…

The datapasta tool is implemented as an add-in for RStudio, so we begin by setting that up:

  1. In RStudio, open Tools | Addins | Browse Addins and then click Keyboard Shortcuts. Here, you get to choose which key combination you want to use for pasting.
  2. Click the middle column next to the operations, as shown in the following screenshot, and press the keys you want to use. The combination in Figure 2.1 is a good choice:

Figure 2.1 – Selecting key shortcuts

  3. Get a web table. For example, go to this page on Wikipedia: https://en.wikipedia.org/wiki/Tab-separated_values and copy the whole text-based example table using the browser’s right-click Copy feature. This should put the text table in your copy/paste buffer. It should look like what’s shown in Figure 2.2:

Figure 2.2 – Web data on Wikipedia

  4. Paste the table that is now in your copy/paste buffer into an R source document. Place the typing cursor at a suitable place in the R source document you’re working in and use the key combination to paste in the table.

So, with the preceding setup, we should be able to quickly take data from varied sources and coerce them into R objects for analysis.

How it works…

The datapasta package reads whatever is in the copy/paste buffer and writes it out as R code. The first two steps set up our preferred paste keys for later use; the third step is simple, and we just find some data out in the world that we would like in our R source and copy it; and in the fourth step, we select the place to put the data definition and paste it in. Our example goes from a table in a web page to this definition for an R object:

data.frame(
  stringsAsFactors = FALSE,
      Sepal.length = c(5.1, 4.9, 4.7, 4.6, 5),
       Sepal.width = c(3.5, 3, 3.2, 3.1, 3.6),
      Petal.length = c(1.4, 1.4, 1.3, 1.5, 1.4),
       Petal.width = c(0.2, 0.2, 0.2, 0.2, 0.2),
           Species = c("I. setosa","I. setosa",
                       "I. setosa","I. setosa","I. setosa")
)

And that powerful little operation is how we can convert web data to source code very easily.
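
To see why the pasted definition is a faithful copy of the table, we can compare it with the result of parsing the same tab-separated text directly. This is a minimal sketch in base R, not a description of how datapasta works internally; the tsv_lines vector is a hand-typed stand-in for the copied table (an assumption, not the literal Wikipedia text), and it uses the column names from the definition above:

```r
# Hand-typed stand-in for the tab-separated text in the copy/paste buffer
tsv_lines <- c(
  "Sepal.length\tSepal.width\tPetal.length\tPetal.width\tSpecies",
  "5.1\t3.5\t1.4\t0.2\tI. setosa",
  "4.9\t3.0\t1.4\t0.2\tI. setosa",
  "4.7\t3.2\t1.3\t0.2\tI. setosa",
  "4.6\t3.1\t1.5\t0.2\tI. setosa",
  "5.0\t3.6\t1.4\t0.2\tI. setosa"
)

# Parse the text as a tab-separated table
parsed <- read.delim(text = tsv_lines, sep = "\t", stringsAsFactors = FALSE)

# The definition that was pasted into the source document
pasted <- data.frame(
  stringsAsFactors = FALSE,
  Sepal.length = c(5.1, 4.9, 4.7, 4.6, 5),
  Sepal.width = c(3.5, 3, 3.2, 3.1, 3.6),
  Petal.length = c(1.4, 1.4, 1.3, 1.5, 1.4),
  Petal.width = c(0.2, 0.2, 0.2, 0.2, 0.2),
  Species = c("I. setosa", "I. setosa", "I. setosa", "I. setosa", "I. setosa")
)

# Both routes should describe the same data
print(all.equal(parsed, pasted))
```

The difference is that the pasted definition carries the data inside the script itself, so nobody has to have the original text or file to run it.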

There’s more…

It’s possible to go the other way around, from an R object to a definition of that object, using the dpasta() function, which can coerce data frames, tibbles, and vectors into definitions. This is really useful for reproducible examples. The following example turns the built-in mtcars data frame into code that builds the object, so that we don’t have to share a file and everything is in one source document:

library(datapasta)

mtcars |>
  dpasta()

This shows how the datapasta package is a super-useful tool for turning objects from the web and within R into code.
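
When datapasta is not available, base R offers a related route from an object to its definition through dput() and deparse(). These are different functions from dpasta() and produce a less readable structure() call, so treat the following as a hedged sketch of the same idea rather than a replacement; the choice of columns and rows is ours:

```r
# A small slice of mtcars keeps the generated definition short
small <- head(mtcars[, c("mpg", "cyl", "wt")], 3)

# deparse() returns the definition as text; dput(small) would print it directly
definition <- paste(deparse(small), collapse = "\n")
cat(definition, "\n")

# Evaluating the text rebuilds the object, which is what a reprex reader would do
rebuilt <- eval(parse(text = definition))
print(all.equal(rebuilt, small))
```

The round trip at the end is a quick way to confirm that a definition shared in a reproducible example really does recreate the original data.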