R Bioinformatics Cookbook - Second Edition

By: Dan MacLean

Overview of this book

The updated second edition of R Bioinformatics Cookbook takes a recipe-based approach to show you how to conduct practical research and analysis in computational biology with R. You’ll learn how to create a useful and modular R working environment, along with loading, cleaning, and analyzing data using the most up-to-date Bioconductor, ggplot2, and tidyverse tools. This book will walk you through the Bioconductor tools necessary for you to understand and carry out protocols in RNA-seq and ChIP-seq, phylogenetics, genomics, gene search, gene annotation, statistical analysis, and sequence analysis. As you advance, you'll find out how to use Quarto to create data-rich reports, presentations, and websites, as well as get a clear understanding of how machine learning techniques can be applied in the bioinformatics domain. The concluding chapters will help you develop proficiency in key skills, such as gene annotation analysis and functional programming in purrr and base R. Finally, you'll discover how to use the latest AI tools, including ChatGPT, to generate, edit, and understand R code and draft workflows for complex analyses. By the end of this book, you'll have gained a solid understanding of the skills and techniques needed to become a bioinformatics specialist and efficiently work with large and complex bioinformatics datasets.
Table of Contents (16 chapters)

Finding transmembrane domains with tmhmm and pureseqTM

Transmembrane domains are the parts of a protein that pass through the lipid bilayer of a cell membrane. They are typically composed of hydrophobic amino acids, which allow the protein to interact with the nonpolar interior of the membrane. Transmembrane domains play an important role in many cellular processes, including cell signaling, molecular transport, and cell adhesion. One important application of bioinformatics is identifying transmembrane domains from amino acid sequences. Several computational methods are used for this. Hydrophobicity analysis identifies regions of a protein sequence that are strongly hydrophobic and are therefore likely to be transmembrane domains. Hidden Markov models can be trained on a set of known transmembrane proteins to recognize such domains. We can also use machine learning algorithms...
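The hydrophobicity analysis mentioned above can be illustrated with a minimal base R sketch. It is not the tmhmm or pureseqTM approach this recipe covers; it is a simple sliding-window average over the Kyte-Doolittle hydropathy scale. The function names (`hydropathy_profile`, `candidate_tm_regions`), the toy sequence, the 19-residue window, and the 1.6 cutoff are all assumptions chosen for illustration, and a real analysis would tune or replace them.

```r
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
kd_scale <- c(
  A = 1.8,  R = -4.5, N = -3.5, D = -3.5, C = 2.5,
  Q = -3.5, E = -3.5, G = -0.4, H = -3.2, I = 4.5,
  L = 3.8,  K = -3.9, M = 1.9,  F = 2.8,  P = -1.6,
  S = -0.8, T = -0.7, W = -0.9, Y = -1.3, V = 4.2
)

# Mean hydropathy of every window of `window` consecutive residues.
# (Hypothetical helper, not part of any package.)
hydropathy_profile <- function(protein_seq, window = 19) {
  residues <- strsplit(toupper(protein_seq), "")[[1]]
  scores <- unname(kd_scale[residues])  # NA for non-standard residues
  n_windows <- length(scores) - window + 1
  if (n_windows < 1) return(numeric(0))
  vapply(
    seq_len(n_windows),
    function(i) mean(scores[i:(i + window - 1)]),
    numeric(1)
  )
}

# Residue spans covered by consecutive windows whose mean hydropathy
# meets the threshold. The window size and threshold are assumed defaults.
candidate_tm_regions <- function(protein_seq, window = 19, threshold = 1.6) {
  profile <- hydropathy_profile(protein_seq, window)
  above <- !is.na(profile) & profile >= threshold
  runs <- rle(above)
  ends <- cumsum(runs$lengths)
  starts <- ends - runs$lengths + 1
  data.frame(
    start = starts[runs$values],
    end = ends[runs$values] + window - 1
  )
}

# Toy sequence (made up): a 20-residue leucine stretch flanked by lysines.
toy_seq <- paste0(strrep("K", 20), strrep("L", 20), strrep("K", 20))
regions <- candidate_tm_regions(toy_seq)
print(regions)
```

Because each reported span covers every residue in the qualifying windows, it tends to extend beyond the hydrophobic stretch itself. This smearing is one reason dedicated predictors are generally preferred over a raw hydropathy cutoff.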