Index
A
- active data
  - and non-active data, swapping / Swapping active and nonactive data
- algorithm design affects
  - time complexity / Algorithm design affects time and space complexity
  - space complexity / Algorithm design affects time and space complexity
- Amazon Machine Images (AMIs)
  - about / Installing gputools
- Amazon Web Services (AWS)
  - about / Installing gputools
  - Hadoop, setting on / Setting up Hadoop on Amazon Web Services
- Amdahl's law
- array databases
B
- batch processing
- big.matrix object
  - about / The bigmemory package
- bigmemory package
  - using / The bigmemory package
- bit vectors
  - using / Bit vectors
- built-in functions
  - using / Use of built-in functions
C
- cfunction() function, arguments
  - sig=signature(x="numeric", n="integer") / Including compiled code inline
  - body / Including compiled code inline
  - language="C" / Including compiled code inline
- columnar databases
- columnar storage
- compiled code
  - considerations / Considerations for using compiled code
  - R APIs / R APIs
  - R data types, versus native data types / R data types versus native data types
  - R objects, creating / Creating R objects and garbage collection
  - garbage collection / Creating R objects and garbage collection
  - memory, allocating for non-R objects / Allocating memory for non-R objects
- compiled code inline
  - including / Including compiled code inline
- compiled language / R is interpreted on the fly
- compiled languages
  - using / Using compiled languages in R
  - prerequisites / Prerequisites
  - compiled code inline, including / Including compiled code inline
  - external compiled code, calling / Calling external compiled code
  - compiled code, considerations / Considerations for using compiled code
- compiler package
  - about / Compiling functions
- computing performance
  - CPU / Three constraints on computing performance – CPU, RAM, and disk I/O
  - RAM / Three constraints on computing performance – CPU, RAM, and disk I/O
  - disk I/O / Three constraints on computing performance – CPU, RAM, and disk I/O
  - bottlenecks / Three constraints on computing performance – CPU, RAM, and disk I/O
- copy-on-modification model
- CPU utilization
- CUDA-enabled GPU card
  - URL / Installing gputools
- CUDA toolkit
  - URL, for downloading / Installing gputools
D
- data
  - processing, in chunks / Using memory-mapped files and processing data in chunks
  - preprocessing in relational database, SQL used / Preprocessing data in a relational database using SQL
  - uploading, into HDFS / Uploading data to HDFS
- database
  - statistical algorithm, executing / Running statistical and machine learning algorithms in a database
  - machine learning algorithm, executing / Running statistical and machine learning algorithms in a database
- datadr package
  - about / Other Hadoop packages for R
- data extraction in R
  - versus data processing in database / Extracting data into R versus processing data in a database
- data parallel algorithms
  - implementing / Implementing data parallel algorithms
- data parallelism
  - versus task parallelism / Data parallelism versus task parallelism
  - about / Data parallelism versus task parallelism
  - examples / Data parallelism versus task parallelism
- disk I/O
- distributed memory parallelism
  - about / Shared memory versus distributed memory parallelism
  - versus shared memory parallelism / Shared memory versus distributed memory parallelism
- dplyr package
  - about / Using dplyr
  - used, for converting R expressions / Using dplyr
- dynamically typed language
  - about / Using compiled languages in R
E
- elapsed time
- Elastic MapReduce (EMR)
  - about / Setting up Hadoop on Amazon Web Services
  - URL, for prices / Setting up Hadoop on Amazon Web Services
- execution time
  - about / Measuring total execution time
  - measuring / Measuring total execution time
  - measuring, with system.time() / Measuring execution time with system.time()
  - user time / Measuring execution time with system.time()
  - system time / Measuring execution time with system.time()
  - elapsed time / Measuring execution time with system.time()
  - time measurements, repeating with rbenchmark / Repeating time measurements with rbenchmark
  - distribution, measuring with microbenchmark / Measuring distribution of execution time with microbenchmark
  - profiling / Profiling the execution time
  - function, profiling with Rprof() / Profiling a function with Rprof()
  - profiling results / The profiling results
- external compiled code
  - calling / Calling external compiled code
F
- fast alternative packages, in CRAN / Seeking fast alternative packages in CRAN
- ffbase package
  - about / The ff package
  - functions / The ff package
- ffdf tool
  - about / Uploading data to HDFS
- ff package
  - using / The ff package
- ff package, data types
  - Boolean / The ff package
  - Logical / The ff package
  - Quad / The ff package
  - Nibble / The ff package
  - Byte / The ff package
  - Ubyte / The ff package
  - Short / The ff package
  - Ushort / The ff package
  - Integer / The ff package
  - Single / The ff package
  - Double / The ff package
  - Complex / The ff package
  - Raw / The ff package
  - Factor / The ff package
  - Ordered / The ff package
  - POSIXct / The ff package
  - Date / The ff package
- Field Programmable Gate Arrays (FPGAs)
- forked clusters
G
- garbage collection
- garbage collector
- gmatrix
  - about / R and GPUs
- Google Books Ngrams data
  - URL / Uploading data to HDFS
- GPU
  - general purpose computing / General purpose computing on GPUs
  - with R / R and GPUs
  - gputools, installing / Installing gputools
  - performance affecting factors / Fast statistical modeling in R with gputools
- gputools
  - about / R and GPUs
  - installing / Installing gputools
  - URL, for installation / Installing gputools
  - used, for statistical modeling / Fast statistical modeling in R with gputools
H
- Hadoop
  - about / Understanding Hadoop
  - URL / Understanding Hadoop
  - setting, on Amazon Web Services (AWS) / Setting up Hadoop on Amazon Web Services
  - used, for processing large datasets / Processing large datasets in batches using Hadoop
  - data, uploading into HDFS / Uploading data to HDFS
  - HDFS data, analyzing with RHadoop / Analyzing HDFS data with RHadoop
  - R packages / Other Hadoop packages for R
- hash tables
  - using, for frequent lookups / Use of hash tables for frequent lookups on large data
- HDFS
  - about / Understanding Hadoop
  - data, uploading into / Uploading data to HDFS
I
- inline package
  - using / Including compiled code inline
- installation, gputools
  - about / Installing gputools
- installation, MADlib
- installation, Rtools
  - about / Prerequisites
- installation, SciDB / Using array databases for maximum scientific-computing performance
- installation, Xcode Command Line Tools / Prerequisites
- interfaces, external compiled code
  - .C() / Calling external compiled code
  - .Fortran() / Calling external compiled code
  - .Call() / Calling external compiled code
  - .External() / Calling external compiled code
- intermediate data
J
- just-in-time (JIT) compilation
K
- key measures, resource utilization / Monitoring memory utilization, CPU utilization, and disk I/O using OS tools
L
- Linux AMIs
  - URL / Installing gputools
M
- machine learning algorithm
- MADlib
  - URL / Running statistical and machine learning algorithms in a database
  - URL, for building / Running statistical and machine learning algorithms in a database
  - URL, for installation guide / Running statistical and machine learning algorithms in a database
  - installing / Running statistical and machine learning algorithms in a database
- Map
  - about / Understanding Hadoop
- mapper
  - about / Understanding Hadoop
- MapReduce
  - about / Understanding Hadoop
- massively parallel processing (MPP)
- Matrix package
  - about / Sparse matrices
- measures, performance problems troubleshooting / Monitoring memory utilization, CPU utilization, and disk I/O using OS tools
- memory
  - preallocating / Preallocating memory
  - allocating, for non-R objects / Allocating memory for non-R objects
- memory-efficient data structures
  - using / Using memory-efficient data structures
  - smaller data types, using / Smaller data types
  - sparse matrices, using / Sparse matrices, Symmetric matrices
  - symmetric matrices, using / Symmetric matrices
  - bit vectors, using / Bit vectors
- memory-mapped files
  - using / Using memory-mapped files and processing data in chunks
  - about / Using memory-mapped files and processing data in chunks
  - bigmemory package, using / The bigmemory package
  - ff package, using / The ff package
- memory utilization
- MonetDB
  - about / Using columnar databases for improved performance
  - URL, for downloading / Using columnar databases for improved performance
O
- objects
- OpenCL
  - about / R and GPUs
P
- parallel computing
  - performance, optimizing / Optimizing parallel performance
- pass by reference
- pass by value
- performance bottlenecks
  - identifying / Identifying and resolving bottlenecks
  - resolving / Identifying and resolving bottlenecks
- PivotalR package
  - about / Using PivotalR
  - used, for converting R expressions / Using PivotalR
- plyrmr package
  - about / Other Hadoop packages for R
- PostgreSQL
  - setting up / Preprocessing data in a relational database using SQL
  - URL, for downloading / Preprocessing data in a relational database using SQL
- Principal Component Analysis (PCA) / Seeking fast alternative packages in CRAN
R
- R
  - running, single-threaded on CPU / R is single-threaded
  - data, loading into memory / R requires all data to be loaded into memory
  - GPU / R and GPUs
- RAM optimization
  - objects, reusing / Reusing objects without taking up more memory
  - intermediate data, removing / Removing intermediate data when it is no longer needed
  - values, calculating on fly / Calculating values on the fly instead of storing them persistently
  - active and non-active data, swapping / Swapping active and nonactive data
- R APIs
- ravro package
  - about / Other Hadoop packages for R
- R code
  - executing, on fly / R is interpreted on the fly
  - compiling, before execution / Compiling R code before execution
  - functions, compiling / Compiling functions
  - just-in-time (JIT) compilation / Just-in-time (JIT) compilation of R code
- Rcpp package
- RCUDA
  - about / R and GPUs
- R data types
  - versus native data types / R data types versus native data types
- Reduce
  - about / Understanding Hadoop
- reducers
  - about / Understanding Hadoop
- relational database
  - data preprocessing, SQL used / Preprocessing data in a relational database using SQL
- R expressions
  - converting, into SQL / Converting R expressions to SQL
  - converting, dplyr package used / Using dplyr
  - converting, PivotalR package used / Using PivotalR
- RHadoop
  - about / Processing large datasets in batches using Hadoop
  - URL / Processing large datasets in batches using Hadoop
  - rhdfs package / Processing large datasets in batches using Hadoop
  - rmr2 package / Processing large datasets in batches using Hadoop
  - used, for analyzing HDFS data / Analyzing HDFS data with RHadoop
  - plyrmr package / Other Hadoop packages for R
  - rhbase package / Other Hadoop packages for R
  - ravro package / Other Hadoop packages for R
  - RHIPE package / Other Hadoop packages for R
  - datadr package / Other Hadoop packages for R
  - Trelliscope package / Other Hadoop packages for R
  - Segue package / Other Hadoop packages for R
- rhbase package
  - about / Other Hadoop packages for R
- rhdfs package
  - about / Processing large datasets in batches using Hadoop
  - URL, for installation / Processing large datasets in batches using Hadoop
- RHIPE package
  - URL / Other Hadoop packages for R
  - about / Other Hadoop packages for R
- rmr2 package
  - about / Processing large datasets in batches using Hadoop
  - URL, for installation / Processing large datasets in batches using Hadoop
- R objects
  - creating / Creating R objects and garbage collection
- R packages, for GPU
  - gputools / R and GPUs
  - gmatrix / R and GPUs
  - RCUDA / R and GPUs
  - OpenCL / R and GPUs
- Rprof()
  - used, for profiling function / Profiling a function with Rprof()
  - about / The profiling results
- Rtools
  - URL, for downloading / Prerequisites
  - installing / Prerequisites
S
- SciDB
- Segue package
  - URL / Other Hadoop packages for R
  - about / Other Hadoop packages for R
- SEXP pointers
  - about / Calling external compiled code
- shared memory parallelism
  - versus distributed memory parallelism / Shared memory versus distributed memory parallelism
  - about / Shared memory versus distributed memory parallelism
- shim
  - URL, for installing / Using array databases for maximum scientific-computing performance
- simpler data structures
  - using / Use of simpler data structures
- smaller data types
  - using / Smaller data types
- socket-based cluster
- space complexity / Algorithm design affects time and space complexity
- sparse matrices
  - using / Sparse matrices
- SQL
  - used, for data preprocessing in relational database / Preprocessing data in a relational database using SQL
  - R expressions, converting / Converting R expressions to SQL
- statically typed language
  - about / Using compiled languages in R
- statistical algorithm
- statistical modeling
  - with gputools / Fast statistical modeling in R with gputools
- symmetric matrices
  - using / Symmetric matrices
- system-wide resource utilization measure
- system.time()
- system time
T
- task parallel algorithms
  - implementing / Implementing task parallel algorithms
  - same task, executing / Running the same task on workers in a cluster
  - multiple tasks, executing / Running different tasks on workers in a cluster
- task parallelism
  - versus data parallelism / Data parallelism versus task parallelism
  - about / Data parallelism versus task parallelism
- tasks
  - executing, in parallel on cluster of computers / Executing tasks in parallel on a cluster of computers
- time complexity
- transient storage allocation
- Trelliscope package
  - about / Other Hadoop packages for R
U
- user-controlled memory
- user-controlled memory, functions
  - type* Calloc(size_t n, type) / Allocating memory for non-R objects
  - type* Realloc(any *p, size_t n, type) / Allocating memory for non-R objects
  - void Free(any *p) / Allocating memory for non-R objects
- user time
V
- vectorization
  - about / Vectorization
X
- Xcode Command Line Tools
  - installing / Prerequisites
  - URL, for downloading / Prerequisites