In real-life situations, we usually start our analysis with a data frame-type structure. What do we do after getting a dataset, and what basic data-manipulation tasks do we usually perform before modeling? They are listed here:
We check the validity of a dataset based on conditions.
We sort the dataset based on some variables, in ascending or descending order.
We create new variables based on existing variables.
Finally, we summarize the data.
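The four tasks above can be sketched with dplyr's verbs as follows. This is a minimal, illustrative mapping, not the book's own example. It assumes dplyr is installed and uses R's built-in mtcars data.

Some choices are interpretations rather than anything fixed by the text:
- "checking validity" is expressed as a filter() condition, and the rule that mpg must be positive is hypothetical;
- the variable hp_per_wt is a hypothetical derived column.

filter(), arrange(), desc(), mutate(), summarise(), n(), and group_by() are real dplyr functions. Each step stores an intermediate object.

```r
# Minimal sketch: assumes the dplyr package is installed.
library(dplyr)

cars <- mtcars  # built-in data set: 32 cars, 11 variables

# 1. Check validity: keep only rows that satisfy a condition
#    (hypothetical rule: mpg must be positive).
valid <- filter(cars, mpg > 0)

# 2. Sort: cyl in ascending order, then mpg in descending order within cyl.
sorted <- arrange(valid, cyl, desc(mpg))

# 3. Create a new variable from existing ones
#    (hypothetical column: horsepower per 1000 lb of weight).
with_ratio <- mutate(sorted, hp_per_wt = hp / wt)

# 4. Summarize the whole data set into a single row.
overall <- summarise(with_ratio,
                     n = n(),
                     mean_mpg = mean(mpg),
                     mean_hp_per_wt = mean(hp_per_wt))

# Group-wise variant: one summary row per number of cylinders.
by_cyl <- summarise(group_by(with_ratio, cyl),
                    n = n(),
                    mean_mpg = mean(mpg))

print(overall)
print(by_cyl)
```

Because every verb takes a data frame as its first argument and returns a new one, the steps can be combined freely. Applying group_by() before summarise() turns a whole-dataset summary into a per-group summary.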
This is the list of tasks we usually perform on full datasets. The dplyr package has all the functions needed to perform these tasks, along with some additional ones that come in handy during data manipulation. Group-wise operations are also possible with the dplyr package. In the dplyr package, every task is performed by a function called a verb. We may need to apply multiple verbs to the same data frame. This could force us to write either a very long line or multiple...