To avoid counting duplicate rows, we can use the distinct operation in SQL. In dplyr, we can also eliminate duplicated rows from a given dataset.
Ensure that you have completed the "Enhancing a data.frame with a data.table" recipe to load purchase_view.tab and purchase_order.tab as both a data.frame and a data.table into your R environment.
Perform the following steps to remove duplicate rows with dplyr:
First, we illustrate how to obtain unique products from the dataset:
> order.dt %>% select(Product) %>% distinct() %>% head(3)
       Product
1: P0006944501
2: P0006018073
3: P0002267974
We can also remove rows that are duplicated across multiple columns:
> distinct.product.user.dt <- order.dt %>% select(Product, User) %>% distinct()
> head(distinct.product.user.dt, 3)
       Product         User
1: P0006944501   U312622727
2: P0006018073   U239012343
3: P0002267974 U10007697373
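The two steps above can be tried without the purchase data. The following is a minimal sketch on a small toy data frame; the `orders` object and its values are made up for illustration and are not part of the recipe's dataset. It also shows that distinct() accepts column names directly, and that `.keep_all = TRUE` retains the remaining columns of the first row found for each combination.

```r
library(dplyr)

# Hypothetical toy data standing in for order.dt; all values are invented.
orders <- data.frame(
  Product  = c("P1", "P1", "P2", "P2", "P3"),
  User     = c("U1", "U1", "U1", "U2", "U3"),
  Quantity = c(1, 2, 1, 1, 3),
  stringsAsFactors = FALSE
)

# select() followed by distinct(), as in the steps above
unique.products <- orders %>% select(Product) %>% distinct()

# distinct() can also take the columns directly, which drops the other columns
unique.pairs <- orders %>% distinct(Product, User)

# .keep_all = TRUE keeps every column, using the first row of each combination
first.rows <- orders %>% distinct(Product, User, .keep_all = TRUE)

print(unique.products)
print(unique.pairs)
print(first.rows)
```

Passing the columns to distinct() saves the separate select() call; `.keep_all = TRUE` is useful when the other columns of a representative row are still needed.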
At this point, let's compare the...