Comparing Datasets and Reporting Only Non-Duplicated Rows

February 27, 2012 (updated November 16, 2017), editor

The following is in response to a question on the R-Help list.

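Consider two datasets, reported and exportfile. The goal is to report only the rows that appear in one dataset but not in the other: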

reported <-
structure(list(Product = structure(c(1L, 1L, 1L, 1L, 2L, 2L,
3L, 4L, 5L, 5L), .Label = c("Cocoa", "Coffee C", "GC", "Sugar No 11",
"ZS"), class = "factor"), Price = c(2331, 2356, 2440, 2450, 204.55,
205.45, 17792, 24.81, 1273.5, 1276.25), Nbr.Lots = c(-61L, -61L,
5L, 1L, 40L, 40L, -1L, -1L, -1L, 1L)), .Names = c("Product",
"Price", "Nbr.Lots"), row.names = c(1L, 2L, 3L, 4L, 6L, 7L, 5L,
10L, 8L, 9L), class = "data.frame")

exportfile <-
structure(list(Product = c("Cocoa", "Cocoa", "Cocoa", "Coffee C",
"Coffee C", "GC", "Sugar No 11", "ZS", "ZS"), Price = c(2331,
2356, 2440, 204.55, 205.45, 17792, 24.81, 1273.5, 1276.25), Nbr.Lots = c(-61,
-61, 6, 40, 40, -1, -1, -1, 1)), .Names = c("Product", "Price",
"Nbr.Lots"), row.names = c(NA, 9L), class = "data.frame")

Two possible solutions:
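A.

# Stack both datasets and build one text key per row
m     <- rbind(reported, exportfile)
m$key <- do.call(paste, m)
# Keys seen more than once occur in both datasets; keep only the others
m1    <- m[duplicated(m$key),]
res   <- m[is.na(match(m$key, m1$key)),]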
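
One caveat of the stacking approach in A is worth checking on your own data: duplicated() cannot tell whether a repeat comes from the other dataset or from the same one. The sketch below uses made-up toy data frames (x and y are hypothetical, not from the question) and writes the key column out explicitly. A row that repeats inside x is dropped even though y does not contain it.

```r
# Toy data (hypothetical): x repeats a row that y does not contain
x <- data.frame(Product = c("ZS", "ZS"), Price = c(1273.5, 1273.5),
                Nbr.Lots = c(-1, -1))
y <- data.frame(Product = "GC", Price = 17792, Nbr.Lots = -1)

# Same steps as approach A: stack, key, find repeated keys, drop them
m     <- rbind(x, y)
m$key <- do.call(paste, m)
m1    <- m[duplicated(m$key), ]
res_a <- m[is.na(match(m$key, m1$key)), ]

# Only the GC row is left; both ZS rows are gone although y has no ZS row
res_a
```

If repeats within a single dataset should be kept, matching each dataset against the other, as in B, avoids this.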

B.

# Build one text key per row in each dataset
exportfile$key <- do.call(paste, exportfile)
reported$key   <- do.call(paste, reported)
# Rows of reported with no match in exportfile, and vice versa
a   <- reported[is.na(match(reported$key, exportfile$key)),]
b   <- exportfile[is.na(match(exportfile$key, reported$key)),]
res <- rbind(a, b)
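
The matching idea in B can be wrapped in a small helper so that the original data frames are not modified. This is only a sketch: the name rows_not_shared and the toy data are hypothetical, and it assumes both data frames have the same column names. It also joins the columns with "\r" rather than a space, on the assumption that this character does not occur in the data, so that values containing spaces (such as "Coffee C") cannot produce an accidental match.

```r
# Hypothetical helper (not from the original post): rows present in
# exactly one of two data frames that share the same column names.
rows_not_shared <- function(x, y) {
  cols <- names(x)
  # Compare rows as text so that factor vs character and integer vs
  # double columns still match when their values print the same.
  make_key <- function(d) {
    parts <- unname(lapply(d[cols], as.character))
    do.call(paste, c(parts, sep = "\r"))
  }
  key_x <- make_key(x)
  key_y <- make_key(y)
  rbind(x[!key_x %in% key_y, , drop = FALSE],
        y[cols][!key_y %in% key_x, , drop = FALSE])
}

# Toy data (hypothetical), modelled on the shape of the datasets above
x <- data.frame(Product  = c("Cocoa", "Cocoa", "GC"),
                Price    = c(2331, 2440, 17792),
                Nbr.Lots = c(-61L, 5L, -1L))
y <- data.frame(Product  = c("Cocoa", "Cocoa", "GC"),
                Price    = c(2331, 2440, 17792),
                Nbr.Lots = c(-61, 6, -1))

# Returns the two Cocoa 2440 rows, which differ only in Nbr.Lots
res <- rows_not_shared(x, y)
res
```

Called as rows_not_shared(reported, exportfile), the same helper should apply to the datasets above without adding a key column to either of them.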
