Identifying Customers
Identifying transactions as belonging to the same customer is challenging, both for retail customers (individuals and households) and for business customers. Even when customers have an ongoing relationship, such as a loyalty card, they may not always use their identification number. This section discusses the definition of “customer” and how customers are represented in data. The next section looks at other types of data, such as addresses.
Who Is the Customer?
The transactions in the purchases dataset are orders. The database has several ways to tie transactions together over time. Each order has OrderId, which leads to a CustomerId and a HouseholdId. The following query provides the counts of orders, customers, and households:
SELECT COUNT(*) as numorders, COUNT(DISTINCT c.CustomerId) as numcusts, COUNT(DISTINCT c.HouseholdId) as numhh FROM Orders o LEFT OUTER JOIN Customers c ON o.CustomerId = c.CustomerId
This query returns 192...