Celko on SQL: Dividing the Spoils
In this database programming challenge, SQL guru Joe Celko explores various ways to split commissions among multiple sales people.
You are given two tables that both deal with sales transactions. One table shows which sales people were involved with each customer and sale. The second table shows the total amount of each sale and the commissions that have to be split among the sales people involved in each transaction. Just how do you calculate the commissions? This problem, which was originally posted on a newsgroup site, presents an interesting challenge.
What we want to do is join the two tables on sale_nbr and cust_id so we can see how much each salesperson gets for each sale. We are assuming that if (n) salespeople are assigned to a sale, we split the transaction amount among them. Here is a guess at the DDL as it was described in the post. The PRIMARY KEYs had to be determined by the data shown.
CREATE TABLE SalesTickets
(sale_nbr INTEGER NOT NULL,
cust_id VARCHAR(10) NOT NULL,
salesperson_id VARCHAR(20) NOT NULL,
PRIMARY KEY (sale_nbr, cust_id, salesperson_id));
CREATE TABLE Commissions
(sale_nbr INTEGER NOT NULL,
cust_id VARCHAR(10) NOT NULL,
trans_amt DECIMAL(12,4) NOT NULL,
PRIMARY KEY (sale_nbr, cust_id));
I have no idea why there is a cust_id column in both tables, so I assumed that it is part of the primary key, although it is a bit strange that one sale ticket involves multiple customers. Perhaps each customer's sales are numbered within his account instead? In any case, you can use a query like this:
SELECT T.sale_nbr, T.cust_id, T.salesperson_id,
(C.trans_amt/X.sp_cnt) AS commission_amt
FROM SalesTickets AS T,
Commissions AS C,
(SELECT T1.sale_nbr, T1.cust_id, COUNT(*)
FROM SalesTickets AS T1
GROUP BY T1.sale_nbr, T1.cust_id)
AS X(sale_nbr, cust_id, sp_cnt)
WHERE T.sale_nbr = C.sale_nbr
AND T.sale_nbr = X.sale_nbr
AND T.cust_id = C.cust_id
AND T.cust_id = X.cust_id;
It's a little more difficult if the transaction amount is not divisible by the number of sales persons, so the commissions will not be exactly the same. For example, if you have a $1000.00 sale to divide among 3 salespersons, each will get $333.3333, with a total of $999.9999, not $1000.00 unless you round it up. Just let the boss worry about the pennies for now.
Did you notice that we created a Table X in the last solution? There is a pretty good chance that the derived table will be materialized by the optimizer. It would sure be nice to avoid creating tables on the fly. Here is another approach, using a scalar subquery to get the divisor.
SELECT T.sale_nbr, T.cust_id,
(C.trans_amt
/ (SELECT COUNT(*)
FROM SalesTickets AS T2
WHERE T2.sale_nbr = T1.sale_nbr
AND T2.cust_id = T1.cust_id))
AS commission_amt
FROM SalesTickets AS T1,
Commissions AS C
WHERE C.sale_nbr = T1.sale_nbr
AND C.cust_id = T1.cust_id;
Will the scalar subquery be any better than a derived table? I do not know and it probably depends on your optimizer. In general, a SELECT in a SELECT list is expensive because it is implemented as nested loops. But in this case, we have a self-join in the correlation. This stuff is hard!
A slightly different solution that sticks to Core SQL-99 was proposed.
SELECT T1.sale_nbr, T1.cust_id, T1.salesperson_id,
(SUM(C.sales_amt)
/ (SELECT COUNT(*)
FROM SalesTickets AS T2
WHERE T2.sale_nbr = T1.sale_nbr))
AS commission_amt
FROM SalesTickets AS T1, Commissions AS C
WHERE T1.sale_nbr = C.sale_nbr
GROUP BY T1.sale_nbr, T1.cust_id, T1.salesperson_id;
This is the first proposed answer that sums the amount of a sale as keyed on (sale_nbr, cust_id) in the Commisions table. Will we get a boost from the GROUP BY clause if the optimizer can work on each group in parallel? Maybe, but I don't know for certain.
Here's another answer using the COUNT() OVER() function. This was added in SQL-99 along with some other OLAP functions.
SELECT T.sale_nbr, T.cust_id,
(C.trans_amt
/ COUNT(*) OVER(PARTITION BY C.sale_nbr, C.cust_id))
AS commission_amt
FROM SalesTickets AS T, Commissions AS C
WHERE C.sale_nbr = T.sale_nbr
AND C.cust_id = T.cust_id;
This is probably the fastest solution if the optimizer can construct the COUNT(*) on the fly while doing the join in the WHERE clause.
I deliberately failed to post any sample data. You would need a large amount of data to really test for the fastest solution. The purpose of this drill is to learn to guess a solution based on some idea of how your optimizer works. Unfortunately, the optimizers will change over time, so the fastest query this week can be a second- or third-place choice later.
I recommend that you keep all the answers you come up with and comment them out in the source code. This gives the person who has to maintain your code a head start when perfective maintenance is required.
About the Author
You May Also Like