Celko on SQL: Dividing the Spoils

In this database programming challenge, SQL guru Joe Celko explores various ways to split commissions among multiple sales people.

Joe Celko, Contributor

September 2, 2008

4 Min Read

You are given two tables that both deal with sales transactions. One table shows which sales people were involved with each customer and sale. The second table shows the total amount of each sale and the commissions that have to be split among the sales people involved in each transaction. Just how do you calculate the commissions? This problem, which was originally posted on a newsgroup site, presents an interesting challenge.

What we want to do is join the two tables on sale_nbr and cust_id so we can see how much each salesperson gets for each sale. We are assuming that if (n) salespeople are assigned to a sale, we split the transaction amount among them. Here is a guess at the DDL as it was described in the post. The PRIMARY KEYs had to be determined by the data shown.

 cust_id VARCHAR(10) NOT NULL,
 salesperson_id VARCHAR(20) NOT NULL,
 PRIMARY KEY (sale_nbr, cust_id, salesperson_id));

CREATE TABLE Commissions
 cust_id VARCHAR(10) NOT NULL,
 trans_amt DECIMAL(12,4) NOT NULL,
 PRIMARY KEY (sale_nbr, cust_id));

I have no idea why there is a cust_id column in both tables, so I assumed that it is part of the primary key, although it is a bit strange that one sale ticket involves multiple customers. Perhaps each customer's sales are numbered within his account instead? In any case, you can use a query like this:

SELECT T.sale_nbr, T.cust_id, T.salesperson_id,
    (C.trans_amt/X.sp_cnt) AS commission_amt
 FROM SalesTickets AS T,
    Commissions AS C,
    (SELECT T1.sale_nbr, T1.cust_id, COUNT(*)
      FROM SalesTickets AS T1
     GROUP BY T1.sale_nbr, T1.cust_id)
    AS X(sale_nbr, cust_id, sp_cnt)
 WHERE T.sale_nbr = C.sale_nbr
  AND T.sale_nbr = X.sale_nbr
  AND T.cust_id = C.cust_id
  AND T.cust_id = X.cust_id;

It's a little more difficult if the transaction amount is not divisible by the number of sales persons, so the commissions will not be exactly the same. For example, if you have a $1000.00 sale to divide among 3 salespersons, each will get $333.3333, with a total of $999.9999, not $1000.00 unless you round it up. Just let the boss worry about the pennies for now.

Did you notice that we created a Table X in the last solution? There is a pretty good chance that the derived table will be materialized by the optimizer. It would sure be nice to avoid creating tables on the fly. Here is another approach, using a scalar subquery to get the divisor.

SELECT T.sale_nbr, T.cust_id,
    / (SELECT COUNT(*)
       FROM SalesTickets AS T2
      WHERE T2.sale_nbr = T1.sale_nbr
       AND T2.cust_id = T1.cust_id))
    AS commission_amt
 FROM SalesTickets AS T1,
    Commissions AS C
 WHERE C.sale_nbr = T1.sale_nbr
  AND C.cust_id = T1.cust_id;

Will the scalar subquery be any better than a derived table? I do not know and it probably depends on your optimizer. In general, a SELECT in a SELECT list is expensive because it is implemented as nested loops. But in this case, we have a self-join in the correlation. This stuff is hard!

A slightly different solution that sticks to Core SQL-99 was proposed.

SELECT T1.sale_nbr, T1.cust_id, T1.salesperson_id,
     / (SELECT COUNT(*)
       FROM SalesTickets AS T2
       WHERE T2.sale_nbr = T1.sale_nbr))
    AS commission_amt
 FROM SalesTickets AS T1, Commissions AS C
 WHERE T1.sale_nbr = C.sale_nbr
 GROUP BY T1.sale_nbr, T1.cust_id, T1.salesperson_id;

This is the first proposed answer that sums the amount of a sale as keyed on (sale_nbr, cust_id) in the Commisions table. Will we get a boost from the GROUP BY clause if the optimizer can work on each group in parallel? Maybe, but I don't know for certain.

Here's another answer using the COUNT() OVER() function. This was added in SQL-99 along with some other OLAP functions.

SELECT T.sale_nbr, T.cust_id,
    / COUNT(*) OVER(PARTITION BY C.sale_nbr, C.cust_id))
    AS commission_amt
 FROM SalesTickets AS T, Commissions AS C
 WHERE C.sale_nbr = T.sale_nbr
  AND C.cust_id = T.cust_id;

This is probably the fastest solution if the optimizer can construct the COUNT(*) on the fly while doing the join in the WHERE clause.

I deliberately failed to post any sample data. You would need a large amount of data to really test for the fastest solution. The purpose of this drill is to learn to guess a solution based on some idea of how your optimizer works. Unfortunately, the optimizers will change over time, so the fastest query this week can be a second- or third-place choice later.

I recommend that you keep all the answers you come up with and comment them out in the source code. This gives the person who has to maintain your code a head start when perfective maintenance is required.

About the Author(s)

Never Miss a Beat: Get a snapshot of the issues affecting the IT industry straight to your inbox.

You May Also Like

More Insights