Re: How did you "widen" your columns?
I don't work at the company listed in the article, but the following is how I'd approach the problem.
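CREATE TABLE purchases_per_customer (
    customer_id text, purchase_date timestamp, item text,  -- column types are my assumption
    price decimal, age int, address text,
    PRIMARY KEY (customer_id, purchase_date, item, price, age, address)
) WITH CLUSTERING ORDER BY (purchase_date DESC, item ASC, price ASC, age ASC, address ASC);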
This will allow me to quickly access all recent purchases for a customer.
Importantly, my data will be physically stored in sorted order. Within each customer's partition, the data will be sorted by purchase_date (newest first) and then by item within any given date.
Also, I pushed all of my columns into the cell (aka physical column) name and therefore I have only 1 cell per row (one per purchase here). Nothing is stored in the cell value since everything has been packed into the cell name. This is a good move since it eliminates the overhead of storing a separate cell, each repeating a column name, for every non-key column (which can add up for a high-volume metrics application).
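For contrast, here is a rough sketch of the more conventional layout, where only some columns are part of the primary key. The table name and column types are made up for illustration. As I understand the older (pre-3.0) storage format, each non-key column here becomes its own cell whose name carries the clustering values plus the column name, so every purchase costs three cells instead of one.

```cql
-- Hypothetical alternative: price, age and address are regular (non-key) columns.
CREATE TABLE purchases_regular_columns (
    customer_id text, purchase_date timestamp, item text,
    price decimal, age int, address text,
    PRIMARY KEY (customer_id, purchase_date, item)
) WITH CLUSTERING ORDER BY (purchase_date DESC, item ASC);
-- Each row now stores separate cells for price, age and address,
-- and each of those cell names repeats the column name.
```

The trade-off is that regular columns can be updated in place, while anything packed into the primary key can only be changed by deleting the row and inserting a new one.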
The main benefit this gives me is the ability to get all recent purchases for a customer by reading a single partition, which ideally means a single on-disk access. This data model will also give very fast range scans by purchase date within a customer.
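The two access patterns would look something like the queries below. This is only a sketch: the customer id and dates are invented, and it assumes customer_id is a text column and purchase_date is a timestamp.

```cql
-- The 10 most recent purchases for one customer.
-- No ORDER BY is needed: the clustering order already stores newest first.
SELECT purchase_date, item, price
FROM purchases_per_customer
WHERE customer_id = 'c-1001'
LIMIT 10;

-- Range scan by purchase date within the same partition.
SELECT purchase_date, item, price
FROM purchases_per_customer
WHERE customer_id = 'c-1001'
  AND purchase_date >= '2014-01-01'
  AND purchase_date <  '2014-02-01';
```

Both queries restrict on the partition key (customer_id), so each one touches a single partition.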
Of course, I'd probably also want to query by item, so I would then store that in another table. In Cassandra, writes are super fast and I tend to model my data to become CPU bound, not I/O bound, so I'd probably turn on compression for this table.
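A sketch of what that second table and the compression setting might look like follows. The table name purchases_per_item, the column types, and the sample values are assumptions for illustration. The compression option syntax shown is the one I know from Cassandra 3.0 and later; older releases use the 'sstable_compression' key instead of 'class'.

```cql
-- Hypothetical lookup table, partitioned by item instead of customer.
CREATE TABLE purchases_per_item (
    item text, purchase_date timestamp, customer_id text, price decimal,
    PRIMARY KEY (item, purchase_date, customer_id, price)
) WITH CLUSTERING ORDER BY (purchase_date DESC, customer_id ASC, price ASC)
  AND compression = {'class': 'LZ4Compressor'};

-- Turn on compression for the original table as well.
ALTER TABLE purchases_per_customer
WITH compression = {'class': 'LZ4Compressor'};

-- The application writes every purchase to both tables.
BEGIN BATCH
    INSERT INTO purchases_per_customer (customer_id, purchase_date, item, price, age, address)
    VALUES ('c-1001', '2014-01-15', 'widget', 9.99, 42, '1 Main St');
    INSERT INTO purchases_per_item (item, purchase_date, customer_id, price)
    VALUES ('widget', '2014-01-15', 'c-1001', 9.99);
APPLY BATCH;
```

Duplicating the data like this trades extra writes for reads that always hit a single partition, which fits the point above about writes being cheap.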