samedi 25 avril 2015

Histogram: Counting orders with variable bins in SQL


I have a table containing orders, items, and prices. I am trying to generate histograms for each item based on the prices.

Create Table #Customer_Pricing
(
customer_id int,
item_id VARCHAR(10),
qty DECIMAL(5,2),
price DECIMAL(5,2),
)
;
GO

-- Insert Statements
Insert into #Customer_Pricing values(128456, 'SOM 555', 8, 2.50)
Insert into #Customer_Pricing values(123856, 'SOM 554', 1, 2.50)
Insert into #Customer_Pricing values(123456, 'SOM 554', 55, 2.00)
Insert into #Customer_Pricing values(123556, 'SOM 555', 2, 2.20)
Insert into #Customer_Pricing values(123456, 'SOM 553', 12, 2.13)
;

For each item, I wanted 3 bins so I determined the bin sizes by dividing the difference of the MAX-MIN by 3, then adding that value to the MIN.

WITH Stats_Table_CTE (item_id2,max_p, min_p, int_p, r1_upper, r2_lower, r2_upper, r3_lower)
AS
(   SELECT  item_id
            ,max(price) 
            ,min(price)
            ,(max(price) - min(price))/3
            ,min(price)+(max(price) - min(price))/3-0.01
            ,min(price)+(max(price) - min(price))/3         
            ,min(price)+((max(price) - min(price))/3)*2-0.01
            ,min(price)+((max(price) - min(price))/3)*2                                             
        FROM #Customer_Pricing
        GROUP BY item_id)

Now, I need to count the frequencies for each range and each item. I have attempted to do so by using SUM(CASE...) but was unsuccessful.

SELECT item_id
    ,SUM(CASE WHEN price <= r1_upper, THEN 1 ELSE 0 END) AS r1_count
    ,SUM(CASE WHEN price >= r2_lower AND <= r2_upper, THEN 1 ELSE 0 END) AS r2_count
    ,SUM(CASE WHEN price >= r3_lower, THEN 1 ELSE 0 END) AS r3_count
FROM Stats_Table_CTE
GROUP BY item_id

I also attempted to use COUNT in the form SELECT item_id, price count(price <= r1_upper) AS r1_count.... but I got stuck

In one attempt, INNER JOINed the #Customer_Pricing table and Stats_Table_CTE but didn't know where to go from there.

Ideally, I would like the output table to appear as follows: *This is not the actual data, but I included it to show the desired format of the output. Item ID min_p r1_upper (r2 bins) r3_lower max_p r1_count r2_ct SOM 553 2.00 2.16 saving space 2.33 2.50 2 1 SOM 554 2.13 2.48 2.88 3.25 1 0 SOM 555 2.31 2.51 2.72 2.92 3 2

*The format of the output table is off, but I have item ID, the bins, and the counts across the top grouped by item


Aucun commentaire:

Enregistrer un commentaire