Showing posts with label Window Functions. Show all posts
Showing posts with label Window Functions. Show all posts

Calculate bus loads avoiding cursors

We have the following table structure, which is a representation of a bus route where passengers get on and off the bus with a door sensor:

Column Data Type Allow Nulls
ROUTE char(4) N
StopNumber int N
ONS int Y
OFFS int Y
SPOT_CHECK int Y

Field SPOT_CHECK represents a person who sits on that bus with a clipboard holding a spot count.

Sample data:

ROUTE StopNumber ONS OFFS SPOT_CHECK
Patras No. 2 1 5 0 NULL
Patras No. 2 2 0 0 NULL
Patras No. 2 3 2 1 NULL
Patras No. 2 4 6 3 8
Patras No. 2 5 1 0 NULL
Patras No. 2 6 0 1 7
Patras No. 2 7 0 3 NULL

Our aim:

Add a column LOAD to this table that calculates the load at each stop, according to the following logic:

LOAD = if (SPOT_CHECK is null)
       {
          Previous stops load + Current stop ONS - Current stop's OFFS
       }
       else SPOT_CHECK       

So, the expected results are:

ROUTE StopNumber ONS OFFS SPOT_CHECK LOAD
Patras No. 2 1 5 0 NULL 5
Patras No. 2 2 0 0 NULL 5
Patras No. 2 3 2 1 NULL 6
Patras No. 2 4 6 3 8 8
Patras No. 2 5 1 0 NULL 9
Patras No. 2 6 0 1 7 7
Patras No. 2 7 0 3 NULL 4

We can implement the logic described using a cursor, but, the question is, can we solve this using a set based approach?

The tricky part:

Looks like a running total on ONS and OFF fields can solve the problem. Alas, this running total is being reset each time a not null SPOT_CHECK value is met.

Key idea:

Using a gaps and islands approach, identify sub-groups of consecutive records where a running total can be applied in order to calculate LOAD.

Query based on the key idea:

SELECT ROUTE, StopNumber, ONS, OFFS, SPOT_CHECK,
       COALESCE(SPOT_CHECK, ONS - OFFS) AS ld,
       SUM(CASE 
              WHEN SPOT_CHECK IS NULL THEN 0 
              ELSE 1 
           END)
       OVER (PARTITION BY ROUTE ORDER BY StopNumber) AS grp
FROM BusLoad

Output:

ROUTE StopNumber ONS OFFS SPOT_CHECK LOAD ld grp
Patras No. 2 1 5 0 NULL 5 5 0
Patras No. 2 2 0 0 NULL 5 0 0
Patras No. 2 3 2 1 NULL 6 1 0
Patras No. 2 4 6 3 8 8 8 1
Patras No. 2 5 1 0 NULL 9 1 1
Patras No. 2 6 0 1 7 7 7 2
Patras No. 2 7 0 3 NULL 4 -3 2

All we want now is the running total of ld over [ROUTE, grp] partitions of data:

;WITH CTE AS (
    ...
    previous query here
    ...
    )
    SELECT ROUTE, StopNumber, ONS, OFFS, SPOT_CHECK, 
           SUM(ld) OVER (PARTITION BY ROUTE, grp ORDER BY StopNumber) AS LOAD
    FROM cte

Click here for a live demo.

Final note: The above query works for SQL Server versions starting from 2012. If you want a query for 2008 you have to somehow simulate sum() over (order by ...).

Get row numbers of records inside a group, also get the population of each group

Consider the following set of values:

PK ID State
1 1 TX
2 1 AZ
3 1 CA
4 1 NV
5 2 FL
6 2 AL
7 2 GA
8 3 NY
9 3 MA

Field PK is an auto-increment primary key, that uniquely identifies each record and also determines row order. Field ID groups together table records.

What we want to do here is to enumerate records inside each group, as well as to ouput the population of each group. This is the result we would like to see:

PK ID State Rn Cnt
1 1 TX 1 4
2 1 AZ 2 4
3 1 CA 3 4
4 1 NV 4 4
5 2 FL 1 3
6 2 AL 2 3
7 2 GA 3 3
8 3 NY 1 2
9 3 MA 2 2

In SQL Server this kind of result can be easily achieved using Window Functions. As their name implies, window functions operate on windows of data. We can determine how each window looks like using OVER clause. Using the following clause:

OVER (PARTITION BY ID ORDER BY PK)

produces these windows of data:

  1. (1, 1, 'TX')-> (2, 1, 'AZ')-> (3, 1, 'CA')-> (4, 1, 'NV')
  2. (5, 2, 'FL')-> (6, 2, 'AL')-> (7, 2, 'GA')
  3. (8, 3, 'NY')-> (9, 3, 'MA')

We can easily get at the desired output using ROW_NUMBER to enumerate the records inside each partition and windowed version of COUNT aggregate function to get the population of each partition:

SELECT ID, State, 
       ROW_NUMBER OVER (PARTITION BY ID ORDER BY PK) AS rn,
       COUNT(*) OVER (PARTITION BY ID) AS Cnt
FROM mytable

Featured Post

Store the result of a select query into an array variable

Popular Posts