Calculate bus loads avoiding cursors

We have the following table structure, which is a representation of a bus route where passengers get on and off the bus with a door sensor:

Column Data Type Allow Nulls
ROUTE char(4) N
StopNumber int N
ONS int Y
OFFS int Y
SPOT_CHECK int Y

Field SPOT_CHECK represents a person who sits on that bus with a clipboard holding a spot count.

Sample data:

ROUTE StopNumber ONS OFFS SPOT_CHECK
Patras No. 2 1 5 0 NULL
Patras No. 2 2 0 0 NULL
Patras No. 2 3 2 1 NULL
Patras No. 2 4 6 3 8
Patras No. 2 5 1 0 NULL
Patras No. 2 6 0 1 7
Patras No. 2 7 0 3 NULL

Our aim:

Add a column LOAD to this table that calculates the load at each stop, according to the following logic:

LOAD = if (SPOT_CHECK is null)
       {
          Previous stops load + Current stop ONS - Current stop's OFFS
       }
       else SPOT_CHECK       

So, the expected results are:

ROUTE StopNumber ONS OFFS SPOT_CHECK LOAD
Patras No. 2 1 5 0 NULL 5
Patras No. 2 2 0 0 NULL 5
Patras No. 2 3 2 1 NULL 6
Patras No. 2 4 6 3 8 8
Patras No. 2 5 1 0 NULL 9
Patras No. 2 6 0 1 7 7
Patras No. 2 7 0 3 NULL 4

We can implement the logic described using a cursor, but, the question is, can we solve this using a set based approach?

The tricky part:

Looks like a running total on ONS and OFF fields can solve the problem. Alas, this running total is being reset each time a not null SPOT_CHECK value is met.

Key idea:

Using a gaps and islands approach, identify sub-groups of consecutive records where a running total can be applied in order to calculate LOAD.

Query based on the key idea:

SELECT ROUTE, StopNumber, ONS, OFFS, SPOT_CHECK,
       COALESCE(SPOT_CHECK, ONS - OFFS) AS ld,
       SUM(CASE 
              WHEN SPOT_CHECK IS NULL THEN 0 
              ELSE 1 
           END)
       OVER (PARTITION BY ROUTE ORDER BY StopNumber) AS grp
FROM BusLoad

Output:

ROUTE StopNumber ONS OFFS SPOT_CHECK LOAD ld grp
Patras No. 2 1 5 0 NULL 5 5 0
Patras No. 2 2 0 0 NULL 5 0 0
Patras No. 2 3 2 1 NULL 6 1 0
Patras No. 2 4 6 3 8 8 8 1
Patras No. 2 5 1 0 NULL 9 1 1
Patras No. 2 6 0 1 7 7 7 2
Patras No. 2 7 0 3 NULL 4 -3 2

All we want now is the running total of ld over [ROUTE, grp] partitions of data:

;WITH CTE AS (
    ...
    previous query here
    ...
    )
    SELECT ROUTE, StopNumber, ONS, OFFS, SPOT_CHECK, 
           SUM(ld) OVER (PARTITION BY ROUTE, grp ORDER BY StopNumber) AS LOAD
    FROM cte

Click here for a live demo.

Final note: The above query works for SQL Server versions starting from 2012. If you want a query for 2008 you have to somehow simulate sum() over (order by ...).

No comments:

Post a Comment

Featured Post

Store the result of a select query into an array variable

Popular Posts