Showing posts with label gaps and islands. Show all posts
Showing posts with label gaps and islands. Show all posts

Calculate bus loads avoiding cursors

We have the following table structure, which is a representation of a bus route where passengers get on and off the bus with a door sensor:

Column Data Type Allow Nulls
ROUTE char(4) N
StopNumber int N
ONS int Y
OFFS int Y
SPOT_CHECK int Y

Field SPOT_CHECK represents a person who sits on that bus with a clipboard holding a spot count.

Sample data:

ROUTE StopNumber ONS OFFS SPOT_CHECK
Patras No. 2 1 5 0 NULL
Patras No. 2 2 0 0 NULL
Patras No. 2 3 2 1 NULL
Patras No. 2 4 6 3 8
Patras No. 2 5 1 0 NULL
Patras No. 2 6 0 1 7
Patras No. 2 7 0 3 NULL

Our aim:

Add a column LOAD to this table that calculates the load at each stop, according to the following logic:

LOAD = if (SPOT_CHECK is null)
       {
          Previous stops load + Current stop ONS - Current stop's OFFS
       }
       else SPOT_CHECK       

So, the expected results are:

ROUTE StopNumber ONS OFFS SPOT_CHECK LOAD
Patras No. 2 1 5 0 NULL 5
Patras No. 2 2 0 0 NULL 5
Patras No. 2 3 2 1 NULL 6
Patras No. 2 4 6 3 8 8
Patras No. 2 5 1 0 NULL 9
Patras No. 2 6 0 1 7 7
Patras No. 2 7 0 3 NULL 4

We can implement the logic described using a cursor, but, the question is, can we solve this using a set based approach?

The tricky part:

Looks like a running total on ONS and OFF fields can solve the problem. Alas, this running total is being reset each time a not null SPOT_CHECK value is met.

Key idea:

Using a gaps and islands approach, identify sub-groups of consecutive records where a running total can be applied in order to calculate LOAD.

Query based on the key idea:

SELECT ROUTE, StopNumber, ONS, OFFS, SPOT_CHECK,
       COALESCE(SPOT_CHECK, ONS - OFFS) AS ld,
       SUM(CASE 
              WHEN SPOT_CHECK IS NULL THEN 0 
              ELSE 1 
           END)
       OVER (PARTITION BY ROUTE ORDER BY StopNumber) AS grp
FROM BusLoad

Output:

ROUTE StopNumber ONS OFFS SPOT_CHECK LOAD ld grp
Patras No. 2 1 5 0 NULL 5 5 0
Patras No. 2 2 0 0 NULL 5 0 0
Patras No. 2 3 2 1 NULL 6 1 0
Patras No. 2 4 6 3 8 8 8 1
Patras No. 2 5 1 0 NULL 9 1 1
Patras No. 2 6 0 1 7 7 7 2
Patras No. 2 7 0 3 NULL 4 -3 2

All we want now is the running total of ld over [ROUTE, grp] partitions of data:

;WITH CTE AS (
    ...
    previous query here
    ...
    )
    SELECT ROUTE, StopNumber, ONS, OFFS, SPOT_CHECK, 
           SUM(ld) OVER (PARTITION BY ROUTE, grp ORDER BY StopNumber) AS LOAD
    FROM cte

Click here for a live demo.

Final note: The above query works for SQL Server versions starting from 2012. If you want a query for 2008 you have to somehow simulate sum() over (order by ...).

Identify record sequences matching a predefined pattern in SQL Server

We have a set of integer values that define a pattern: {5, 2, 6, 8}.

Our aim is to identify record sequences that match this pattern and return them with a SELECT query.

With the following table as input:

Id Val
1 5
2 2
3 6
4 8
5 5
6 2
7 5
8 2
9 6
10 8

we want to produce the following output:

Id Val
1 5
2 2
3 6
4 8
7 5
8 2
9 6
10 8

Field Id is an auto-increment primary key that uniquely identifies table records and also determines row order.

The first thing we need to do is add sequence numbers to the set that defines the pattern. We can use the following query wrapped in a Common Table Expression (CTE) to do so:

;WITH Seq AS (
    SELECT v, ROW_NUMBER() OVER(ORDER BY k) AS rn
    FROM (VALUES(1, 5), (2, 2), (3, 6), (4, 8)) x(k,v)
)

Output:

v rn
5 1
2 2
6 3
8 4

Using the above CTE we can identify islands, i.e. slices of sequential rows containing the whole of the sequence:

;WITH Seq AS (
    SELECT v, ROW_NUMBER() OVER(ORDER BY k) AS rn
    FROM (VALUES(1, 5), (2, 2), (3, 6), (4, 8)) x(k,v)
), Grp AS (
SELECT Id, Val, 
       ROW_NUMBER() OVER (ORDER BY Id) - rn AS grp            
FROM mytable AS m
LEFT JOIN Seq AS s ON m.Val = s.v
)
SELECT *
FROM Grp;

Output:

Id Val grp
1 5 0
2 2 0
3 6 0
4 8 0
5 5 4
6 2 4
7 5 6
8 2 6
9 6 6
10 8 6

All we need to do now is to just filter out partial groups:

;WITH Seq AS (
    SELECT v, ROW_NUMBER() OVER(ORDER BY k) AS rn
    FROM (VALUES(1, 5), (2, 2), (3, 6), (4, 8)) x(k,v)
), Grp AS (
SELECT Id, Val, 
       ROW_NUMBER() OVER (ORDER BY Id) - rn AS grp            
FROM mytable AS m
LEFT JOIN Seq AS s ON m.Val = s.v
)
SELECT g1.Id, g1.Val
FROM Grp AS g1
INNER JOIN (
   SELECT grp
   FROM Grp
   GROUP BY grp
   HAVING COUNT(*) = 4 ) AS g2
ON g1.grp = g2.grp

Click here for a live demo.

Featured Post

Store the result of a select query into an array variable

Popular Posts