How can I solve this in SQL?: gaps and islands

We have the following table structure, which is a representation of a bus route where passengers get on and off the bus with a door sensor:

Column	Data Type	Allow Nulls
ROUTE	char(4)	N
StopNumber	int	N
ONS	int	Y
OFFS	int	Y
SPOT_CHECK	int	Y

Field SPOT_CHECK represents a person who sits on that bus with a clipboard holding a spot count.

Sample data:

ROUTE	StopNumber	ONS	OFFS	SPOT_CHECK
Patras No. 2	1	5	0	NULL
Patras No. 2	2	0	0	NULL
Patras No. 2	3	2	1	NULL
Patras No. 2	4	6	3	8
Patras No. 2	5	1	0	NULL
Patras No. 2	6	0	1	7
Patras No. 2	7	0	3	NULL

Our aim:

Add a column LOAD to this table that calculates the load at each stop, according to the following logic:

LOAD = if (SPOT_CHECK is null)
       {
          Previous stops load + Current stop ONS - Current stop's OFFS
       }
       else SPOT_CHECK

So, the expected results are:

ROUTE	StopNumber	ONS	OFFS	SPOT_CHECK	LOAD
Patras No. 2	1	5	0	NULL	5
Patras No. 2	2	0	0	NULL	5
Patras No. 2	3	2	1	NULL	6
Patras No. 2	4	6	3	8	8
Patras No. 2	5	1	0	NULL	9
Patras No. 2	6	0	1	7	7
Patras No. 2	7	0	3	NULL	4

We can implement the logic described using a cursor, but, the question is, can we solve this using a set based approach?

The tricky part:

Looks like a running total on ONS and OFF fields can solve the problem. Alas, this running total is being reset each time a not null SPOT_CHECK value is met.

Key idea:

Using a gaps and islands approach, identify sub-groups of consecutive records where a running total can be applied in order to calculate LOAD.

Query based on the key idea:

SELECT ROUTE, StopNumber, ONS, OFFS, SPOT_CHECK,
       COALESCE(SPOT_CHECK, ONS - OFFS) AS ld,
       SUM(CASE 
              WHEN SPOT_CHECK IS NULL THEN 0 
              ELSE 1 
           END)
       OVER (PARTITION BY ROUTE ORDER BY StopNumber) AS grp
FROM BusLoad

Output:

ROUTE	StopNumber	ONS	OFFS	SPOT_CHECK	LOAD	ld	grp
Patras No. 2	1	5	0	NULL	5	5	0
Patras No. 2	2	0	0	NULL	5	0	0
Patras No. 2	3	2	1	NULL	6	1	0
Patras No. 2	4	6	3	8	8	8	1
Patras No. 2	5	1	0	NULL	9	1	1
Patras No. 2	6	0	1	7	7	7	2
Patras No. 2	7	0	3	NULL	4	-3	2

All we want now is the running total of ld over [ROUTE, grp] partitions of data:

;WITH CTE AS (
    ...
    previous query here
    ...
    )
    SELECT ROUTE, StopNumber, ONS, OFFS, SPOT_CHECK, 
           SUM(ld) OVER (PARTITION BY ROUTE, grp ORDER BY StopNumber) AS LOAD
    FROM cte

Click here for a live demo.

Final note: The above query works for SQL Server versions starting from 2012. If you want a query for 2008 you have to somehow simulate sum() over (order by ...).

Id	Val
1	5
2	2
3	6
4	8
5	5
6	2
7	5
8	2
9	6
10	8

Val

Id	Val
1	5
2	2
3	6
4	8
7	5
8	2
9	6
10	8

v	rn
5	1
2	2
6	3
8	4

;WITH Seq AS ( SELECT v, ROW_NUMBER() OVER(ORDER BY k) AS rn FROM (VALUES(1, 5), (2, 2), (3, 6), (4, 8)) x(k,v) ), Grp AS ( SELECT Id, Val, ROW_NUMBER() OVER (ORDER BY Id) - rn AS grp FROM mytable AS m LEFT JOIN Seq AS s ON m.Val = s.v ) SELECT * FROM Grp;

Id	Val	grp
1	5	0
2	2	0
3	6	0
4	8	0
5	5	4
6	2	4
7	5	6
8	2	6
9	6	6
10	8	6

Val

grp

How can I solve this in SQL?

Calculate bus loads avoiding cursors

Our aim:

The tricky part:

Key idea:

Query based on the key idea:

Output:

Identify record sequences matching a predefined pattern in SQL Server

Featured Post

Store the result of a select query into an array variable

Popular Posts

Labels