We have the following table structure, which is a representation of a bus route
where passengers get on and off the bus with a door sensor:
Column |
Data Type |
Allow Nulls |
ROUTE |
char(4) |
N |
StopNumber |
int |
N |
ONS |
int |
Y |
OFFS |
int |
Y |
SPOT_CHECK |
int |
Y |
Field SPOT_CHECK
represents a person who sits on that bus with a clipboard
holding a spot count.
Sample data:
ROUTE |
StopNumber |
ONS |
OFFS |
SPOT_CHECK |
Patras No. 2 |
1 |
5 |
0 |
NULL |
Patras No. 2 |
2 |
0 |
0 |
NULL |
Patras No. 2 |
3 |
2 |
1 |
NULL |
Patras No. 2 |
4 |
6 |
3 |
8 |
Patras No. 2 |
5 |
1 |
0 |
NULL |
Patras No. 2 |
6 |
0 |
1 |
7 |
Patras No. 2 |
7 |
0 |
3 |
NULL |
Our aim:
Add a column
LOAD
to this table that calculates the load at each stop,
according to the following logic:
LOAD = if (SPOT_CHECK is null)
{
Previous stops load + Current stop ONS - Current stop's OFFS
}
else SPOT_CHECK
So, the expected results are:
ROUTE |
StopNumber |
ONS |
OFFS |
SPOT_CHECK |
LOAD |
Patras No. 2 |
1 |
5 |
0 |
NULL |
5 |
Patras No. 2 |
2 |
0 |
0 |
NULL |
5 |
Patras No. 2 |
3 |
2 |
1 |
NULL |
6 |
Patras No. 2 |
4 |
6 |
3 |
8 |
8 |
Patras No. 2 |
5 |
1 |
0 |
NULL |
9 |
Patras No. 2 |
6 |
0 |
1 |
7 |
7 |
Patras No. 2 |
7 |
0 |
3 |
NULL |
4 |
We can implement the logic described using a cursor, but, the question is, can we solve this
using a set based approach?
The tricky part:
Looks like a running total on
ONS
and
OFF
fields
can solve the problem. Alas, this running total is being reset each time a not null
SPOT_CHECK
value is met.
Key idea:
Using a gaps and islands approach, identify sub-groups of consecutive records where a
running total can be applied in order to calculate
LOAD
.
Query based on the key idea:
SELECT ROUTE, StopNumber, ONS, OFFS, SPOT_CHECK,
COALESCE(SPOT_CHECK, ONS - OFFS) AS ld,
SUM(CASE
WHEN SPOT_CHECK IS NULL THEN 0
ELSE 1
END)
OVER (PARTITION BY ROUTE ORDER BY StopNumber) AS grp
FROM BusLoad
Output:
ROUTE |
StopNumber |
ONS |
OFFS |
SPOT_CHECK |
LOAD |
ld |
grp |
Patras No. 2 |
1 |
5 |
0 |
NULL |
5 |
5 |
0 |
Patras No. 2 |
2 |
0 |
0 |
NULL |
5 |
0 |
0 |
Patras No. 2 |
3 |
2 |
1 |
NULL |
6 |
1 |
0 |
Patras No. 2 |
4 |
6 |
3 |
8 |
8 |
8 |
1 |
Patras No. 2 |
5 |
1 |
0 |
NULL |
9 |
1 |
1 |
Patras No. 2 |
6 |
0 |
1 |
7 |
7 |
7 |
2 |
Patras No. 2 |
7 |
0 |
3 |
NULL |
4 |
-3 |
2 |
All we want now is the running total of ld
over [ROUTE, grp]
partitions of data:
;WITH CTE AS (
...
previous query here
...
)
SELECT ROUTE, StopNumber, ONS, OFFS, SPOT_CHECK,
SUM(ld) OVER (PARTITION BY ROUTE, grp ORDER BY StopNumber) AS LOAD
FROM cte
Click here for a live demo.
Final note: The above query works for SQL Server versions starting from 2012. If you want a query
for 2008 you have to somehow simulate sum() over (order by ...)
.