We have the following table structure, which is a representation of a bus route where passengers get on and off the bus with a door sensor:
Column | Data Type | Allow Nulls |
---|---|---|
ROUTE | char(4) | N |
StopNumber | int | N |
ONS | int | Y |
OFFS | int | Y |
SPOT_CHECK | int | Y |
Field SPOT_CHECK
represents a person who sits on that bus with a clipboard
holding a spot count.
Sample data:
ROUTE | StopNumber | ONS | OFFS | SPOT_CHECK |
---|---|---|---|---|
Patras No. 2 | 1 | 5 | 0 | NULL |
Patras No. 2 | 2 | 0 | 0 | NULL |
Patras No. 2 | 3 | 2 | 1 | NULL |
Patras No. 2 | 4 | 6 | 3 | 8 |
Patras No. 2 | 5 | 1 | 0 | NULL |
Patras No. 2 | 6 | 0 | 1 | 7 |
Patras No. 2 | 7 | 0 | 3 | NULL |
Our aim:
Add a columnLOAD
to this table that calculates the load at each stop,
according to the following logic:
LOAD = if (SPOT_CHECK is null)
{
Previous stops load + Current stop ONS - Current stop's OFFS
}
else SPOT_CHECK
So, the expected results are:
ROUTE | StopNumber | ONS | OFFS | SPOT_CHECK | LOAD |
---|---|---|---|---|---|
Patras No. 2 | 1 | 5 | 0 | NULL | 5 |
Patras No. 2 | 2 | 0 | 0 | NULL | 5 |
Patras No. 2 | 3 | 2 | 1 | NULL | 6 |
Patras No. 2 | 4 | 6 | 3 | 8 | 8 |
Patras No. 2 | 5 | 1 | 0 | NULL | 9 |
Patras No. 2 | 6 | 0 | 1 | 7 | 7 |
Patras No. 2 | 7 | 0 | 3 | NULL | 4 |
We can implement the logic described using a cursor, but, the question is, can we solve this using a set based approach?
The tricky part:
Looks like a running total onONS
and OFF
fields
can solve the problem. Alas, this running total is being reset each time a not null SPOT_CHECK
value is met.
Key idea:
Using a gaps and islands approach, identify sub-groups of consecutive records where a running total can be applied in order to calculateLOAD
.
Query based on the key idea:
SELECT ROUTE, StopNumber, ONS, OFFS, SPOT_CHECK,
COALESCE(SPOT_CHECK, ONS - OFFS) AS ld,
SUM(CASE
WHEN SPOT_CHECK IS NULL THEN 0
ELSE 1
END)
OVER (PARTITION BY ROUTE ORDER BY StopNumber) AS grp
FROM BusLoad
Output:
ROUTE | StopNumber | ONS | OFFS | SPOT_CHECK | LOAD | ld | grp |
---|---|---|---|---|---|---|---|
Patras No. 2 | 1 | 5 | 0 | NULL | 5 | 5 | 0 |
Patras No. 2 | 2 | 0 | 0 | NULL | 5 | 0 | 0 |
Patras No. 2 | 3 | 2 | 1 | NULL | 6 | 1 | 0 |
Patras No. 2 | 4 | 6 | 3 | 8 | 8 | 8 | 1 |
Patras No. 2 | 5 | 1 | 0 | NULL | 9 | 1 | 1 |
Patras No. 2 | 6 | 0 | 1 | 7 | 7 | 7 | 2 |
Patras No. 2 | 7 | 0 | 3 | NULL | 4 | -3 | 2 |
All we want now is the running total of ld
over [ROUTE, grp]
partitions of data:
;WITH CTE AS (
...
previous query here
...
)
SELECT ROUTE, StopNumber, ONS, OFFS, SPOT_CHECK,
SUM(ld) OVER (PARTITION BY ROUTE, grp ORDER BY StopNumber) AS LOAD
FROM cte
Click here for a live demo.
Final note: The above query works for SQL Server versions starting from 2012. If you want a query
for 2008 you have to somehow simulate sum() over (order by ...)
.
No comments:
Post a Comment