Store the result of a select query into an array variable

Consider the following SELECT query:

SELECT CourseID FROM Course

with data:

CourseID
112
101
125
173
145
152
160

We try to store the above result set into a variable. We try it like this:

DECLARE @a VARCHAR(MAX)
SELECT @a=CourseID FROM Course

Giving a PRINT @a returns just the last value: 160. So, obviously, we cannot store 7 discrete values inside a scalar variable. We can use a table variable instead:

DECLARE @a TABLE (id int)
INSERT INTO @a
SELECT CourseID FROM Course

As you might have guessed giving a PRINT @a this time returns a syntax error. The correct way to access @a is to treat it like an ordinary table:

SELECT id FROM @a

SqlDBM: Model revision history

In a previous post I presented sqlDBM, a really cool on-line tool for database modeling. You can use it to create the model of your SQL Server or MySQL RDBMS. Then use its forward engineering feature to generate an sql script that creates the whole of the database you have designed in the tool.

The tool has a very useful versioning feature: each time you click on the Save button and save the work on the model you are working with, the tool creates a new revision of the model. The picture below displays the revision history of my Payroll model so far:

You can easily open any revision you like by just clicking on the timestamp link. This is the latest revision of my database model:

What is really interesting on the screenshot above is that the tool, at the end of my last session, has automatically saved a draft of some changes I made after pressing the Save button. By pressing the Load Draft button I can load my work as it was just before I left the tool and continue my work exactly from the point I left it.

SqlDBM: A cool new tool for SQL Server Database design

I recently discovered a really cool new on-line tool for SQL Database design. The site hosting the tool is called SQL Database Modeler. After loging in with my Facebook account I started creating a new schema for SQL Server database. The GUI of the tool is intuitive and really easy to handle for someone accustomed to using desktop tools for database modelling like ErWin Data modeller.

The picture shows the project I've created: it's called Payroll and contains just one table, Employee.

Now I can start modeling my new SQL Server database! I can easiy login either from my desktop computer or my laptop at home, or my workstation at work and continue my modeling work from were I left it.

Calculate bus loads avoiding cursors

We have the following table structure, which is a representation of a bus route where passengers get on and off the bus with a door sensor:

Column Data Type Allow Nulls
ROUTE char(4) N
StopNumber int N
ONS int Y
OFFS int Y
SPOT_CHECK int Y

Field SPOT_CHECK represents a person who sits on that bus with a clipboard holding a spot count.

Sample data:

ROUTE StopNumber ONS OFFS SPOT_CHECK
Patras No. 2 1 5 0 NULL
Patras No. 2 2 0 0 NULL
Patras No. 2 3 2 1 NULL
Patras No. 2 4 6 3 8
Patras No. 2 5 1 0 NULL
Patras No. 2 6 0 1 7
Patras No. 2 7 0 3 NULL

Our aim:

Add a column LOAD to this table that calculates the load at each stop, according to the following logic:

LOAD = if (SPOT_CHECK is null)
       {
          Previous stops load + Current stop ONS - Current stop's OFFS
       }
       else SPOT_CHECK       

So, the expected results are:

ROUTE StopNumber ONS OFFS SPOT_CHECK LOAD
Patras No. 2 1 5 0 NULL 5
Patras No. 2 2 0 0 NULL 5
Patras No. 2 3 2 1 NULL 6
Patras No. 2 4 6 3 8 8
Patras No. 2 5 1 0 NULL 9
Patras No. 2 6 0 1 7 7
Patras No. 2 7 0 3 NULL 4

We can implement the logic described using a cursor, but, the question is, can we solve this using a set based approach?

The tricky part:

Looks like a running total on ONS and OFF fields can solve the problem. Alas, this running total is being reset each time a not null SPOT_CHECK value is met.

Key idea:

Using a gaps and islands approach, identify sub-groups of consecutive records where a running total can be applied in order to calculate LOAD.

Query based on the key idea:

SELECT ROUTE, StopNumber, ONS, OFFS, SPOT_CHECK,
       COALESCE(SPOT_CHECK, ONS - OFFS) AS ld,
       SUM(CASE 
              WHEN SPOT_CHECK IS NULL THEN 0 
              ELSE 1 
           END)
       OVER (PARTITION BY ROUTE ORDER BY StopNumber) AS grp
FROM BusLoad

Output:

ROUTE StopNumber ONS OFFS SPOT_CHECK LOAD ld grp
Patras No. 2 1 5 0 NULL 5 5 0
Patras No. 2 2 0 0 NULL 5 0 0
Patras No. 2 3 2 1 NULL 6 1 0
Patras No. 2 4 6 3 8 8 8 1
Patras No. 2 5 1 0 NULL 9 1 1
Patras No. 2 6 0 1 7 7 7 2
Patras No. 2 7 0 3 NULL 4 -3 2

All we want now is the running total of ld over [ROUTE, grp] partitions of data:

;WITH CTE AS (
    ...
    previous query here
    ...
    )
    SELECT ROUTE, StopNumber, ONS, OFFS, SPOT_CHECK, 
           SUM(ld) OVER (PARTITION BY ROUTE, grp ORDER BY StopNumber) AS LOAD
    FROM cte

Click here for a live demo.

Final note: The above query works for SQL Server versions starting from 2012. If you want a query for 2008 you have to somehow simulate sum() over (order by ...).

Split alphanumeric string into numeric and alpha parts in SQL Server

Lets say we have an alphanumeric string stored in a variable in SQL Server. The fist part of the string is numeric and the second part consists of letters and/or symbols specifying a unit or a percentage:

DECLARE @str VARCHAR(MAX) = '4000 ug/ML'

We now want to split the numeric / unit parts from the variable and return them as separate fields.

If the numeric part is always at the beginning, then we can use this expression:

PATINDEX('%[0-9][^0-9]%', @str)

to get the index of the last digit. To see how this works let us have a look at the manual for PATINDEX:

Returns the starting position of the first occurrence of a pattern in a specified expression, or zeros if the pattern is not found, on all valid text and character data types.

The pattern specified catches the first occurrence of a number, [0-9], followed by a character that is not a number: [^0-9]. In other words, it returns the index of the last number.

Thus, this:

SELECT LEFT(@str, PATINDEX('%[0-9][^0-9]%', @str )) AS Number,
       LTRIM(RIGHT(@str, LEN(@str) - PATINDEX('%[0-9][^0-9]%', @str ))) As Unit

returns the following:

Number Unit
4000 ug/ML

Click here for a live demo.

If numeric data include double values as well, like:

DECLARE @str VARCHAR(MAX) = '4000.35 ug/ML'

In this case we can use:

SELECT LEN(@str) - PATINDEX ('%[^0-9][0-9]%', REVERSE(@str))

Using PATINDEX on the REVERSE order of the string value gets us the index of the first digit from the end of the string that follows a character that is not a number. In our example the SELECT returns 7 indicating the index of number 5 from the end of the string.

Thus, this:

SELECT LEFT(@str, LEN(@str) - PATINDEX ('%[^0-9][0-9]%', REVERSE(@str)))

gives us the numeric part, and this:

SELECT LEFT(@str, LEN(@str) - PATINDEX ('%[^0-9][0-9]%', REVERSE(@str))) AS Numeric,
       CASE 
          WHEN CHARINDEX ('%', @str) <> 0 THEN LTRIM(RIGHT(@str, LEN(@str) - CHARINDEX ('%', @str)))
          ELSE LTRIM(RIGHT(@str, PATINDEX ('%[^0-9][0-9]%', REVERSE(@str))))
       END AS Unit

gives us both numeric and unit part.

Click here for a live demo.

How to fix 'Parameters were not supplied for the function' error in SQL Server

SQL Server provides a feature called Table Valued Functions. These are user-defined functions that return a table data type and can be powerful alternatives to views.

Lets try to define a simple function of this kind using the following code:

CREATE FUNCTION X
(
    -- Add the parameters for the function here
    @a nvarchar(30), 
    @b nvarchar(30)
)
RETURNS 
@X TABLE 
(
    -- Add the column definitions for the TABLE variable here
    data nvarchar(250)
)
AS
BEGIN

    INSERT INTO X (data) 
    VALUES (@a + @b)    

    RETURN 
END
GO

The function just returns a table that contains the concatenation of two strings. We now try to use the function:

SELECT *
FROM X('Hello', 'World')

We expect to get a table that contains a single record but instead we get the following error message:

Parameters were not supplied for the function 'X'.

Click here for a live demo of the error.

The problem with our code is that it declares a table variable to be returned by the function that has the same name as the function. This is syntactically correct but then, instead of doing an INSERT into the table variable, it uses the function name, which happens to be the same.

What this type of, difficult to decipher, error teaches us is that it is much cleaner to use a different name for the table variable to be returned.

CREATE FUNCTION X
(
    -- Add the parameters for the function here
    @a nvarchar(30), 
    @b nvarchar(30)
)
RETURNS 
@output TABLE 
(
    -- Add the column definitions for the TABLE variable here
    data nvarchar(250)
)
AS
BEGIN

    INSERT INTO @output (data) 
    VALUES (@a + @b)    

    RETURN 
END
GO

Click here for a live demo.

Pivot a table in SQL Server that contains yearly data

Consider the following table recording expenses:

CREATE TABLE dbo.Expenses
(
    ExpenseId INT NOT NULL IDENTITY (1,1) PRIMARY KEY,
    ExpenseType VARCHAR(20) NOT NULL,
    Amount DECIMAL(6,2) NOT NULL,
    ExpenseDate DATE NOT NULL
)   

We populate the table with the following data:

ExpenseId ExpenseType Amount ExpenseDate
1 Rent 340.50 2017-01-01
2 Food 23.50 2017-01-03
3 Food 27.50 2017-04-08
4 Gasoline 50.30 2017-01-05
5 Food 27.50 2016-01-08
6 Rent 340.50 2016-01-01
7 Food 85.00 2015-07-17
8 Gasoline 48.40 2015-12-15
9 Furniture 540.00 2015-09-22
10 Rent 335.00 2015-01-01

Our aim is to get a pivoted view of this table:

ExpenseType 2015 2016 2017
Food 540.00 0.00 0.00
Furniture 85.00 27.50 51.00
Gasoline 48.40 0.00 50.30
Rent 335.00 340.50 340.50

If the list of years is static, in other words if we know before-hand the list of years to be taken into account, then we can make use of the PIVOT relational operator:

PIVOT rotates a table-valued expression by turning the unique values from one column in the expression into multiple columns in the output, and performs aggregations where they are required on any remaining column values that are wanted in the final output.

Here, we want to turn the unique year values of column ExpenseDate:

  • 2015
  • 2016
  • 2017

into multiple columns and perform aggregation on column Amount. This operation will be performed for each value of column ExpenseType.

To get the year values of ExpenseDate column we use SQL Server's YEAR built-in function. So, the table-valued expression that PIVOT will operate on looks like this:

SELECT ExpenseType, YEAR(ExpenseDate) AS ExpenseYear, Amount
FROM Expenses

PIVOT is performed for column ExpenseYear, whereas Amount column is the one to be aggregated using SUM aggregate function:

PIVOT
( 
   SUM(Amount)
   FOR ExpenseYear IN ([2015], [2016], [2017])
) AS PivotTable

Hence, the query we want looks like this:

SELECT ExpenseType, 
       COALESCE([2015], 0) AS [2015], 
       COALESCE([2016], 0) AS [2016], 
       COALESCE([2017], 0) AS [2017]
FROM (SELECT ExpenseType, YEAR(ExpenseDate) AS ExpenseYear, Amount
      FROM Expenses) AS SourceTable
PIVOT
( 
   SUM(Amount)
   FOR ExpenseYear IN ([2015], [2016], [2017])
) AS PivotTable

Note the usage of COALESCE around each year column: in case there are no values for a specific Expense Type - ExpenseDate pair, then 0 is produced in the output.

Click here for a live demo.

How to record the current date/time value in SQL Server?

Say we have the following table in SQL Server recording incoming payments:

CREATE TABLE Payment
(
   PaymentId INT NOT NULL IDENTITY (1,1) PRIMARY KEY,
   PaymentType VARCHAR(15) NOT NULL,
   Amount DECIMAL(6,2),
   PaymentDate DATETIME NOT NULL
)

We would like to record in PaymentDate the date and time value each payment record is inserted into the table.

One way to do so is to use the current date/time timestamp when a new record is inserted into the table. So, inserting a new record could look like this:

INSERT INTO Payment (PaymentType, Amount, PaymentDate)
VALUES
('Payment Owed', 20.12, GETDATE()),
('Gift', 15.47, GETDATE())

This INSERT statement uses GETDATE built-in function, which

Returns the current database system timestamp as a datetime value without the database time zone offset. This value is derived from the operating system of the computer on which the instance of SQL Server is running.

The table after this INSERT takes place looks like:

PaymentId PaymentType Amount PaymentDate
1 Payment Owed 20.12 2017-05-28 09:24:12.237
2 Gift 15.47 2017-05-28 09:24:12.237

A more convenient way around this problem would be to add a default constraint to PaymentDate column:

ALTER TABLE Payments
ADD CONSTRAINT df_CurrentDateTime
DEFAULT CURRENT_TIMESTAMP FOR [Payment_Date]

The constaint is just a default value for column PaymentDate. This value is the value returned by built-in function CURRENT_TIMESTAMP:

This function is the ANSI SQL equivalent to GETDATE.

So, the function does the same thing with GETDATE. If we add the constraint, the INSERT statement is simplified to:

INSERT INTO Payment (PaymentType, Amount)
VALUES
('Payment Owed', 20.12),
('Gift', 15.47)

Comparing two decimal values in SQL Server

Lets say we declare and initialize two decimal variables in SQL Server like this:

DECLARE @num1 DECIMAL = 1.98
DECLARE @num2 DECIMAL = 2.2

then we use the following piece of code to compare the two values:

IF (@num1 != @num2)
   SELECT 0
ELSE IF (@num1 = @num2)
   SELECT 1

Contrary to what one might expect the above code returns a 1, implying the two values are equal! Click here for a live demo.

This is due to the fact that we have not included the precision and scale values in our declaration of @num1, @num2 variables. Hence, default values are being used.

According to the documentation:

Scale can be specified only if precision is specified. The default scale is 0; therefore, 0 <= s <= p. Maximum storage sizes vary, based on the precision.

So the scale of both variables defaults to 0 and thus both variables are being set to 2.

We can easily get around the problem by using declaration statements like:

DECLARE @num1 DECIMAL(10,2) = 1.98
DECLARE @num2 DECIMAL(10,2) = 2.2

Identify record sequences matching a predefined pattern in SQL Server

We have a set of integer values that define a pattern: {5, 2, 6, 8}.

Our aim is to identify record sequences that match this pattern and return them with a SELECT query.

With the following table as input:

Id Val
1 5
2 2
3 6
4 8
5 5
6 2
7 5
8 2
9 6
10 8

we want to produce the following output:

Id Val
1 5
2 2
3 6
4 8
7 5
8 2
9 6
10 8

Field Id is an auto-increment primary key that uniquely identifies table records and also determines row order.

The first thing we need to do is add sequence numbers to the set that defines the pattern. We can use the following query wrapped in a Common Table Expression (CTE) to do so:

;WITH Seq AS (
    SELECT v, ROW_NUMBER() OVER(ORDER BY k) AS rn
    FROM (VALUES(1, 5), (2, 2), (3, 6), (4, 8)) x(k,v)
)

Output:

v rn
5 1
2 2
6 3
8 4

Using the above CTE we can identify islands, i.e. slices of sequential rows containing the whole of the sequence:

;WITH Seq AS (
    SELECT v, ROW_NUMBER() OVER(ORDER BY k) AS rn
    FROM (VALUES(1, 5), (2, 2), (3, 6), (4, 8)) x(k,v)
), Grp AS (
SELECT Id, Val, 
       ROW_NUMBER() OVER (ORDER BY Id) - rn AS grp            
FROM mytable AS m
LEFT JOIN Seq AS s ON m.Val = s.v
)
SELECT *
FROM Grp;

Output:

Id Val grp
1 5 0
2 2 0
3 6 0
4 8 0
5 5 4
6 2 4
7 5 6
8 2 6
9 6 6
10 8 6

All we need to do now is to just filter out partial groups:

;WITH Seq AS (
    SELECT v, ROW_NUMBER() OVER(ORDER BY k) AS rn
    FROM (VALUES(1, 5), (2, 2), (3, 6), (4, 8)) x(k,v)
), Grp AS (
SELECT Id, Val, 
       ROW_NUMBER() OVER (ORDER BY Id) - rn AS grp            
FROM mytable AS m
LEFT JOIN Seq AS s ON m.Val = s.v
)
SELECT g1.Id, g1.Val
FROM Grp AS g1
INNER JOIN (
   SELECT grp
   FROM Grp
   GROUP BY grp
   HAVING COUNT(*) = 4 ) AS g2
ON g1.grp = g2.grp

Click here for a live demo.

Create a Cartesian product of all records of a table using a set of predefined values

Suppose we have a table like below in SQL Server:

CREATE TABLE dbo.mytable
(
    Name VARCHAR(20),
    Val INT
)

and we populate it using:

INSERT INTO dbo.mytable
VALUES
('A', 10),
('B', 5)
('C', 12)

We now want to create a Cartesian product of those records using a small set of date values

Date
2017-01-01
2017-02-01
2017-03-01

Using the above set of date values the required output is:

Name Val Date
A 10 2017-01-01
A 10 2017-02-01
A 10 2017-03-01
B 5 2017-01-01
B 5 2017-02-01
B 5 2017-03-01
C 12 2017-01-01
C 12 2017-02-01
C 12 2017-03-01

The first step in solving our problem is to create an in-line table containing the date values making use of a Table Value Constructor.

To create the Cartesian product we can use a CROSS JOIN that does exactly this:

A cross join that does not have a WHERE clause produces the Cartesian product of the tables involved in the join. The size of a Cartesian product result set is the number of rows in the first table multiplied by the number of rows in the second table.

So, our query looks like this:

SELECT  Name, Val, x.myDate
FROM mytable
CROSS JOIN (VALUES ('2017-01-01'), 
                   ('2017-02-01'), 
                   ('2017-03-01')) x(myDate)
ORDER BY Name, myDate
Click here for a live demo.

Featured Post

Store the result of a select query into an array variable

Popular Posts