Monday, March 26, 2012

Passing in variable number of parameters to a stored procedure

I am fairly new to MSSQL. I am looking for an answer to a simple question.

I have an application which passes a lot of data from the UI into a stored procedure, to be inserted into an MSSQL 2005 database. All the information that is passed will be split into 4 inserts hitting 4 separate tables. All 4 inserts will be part of a stored procedure and have to run in one TRANSACTION. All but one of the inserts are straightforward.

The structure of this table is something like

PKID
customerID
email address
....

customerID is not unique and can have n email addresses passed in. Each entry into this table will be passed n addresses (the number of email addresses is controlled by the user and can be anywhere from 1 to n). Constructing dynamic SQL is not an option. The SP to insert all the data is already in place. Typically I would just create the SP with IN parameters that I will use to insert into the tables. In this case I can't do that, since the number of email addresses passed is dynamic. My question is: what's the best way to design this SP, where n email addresses are passed and each of them has to go into a separate insert statement? I can think of two ways to do this...

Is there a way to create a variable-length array as an IN parameter to capture the n email addresses coming in and use them to construct multiple insert statements?

Is it possible to get all the n email addresses as a comma-separated string? I know this is possible, but I am not sure how to parse this string and capture the n email addresses into variables before I construct the insert statements.

Any other ways to do this? Thanks|||From a relational perspective, the best answer is to write a single stored procedure that takes one email address and processes it, and then call that procedure N times from your UI. This is because a relational database is based on relational algebra, and while that processes sets well as output, it doesn't process them nearly as easily as input.
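To illustrate that approach, a single-address procedure might look roughly like the sketch below (the dbo.CustomerEmail table, its column names, and the parameter names are placeholders for illustration, not the actual schema). The caller opens the transaction, runs the three straightforward inserts, calls this once per email address, then commits:

CREATE PROCEDURE dbo.InsertCustomerEmail
    @CustomerID INT,
    @EmailAddress VARCHAR(255)
AS
BEGIN
    SET NOCOUNT ON

    -- One row per call; the caller invokes this once per address,
    -- inside the same transaction as the other three inserts.
    -- PKID is assumed to be an IDENTITY column, so it is not supplied here.
    INSERT INTO dbo.CustomerEmail (customerID, EmailAddress)
    VALUES (@CustomerID, @EmailAddress)
END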

If you decide to pursue your original idea and use a delimited (probably comma-separated) list, you can use fSplit (http://www.dbforums.com/t997070.html), which I posted here ages ago. This is cleaner from the UI perspective, but it will eventually byte you because of the poor fit with relational databases.
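The fSplit code itself is at the link above and isn't reproduced here, but a split function of that general shape (a table function returning an ID position column and a Data element column, which is what dbo.Split is assumed to look like in the test script further down) could be sketched roughly as:

CREATE FUNCTION dbo.Split (@List VARCHAR(8000), @Delimiter CHAR(1))
RETURNS @Result TABLE (ID INT IDENTITY(1,1), Data VARCHAR(8000))
AS
BEGIN
    DECLARE @Pos INT

    -- Peel off one delimited element per pass until the list is exhausted.
    WHILE LEN(@List) > 0
    BEGIN
        SET @Pos = CHARINDEX(@Delimiter, @List)

        IF @Pos = 0
        BEGIN
            INSERT INTO @Result (Data) VALUES (@List)
            SET @List = ''
        END
        ELSE
        BEGIN
            INSERT INTO @Result (Data) VALUES (LEFT(@List, @Pos - 1))
            SET @List = SUBSTRING(@List, @Pos + 1, LEN(@List))
        END
    END

    RETURN
END

With something like that in place, the insert for the email table can be written as a single statement: INSERT INTO the table, SELECT Data FROM dbo.Split(@EmailList, ',').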

-PatP|||"This is because a relational database is based on relational algebra, and while that processes sets well as output, it doesn't process them nearly as easily as input." Really? I never knew that.

What's the basic reasoning around that Pat?|||Ok - I had a few mins. Played around.

CREATE TABLE InsertTable
(
    MyPKFld VarChar(5) PRIMARY KEY
)

DECLARE @i AS SmallInt

SET NOCOUNT ON

DECLARE @Insert AS VarChar(2000)

--SELECT @Insert = '10001,10022,20099,15073,28948,18737,90273,27910,37891'
SELECT @Insert = '10001,10022,15073,18737,20099,27910,28948,37891,90273'

SELECT @i = 1

DECLARE @LoopUpper AS TinyInt

SELECT @LoopUpper = (SELECT COUNT(*) FROM dbo.Split(@Insert, ','))

DBCC DROPCLEANBUFFERS
DBCC FREEPROCCACHE

DECLARE @Start AS DateTime

SELECT @Start = GETDATE()

-- Test 1: loop over the split list, inserting one row per pass
WHILE @i <= @LoopUpper BEGIN

    INSERT INTO InsertTable
    SELECT Data
    FROM dbo.Split(@Insert, ',')
    WHERE ID = @i

    SELECT @i = @i + 1

END

PRINT 'LOOP takes ' + CAST(DATEDIFF(ms, @Start, GETDATE()) AS VarChar(4)) + 'ms'

DELETE
FROM InsertTable
WHERE MyPKFld > 10000

DBCC DROPCLEANBUFFERS
DBCC FREEPROCCACHE

SELECT @Start = GETDATE()

-- Test 2: single set-based insert of the whole split list
INSERT INTO InsertTable
SELECT Data
FROM dbo.Split(@Insert, ',')

PRINT 'SET takes ' + CAST(DATEDIFF(ms, @Start, GETDATE()) AS VarChar(4)) + 'ms'

DROP TABLE InsertTable
Basically it tests looping and inserting one record on each pass versus inserting the whole set at once. I typically get the set-based insert at between 1/3 and 2/3 of the time the loop takes. What have I missed?|||"What's the basic reasoning around that Pat?" In a set-based environment (where sets are fully supported by both the language itself and the implementation), there's no issue. SQL as a language doesn't support passing sets in (at least it doesn't yet; the new draft standard has basic support for them).

What you are doing is passing a complex argument (more than one atomic element in a single argument). That is outside of relational algebra altogether since it violates first normal form. The reason it runs faster is that you're trading multiple calls in a relational solution for processing complexity in a code oriented solution. It certainly works, and at least for now it is faster, but eventually it will get to the point that it causes problems.

-PatP|||Thanks Pat
I don't think I totally get you. The csv string is not normalised. However the function (or whatever code one might run) normalises the input. As far as SQL Server is concerned, it might always have been a set.

"It certainly works, and at least for now it is faster, but eventually it will get to the point that it causes problems." By this do you mean it will be a bugger to maintain, or do you mean there will be some sort of technical problem over time? If the latter - what would that be?

Grateful for the education as ever :)|||When you create a non-normalized interface like this, you've broken one of the fundamental building block "contracts" between a client and server. That relationship is either relational, or it isn't relational, and any non-relational interface makes the relationship between client and server non-relational.

This kind of change can make sense when you're implementing a different paradigm such as OOP. When you do that, you leave the relational world behind, so the rigorous "proofs" of behavior no longer have any meaning, but that happens any time you switch from one paradigm to another.

There are lots of really fundamental characteristics involved in relational processing. These make it predictable, which in turn makes it dependable. While relational technology certainly isn't the best possible way to do things, it is the best that I've found so far that is widely commercially supported and clearly understood by many professionals.

There are thousands of ways this can go wrong (and I've personally tried several hundred of those ;)). One example would be that you could have an application start out as a single server implementation, grow to use a cluster, expand further to use a cloud of replicating servers... When something goes worng :o in the process of getting data from a web server client into the data cloud, you have to start using network monitors to find the problem since you can't rely on which app/web server will initiate the conversation and which database server will process it. If you have bundled multiple calls into one and then rely on the server to parse them, you can no longer predict what the data "payload" will be exactly, so you need to start doing moderately sophisticated pattern matching. The process gets ugly, really fast.

Not everyone will face this specific problem. Given sufficient time though, I'll guarantee that you'll hit some problem related to the bundling effect. If you are making a paradigm shift, and that shift buys you something substantial in terms of coding time, support, ability to use new features, etc. then it is certainly worth considering. If all it buys you is a slight performance gain in exchange for the predictability of the pure relational model, I'd be hard pressed to "green light" this change.

-PatP|||Thanks Pat - appreciated :D

As it happens - I asked this question some time ago and got one yay and one nay from two of your esteemed peers, hence why I jumped on your answer.
EDIT - plus my initial reading of your answer went against everything I thought I knew about SQL.
