Friday, March 23, 2012

Passing data between multiple Data Flows

OK, it's the first of the month...that must mean it's time for another dumb question!

I'm trying to "componentize" an SSIS package with multiple Data Flows, and I can't figure out how to get the output of the results from one Data Flow into another.

So, for example, at the end of one Data Flow I have a Recordset destination, and I'm storing the result in a variable. But I can't figure out how to access the contents of that variable in the following Data Flow. Can this be done, or am I going about this the wrong way?

The best option is to use a Raw File destination and source.

You could use a script component to store the data in a variable however you want, and then use another script component to retrieve it.

If you want to try the second option that Simon mentioned, there's some code here that might help you extract data from a variable:
http://blogs.conchango.com/jamiethomson/archive/2005/02/08/960.aspx
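
The pattern in that post boils down to something like this; a C# sketch (assuming SSIS 2008 or later, with the recordset stored in an Object variable named User::rsData for illustration):

```csharp
// At the top of the Script Task's ScriptMain file:
using System.Data;
using System.Data.OleDb;

// Inside Main(): the upstream Recordset destination has populated
// the Object variable User::rsData (the name is illustrative)
// with an ADO recordset.
DataTable table = new DataTable();
OleDbDataAdapter adapter = new OleDbDataAdapter();

// Fill() has an overload that accepts an ADO recordset and copies
// it into an ADO.NET DataTable we can iterate over.
adapter.Fill(table, Dts.Variables["User::rsData"].Value);

foreach (DataRow row in table.Rows)
{
    // Work with row["SomeColumn"] here.
}
```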

-Jamie

Jamie, thanks for the link, but it doesn't seem to apply in my situation, as I'm not sending the recordset to a Script Task, but rather a Script Component (as a source) in a Data Flow.

What I'm trying to do is break a package down into four or five separate Data Flows that I can enable/disable and route around as necessary--essentially creating a template package that I can apply to many different scenarios.

It seems like it should be a trivial thing to dump the data at the end of one Data Flow into a recordset and retrieve it at the beginning of another, but I can't figure out how to do it.

I suppose using raw files is one answer, but it seems like there has to be a more elegant solution, without writing files to disk.

John,

Use the Recordset destination to dump the data into a recordset in memory. The code I linked to above can easily be used in a script source component to loop through that recordset and put the values into the pipeline.
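
In a script source component the same idea looks roughly like this (a sketch, again assuming SSIS 2008 or later; the Object variable User::rsData and the two output columns are illustrative and must match whatever you define on Output0 at design time):

```csharp
using System.Data;
using System.Data.OleDb;

// Inside the script component's ScriptMain class; User::rsData is
// listed under the component's ReadOnlyVariables.
public override void CreateNewOutputRows()
{
    // Convert the ADO recordset held in the Object variable into a
    // DataTable we can loop over.
    DataTable table = new DataTable();
    OleDbDataAdapter adapter = new OleDbDataAdapter();
    adapter.Fill(table, Variables.rsData);

    // Push each row into the pipeline. The output columns
    // (CustomerId and CustomerName here) have to be declared
    // manually on Output0 at design time.
    foreach (DataRow row in table.Rows)
    {
        Output0Buffer.AddRow();
        Output0Buffer.CustomerId = (int)row["CustomerId"];
        Output0Buffer.CustomerName = (string)row["CustomerName"];
    }
}
```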

Your next question may be "if there's a Recordset Destination, why isn't there a Recordset Source?" The reason is that SSIS needs to know the metadata of the data flow at design time, which wouldn't be possible with a Recordset Source.

-Jamie

Jamie,

But what's the point of being able to put the data in but not get it out? Essentially you're working around that limitation yourself, so why can't SSIS?

What's wrong with having a Recordset Source where you define the columns yourself? OK, so it couldn't validate until execution time, and that would need to be understood. If you don't provide a source component, people are just going to write their own, as you have sort of done.

Will there be a "typed dataset source" somewhere in the near future? The current situation of writing data, say half a gig, to disk only to read it again seconds later does not sound very performant. And writing one's own metadata in a script component or script task (in a language other than one's preferred, and without IDE support) isn't a killer feature either :-/

With best regards,

Artus


ArtusKG wrote:

    Will there be a "typed dataset source" somewhere in the near future? The current situation of writing data, say half a gig, to disk only to read it again seconds later does not sound very performant. And writing one's own metadata in a script component or script task (in a language other than one's preferred, and without IDE support) isn't a killer feature either :-/

It may not sound very performant, but have you tried it? It's a proprietary file format and is very, very fast indeed.

Even an out-of-the-box Recordset Source wouldn't be able to pick up its metadata from anywhere, because that metadata doesn't exist until runtime.

If there's a way around that problem then I don't know about it.

-Jamie


The problem I am currently struggling with is the vast amount of data unnecessarily written to disk. Let's keep the example of 500 MB in the recordset. If I had to write it out, the disk would be the bottleneck for several seconds. Thereafter some logic runs (prior to my next Data Flow task). And when I then read the data back from the raw file, I am again wasting time reading data that never needed to be written to disk (and blocking every other disk I/O on the machine).

I don't mind the data being held in RAM as I want it there for further processing later on anyway.

And a design-time component where I could define my recordset's columns using drag and drop would be a great improvement. Think of a "write schema to file" option on the Recordset destination and a "read schema from file" option on a yet-to-be-built Recordset Source...

With best regards,

Artus


What is the scenario you are trying to enable, Artus? The first post in this thread was about componentization of data flows; we will be addressing that in the future, probably through other mechanisms. In your case you want to persist the results of a data flow in memory and then have another data flow pick up that data? Why? You really want to persist 500 MB of data in RAM?

Thanks

Donald
