Home SQLServer2016
Category:

SQLServer2016

This is the second part of SSIS Package Parts blog post, I would recommend you to read other part first!

In the previous post we saw basics of SSIS Control Flow Package Parts, now let’s look a bit more in detail:

Package Part in Code Mode:

In design mode, an empty package part looks like regular SSIS package apart from Package Part has only Control flow and data flow tab. When we compare the source code between a new SSIS package(.dtsx) and a new control flow package part (.dtsxp), both looks pretty similar with obvious package type, refId and CreationName as different. So using BIML to create package parts won’t be much different compared to SSIS Packages.image

image

One of the first use cases that came into my mind when I first saw control flow template was to have the extensibility of reusing the code. Whenever I create an SSIS package to populate a Star Schema, I usually use a separate framework like many others; I try to capture a lot of metadata related to SSIS packages and execution. So I can monitor how many rows got inserted, updated or some other information related to executions and performances. However, one of the tedious thing with SSIS is minimal support with code reusability. Whenever you copy an SSIS Task to a new package or a new solution it loses all the existing connections and sometimes parsing too, you need to remap the connections again, but SSIS Control Flow Package Part directly addresses this issue.

Control flow Package Part for SSIS Audit:

So I want to create a package part to monitor SSIS Audit Begin and Audit End events for each execution. For this purpose, I want to use some system variables and user variables. Once I created the first variable in Control Flow Part, the first thing I noticed was all the variables were task scoped.

image

image

I did not understand the purpose of not having the flexibility of changing the variables scope in SSIS Control Flow Package Part Designer. Next thing, I tried was adding another task by assuming I can choose which task to execute during the execution based on the task enable/disable property. Immediately I got an error saying “Control flow package part need to contain exactly one executable.” This error explains why variables were task scoped only. As only one executable allowed in package parts, having task scoped variables will not lead to any confusion when the same name variables already exist in the SSIS Packages.

image

Well for my purpose I need two Execute SQL Tasks, one at the begin and one at the End. I thought I can just add two Execute SQL Tasks but by looking at the limitations the only way I can achieve this is by creating two different package parts. Hence, I created two SSIS Package Parts. Then I was curious to see why entire SSIS toolbox available in the package part designer when you can not add more than one executable? E.g., Containers. Later I placed a Sequence Container, and surprisingly I can add more than one Execute SQL tasks, and it executes perfectly fine. So if you want to use more than one task in a package part then use Sequence Container. It nicely differentiates each executable in Package Part configuration.

Sequencecontainer 2SequenceContainer1

I thought I will anyway create two separate Execute SQL Tasks because package part clearly defines what it meant to do. Both Package Parts I created has task scoped variables and SQL Statement Source as an expression.

2016-06-14_16-24-52.jpg

I can see my two SSIS Package Parts in the SSIS Tool Box. I can just drag and drop these two SSIS Package Parts into Control Flow Template I can quickly identify SSIS Package part by looking at little P icon on the top right, and also I can see it uses the expression with little fx icon on top left. Initially, I just want to see how my package works with new package parts.

And basic execution worked perfectly fine. I needed another SSIS Audit End package part with failed Execution result, as I can use package parts inside SSIS packages as many times as I want and I was using user variables for execution result value, so I just changed the User Variable Execution Result value to get the desired result. It is better to use variables whenever possible to make Package parts more dynamic.

image_thumb-31

 Next the tricky bit; how I can pass the Row count? I have package scoped variables which hold the row count. The only way I can think of doing this was manually changing the scope of the variables. ( I am so glad it was not restricted 🙂 )

image

Once I manually changed the value, I was able to use these two variables in the Data Flow Task. And the job was done. My package worked just like the way I want.

If I want I can change the scope of the variables of all control flow parts, if same variable value already exists in the package SSIS automatically renames the newly moved variable or we can just delete the control flow part variables which we already have under Package scope

image

Well, it works, with few workarounds, I did manage to achieve what I want.  But there were few hiccups while I was developing this small solution and imagine creating a robust solution.

Blank Screen:

The first and most annoying thing were the blank screens. I don’t know it was my laptop or installation, but a blank screen with no error information, Really !?!?

image

Well, I first got this error when I opened a package part which used as a reference in another SSIS Project. Not only that there were several other occasions, I saw this blank screen, and I really hope it was my PC issue than SSIS. Every time I got this blank screen error, I deleted the package part from solution and recreated.

Package Parts Execution:

Another unbelievable thing was there is no execution or debugging option for package Parts. Really? We can see brake points option available in SSIS parts. However, there is no use. There is no way I can execute a package part with in Package Part designer, the only way I can execute by placing it in an SSIS package.

image

I am sure everyone want to test their package part before they put it in the SSIS Package, there are so many other scenarios where you need to execute your task or executable before you use it somewhere…

Reusability:

As I mentioned earlier, the primary use case of package parts is code reusability. But if there is no track of where it has been used then it is a difficult job to control changes. If I make changes to package part of a solution, and the package which uses this package part was still open, then I get a notification about reference, apart from that there was no other way I can see where the package part has been used.

PPsave.jpg

When I add an existing package part as a reference to the project and make changes accidently. imageThe other packages which are using the package part will get the blank screen (At least on my PC!). When the package part added as a reference and if you change or rename it, another package where it has been used or created will have errors. But you can modify the name once it is in SSIS package, should not change the file name!

Expressions:

Package part does allow expressions, and you can see package part using expressions, but these expressions are allowed at an executable level only. For, e.g., if you have a sequence container with two SQL Tasks in package part. You can see expressions used at sequence container level but not the expression used at Execute SQL Task level however if your package part has only one executable then you can see all underlined expressions (It is better not to change expressions when using package part as a reference or added as a reference to the project)

2016-06-26_00-03-15.jpg

Package part with Sequence Container and multiple Execute SQL tasks with Expressions

2016-06-26_00-06-20.jpg

Package Part with single Execute SQL Task using Expressions

Connection Strings:

Package Part brings all connection string along with it. These connection strings are not visible in connection managers window, but you can see them under any other tasks connection managers or in configuring/ execution window of SSISDB Catalog. All package parts connections were prefixed by Package Part. To avoid confusion, it is better to name all packages with relevant name

2016-06-26_00-13-392016-06-26_00-51-42

Visual Studio Crashes:

Visual Studio crashes and when I reopened the project I lost all my references with package part. None of my package parts has P symbol,  after crash all package parts look and behave just like regular tasks. In a way, it is nice because my package still works but I lost all the references, though. (If package documentation says, to edit package part go to x location and make changes. That wouldn’t work! )

Some points:

  1. No option to change the connection string

    image

  2. The only warning you get is

    image

  3. We can not use a package part inside a package part.
  4. The deployment will not deploy Package Parts to SSISDB.
  5. No debugging options were available for package parts in the package part designer.
  6. Each Package part should contain exactly one executable – We can not create a control flow package part with two executable, SSIS will not allow saving the package part. But we can create a package part with no executable, but SSIS will not enable you to place it in Control Flow. In both scenarios, it raises same error.

Summary:

Do I like it? Don’t know. Will I use it? Depends.
It can be a great addition to any SSIS package but version 1. Needs a lot of improvements. If you are thinking of using it, I believe that you need to have strict standards in place. I would like to create a solution with all the package parts and make sure to put them under source control and have strict rules related to change control and annotations will probably make it easy.  Well it’s just my opinion after spending couple of days with Control Flow Package Parts, would love to hear what you think.

Next blog post in the series will be Logging…

Till next time,

Prathy 🙂

7 comments
0 FacebookTwitterPinterestEmail

This blog post has become longer than I thought, so decided to post as two parts. First part covers introduction and second is bit more in detail – hope it makes it easy to read!

Control flow Package Parts, one of the SSIS 2016 features which excited me most when Matt Masson did a session about imagewhat’s new in SSIS and MDS at Pass Summit 2016.Whenever there is something new, it is quite common to get positive and negative comments. But it seems like SSIS Control Flow Templates or the new name SSIS Control Flow Package Parts ( Does it mean, we can expect other types of package parts in the future?) received more negative comments than positive. I have used Control Flow template briefly with my previous blog post example; however, this blog post gives me a chance to look in detail to learn and understand what are missing from my expectations.

Creating an SSIS Control Flow Package Part:

For me the only thing which is straight forward with Package Parts is creating a Package Part. Creating a new package part is very easy. You can simply go to Package Parts section under the project where you wanted to create one, right-click and when you click on the new control flow Package Part, it creates a new .dtsxp file. You can not only create one but also can import existing one. Once it is created it looks just exactly same as SSIS package with same SSIS Tool Box.

image

Package Part in Solution Explorer

image

Creating a New Control Flow Package Part

Adding an Existing Control Flow Package Part:

You can add existing package part two ways: as a reference or without reference

image

When you add as a reference, package part still reference to the original location of the file, and when you add it as a non-reference, it copy’s the file to the new project location. We can see the difference by looking at the source file path or icon of the package part; it is a tiny icon, though.

image

Obviously, when there is a reference, then all the underlined changes will get affected. Which option to choose is entirely based on the use case? I will explain bit more about this in the next blog post, because it has it’s own pro’s and con’s.

 Using Package Part:

Once the Package Part created or added, you can see them in SSIS Tool Box. We can simply drag and drop them into Control Flow as any other SSIS Control Flow Tasks.

Configuring Control Flow Package Part:

In the configuration window of package part, you can see two tabs one to configure connection manager and one for properties. If you want to change the value of a property then, you can click on the property and change the property path. If the underline package part using any expressions, it brings back those expressions too, it is just same as copying and pasting existing task, but here it brings the connection with itself also better GUI to change the properties. Package part allows only one executable, but if you want to add more than one executable, then you can use Sequence Container. It is useful in certain cases, like when you don’t want to create multiple package parts; by using a sequence container you can add more than one executable and use enable/disable property of executable for execution.

image

Package Part Properties configurations window

image

Package Part Connection Managers configurations window

image

Package Part configurations for Sequence Container with multiple executables

That’s all you can just drag and drop this pre-created SSIS Control Flow Package Parts in any SSIS package you want. If require, you can configure the properties of package part. Easy 🙂

But unfortunately it is not as straight forward. Variables doesn’t work the way you expected, you get unexpected blank screen as an error and many more. I will address all these in the next blog post…

Till Then,

Prathy

2 comments
0 FacebookTwitterPinterestEmail

SSIS has always been my favourite among all MSBI tools. I think it is ETL which interests me than specific SSIS. As an SSIS fan girl, SQL Server 2012 release was my favourite. It was a huge release with numerous improvements (Don’t forget rounded corners ). If we compare SQL Server 2012 and 2016 releases, there are not as many exciting enhancements in the release, but some noteworthy enhancements are there. Hence, this blog post is to look into new features of SSIS 2016 in detail, so I will get a chance to explore them. I am going to publish this post as series so it will be convenient to read and write Smile

Package Deployment:

One of the most complained or requested feature after SSIS 2012 release was package deployment. I am not a huge fan of this, but I can see why people were missing this feature. I have seen people debating about SSIS, saying “SSIS is same as any other development so you deploy entire solution at a time, not individual packages”. Probably Microsoft also considered SSIS development as any other development and introduced project deployment in SSIS 2012. I totally agree with the argument, and I would like to do that way, but most of SSIS developers deploy individual packages…

Well, it’s nice to see package deployment back in SSIS 2016. So what happens when you decided to deploy individual packages?

We all know how to do project deployment, new incremental package deployment in SSIS 2016 is also same as project deployment, but you can right click on the individual package or multiple selected packages and choose deploy.

image

All the rest of the steps are same as the previous deployment. Though, if you want to see what packages are selected or if you changed your mind about deployment after you started deployment wizard; you can click on the previous button which gives us a new window in deployment wizard where we can choose the packages you want to deploy. One thing I noticed here is you can deselect, but you are not allowed to select new one’s, well you can select new one’s but wizard will not allow you to go to the next step (Next button was greyed out)

Usually, SSIS does project deployment from .ispac ( Integration Services Project Deployment File) file. When you choose package deployment instead of using .ispac file, it selects the packages which you wanted to deploy. You can see in the following screenshot only selected packages were loaded into deployment wizard. Also instead of choosing .ispac location deployment wizard is using Packages folder path. If you change deployment model to project deployment, it will ask for .ispac file path

image

Package Deployment

2016-06-06_01-16-07

Project Deployment

When we look at deployed packages in SSISDB, surprisingly we can see all packages are deployed irrespective of incremental deployment.

SELECT [package_id]
      ,[project_version_lsn]
      ,[name]
      ,[package_guid]
      ,[description]
      ,[version_build]
      ,[version_comments]
  FROM [SSISDB].[internal].[packages]

In the following screenshot, we can see, yellow highlighted is the deployment where ChildPackage2 and Master Package selected and in the green highlighted one only master package has been chosen to deploy.When we look at version_build column value, we get the assurance that it is indeed incremental package deployment. All packages have a row in [SSISDB].[internal].[packages] because of [project_version_lsn] which is the column used to related to project table. So all the relationships between different tables in SSISDB will work smoothly.

image

[SSISDB].[internal].[packages]

Is new package deployment model addressing existing issues?

  1. We still can not reference packages from different project or packages from SSISDB using Execute Package Task. Before SSIS 2012 we were able to reference to SSIS Package Store and execute packages from different solution or deployed solutions. There are various workarounds to achieve this; I prefer using Execute SQL task to call Stored Procedure to execute a deployed package but easy reference in EPT is something still missing for me.
    image

    (EPT) Execute Package Task

    2. Parameter Bindings in Execute Package Task:

    Execute Package Task still does not support variables; If you want to send parent package variable value to a child package variable, you first need to create a parameter in the child package and then use that parameter value as an expression in the variable where you wanted to use the value from the parent package.
    No biggie but most of the tasks still does not support parameters. e.g. We can not use SQL Command from Parameter in OLEDB Source, or ExcuteSQLTask result set can not be a parameter or For each Loop variable mapping does not support parameters.

image

Wrap-Up:

As I mentioned earlier, It is nice to see incremental package deployment. It’s great that Microsft listened to user’s voice and implemented this feature.  I don’t know what’s going on behind the screens, but when Packages deployed individually to a server, SSISDB nicely references the previously deployed packages. Probably just checking the version build of all packages and where it is same it just copy the existing packages metadata. But if they managed package deployment, I don’t think it will be difficult for Microsoft to develop SSISDB reference under Execute Package Task, maybe in the next release …

Next blog post in this series will be my favourite feature, Control Flow Templates.

Till next time,

Prathy 🙂

0 comment
0 FacebookTwitterPinterestEmail

After years of hesitation finally, I think I am ready to write my first blog post, or I should say I am inclined to publish a public blog post. I chose SSIS & R as the subject because I like SSIS and also since Microsoft released Power Query, I was always curious to extract Twitter data, but it was never too easy for me. Now SQL Server R Services made it quite easy.

One of my favorite feature among all new SQL Server 2016 features is integration with R. I don’t want to mislead saying I understand it very well, but I can proudly say I am trying to learn and trust me it is a good language to learn 🙂 . So note, you may find poorly written R script in this post. Also, Microsoft has decided to integrate R wherever feasible. Hence, there are different methods of using R, but I chose SSIS so I can demonstrate how we can use R in SSIS.

To achieve our purpose, first thing first, we need to enable R Services in our SQL Server 2016 instance. An excellent blog post by Henk Vadrevalk explains all about, how to install and configure SQL Server R Services http://henkvandervalk.com/introducing-microsoft-sql-server-2016-r-services . After SQL Server R Services enabled and ready run, you can execute the following script to find out all pre-installed packages and find R Services works or not.

EXECUTE  sp_execute_external_script
@language=N'R'
,@script = N'str(OutputDataSet);
packagematrix <- installed.packages();
NameOnly <- packagematrix[,1];
OutputDataSet <- as.data.frame(NameOnly);'
,@input_data_1 = N'SELECT 1 as col'
WITH RESULT SETS ((PackageName nvarchar(250) ))]

imageOnce we convinced that R services were working, then we are ready to proceed! The only challenge I faced was to figure out how to install new packages.I used RStudio to install packages and moved them across to SQL 2016 folder. Obviously, that was not the exact process, the blog post by Microsoft explains how to install additional packages: http://henkvandervalk.com/introducing-microsoft-sql-server-2016-r-services. As we saw in the above code, we are doing nothing but executing a stored procedure which has R Script in it. So, I’ll use Execute SQL Task in SSIS to run SQL Server R script to extract data by using one of the available open source R libraries, simple 🙂 . I will be using TwitterR Package/library, there are many other free libraries available, but I think it’s a good way to start; just because it is simple. To extract data using Twitter API first we need to create a Twitter App.

Twitter API:

As per Twitter documentation, there are four different objects that you’ll encounter in the API: Tweets, Users, Entities and Places, each object has several properties.For the sake of simplicity, I am looking at Tweets only. I just want to search Twitter based on a hashtag and extract a certain number of tweets just to show how it works.I created a twitter app called SQLServerRServices. Once the app created we can get: api_key, api_secret, access_token and access_token_secret which are required to access Twitter API. Next step is writing R Script.

R Script:

I have to admit writing R in the SQL Server Management Studio is not fun.I felt like it doesn’t have debugging capabilities which eventually made my troubleshooting a tough task. To overcome this, I used R-Studio to write R script until I was sure that my script was working. It was a lot easier compared to SSMS! Here is the script I used! (I replaced my API key’s with xxxx)

R Script using TwitterR:

/*Stored procedure to execute R Script in SQL Server 2016*/
execute sp_execute_external_script
@language = N'R'

/*R Script*/
, @script = N'

#Load the necessary packages
library(base64enc);
library(twitteR);
library(ROAuth);
library(bitops);
library(RCurl);
library(httr);

#OAUTH
oauth_endpoints("twitter");

#API keys from Twitter APP
api_key <- "xxxxxxxxxxxxxxx";
api_secret <- "xxxxxxxxxxxxxxxxxxx";
access_token <- "xxxxxxxxxxxxxxxxxxxxxxxx";
access_token_secret <- "xxxxxxxxxxxxxxxxxxxxxxxxxx";
setup_twitter_oauth(api_key,api_secret,access_token,access_token_secret);

#searchTwitter - is TwitterR function to search Twitter
PowerBITweets = searchTwitter("PowerBI", n=10);

#Create a data frame with extracted data
df <- do.call("rbind", lapply(PowerBITweets , as.data.frame))

#Choose the selected columns
frame<-as.data.frame(df[,c(11,5,3,12,15,16)])
', @input_data_1 = N''

#Send created data frame as output result set
, @output_data_1_name = N'frame'
with result sets (([Handle] NVARCHAR(max),[DateTime] DATETIME,[FavCount] int,[RetweetCount] int,[Longitude] NVARCHAR(100),[Latitude] NVARCHAR(100) ));

Script in Detail:

We can execute R in SQL Server 2016 by using stored procedure called sp_execute_external_script. @Language represents the language we want to use in the script (It looks like we could probably able to run Python or any other languages shortly 🙂 ). Next step is to load all required packages.

After loading all the packages then it’s OAUTH handshake. With supplied API access tokens and key’s, we can authenticate Twitter using following bit of code.

oauth_endpoints("twitter");

#API keys from Twitter APP
api_key <- "xxxxxxxxxxxxxx";
api_secret <- "xxxxxxxxxxxxxx";
access_token <- "xxxxxxxxxxxxxx-xxxxxxxxxxxxxx";
access_token_secret <- "xxxxxxxxxxxxxx";
setup_twitter_oauth(api_key,api_secret,access_token,access_token_secret);

Next step is to search Twitter by using SearchTwitter function. This feature works based on supplied string.There are several arguments we can supply; here I am providing two arguments: #PowerBI -Search String  and n – maximum number of tweets

SQLBitsTweets = searchTwitter(“#PowerBI”, n=10);

SearchTwitter function returns the list of status objects. In the next step, I am using three functions to convert returned list into a data frame.

#Create a data frame with extracted data df <- do.call(“rbind”, lapply(SQLBitsTweets , as.data.frame))

This data frame contains different data column, but I am interested in only a few columns, so I chose the columns I am interested in the next step.

#Choose the selected columns frame<-as.data.frame(df[,c(11,5,3,12,15,16)])

I am not supplying any input data to the query. Hence; my @Input_data_1 value is blank

‘, @input_data_1 = N”

For output, I want to return the selected data, so I gave the data frame name and in the result sets I given the column names and datatypes.

, @output_data_1_name = N’frame’ with result sets (([Handle] NVARCHAR(max),[DateTime] DATETIME,[FavCount] int,[RetweetCount] int,[Longitude] NVARCHAR(100),[Latitude] NVARCHAR(100) ));

It’s impossible to know all the column names and data types beforehand; that’s when RStudio comes handy. We can do this with SSMS too, but RStudio is easy

Identifying data types:

One of the beauties of RStudio is we can see the Environment values and data. In the following image on the right side, you can Data and Values. Under data df is the data frame created when executed the code: df <- do.call(“rbind”, lapply(PowerBITweets , as.data.frame))  image

When we click on the little table icon next to df, it opens the dataframe in a new window, where we can see the data and column names.

image

To identify the data types we can use class or type of to get the list of column names and respective data types. This worked for me for this example, but if you want to know more about it, there is a debate in the stack overflow. Beware it’s a quite old post but I found interesting: http://tinyurl.com/jj9c3e9

image

Finally, we are sure that RScript is working with Twitter API. Next step is to execute and schedule this extract using SSIS.

Creating SSIS Package:

I created an SSIS control flow template to retrieve Twitter data. This template has one Execute SQL Task with 4 task scoped variables:

TwitterHashTag – Search String

NoOfTweets – Number of tweets I want to obtain based on my search

Script – I want to make R script dynamic using  SSIS variables. Hence, I am using another variable called the script with Expression. I can use SQL statement as an expression in the SSIS Execute SQL Task but using variables makes it easy to document, and I can use BIDS Helper to create and modify variables.

imageimage

I am going to use this template in my SSIS package. Creating SSIS Control Flow Template is another blog post. There some excellent blog posts about control flow templates by Koen Verbeeck. Editing control flow template is easy, here I wanted to use the extracted data/ result set in the Data Flow Task as a source for that reason I changed the scope of the TwitterExtract Control flow template variable to Package level.

image

image

Next step is to prepare Data Flow Task. In the data flow task, I used Script Component as a Source.

image

There is no point of me repeating, how to use Sys Object variable as the source for the data flow task when Tim Mitchell explained it all.I created all columns in the result set as output columns in my source component. Again I used same data types; I used for R Script output parameter.

image

 

 

 

 

 

 

 

 

 

And the C# script I used in the Script Component Source

public override void CreateNewOutputRows()
{
OleDbDataAdapter oleDA = new OleDbDataAdapter();
DataTable dt = new DataTable();
oleDA.Fill(dt, Variables.RResultSet);

foreach (DataRow dr in dt.Rows)

{

ScriptOutputBuffer.AddRow();

ScriptOutputBuffer.Handle = (dr["Handle"].ToString());

ScriptOutputBuffer.DateTime = DateTime.Parse(dr["DateTime"].ToString());

ScriptOutputBuffer.FavCount = int.Parse(dr["FavCount"].ToString());

ScriptOutputBuffer.RetweetCount = int.Parse(dr["RetweetCount"].ToString());

ScriptOutputBuffer.Latitude = (dr["Latitude"].ToString());

ScriptOutputBuffer.Longitude = (dr["Longitude"].ToString());

}

}

Script component generates the output buffer which can be used to connect to OLEDB destination with SQL Server connection.That’s all; now we can schedule and see this data. We don’t have to use SSIS to create one simple SQL Server table, but I am using this as an example.

Here we go now we can see who is tweeting about PowerBI and who’s tweets got more retweeted and favorited. I have not done testing by going to each profile and see how many tweets were POWERBI tweets. My only testing is, I can see familiar Twitter handles which I regularly follow ;). For this example, I extracted only 1000 tweets, but we can extract tweets based on tweet time and schedule to get all tweets between the given time. Using some other TwitterR function, we can also get user profile and their profile images which can lead to an excellent interactive PowerBI report. I tried to publish to the web feature with my Wordpress blog, for some reason, Wordpress decided not to insert an iframe; until I figure out the workaround here is the URL where you can see my quick and dirty report – https://app.powerbi.com/view?r=eyJrIjoiYzg5YTUwN2EtYmMzZi00YzRlLTlkZTgtNzQzZjJiOTNiMjUyIiwidCI6ImExMjczNjNlLTcxNzAtNDg0Yi1iZDY4LTAyYjgwZjIzZDI4MiIsImMiOjh9

PowerBI File

Have fun,

Prathy

 

2 comments
5 FacebookTwitterPinterestEmail