
This blog post became longer than I expected, so I decided to publish it in two parts: the first covers the introduction, and the second goes into a bit more detail – hopefully that makes it easier to read!

Control Flow Package Parts is the SSIS 2016 feature which excited me most when Matt Masson did a session about what’s new in SSIS and MDS at PASS Summit 2016. Whenever there is something new, it is quite common to get both positive and negative comments, but SSIS Control Flow Templates – or, under the new name, SSIS Control Flow Package Parts (does that mean we can expect other types of package parts in the future?) – seem to have received more negative comments than positive. I used a Control Flow template briefly in my previous blog post example; this blog post gives me a chance to look at them in detail, to learn and understand what is missing from my expectations.

Creating an SSIS Control Flow Package Part:

Creating a package part is the only thing about Package Parts that I find straightforward; it is very easy. Simply go to the Package Parts section under the project where you want to create one, right-click, and choose New Control Flow Package Part: this creates a new .dtsxp file. You can not only create a new one but also import an existing one. Once created, it looks exactly the same as an SSIS package, with the same SSIS Toolbox.

[Image: Package Part in Solution Explorer]

[Image: Creating a new Control Flow Package Part]

Adding an Existing Control Flow Package Part:

You can add an existing package part in two ways: as a reference or without a reference.


When you add it as a reference, the package part still points to the original location of the file; when you add it without a reference, the file is copied to the new project location. You can see the difference by looking at the source file path, or at the icon of the package part – it is a tiny icon, though.


Obviously, when there is a reference, all changes to the underlying file are picked up by every project that references it. Which option to choose depends entirely on the use case; I will explain a bit more about this in the next blog post, because each approach has its own pros and cons.

Using a Package Part:

Once the Package Part is created or added, it appears in the SSIS Toolbox, and you can simply drag and drop it into the Control Flow like any other SSIS Control Flow task.

Configuring a Control Flow Package Part:

In the configuration window of a package part you can see two tabs: one to configure connection managers and one for properties. If you want to change the value of a property, you click on the property and change the property path. If the underlying package part uses any expressions, it brings those expressions across too. It is much the same as copying and pasting an existing task, but here it brings its connections with it and gives a better GUI for changing properties. A package part allows only one executable; if you want more than one, you can use a Sequence Container. This is useful in certain cases, for example when you don’t want to create multiple package parts: with a sequence container you can add more than one executable and use the Enable/Disable property of each executable to control execution.

[Image: Package Part Properties configuration window]

[Image: Package Part Connection Managers configuration window]

[Image: Package Part configuration for a Sequence Container with multiple executables]

That’s all: you can drag and drop these pre-created SSIS Control Flow Package Parts into any SSIS package you want and, if required, configure the package part’s properties. Easy 🙂

But unfortunately it is not that straightforward. Variables don’t work the way you would expect, you get an unexpected blank screen as an error, and there is more. I will address all of these in the next blog post…

Till Then,

Prathy


SSIS has always been my favourite among all the MSBI tools; I think it is ETL in general that interests me more than SSIS specifically. As an SSIS fan girl, the SQL Server 2012 release was my favourite: it was a huge release with numerous improvements (don’t forget rounded corners). If we compare the SQL Server 2012 and 2016 releases, there are not as many exciting enhancements this time, but there are some noteworthy ones. Hence this blog post, to look into the new features of SSIS 2016 in detail so I get a chance to explore them. I am going to publish this post as a series, which makes it convenient to read and write 🙂

Package Deployment:

One of the most complained-about, or requested, features after the SSIS 2012 release was package deployment. I am not a huge fan of it, but I can see why people missed this feature. I have seen people debating about SSIS, saying “SSIS is the same as any other development, so you deploy the entire solution at a time, not individual packages”. Probably Microsoft also considered SSIS development to be like any other development and introduced project deployment in SSIS 2012. I totally agree with the argument, and I would like to work that way, but most SSIS developers deploy individual packages…

Well, it’s nice to see package deployment back in SSIS 2016. So what happens when you decide to deploy individual packages?

We all know how to do project deployment. The new incremental package deployment in SSIS 2016 is much the same, except that you right-click on an individual package, or a selection of packages, and choose Deploy.


All the remaining steps are the same as in the previous deployment. However, if you want to see which packages are selected, or if you change your mind after starting the deployment wizard, you can click the Previous button, which brings up a new window in the wizard where you can choose the packages to deploy. One thing I noticed here: you can deselect packages, but you are not allowed to select new ones – well, you can select them, but the wizard will not let you go to the next step (the Next button was greyed out).

Usually, SSIS does project deployment from an .ispac (Integration Services Project Deployment File) file. When you choose package deployment, instead of using the .ispac file it picks up the packages you chose to deploy. You can see in the following screenshot that only the selected packages were loaded into the deployment wizard; also, instead of an .ispac location, the wizard is using the Packages folder path. If you change the deployment model back to project deployment, it asks for the .ispac file path.

[Image: Package Deployment]

[Image: Project Deployment]

When we look at the deployed packages in SSISDB, surprisingly we can see that all packages are deployed, irrespective of the incremental deployment.

SELECT [package_id]
      ,[project_version_lsn]
      ,[name]
      ,[package_guid]
      ,[description]
      ,[version_build]
      ,[version_comments]
  FROM [SSISDB].[internal].[packages]

In the following screenshot, the yellow highlight is the deployment where ChildPackage2 and Master Package were selected, and in the green highlight only the master package was chosen for deployment. When we look at the version_build column values, we get assurance that it is indeed incremental package deployment. All packages have a row in [SSISDB].[internal].[packages] because of [project_version_lsn], which is the column used to relate to the project table, so all the relationships between the different tables in SSISDB keep working smoothly.

[Image: [SSISDB].[internal].[packages]]
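
If you want to see that relationship for yourself, a minimal sketch along these lines should work; the column names come from my SSISDB 2016 instance (internal.projects keeps its version in object_version_lsn), so verify them on yours:

SELECT pr.[name] AS project_name
      ,pk.[name] AS package_name
      ,pk.[version_build]
  FROM [SSISDB].[internal].[packages] pk
  JOIN [SSISDB].[internal].[projects] pr -- one row per deployed project version
    ON pk.[project_version_lsn] = pr.[object_version_lsn]
 ORDER BY pr.[name], pk.[name];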

Is the new package deployment model addressing the existing issues?

1. We still cannot reference packages from a different project, or packages deployed to SSISDB, using the Execute Package Task. Before SSIS 2012 we were able to reference the SSIS Package Store and execute packages from a different solution or from deployed solutions. There are various workarounds to achieve this; I prefer using an Execute SQL Task to call the stored procedures that execute a deployed package (see the sketch after this list), but an easy reference in EPT is something that is still missing for me.
[Image: (EPT) Execute Package Task]

2. Parameter Bindings in Execute Package Task:

The Execute Package Task still does not support variables. If you want to send a parent package variable value to a child package variable, you first need to create a parameter in the child package and then use that parameter (for example, @[$Package::ParameterName]) in an expression on the variable where you want the parent’s value. No biggie, but most tasks still do not support parameters either: we cannot use SQL Command from a parameter in an OLEDB Source, an Execute SQL Task result set cannot be a parameter, and Foreach Loop variable mappings do not support parameters.

[Image: Parameter bindings in the Execute Package Task]
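
For completeness, here is the shape of the Execute SQL Task workaround mentioned in point 1: a minimal sketch using the standard SSISDB catalog procedures (the folder, project and package names are hypothetical).

DECLARE @execution_id BIGINT;

-- Create an execution for a package already deployed to the SSISDB catalog
EXEC [SSISDB].[catalog].[create_execution]
     @folder_name  = N'MyFolder'          -- hypothetical folder
    ,@project_name = N'MyOtherProject'    -- hypothetical project
    ,@package_name = N'ChildPackage.dtsx' -- hypothetical package
    ,@use32bitruntime = 0
    ,@execution_id = @execution_id OUTPUT;

-- SYNCHRONIZED = 1 makes the call wait until the package finishes,
-- so the Execute SQL Task behaves like a child package execution
EXEC [SSISDB].[catalog].[set_execution_parameter_value]
     @execution_id
    ,@object_type     = 50 -- 50 = system parameter
    ,@parameter_name  = N'SYNCHRONIZED'
    ,@parameter_value = 1;

EXEC [SSISDB].[catalog].[start_execution] @execution_id;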

Wrap-Up:

As I mentioned earlier, it is nice to see incremental package deployment, and it’s great that Microsoft listened to users’ voices and implemented this feature. I don’t know what’s going on behind the scenes, but when packages are deployed individually to a server, SSISDB nicely references the previously deployed packages; probably it just checks the version build of all packages and, where it is the same, copies the existing package metadata. But if they managed that for package deployment, I don’t think it would be difficult for Microsoft to develop an SSISDB reference under the Execute Package Task. Maybe in the next release…

The next blog post in this series will be about my favourite feature, Control Flow Templates.

Till next time,

Prathy 🙂


While I am in the flow of blogging, I want to write about an issue which troubled me recently; it definitely deserves a blog post, so that I will know where to look when I face a similar problem again. Recently I started working with Teradata. Obviously, being a Microsoft person, I have not enjoyed it much, but luckily my job mostly involved writing SELECT queries rather than anything more complicated.

However, I came across a scenario where I needed to bring a few thousand rows into SQL Server, which involved complex operations over millions of rows of data on Teradata and then matching/joining/lookups against different data sources using SSIS. It was a case where Teradata volatile tables come in handy: I wanted to do all the complex joins and operations on the Teradata box and extract only the small set of data that required further transformations and lookups. I thought I could easily do this with an ADO.NET connection and variables. Unfortunately, it was not that easy with my limited C# skills, which ultimately led to this blog post.

So in this post I am going to cover two topics: how to use Teradata volatile tables, and how to use an SSIS Object variable as a source with managed ADO.NET connections.

Teradata and SSIS:

The Attunity Teradata components are a popular way to extract data from Teradata databases. I have read that they perform better than the ADO.NET source component. I have not made any comparisons, but as per the Microsoft document, “Attunity Teradata Components use the Teradata Parallel Transporter API and internal buffering with Teradata PT operators directly rather than requiring access through a managed interface”. The paper also concludes that “The Microsoft Connector for Teradata by Attunity provides a high-performance means of loading and unloading data to and from Teradata databases”. In my scenario I wanted to work with volatile tables, so I could not use the Attunity Teradata components; hence I went for the ADO.NET connection type. The very first thing I needed to do while working with Teradata was setting the query band.

Setting The Query Band:

I used an Execute SQL Task to set the query band, with an ADO.NET connection type and a System DSN. I wanted this query band to apply to all the queries in the session, which is why I set the connection manager’s RetainSameConnection property to TRUE.

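The statement inside that Execute SQL Task looks something like the following; the name/value pairs are my own examples, and Teradata requires the trailing semicolon after each pair:

SET QUERY_BAND = 'ApplicationName=SSIS;TaskName=VolatileLoad;' FOR SESSION;

With RetainSameConnection set to TRUE, every later task that uses the same connection manager runs in this session, so the query band – and any volatile tables – survive across tasks.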

Creating a Volatile Table:

In the next step, I created a volatile table using another Execute SQL Task (a sketch of the statement follows the screenshot). If you want, you can put all the statements in one Execute SQL Task, but I prefer to keep them separate for readability. Voila, we are done creating our first volatile table using SSIS, and we can use this table in any of our further operations. I wanted to select data from the volatile table I had just created and use it as a source in my Data Flow Task for further transformations.

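The statement is an ordinary Teradata CREATE VOLATILE TABLE; the table and column names in this sketch are hypothetical:

CREATE VOLATILE TABLE SalesSummary AS
(
    SELECT c.CustomerId
          ,SUM(s.SalesAmount) AS TotalSales -- the heavy lifting stays on the Teradata box
      FROM SalesHistory s
      JOIN Customers c
        ON c.CustomerId = s.CustomerId
     GROUP BY c.CustomerId
)
WITH DATA
PRIMARY INDEX (CustomerId)
ON COMMIT PRESERVE ROWS; -- keep the rows for the rest of the session

ON COMMIT PRESERVE ROWS matters here: the default is to delete the rows at the end of the transaction, and without it the later SELECT would come back empty.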

I selected data using an Execute SQL Task and returned the full result set to use as a source in the Data Flow Task, pretty much the same as in my previous post, which was based on a Tim Mitchell blog post. But here comes the problem: when I used the same script as in my last blog post, I was getting the following error.

[Image: “Object is not an ADODB recordset” error]

ADO.NET Connection:

I thought I must have typed something wrong, so I went back to my script, checked it a few times, and changed a few things with my minimal C# knowledge, but with no success. I tried the same example with SQL Server and an OLEDB connection, which worked fine, so I thought something was wrong with Teradata (probably the SSIS Execute SQL Task was not supporting Teradata). But when I looked at the error in detail, it clearly says Object is not an ADODB recordset. With a bit of googling I found that this error is common in the .NET world, and I also found an interesting blog post by Valentino Vranken explaining the SSIS Object package variable and how it depends on the connection type. For the first time I realised that the Execute SQL Task result set depends on the connection type. As Valentino’s blog post explains, in debug mode I can clearly see that the Value type property of the two System.Object variables is different.

[Image: Value types of the two System.Object variables in debug mode]

ADO.NET provides a System.Data.DataSet and OLEDB provides a System.__ComObject. Basically, with OLEDB I used the Fill method to extract data from the variable, but with ADO.NET I was getting the error. Surprisingly, I can use the Fill method in the Control Flow with a Script Task but not in the Data Flow Task with a Script Component. When I used Fill in the Script Component as a source, I was getting the following error:

‘DataTable’ does not contain a definition for ‘Fill’ and no extension method ‘Fill’ accepting a first argument of type ‘DataTable’ could be found (are you missing a using directive or an assembly reference?)

But it works fine in the Control Flow with a Script Task. Frankly, I do not know why, but I got around it by casting the variable to a DataSet. Here is the C# code I used with the two connection types.

OLEDB Connection:

using System;
using System.Data;
using System.Data.OleDb;

public override void CreateNewOutputRows()
{
    // With an OLEDB connection the Object variable holds an ADODB recordset,
    // which OleDbDataAdapter.Fill can load into a DataTable
    OleDbDataAdapter oleDA = new OleDbDataAdapter();
    DataTable dt = new DataTable();
    oleDA.Fill(dt, Variables.OLEDB);

    foreach (DataRow dr in dt.Rows)
    {
        OLEDBBuffer.AddRow();
        OLEDBBuffer.Firstname = dr["Firstname"].ToString();
        OLEDBBuffer.Lastname = dr["Lastname"].ToString();
    }
}

ADO.NET Connection:

using System;
using System.Data;

public override void CreateNewOutputRows()
{
    // With an ADO.NET connection the Object variable holds a System.Data.DataSet,
    // so cast it directly instead of calling OleDbDataAdapter.Fill
    DataSet ds = (DataSet)this.Variables.ADO;
    DataTable dt = ds.Tables[0];

    foreach (DataRow dr in dt.Rows)
    {
        ADOBuffer.AddRow();
        ADOBuffer.Firstname = dr["Firstname"].ToString();
        ADOBuffer.Lastname = dr["Lastname"].ToString();
    }
}

Finally, it worked. This may not be the proper way of doing it, but it worked for my scenario. ADO.NET connections usually don’t perform as well as other connection types like OLEDB, but in my scenario there was no impact on performance. If someone knows why the Fill method works with a Script Task but not with a Script Component, please let me know 🙂

You can download the SSIS package from here: http://tinyurl.com/SSISTeradata (for security reasons I did not include the Teradata part, but it has the two connection types with SQL Server).

Enjoy,
Prathy


After years of hesitation, I think I am finally ready to write my first blog post – or I should say I am inclined to publish a public blog post. I chose SSIS & R as the subject because I like SSIS, and ever since Microsoft released Power Query I have been curious about extracting Twitter data, but it was never easy for me. Now SQL Server R Services has made it quite easy.

One of my favourite features among all the new SQL Server 2016 features is the integration with R. I don’t want to mislead you by saying I understand it very well, but I can proudly say I am trying to learn, and trust me, it is a good language to learn 🙂 . So take note, you may find poorly written R script in this post. Also, Microsoft has decided to integrate R wherever feasible, so there are different ways of using R, but I chose SSIS so I can demonstrate how we can use R in SSIS.

To achieve our purpose, first things first: we need to enable R Services on our SQL Server 2016 instance. An excellent blog post by Henk van der Valk explains how to install and configure SQL Server R Services: http://henkvandervalk.com/introducing-microsoft-sql-server-2016-r-services . Once SQL Server R Services is enabled and ready to run, you can execute the following script to list all pre-installed packages and check whether R Services works.

EXECUTE  sp_execute_external_script
@language=N'R'
,@script = N'str(OutputDataSet);
packagematrix <- installed.packages();
NameOnly <- packagematrix[,1];
OutputDataSet <- as.data.frame(NameOnly);'
,@input_data_1 = N'SELECT 1 as col'
WITH RESULT SETS ((PackageName nvarchar(250)));

Once we are convinced that R Services is working, we are ready to proceed! The only challenge I faced was figuring out how to install new packages: I used RStudio to install packages and moved them across to the SQL Server 2016 folder. Obviously that is not the exact process; the blog post linked above also explains how to install additional packages: http://henkvandervalk.com/introducing-microsoft-sql-server-2016-r-services. As we saw in the code above, we are doing nothing but executing a stored procedure which has an R script in it. So I’ll use an Execute SQL Task in SSIS to run the script and extract data using one of the available open-source R libraries, simple 🙂 . I will be using the twitteR package; there are many other free libraries available, but I think it’s a good way to start, just because it is simple. To extract data using the Twitter API, we first need to create a Twitter app.

Twitter API:

As per the Twitter documentation, there are four different objects that you’ll encounter in the API: Tweets, Users, Entities and Places, and each object has several properties. For the sake of simplicity I am looking at Tweets only: I just want to search Twitter for a hashtag and extract a certain number of tweets, to show how it works. I created a Twitter app called SQLServerRServices. Once the app is created, we get the api_key, api_secret, access_token and access_token_secret which are required to access the Twitter API. The next step is writing the R script.

R Script:

I have to admit, writing R in SQL Server Management Studio is not fun. It doesn’t have debugging capabilities, which made my troubleshooting a tough task. To overcome this, I used RStudio to write the R script until I was sure it was working; it was a lot easier than SSMS! Here is the script I used (I replaced my API keys with xxxx).

R Script using twitteR:

/*Stored procedure to execute R script in SQL Server 2016*/
execute sp_execute_external_script
@language = N'R'

/*R script*/
, @script = N'

#Load the necessary packages
library(base64enc);
library(twitteR);
library(ROAuth);
library(bitops);
library(RCurl);
library(httr);

#OAuth
oauth_endpoints("twitter");

#API keys from the Twitter app
api_key <- "xxxxxxxxxxxxxxx";
api_secret <- "xxxxxxxxxxxxxxxxxxx";
access_token <- "xxxxxxxxxxxxxxxxxxxxxxxx";
access_token_secret <- "xxxxxxxxxxxxxxxxxxxxxxxxxx";
setup_twitter_oauth(api_key, api_secret, access_token, access_token_secret);

#searchTwitter is the twitteR function to search Twitter
PowerBITweets = searchTwitter("PowerBI", n=10);

#Create a data frame from the extracted data
df <- do.call("rbind", lapply(PowerBITweets, as.data.frame))

#Keep only the selected columns
frame <- as.data.frame(df[,c(11,5,3,12,15,16)])
', @input_data_1 = N''

--Send the created data frame as the output result set
, @output_data_1_name = N'frame'
with result sets (([Handle] NVARCHAR(max), [DateTime] DATETIME, [FavCount] int, [RetweetCount] int, [Longitude] NVARCHAR(100), [Latitude] NVARCHAR(100)));

Script in Detail:

We execute R in SQL Server 2016 by using a stored procedure called sp_execute_external_script. @language specifies the language we want to use in the script (it looks like we will probably be able to run Python or other languages shortly 🙂 ). The next step is to load all the required packages.

After loading all the packages comes the OAuth handshake. With the supplied API keys and access tokens, we can authenticate with Twitter using the following bit of code.

oauth_endpoints("twitter");

#API keys from Twitter APP
api_key <- "xxxxxxxxxxxxxx";
api_secret <- "xxxxxxxxxxxxxx";
access_token <- "xxxxxxxxxxxxxx-xxxxxxxxxxxxxx";
access_token_secret <- "xxxxxxxxxxxxxx";
setup_twitter_oauth(api_key,api_secret,access_token,access_token_secret);

The next step is to search Twitter using the searchTwitter function, which works on a supplied search string. There are several arguments we can supply; here I am providing two: the search string ("PowerBI") and n, the maximum number of tweets.

PowerBITweets = searchTwitter("PowerBI", n=10);

The searchTwitter function returns a list of status objects. In the next step, I use three functions to convert the returned list into a data frame.

#Create a data frame from the extracted data
df <- do.call("rbind", lapply(PowerBITweets, as.data.frame))

This data frame contains many columns, but I am interested in only a few of them, so I pick those in the next step.

#Keep only the selected columns
frame <- as.data.frame(df[,c(11,5,3,12,15,16)])

I am not supplying any input data to the query; hence, my @input_data_1 value is blank.

', @input_data_1 = N''

For the output I want to return the selected data, so I give the data frame name, and in the result sets clause I give the column names and data types.

, @output_data_1_name = N'frame'
with result sets (([Handle] NVARCHAR(max), [DateTime] DATETIME, [FavCount] int, [RetweetCount] int, [Longitude] NVARCHAR(100), [Latitude] NVARCHAR(100)));

It’s impossible to know all the column names and data types beforehand; that’s where RStudio comes in handy. We can do this with SSMS too, but RStudio is easier.

Identifying data types:

One of the beauties of RStudio is that we can see the environment’s values and data. In the following image, on the right-hand side, you can see Data and Values; under Data, df is the data frame created when executing df <- do.call("rbind", lapply(PowerBITweets, as.data.frame)).

When we click on the little table icon next to df, it opens the data frame in a new window, where we can see the data and the column names.


To identify the data types, we can use class() or typeof() to get the list of column names and their respective data types. This worked for me in this example, but if you want to know more about it, there is a debate on Stack Overflow; beware, it’s quite an old post, but I found it interesting: http://tinyurl.com/jj9c3e9


Finally, we are sure that the R script works with the Twitter API. The next step is to execute and schedule this extract using SSIS.

Creating SSIS Package:

I created an SSIS Control Flow template to retrieve the Twitter data. This template has one Execute SQL Task with four task-scoped variables:

TwitterHashTag – Search String

NoOfTweets – Number of tweets I want to obtain based on my search

Script – I want to make the R script dynamic using SSIS variables; hence, I am using another variable called Script, with an expression (a sketch follows the screenshots). I could use a SQL statement as an expression on the SSIS Execute SQL Task directly, but using a variable makes it easier to document, and I can use BIDS Helper to create and modify variables.

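To give an idea of what the Script variable’s expression looks like, here is a minimal sketch in the SSIS expression language. The variable names are the ones above, the (DT_WSTR, 10) cast assumes NoOfTweets is an integer variable, and I have elided most of the R script body:

"execute sp_execute_external_script @language = N'R', @script = N'"
+ "library(twitteR); "
+ "Tweets = searchTwitter(\"" + @[User::TwitterHashTag] + "\", n="
+ (DT_WSTR, 10) @[User::NoOfTweets] + "); ...'"

The rest of the R script and the WITH RESULT SETS clause are concatenated in the same way.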

I am going to use this template in my SSIS package. Creating an SSIS Control Flow template is another blog post in itself; there are some excellent blog posts about control flow templates by Koen Verbeeck. Editing a control flow template is easy. Here I wanted to use the extracted result set in the Data Flow Task as a source, so I changed the scope of the TwitterExtract control flow template variable to package level.


The next step is to prepare the Data Flow Task. In the Data Flow Task, I used a Script Component as a source.


There is no point in me repeating how to use a System.Object variable as the source for a Data Flow Task when Tim Mitchell has explained it all. I created all the columns in the result set as output columns in my source component, using the same data types I used for the R script output parameter.


And here is the C# script I used in the Script Component source:

public override void CreateNewOutputRows()
{
    // The Execute SQL Task returned the full result set into an Object variable;
    // with an OLEDB connection, Fill loads the ADODB recordset into a DataTable
    OleDbDataAdapter oleDA = new OleDbDataAdapter();
    DataTable dt = new DataTable();
    oleDA.Fill(dt, Variables.RResultSet);

    foreach (DataRow dr in dt.Rows)
    {
        ScriptOutputBuffer.AddRow();
        ScriptOutputBuffer.Handle = dr["Handle"].ToString();
        ScriptOutputBuffer.DateTime = DateTime.Parse(dr["DateTime"].ToString());
        ScriptOutputBuffer.FavCount = int.Parse(dr["FavCount"].ToString());
        ScriptOutputBuffer.RetweetCount = int.Parse(dr["RetweetCount"].ToString());
        ScriptOutputBuffer.Latitude = dr["Latitude"].ToString();
        ScriptOutputBuffer.Longitude = dr["Longitude"].ToString();
    }
}

The Script Component generates the output buffer, which can be connected to an OLEDB destination with a SQL Server connection. That’s all; now we can schedule it and look at the data. We don’t have to use SSIS to populate one simple SQL Server table, but I am using this as an example.

Here we go: now we can see who is tweeting about PowerBI and whose tweets got the most retweets and favourites. I have not tested this by going to each profile to check how many tweets were really PowerBI tweets; my only test is that I can see familiar Twitter handles which I regularly follow ;). For this example I extracted only 1000 tweets, but we can also extract tweets based on tweet time and schedule the package to get all tweets in a given window. Using other twitteR functions, we can also get user profiles and their profile images, which could lead to an excellent interactive Power BI report. I tried the publish to web feature with my WordPress blog, but for some reason WordPress decided not to insert the iframe; until I figure out a workaround, here is the URL where you can see my quick and dirty report – https://app.powerbi.com/view?r=eyJrIjoiYzg5YTUwN2EtYmMzZi00YzRlLTlkZTgtNzQzZjJiOTNiMjUyIiwidCI6ImExMjczNjNlLTcxNzAtNDg0Yi1iZDY4LTAyYjgwZjIzZDI4MiIsImMiOjh9

PowerBI File

Have fun,

Prathy

 
