The UK Power BI Summit took place on 16th – 17th of February at Hemel Hempstead, Hertfordshire, UK. It was organised by Jen Stirrup, who put a huge amount of effort into it. Congratulations to Jen for hosting such a great event!

I feel immensely privileged to have been part of this conference as a helper and a speaker. For me, one of the best parts was that it was so near to my home, literally 9 minutes away 🙂

Overall we received great feedback about the entire conference, the speakers, the sponsors and the lunch. I have to say the attendees were fantastic. They were very engaged, which made the roundtable conversations a huge success. I was immensely happy to see so many familiar faces from the London Power BI User Group. Thanks to all the sponsors, speakers and attendees. Here are some pictures from the event (I still regret not bringing my DSLR).

Till next time,

Prathy

Find level of Directories / Find String in Power BI

Directories at different levels are critical for some organisations. One of the requirements I received recently was to import flat files from directories (hundreds of them) at different levels. Power BI makes it easy to import files from a folder; however, in my case I needed to choose which files to import based on the folder level and many other conditions, which were also related to the level of the folder and the file names. Folder level was the key, so let's see how to find it:

When we choose Folder as a data source in Power BI, it imports some metadata along with the file content. For my requirement, I used the Folder Path column to find the level: if I count the number of backslashes (\) in the Folder Path, that gives me the level of the folder.

My first instinct was to use List.FindText. First I converted the Folder Path column to a list, then I used List.FindText to find the occurrences of "\", and then the List.Count function to count the items in the list. It works, but it is noticeably slow. What I really wanted was a Text function rather than List functions.

Even though the description in the documentation was a bit misleading, I decided to look into Text.PositionOf. Like many Power Query functions, it has optional arguments such as occurrence and comparer. When we use Text.PositionOf with no occurrence specified, it returns a number; but if we ask for all occurrences of a substring, it returns a list.
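To make the difference concrete, here is a quick sketch of the two behaviours (the sample text is made up):

Text.PositionOf("a\b\c", "\")                   // returns 1 – the position of the first occurrence only
Text.PositionOf("a\b\c", "\", Occurrence.All)   // returns {1, 3} – a list with the position of every occurrence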

For my scenario, I just needed the number of occurrences, so I used List.Count to count the items in the list. This approach was significantly faster than using List.FindText.

In the end I used the Text.PositionOf approach to find the level; in my real example, I used a lot more custom filters to fulfil my requirement. Here are the formulas I used:
Using List.FindText – List.Count(List.FindText(Text.ToList([Folder Path] as text) as list, "\" as text) as list) as number

Using Text.PositionOf – List.Count(Text.PositionOf([Folder Path], "\", Occurrence.All)) as number
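For context, here is a minimal sketch of how the whole step can look when applied to a folder query; the root folder path and the level value used in the filter are made up:

let
    Source = Folder.Files("C:\Data\Finance"),    // hypothetical root folder
    AddLevel = Table.AddColumn(
        Source,
        "FolderLevel",
        each List.Count(Text.PositionOf([Folder Path], "\", Occurrence.All)),
        Int64.Type
    ),
    // keep only files sitting at, say, the fourth folder level
    FilteredLevel = Table.SelectRows(AddLevel, each [FolderLevel] = 4)
in
    FilteredLevel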

Summary: Hope this helps. One thing I have learned working with Power Query is that most functions have more functionality than I presume, and it is often hidden behind optional arguments.

Prathy


I cannot believe it took me almost three months to recover from the house move. At last, I feel like I am back to a routine. Now I have my home office; I am quite proud of it, and it involved a lot of DIY work. I wish I were one of those YouTube stars who show off their interior decorating skills 😉 I know none of this excites you, so let's look at the actual blog post…

 

 

Validating Data using SSIS:

Recently I got a requirement to validate flat files using SSIS, and I wondered why I had never thought of doing this before. It would have saved the hours I wasted on emails or checking data to understand a problem. We face problems with dirty data very often; it could be because of invalid characters, lookup errors or something else. Depending on the ETL process, you identify these data problems at various stages. However, having a data validation task enables you to accept or reject the data at the very first stage. I have to thank my client for giving me this task, because it is something I had not done so far, and I used the Script Component (which I do not use very often, so I definitely need to write this blog post before I forget it).

The goal of our SSIS package is to find all the validation failures. And the validations we want to check are:

  1. Only Yellow and Orange colour products are valid
  2. Product column is mandatory
  3. Product name must be in capitals
  4. Amount is a mandatory field
  5. Only two decimal places are allowed in the Amount field
  6. No commas (,) allowed in the Amount field
  7. No special characters in the Product column
  8. No 0s allowed in the Amount field
  9. No NULL values in the Amount field
  10. No duplicated products with the same colour

I used the Script Component as a transformation in the Data Flow task; this is where all the logic resides. In the Script Component, I added a new output called Validation with two new output columns called FailedColumn and ValidationDescription. I also included all the source columns.

In the Script Component, I put my code under Input0_ProcessInputRow. Mainly I used an If condition per validation; as I wanted to capture all validation errors, I added each failing row to the Validation buffer.

Sample Script:

string key1 = String.Concat(Row.Product, Row.Color);

if (key1.Equals(previousKeyValue))
{
    ValidationBuffer.AddRow();
    ValidationBuffer.ValidationDescription = "No duplicated products with the same colour and different amount";
    ValidationBuffer.FailedColumn = "Product and Color";

    /*Map existing Columns*/
    ValidationBuffer.Amount = Row.Amount;
    ValidationBuffer.Product = Row.Product;
    ValidationBuffer.Color = Row.Color;
}

I populated ValidationDescription and FailedColumn with the relevant information, which lets me understand exactly which column failed and why. My output looks like this.

Now I can easily see where and what the issues are. I can redirect these rows to a table, send them in an email, ignore these rows and load the rest, or abort the ETL process and notify the relevant person to correct the file.

Prathy

You can download the SSIS project here, or the complete code of the Script Component for all the above-mentioned validations:

 
#region Namespaces
using System;
using System.Data;
using Microsoft.SqlServer.Dts.Pipeline.Wrapper;
using Microsoft.SqlServer.Dts.Runtime.Wrapper;
 
/*Added*/
using System.Reflection;
using System.Text;
using System.Globalization;
using System.Collections.Generic;
using System.Windows.Forms;
 
#endregion
 
[Microsoft.SqlServer.Dts.Pipeline.SSISScriptComponentEntryPointAttribute]
public class ScriptMain : UserComponent
{
        public override void PreExecute()
    {
        base.PreExecute();
        /*
         * Add your code here
         */
    }
 
       public override void PostExecute()
    {
        base.PostExecute();
        /*
         * Add your code here
         */
    }
    
    private static readonly char[] SpecialChars = "!@#$%^&*()".ToCharArray();
    private string previousKeyValue = string.Empty;
 
    public override void Input0_ProcessInputRow(Input0Buffer Row)
    {
 
        /*Duplicate check: this compares against the previous row's key, so it assumes the input is sorted by Product and Color*/
        string key1 = String.Concat(Row.Product, Row.Color);
        if (key1.Equals(previousKeyValue))
        {
 
            ValidationBuffer.AddRow();
            ValidationBuffer.ValidationDescription = "No duplicated products with the same colour and different amount";
            ValidationBuffer.FailedColumn = "Product and Color";
 
            /*Map existing Columns*/
            ValidationBuffer.Amount = Row.Amount;
            ValidationBuffer.Product = Row.Product;
            ValidationBuffer.Color = Row.Color;
        }
 
        previousKeyValue = key1;
 
        /*Two decimals values are allowed for Amount */
        string[] arr = Row.Amount.Split('.');
        if (arr.Length > 1)
        {
            if (arr[1].Length > 2)
            {
                ValidationBuffer.AddRow();
                ValidationBuffer.ValidationDescription = "Only two decimals values are allowed in the Amount field ";
                ValidationBuffer.FailedColumn = "Amount";
 
                /*Map existing Columns*/
                ValidationBuffer.Amount = Row.Amount;
                ValidationBuffer.Product = Row.Product;
                ValidationBuffer.Color = Row.Color;
            }
        }
        /*No special characters in the Product column */
 
 
        if(!String.IsNullOrEmpty(Row.Product))
    
        {
            int indexOf = Row.Product.IndexOfAny(SpecialChars);
            if (indexOf != -1)
            {
 
                ValidationBuffer.AddRow();
                ValidationBuffer.ValidationDescription = "No special characters in the Product column ";
                ValidationBuffer.FailedColumn = "Product";
 
                /*Map existing Columns*/
                ValidationBuffer.Amount = Row.Amount;
                ValidationBuffer.Product = Row.Product;
                ValidationBuffer.Color = Row.Color;
            }
          
        }
        if (string.Equals(Row.Amount, "0"))
        {
            ValidationBuffer.AddRow();
            ValidationBuffer.ValidationDescription = "O is not a valid value in the AMount filed ";
            ValidationBuffer.FailedColumn = "Amount";
 
            /*Map existing Columns*/
            ValidationBuffer.Amount = Row.Amount;
            ValidationBuffer.Product = Row.Product;
            ValidationBuffer.Color = Row.Color;
        }
        if (string.IsNullOrEmpty(Row.Amount))
        {
            ValidationBuffer.AddRow();
            ValidationBuffer.ValidationDescription = "NUll or Blank is not a valid value in the AMount filed ";
            ValidationBuffer.FailedColumn = "Amount";
 
            /*Map existing Columns*/
            ValidationBuffer.Amount = Row.Amount;
            ValidationBuffer.Product = Row.Product;
            ValidationBuffer.Color = Row.Color;
        }
        /*No commas allowed in the Amount field*/
        if (Row.Amount.Contains(","))
        {
            ValidationBuffer.AddRow();
            ValidationBuffer.ValidationDescription = ", is not a valid character in the Amount field";
            ValidationBuffer.FailedColumn = "Amount";
 
            /*Map existing Columns*/
            ValidationBuffer.Amount = Row.Amount;
            ValidationBuffer.Product = Row.Product;
            ValidationBuffer.Color = Row.Color;
        }
 
        if (String.IsNullOrEmpty(Row.Product))
        {
            ValidationBuffer.AddRow();
            ValidationBuffer.ValidationDescription = "Product is a mandatory field";
            ValidationBuffer.FailedColumn = "Product";
 
            /*Map existing Columns*/
            ValidationBuffer.Amount = Row.Amount;
            ValidationBuffer.Product = Row.Product;
            ValidationBuffer.Color = Row.Color;
        }
        /*Only Yellow and Orange colour products are valid (the original || condition was always true)*/
        if (!string.Equals(Row.Color, "Yellow") && !string.Equals(Row.Color, "Orange"))
        {
            ValidationBuffer.AddRow();
            ValidationBuffer.ValidationDescription = "Only Yellow and Orange colour products are valid";
            ValidationBuffer.FailedColumn = "Color";
 
            /*Map existing Columns*/
            ValidationBuffer.Amount = Row.Amount;
            ValidationBuffer.Product = Row.Product;
            ValidationBuffer.Color = Row.Color;
        }
        if (!String.IsNullOrEmpty(Row.Product))
        {
            /*Product name must be all upper case*/
            if (String.Compare(Row.Product, Row.Product.ToUpper()) != 0)
            {
                ValidationBuffer.AddRow();
                ValidationBuffer.ValidationDescription = "Product Name must be capital";
                ValidationBuffer.FailedColumn = "Product";
 
                /*Map existing Columns*/
                ValidationBuffer.Amount = Row.Amount;
                ValidationBuffer.Product = Row.Product;
                ValidationBuffer.Color = Row.Color;
            }
        }
               
 
 
    }
 
    public override void CreateNewOutputRows()
    {
       
 
    }
 
}

 


Don’t wait for the opportunity CREATE it.

I was just trying to create an opportunity to work with Power BI; that was the initial plan. Later I created a session, and I presented a version of it at the NTK Conference in Slovenia, next at SQL Supper, and recently at Channel 9 TechDays Online. None of them was quite the same, but the heart of the topic was always Power BI + property data.

I first looked at the Price Paid data a few years ago when Chris Webb presented a Power Query session at a London user group. It is a fascinating dataset, and as usual Chris did a fantastic job showing nearly all the cool things you can do with it. I wondered what else I could do with it. After a few glasses of wine, I thought: why only Price Paid data, why don't I pull data from Zoopla too? Then I added a bit of a personal touch by adding schools and commuting information. I wanted to do a Power BI data story with Ofsted data and school admission boundaries (working-mum syndrome 😉).

That's the story behind the session. After I presented it, some people asked me to share the Power BI model, and some asked how I extracted data from the different data sources. So here comes the blog post I promised long ago…

The data story:

Schools Data:

First I extracted the Ofsted data. This dataset has all the information related to schools, such as the number of pupils, when the last inspection happened, etc. It was a simple single file. After I had imported it into Power BI, I could see I had an extra row, so I just applied the Use First Row as Headers step twice and it was sorted. This two-click Power BI task would be tedious in any other ETL tool. M is amazing!

Price Paid Data:
The next dataset was the Price Paid data. All Price Paid data and additional sale information is freely available to download from the Land Registry Price Paid data page, and it gets updated monthly. Again, a simple CSV file.

NEXT comes the cool bit…

BING Maps API:

I need to confess: this idea was purely stolen from Chris Webb's M presentation at the London Power BI user group. He showed how to use Google Maps to find the distance, and I used the Bing Maps API instead. Again, it is very easy. You can get an API key from the Bing Maps developer site. I went for the transit option to find the commute time. I used the Power BI Web data source with the following URL:

http://dev.virtualearth.net/REST/V1/Routes/Transit?wp.0="&FromPostCode&"&wp.1=WC2H8HR&timeType=Departure&dateTime=9:00:00PM&output=xml&key=YourAPIKeyGoesHere

I used an M function to find the duration. I am going to use this function in my next dataset to find the distance from my preferred postcode.

M code:

let
    FindDuration = (FromPostCode as text) =>
        let
            Source = Xml.Tables(Web.Contents("http://dev.virtualearth.net/REST/V1/Routes/Transit?wp.0=" & FromPostCode & "&wp.1=WC2H8HR&timeType=Departure&dateTime=9:00:00PM&output=xml&key=YourAPIKeyGoesHere")),
            #"Changed Type" = Table.TransformColumnTypes(Source, {{"Copyright", type text}, {"BrandLogoUri", type text}, {"StatusCode", Int64.Type}, {"StatusDescription", type text}, {"AuthenticationResultCode", type text}, {"TraceId", type text}}),
            ResourceSets = #"Changed Type"{0}[ResourceSets],
            ResourceSet = ResourceSets{0}[ResourceSet],
            #"Changed Type1" = Table.TransformColumnTypes(ResourceSet, {{"EstimatedTotal", Int64.Type}}),
            Resources = #"Changed Type1"{0}[Resources],
            Route = Resources{0}[Route],
            #"Changed Type2" = Table.TransformColumnTypes(Route, {{"Id", type text}, {"DistanceUnit", type text}, {"DurationUnit", type text}, {"TravelDistance", type number}, {"TravelDuration", Int64.Type}, {"TravelDurationTraffic", Int64.Type}, {"TrafficDataUsed", type text}, {"TrafficCongestion", type text}}),
            #"Expanded RouteLeg" = Table.ExpandTableColumn(#"Changed Type2", "RouteLeg", {"ActualEnd", "ActualStart", "Cost", "EndLocation", "EndTime", "ItineraryItem", "RouteRegion", "RouteSubLeg", "StartLocation", "StartTime", "TravelDistance", "TravelDuration"}, {"RouteLeg.ActualEnd", "RouteLeg.ActualStart", "RouteLeg.Cost", "RouteLeg.EndLocation", "RouteLeg.EndTime", "RouteLeg.ItineraryItem", "RouteLeg.RouteRegion", "RouteLeg.RouteSubLeg", "RouteLeg.StartLocation", "RouteLeg.StartTime", "RouteLeg.TravelDistance", "RouteLeg.TravelDuration"}),
            #"Removed Other Columns" = Table.SelectColumns(#"Expanded RouteLeg", {"TravelDistance", "TravelDuration"}),
            #"Added Custom" = Table.AddColumn(#"Removed Other Columns", "DurationInHours", each ([TravelDuration] / 60) / 60),
            #"Inserted Rounding" = Table.AddColumn(#"Added Custom", "Inserted Rounding", each Number.Round([DurationInHours], 2), type number),
            #"Renamed Columns" = Table.RenameColumns(#"Inserted Rounding", {{"Inserted Rounding", "TransitDuration"}})
        in
            #"Renamed Columns"
in
    FindDuration

ZOOPLA API:

You can create a Zoopla API account on the Zoopla developer site; it is a pretty simple API. With the power of Power BI, I quickly imported the data using the Web data source in Power BI Desktop. Here is the URL I used:

http://api.zoopla.co.uk/api/v1/property_listings.xml?postcode=wd17&api_key=xxxxxxx&radius=30.0&maximum_price=800000&page_number=20&page_size=100

I passed in different parameters; you can get the full parameter list from here. Once I imported the data, I renamed and expanded all the required columns. I used the FindDuration function from the query above to find the distance from each property, as sketched below.
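For reference, this is roughly how the function gets invoked from the listings query; the query and column names here are illustrative and depend on how you expanded the Zoopla response:

let
    // hypothetical: Listings is the expanded Zoopla query and [postcode] its property postcode column
    AddedRoute = Table.AddColumn(Listings, "Route", each FindDuration([postcode])),
    // FindDuration returns a small table, so expand the columns we care about
    Expanded = Table.ExpandTableColumn(AddedRoute, "Route", {"TravelDistance", "TransitDuration"})
in
    Expanded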

Date Table:

I created the date table by following a blog post by Matt Masson. There are different ways to do it; this is the one I preferred.

I imported a few other datasets, such as UK postcode data, for the maps and to relate the different datasets. For the recession data, I created a table with static values just to show the recession period in the Pulse chart.

Finally, my data model looked like this

2016-10-28_02-07-49.jpg

And I managed to build some cool dashboards with this data. You can have a look at the reports at https://goo.gl/TqPUoo or at the screenshots below.

www.GIFCreator.me_VSM4YQ.gif

Creating this model was super easy; finding all those data sources was tricky, and editing all the properties of the different visualisations was time-consuming, but the rest was simple. I chose this topic to show how easy it is to work with data from various data sources and create stunning visualisations in a jiffy.

Let me know what you think!

Power BI Model – https://goo.gl/mkiGLh

Prathy


A few years ago, when I was searching for a free dimensional modelling tool, I came across this fantastic dimensional modelling workbook through some forum. I thought I was the only one who didn't know about it, but surprisingly many others are not familiar with it either. It is not new; it's been around for ages. It was created as a resource rather than a tool, so it is a bit buggy 🙁 Let me not over-complicate this by calling it a tool; it is just a super simple Excel workbook with macros.

You can download the file from the Kimball Group website. As the Kimball Group explains, it holds the annotated data model. With the above link, you can download a zipped file which has a blank model and a pre-populated model with a detailed explanation of each sheet. It is already very well documented. There are a few changes I have made to the file so it works for me. Here I am just going to show how I used it, how it could be useful in our day-to-day BI development world, and what changes I made (so I can remember my changes for next use 😉).

As BI developers, our work progresses through different stages. Once we have the requirements ready, the next step is to create that perfect model; well, it will never be perfect, and most of the time the model changes a lot. Making sure we document those changes and keep our fellow developers happy can be a difficult task. Not every project has the luxury of a data modeller; the developer gets involved in the modelling many times, and that's when the Kimball modelling spreadsheet comes in handy.

Kimball Modelling Tool:

In simple words, this workbook helps us generate the scripts to create the database for the data warehouse. We give all the properties of the database in the workbook and use a macro to generate the script based on those properties. I am going to walk through it using MDWToolkit_Datamodel_4.0_2008R2 Blank as an example. Let's create a simple Product dimension and Sales fact table using this workbook.

2016-10-24_09-22-13_02.jpg

The first three sheets (Home, ReadMe and ChangeLog) in the file have credits, notes and some useful information about the database we are going to create. I prefer to delete the ReadMe and ChangeLog sheets and create my own Notes and Versions sheets.

The Home sheet is where we give the database name and the schema name for the views. I created a copy of the Blank Dimension and Blank Fact sheets, which already have dummy rows populated. Each sheet has four categories:

2016-10-24_21-33-24.jpg

1 – Where we give the target table information

2 – Column information

3 – Target table columns information

4 – Source table columns information

The workbook already includes notes; however, I would like to highlight a few properties/options which I consider useful.

Table Display Name – Used for the views. For example, here my table name is DimProducts; the Dim prefix does not make sense to business users, so whatever I give in the display name becomes the view name, and I am going to use the views in my cube.

BizFilterLogic – Very useful when we have a table-level filter which we need to apply while extracting data from the source in the ETL.

GenerateScript – Y/N; Y means include this table in the script, and N means do not.

Column Display Name – Same as Table Display Name, used in the view creation script.

Column Display Folder – The display folder of the column; this is excellent for the folder option in an SSAS cube. This feature can also be used to exclude columns from the auto-generated view. N.B. In the pre-populated model it has the value 'Exclude From The CUBE', but the underlying code only excludes columns whose value is 'Exclude', so either change the macro code or alter the value in the cell.

Column Description – A great way to explain what the value of each column means. It is very nice to have a well-documented database so that it is not a black box.

ETL Rules – Another convenient option; this enables ETL developers to understand exactly what is needed for a particular column. The way I use it: anything I want to filter at the source goes under BizFilterLogic or the table-level comments, and transformations I want to do inside SSIS go under ETL Rules.

All Source Fields – All these fields can be a bit overwhelming or can be very useful; it depends on the type of data warehouse you are creating. If it is a single source then yes, this is very helpful, but if you have several sources, I don't think it is as handy. However, I like the source field name, because if we have power users who are already familiar with the source column names but we renamed those columns with friendly names, you can use this column as a tooltip or just for documentation purposes.

That's all; by clicking the Generate Script button on the Home sheet, we can create the entire database design script. If you look at the database, table and column properties, you can see extended properties at all levels.

2016-10-26_01-26-09.jpg

The download from the Kimball site does not include extended properties for views and does not let you create the view when your table schema is not the default dbo, and the Description extended property does not get populated. I updated a few bits of the VB with my very limited knowledge of VB script; for example, adding extended properties for views, which enables us to sync descriptions using BIDS Helper. I think, so far in my life, I have used VB only when working with this workbook; that's why I need this blog post, so I don't need to remember it.
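For anyone curious what those extended properties look like, here is a hypothetical example of the kind of statement that gets generated for a view column (all object names here are made up):

-- Add a description to a view column as an extended property (hypothetical names)
EXEC sys.sp_addextendedproperty
     @name       = N'MS_Description',
     @value      = N'Natural key of the product as it appears in the source system',
     @level0type = N'SCHEMA', @level0name = N'rpt',
     @level1type = N'VIEW',   @level1name = N'Products',
     @level2type = N'COLUMN', @level2name = N'ProductCode';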

Once we are done creating the initial database using the workbook, you can import the database into SSDT. You may ask, why don't we just create it in SSDT? Of course we can, but editing in Excel is a lot easier than editing in SSDT, and we can use the SSDT schema compare functionality to merge changes quite easily. Now we have a very well-populated database.

2016-10-26_01-44-20

Once it is all done, we can quite easily create an interactive data dictionary using your favourite reporting tool.

2016-10-26_02-13-30

So if you are a consultant who works on several data warehouse projects, this can be very helpful. You create your model once, and the data dictionary, documentation, etc. for the next project are all there; you just have to refresh your existing SSRS or Power BI model for the data dictionary. You will know where to look when you are creating the ETL or the cube by using the change log sheet. You can even put more business logic in the spreadsheet, or whatever you want. This is better than creating several pages of PDF documentation which no one ever reads. If you are good with VB, unlike me, then you can extend it much more and look into some of these known issues…

Known Issues:

  • If you want to create a foreign key to a column, and that table does not use the default schema, then the script the macro creates will not work
  • If you choose to exclude the first column from the view, it is included anyway, just because that's how the macro is designed

The workbook with all the changes I have made can be downloaded from here – https://1drv.ms/x/s!Avm7gbgZtlMlpjENQrWZX-lpoEl9. Hope it is useful; let me know what you think!

Prathy


October 1st, 2016, 03:00 PM. I was in our car with my family on our way to Harveys when an email notification popped up while I was doing something on my mobile. Instantly, with a burst of laughter, I was yelling OMG OMG OMG; I could feel tears in my eyes, and my family looked at me like "what's wrong with her ;)"

Fast Forward >>

untitled-4

Thank you everyone who nominated and congratulated me. It is nice to see congratulations from individuals who I admire a lot. There are so many people I want to thank, not just because I got the MVP award, but because I am here in this position, doing what I love and feeling proud of myself, because of their influence. Long ago, I remember telling my colleagues in a daily scrum that one day I would become an MVP; at that time I didn't even know how one gets the MVP award, but I was following some great MVPs. When I compare myself with the MVPs I admire, I feel like I don't deserve this award.

But this award is an answer to so many situations I have been in. First of all, the guilt of being away from my daughter. Parenting is an amazing thing; it doesn't come with a manual saying what to do and what not to do. But I am sure there are many parents who have sometimes felt guilty about not fulfilling their wishes. There is no such thing as work-life balance; it is very unlikely, and doing these extra community activities makes it much more challenging. I really appreciate every individual who puts their effort into the community, sharing their knowledge in different ways, and it is great to see Microsoft recognising them. I am saying it is not easy, but at the same time it is something I have enjoyed greatly in recent years. I thank my family, especially my daughter, for letting me do everything I do. While I was preparing for #TechDaysOnline, she was not allowing my husband to talk to me and asked him to cook so I could focus on my work. Thank you 🙂 I also want to thank my husband Ravi and Divya for everything.

Next comes the SQLFamily; I actually kind of prefer calling it SQLFriends rather than SQLFamily. There are soooooo many people I want to thank. Many inspired me, and there are some who introduced me to speaking, encouraged me and guided me. I am not gonna name each of them here, but if they read this, they will know. Thank you 🙂

And thank you Microsoft. Getting this award is like a pat on the back. I come from a non-English-speaking background. I did my schooling in my native language; I started learning the English alphabet when I was ten years old. When I came to the UK for my masters, I was too scared to talk to a Job Centre Plus person on the phone. Now, to be in a position to speak in front of many people, I am very proud of myself. This award feels like a reward for all the effort. At this moment I feel awesome, and I thank every single person who helped me to be here, plus huge special thanks to Microsoft and the lovely people at Microsoft. I am thrilled, honoured and humbled 🙂

Thank you!

If you have come this far, thank you for reading 🙂

Prathy


Quick Power BI Tip:

Case: How do you add extra/new columns to an existing Power BI file which has a CSV as its data source?

Solution:

When I was working with Power BI today, I first created a model with my CSV files, and while I was building the report I realised I needed one more column from the data source. After I had received the new file, I thought I would just refresh the data source and Power BI would automatically refresh my query. In fact, Power BI does automatically update my query, but what I noticed was that it excluded the last column. When I looked in the Advanced Editor, the query looked like my example here:

let
    Source = Csv.Document(File.Contents("\\Mac\Home\Desktop\Blog\2.csv"),[Delimiter=",", Columns=2, Encoding=1252, QuoteStyle=QuoteStyle.None]),
    #"Promoted Headers" = Table.PromoteHeaders(Source)
in
    #"Promoted Headers"

2016-08-04_06-08-57

I can see it is strictly bringing in two columns only. So the tip for getting all the columns into the existing Power BI file is simply to change the number of columns to the new number of columns in the CSV file, which is 3 in my case.

Updated Query:

2016-08-04_06-21-21

let
    Source = Csv.Document(File.Contents("\\Mac\Home\Desktop\Blog\3.csv"),[Delimiter=",", Columns=3, Encoding=1252, QuoteStyle=QuoteStyle.None]),
    #"Promoted Headers" = Table.PromoteHeaders(Source)
in
    #"Promoted Headers"

2016-08-04_06-21-49
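If you would rather not keep hard-coding the count, the Columns field is, as far as I know, optional; here is a sketch of the same query with it removed, so the column count is picked up from the file on refresh:

let
    Source = Csv.Document(
        File.Contents("\\Mac\Home\Desktop\Blog\3.csv"),
        [Delimiter = ",", Encoding = 1252, QuoteStyle = QuoteStyle.None]   // no Columns field
    ),
    #"Promoted Headers" = Table.PromoteHeaders(Source)
in
    #"Promoted Headers"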

That’s all for now 🙂

Prathy


This is the second part of the SSIS Package Parts blog post; I would recommend reading the first part first!

In the previous post we saw the basics of SSIS Control Flow Package Parts; now let's look at them in a bit more detail.

Package Part in Code Mode:

In design mode, an empty package part looks like a regular SSIS package, except that a package part has only the Control Flow and Data Flow tabs. When we compare the source code of a new SSIS package (.dtsx) and a new control flow package part (.dtsxp), both look pretty similar, with the obvious differences being the package type, refId and CreationName. So using BIML to create package parts won't be much different from creating SSIS packages.

image

One of the first use cases that came to my mind when I first saw control flow templates was reusing code. Whenever I create an SSIS package to populate a star schema, like many others I usually use a separate framework; I try to capture a lot of metadata related to the SSIS packages and their execution, so I can monitor how many rows got inserted or updated and other information related to executions and performance. However, one of the tedious things with SSIS is its minimal support for code reusability. Whenever you copy an SSIS task to a new package or a new solution it loses all the existing connections, and sometimes parsing too, so you need to remap the connections again. SSIS Control Flow Package Parts directly address this issue.

Control flow Package Part for SSIS Audit:

So I wanted to create a package part to record SSIS audit begin and audit end events for each execution. For this purpose, I wanted to use some system variables and user variables. Once I created the first variable in the control flow part, the first thing I noticed was that all the variables were task scoped.

image

image

I did not understand the purpose of not having the flexibility to change the variable scope in the SSIS Control Flow Package Part designer. The next thing I tried was adding another task, assuming I could choose which task to execute at run time based on the task's enable/disable property. Immediately I got an error saying "Control flow package part need to contain exactly one executable." This error explains why variables are task scoped only: as only one executable is allowed in a package part, task-scoped variables will not lead to any confusion when variables with the same name already exist in the SSIS packages.

image

Well, for my purpose I need two Execute SQL Tasks, one at the beginning and one at the end. I thought I could just add two Execute SQL Tasks, but given the limitations the only way I could achieve this was by creating two different package parts. Hence, I created two SSIS package parts. Then I was curious to see why the entire SSIS Toolbox is available in the package part designer when you cannot add more than one executable (containers, for example). Later I placed a Sequence Container, and surprisingly I could add more than one Execute SQL Task inside it, and it executes perfectly fine. So if you want to use more than one task in a package part, use a Sequence Container. It also nicely differentiates each executable in the package part configuration.


I decided to create two separate Execute SQL Task package parts anyway, because then each package part clearly defines what it is meant to do. Both package parts I created have task-scoped variables and set the SQL statement source as an expression.

2016-06-14_16-24-52.jpg
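To give an idea of what each Execute SQL Task runs, here is a hypothetical version of the audit statement; the table and columns are made up, and in the package part the statement is actually built as an expression from the task-scoped variables:

-- hypothetical audit-begin statement; the audit-end part updates the same row with the result and row counts
INSERT INTO dbo.PackageExecutionAudit
    (PackageName, ServerExecutionId, AuditEvent, AuditDateTime)
VALUES
    (?, ?, 'Audit Begin', GETDATE());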

I can see my two SSIS package parts in the SSIS Toolbox, and I can just drag and drop them into the control flow. I can quickly identify an SSIS package part by the little P icon on the top right, and I can also see that it uses an expression from the little fx icon on the top left. Initially, I just wanted to see how my package works with the new package parts.

And basic execution worked perfectly fine. I needed another SSIS Audit End package part with a failed execution result; as I can use package parts inside SSIS packages as many times as I want, and I was using a user variable for the execution result value, I just changed the Execution Result user variable value to get the desired result. It is better to use variables whenever possible to make package parts more dynamic.

image_thumb-31

Next, the tricky bit: how can I pass the row count? I have package-scoped variables which hold the row count. The only way I could think of doing this was manually changing the scope of the variables. (I am so glad this was not restricted 🙂)

image

Once I manually changed the scope, I was able to use these two variables in the Data Flow Task, and the job was done. My package worked just the way I wanted.

If I want, I can change the scope of the variables of all control flow parts; if a variable with the same name already exists in the package, SSIS automatically renames the newly moved variable, or we can just delete the control flow part variables which we already have under package scope.

image

Well, it works; with a few workarounds I did manage to achieve what I wanted. But there were a few hiccups while I was developing this small solution, so imagine creating a robust one.

Blank Screen:

The first and most annoying thing was the blank screens. I don't know if it was my laptop or my installation, but a blank screen with no error information? Really!?

image

I first got this error when I opened a package part which was used as a reference in another SSIS project. Not only that, there were several other occasions where I saw this blank screen, and I really hope it was a PC issue rather than SSIS. Every time I got this blank screen error, I deleted the package part from the solution and recreated it.

Package Parts Execution:

Another unbelievable thing is that there is no execution or debugging option for package parts. Really? We can see the breakpoints option available in SSIS parts; however, it is of no use. There is no way I can execute a package part within the package part designer; the only way I can execute it is by placing it in an SSIS package.

image

I am sure everyone wants to test their package part before they put it in an SSIS package, and there are so many other scenarios where you need to execute your task or executable before you use it somewhere…

Reusability:

As I mentioned earlier, the primary use case of package parts is code reusability. But if there is no track of where a part has been used, then controlling changes is a difficult job. If I make changes to a package part in a solution, and a package which uses this package part is still open, then I get a notification about the reference; apart from that, there was no other way I could see where the package part has been used.

PPsave.jpg

When I add an existing package part to a project as a reference and accidentally make changes to it, the other packages which use the package part get the blank screen (at least on my PC!). When the package part is added as a reference and you change or rename it, the other packages where it has been used or created will have errors. You can modify the task name once it is in an SSIS package, but you should not change the file name!

Expressions:

Package parts do allow expressions, and you can see that a package part uses expressions, but these expressions are visible at the executable level only. For example, if you have a Sequence Container with two SQL Tasks in a package part, you can see the expressions used at the Sequence Container level but not the expressions used at the Execute SQL Task level; however, if your package part has only one executable, then you can see all the underlying expressions. (It is better not to change expressions when the package part is used as a reference or added as a reference to the project.)

2016-06-26_00-03-15.jpg

Package part with Sequence Container and multiple Execute SQL tasks with Expressions

2016-06-26_00-06-20.jpg

Package Part with single Execute SQL Task using Expressions

Connection Strings:

A package part brings all its connection strings along with it. These connection strings are not visible in the Connection Managers window, but you can see them under any other task's connection managers or in the configure/execution window of the SSISDB catalog. All package part connections are prefixed with the package part name. To avoid confusion, it is better to give all package parts relevant names.


Visual Studio Crashes:

Visual Studio crashed, and when I reopened the project I lost all my package part references. None of my package parts had the P symbol; after the crash, all package parts looked and behaved just like regular tasks. In a way it is nice, because my package still works, but I lost all the references. (If the package documentation says "to edit the package part, go to location x and make changes", that wouldn't work!)

Some points:

  1. No option to change the connection string

    image

  2. The only warning you get is

    image

  3. We cannot use a package part inside a package part.
  4. Deployment will not deploy package parts to SSISDB.
  5. No debugging options are available for package parts in the package part designer.
  6. Each package part should contain exactly one executable – we cannot create a control flow package part with two executables, as SSIS will not allow saving the package part. We can create a package part with no executable, but SSIS will not let you place it in the Control Flow. In both scenarios, it raises the same error.

Summary:

Do I like it? Don't know. Will I use it? Depends.
It can be a great addition to any SSIS package, but version 1 needs a lot of improvements. If you are thinking of using it, I believe you need to have strict standards in place. I would create a solution with all the package parts, make sure to put them under source control, and have strict rules around change control; annotations will probably make it easier. Well, it's just my opinion after spending a couple of days with Control Flow Package Parts; I would love to hear what you think.

Next blog post in the series will be Logging…

Till next time,

Prathy 🙂


This blog post has become longer than I thought, so I decided to post it as two parts. The first part covers the introduction and the second goes into a bit more detail – hope that makes it easier to read!

Control Flow Package Parts was one of the SSIS 2016 features which excited me most when Matt Masson did a session about what's new in SSIS and MDS at Pass Summit 2016. Whenever there is something new, it is quite common to get positive and negative comments, but it seems that SSIS Control Flow Templates, or under their new name SSIS Control Flow Package Parts (does that mean we can expect other types of package parts in the future?), received more negative comments than positive. I used a control flow template briefly in my previous blog post example; however, this blog post gives me a chance to look at them in detail and understand what is missing compared to my expectations.

Creating an SSIS Control Flow Package Part:

For me, the only thing which is straightforward with package parts is creating one. Creating a new package part is very easy: you simply go to the Package Parts section under the project where you want to create one, right-click, and when you click on New Control Flow Package Part, it creates a new .dtsxp file. You can not only create one but also import an existing one. Once it is created, it looks exactly the same as an SSIS package, with the same SSIS Toolbox.

image

Package Part in Solution Explorer

image

Creating a New Control Flow Package Part

Adding an Existing Control Flow Package Part:

You can add an existing package part in two ways: as a reference or without a reference.

image

When you add it as a reference, the package part still references the original location of the file; when you add it without a reference, it copies the file to the new project location. We can see the difference by looking at the source file path or the icon of the package part; it is a tiny icon, though.

image

Obviously, when there is a reference, all the underlying changes take effect. Which option to choose is entirely based on the use case; I will explain a bit more about this in the next blog post, because each has its own pros and cons.

 Using Package Part:

Once a package part is created or added, you can see it in the SSIS Toolbox. We can simply drag and drop it into the Control Flow like any other SSIS Control Flow task.

Configuring Control Flow Package Part:

In the configuration window of a package part, you can see two tabs: one to configure the connection managers and one for the properties. If you want to change the value of a property, you can click on the property and change the property path. If the underlying package part uses any expressions, it brings those expressions back too; it is just the same as copying and pasting an existing task, but here it brings the connection with it and also a better GUI for changing the properties. A package part allows only one executable, but if you want to add more than one executable, you can use a Sequence Container. It is useful in certain cases, such as when you don't want to create multiple package parts; by using a Sequence Container you can add more than one executable and use the enable/disable property of each executable to control execution.

image

Package Part Properties configurations window

image

Package Part Connection Managers configurations window

image

Package Part configurations for Sequence Container with multiple executables

That's all; you can just drag and drop these pre-created SSIS Control Flow Package Parts into any SSIS package you want, and if required, you can configure the properties of the package part. Easy 🙂

But unfortunately it is not that straightforward: variables don't work the way you expect, you get an unexpected blank screen as an error, and more. I will address all of these in the next blog post…

Till Then,

Prathy


SSIS has always been my favourite among all the MSBI tools. I think it is ETL in general that interests me rather than SSIS specifically. As an SSIS fan girl, the SQL Server 2012 release was my favourite; it was a huge release with numerous improvements (don't forget the rounded corners). If we compare the SQL Server 2012 and 2016 releases, there are not as many exciting enhancements this time, but some noteworthy enhancements are there. Hence this blog post, to look into the new features of SSIS 2016 in detail, so I get a chance to explore them. I am going to publish this post as a series so it is convenient to read and write 🙂

Package Deployment:

One of the most complained-about or requested features after the SSIS 2012 release was package deployment. I am not a huge fan of it, but I can see why people were missing this feature. I have seen people debating about SSIS, saying "SSIS is the same as any other development, so you deploy the entire solution at a time, not individual packages". Probably Microsoft also considered SSIS development like any other development and introduced project deployment in SSIS 2012. I totally agree with the argument, and I would like to work that way, but most SSIS developers deploy individual packages…

Well, it's nice to see package deployment back in SSIS 2016. So what happens when you decide to deploy individual packages?

We all know how to do project deployment; the new incremental package deployment in SSIS 2016 works the same way as project deployment, but you can right-click on an individual package, or multiple selected packages, and choose Deploy.

image

All the rest of the steps are the same as the previous deployment. Though, if you want to see which packages are selected, or if you change your mind about the deployment after you have started the deployment wizard, you can click on the Previous button, which gives us a new window in the deployment wizard where we can choose the packages to deploy. One thing I noticed here is that you can deselect packages, but you are not allowed to select new ones; well, you can select new ones, but the wizard will not allow you to go to the next step (the Next button is greyed out).

Usually, SSIS does project deployment from an .ispac (Integration Services Project Deployment File) file. When you choose package deployment, instead of using the .ispac file it selects the packages which you want to deploy. You can see in the following screenshot that only the selected packages were loaded into the deployment wizard. Also, instead of asking for the .ispac location, the deployment wizard uses the Packages folder path. If you change the deployment model to project deployment, it will ask for the .ispac file path.

image

Package Deployment

2016-06-06_01-16-07

Project Deployment

When we look at the deployed packages in SSISDB, surprisingly we can see that all packages are deployed, irrespective of the incremental deployment.

SELECT [package_id]
      ,[project_version_lsn]
      ,[name]
      ,[package_guid]
      ,[description]
      ,[version_build]
      ,[version_comments]
  FROM [SSISDB].[internal].[packages]

In the following screenshot, the yellow highlight is the deployment where ChildPackage2 and the Master Package were selected, and in the green highlighted one only the Master Package was chosen to deploy. When we look at the version_build column value, we get the assurance that it is indeed incremental package deployment. All packages have a row in [SSISDB].[internal].[packages] because of [project_version_lsn], which is the column used to relate to the project table, so all the relationships between the different tables in SSISDB work smoothly.

image

[SSISDB].[internal].[packages]

Is the new package deployment model addressing existing issues?

  1. We still cannot reference packages from a different project, or packages from SSISDB, using the Execute Package Task. Before SSIS 2012 we were able to reference the SSIS Package Store and execute packages from a different solution or from deployed solutions. There are various workarounds to achieve this; I prefer using an Execute SQL Task to call the stored procedures that execute a deployed package (see the T-SQL sketch after this list), but an easy reference in EPT is something still missing for me.
    image

    (EPT) Execute Package Task

    2. Parameter Bindings in Execute Package Task:

    Execute Package Task still does not support variables; if you want to send a parent package variable value to a child package variable, you first need to create a parameter in the child package and then use that parameter value in an expression on the variable where you want to use the value from the parent package.
    No biggie, but most of the tasks still do not support parameters. For example, we cannot use SQL command from parameter in an OLE DB Source, an Execute SQL Task result set cannot be a parameter, and the Foreach Loop variable mappings do not support parameters.

image
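As mentioned in point 1, my preferred workaround is an Execute SQL Task that calls the SSISDB catalog stored procedures to run a deployed package; here is a minimal sketch (the folder, project and package names are made up):

-- Run a deployed package from the SSIS catalog (hypothetical folder/project/package names)
DECLARE @execution_id BIGINT;

EXEC SSISDB.catalog.create_execution
     @folder_name  = N'ETL',
     @project_name = N'MyProject',
     @package_name = N'ChildPackage1.dtsx',
     @execution_id = @execution_id OUTPUT;

-- Run synchronously so the calling Execute SQL Task waits for the child package to finish
EXEC SSISDB.catalog.set_execution_parameter_value
     @execution_id,
     @object_type     = 50,
     @parameter_name  = N'SYNCHRONIZED',
     @parameter_value = 1;

EXEC SSISDB.catalog.start_execution @execution_id;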

Wrap-Up:

As I mentioned earlier, it is nice to see incremental package deployment; it's great that Microsoft listened to the users' voice and implemented this feature. I don't know what's going on behind the scenes, but when packages are deployed individually to a server, SSISDB nicely references the previously deployed packages; probably it just checks the version build of all the packages and, where it is the same, copies the existing packages' metadata. But if they managed package deployment, I don't think it would be difficult for Microsoft to develop an SSISDB reference under the Execute Package Task, maybe in the next release…

The next blog post in this series will be about my favourite feature, Control Flow Templates.

Till next time,

Prathy 🙂
