Home Power BI
Category:

Power BI

For me, Stephen Few has always been the guru of Data visualisations. Recently, while I was reading one of his paper, I came across most talked Bullet Chart and Highlighting a particular value. Then the first thought that came to my mind was, Can I do it in Power BI?

Bullet Chart is not one of the built-in visual in Power BI Desktop, but it is available as a custom visual. In fact, we have two bullet charts available one by Microsoft, and one by SQL BI / Ok Viz and they both are excellent.

Highlight a Specific Data Point:

My idea was to create something similar to following report which was originally shared by Stephen Few at http://perceptualedge.com/articles/misc/WhyMostDashboardsFail.pdf .

As of now, with handy bullet charts and formatting options, highlighting is not something out of the box. However, by using a bit of DAX, I can do something similar.

For this example, I imported top grossing movies data; I just want to see which movie has the max difference when I compare between overseas and worldwide.  Whatever the observations I do on my dashboard, I want to analyse it with the difference measure. Hence that’s the key on my dashboard. So I would like to distinguish that value with others data items.

If you look at the following example, on the right, I can easily see which movie has the most difference. But on the left, having so many categories, it is a challenging task to identify the single category which has the max value.

So what have I done:

First I created a calculated column to calculate the difference

Difference = 'World Wide Gross'[Worldwide]-'World Wide Gross'[Overseas]

Another measure to calculate the MaxDifference of only data which has been selected

MaxDiff = CALCULATE(MAX([Difference]),ALLSELECTED('World Wide Gross'))

Then I created a flag, with a simple if statement to show a circle icon when the difference is equal to Max difference otherwise 0.

Flag = IF([MaxDiff]=sum([Difference]),"? ",0)
Followed by a new column to show the Circle icon, which has the max difference.
Title with Flag = IF([Flag]=0, [Title], CONCATENATE([Flag],[Title]))

I can hide all the measures and columns I don’t need and use the Title with Flag, where ever I want. With Lil bit of DAX, I created following reports.  Now I can easily find the Star Wars movie among many other movies. Thanks to Will Thomson, who showed me the trick of using Unicode characters in the measures.

Power BI Model Download
Till next time,

Prathy 🙂

4 comments
0 FacebookTwitterPinterestEmail

UK Power BI summit was taken place on 16th – 17th of February at Hemel Hempstead, HERTS, UK. It was organised by Jen Stirrup. She has put a lot of efforts toward this event, Congratulations to Jen Stirrup for hosting such a great event!

I feel immensely privileged to be part of this conference as a helper and a speaker. For me, one of the best parts of this conference was it was so near to my home, literally 9 min 🙂

Overall we received great feedback about the entire conference, speakers, sponsors and lunch. I have to say attendees were fantastic. They were very engaged, which made roundtable conversations a huge success. I was immensely happy to see so many familiar faces from London Power BI User Group. Thanks to all Sponsors, speakers and attendees. Here are some pictures from the event ( I still regret not bringing my DSLR)

Till next time,

Prathy

0 comment
0 FacebookTwitterPinterestEmail
Find level of Directories / Find String in Power BI

Directories at different levels are critical for some organisations. One of the requirements I received recently was to import Flat Files from directories ( Hundreds of them ) at different levels. Apparently Power BI makes it easy to import files from a folder, however in my case, I needed to choose the files to import based on the folder level and many other conditions which are also related to the level of the folder and file names.  Folder level was a key, so let’s see how to find folder level:

When we choose Folder as a data source in Power BI, it imports some metadata along with file content. For my requirement, I used Folder Path column to find the level. In the Folder Path, if I count the number of backslashes (), that will give me the level of the folder.

Instantaneously I started using List.FindText. First I converted Folder Path column to a list, then I used List.FindText to find the position of “\”, then List.Count function to find no of items in the list. It works but notably slow. What I wanted was a Text function than List Functions.

Even though the description was a bit misleading in the documentation, I decided to look into Text.PositionOf. Like many Power Query functions, this function also has optional arguments like Occurrence and comparer. When we use Text.PositionOf with no occurrence, it returns a number. But if we want all occurrences of a substring, then it returns a list.

For my scenario, I just needed no of occurrences, so I used List.Count to find no of items in the List. And this approach was significantly faster than using List.FindText.

At the end I used List.FindText to find the level, in my real example, I used a lot more custom filters to fulfil my requirement. Here are the formulas I used:
Using List.FindText – List.Count(List.FindText(Text.ToList([Folder Path] as text) as list, “\” as text) as list) as number
Using Text.PositionOf – 
List.Count(Text.PositionOf([Folder Path],”\”, Occurrence.All) ) as number

Summary: Hope this helps, one thing I learned working with Power Query is “Most of the functions has more functionality than I presume, and it often hidden under optional commands “

Prathy ♀️

0 comment
0 FacebookTwitterPinterestEmail

Don’t wait for the opportunity CREATE it.

I was just trying to create an opportunity to work with Power BI; that was the initial plan. Later I created a session and I presented a version of this session at NTK Conference, Slovenia, next at SQL Supper and recently Channel 9 Techdays online. None of them was likewise, but the heart of the topic was same Power BI + Property Data.

I first looked at the price paid data a few years ago when Chris Webb presented a power query session in a London User group. Fascinating dataset and as usual Chris did the fantastic job showing nearly all cool things you can do with the dataset. I thought what else I could do with it.After few glasses of wine, I thought why the only price paid data why don’t I pull data from Zoopla. Then I added a bit of personal touch by adding schools and commuting information. I want to do a Power BI data story with Ofsted data and schools admission boundary (working mum syndrome 😉 )

That’s the story behind the session thought process then after I presented some people asked me to share the PBI Model, and some asked me how I extracted data from different data sources. So here comes the blog post which I promised long ago …

The data story:

Schools Data:

First I extracted Ofsted data. This database has all the information related to schools like number of pupils, when did the last time inspection happen, etc. It was a simple single file. After I had imported files into Power BI, I can see I had an extra row, and I just clicked on the first row as column step, twice, and it was sorted. This two click Power BI task will be tedious in any other ETL tool. M is amazing !

Price Paid Data:
Next data set was price paid data. All price paid data, and additional sale information freely available to download from the Land Registry Price Paid data, and it gets monthly updated. Again a simple CSV file.

NEXT comes the cool bit…

BING Maps API:

I need to confess; this idea was purely stolen from Chris Webb, M presentation from London Power BI user group. He showed how to use Google Maps to find the distance, and I used to BING Maps API. Again, it is very easy. You can get API key by going to BING Maps Developer site. I went for transit option to find the commute time. I used Power Bi WEB data source with following URL

http://dev.virtualearth.net/REST/V1/Routes/Transit?wp.0=”&FromPostCode&”&wp.1=WC2H8HR&timeType=Departure&dateTime=9:00:00PM&output=xml&key=YourAPIKeyGoesHere

I used M function to find the duration.I am going to use this function in my next dataset to find the distance from my preferable postcode.

M code:

let FindDuration = (FromPostCode as text)=> let Source = Xml.Tables(Web.Contents("http://dev.virtualearth.net/REST/V1/Routes/Transit?wp.0="&FromPostCode&"&wp.1=WC2H8HR&timeType=Departure&dateTime=9:00:00PM&output=xml&key=YourAPIKeyGoesHere")), #"Changed Type" = Table.TransformColumnTypes(Source,{{"Copyright", type text}, {"BrandLogoUri", type text}, {"StatusCode", Int64.Type}, {"StatusDescription", type text}, {"AuthenticationResultCode", type text}, {"TraceId", type text}}), ResourceSets = #"Changed Type"{0}[ResourceSets], ResourceSet = ResourceSets{0}[ResourceSet], #"Changed Type1" = Table.TransformColumnTypes(ResourceSet,{{"EstimatedTotal", Int64.Type}}), Resources = #"Changed Type1"{0}[Resources], Route = Resources{0}[Route], #"Changed Type2" = Table.TransformColumnTypes(Route,{{"Id", type text}, {"DistanceUnit", type text}, {"DurationUnit", type text}, {"TravelDistance", type number}, {"TravelDuration", Int64.Type}, {"TravelDurationTraffic", Int64.Type}, {"TrafficDataUsed", type text}, {"TrafficCongestion", type text}}), #"Expanded RouteLeg" = Table.ExpandTableColumn(#"Changed Type2", "RouteLeg", {"ActualEnd", "ActualStart", "Cost", "EndLocation", "EndTime", "ItineraryItem", "RouteRegion", "RouteSubLeg", "StartLocation", "StartTime", "TravelDistance", "TravelDuration"}, {"RouteLeg.ActualEnd", "RouteLeg.ActualStart", "RouteLeg.Cost", "RouteLeg.EndLocation", "RouteLeg.EndTime", "RouteLeg.ItineraryItem", "RouteLeg.RouteRegion", "RouteLeg.RouteSubLeg", "RouteLeg.StartLocation", "RouteLeg.StartTime", "RouteLeg.TravelDistance", "RouteLeg.TravelDuration"}), #"Removed Other Columns" = Table.SelectColumns(#"Expanded RouteLeg",{"TravelDistance", "TravelDuration"}), #"Added Custom" = Table.AddColumn(#"Removed Other Columns", "DurationInHours", each ([TravelDuration]/60)/60), #"Inserted Rounding" = Table.AddColumn(#"Added Custom", "Inserted Rounding", each Number.Round([DurationInHours], 2), type number), #"Renamed Columns" = Table.RenameColumns(#"Inserted Rounding",{{"Inserted Rounding", "TransitDuration"}}) in #"Renamed Columns" in FindDuration

ZOOPLA API:

You can create Zoopla API account from Zoopla Developer API site, pretty simple API. With the power of Power BI, I quickly imported data using WEB data source from Power BI desktop. Here is the URL I used

http://api.zoopla.co.uk/api/v1/property_listings.xml?postcode=wd17&api_key=xxxxxxx&radius=30.0&maximum_price=800000&page_number=20&page_size=100

I have given different parameters; you can get all parameters list from here. Once I imported the data, I renamed and expanded all the required columns. I used FindDuration function from the above query to find the distance from each property

Date Table:

I created the date table by following blog post by Matt Masson. There are different ways to do it; I preferred it

I imported few other datasets like UK postcodes data for maps and to relate different data sets. Also recession data, I created a table with static values just to show the recession period in the pulse chart.

Finally, my data model looked like this

2016-10-28_02-07-49.jpg

And I managed to do some cool dashboards with this data. You can have  a look at the reports from https://goo.gl/TqPUoo or some screenshots below

www.GIFCreator.me_VSM4YQ.gif

Creating this model was super easy, finding all those data sources were tricky and editing all the properties of different visualisations was time-consuming, but rest was simple. I chose this topic to present to show how easy it is to work with data from various data sources and create stunning visualisations in a jiffy.

Let me know what you think?

Power BI Model – https://goo.gl/mkiGLh

Prathy??

2 comments
1 FacebookTwitterPinterestEmail

Quick Power BI Tip:

Case: How to add extra / new columns to an existing Power BI file which has CSV as data source?

Solution:

When I was working with Power BI today, first I created a model with my CSV files, while I was doing the report; that is when I realised I need one more column from the data source. After I had received the new file, I thought I would just refresh the data source and power BI automatically refreshes my query. In fact, Power BI does automatically update my query but what I noticed was, it excluded the last column. When I looked in the advanced editor query looked like my example here

let
    Source = Csv.Document(File.Contents("\\Mac\Home\Desktop\Blog\2.csv"),[Delimiter=",", Columns=2, Encoding=1252, QuoteStyle=QuoteStyle.None]),
    #"Promoted Headers" = Table.PromoteHeaders(Source)
in
    #"Promoted Headers"

2016-08-04_06-08-57

I can see, it is strictly bringing two columns only. So the tip is to simply get all the columns into the existing Power BI file, just change the number of columns to the new number of columns in the CSV file, which is 3 in my case.

Updated Query:

2016-08-04_06-21-21

let
    Source = Csv.Document(File.Contents("\\Mac\Home\Desktop\Blog\3.csv"),[Delimiter=",", Columns=3, Encoding=1252, QuoteStyle=QuoteStyle.None]),
    #"Promoted Headers" = Table.PromoteHeaders(Source)
in
    #"Promoted Headers"

2016-08-04_06-21-49

That’s all for now 🙂

Prathy

2 comments
0 FacebookTwitterPinterestEmail

Most of the open data source providers supply data in a compressed format, especially when files are large. That is what I noticed with Company House data. I wanted to explore company house data using Power BI. But, I was not at all looking forward to downloading each file then unzip so that I can import into Power BI model. I have many files to work with, so I wanted to automate things. At least, I wanted to unzip all files using Power BI. Quite quickly, I understood Power BI currently does not support all compressed formats. Chris Webb explains here, how to work with compressed files and why currently all compressed formats are not supported.

However, Power BI supports R Script, which means I can easily automate my steps.Until Power BI Desktop supports unzip functionality, we can use new R Script in the Power BI Desktop. Power BI support documented very well regarding how to run R scripts in Power BI. Using R Script, I can download and unzip company house files.

R Script as a source in Power BI Desktop:

Microsoft recommends it is better to write and execute R Scripts in RStudio( or any other R tools) before you use it in Power BI. Power BI does not have a rich R interface, but I found it does give reasonable error information. Currently, R Script imports only data frames to Power BI.

Here is a simple R data frame in RStudio and Power BI Desktop

2016-04-01_21-33-14

RScript in R Studio

2016-04-01_21-33-31

RScript data frame in R Studio

2016-04-01_21-21-41

R Script In Power BI

Power BI Desktop uses R.Execute to execute R Script.

Download and Unzip from web using Power BI and R Script:

I wanted to download more than one file which has different URL with an incremental number. Hence, I used for loop to get the file number I wanted to download. With few lines of R script, I can easily download and unzip all these files.

R script I used in R Studio:

for(i in 0:4)

{

URLFull="http://download.companieshouse.gov.uk/BasicCompanyData-2016-03-01-part"

i=i+1

SourceURL=paste(URLFull,i,"_5.zip",sep="")

Destination="\\\\Mac\\Home\\Desktop\\Working\\csv\\"

DestinationURL=paste(Destination,i,".zip",sep="")

download.file(url=SourceURL, destfile=DestinationURL, method = "auto", quiet=FALSE)

unzipfolder="\\\\Mac\\Home\\Desktop\\Working\\csv\\CSVFiles"

unzip (zipfile=DestinationURL,exdir=unzipfolder)

}

This script does not return any data frame. If I use the same query in PowerBI Desktop, it would still download and unzip files, but I would not be able to import the data to Power BI. Also, I would not be able to use it as a step in the Power BI query.

We can either return an empty R Data Frame and create a separate query in Power BI to read the unzipped files or use a blank query as a source instead of R Script as the source, and do all operations in the single query.

Same script as above in Power BI Desktop returning empty data frame:

for(i in 0:1)

{

URLFull="http://download.companieshouse.gov.uk/BasicCompanyData-2016-03-01-part"

i=i+1

SourceURL=paste(URLFull,i,"_5.zip",sep="")

Destination="\\\\Mac\\Home\\Desktop\\Working\\csv\\"

DestinationURL=paste(Destination,i,".zip",sep="")

download.file(url=SourceURL, destfile=DestinationURL, method = "auto", quiet=FALSE)

unzipfolder="\\\\Mac\\Home\\Desktop\\Working\\csv\\CSVFiles"

unzip (zipfile=DestinationURL,exdir=unzipfolder)

}

df= data.frame()

Alternatively, same R Script without returning any data frame, this executes the R Script and does other Power Query operations:

let

Source = R.Execute

("

for(i in 0:4)#(lf)#(lf)

{#(lf)#(lf)

URLFull=""http://download.companieshouse.gov.uk/BasicCompanyData-2016-03-01-part""#(lf)#(lf)

i=i+1#(lf)#(lf)

SourceURL=paste(URLFull,i,""_5.zip"",sep="""")#(lf)#(lf)

Destination=""\\\\Mac\\Home\\Desktop\\Working\\csv\\""#(lf)#(lf)

DestinationURL=paste(Destination,i,"".zip"",sep="""")#(lf)#(lf)

download.file(url=SourceURL, destfile=DestinationURL, method = ""auto"", quiet=FALSE)#(lf)#(lf)

unzipfolder=""\\\\Mac\\Home\\Desktop\\Working\\csv\\CSVFiles""#(lf)#(lf)

unzip (zipfile=DestinationURL,exdir=unzipfolder)#(lf)



}

"),

df1 = Folder.Files("\\Mac\Home\Desktop\Working\csv\CSVFiles")

in

df1
2016-04-03_20-56-33

Downloaded R Files in Power BI

That is all; as usually now we can do additional operations in Power BI. However, there are few things to consider:

  • There is a 30 min timeout limit on all R Scripts, if you are downloading many large files, Power BI may raise a timeout error.
  • Every time you refresh PowerBI it would re-run the R Script; work around for this to run R Script using one query and do other Power BI operations in another query so you can disable the query which using R Script.

Among these are few other things I noticed:

  • I always had to import a data frame. Otherwise, it would still execute the R script, but I would have nothing in Power BI. In fact, I had to cancel the query, which did not save R Script either.
  • If I cancel Power query refresh, it will lock the files or folders which were in the process – need to close and reopen the Power BI to unlock!
  • If I want to change the R Script, before it finishes execution or Power BI refresh, again Power BI caches the script or values I given in the script, which mean I need to close and reopen Power BI. E.g.After I canceled the query and If I changed the for loop condition, it will still use the condition I given before I cancelled the query.

Overall it is exciting to see R Script in Power BI. It enables many features those Power BI does not support currently. Remember you never know what incredible Power BI team can deliver; an element which is missing today may appear in the next minute 🙂

Till next time,

Prathy 🙂

5 comments
0 FacebookTwitterPinterestEmail
Older Posts