Home Power BI Download And Unzip files using Power BI Desktop and R Script

Download And Unzip files using Power BI Desktop and R Script

by Prathy Kamasani

Most of the open data source providers supply data in a compressed format, especially when files are large. That is what I noticed with Company House data. I wanted to explore company house data using Power BI. But, I was not at all looking forward to downloading each file then unzip so that I can import into Power BI model. I have many files to work with, so I wanted to automate things. At least, I wanted to unzip all files using Power BI. Quite quickly, I understood Power BI currently does not support all compressed formats. Chris Webb explains here, how to work with compressed files and why currently all compressed formats are not supported.

However, Power BI supports R Script, which means I can easily automate my steps.Until Power BI Desktop supports unzip functionality, we can use new R Script in the Power BI Desktop. Power BI support documented very well regarding how to run R scripts in Power BI. Using R Script, I can download and unzip company house files.

R Script as a source in Power BI Desktop:

Microsoft recommends it is better to write and execute R Scripts in RStudio( or any other R tools) before you use it in Power BI. Power BI does not have a rich R interface, but I found it does give reasonable error information. Currently, R Script imports only data frames to Power BI.

Here is a simple R data frame in RStudio and Power BI Desktop

2016-04-01_21-33-14

RScript in R Studio

2016-04-01_21-33-31

RScript data frame in R Studio

2016-04-01_21-21-41

R Script In Power BI

Power BI Desktop uses R.Execute to execute R Script.

Download and Unzip from web using Power BI and R Script:

I wanted to download more than one file which has different URL with an incremental number. Hence, I used for loop to get the file number I wanted to download. With few lines of R script, I can easily download and unzip all these files.

R script I used in R Studio:

for(i in 0:4)

{

URLFull="http://download.companieshouse.gov.uk/BasicCompanyData-2016-03-01-part"

i=i+1

SourceURL=paste(URLFull,i,"_5.zip",sep="")

Destination="\\\\Mac\\Home\\Desktop\\Working\\csv\\"

DestinationURL=paste(Destination,i,".zip",sep="")

download.file(url=SourceURL, destfile=DestinationURL, method = "auto", quiet=FALSE)

unzipfolder="\\\\Mac\\Home\\Desktop\\Working\\csv\\CSVFiles"

unzip (zipfile=DestinationURL,exdir=unzipfolder)

}

This script does not return any data frame. If I use the same query in PowerBI Desktop, it would still download and unzip files, but I would not be able to import the data to Power BI. Also, I would not be able to use it as a step in the Power BI query.

We can either return an empty R Data Frame and create a separate query in Power BI to read the unzipped files or use a blank query as a source instead of R Script as the source, and do all operations in the single query.

Same script as above in Power BI Desktop returning empty data frame:

for(i in 0:1)

{

URLFull="http://download.companieshouse.gov.uk/BasicCompanyData-2016-03-01-part"

i=i+1

SourceURL=paste(URLFull,i,"_5.zip",sep="")

Destination="\\\\Mac\\Home\\Desktop\\Working\\csv\\"

DestinationURL=paste(Destination,i,".zip",sep="")

download.file(url=SourceURL, destfile=DestinationURL, method = "auto", quiet=FALSE)

unzipfolder="\\\\Mac\\Home\\Desktop\\Working\\csv\\CSVFiles"

unzip (zipfile=DestinationURL,exdir=unzipfolder)

}

df= data.frame()

Alternatively, same R Script without returning any data frame, this executes the R Script and does other Power Query operations:

let

Source = R.Execute

("

for(i in 0:4)#(lf)#(lf)

{#(lf)#(lf)

URLFull=""http://download.companieshouse.gov.uk/BasicCompanyData-2016-03-01-part""#(lf)#(lf)

i=i+1#(lf)#(lf)

SourceURL=paste(URLFull,i,""_5.zip"",sep="""")#(lf)#(lf)

Destination=""\\\\Mac\\Home\\Desktop\\Working\\csv\\""#(lf)#(lf)

DestinationURL=paste(Destination,i,"".zip"",sep="""")#(lf)#(lf)

download.file(url=SourceURL, destfile=DestinationURL, method = ""auto"", quiet=FALSE)#(lf)#(lf)

unzipfolder=""\\\\Mac\\Home\\Desktop\\Working\\csv\\CSVFiles""#(lf)#(lf)

unzip (zipfile=DestinationURL,exdir=unzipfolder)#(lf)



}

"),

df1 = Folder.Files("\\Mac\Home\Desktop\Working\csv\CSVFiles")

in

df1
2016-04-03_20-56-33

Downloaded R Files in Power BI

That is all; as usually now we can do additional operations in Power BI. However, there are few things to consider:

  • There is a 30 min timeout limit on all R Scripts, if you are downloading many large files, Power BI may raise a timeout error.
  • Every time you refresh PowerBI it would re-run the R Script; work around for this to run R Script using one query and do other Power BI operations in another query so you can disable the query which using R Script.

Among these are few other things I noticed:

  • I always had to import a data frame. Otherwise, it would still execute the R script, but I would have nothing in Power BI. In fact, I had to cancel the query, which did not save R Script either.
  • If I cancel Power query refresh, it will lock the files or folders which were in the process – need to close and reopen the Power BI to unlock!
  • If I want to change the R Script, before it finishes execution or Power BI refresh, again Power BI caches the script or values I given in the script, which mean I need to close and reopen Power BI. E.g.After I canceled the query and If I changed the for loop condition, it will still use the condition I given before I cancelled the query.

Overall it is exciting to see R Script in Power BI. It enables many features those Power BI does not support currently. Remember you never know what incredible Power BI team can deliver; an element which is missing today may appear in the next minute 🙂

Till next time,

Prathy 🙂

You may also like

5 comments

Naveen Koushal 27 November 2018 - 12:15 pm

Hello PBI team,

I have a query related to R integration in Power BI. Need immediate help!!

I have created a dashboard on Power BI. There is a live direct query connection build between PBI and SQL Azure database.
I have written R script in my dashboard, which do some calculation and save the output on the SQL Azure database. This output then used as input in the PBI using direct query mode. I have created RODBC connectin in my local machine.

The script runs perfectly fine on the PBI desktop however when I publish this dashboard on Power BI servece and run the dashboard, R script do not run and throw errors “ODBC connection failed”. Is that mean that I need to create ODBC connection on SQL Server? If yes then how to do that. Please respond with suggestion to solve this issue.

thanks,
Naveen

Reply
Franklin 29 May 2016 - 10:22 pm

Very interesting article – Thanks and congrats Prathy !
Unzipping is much easier in R than in VBA … 🙂

Reply
Prathy 4 June 2016 - 8:18 am

Thanks 🙂

Reply
Franklin 20 March 2018 - 4:00 pm

Very interesting article – Thanks and congrats Prathy !
Unzipping is much easier in R than in VBA … 🙂

Reply
Prathy 20 March 2018 - 4:00 pm

Thanks 🙂

Reply

Leave a Reply to PrathyCancel reply