
Prathy Kamasani

Can you believe how amazing and hectic March has been?! So many community members got to experience several major conferences back to back: the MVP Summit, SQLBits, and the first ever Microsoft Fabric Conference! And I had the chance to participate in all three! THANKS to the virtual option for MVP Summit, which made it all possible.

SQL Bits

SQLBits is seriously the best conference ever. In the community keynote, Chris Webb said, “I’ll go until I physically can’t anymore,” and that’s the truth. It’s a bunch of people who love the same stuff, all super hyped to learn, meet new folks, and share ideas. You leave SQLBits feeling inspired and pumped up. I’ve been going since 2012, and I’m not leaving anytime soon!

Sessions:

I did a couple of sessions at SQLBits. The first one was “One Dataset and Three Use Cases with Microsoft Fabric”, but honestly, the title needs work. I’m definitely going to change it up next time. But the session went great; I got picked to present and had a packed room. I got some fantastic feedback, too! Basically, I showed how I used open data to look at different things in MS Fabric and why you’d pick one option over another. I learned a lot making this session, TBH.
Here are the slides and demos: https://bit.ly/48F2sDF

I also did a beginner session on notebooks. We discussed what Power BI devs do and then talked about Python and Semantic Links. It looks like I can’t help but talk about Semantic links in every Fabric Session. But seriously, I had a blast creating this session and learned a lot about Notebooks in Fabric. Slides – https://infinitydataanalytics-my.sharepoint.com/:f:/g/personal/prathy_infinitydataanalytics_co_uk/En5UqnRjnvVIvJqGtMw50AkBGKUxtGSXwQHDwZKokU5zeg?e=8NGQe5

Awesome SQLBits Party:

It was an aviation theme, and even though I initially wanted to dress like a cloud, I ended up opting to be a first-class passenger, which needed less effort. It was a blast! I had so much fun with my team members, and it felt like a much-needed break. I always love introducing the awesome Guy in a Cube guys to new community members and making their wow moments 🙂

My Favorite Moments:

Meeting my SSIS guru in person after so many years was definitely the highlight of the conference for me. I can’t remember if I’ve met him before, but reconnecting with Jamie Thomson and chatting with him was a pleasure. I think I read every single blog post he wrote about SSIS a long time ago! Obviously hosting London Fabric UG at Bits has to be special. I love how my little thought of starting a user group evolved into this beautiful community. 

Microsoft Fabric Conference 2024

WOW, where can I start? First of all, thanks to Rie and Kelly for coordinating with us to participate in this community conference and for being such kind hosts. Huge shoutout to Avanade for supporting me on this, especially to Alan G, Ian M, and Jackie S for their help.

Session:

To make my life a tiny bit more complex, I decided to do a different session at the Fabric Conference. The good thing was I co-presented with Aaron. Co-presenting is always tricky, especially if you don’t know the other speaker well, but I think our session was excellent. We used a collaborative approach, and I’m happy we got excellent feedback on our flow and presentation. Returning to the session, it was about OneLake – Empowering Data Platforms with Microsoft OneLake. Honestly, until the day before, I was still learning things about OneLake and security. I remember sitting in the speaker room with Arthi and others from the OneLake team, discussing how different personas could securely access the data in OneLake. I learned a lot, and it was a great session.

Slides – Link to slides

Overall Conference

Congratulations to the entire team who were part of it; I honestly didn’t envy the Fabric team. They were always engaged, whether it was the partners evening, the keynote, sessions, the booth, Ask the Experts or the Power Hour. I love how they engaged with passion. With 4000+ people under one roof, I feel lucky to have been a part of it as an MVP, speaker, partner, and attendee. Meeting people from all over the world (including my global team!) was so cool. But above all, what stood out to me was how kind and collaborative everyone was. The Microsoft Fabric team is seriously inspiring, and I feel blessed to have connected with such amazing folks. PS: I was in the keynote video along with many other community members. It’s always nice to see yourself on the big screen, especially when the lighting is in your favour 🙂

Beyond Conferences

Conferences give you a chance to explore beyond the airport and hotel room. I have always been sceptical about Vegas, and as I anticipated, I am not a huge fan of big casinos and hotels. But the conference centre was great, and some parts of Vegas were impressive. I also managed to tick an item off my bucket list by going to the Grand Canyon. I love seeing these amazing natural formations; I think it just makes me feel more grounded.

A delay to my first connecting flight made the trip memorable: United Airlines rerouted my journey, which gave me the opportunity to surprise my sister at her home, and her reaction will always remain priceless in my heart.

 

Reflecting on these conferences today in the middle of the night with jet lag makes me realize how much they pushed me out of my comfort zone. Even though the sessions were stressful because of the new concepts they introduced, I left feeling like I had learned a lot. Seeing all the successful and inspiring people there encouraged me to reflect on my own blogging journey once again. 

Until we meet again 🙂


Like many, I am also playing with Fabric; many of my clients are excited about it too and want to know more. Being a solution architect in the consulting world, one of the most common questions I get asked is: “When will certain features be available? Where are they on the roadmap?” That’s what sparked the idea of scraping the Microsoft Fabric Roadmap and creating this Power BI report. It is based on a Direct Lake connection, so it has been a bit temperamental.

So, how did I do it? If you are not interested in the whole story, here is the Python code you can run to get the roadmap. If you are interested in my process, carry on reading 🙂

Data Integration

I am not proficient in Python, but I have done web scraping with Power BI many times, so I know my way around web scraping a bit. I started with ChatGPT, which gave me a rough skeleton in Python.

url = "https://learn.microsoft.com/en-us/fabric/release-plan"
page = requests.get(url)
soup = BeautifulSoup(page.content, 'html.parser')
results = soup.find(id='main')
# print(results.prettify())

The above script gives me the entire page, which is obviously more than I need. So I went to the browser developer tools and figured out the snippets I was interested in.

It looked like I needed all the hrefs, and then I could filter down to only the ones I was interested in.

# get all hrefs
hrefs = results.find_all('a', href=True)

# keep all href name and link in a list
href_list = []
for href in hrefs:
    href_list.append([href.text, href['href']])

# add a column name to the list with full url
href_list = pd.DataFrame(href_list, columns=['name', 'url'])

# remove rows containing https  
href_list = href_list[~href_list['url'].str.contains("https")]

# add a new column with full url    
href_list['full_url'] = 'https://learn.microsoft.com/en-us/fabric/release-plan/' + href_list['url']
href_list.loc[href_list['name'] == 'Power BI', 'full_url'] = 'https://learn.microsoft.com/en-us/power-platform/release-plan/2023wave2/power-bi/planned-features'

I was able to extract the hrefs from the web page using the above code snippet. It was a bit messy, but I managed to clean it up and get the list I wanted. Now I had to iterate over the links and scrape the data from each page. The pages have different formats and layouts, so I needed to adjust my code accordingly. I decided to ask ChatGPT for some help, and it suggested using pd.read_html to read the tables from the HTML pages.

table_list = []
# loop through href_list, check if it has tables and if it does, append to table_list
for index, row in href_list.iterrows():
    table = pd.read_html(row['full_url'], extract_links="all")
    table[0]['product'] = row['name']
    table[0]['FullURL'] = row['full_url']
    table_list.append(table[0])

Pandas_df = pd.concat(table_list, ignore_index=True)
Pandas_df.columns = ['Feature', 'Estimated_release_timeline', 'Product','FullURL', 'Enabled_for', 'Public_Preview','GA']
Pandas_df = Pandas_df.astype({'Feature': 'string', 'Estimated_release_timeline': 'string', 'Product': 'string', 'Enabled_for': 'string', 'Public_Preview': 'string', 'GA': 'string'})

spark_df = spark.createDataFrame(Pandas_df)

# display(spark_df)

I decided to make some improvements to the data. I added some extra columns to identify the products more easily, I converted the data types to string for consistency, and I gave meaningful names to the columns, avoiding any spaces that could cause problems later.

Next, I applied some more transformations with Spark. I learned that Spark Data Frame is much faster than Pandas, so I switched to Spark DataFrame.

Script for all transformations
from pyspark.sql.functions import split, col, regexp_replace, when, concat, lit
spark_df = spark_df.withColumn('Feature_Name', split(col('Feature'), '\',')[0])
spark_df = spark_df.withColumn('Feature_URL', split(col('Feature'), '\',')[1])
# display(spark_df)

spark_df = spark_df.withColumn('Feature_Name', regexp_replace('Feature_Name', '\(\'', ''))
spark_df = spark_df.withColumn('Feature_Name', regexp_replace('Feature_Name', '\'', ''))
spark_df = spark_df.withColumn('Feature_URL', regexp_replace('Feature_URL', '\'\)', ''))
spark_df = spark_df.withColumn('Feature_URL', regexp_replace('Feature_URL', '\'', ''))
spark_df = spark_df.withColumn('Feature_URL', regexp_replace('Feature_URL', ' ', ''))
# display(spark_df)

# spark_df = spark_df.withColumn('Feature_URL', concat(col('FullURL'), col('Feature_URL')))
spark_df = spark_df.withColumn('Feature_URL', when(col('Product') == 'Power BI'
    , concat(lit('https://learn.microsoft.com/en-us/power-platform/release-plan/2023wave2/power-bi/'), col('Feature_URL')))
    .otherwise(concat(col('FullURL'), col('Feature_URL'))))

spark_df = spark_df.withColumn('Estimated_release_timeline', split(col('Estimated_release_timeline'), ',')[0])
spark_df = spark_df.withColumn('Estimated_release_timeline', regexp_replace('Estimated_release_timeline', '\(\'', ''))
spark_df = spark_df.withColumn('Estimated_release_timeline', regexp_replace('Estimated_release_timeline', '\'', ''))
spark_df = spark_df.drop('Feature')
# display(spark_df)
spark_df = spark_df.withColumn('Enabled_for', split(col('Enabled_for'), '\',')[0])
spark_df = spark_df.withColumn('Enabled_for', regexp_replace('Enabled_for', '\(\'', ''))
spark_df = spark_df.withColumn('Enabled_for', regexp_replace('Enabled_for', '\'', ''))
spark_df = spark_df.withColumn('Enabled_for', regexp_replace('Enabled_for', '\)', ''))

#split GA column into two columns
spark_df = spark_df.withColumn('GA', split(col('GA'), ',')[0])
spark_df = spark_df.withColumn('GA', regexp_replace('GA', '\(\'', ''))
spark_df = spark_df.withColumn('GA', regexp_replace('GA', '\'', ''))

#split Public_Preview column into two columns
spark_df = spark_df.withColumn('Public_Preview', split(col('Public_Preview'), ',')[0])
spark_df = spark_df.withColumn('Public_Preview', regexp_replace('Public_Preview', '\(\'', ''))
spark_df = spark_df.withColumn('Public_Preview', regexp_replace('Public_Preview', '\'', ''))

# spark_df.show()  
# display(spark_df)


Finally, I inserted all the data into a Spark Delta table:

Table_Name = 'Fabric_Release_Plan'
# print(Table_Name)
spark_df.write.format("delta").option("mergeSchema", "true").mode("overwrite").saveAsTable(Table_Name)
print('Table ' + Table_Name + ' has been created successfully')
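
A quick sanity check, assuming the same notebook session and the table name above: once the table is written, you can query it straight away with Spark SQL to confirm the scrape looks sensible.

# quick check: feature count per product in the new Delta table
spark.sql("""
    SELECT Product, COUNT(*) AS Features
    FROM Fabric_Release_Plan
    GROUP BY Product
    ORDER BY Features DESC
""").show(truncate=False)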

Then I went and got the descriptions of each item as well, to display in the tooltips of my report. This was similar to the rest of the code, but more fun and challenging.

Script to get Descriptions

url = "https://learn.microsoft.com/en-us/fabric/release-plan"
page = requests.get(url)
soup = BeautifulSoup(page.content, 'html.parser')
results = soup.find(id='main')

# get all hrefs
hrefs = results.find_all('a', href=True)

# keep all href name and link in a list
href_list = []
for href in hrefs:
    href_list.append([href.text, href['href']])

# add a column name to the list with full url
href_list = pd.DataFrame(href_list, columns=['name', 'url'])

# remove rows containing https  
href_list = href_list[~href_list['url'].str.contains("https")]

# add a new column with full url    
href_list['full_url'] = 'https://learn.microsoft.com/en-us/fabric/release-plan/' + href_list['url']
href_list.loc[href_list['name'] == 'Power BI', 'full_url'] = 'https://learn.microsoft.com/en-us/power-platform/release-plan/2023wave2/power-bi/planned-features'

Desc_List=[]
table_list = []
# loop through href_list, check if it has tables and if it does, append to table_list
for index, row in href_list.iterrows():
    
    # also get all descriptions
    url = row['full_url']
    page = requests.get(url)
    soup = BeautifulSoup(page.content, 'html.parser')
    results = soup.find(id='main')

    h3 = results.find_all('h3')
    h3_ids = []
    for h in h3:
        h3_ids.append(h.get('id'))
        h3_ids.append(h.get_text())
        h3_ids.append(h.find_next('p').get_text())
        h3_ids.append(h.find_next('p').find_next('p').get_text())
        # print(h3_ids)
        # transpose list
    h3_ids = [h3_ids[i:i+4] for i in range(0, len(h3_ids), 4)]
    h3_ids = pd.DataFrame(h3_ids, columns=['id', 'name', 'timeline', 'full_description'])
    h3_ids['url'] = url
    # add product name
    h3_ids['product'] = row['name']
    # add full url with concat of url and id
    h3_ids['full_url'] = url + '#' + h3_ids['id']
    Desc_List.append(h3_ids)

# concat all desc_list dataframes
Desc_List = pd.concat(Desc_List)
# display(Desc_List)

spark_df = spark.createDataFrame(Desc_List)
# spark_df.show()

Table_Name = 'Fabric_Release_Plan_with_desc'
# print(Table_Name)
spark_df.write.format("delta").option("mergeSchema", "true").mode("overwrite").saveAsTable(Table_Name)
print('Table ' + Table_Name + ' has been created successfully')

Next, I ran a simple script to capture the time my notebook refreshed, so I can display the last refresh date and time on the report.

# get notebook run time and save in a table
import time
from datetime import datetime
from pyspark.sql.functions import lit
import pandas as pd

now = datetime.now()
current_time = now.strftime("%Y-%m-%d %H:%M:%S")

df = pd.DataFrame([current_time], columns=['Time'])
spark_df=spark.createDataFrame(df)
spark_df.write.format("delta").option("mergeSchema", "true").mode("overwrite").saveAsTable("Notebook_runtime")

Modelling

Modelling was pretty easy. I created relationships using the SQL endpoint relationships view.

Then I chose which tables to show in my default semantic model.

This lineage view explains the flow.

Reporting

I created a simple Power BI report on the default semantic model using the Power BI service. I published to the web and you can view the report here – https://app.powerbi.com/view?r=eyJrIjoiOTYxMmM3NjYtMGIxZS00MDFmLTljOWMtYTYyMzBmYjA3ZGJjIiwidCI6IjM4YmU3MjU5LTMyMGQtNDZkYy04YzNiLWNkZDA5OTM1NGExMyIsImMiOjh9

Until next time,
Prathy 🙂

If you are a consultant like me, you know how hard it can be to get access to the Power BI Admin API or a Service Principal. Sometimes, you need to see all the workspaces you have permission for and what’s inside them. Well, I found that with MS Fabric, we can use notebooks and achieve it in a few steps:
  1. Get the Tenant ID
  2. Connect to the tenant for Power BI
  3. Call the Power BI REST API, get the data and save it in the Lakehouse
  4. Report using Power BI

If you are new to Microsoft Fabric and notebooks, check out this series on MS Docs – How to use notebooks – Microsoft Fabric | Microsoft Learn.

Get Tenant ID

There might be simpler ways to get the Tenant ID using Python, but the code below gets the Tenant ID of the current notebook.

import re
conf_list = spark.sparkContext.getConf().getAll()

# filter the configuration parameters to get the driver log URL
filtered_list = filter(lambda x: x[0] == 'spark.tracking.driverLogUrl', conf_list)
value=next(filtered_list,None)
url = value[1]

parts = url.split('/')
index=parts.index('sparkui') 
TennantID=parts[index+1]
print(TennantID)

Connect to Tenant for Power BI

First, I need an access token, so I use the authority URL to authenticate with my login. This gives me the workspaces I can access, but you can also use an Azure app and Service Principal authentication.

# Import the necessary libraries
# ! pip install adal
from adal import AuthenticationContext
import requests

# Set the required parameters
TENANT_ID = TennantID
CLIENT_ID = '1aea3f97-edc6-4453-a59b-b88b0b803711'  # this is Power BI Client Integrations client ID
RESOURCE = 'https://analysis.windows.net/powerbi/api'

# Set the authority URL and create an authentication context
authority_url = f'https://login.microsoftonline.com/{TENANT_ID}'
context = AuthenticationContext(authority_url)

# Acquire a user code and device code to authenticate the user
code_response = context.acquire_user_code(RESOURCE, CLIENT_ID)
print(code_response['message'])

token_response = context.acquire_token_with_device_code(RESOURCE, code_response, CLIENT_ID)
access_token = token_response['accessToken']

The code above uses the Power BI Client Integration Client ID to connect to the service. It prompts me to enter a code to authenticate myself when I run it. This is fine for occasional or manual scripts but not very convenient otherwise.

Another option is to create an app and a service principal in Azure. This lets you authenticate and authorise your app to access Azure resources. To learn how to do this, check out this link – Power BI REST APIs for embedded analytics and automation – Power BI REST API | Microsoft Learn

!pip install azure.identity
from azure.identity import ClientSecretCredential
import json, requests, pandas as pd

tenant_id = TennantID
client_id = 'Your ClientID'
client_secret = 'Your Client Secret'  # better not to hardcode the client secret here; consider keeping it safely in Azure Key Vault
ResourceURL = 'https://analysis.windows.net/powerbi/api'
APIURL = 'https://analysis.windows.net/powerbi/api/.default'

auth = ClientSecretCredential(authority='https://login.microsoftonline.com/', tenant_id=tenant_id, client_id=client_id, client_secret=client_secret)
access_token = auth.get_token(APIURL)
access_token = access_token.token
# print(access_token)

Call Power BI REST APIs and save the data in the Lakehouse

In this example, I’m only fetching the workspace ID, the dataset ID and the report ID, but you can use any other endpoint you can access. Just make sure you have the proper permissions and authentication tokens. The code is in Python, but you can use any language that supports HTTP requests.

# Set the base URL and headers for the API call
Baseurl = 'https://api.powerbi.com/v1.0/'
headers = {'Authorization': f'Bearer {access_token}'}
# print(headers)

# Get a list of all workspaces in the organization
response = requests.get(Baseurl + 'myorg/groups', headers=headers)
if response.status_code == 200:
    workspaces = response.json()['value']
    # get list of all workspaces and their IDs and keep in a dataframe
    import pandas as pd
    # for workspace in workspaces:
    #     print(workspace['name'], workspace['id'])
    workspaces_df = pd.DataFrame(workspaces)
    workspaces_df.head()
    display(workspaces_df)
else:
    print(f"Error: {response.status_code} - {response.text}")

# create a dataframe to hold the list of all workspaces
workspaces_df = pd.DataFrame(workspaces)
workspaces_df = workspaces_df.fillna('')
# replace spaces in columns with _
workspaces_df.columns = workspaces_df.columns.str.replace(' ', '_')
workspaces_df.head()

# Create a spark dataframe from the workspaces_df
workspaces_spark_df = spark.createDataFrame(workspaces_df)
spark.sql("DROP TABLE IF EXISTS Workspaces")
workspaces_spark_df.write.format("delta").mode("overwrite").saveAsTable("Workspaces")

Datasets=[]
# Get a list of all datasets in the organization
for workspace in workspaces:
    # print(workspace['name'], workspace['id'])
    response = requests.get(Baseurl+f"myorg/groups/{workspace['id']}/datasets", headers=headers)
    if response.status_code == 200:
        datasets = response.json()['value']
        # get list of all datasets and their IDs and keep in a dataframe
        for dataset in datasets:
            # print(dataset['name'], dataset['id'])
            Datasets.append(dataset)
            #add workspace name and ID to the dataset
            dataset['Workspace Name'] = workspace['name']
            dataset['Workspace ID'] = workspace['id']


    else:
        print(f"Error: {response.status_code} - {response.text}")
# display(Datasets)


# Create a dataframe from the Datasets list
Datasets_df = pd.DataFrame(Datasets)
# select key columns
Datasets_df = Datasets_df[['name', 'id', 'Workspace Name', 'Workspace ID','configuredBy','webUrl','createdDate']]
# rename name and id to Dataset Name and Dataset ID
Datasets_df = Datasets_df.rename(columns={'name':'Dataset Name', 'id':'Dataset ID'})
# replace the null values with empty string
Datasets_df = Datasets_df.fillna('')
# replace spaces in columns with _  
Datasets_df.columns = Datasets_df.columns.str.replace(' ', '_')
Datasets_df.head()


#Create a spark dataframe from the Datasets_df
datasets_spark_df = spark.createDataFrame(Datasets_df)
spark.sql("DROP TABLE IF EXISTS Datasets")
datasets_spark_df.write.format("delta").mode("overwrite").saveAsTable("Datasets")
Reports=[]
# Get a list of all reports in the organization
for workspace in workspaces:
    # print(workspace['name'], workspace['id'])
    response = requests.get(Baseurl+f"myorg/groups/{workspace['id']}/reports", headers=headers)
    if response.status_code == 200:
        reports = response.json()['value']
        # get list of all reports and their IDs and keep in a dataframe
        for report in reports:
            # print(report['name'], report['id'])
            Reports.append(report)
            # add workspace name and ID to the report
            report['Workspace Name'] = workspace['name']
            report['Workspace ID'] = workspace['id']

    else:
        print(f"Error: {response.status_code} - {response.text}")
# display(Reports)


# Create a dataframe from the Reports list
Reports_df = pd.DataFrame(Reports)
# select key columns
Reports_df = Reports_df[['name', 'id', 'Workspace Name', 'Workspace ID','webUrl','datasetId','datasetWorkspaceId']]
# rename name and id to Report Name and Report ID
Reports_df = Reports_df.rename(columns={'name':'Report Name', 'id':'Report ID'})
# replace the null values with empty string
Reports_df = Reports_df.fillna('')
# replace spaces in columns with _
Reports_df.columns = Reports_df.columns.str.replace(' ', '_')
# Reports_df.head()


#Create a spark dataframe from the Reports_df
reports_spark_df = spark.createDataFrame(Reports_df)
spark.sql("DROP TABLE IF EXISTS Reports")
reports_spark_df.write.format("delta").mode("overwrite").saveAsTable("Reports")
 
In the code snippets above, I started by defining the base URL so I could append the REST API paths to it later. Then I used the response status to handle any possible API issues. After that, I stored the response in a pandas DataFrame and converted it to a Spark DataFrame before saving it in the Lakehouse as a Delta table. I also removed spaces from the column names because Spark tables don’t allow them. This is the same pattern I followed for all the other APIs.
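
Since every endpoint follows the same pattern, it can be wrapped into a small helper. This is only a minimal sketch under the same assumptions as above (access_token, spark and the Fabric notebook session already exist); the function name and parameters are mine, purely for illustration.

import requests
import pandas as pd

Baseurl = 'https://api.powerbi.com/v1.0/'

def save_endpoint_as_delta(relative_url, table_name):
    # call a Power BI REST endpoint and save its 'value' array as a Delta table
    headers = {'Authorization': f'Bearer {access_token}'}
    response = requests.get(Baseurl + relative_url, headers=headers)
    if response.status_code != 200:
        print(f"Error: {response.status_code} - {response.text}")
        return None

    df = pd.DataFrame(response.json()['value']).fillna('')
    # Spark tables don't allow spaces in column names
    df.columns = df.columns.str.replace(' ', '_')

    spark_df = spark.createDataFrame(df)
    spark.sql(f"DROP TABLE IF EXISTS {table_name}")
    spark_df.write.format("delta").mode("overwrite").saveAsTable(table_name)
    return df

# for example, the Workspaces table built earlier:
# save_endpoint_as_delta('myorg/groups', 'Workspaces')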
 

Create Report

When I create a Lakehouse in Microsoft Fabric, it automatically creates a couple of default items.

When I click on the ellipsis of either of the two default items, the Dataset (Default) or the SQL endpoint, there are various ways to analyse the data.

Now, you might be overwhelmed by all the different visualisation options available. Each one serves a different purpose if you know the difference between them. But for my use case, I want a simple list of all workspaces and reports, who created them, who configured them, etc. So, I selected the Dataset (Default) option and clicked Create Paginated Report.

I dragged and dropped the columns I wanted to see in the table, such as workspace name, report name, owner, created date, etc.

That’s it! I now have a simple and handy list of all my workspaces in Microsoft Fabric. I could export, share or distribute this report to a wider audience.


This is a simple use case, but it has much potential. Instead of using the default dataset, I could have created some relationships. I could have even used TOM to create a Dataset, relationships, measures, etc. I didn’t have enough time to explore that. Additionally, I wasn’t sure how to do it in Python. Maybe someone can try it and let me know how it goes. 

Out of curiosity, I had actually started all of this with the Embedded API, looking at how it could read data from different sources in Power BI. I wanted to document workspaces, datasets and reports with all their details using the Embedded API. But I encountered a problem: “Embedding a DirectLake dataset is not supported with V1 embed token“. Maybe this will change, and then we can do amazing things: document various artefacts in a workspace, create reports and publish them in a Power BI App, an intelligent way to document. For now, I used the REST operations.

Another option is to use the Admin API and scanner API, start a scan, get the dataset metadata, document measures, etc. I just wanted to show that these notebooks open doors to many amazing things. 🙂
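
For the scanner API idea, the rough flow is: post a list of workspace IDs to getInfo, poll scanStatus until it succeeds, then read scanResult. A minimal sketch, assuming the same access_token as above, an account with the required admin permissions, and a placeholder workspace ID:

import time
import requests

admin_base = 'https://api.powerbi.com/v1.0/myorg/admin/workspaces/'
headers = {'Authorization': f'Bearer {access_token}'}

# kick off a metadata scan for one or more workspaces (placeholder ID below)
body = {"workspaces": ["<workspace-id>"]}
scan = requests.post(admin_base + 'getInfo?datasetSchema=true&datasetExpressions=true',
                     headers=headers, json=body).json()

# poll until the scan has finished, then fetch the result
while requests.get(admin_base + f"scanStatus/{scan['id']}", headers=headers).json()['status'] != 'Succeeded':
    time.sleep(5)

scan_result = requests.get(admin_base + f"scanResult/{scan['id']}", headers=headers).json()
# scan_result contains workspaces with their datasets, tables, measures, etc.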

Until next time,

Prathy 🙂


I had been following the hints and clues on Twitter, where some Power BI team members were changing their job titles to Synapse. I was curious and hopeful about what they were working on, but I also didn’t want Power BI to lose momentum. When I saw the leadership changes, I felt reassured that whatever they did would be amazing.

Fast forward to Build, where Microsoft announced the long-awaited Microsoft Fabric. What is it? It’s a SaaS data fabric platform with a common data lake called OneLake. It’s a one-stop shop for all the analytics needs of every enterprise. With Fabric, there is no need to stitch together services from multiple vendors, and all data is saved as Delta Parquet files. No more vendor lock-in or proprietary data formats!

The data fabric concept is not new; as per Datanami, Noel Yuhanna has been at the forefront of data fabric. In his words, “A data fabric is essentially an abstraction layer that links a disparate collection of data tools that address key pain points in big data projects. A data fabric solution should deliver capabilities in the areas of data access, discovery, transformation, integration, security, governance, lineage, and orchestration. It should also provide self-service capabilities, as well as some graph capabilities to identify connected data.

By providing a way to bring these data management capabilities to bear on data spanning all these silos, a data fabric can help alleviate core data management challenges holding companies back from higher-level data use cases, including advanced analytics and AI in the cloud.”

Microsoft Fabric takes this concept to the next level by integrating it with Power BI, Microsoft Office and the Fluent UI framework. The integration with Office and Power BI makes the time and steps to data insights very flexible, and Fluent UI provides a consistent and beautiful user experience across all artefacts. I think Microsoft Fabric is the ultimate solution for anyone who wants to leverage the power of data in the cloud. Whether you prefer a centralized or distributed approach, Microsoft Fabric can handle it all. You don’t have to change your existing skills, and you can use the tool or language you are comfortable with. There is plenty of information on MS Docs, and I am sure many people will be talking about it. I want to share my take on Microsoft Fabric as an ex-Microsoft SQL Server BI developer.

You see, I have a long history with Microsoft SQL Server and BI. I used to work with SSIS, SSRS and SSAS to create amazing data solutions. I loved how I could integrate, transform and analyse data using these tools. I felt like I had everything I needed in one place: SQL Server. Ever since, no matter what tool I work with, I still somehow try to connect the dots back to MSBI. This might be because it helps me to understand the topic better, or to not get intimidated by the changing world.

Fabric sounds like a game changer, but as an ex-MS SQL BI developer, I find it familiar. In my older days, I accumulated data in SQL Server databases. I built ETL or ELT using SQL stored procedures and my first love, SSIS. All my data integration tasks were mainly done using SSIS, but sometimes with a few bits of C# and hardcore SQL procedures. I used SSAS to create dimensional models; all calculations were traditionally created with MDX and later DAX.

I created beautiful SSRS reports with cool features like data-driven parameters, conditional formatting and lookup sets. I kept my end users on top of their KPIs using SSRS subscriptions. And not only end users: hardcore IT folks knew when the server was hitting its limits or when jobs were failing, and there were even occasional happy birthday emails, all using SQL Server Agent and sometimes with a bit of help from the Windows Task Scheduler. Whatever I did, I pretty much moved, transformed and analysed all the data on my SQL Server. There were occasional intruders like Oracle, MySQL etc., but I was faithful to SQL Server.

But times have changed, and so has the data landscape. Now we have more data sources, more data types, and more data challenges. We need to be able to handle big data, streaming data, unstructured data and more. We need to be able to scale, secure and optimize our data pipelines. We need to be able to deliver insights faster and easier.

That’s where Microsoft Fabric comes in. It’s a platform that lets me connect all these different data sources and destinations. Where I used to use linked servers against an existing, nicely formatted data platform, now I would use Shortcuts in Fabric. Where I used to access data files through a mapped network drive on a server, now I would probably mount my storage in Fabric. Whether I have a well-structured data platform or a bunch of files on a server, I can easily access them with Shortcuts and mounts in Fabric.

For data integration, I used a combination of SSIS and SQL Server functionalities like stored procedures and functions. Now on Fabric, I can use Data Pipelines and Dataflows Gen2 to create visual and intuitive workflows for my data transformations, or I can use notebooks to write code in any language I want. I think I will mostly be using Pipelines and Dataflows Gen2; I am a more visual person.

I haven’t played with it enough, but it looks like I can create multiple dataflows and call those dataflows from Data Factory pipelines. That would be really useful for scenarios like a For Each loop.

Next comes methodologies or frameworks for data storage. The proven approach then was ETL: extract raw data from operational data sources, transform and store it in a transactional layer, then load it into a data warehouse in a presentable format for further analysis. I think the old ETL is now the medallion lakehouse architecture. For larger data warehouses, we used to create database schemas or data marts for each department or domain, and now that’s domain-based data mesh architecture.

Microsoft Fabric supports any data storage framework or architecture I need. I can use Workspaces and Domains to organize my data into different stages and domains. I can follow the Medallion Lakehouse approach or create custom data marts for each department or domain.

Next, data analytics. In the past, we had three options for creating an analytics layer: SQL views, a SQL DW or an SSAS model. The choice depended on many factors, such as the urgency, the use case, the budget and the developer’s skills. Now Fabric gives us more flexibility and power: we can use gold-level lakehouse tables for fast and reliable data access, build a Warehouse with custom measures and calculations, or create a Power BI dataset for interactive visualisation.

In my opinion, the priority would be time to insights and performance. Using the benefits of DeltaLake and one copy would be my highest priority. But again, there are best practices and client scenarios. It’s not always easy to follow a recommended framework.

But that’s not all. Analytics is not just about making nice reports with actionable insights. We are in the age of AI, and we want those insights to reach the end users in the most effective way possible. In the past, I used SQL Server Agent and the Windows Task Scheduler to automate things, I used SSRS and Excel for further analysis, and I relied on support teams for ad hoc report requests. In the modern era, a few export options and self-service BI are not enough. We need scalable platforms with no proprietary lock-in, so the final output can be used for further analysis. Gold-level data is not an end. We may want to analyse the final dataset created by the self-service end user, or add the pixie dust of OpenAI to that final data frame created by the citizen developer. That’s where Fabric is perfect. I am no longer limited to the final Load stage of my ETL framework or the gold layer of the medallion lakehouse architecture. We are emerging into the world of creating platinum and diamond layers.

There are so many elements of Microsoft Fabric I don’t know yet; there is the whole data science side, which I don’t understand yet. But as an ex-MS SQL BI developer, I find it familiar. It reminds me of how I used to work with SQL Server and BI tools. It’s like a modern version of them, a new way of doing old things. It’s time to bring data platforms and analytics platforms back together. Together in the cloud 🙂

What do you think? Are you excited about Microsoft Fabric? Do you have any questions or comments? Let me know in the comments section below. 

Until next time, Prathy 🙂


A colleague of mine came to me with an interesting use case: “Switch between a summarized value or selected value with multiple legends”. For example, I have five countries and their GDP values. When the end user opens the report, they would like to see the average GDP of all countries, but when they select one or more countries in the slicer, the line chart should show only the selected values. Like below:

Default view
Selected view

There are multiple ways to achieve this; some are more elegant than others.

Layering visuals

First, I create two DAX measures to show values based on selection.

All avg =
VAR V1 = CALCULATE ( COUNTROWS ( VALUES ( 'Table'[Country] ) ), ALLSELECTED ( 'Table'[Country] ) )
VAR V2 = CALCULATE ( COUNTROWS ( VALUES ( 'Table'[Country] ) ), ALL ( 'Table'[Country] ) )
RETURN
    IF ( V1 = V2, CALCULATE ( AVERAGE ( 'owid-energy-data'[Value] ), ALL ( 'Table'[Country] ) ) )

selected avg =
VAR V1 = CALCULATE ( COUNTROWS ( VALUES ( 'Table'[Country] ) ), ALLSELECTED ( 'Table'[Country] ) )
VAR V2 = CALCULATE ( COUNTROWS ( VALUES ( 'Table'[Country] ) ), ALL ( 'Table'[Country] ) )
RETURN
    IF ( V1 = V2, BLANK (), AVERAGE ( 'owid-energy-data'[Value] ) )


Then I create two line charts, using the year on the X-axis and one of these measures on the Y-axis. I enable the legend on the visual that uses the “selected avg” measure.

Create two other measures for Titles:

Title All =
VAR V1 = CALCULATE ( COUNTROWS ( VALUES ( 'Table'[Country] ) ), ALLSELECTED ( 'Table'[Country] ) )
VAR V2 = CALCULATE ( COUNTROWS ( VALUES ( 'Table'[Country] ) ), ALL ( 'Table'[Country] ) )
RETURN
    IF ( V1 = V2, "Average of all countries" )

Title Selected =
VAR V1 = CALCULATE ( COUNTROWS ( VALUES ( 'Table'[Country] ) ), ALLSELECTED ( 'Table'[Country] ) )
VAR V2 = CALCULATE ( COUNTROWS ( VALUES ( 'Table'[Country] ) ), ALL ( 'Table'[Country] ) )
RETURN
    IF ( V1 <> V2, "showing selected average" )

Update the respective visuals’ titles with conditional formatting.

Make a few more changes, like disabling axis titles and making both line charts the same size. Here I have not set the min and max of the axis, but sometimes that needs to be addressed based on the data.

Then I get the viz below. It nicely shows the average value by default, and when the user chooses a country, it shows the selected country’s value.

One main issue with this approach is that I can only see the tooltips of the top visual. The same goes for all the other “More options” of the visual.

Using a single measure

Another approach is to, instead of layering, use one measure to switch between average and selected values.

I create a new measure

Dynamic Avg =
VAR V1 = CALCULATE ( COUNTROWS ( VALUES ( 'Table'[Country] ) ), ALLSELECTED ( 'Table'[Country] ) )
VAR V2 = CALCULATE ( COUNTROWS ( VALUES ( 'Table'[Country] ) ), ALL ( 'Table'[Country] ) )
RETURN
    IF ( V1 = V2, [All avg], [selected avg] )

Also, create a new Title measure

Title =
VAR V1 = CALCULATE ( COUNTROWS ( VALUES ( 'Table'[Country] ) ), ALLSELECTED ( 'Table'[Country] ) )
VAR V2 = CALCULATE ( COUNTROWS ( VALUES ( 'Table'[Country] ) ), ALL ( 'Table'[Country] ) )
VAR SelectedCountries = CONCATENATEX ( VALUES ( 'Table'[Country] ), [Country], "," )
RETURN
    IF (
        V1 = V2, "Average of all countries",
        IF (
            COUNTROWS ( VALUES ( 'Table'[Country] ) ) = 1, "GDP value of " & SelectedCountries & " ",
            "GDP values of countries " & SelectedCountries & " "
        )
    )

This enables the end user to view the average by default and the selected values based on the slicer selection.

But for the user experience, we need a legend to show what has been selected in the slicer, and things change as soon as we enable the legend. The legend shows even when the average is displayed, which causes confusion for the end user.

Instead of enabling the legend, we could enable axis titles, but again, for the average it automatically shows a country based on sort order. This approach is good as long as you don’t want a legend.

Alternative to legend

The next approach is to find an alternative to the legend. Instead of showing a legend, I make a copy of the visual, convert it into a tree map, and use it as the legend. It is the same logic as I blogged here – https://prathy.com/2022/06/using-tree-map-as-legend-on-a-page-in-power-bi/

Make a few more changes. Instead of using the Title on the Line chart, add a text box and show the Title. Group all three visuals and backgrounds to give the illusion of a single visual.

When a user lands on this page, it shows the average GDP value by default. When a value is selected in the slicer, it shows only that country. For usability, I have enough signifiers like Title and legend colour.

I hope this inspires someone out there. Until next time
Prathy 🙂

Using Tree Map as Legend on a Page in Power BI

I recently worked on two projects where the client wanted to show multiple metrics sliced by the same categorical data, for example, seeing how various metrics are performing over different regions or different product groups. A use case like this can be achieved in many ways; probably the best approach is to use the small multiples functionality or, to keep it simple, five copies of the same visual with different metrics.

Let’s look into it with energy consumption data. Here, I want to show metrics 1 to 5 on different income levels over the years.

Five different visuals

When you know exactly how many categories the requirement has and how you want to display your data, going for a fixed number of visuals is my favourite approach.


Using Small multiples

When there is no clarity about the number of categories, the small multiples feature shines. For this use case, I went for two columns. With five categories, I get an extra blank space. It doesn’t look that bad, but I would like more control over the placement of visuals on the canvas.


When I compare five individual visuals to small multiples, small multiples may perform better from a number-of-queries point of view, and they also look a bit tidier with one legend and one axis.

To have the best of both, we can use a tree map visual as a legend.

Tree Map visual as a legend

For the tree map, I used the income level column as the category and a distinct count of the same column as the value. I turned off data labels, header icons and tooltips. I ensured I used the same “Max Axis” value for all visuals, making the comparison seamless.

This approach is cleaner than showing the same legend for each visual and serves the same purpose as the usual legend: a user can identify what is what based on colour or shape and interact by clicking on the legend. 
https://prathy.com/wp-content/uploads/2022/06/2022-06-25_14-49-20-1-3.gif

The benefits of this approach are that your report stands out, a nice big legend is easy to interact with on touch-screen devices, and it looks sleeker than repeating the same legend. I hope the blog inspires someone out there.

Until next time,

Prathy 🙂


In one of the projects I was working on, I received feedback saying it is hard to understand how many items have been selected in a slicer, and it is not the first time I have come across this. It is a valid point: when you have quite a few items in a slicer, you use the search bar to look for items and select a couple, but you are not sure how many are selected.

Obviously, there are many solutions to one problem. I have seen many reports displaying custom labels to handle this situation. Still, after some research for a better UX, I found this example on Dribbble, which looked like an excellent design approach.

shot.gif (800×600) (dribbble.com)

And here is how I did it in Power BI.

The logic here: I will use a DAX measure to figure out how many items are selected and display that count in an object. I can display this in many ways, but I went for a shape. Having a shape allows me to use a shape type as a background, and I can use a dynamic DAX expression to show the value. The only thing to note is that my DAX expression has to return a string type. Hence, I created my measure like below.

Next is the slicer visual. By default, I enabled the slicer header and used an empty character as the text in the slicer header. If I keep the slicer header off, whenever I hover over the slicer, it jumps down/up to make more options visible; if I keep the slicer header on, that won’t be an issue. The only way to keep the slicer header on but show no title and no tooltip is by using an empty character. So, I used an empty character as the title for the slicer header, then changed the title to the DAX measure, and then grouped the slicer and the shape together.

Now when I select multiple items, it nicely shows how many items were selected. When none is selected, it shows the count of all the items available in the slicer. In this example, I don’t have many values, so a nice round shape works, but based on the number of results, I may go for different shape types.

But making it work perfectly, that is, making this shape show the right number irrespective of cross-filtering from other visuals, can be a tiring job. One way is to make all the rest of the visuals not interact with this shape, but that can be time-consuming when you have a lot of objects on a page. The other way to handle this is by creating a new table and a relationship. In this example, I have Artists and their total streams, and I am using Artist in my slicer. When I select an Artist in another visual, by default my measure also gets filtered. If I create another table with distinct Artists, use that in my slicer and use that column to count, and in all other visuals use the Artist from the original streams table, then I get the result I want:

  1. Created an Artist table
  2. Created a relationship between my streams table and the Artist table
  3. Updated my slicer to get Artist names from the Artist table
  4. Updated the measure to get the Artist name from the Artist table

Now cross-filtering doesn’t impact the value I’m showing. I hope this inspires someone out there.


In my recent open data project, I created a single-page report model with a sparse slicer. It’s a good trick for anyone who wants to make their slicer look a bit sleeker. Like any other visual in Power BI, slicers have many properties. Below is how a slicer looks in Power BI by default, but I made a few changes, in a few steps, to make it look like the one on the left.

Steps

  • Place the slicer on the report page
  • Go to Items in the properties pane and add a frame around the items
  • Go to General in the properties pane and update the outline weight; I prefer anything more than 7 or 8
  • Change the outline colour to match your visual background or report page background.

That’s all. Here is a video of me explaining it – https://youtu.be/MYIW2_c4AAk

And a video of the full report design here – https://youtu.be/MYIW2_c4AAk?t=1

How I designed, Top 20 Drinks #PowerBI Model


The inspiration behind this post came solely from the WHO website, https://www.who.int/. I went on to check some COVID data and found the quick links on the main page quite interesting, hence this post. Using buttons in Power BI is straightforward, and we see many Power BI report designers using them more often. What I have observed is that not everyone uses the amazing extra functionalities of Power BI buttons. In this post, I want to use the properties of buttons and shapes to try to achieve the look and feel I saw on the WHO website. The look I am going for is below:

 

Let’s see how we can create it in Power BI:

  1. Create a button, follow this article to create a button in Power BI Desktop – https://docs.microsoft.com/en-us/power-bi/create-reports/desktop-buttons
  2. Add Text to Button. For the sake of the blog post, I am just going for Button 1, Button 2, etc.
  3. Add line shape and put it behind the Button, Send it back!
  4. Add Fill to All buttons same as the background colour of the page and no transparency.
  5. Select all buttons, change the fill transparency to 100 % on hover. Now your buttons will look something like below.
  6. Now change the text of each Button on Hover with a prefix of Empty Characters. Yes, empty characters!

https://emptycharacter.com/ is your friend when it comes to empty characters. From here, copy an empty character and paste it in front of your text in the button. In this example, I pasted empty characters before the button text on hover; I went for three empty characters before each button’s text. That’s it; you will have a subtle animated effect on your buttons.

I am sure someone out there must be thinking: why don’t we just add a space before the text? I am sorry, but that doesn’t work. I don’t know why! I am sure Power BI is doing something clever to not show empty spaces before the text. I hope someone finds this post inspiring.
PBIX – https://1drv.ms/u/s!Avm7gbgZtlMlyhlTp2zkQxArZsOG?e=Fcd1Lm

Until next time,

Prathy 🙂


Power BI Bookmarks, the secret behind many sleek reports, revolutionised the entire Power BI report design approach. Initially, bookmarks were portrayed more as saved views to be used for storytelling rather than for navigation. However, we certainly saw more report designers using bookmarks for navigation than for storytelling. In March 2020, we got a new functionality called Page Navigation, which brings us to the dilemma of which one to use and when.

#PowerBI Bookmarks vs Page Navigation

Power BI Page Navigation is an action type. Even though many have been using bookmarks for page navigation previously, the new Page Navigation action makes it a lot easier.

Power BI Bookmarks:

With bookmarks in Power BI Desktop, you capture the currently configured view of a report page, including filtering and the state of visuals. Later, you can go back to that state by selecting the saved Bookmark. – https://docs.microsoft.com/en-us/power-bi/desktop-bookmarks

One of the essential factors to remember when creating a bookmark is the “saved elements”. As per Microsoft Docs, the below elements are saved with a bookmark:

  • The current page
  • Filters
  • Slicers, including slicer type (for example, dropdown or list) and slicer state
  • Visual selection state (such as cross-highlight filters)
  • Sort order
  • Drill location
  • Visibility of an object (by using the Selection pane)
  • The focus or Spotlight modes of any visible object

We can use bookmarks for storytelling using the View option of bookmarks. The action property of shapes, images and buttons can be used to navigate between bookmarks.

When it comes to moving between pages, both functionalities do the job perfectly, but one gives better usability than the other. Our scenario helps us choose, and both have their pros and cons.

1. Strictly to move between pages:

We don’t see this use case when you are distributing a Power BI report model using Power BI Apps in the Power BI Service; but in all other distribution methods, when users want a navigation pane with an action to go to a page, the Page Navigation functionality is the best choice.

PROS
  • Fewer Bookmarks to create and maintain
  • Clean Bookmarks window
CONS
  • They are limited to buttons only. If users want to use images/shapes, they need to overlay the image/shape with a blank button and then add the Page Navigation action.

2. Many Visuals and many selections

I work with this use case more often than I would like to. Working with many visuals on a single page and hiding some of them is so fiddly. When I design, I prefer to use different pages: I create a page for each action, followed by a bookmark for each page, and hide all the pages that are not needed. This approach reduces potential human errors and also helps with troubleshooting. So for this use case, again, Page Navigation works better.

PROS
  • Less hassle with selections
  • No need to worry about updating Bookmarks
  • Easy to troubleshoot
CONS
  • More Report pages, sometimes duplication of Report pages and content in them
  • Performance of visuals
  • Doesn’t work well with all Report Layouts

3. Different Report Layouts

When you are working with fancy layouts, like a scrollable page, tabbed navigation, pop-up windows, etc., Page Navigation just doesn’t work. Even though it’s fiddly, bookmarks give a seamless user experience.

For example, in the example below, when I use the Page Navigation functionality between report pages, every time I land on the page it takes me back to the top of the page. Using bookmarks, however, leaves me at the same position on the page.

PROS
  • Seamless interaction with different report layouts
  • No need to load all visuals again
CONS
  • Need to update Bookmarks, every single time there is change
  • Copy and paste doesn’t work great
  • Working with many bookmarks gets fiddly
Using Page Navigation Action

Using Bookmarks

Summary

These are just a few scenarios. Both functionalities give the same end-user experience; the end user doesn’t know whether you are using a bookmark or a Page Navigation action. As much as I love using bookmarks, grouping and selections in Power BI, I also struggle with many questions:

  1. Page Navigation is limited to buttons only. If you are in any case using an image or a shape and overlaying it with a blank button to navigate to a page, is it worth using Page Navigation, or should you just create a bookmark?
  2. When you use many visuals on a single page, is it worth trying to figure out in which bookmark you hid some shape, or is it worth creating separate pages?
  3. When a visual takes some time to load, is it reasonable to duplicate it over separate pages and use the Page Navigation action, or is it better to use bookmarks so you don’t have to load time-consuming visuals again when switching between report pages?
  4. Is it worth the fiddle of updating bookmarks each time you make a change to your report layout, or is it better to have separate pages?
  5. Bookmark grouping or several pages: which one is easier when you have far too many actions to manage?
  6. Hiding a visual does not run a query, which helps performance. So should I just use bookmarks?

Strictly speaking, what is essential to you defines which way to go. Page Navigation is impressive when I am working with small datasets; it helps a lot with designing and troubleshooting, but bookmarks give a more seamless user experience.

Till next time,

Prathy 🙂
