
Like many of you, I have been playing with Fabric, and many of my clients are excited about it and want to know more. As a solution architect in the consulting world, one of the most common questions I get asked is: “When will certain features be available? Where are they on the roadmap?” That’s what sparked the idea of scraping the Microsoft Fabric Roadmap and creating this Power BI report. It is based on a Direct Lake connection, so it has been a bit temperamental.

So, what did I do? If you are not interested in the whole story, here is the Python code you can run to get the roadmap. If you are interested in my process, carry on reading 🙂

Data Integration

I am not proficient in Python, but I have done web scraping with Power BI many times, so I know my way around it a bit. I started with ChatGPT, building a rough skeleton in Python.

import requests
import pandas as pd
from bs4 import BeautifulSoup

url = "https://learn.microsoft.com/en-us/fabric/release-plan"
page = requests.get(url)
soup = BeautifulSoup(page.content, 'html.parser')
results = soup.find(id='main')
# print(results.prettify())

The above script gives me the entire page, which is obviously more than I need. I went into the browser developer tools and worked out which snippets I was interested in.

It looked like I needed all the hrefs, and then I could filter down to only the ones I was interested in.

# get all hrefs
hrefs = results.find_all('a', href=True)

# keep all href name and link in a list
href_list = []
for href in hrefs:
    href_list.append([href.text, href['href']])

# convert the list to a dataframe with named columns
href_list = pd.DataFrame(href_list, columns=['name', 'url'])

# remove rows containing https  
href_list = href_list[~href_list['url'].str.contains("https")]

# add a new column with full url    
href_list['full_url'] = 'https://learn.microsoft.com/en-us/fabric/release-plan/' + href_list['url']
href_list.loc[href_list['name'] == 'Power BI', 'full_url'] = 'https://learn.microsoft.com/en-us/power-platform/release-plan/2023wave2/power-bi/planned-features'

I was able to extract the hrefs from the web page using the above code snippet. It was a bit messy, but I managed to clean it up and get the list I wanted. Next, I had to iterate over the links and scrape the data from each page. The pages have different formats and layouts, so I needed to adjust my code accordingly. I asked ChatGPT for help, and it suggested using pandas' pd.read_html to read the tables from the HTML pages.

table_list = []
# loop through href_list and append the first table from each page to table_list
for index, row in href_list.iterrows():
    table = pd.read_html(row['full_url'], extract_links="all")
    table[0]['product'] = row['name']
    table[0]['FullURL'] = row['full_url']
    table_list.append(table[0])

Pandas_df = pd.concat(table_list, ignore_index=True)
Pandas_df.columns = ['Feature', 'Estimated_release_timeline', 'Product','FullURL', 'Enabled_for', 'Public_Preview','GA']
Pandas_df = Pandas_df.astype({'Feature': 'string', 'Estimated_release_timeline': 'string', 'Product': 'string', 'Enabled_for': 'string', 'Public_Preview': 'string', 'GA': 'string'})

spark_df = spark.createDataFrame(Pandas_df)

# display(spark_df)
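One caveat with the loop above: pd.read_html raises a ValueError when a page has no tables, which would stop the whole run. Here is a minimal guard, assuming you simply want to skip such pages (my addition, not part of the original notebook):

# Variant of the loop above that skips pages without tables instead of failing.
# pd.read_html raises ValueError when it finds no <table> elements on the page.
table_list = []
for index, row in href_list.iterrows():
    try:
        tables = pd.read_html(row['full_url'], extract_links="all")
    except ValueError:
        # no tables on this page - skip it
        continue
    tables[0]['product'] = row['name']
    tables[0]['FullURL'] = row['full_url']
    table_list.append(tables[0])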

I then made some improvements to the data: I added extra columns to identify the products more easily, converted the data types to string for consistency, and gave the columns meaningful names, avoiding spaces that could cause problems later.

Next, I applied some more transformations with Spark. Spark DataFrames handle larger data better than pandas, so I switched to a Spark DataFrame for the rest of the transformations.

Script for all transformations
from pyspark.sql.functions import split, col, regexp_replace, when, concat, lit
spark_df = spark_df.withColumn('Feature_Name', split(col('Feature'), '\',')[0])
spark_df = spark_df.withColumn('Feature_URL', split(col('Feature'), '\',')[1])
# display(spark_df)

spark_df = spark_df.withColumn('Feature_Name', regexp_replace('Feature_Name', '\(\'', ''))
spark_df = spark_df.withColumn('Feature_Name', regexp_replace('Feature_Name', '\'', ''))
spark_df = spark_df.withColumn('Feature_URL', regexp_replace('Feature_URL', '\'\)', ''))
spark_df = spark_df.withColumn('Feature_URL', regexp_replace('Feature_URL', '\'', ''))
spark_df = spark_df.withColumn('Feature_URL', regexp_replace('Feature_URL', ' ', ''))
# display(spark_df)

# spark_df = spark_df.withColumn('Feature_URL', concat(col('FullURL'), col('Feature_URL')))
spark_df = spark_df.withColumn('Feature_URL', when(col('Product') == 'Power BI'
    , concat(lit('https://learn.microsoft.com/en-us/power-platform/release-plan/2023wave2/power-bi/'), col('Feature_URL')))
    .otherwise(concat(col('FullURL'), col('Feature_URL'))))

spark_df = spark_df.withColumn('Estimated_release_timeline', split(col('Estimated_release_timeline'), ',')[0])
spark_df = spark_df.withColumn('Estimated_release_timeline', regexp_replace('Estimated_release_timeline', '\(\'', ''))
spark_df = spark_df.withColumn('Estimated_release_timeline', regexp_replace('Estimated_release_timeline', '\'', ''))
spark_df = spark_df.drop('Feature')
# display(spark_df)
spark_df = spark_df.withColumn('Enabled_for', split(col('Enabled_for'), '\',')[0])
spark_df = spark_df.withColumn('Enabled_for', regexp_replace('Enabled_for', '\(\'', ''))
spark_df = spark_df.withColumn('Enabled_for', regexp_replace('Enabled_for', '\'', ''))
spark_df = spark_df.withColumn('Enabled_for', regexp_replace('Enabled_for', '\)', ''))

#split GA column into two columns
spark_df = spark_df.withColumn('GA', split(col('GA'), ',')[0])
spark_df = spark_df.withColumn('GA', regexp_replace('GA', '\(\'', ''))
spark_df = spark_df.withColumn('GA', regexp_replace('GA', '\'', ''))

#split Public_Preview column into two columns
spark_df = spark_df.withColumn('Public_Preview', split(col('Public_Preview'), ',')[0])
spark_df = spark_df.withColumn('Public_Preview', regexp_replace('Public_Preview', '\(\'', ''))
spark_df = spark_df.withColumn('Public_Preview', regexp_replace('Public_Preview', '\'', ''))

# spark_df.show()  
# display(spark_df)


Finally, I wrote all the data into a Spark Delta table.

Table_Name = 'Fabric_Release_Plan'
# print(Table_Name)
spark_df.write.format("delta").option("mergeSchema", "true").mode("overwrite").saveAsTable(Table_Name)
print('Table ' + Table_Name + ' has been created successfully')

Then I scraped the description of each item as well, to display in the tooltips of my report. This was similar to the rest of the code, but more fun and challenging.

Script to get Descriptions

import requests
import pandas as pd
from bs4 import BeautifulSoup

url = "https://learn.microsoft.com/en-us/fabric/release-plan"
page = requests.get(url)
soup = BeautifulSoup(page.content, 'html.parser')
results = soup.find(id='main')

# get all hrefs
hrefs = results.find_all('a', href=True)

# keep all href name and link in a list
href_list = []
for href in hrefs:
    href_list.append([href.text, href['href']])

# convert the list to a dataframe with named columns
href_list = pd.DataFrame(href_list, columns=['name', 'url'])

# remove rows containing https  
href_list = href_list[~href_list['url'].str.contains("https")]

# add a new column with full url    
href_list['full_url'] = 'https://learn.microsoft.com/en-us/fabric/release-plan/' + href_list['url']
href_list.loc[href_list['name'] == 'Power BI', 'full_url'] = 'https://learn.microsoft.com/en-us/power-platform/release-plan/2023wave2/power-bi/planned-features'

Desc_List=[]
# loop through href_list and scrape the descriptions from each page
for index, row in href_list.iterrows():
    
    # also get all descriptions
    url = row['full_url']
    page = requests.get(url)
    soup = BeautifulSoup(page.content, 'html.parser')
    results = soup.find(id='main')

    h3 = results.find_all('h3')
    h3_ids = []
    for h in h3:
        h3_ids.append(h.get('id'))
        h3_ids.append(h.get_text())
        h3_ids.append(h.find_next('p').get_text())
        h3_ids.append(h.find_next('p').find_next('p').get_text())
        # print(h3_ids)
    # group the flat list into rows of four: id, name, timeline, description
    h3_ids = [h3_ids[i:i+4] for i in range(0, len(h3_ids), 4)]
    h3_ids = pd.DataFrame(h3_ids, columns=['id', 'name', 'timeline', 'full_description'])
    h3_ids['url'] = url
    # add product name
    h3_ids['product'] = row['name']
    # add full url with concat of url and id
    h3_ids['full_url'] = url + '#' + h3_ids['id']
    Desc_List.append(h3_ids)

# concat all desc_list dataframes
Desc_List = pd.concat(Desc_List)
# display(Desc_List)

spark_df = spark.createDataFrame(Desc_List)
# spark_df.show()

Table_Name = 'Fabric_Release_Plan_with_desc'
# print(Table_Name)
spark_df.write.format("delta").option("mergeSchema", "true").mode("overwrite").saveAsTable(Table_Name)
print('Table ' + Table_Name + ' has been created successfully')

Next, I ran a simple script to capture the time the notebook ran, so I could display the last refresh date and time on the report.

# get notebook run time and save in a table
from datetime import datetime
import pandas as pd

now = datetime.now()
current_time = now.strftime("%Y-%m-%d %H:%M:%S")

df = pd.DataFrame([current_time], columns=['Time'])
spark_df=spark.createDataFrame(df)
spark_df.write.format("delta").option("mergeSchema", "true").mode("overwrite").saveAsTable("Notebook_runtime")

Modelling

Modelling was pretty easy. I created the relationships using the SQL endpoint’s relationships view.
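If you want to sanity-check that relationship in the notebook before modelling, a quick join works. The join key is my assumption: both tables end up holding the feature page URL with its # anchor.

# Check how the roadmap table and the descriptions table line up on the feature URL.
release_plan = spark.table('Fabric_Release_Plan')
descriptions = spark.table('Fabric_Release_Plan_with_desc')

joined = release_plan.join(
    descriptions,
    release_plan.Feature_URL == descriptions.full_url,
    how='left'
)
display(joined.select('Feature_Name', 'Estimated_release_timeline', 'full_description'))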

Then I chose which tables to show in my default semantic model.

This lineage view explains the flow.

Reporting

I created a simple Power BI report on the default semantic model using the Power BI service. I published it to the web, and you can view the report here – https://app.powerbi.com/view?r=eyJrIjoiOTYxMmM3NjYtMGIxZS00MDFmLTljOWMtYTYyMzBmYjA3ZGJjIiwidCI6IjM4YmU3MjU5LTMyMGQtNDZkYy04YzNiLWNkZDA5OTM1NGExMyIsImMiOjh9

Until next time,
Prathy 🙂

If you are a consultant like me, you know how hard it can be to access the Power BI Admin API or a Service Principal. Sometimes you just need to see all the workspaces you have permission for and what’s inside them. Well, with MS Fabric we can use notebooks and achieve it in a few steps:
  1. Get the Tenant ID
  2. Connect to the tenant for Power BI
  3. Call the Power BI REST API, get the data and save it in the Lakehouse
  4. Report using Power BI

If you are new to Microsoft Fabric and Notebooks, check out this series on MS Docs – How to use notebooks – Microsoft Fabric | Microsoft Learn.

Get Tenant ID

There might be simpler ways to get the Tenant ID using Python, but the code below works to get the Tenant ID of the current notebook.

# get all Spark configuration parameters
conf_list = spark.sparkContext.getConf().getAll()

# filter the configuration parameters to get the driver log URL
filtered_list = filter(lambda x: x[0] == 'spark.tracking.driverLogUrl', conf_list)
value = next(filtered_list, None)
url = value[1]

# the tenant ID is the URL segment that follows 'sparkui'
parts = url.split('/')
index = parts.index('sparkui')
TenantID = parts[index + 1]
print(TenantID)

Connect to Tenant for Power BI

First, I need an access token, so I use the authority URL to authenticate with my own login. This gives me the workspaces I can access, but you could also use an Azure app and Service Principal authentication.

# Import the necessary libraries
# ! pip install adal
from adal import AuthenticationContext
import requests

# Set the required parameters
TENANT_ID = TenantID
CLIENT_ID = '1aea3f97-edc6-4453-a59b-b88b0b803711'  # this is the Power BI Client Integrations client ID
RESOURCE = 'https://analysis.windows.net/powerbi/api'

# Set the authority URL and create an authentication context
authority_url = f'https://login.microsoftonline.com/{TENANT_ID}'
context = AuthenticationContext(authority_url)

# Acquire a user code and device code to authenticate the user
code_response = context.acquire_user_code(RESOURCE, CLIENT_ID)
print(code_response['message'])

token_response = context.acquire_token_with_device_code(RESOURCE, code_response, CLIENT_ID)
access_token = token_response['accessToken']

The code above uses the Power BI Client Integrations client ID to connect to the service. When I run it, it prompts me to enter a code to authenticate myself. This is fine for occasional or manual scripts, but not very convenient otherwise.

Another option is to create an app and a service principal in Azure. This lets you authenticate and authorise your app to access Azure resources. To learn how to do this, check out this link – Power BI REST APIs for embedded analytics and automation – Power BI REST API | Microsoft Learn

!pip install azure.identity
from azure.identity import ClientSecretCredential
import json, requests, pandas as pd

tenant_id = TenantID
client_id = 'Your ClientID'
client_secret = 'Your Client Secret'  # best not to hardcode the client secret here - see how you can keep it safely in Azure Key Vault
ResourceURL = 'https://analysis.windows.net/powerbi/api'
APIURL = 'https://analysis.windows.net/powerbi/api/.default'

auth = ClientSecretCredential(authority='https://login.microsoftonline.com/', tenant_id=tenant_id, client_id=client_id, client_secret=client_secret)
access_token = auth.get_token(APIURL)
access_token = access_token.token
# print(access_token)
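On the Key Vault point in the comment above: in a Fabric notebook you can usually pull the secret at runtime rather than hardcoding it. A minimal sketch, assuming mssparkutils is available in the runtime and using placeholder names for the vault and the secret:

# Hedged sketch: read the client secret from Azure Key Vault at runtime instead of hardcoding it.
# 'contoso-kv' and 'pbi-client-secret' are placeholder names - replace them with your own.
from notebookutils import mssparkutils

client_secret = mssparkutils.credentials.getSecret(
    'https://contoso-kv.vault.azure.net/',  # Key Vault URI (placeholder)
    'pbi-client-secret'                     # secret name (placeholder)
)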

Call the Power BI REST APIs and save the data in the Lakehouse

In this example, I’m only fetching the workspace ID, the dataset ID and the report ID, but you can use any other endpoint you have access to. Just make sure you have the proper permissions and authentication tokens. The code is in Python, but you can use any language that supports HTTP requests.

# Set the base URL and headers for the API call
Baseurl = 'https://api.powerbi.com/v1.0/'
headers = {'Authorization': f'Bearer {access_token}'}
# print(headers)

import pandas as pd

# Get a list of all workspaces in the organization
response = requests.get(Baseurl + 'myorg/groups', headers=headers)
if response.status_code == 200:
    workspaces = response.json()['value']
    # keep the list of workspaces and their IDs in a dataframe
    workspaces_df = pd.DataFrame(workspaces)
    display(workspaces_df)
else:
    print(f"Error: {response.status_code} - {response.text}")

# clean up the workspaces dataframe
workspaces_df = workspaces_df.fillna('')
# replace spaces in column names with _
workspaces_df.columns = workspaces_df.columns.str.replace(' ', '_')
workspaces_df.head()

# Create a spark dataframe from workspaces_df and save it as a delta table
workspaces_spark_df = spark.createDataFrame(workspaces_df)
spark.sql("DROP TABLE IF EXISTS Workspaces")
workspaces_spark_df.write.format("delta").mode("overwrite").saveAsTable("Workspaces")

Datasets=[]
# Get a list of all datasets in the organization
for workspace in workspaces:
    # print(workspace['name'], workspace['id'])
    response = requests.get(Baseurl+f"myorg/groups/{workspace['id']}/datasets", headers=headers)
    if response.status_code == 200:
        datasets = response.json()['value']
        # get list of all datasets and their IDs and keep in a dataframe
        for dataset in datasets:
            # print(dataset['name'], dataset['id'])
            Datasets.append(dataset)
            #add workspace name and ID to the dataset
            dataset['Workspace Name'] = workspace['name']
            dataset['Workspace ID'] = workspace['id']


    else:
        print(f"Error: {response.status_code} - {response.text}")
# display(Datasets)


# Create a dataframe from the Datasets list
Datasets_df = pd.DataFrame(Datasets)
# select key columns
Datasets_df = Datasets_df[['name', 'id', 'Workspace Name', 'Workspace ID','configuredBy','webUrl','createdDate']]
# rename name and id to Dataset Name and Dataset ID
Datasets_df = Datasets_df.rename(columns={'name':'Dataset Name', 'id':'Dataset ID'})
# replace the null values with empty string
Datasets_df = Datasets_df.fillna('')
# replace spaces in columns with _  
Datasets_df.columns = Datasets_df.columns.str.replace(' ', '_')
Datasets_df.head()


#Create a spark dataframe from the Datasets_df
datasets_spark_df = spark.createDataFrame(Datasets_df)
spark.sql("DROP TABLE IF EXISTS Datasets")
datasets_spark_df.write.format("delta").mode("overwrite").saveAsTable("Datasets")
Reports=[]
# Get a list of all reports in the organization
for workspace in workspaces:
    # print(workspace['name'], workspace['id'])
    response = requests.get(Baseurl + f"myorg/groups/{workspace['id']}/reports", headers=headers)
    if response.status_code == 200:
        reports = response.json()['value']
        # get list of all reports and their IDs and keep in a dataframe
        for report in reports:
            # print(report['name'], report['id'])
            Reports.append(report)
            # add workspace name and ID to the report
            report['Workspace Name'] = workspace['name']
            report['Workspace ID'] = workspace['id']
    else:
        print(f"Error: {response.status_code} - {response.text}")
# display(Reports)


# Create a dataframe from the Reports list
Reports_df = pd.DataFrame(Reports)
# select key columns
Reports_df = Reports_df[['name', 'id', 'Workspace Name', 'Workspace ID','webUrl','datasetId','datasetWorkspaceId']]
# rename name and id to Report Name and Report ID
Reports_df = Reports_df.rename(columns={'name':'Report Name', 'id':'Report ID'})
# replace the null values with empty string
Reports_df = Reports_df.fillna('')
# replace spaces in columns with _
Reports_df.columns = Reports_df.columns.str.replace(' ', '_')
# Reports_df.head()


#Create a spark dataframe from the Reports_df
reports_spark_df = spark.createDataFrame(Reports_df)
spark.sql("DROP TABLE IF EXISTS Reports")
reports_spark_df.write.format("delta").mode("overwrite").saveAsTable("Reports")
 
In the code snippet above, I started by defining the base URL so I could append the REST API endpoints to it later. Then I used the response status code to handle any possible API issues. After that, I stored the response in a pandas dataframe and converted it to a Spark dataframe before saving it in the Lakehouse as a Delta table. I also removed spaces from the column names because Spark tables don’t allow them. This is the same pattern I followed for all the other APIs.
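Since that pattern repeats for every endpoint, a small helper keeps the notebook tidy. This is just a sketch of the pattern described above; the function name and parameters are mine, not part of the original notebook.

# Sketch of the repeated pattern: call the API, tidy the columns, save the result as a Delta table.
def save_api_result(endpoint, table_name):
    response = requests.get(Baseurl + endpoint, headers=headers)
    if response.status_code != 200:
        print(f"Error: {response.status_code} - {response.text}")
        return
    df = pd.DataFrame(response.json()['value']).fillna('')
    df.columns = df.columns.str.replace(' ', '_')  # Spark table columns cannot contain spaces
    spark_df = spark.createDataFrame(df)
    spark.sql(f"DROP TABLE IF EXISTS {table_name}")
    spark_df.write.format("delta").mode("overwrite").saveAsTable(table_name)

# example usage: save_api_result('myorg/groups', 'Workspaces')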
 

Create Report

When I create a Lakehouse in Microsoft Fabric, it automatically creates a couple of default items alongside it.

When I click the ellipsis on either of the two default items, the Dataset (Default) or the SQL endpoint, there are various ways to analyse the data.

Now, you might be overwhelmed by all the different visualisation options available. Each one serves a different purpose once you know the difference between them. But for my use case, I wanted a simple list of all workspaces and reports, who created them, who configured them, and so on. So I selected the Dataset (Default) option and clicked Create paginated report.

I dragged and dropped the columns I wanted to see in the table, such as workspace name, report name, owner, created date, etc.

That’s it! I now have a simple and handy list of all my workspaces in Microsoft Fabric. I could export, share or distribute this report to further audiences.
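If you would rather shape that same list in the notebook instead of the report designer, a quick Spark SQL join over the Delta tables created earlier produces a similar flat list. This is a sketch on my part, assuming the table and column names from the code above.

# Flat list of reports with their workspace and dataset details,
# joining the Reports and Datasets Delta tables created earlier.
flat_list = spark.sql("""
    SELECT r.Report_Name,
           r.Workspace_Name,
           d.Dataset_Name,
           d.configuredBy,
           d.createdDate
    FROM Reports r
    LEFT JOIN Datasets d
      ON r.datasetId = d.Dataset_ID
""")
display(flat_list)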


This is a simple use case, but it has a lot of potential. Instead of using the default dataset, I could have created some relationships. I could even have used TOM to create a dataset, relationships, measures and so on. I didn’t have enough time to explore that, and I wasn’t sure how to do it in Python. Maybe someone can try it and let me know how it goes.

I was curious, because I started all of this with the embedded API and how it could read data from different sources in Power BI. I wanted to document workspaces, datasets and reports, with all their details, using the Embedded API. But I ran into a problem: “Embedding a DirectLake dataset is not supported with V1 embed token”. Maybe this will change, and then we can do amazing things: document the various artefacts in a workspace, create reports and publish them in a Power BI app – an intelligent way to document. For now, I used the REST operations.

Another option is to use the Admin API and the Scanner API, start a trace, get the dataset metadata, document measures, and so on. I just wanted to show that these notebooks open doors to many amazing things. 🙂

Until next time,

Prathy 🙂


I had been following the hints and clues on Twitter, where some Power BI team members were changing their job titles to Synapse. I was curious and hopeful about what they were working on, but I also didn’t want Power BI to lose momentum. When I saw the leadership changes, I felt reassured that whatever they did would be amazing.

Fast forward to Build, where Microsoft announced the long-awaited Microsoft Fabric. What is it? It’s a SaaS data fabric platform with a common data lake called OneLake – a one-stop shop for all the analytics needs of every enterprise. With Fabric, there is no need to stitch together services from multiple vendors, and all data is saved as Delta Parquet files. No more vendor lock-in or proprietary data formats!

The data fabric concept is not new; as per Datanami, Noel Yuhanna has been at the forefront of data fabric. In his words, “A data fabric is essentially an abstraction layer that links a disparate collection of data tools that address key pain points in big data projects. A data fabric solution should deliver capabilities in the areas of data access, discovery, transformation, integration, security, governance, lineage, and orchestration. It should also provide self-service capabilities, as well as some graph capabilities to identify connected data.

By providing a way to bring these data management capabilities to bear on data spanning all these silos, a data fabric can help alleviate core data management challenges holding companies back from higher-level data use cases, including advanced analytics and AI in the cloud.”

Microsoft Fabric takes this concept to the next level by integrating it with Power BI, Microsoft Office and the Fluent UI framework. The integration with Office and Power BI shortens the time and the number of steps from data to insights, and Fluent UI provides a consistent, beautiful user experience across all artefacts. I think Microsoft Fabric is the ultimate solution for anyone who wants to leverage the power of data in the cloud. Whether you prefer a centralized or a distributed approach, Microsoft Fabric can handle it. You don’t have to change your existing skills, and you can use the tool or language you are comfortable with. There is plenty of information on MS Docs, and I am sure many people will be talking about it, so I want to share my take on Microsoft Fabric as an ex-Microsoft SQL Server BI developer.

You see, I have a long history with Microsoft SQL Server and BI. I used to work with SSIS, SSRS and SSAS to create amazing data solutions. I loved how I could integrate, transform and analyze data using these tools. I felt like I had everything I needed in one place: SQL Server. Ever since, no matter what tool I work with, I still somehow try to connect the dots back to MSBI. It might be because it helps me understand the topic better, or simply not to get intimidated by a changing world.

Fabric sounds like a game changer, but as an ex-MS SQL BI developer, I find it familiar. In my older days, I accumulated data in SQL Server databases. I built ETL or ELT using SQL stored procedures and my first love, SSIS. Almost all my data integration tasks were done with SSIS, sometimes with a few bits of C# and hardcore SQL procedures. I used SSAS to create dimensional models; calculations were traditionally written in MDX and later DAX.

I created beautiful SSRS reports with cool features like data-driven parameters, conditional formatting and lookup sets. I kept my end users on top of their KPIs using SSRS subscriptions. And not just end users: the hardcore IT folks knew when the server was hitting its limits and when jobs were failing, and even got the occasional happy birthday email, courtesy of SQL Server Agent and sometimes a bit of help from the Windows task scheduler. Whatever I did, I moved, transformed and analysed all the data on my SQL Server. There were occasional intruders like Oracle, MySQL and so on, but I was faithful to SQL Server.

But times have changed, and so has the data landscape. Now we have more data sources, more data types, and more data challenges. We need to be able to handle big data, streaming data, unstructured data and more. We need to be able to scale, secure and optimize our data pipelines. We need to be able to deliver insights faster and easier.

That’s where Microsoft Fabric comes in. It’s a platform that lets me connect all kinds of data sources and destinations. Where I have an existing, nicely formatted data platform, I used to use linked servers; now I would use Shortcuts on Fabric. Back then, I used to access data files through a mapped network drive on the server; now I would probably mount my storage on Fabric. Whether I have a well-structured data platform or a bunch of files on a server, I can easily access them with Shortcuts and Mounts on Fabric.

For data integration, I used a combination of SSIS and SQL Server functionality like stored procedures and functions. Now, on Fabric, I can use Data Pipelines and Dataflows Gen2 to create visual, intuitive workflows for my data transformations, or I can use Notebooks to write code in any language I want. I think I will mostly be using Pipelines and Dataflows Gen2 – I am a more visual person.

I haven’t played with it enough yet, but it looks like I can create multiple dataflows and call those dataflows from Data Factory pipelines. That would be really useful for scenarios like a ForEach loop.

Next come the methodologies, or frameworks, of data storage. The proven approach then was ETL: extract raw data from operational data sources, transform and store it in a transactional layer, then load it into a data warehouse in a presentable format for further analysis. I think the old ETL is now the Medallion Lakehouse architecture. For larger data warehouses we used to create database schemas or data marts for each department or domain, and now that’s domain-based data mesh architecture.

Microsoft Fabric supports any data storage framework or architecture I need. I can use Workspaces and Domains to organize my data into different stages and domains. I can follow the Medallion Lakehouse approach or create custom data marts for each department or domain.

Next, data analytics. In the past, we had three options for creating an analytics layer: SQL views, a SQL DW or an SSAS model. The choice depended on many factors, such as urgency, the use case, the budget and the developer’s skills. Now Fabric gives us more flexibility and power: we can use gold-level Lakehouse tables for fast, reliable data access, build a Warehouse with custom measures and calculations, or create a Power BI dataset for interactive visualisation.

In my opinion, the priorities would be time to insights and performance, so using the benefits of Delta Lake and the single copy of data would come first. But again, there are best practices and there are client scenarios, and it’s not always easy to follow a recommended framework.

But that’s not all. Analytics is not just about making nice reports with actionable insights; we are in the age of AI, and we want those insights to reach end users in the most effective way possible. In the past, I used SQL Server Agent and the Windows Task Scheduler to automate things, SSRS and Excel for further analysis, and I relied on support teams for ad hoc report requests. In the modern era, different export options and self-service BI are not enough. We need scalable platforms with no proprietary lock-in, so the final output can be used for further analysis. Gold-level data is not the end. We may want to analyse the final dataset created by a self-service end user, or add the pixie dust of OpenAI to that final data frame created by a citizen developer. That’s where Fabric is perfect. I am no longer limited to the final Load stage of my ETL framework or the Gold layer of the Medallion Lakehouse architecture; we are moving into a world of Platinum and Diamond layers.

With Microsoft Fabric, there are still many elements I don’t know yet, such as the whole data science side, which I don’t understand yet. But as an ex-MS SQL BI developer, I find it familiar. It reminds me of how I used to work with SQL Server and the BI tools. It’s like a modern version of them, a new way of doing old things. It’s time to bring data platforms and analytics platforms back together. Together, in the cloud 🙂

What do you think? Are you excited about Microsoft Fabric? Do you have any questions or comments? Let me know in the comments section below. 

Until next time, Prathy 🙂
