
Document #PowerBI Workspaces with #MicrosoftFabric #Notebooks

by Prathy Kamasani
If you are a consultant like me, you know how hard it can be to get access to the Power BI Admin API or a Service Principal. Sometimes, you need to see all the workspaces you have permission to access and what’s inside them. Well, I found that with MS Fabric we can use notebooks and achieve it in a few steps:
  1. Get Tenant ID
  2. Connect to the tenant for Power BI
  3. Call the Power BI REST API, get data and save it in the Lakehouse
  4. Report using Power BI

If you are new to Microsoft Fabric and Notebooks, check out this series on MS Docs – How to use notebooks – Microsoft Fabric | Microsoft Learn.

Get Tenant ID

There might be simpler ways to get the Tenant ID using Python, but the code below works to get the Tenant ID of the current notebook.

conf_list = spark.sparkContext.getConf().getAll()

# filter the configuration parameters to get the driver log URL
filtered_list = filter(lambda x: x[0] == 'spark.tracking.driverLogUrl', conf_list)
value = next(filtered_list, None)
url = value[1]

# the tenant ID is the URL segment that follows 'sparkui'
parts = url.split('/')
index = parts.index('sparkui')
TenantID = parts[index + 1]
print(TenantID)

Connect to the Tenant for Power BI

First, I need an access token, so I use the authority URL to authenticate with my own login. This gives me the workspaces I can access, but you can also use an Azure app and Service Principal authentication.

# Import the necessary libraries
# ! pip install adal
from adal import AuthenticationContext
import requests

# Set the required parameters
TENANT_ID = TenantID
CLIENT_ID = '1aea3f97-edc6-4453-a59b-b88b0b803711'  # this is the Power BI Client Integrations client ID
RESOURCE = 'https://analysis.windows.net/powerbi/api'

# Set the authority URL and create an authentication context
authority_url = f'https://login.microsoftonline.com/{TENANT_ID}'
context = AuthenticationContext(authority_url)

# Acquire a user code and device code to authenticate the user
code_response = context.acquire_user_code(RESOURCE, CLIENT_ID)
print(code_response['message'])
token_response = context.acquire_token_with_device_code(RESOURCE, code_response, CLIENT_ID)
access_token = token_response['accessToken']

The code above uses the Power BI Client Integrations client ID to connect to the service. When I run it, it prompts me to enter a code to authenticate myself. This is fine for occasional or manual scripts but not very convenient otherwise.

Another option is to create an app and a service principal in Azure. This lets you authenticate and authorise your app to access Azure resources. To learn how to do this, check out this link – Power BI REST APIs for embedded analytics and automation – Power BI REST API | Microsoft Learn.

!pip install azure.identity
from azure.identity import ClientSecretCredential
import json, requests, pandas as pd

tenant_id = TenantID
client_id = 'Your ClientID'
client_secret = 'Your Client Secret'  # Good not to hardcode the client secret here; try to see how you can keep it safely in Azure Key Vault
ResourceURL = 'https://analysis.windows.net/powerbi/api'
APIURL = 'https://analysis.windows.net/powerbi/api/.default'

auth = ClientSecretCredential(authority='https://login.microsoftonline.com/', tenant_id=tenant_id, client_id=client_id, client_secret=client_secret)
access_token = auth.get_token(APIURL).token
# print(access_token)
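By the way, the client secret shouldn’t really live in the notebook. A minimal sketch of pulling it from Azure Key Vault instead, assuming a vault and secret of your own (the names below are hypothetical):

# Hypothetical Key Vault URI and secret name - replace with your own
key_vault_uri = 'https://your-key-vault.vault.azure.net/'
secret_name = 'pbi-client-secret'

# Option 1: the utilities built into Fabric notebooks, running under your notebook identity
client_secret = mssparkutils.credentials.getSecret(key_vault_uri, secret_name)

# Option 2: the azure-keyvault-secrets SDK with azure-identity
# from azure.identity import DefaultAzureCredential
# from azure.keyvault.secrets import SecretClient
# client_secret = SecretClient(vault_url=key_vault_uri, credential=DefaultAzureCredential()).get_secret(secret_name).value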

Call Power BI REST APIs and save the data in the Lakehouse

In this example, I’m only fetching workspaces, datasets and reports, but you can use any other endpoint you have access to. Just make sure you have the proper permissions and authentication tokens. The code is in Python, but you can use any language that supports HTTP requests.

# Set the base URL and headers for the API calls
Baseurl = 'https://api.powerbi.com/v1.0/'
headers = {'Authorization': f'Bearer {access_token}'}
# print(headers)

import pandas as pd

# Get a list of all workspaces in the organization
response = requests.get(Baseurl + 'myorg/groups', headers=headers)
if response.status_code == 200:
    # keep the list of workspaces and their IDs in a dataframe
    workspaces = response.json()['value']
    workspaces_df = pd.DataFrame(workspaces)
    display(workspaces_df)
else:
    print(f"Error: {response.status_code} - {response.text}")

# replace null values and remove spaces from column names (Spark tables don't allow them)
workspaces_df = workspaces_df.fillna('')
workspaces_df.columns = workspaces_df.columns.str.replace(' ', '_')
workspaces_df.head()

# Create a Spark dataframe from workspaces_df and save it to the Lakehouse as a Delta table
workspaces_spark_df = spark.createDataFrame(workspaces_df)
spark.sql("DROP TABLE IF EXISTS Workspaces")
workspaces_spark_df.write.format("delta").mode("overwrite").saveAsTable("Workspaces")
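Before moving on, a quick sanity check that the table landed in the Lakehouse doesn’t hurt. A minimal sketch, assuming the default columns returned by the groups endpoint:

# Read the Delta table back and show a few rows
display(spark.sql("SELECT name, id, type FROM Workspaces LIMIT 10"))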

Datasets=[]
# Get a list of all datasets in the organization
for workspace in workspaces:
    # print(workspace['name'], workspace['id'])
    response = requests.get(Baseurl+f"myorg/groups/{workspace['id']}/datasets", headers=headers)
    if response.status_code == 200:
        datasets = response.json()['value']
        # add the workspace name and ID to each dataset, then collect it
        for dataset in datasets:
            # print(dataset['name'], dataset['id'])
            dataset['Workspace Name'] = workspace['name']
            dataset['Workspace ID'] = workspace['id']
            Datasets.append(dataset)


    else:
        print(f"Error: {response.status_code} - {response.text}")
# display(Datasets)


# Create a dataframe from the Datasets list
Datasets_df = pd.DataFrame(Datasets)
# select key columns
Datasets_df = Datasets_df[['name', 'id', 'Workspace Name', 'Workspace ID','configuredBy','webUrl','createdDate']]
# rename name and id to Dataset Name and Dataset ID
Datasets_df = Datasets_df.rename(columns={'name':'Dataset Name', 'id':'Dataset ID'})
# replace the null values with empty string
Datasets_df = Datasets_df.fillna('')
# replace spaces in columns with _  
Datasets_df.columns = Datasets_df.columns.str.replace(' ', '_')
Datasets_df.head()


#Create a spark dataframe from the Datasets_df
datasets_spark_df = spark.createDataFrame(Datasets_df)
spark.sql("DROP TABLE IF EXISTS Datasets")
datasets_spark_df.write.format("delta").mode("overwrite").saveAsTable("Datasets")
Reports = []
# Get a list of all reports in the organization
for workspace in workspaces:
    # print(workspace['name'], workspace['id'])
    response = requests.get(Baseurl + f"myorg/groups/{workspace['id']}/reports", headers=headers)
    if response.status_code == 200:
        reports = response.json()['value']
        # add the workspace name and ID to each report, then collect it
        for report in reports:
            # print(report['name'], report['id'])
            report['Workspace Name'] = workspace['name']
            report['Workspace ID'] = workspace['id']
            Reports.append(report)
    else:
        print(f"Error: {response.status_code} - {response.text}")
# display(Reports)


# Create a dataframe from the Reports list
Reports_df = pd.DataFrame(Reports)
# select key columns
Reports_df = Reports_df[['name', 'id', 'Workspace Name', 'Workspace ID','webUrl','datasetId','datasetWorkspaceId']]
# rename name and id to Report Name and Report ID
Reports_df = Reports_df.rename(columns={'name':'Report Name', 'id':'Report ID'})
# replace the null values with empty string
Reports_df = Reports_df.fillna('')
# replace spaces in columns with _
Reports_df.columns = Reports_df.columns.str.replace(' ', '_')
# Reports_df.head()


#Create a spark dataframe from the Reports_df
reports_spark_df = spark.createDataFrame(Reports_df)
spark.sql("DROP TABLE IF EXISTS Reports")
reports_spark_df.write.format("delta").mode("overwrite").saveAsTable("Reports")
 
In the code snippets above, I started by defining the base URL so I could append the REST API endpoints to it later. Then I used the response status code to handle any possible API issues. After that, I stored the response in a pandas DataFrame, converted it to a Spark DataFrame and saved it in the Lakehouse as a Delta table. I also removed spaces from the column names because Spark tables don’t allow them. This is the same pattern I followed for all the other APIs.
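Since the same fetch, clean and save pattern repeats for every endpoint, it can be wrapped in a small helper function. This is just a sketch of the pattern described above; save_api_results is a hypothetical name and not part of the original notebook:

def save_api_results(endpoint, table_name):
    # Call a Power BI REST endpoint, clean the result and save it as a Delta table
    response = requests.get(Baseurl + endpoint, headers=headers)
    if response.status_code != 200:
        print(f"Error: {response.status_code} - {response.text}")
        return None
    df = pd.DataFrame(response.json()['value']).fillna('')
    # Spark tables don't allow spaces in column names
    df.columns = df.columns.str.replace(' ', '_')
    spark.sql(f"DROP TABLE IF EXISTS {table_name}")
    spark.createDataFrame(df).write.format("delta").mode("overwrite").saveAsTable(table_name)
    return df

# Example: the workspaces call from earlier expressed with the helper
workspaces_df = save_api_results('myorg/groups', 'Workspaces')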
 

Create Report

When I create a Lakehouse in Microsoft Fabric, it automatically creates a couple of default items alongside it.

When I click on the ellipsis of either of the two default items, the Dataset (Default) or the SQL endpoint, there are various ways to analyse the data.

Now, you might be overwhelmed by all the different visualisation options available. Each one serves a different purpose once you know the differences between them. But for my use case, I just want a simple list of all workspaces and reports, who created them, who configured them, and so on. So:

I selected the Dataset(Default) option and clicked Create Paginated Report.

I dragged and dropped the columns I wanted to see in the table, such as workspace name, report name, owner, created date, etc.

That’s it! I now have a simple and handy list of all my workspaces in Microsoft Fabric. I can export, share or distribute this report to a wider audience.
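If you prefer to stay in the notebook, the same list can also be produced straight from the Lakehouse tables. A minimal sketch, assuming the Workspaces, Datasets and Reports tables created above with their underscored column names:

# Join reports to their datasets to list workspace, report, owner and creation date
summary_df = spark.sql("""
    SELECT r.Workspace_Name,
           r.Report_Name,
           d.Dataset_Name,
           d.configuredBy,
           d.createdDate
    FROM Reports r
    LEFT JOIN Datasets d ON r.datasetId = d.Dataset_ID
    ORDER BY r.Workspace_Name, r.Report_Name
""")
display(summary_df)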


This is a simple use case, but it has a lot of potential. Instead of using the default dataset, I could have created some relationships. I could even have used TOM (Tabular Object Model) to create a dataset, relationships, measures and so on. I didn’t have enough time to explore that, and I wasn’t sure how to do it in Python. Maybe someone can try it and let me know how it goes.

Out of curiosity, I started all of this with the embedded API and how it could read data from different sources in Power BI. I wanted to document workspaces, datasets and reports with all their details using the Embedded API. But I encountered a problem: “Embedding a DirectLake dataset is not supported with V1 embed token”. Maybe this will change, and then we can do amazing things: document various artefacts in a workspace, create reports and publish them in a Power BI app, an intelligent way to document. For now, I used REST operations.

Another option is to use the Admin API and the Scanner API, start a trace and get the dataset metadata, document measures and more. I just wanted to show that these notebooks open the door to many amazing things. 🙂
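As a taster, the Scanner API follows a request-a-scan-then-poll pattern. A minimal sketch, assuming a token with tenant admin permissions, the headers from earlier, and that detailed metadata scanning is enabled in the tenant settings:

import time

admin_url = 'https://api.powerbi.com/v1.0/myorg/admin/workspaces/'

# 1. Request a scan for a batch of workspace IDs (up to 100 per call)
body = {'workspaces': [w['id'] for w in workspaces[:100]]}
scan = requests.post(admin_url + 'getInfo?datasetSchema=True&datasetExpressions=True',
                     headers=headers, json=body).json()
scan_id = scan['id']

# 2. Poll until the scan has finished
while requests.get(admin_url + f'scanStatus/{scan_id}', headers=headers).json()['status'] != 'Succeeded':
    time.sleep(5)

# 3. Fetch the result, which includes datasets, tables and measures per workspace
scan_result = requests.get(admin_url + f'scanResult/{scan_id}', headers=headers).json()
print(len(scan_result['workspaces']), 'workspaces scanned')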

Until next time,

Prathy 🙂
