- Get Tenant ID
- Connect to the tenant for Power BI
- Call the Power BI REST API, get data and save it in a Lakehouse
- Report using Power BI
If you are new to Microsoft Fabric and Notebooks, check out this series on MS Docs – How to use notebooks – Microsoft Fabric | Microsoft Learn.
Get Tenant ID
There might be simpler ways to get the Tenant ID using Python, but the code below works to get the Tenant ID of the current Notebook.
conf_list = spark.sparkContext.getConf().getAll()
# filter the configuration parameters to get the driver log URL
filtered_list = filter(lambda x: x[0] == 'spark.tracking.driverLogUrl', conf_list)
value = next(filtered_list, None)
url = value[1]
# the tenant ID is the URL segment that follows 'sparkui'
parts = url.split('/')
index = parts.index('sparkui')
TenantID = parts[index + 1]
print(TenantID)
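Alternatively, here is a minimal sketch that pulls the first GUID out of the same driver log URL with a regular expression. It assumes url is the driver log URL obtained above and that the tenant ID is the first GUID-shaped segment in it:

import re

# assumes 'url' is the driver log URL from the snippet above, and that
# the tenant ID is the first GUID that appears in the URL
guid_pattern = r'[0-9a-fA-F]{8}-(?:[0-9a-fA-F]{4}-){3}[0-9a-fA-F]{12}'
match = re.search(guid_pattern, url)
if match:
    TenantID = match.group(0)
    print(TenantID)
else:
    print('No GUID found in the driver log URL')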
Connect to the Tenant for Power BI
First, I need an access token, so I use the authority URL to authenticate with my own login. This gives me the workspaces I can access, but you can also use Azure app and service principal authentication.
# Import the necessary libraries
# !pip install adal
from adal import AuthenticationContext
import requests

# Set the required parameters
TENANT_ID = TenantID
CLIENT_ID = '1aea3f97-edc6-4453-a59b-b88b0b803711'  # this is the Power BI Client Integrations client ID
RESOURCE = 'https://analysis.windows.net/powerbi/api'

# Set the authority URL and create an authentication context
authority_url = f'https://login.microsoftonline.com/{TENANT_ID}'
context = AuthenticationContext(authority_url)

# Acquire a user code and device code to authenticate the user
code_response = context.acquire_user_code(RESOURCE, CLIENT_ID)
print(code_response['message'])

token_response = context.acquire_token_with_device_code(RESOURCE, code_response, CLIENT_ID)
access_token = token_response['accessToken']
The code above uses the Power BI Client Integrations client ID to connect to the service. When I run it, it prompts me to enter a code to authenticate myself. This is fine for occasional or manual scripts but not very convenient otherwise.
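A side note: the adal library is deprecated in favour of MSAL, so here is a minimal sketch of the same device-code flow using msal, assuming the same TenantID and client ID as above:

# !pip install msal
import msal

app = msal.PublicClientApplication(
    '1aea3f97-edc6-4453-a59b-b88b0b803711',  # same client ID as above
    authority=f'https://login.microsoftonline.com/{TenantID}'
)
# start the device-code flow and show the code the user needs to enter
flow = app.initiate_device_flow(scopes=['https://analysis.windows.net/powerbi/api/.default'])
print(flow['message'])
# blocks until the user completes sign-in in the browser
result = app.acquire_token_by_device_flow(flow)
access_token = result['access_token']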
Another option is to create an app and a service principal in Azure. This lets you authenticate and authorise your app to access Azure resources. To learn how to do this, check out this link – Power BI REST APIs for embedded analytics and automation – Power BI REST API | Microsoft Learn
# !pip install azure.identity
from azure.identity import ClientSecretCredential
import json, requests, pandas as pd

tenant_id = TenantID
client_id = 'Your ClientID'
client_secret = 'Your Client Secret'  # best not to hardcode the client secret here; keep it safely in Azure Key Vault (see the sketch below)
ResourceURL = 'https://analysis.windows.net/powerbi/api'
APIURL = 'https://analysis.windows.net/powerbi/api/.default'

auth = ClientSecretCredential(
    authority='https://login.microsoftonline.com/',
    tenant_id=tenant_id,
    client_id=client_id,
    client_secret=client_secret
)
access_token = auth.get_token(APIURL)
access_token = access_token.token
# print(access_token)
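On keeping the secret out of the code: in Fabric and Synapse notebooks, mssparkutils can fetch a secret from Azure Key Vault at run time. A minimal sketch, assuming a hypothetical vault named my-keyvault holding a hypothetical secret named pbi-client-secret, and that the identity running the notebook has been granted access to that vault:

# 'my-keyvault' and 'pbi-client-secret' are placeholders; the notebook
# identity must have permission to read secrets from the vault
from notebookutils import mssparkutils

client_secret = mssparkutils.credentials.getSecret(
    'https://my-keyvault.vault.azure.net/',  # Key Vault endpoint
    'pbi-client-secret'                      # secret name
)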
Call Power BI REST APIs and save the data in the Lakehouse
In this example, I’m only fetching the workspace ID, the dataset ID and the report ID, but you can use any other endpoint you can access. Just make sure you have the proper permissions and authentication tokens. The code is in Python, but you can use any language that supports HTTP requests.
# Set the base URL and headers for the API call
Baseurl = 'https://api.powerbi.com/v1.0/'
headers = {'Authorization': f'Bearer {access_token}'}
# print(headers)

# Get a list of all workspaces in the organization
response = requests.get(Baseurl + 'myorg/groups', headers=headers)
if response.status_code == 200:
    workspaces = response.json()['value']
else:
    print(f"Error: {response.status_code} - {response.text}")

# create a dataframe to hold the list of all workspaces and their IDs
workspaces_df = pd.DataFrame(workspaces)
# replace the null values with empty string
workspaces_df = workspaces_df.fillna('')
# replace spaces in columns with _
workspaces_df.columns = workspaces_df.columns.str.replace(' ', '_')
display(workspaces_df)

# Create a spark dataframe from the workspaces_df and save it as a Delta table
workspaces_spark_df = spark.createDataFrame(workspaces_df)
spark.sql("DROP TABLE IF EXISTS Workspaces")
workspaces_spark_df.write.format("delta").mode("overwrite").saveAsTable("Workspaces")

Datasets = []
# Get a list of all datasets in the organization
for workspace in workspaces:
    response = requests.get(Baseurl + f"myorg/groups/{workspace['id']}/datasets", headers=headers)
    if response.status_code == 200:
        datasets = response.json()['value']
        for dataset in datasets:
            # add workspace name and ID to the dataset
            dataset['Workspace Name'] = workspace['name']
            dataset['Workspace ID'] = workspace['id']
            Datasets.append(dataset)
    else:
        print(f"Error: {response.status_code} - {response.text}")
# display(Datasets)

# Create a dataframe from the Datasets list and select key columns
Datasets_df = pd.DataFrame(Datasets)
Datasets_df = Datasets_df[['name', 'id', 'Workspace Name', 'Workspace ID', 'configuredBy', 'webUrl', 'createdDate']]
# rename name and id to Dataset Name and Dataset ID
Datasets_df = Datasets_df.rename(columns={'name': 'Dataset Name', 'id': 'Dataset ID'})
# replace the null values with empty string
Datasets_df = Datasets_df.fillna('')
# replace spaces in columns with _
Datasets_df.columns = Datasets_df.columns.str.replace(' ', '_')
Datasets_df.head()

# Create a spark dataframe from the Datasets_df and save it as a Delta table
datasets_spark_df = spark.createDataFrame(Datasets_df)
spark.sql("DROP TABLE IF EXISTS Datasets")
datasets_spark_df.write.format("delta").mode("overwrite").saveAsTable("Datasets")

Reports = []
# Get a list of all reports in the organization
for workspace in workspaces:
    response = requests.get(Baseurl + f"myorg/groups/{workspace['id']}/reports", headers=headers)
    if response.status_code == 200:
        reports = response.json()['value']
        for report in reports:
            # add workspace name and ID to the report
            report['Workspace Name'] = workspace['name']
            report['Workspace ID'] = workspace['id']
            Reports.append(report)
    else:
        print(f"Error: {response.status_code} - {response.text}")
# display(Reports)

# Create a dataframe from the Reports list and select key columns
Reports_df = pd.DataFrame(Reports)
Reports_df = Reports_df[['name', 'id', 'Workspace Name', 'Workspace ID', 'webUrl', 'datasetId', 'datasetWorkspaceId']]
# rename name and id to Report Name and Report ID
Reports_df = Reports_df.rename(columns={'name': 'Report Name', 'id': 'Report ID'})
# replace the null values with empty string
Reports_df = Reports_df.fillna('')
# replace spaces in columns with _
Reports_df.columns = Reports_df.columns.str.replace(' ', '_')
# Reports_df.head()

# Create a spark dataframe from the Reports_df and save it as a Delta table
reports_spark_df = spark.createDataFrame(Reports_df)
spark.sql("DROP TABLE IF EXISTS Reports")
reports_spark_df.write.format("delta").mode("overwrite").saveAsTable("Reports")
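Once the three Delta tables are in the Lakehouse, a quick sanity check is to join them back together with Spark SQL. A minimal sketch, assuming the table and column names created above:

# join the saved tables to list each report with its dataset and workspace
result = spark.sql("""
    SELECT r.Report_Name,
           d.Dataset_Name,
           w.name AS Workspace_Name,
           d.configuredBy
    FROM Reports r
    LEFT JOIN Datasets d ON r.datasetId = d.Dataset_ID
    LEFT JOIN Workspaces w ON r.Workspace_ID = w.id
""")
display(result)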
Create Report
When I create a Lakehouse in Microsoft Fabric, it automatically creates some default items.
When I click on the ellipsis of the two default items, the Dataset (Default) or the SQL endpoint, there are various ways to analyse the data.
Now, you might be overwhelmed by all these different visualisation options. Each one serves a different purpose once you know the differences between them. But for my use case, I want a simple list of all workspaces and reports, who created them, who configured them, and so on. So, I selected the Dataset (Default) option and clicked Create Paginated Report.
I dragged and dropped the columns I wanted to see in the table, such as workspace name, report name, owner, created date, etc.
That’s it! I now have a simple and handy list of all my workspaces in Microsoft Fabric. I can export, share or distribute this report to a wider audience.
This is a simple use case, but it has much potential. Instead of using the default dataset, I could have created some relationships. I could have even used TOM (the Tabular Object Model) to create a dataset, relationships, measures, etc. I didn’t have enough time to explore that, and I wasn’t sure how to do it in Python. Maybe someone can try it and let me know how it goes.
I was curious: I started everything with the embedded API and how it could read data from different sources in Power BI. I wanted to document workspaces, datasets and reports with all their details via the Embedded API. But I encountered a problem: “Embedding a DirectLake dataset is not supported with V1 embed token”. Maybe this will change, and then we can do amazing things. We could document various artifacts in a workspace, create reports and publish them in a Power BI App – an intelligent way to document. For now, I used REST operations.
Another option is to use the Admin API and the Scanner API: start a scan, get the dataset metadata, document measures, etc. I just wanted to show that these notebooks open doors to many amazing things. 🙂
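For reference, here is a minimal sketch of that Scanner API flow (the admin getInfo, scanStatus and scanResult endpoints). It assumes the access_token from earlier has Power BI admin permissions, and the workspace ID below is a placeholder:

import time
import requests

admin_base = 'https://api.powerbi.com/v1.0/myorg/admin/workspaces/'
headers = {'Authorization': f'Bearer {access_token}', 'Content-Type': 'application/json'}

# trigger a scan for one or more workspace IDs ('<your-workspace-id>' is a placeholder)
body = {'workspaces': ['<your-workspace-id>']}
scan = requests.post(admin_base + 'getInfo?datasetSchema=True&datasetExpressions=True',
                     headers=headers, json=body).json()
scan_id = scan['id']

# poll until the scan completes
while requests.get(admin_base + f'scanStatus/{scan_id}', headers=headers).json()['status'] != 'Succeeded':
    time.sleep(5)

# fetch the result; it includes dataset metadata such as tables, measures and expressions
scan_result = requests.get(admin_base + f'scanResult/{scan_id}', headers=headers).json()
# scan_result['workspaces'] holds the metadata for each scanned workspace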
Until next time,
Prathy 🙂