Write Data to Unattached Lakehouses with Fabric Notebooks

by Prathy Kamasani

It’s one of those blog posts I write for my own reference as I often find myself needing this code. I hope it helps someone out there too 🙂

Regardless of which architecture we follow, during the data integration and transformation stages there is always a step that moves data from one location to another, and we work with multiple tables, schemas, and even lakehouses. The same goes for Fabric Notebooks. I often find myself in scenarios where I don't want to attach a Lakehouse to my notebook, but I do want to read or write data from various lakehouses.

I recently blogged about a way to achieve this as part of documenting your workspaces. In that post, I described how to write data to a Lakehouse that was not attached to the notebook. I used MsSparkUtils (since renamed to NotebookUtils) to mount the Lakehouse and then write data to it as Delta tables.

# Define Lakehouse name and description. This will be the LH where all documentation will be saved
LH_Name = "LH_Fabric_Documentation"
LH_desc = "Lakehouse for Fabric Documentation"

# Mount the Lakehouse for direct file system access
lakehouse = mssparkutils.lakehouse.get(LH_Name)
mssparkutils.fs.mount(lakehouse.get("properties").get("abfsPath"), f"/{LH_Name}")

# Retrieve and store local and ABFS paths of the mounted Lakehouse
local_path = mssparkutils.fs.getMountPath(f"/{LH_Name}")
lh_abfs_path = lakehouse.get("properties").get("abfsPath")
print(f'Local path: {local_path}')
print(f'ABFS path: {lh_abfs_path}')

Table_Name = "XXXX"
spark_df_Fabric_all_items_type.write.format("delta").option("mergeSchema", "true").mode("append").save(f"{lh_abfs_path}/Tables/{Table_Name}")
print(Table_Name, "created at :", f"{lh_abfs_path}/Tables/{Table_Name}")
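
Since MsSparkUtils has been renamed to NotebookUtils, the same mount-and-write pattern can also be expressed with the newer notebookutils API. This is a minimal sketch mirroring the code above, assuming the same LH_Name and a hypothetical DataFrame called df:

# Mount the Lakehouse with NotebookUtils (the renamed MsSparkUtils)
lakehouse = notebookutils.lakehouse.get(LH_Name)
notebookutils.fs.mount(lakehouse.get("properties").get("abfsPath"), f"/{LH_Name}")
lh_abfs_path = lakehouse.get("properties").get("abfsPath")

# Write a hypothetical DataFrame df to the Lakehouse as a Delta table
Table_Name = "XXXX"
df.write.format("delta").option("mergeSchema", "true").mode("append").save(f"{lh_abfs_path}/Tables/{Table_Name}")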

In the above scenario, my Lakehouse and Notebook were in the same workspace, and the Lakehouse did not have schemas enabled. I got errors when I tried the same approach with a Lakehouse in a different workspace. However, Aitor covered an elegant way to handle this in his blog post “Databricks and Fabric — Writing to OneLake and ADLS Gen2,” which discusses how to read and write data with OneLake. We can use the same approach Aitor used in that post.

workspace_id = "<workspace_id>"
lakehouse_id = "<lakehouse_id>"

# write data to Lakehouse
table_name = "XXXX"
df.write.format("delta").mode("overwrite").save(f"abfss://{workspace_id}@onelake.dfs.fabric.microsoft.com/{lakehouse_id}/Tables/{table_name}")
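
The same OneLake path works for reads too, so you can load a Delta table from an unattached Lakehouse in another workspace without mounting anything. A minimal sketch, reusing the placeholder IDs and table name from above:

# Read the Delta table back from the unattached Lakehouse over OneLake
df_read = spark.read.format("delta").load(f"abfss://{workspace_id}@onelake.dfs.fabric.microsoft.com/{lakehouse_id}/Tables/{table_name}")
df_read.show(5)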

And if your Lakehouse has Lakehouse Schemas enabled, you can also include the schema name in the table path. Here is a working example with sample data.

# Sample DataFrame creation
data = [("John", 28), ("Jane", 34), ("Mike", 45)]
columns = ["Name", "Age"]
df = spark.createDataFrame(data, columns)

# Display the first 5 rows of the DataFrame
df.show(5)

# Define workspace and lakehouse IDs
workspace_id = "xxxxxxx"
lakehouse_id = "xxxxxxx"

# Define the table name with Lakehouse Schema Name
table_name = "test.schemaTable"

# Write the DataFrame to the Lakehouse as Delta tables
df.write.format("delta").mode("overwrite").save(f"abfss://{workspace_id}@onelake.dfs.fabric.microsoft.com/{lakehouse_id}/Tables/{table_name}")
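
OneLake paths also accept workspace and item names in place of GUIDs, which can make the code easier to read. A minimal sketch with hypothetical names (MyWorkspace and MyLakehouse are assumptions, not values from this post):

# Hypothetical workspace and lakehouse names used instead of GUIDs
workspace_name = "MyWorkspace"
lakehouse_name = "MyLakehouse"

df.write.format("delta").mode("overwrite").save(f"abfss://{workspace_name}@onelake.dfs.fabric.microsoft.com/{lakehouse_name}.Lakehouse/Tables/{table_name}")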
