This is one of those blog posts I write for my own reference, as I often find myself needing this code. I hope it helps someone out there too 🙂
Regardless of which architecture we follow, data integration and transformation always involve a step that moves data from one location to another, and we work with multiple tables, schemas, and even Lakehouses. The same goes for Fabric Notebooks. I often find myself in scenarios where I don’t want to attach a Lakehouse to my notebook, but I do want to read or write data from various Lakehouses.
I recently blogged about a way to achieve this as part of documenting your workspaces. In that post, I described how to write data to a Lakehouse that was not attached to the notebook. I used MSSparkUtils (renamed to NotebookUtils) to mount the Lakehouse and then write data into it as Delta tables.
# Define the Lakehouse name and description. This will be the Lakehouse where all documentation will be saved
LH_Name = "LH_Fabric_Documentation"
LH_desc = "Lakehouse for Fabric Documentation"
# Mount the Lakehouse for direct file system access
lakehouse = mssparkutils.lakehouse.get(LH_Name)
mssparkutils.fs.mount(lakehouse.get("properties").get("abfsPath"), f"/{LH_Name}")
# Retrieve and store local and ABFS paths of the mounted Lakehouse
local_path = mssparkutils.fs.getMountPath(f"/{LH_Name}")
lh_abfs_path = lakehouse.get("properties").get("abfsPath")
print(f'Local path: {local_path}')
print(f'ABFS path: {lh_abfs_path}')
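Once the Lakehouse is mounted, you can browse it like a local file system. Below is a minimal sketch (assuming the mount above succeeded and the Lakehouse already contains some content) that lists what is available through the mount point and through the ABFS path:
import os
# List the contents of the mounted Lakehouse via the local file system path
print(os.listdir(local_path))
# List the same location through the Spark utilities using the ABFS path
for item in mssparkutils.fs.ls(lh_abfs_path):
    print(item.name, item.path)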
-----
Table_Name = "XXXX"
spark_df_Fabric_all_items_type.write.format("delta").option("mergeSchema", "true").mode("append").save(f"{lh_abfs_path}/Tables/{Table_Name}")
print(Table_Name, "created at :", f"{lh_abfs_path}/Tables/{Table_Name}")
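Reading the data back works the same way. As a quick sketch, assuming the table written above exists, you can load it straight from the ABFS path:
# Read the Delta table back from the Lakehouse's ABFS path
df_read = spark.read.format("delta").load(f"{lh_abfs_path}/Tables/{Table_Name}")
df_read.show(5)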
In the above scenario, my Lakehouse and notebook were in the same workspace, and my Lakehouse did not have schemas enabled. When the Lakehouse was in a different workspace, the same approach gave me errors. However, Aitor covered an elegant way to handle this in his blog post “Databricks and Fabric — Writing to OneLake and ADLS Gen2,” which discusses how to read and write data with OneLake. We can use the same approach he describes there.
workspace_id = "<workspace_id>"
lakehouse_id = "<lakehouse_id>"
# write data to Lakehouse
table_name = "XXXX"
df.write.format("delta").mode("overwrite").save(f"abfss://{workspace_id}@onelake.dfs.fabric.microsoft.com/{lakehouse_id}/Tables/"{table_name})
And if your Lakehouse has schemas enabled, you can also include the schema name in the path. Here is a working example with sample data.
# Sample DataFrame creation
data = [("John", 28), ("Jane", 34), ("Mike", 45)]
columns = ["Name", "Age"]
df = spark.createDataFrame(data, columns)
# Display the first 5 rows of the DataFrame
df.show(5)
# Define workspace and lakehouse IDs
workspace_id = "xxxxxxx"
lakehouse_id = "xxxxxxx"
# Define the schema and table name (schema-enabled Lakehouses store tables under Tables/<schema>/<table>)
schema_name = "test"
table_name = "schemaTable"
# Write the DataFrame to the Lakehouse as a Delta table
df.write.format("delta").mode("overwrite").save(f"abfss://{workspace_id}@onelake.dfs.fabric.microsoft.com/{lakehouse_id}/Tables/{schema_name}/{table_name}")