
Spark–Data Warehouse Connector in Microsoft Fabric

  • Writer: Ishan Deshpande
  • May 18
  • 2 min read

Modern data platforms demand seamless interoperability between large-scale processing engines like Apache Spark and governed analytical storage like Data Warehouses. With Microsoft Fabric, this integration is native, efficient, and secure—made possible by the Spark–Data Warehouse Connector.



What Is the Spark–Data Warehouse Connector?


The Spark–Data Warehouse connector in Microsoft Fabric enables direct access between Spark notebooks and Data Warehouses—allowing you to read from and write to warehouse tables without exporting to Lakehouse or files.


Key Benefits:

  • No intermediate storage required (e.g., Parquet/CSV)

  • High-performance read/write via Fabric’s internal DMS (Data Movement Service)

  • PySpark-native interface

  • Supports enterprise-grade governance with OneLake



Supported Capabilities

  • Read from Data Warehouse – Supported, via spark.read.synapsesql()

  • Write to Data Warehouse – Supported, via df.write.synapsesql()

  • Schema inference – Supported; handled automatically

  • Table creation – Not supported; the table must exist beforehand

  • Streaming – Not supported

  • Column mapping – Supported; automatic or customizable


How to Read Data from a Data Warehouse Table


Syntax -

df = spark.read.synapsesql("<warehouse/lakehouse name>.<schema name>.<table or view name>")

Example -

df = spark.read.synapsesql("SalesWarehouse.dbo.SalesOrderDetails")
df.display()

This loads the data directly from the SalesOrderDetails table in the SalesWarehouse Data Warehouse into a Spark DataFrame for further transformations, ML, or exploration.

Note - You must have read access to the workspace and table.
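

Once loaded, df behaves like any other Spark DataFrame, so transformations run in Spark rather than in the warehouse (the full table is read first, since the connector does not push filters down). A minimal sketch of a follow-up transformation, where OrderQty and ProductID are hypothetical column names:

from pyspark.sql import functions as F

# Transformations run in Spark after the full table has been read,
# since the connector does not push filters down to the warehouse.
# "OrderQty" and "ProductID" are hypothetical column names.
top_products = (
    df.filter(F.col("OrderQty") > 10)
      .groupBy("ProductID")
      .count()
      .orderBy(F.desc("count"))
)
top_products.show(10)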



How to Write Data to a Data Warehouse Table


Syntax -

df.write.mode("mode").synapsesql("<warehouse/lakehouse name>.<schema name>.<table name>")

Supported Modes:

  • "overwrite" – Replace existing data

  • "append" – Add new records to existing table

  • "errorifexists" – Fails if table already has data

  • "ignore" - Ignore the error and append data


Note - Tables must already exist in the Data Warehouse with the appropriate schema. Schema mismatches will raise errors.
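

Putting it together, a minimal write sketch (the staging table name below is hypothetical; the target table must already exist with a schema matching the DataFrame):

# Append rows from an existing DataFrame to a pre-created warehouse table.
# "SalesWarehouse.dbo.SalesOrderDetails_Staging" is a hypothetical table name.
df.write.mode("append").synapsesql("SalesWarehouse.dbo.SalesOrderDetails_Staging")

# To replace the table's contents instead, use overwrite mode.
df.write.mode("overwrite").synapsesql("SalesWarehouse.dbo.SalesOrderDetails_Staging")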



Security and Access Requirements


The connector uses Microsoft Entra ID (formerly Azure AD) for authentication, and all actions respect Fabric workspace RBAC.


Requirements:

  • The notebook must be in the same workspace as the target Data Warehouse

  • User must have Viewer or higher permissions

  • Table-specific permissions (read/write) apply

  • OneLake and Fabric security models apply


No connection strings or keys are needed—authentication is seamless via Fabric's unified platform.



Known Limitations

  • Table creation not supported – tables must be pre-created with the correct schema

  • No DDL/DML – use the Warehouse SQL endpoint for CREATE/ALTER/DELETE operations

  • No streaming support – the connector cannot be used with Spark Structured Streaming

  • No pushdown filtering – the entire table is loaded into Spark before filtering or joins (see the sketch after this list)

  • Same workspace required – the Data Warehouse and the notebook must reside in the same workspace
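

Because filters are not pushed down, one way to limit how much data is transferred is to read a view defined on the warehouse side (views are supported by the read syntax shown earlier). A minimal sketch, assuming a view named vw_RecentOrders has been created beforehand through the Warehouse SQL endpoint:

# Reading a pre-filtered warehouse view keeps the heavy filtering on the warehouse side.
# "SalesWarehouse.dbo.vw_RecentOrders" is a hypothetical view created via the SQL endpoint.
recent_orders = spark.read.synapsesql("SalesWarehouse.dbo.vw_RecentOrders")
recent_orders.show(5)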


Conclusion


The Spark–Data Warehouse connector in Microsoft Fabric is a powerful bridge between advanced data transformation (Spark) and enterprise-grade analytics (DW). With simple PySpark syntax and secure, high-performance data exchange, you can:

  • Remove unnecessary data hops

  • Maintain security and governance

  • Empower cross-functional teams with a unified analytics workflow

It’s an essential tool for any data engineer working in Fabric.
