Spark–Data Warehouse Connector in Microsoft Fabric
- Ishan Deshpande
- May 18
- 2 min read

Modern data platforms demand seamless interoperability between large-scale processing engines like Apache Spark and governed analytical storage like Data Warehouses. With Microsoft Fabric, this integration is native, efficient, and secure—made possible by the Spark–Data Warehouse Connector.
What Is the Spark–Data Warehouse Connector?
The Spark–Data Warehouse connector in Microsoft Fabric enables direct access between Spark notebooks and Data Warehouses—allowing you to read from and write to warehouse tables without exporting to Lakehouse or files.
Key Benefits:
- No intermediate storage required (e.g., Parquet/CSV files)
- High-performance read/write via Fabric's internal Data Movement Service (DMS)
- PySpark-native interface
- Supports enterprise-grade governance with OneLake
Supported Capabilities
| Operation | Supported | Notes |
| --- | --- | --- |
| Read from Data Warehouse | ✅ | Using spark.read.synapsesql() |
| Write to Data Warehouse | ✅ | Using df.write.synapsesql() |
| Schema inference | ✅ | Handled automatically |
| Table creation | ❌ | Table must exist beforehand |
| Streaming | ❌ | Not supported |
| Column mapping | ✅ | Automatic or customizable |
How to Read Data from a Data Warehouse Table
Syntax -
df = spark.read.synapsesql("<warehouse/lakehouse name>.<schema name>.<table or view name>")
Example -
df = spark.read.synapsesql("SalesWarehouse.dbo.SalesOrderDetails")
df.display()
This loads the data directly from the SalesOrderDetails table in the SalesWarehouse Data Warehouse into a Spark DataFrame for further transformations, ML, or exploration.
Note - You must have read access to the workspace and table.
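Once the data is in a Spark DataFrame, any standard PySpark transformation can be applied before the results are explored or written back. The sketch below is illustrative only; the ProductID, OrderQty, and UnitPrice column names are assumptions and should be replaced with columns that actually exist in your table.
from pyspark.sql import functions as F

# Read the warehouse table into a Spark DataFrame
df = spark.read.synapsesql("SalesWarehouse.dbo.SalesOrderDetails")

# Aggregate revenue per product (column names are placeholders)
revenue_df = (
    df.withColumn("LineTotal", F.col("OrderQty") * F.col("UnitPrice"))
      .groupBy("ProductID")
      .agg(F.sum("LineTotal").alias("TotalRevenue"))
)
revenue_df.show()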
How to Write Data to a Data Warehouse Table
Syntax -
df.write.mode("<mode>").synapsesql("<warehouse/lakehouse name>.<schema name>.<table name>")
Supported Modes:
- "overwrite" – Replaces the existing data in the table
- "append" – Adds new records to the existing table
- "errorifexists" – Fails if the target table already contains data
- "ignore" – Skips the write if the target table already contains data (no error is raised, and no data is written)
Note - Tables must already exist in the Data Warehouse with the appropriate schema. Schema mismatches will raise errors.
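For example, a minimal write sketch could look like the following. The SalesOrderDetailsCopy table name is a placeholder; it must already exist in SalesWarehouse with a schema matching the DataFrame being written.
# Append the DataFrame read earlier into an existing warehouse table (table name is a placeholder)
df.write.mode("append").synapsesql("SalesWarehouse.dbo.SalesOrderDetailsCopy")

# To replace the table's contents instead, use overwrite mode
df.write.mode("overwrite").synapsesql("SalesWarehouse.dbo.SalesOrderDetailsCopy")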
Security and Access Requirements
The connector uses Microsoft Entra ID (formerly Azure AD) for authentication, and all actions respect Fabric workspace RBAC.
Requirements:
- The notebook must be in the same workspace as the target Data Warehouse
- The user must have Viewer or higher workspace permissions
- Table-specific permissions (read/write) apply
- OneLake and Fabric security models apply
No connection strings or keys are needed—authentication is seamless via Fabric's unified platform.
Known Limitations
| Limitation | Details |
| --- | --- |
| Table creation not supported | Tables must be pre-created with the correct schema |
| No DDL/DML | Use the Warehouse SQL endpoint for CREATE/ALTER/DELETE operations |
| No streaming support | Cannot be used with Spark Structured Streaming |
| No pushdown filtering | The entire table is loaded into Spark before filtering/joins |
| Same workspace required | The Data Warehouse and notebook must reside in the same workspace |
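The lack of pushdown filtering is worth keeping in mind for large tables: a .filter() call on the DataFrame runs in Spark only after the entire table has been transferred. One way to limit the data moved is to read a pre-filtered view instead of the base table. The sketch below assumes a hypothetical OrderDate column and a pre-created dbo.RecentSalesOrders view in the warehouse.
# Filter runs in Spark, after the whole table has been loaded (OrderDate is a placeholder column)
recent_df = spark.read.synapsesql("SalesWarehouse.dbo.SalesOrderDetails").filter("OrderDate >= '2024-01-01'")

# Alternative: read a pre-filtered view defined in the warehouse (view name is a placeholder)
recent_df = spark.read.synapsesql("SalesWarehouse.dbo.RecentSalesOrders")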
Conclusion
The Spark–Data Warehouse connector in Microsoft Fabric is a powerful bridge between advanced data transformation (Spark) and enterprise-grade analytics (Data Warehouse). With simple PySpark syntax and secure, high-performance data exchange, you can:
- Remove unnecessary data hops
- Maintain security and governance
- Empower cross-functional teams with a unified analytics workflow
It’s an essential tool for any data engineer working in Fabric.