
Spark–Data Warehouse Connector in Microsoft Fabric

  • Writer: Ishan Deshpande
  • May 18
  • 2 min read

Modern data platforms demand seamless interoperability between large-scale processing engines like Apache Spark and governed analytical storage like Data Warehouses. With Microsoft Fabric, this integration is native, efficient, and secure—made possible by the Spark–Data Warehouse Connector.



What Is the Spark–Data Warehouse Connector?


The Spark–Data Warehouse connector in Microsoft Fabric enables direct access between Spark notebooks and Data Warehouses—allowing you to read from and write to warehouse tables without exporting to Lakehouse or files.


Key Benefits:

  • No intermediate storage required (e.g., Parquet/CSV)

  • High-performance read/write via Fabric’s internal DMS (Data Movement Service)

  • PySpark-native interface

  • Supports enterprise-grade governance with OneLake



Supported Capabilities

  • Read from Data Warehouse – Supported, via spark.read.synapsesql()

  • Write to Data Warehouse – Supported, via df.write.synapsesql()

  • Schema inference – Supported; handled automatically

  • Table creation – Not supported; the table must exist beforehand

  • Streaming – Not supported

  • Column mapping – Supported; automatic or customizable


How to Read Data from a Data Warehouse Table


Syntax -

df = spark.read.synapsesql("<warehouse/lakehouse name>.<schema name>.<table or view name>")

Example -

df = spark.read.synapsesql("SalesWarehouse.dbo.SalesOrderDetails")
df.display()

This loads the data directly from the SalesOrderDetails table in the SalesWarehouse Data Warehouse into a Spark DataFrame for further transformations, ML, or exploration.

Note - You must have read access to the workspace and table.
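

Once loaded, df behaves like any other Spark DataFrame, so transformations run in Spark rather than in the warehouse (the full table is read first, since the connector does not push filters down). A minimal sketch of a follow-up transformation, where OrderQty and ProductID are hypothetical column names:

from pyspark.sql import functions as F

# Transformations run in Spark after the full table has been read,
# since the connector does not push filters down to the warehouse.
# "OrderQty" and "ProductID" are hypothetical column names.
top_products = (
    df.filter(F.col("OrderQty") > 10)
      .groupBy("ProductID")
      .count()
      .orderBy(F.desc("count"))
)
top_products.show(10)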



How to Write Data to a Data Warehouse Table


Syntax -

df.write.mode("mode").synapsesql("<warehouse/lakehouse name>.<schema name>.<table name>")

Supported Modes:

  • "overwrite" – Replace existing data

  • "append" – Add new records to existing table

  • "errorifexists" – Fails if table already has data

  • "ignore" - Ignore the error and append data


Note - Tables must already exist in the Data Warehouse with the appropriate schema. Schema mismatches will raise errors.
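

Putting it together, a minimal write sketch (the staging table name below is hypothetical; the target table must already exist with a schema matching the DataFrame):

# Append rows from an existing DataFrame to a pre-created warehouse table.
# "SalesWarehouse.dbo.SalesOrderDetails_Staging" is a hypothetical table name.
df.write.mode("append").synapsesql("SalesWarehouse.dbo.SalesOrderDetails_Staging")

# To replace the table's contents instead, use overwrite mode.
df.write.mode("overwrite").synapsesql("SalesWarehouse.dbo.SalesOrderDetails_Staging")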



Security and Access Requirements


The connector uses Microsoft Entra ID (formerly Azure AD) for authentication, and all actions respect Fabric workspace RBAC.


Requirements:

  • The notebook must be in the same workspace as the target Data Warehouse

  • User must have Viewer or higher permissions

  • Table-specific permissions (read/write) apply

  • OneLake and Fabric security models apply


No connection strings or keys are needed—authentication is seamless via Fabric's unified platform.



Known Limitations

  • Table creation not supported – tables must be pre-created with the correct schema

  • No DDL/DML – use the Warehouse SQL endpoint for CREATE/ALTER/DELETE operations

  • No streaming support – the connector cannot be used with Spark Structured Streaming

  • No pushdown filtering – the entire table is loaded into Spark before filtering or joins (see the sketch after this list)

  • Same workspace required – the Data Warehouse and the notebook must reside in the same workspace
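

Because filters are not pushed down, one way to limit how much data is transferred is to read a view defined on the warehouse side (views are supported by the read syntax shown earlier). A minimal sketch, assuming a view named vw_RecentOrders has been created beforehand through the Warehouse SQL endpoint:

# Reading a pre-filtered warehouse view keeps the heavy filtering on the warehouse side.
# "SalesWarehouse.dbo.vw_RecentOrders" is a hypothetical view created via the SQL endpoint.
recent_orders = spark.read.synapsesql("SalesWarehouse.dbo.vw_RecentOrders")
recent_orders.show(5)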


Conclusion


The Spark–Data Warehouse connector in Microsoft Fabric is a powerful bridge between advanced data transformation (Spark) and enterprise-grade analytics (DW). With simple PySpark syntax and secure, high-performance data exchange, you can:

  • Remove unnecessary data hops

  • Maintain security and governance

  • Empower cross-functional teams with a unified analytics workflow

It’s an essential tool for any data engineer working in Fabric.
