
Dataflows in SAP Data Warehouse Cloud

09 March 2021


SAP Data Warehouse Cloud (DWC) is a cloud-based data warehousing solution that combines efficient data management with advanced analytics.

Dataflows have been introduced in SAP DWC as an easy-to-use data modeling experience for ETL requirements. They allow you to load and combine structured and semi-structured data from different SAP and non-SAP data sources, such as cloud file storage, database management systems (DBMS), or SAP S/4HANA, and they offer standard data transformation capabilities as well as scripting for advanced requirements.

Dataflow builder architecture

The Dataflow builder leverages parts of SAP Data Intelligence Cloud (DIC), which is itself powered by SAP HANA Cloud. SAP Data Warehouse Cloud is built on top of SAP HANA Cloud, and SAP Data Intelligence Cloud is embedded into DWC in the form of the Dataflow builder, where it provides the ETL functionality. When the Dataflow builder is used in DWC, it relies on a dedicated subset of Data Intelligence Cloud functionality: triggering a Dataflow execution generates a Data Intelligence pipeline in a side-by-side Data Intelligence cluster.

Figure 1: SAP HANA Cloud services

Data Views vs Dataflows

How is a Dataflow different from a Data view? This is detailed in the table below.

| Data View Builder | Dataflow |
| --- | --- |
| The main aim is data federation | The main aim is to persist data |
| Makes data outside DWC accessible as one integrated dataset | Enables working with large data sources such as data lakes, where federation would cause slow response times |
| Supports Graphical and SQL builder views and a standard set of data transformations | Supports a Graphical view and a standard set of transformations; also provides Python scripting functionality |
| Supports connections that in turn support federation, real-time replication, or momentary data snapshots | Draws from a richer network of connections, including non-SAP sources, cloud file storage, or APIs |
| Produces a single output structure in an inherited form; the target is also federated | Produces multiple, definable output structures: you can choose to add to or replace the data in an existing table or create a new output; the target is persisted |

One strategy is to use the Data view builder and Dataflows so that they complement each other: use Dataflows to move data from multiple sources into DWC, and then use the view builder to build quick insights on top of the persisted data.


Data operations in Dataflows

Dataflows offer several standard data operations (similar to those available in the Data view builder) that can be used to model data, such as unions, joins, projections, filters, and aggregations (sketched in pandas terms below). One major advantage of Dataflows in DWC is the ‘Script’ operator, which can be used to perform more advanced transformations in Python.

Figure 2: Data operators in DWC, marked in red (left to right: Join, Union, Projection, Aggregation, and Script)
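For intuition, the sketch below maps these standard operators to rough pandas equivalents. This is only an analogy with made-up DataFrames and column names, not how DWC implements the graphical operators.

```python
import pandas as pd

# Hypothetical input datasets
orders = pd.DataFrame({"ORDER_ID": [1, 2, 3], "CUST_ID": [10, 10, 20],
                       "AMOUNT": [100.0, 250.0, 80.0]})
customers = pd.DataFrame({"CUST_ID": [10, 20], "REGION": ["EMEA", "APJ"]})

joined = orders.merge(customers, on="CUST_ID")            # Join
unioned = pd.concat([orders, orders])                     # Union
projected = joined[["ORDER_ID", "REGION", "AMOUNT"]]      # Projection
filtered = projected[projected["AMOUNT"] > 90]            # Filter
aggregated = filtered.groupby("REGION", as_index=False)["AMOUNT"].sum()  # Aggregation
print(aggregated)
```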

Python scripting in Dataflows

The Script operator currently runs on Python 3.6.x. It allows for data manipulation and vector operations in Python by providing support for the NumPy and pandas modules. NumPy and pandas functions can be referenced through the aliases np and pd directly within the transform function, without any explicit imports.

The incoming data is fed into the data parameter of the Script node's transform function, where it is accessible as a pandas DataFrame for further data transformations. The DataFrame returned from this function is sent to the output.

It is important to note that the DataFrame returned from the transform function must have the same column names and types as specified in the output schema of the operator; otherwise, the execution fails.
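To make this concrete, here is a minimal sketch of a Script operator body based on the behavior described above. The column names (ORDER_ID, REVENUE, COST, MARGIN) are hypothetical placeholders for whatever your input and output schemas define, and np and pd are the pre-bound aliases mentioned earlier, so no imports appear in the operator body.

```python
def transform(data):
    # 'data' arrives as a pandas DataFrame shaped by the operator's input schema.
    data["MARGIN"] = data["REVENUE"] - data["COST"]
    # Vectorized NumPy operation via the pre-bound 'np' alias:
    # replace missing margins with 0.0.
    data["MARGIN"] = np.where(data["MARGIN"].isna(), 0.0, data["MARGIN"])
    # The returned DataFrame must match the output schema exactly
    # (same column names and types), or the execution fails.
    return data[["ORDER_ID", "MARGIN"]]
```

Outside the sandbox you would need import numpy as np and import pandas as pd yourself; inside the Script operator those aliases are provided for you.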

NOTE: The operator is executed in sandbox mode; accessing the file system or network and importing other Python modules are restricted, as are defining classes and using coroutines. Restricted pandas and NumPy functions are listed in the help section (in the Properties pane of the Script node), and any updates to the Python scripting documentation are also added there.

Dataflow execution with Python scripting will be discussed in detail in an upcoming blog.

