Flattening Hierarchical Data: Working with SAP SETNODE and SETLEAF

Introduction

Every once in a while we have to work with hierarchical data structures. In accounting, for example, account hierarchies look like the one below:

ACCT_ID PARENT_ACCT_ID
300000NS TIRPOPIN
310000NS 300000NS
311000NS 310000NS
312000NS 310000NS
320000NS 300000NS
321000NS 320000NS
322000NS 320000NS
322000NS_A
330000PC ERPFCOSA
330000PS ERPFCOSA
331000CG 400000ST
331000CG_A
331000PS 330000PS

In SAP we might create a cost center group like this:

KCA01_HIER
├── KCA01_L010
├── KCA01_N020 (Logistics)
│   ├── KCA01_L021
│   ├── KCA01_L022
│   ├── KCA01_L023
│   └── KCA01_L024
├── KCA01_L030
└── KCA01_N040 (Production)
    ├── KCA01_L041
    ├── KCA01_L042
    ├── KCA01_L043
    ├── KCA01_L044
    └── KCA01_L045

While this may seem like a distinctly different object, SAP stores the data in a similar way, just spread across two tables.

The node-to-node relationship is stored in SETNODE:

CHILD_ID PARENT_ID
KCA01_N020 KCA01_HIER
KCA01_N040 KCA01_HIER

The node-to-leaf relationship is stored in SETLEAF. A leaf is the lowest level in the hierarchy: in the context of general ledger account groups it is the posting-level account, whereas for cost center groups the leaf is an individual cost center.

CHILD_ID PARENT_ID
KCA01_L010 KCA01_HIER
KCA01_L021 KCA01_N020
KCA01_L022 KCA01_N020
KCA01_L023 KCA01_N020
KCA01_L024 KCA01_N020
KCA01_L030 KCA01_HIER
KCA01_L041 KCA01_N040
KCA01_L042 KCA01_N040
KCA01_L043 KCA01_N040
KCA01_L044 KCA01_N040
KCA01_L045 KCA01_N040

The description above is somewhat oversimplified, as SAP has other hierarchy-related tables, but for the purpose of this blog we are primarily interested in these two.

The real column names also differ, as we will see shortly.

What we should also note is that SAP stores leaf information as a range, with a "From Value" and a "To Value". Handling ranges so that the flattening function can consume them deserves a separate blog post of its own, so we will focus on our main task in this article. We will assume that our leaves contain individual values and no ranges are used (so From Value equals To Value). In practice this holds most of the time, and the simplification does not affect the generality of our solution.

The natural next step is to union SETNODE with SETLEAF so that we have a complete set of input data to flatten:

CHILD_ID PARENT_ID
KCA01_L010 KCA01_HIER
KCA01_N020 KCA01_HIER
KCA01_L021 KCA01_N020
KCA01_L022 KCA01_N020
KCA01_L023 KCA01_N020
KCA01_L024 KCA01_N020
KCA01_L030 KCA01_HIER
KCA01_N040 KCA01_HIER
KCA01_L041 KCA01_N040
KCA01_L042 KCA01_N040
KCA01_L043 KCA01_N040
KCA01_L044 KCA01_N040
KCA01_L045 KCA01_N040

Integrating this kind of hierarchical data into tools like PowerBI presents unique challenges, particularly when dealing with variable-depth hierarchies. This post explores a robust solution using Snowflake's Python capabilities to flatten and process this data, enabling seamless visualization in PowerBI.

Understanding SAP SETNODE and SETLEAF Tables

SAP uses two primary tables to store hierarchical relationships:

  • SETNODE: Contains node-level information including set class, organizational unit, set name, and subordinate set IDs. This table represents the structural elements of the hierarchy.
  • SETLEAF: Stores leaf-level details with set class, organizational unit, set name, and value ranges (from_value, to_value). These represent the actual data points at the bottom of the hierarchy.

The relationship between these tables creates a tree structure where SETNODE entries can have multiple children, some of which may be other SETNODE entries or SETLEAF entries.
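In Python terms, the unioned child-to-parent pairs translate naturally into a parent-to-children dictionary, which is how the flattening function later represents the tree. A minimal sketch (the sample pairs are illustrative, taken from the KCA01 example above):

```python
# Build a parent -> children mapping from (child, parent) pairs,
# mirroring how the unioned SETNODE/SETLEAF data will be consumed later.
pairs = [
    ("KCA01_N020", "KCA01_HIER"),
    ("KCA01_N040", "KCA01_HIER"),
    ("KCA01_L021", "KCA01_N020"),
    ("KCA01_L041", "KCA01_N040"),
]

tree = {}
for child, parent in pairs:
    tree.setdefault(parent, []).append(child)

# A node that never appears as a parent is a leaf.
leaves = {c for c, _ in pairs if c not in tree}
print(tree)
print(leaves)
```

Nodes and leaves live side by side in the same mapping; only the absence of children distinguishes a leaf.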

The same set of tables is used for cost center groups, profit center groups, cost element groups, internal order groups, and so on; they are differentiated by set_class.

The Challenge: Dynamic Hierarchy Depths

The primary challenge arises from the variable depth of these hierarchies. Unlike fixed-depth structures, SAP hierarchies can have anywhere from 2 to 10+ levels depending on the organizational complexity. This variability creates several issues:

  1. PowerBI Limitations: Traditional flattening methods require predefined column structures, making dynamic depths difficult to handle.
  2. Performance Issues: Recursive queries can become inefficient with deep hierarchies.
  3. Data Consistency: Ensuring all paths are properly represented across different depths.
  4. Maintenance Complexity: Changes in hierarchy depth require schema modifications.

Solution: Python-Based Hierarchy Flattening

We take a two-step approach to flatten any hierarchical data.

First, we use a Python User-Defined Table Function (UDTF) to dynamically flatten the hierarchy.

The input to this UDTF is two columns mapping each child to its parent, just like the table we saw at the beginning.

The output also contains two columns, parent and child, where the child column holds pipe-delimited paths representing the complete traversal from that parent down to a leaf.

This solution is versatile and can be applied to any hierarchical data structure, including any relationship represented by SAP SETNODE and SETLEAF tables, such as:

  • Cost element groups
  • Cost center groups
  • Order groups
  • Profit center hierarchies
  • Organizational unit structures

The set_class field in these tables determines the type of hierarchy being represented, making the solution adaptable to various SAP data models.

Before we dive into the details of the flattening function, we have to union SETNODE and SETLEAF, as explained at the beginning of this article.

The following code shows how to prepare the data as input to the flattening function.

create or replace view INPUT_HIERARCHY as (
    with setnode as (
        select set_class
                ,organizational_unit_as_set_subclass
                ,set_name
                ,subordinate_set_id
        from my_warehouse.my_schema.setnode
    ),
    setleaf as (
        select set_class
                ,organizational_unit_as_set_subclass
                ,set_name
                ,set_line_counter
                ,option_field_in_structure_of_select_options_tables
                ,from_value
                ,to_value
        from my_warehouse.my_schema.setleaf
    ),
    union_all as (
        select set_class
                ,organizational_unit_as_set_subclass
                ,set_name
                ,subordinate_set_id
        from setnode
        union
        select set_class
                ,organizational_unit_as_set_subclass
                ,set_name
                ,from_value as subordinate_set_id
        from setleaf
        union
        select set_class
                ,organizational_unit_as_set_subclass
                ,set_name
                ,to_value as subordinate_set_id
        from setleaf
    )
    select set_class
            ,organizational_unit_as_set_subclass
            ,set_name
            ,subordinate_set_id
    from union_all
);
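For readers prototyping outside Snowflake, the same union can be sketched in plain Python. The rows below are illustrative stand-ins for SETNODE and SETLEAF extracts, under our assumption that From Value equals To Value:

```python
# Illustrative stand-ins for SETNODE and SETLEAF extracts.
setnode = [
    {"set_name": "KCA01_HIER", "subordinate_set_id": "KCA01_N020"},
    {"set_name": "KCA01_HIER", "subordinate_set_id": "KCA01_N040"},
]
setleaf = [
    {"set_name": "KCA01_N020", "from_value": "KCA01_L021", "to_value": "KCA01_L021"},
    {"set_name": "KCA01_N020", "from_value": "KCA01_L022", "to_value": "KCA01_L022"},
]

# Union node children with leaf from/to values, de-duplicating like
# the UNION (distinct) in the SQL view above.
rows = set()
for r in setnode:
    rows.add((r["set_name"], r["subordinate_set_id"]))
for r in setleaf:
    rows.add((r["set_name"], r["from_value"]))
    rows.add((r["set_name"], r["to_value"]))  # same as from_value under our assumption

input_hierarchy = sorted(rows)
for parent, child in input_hierarchy:
    print(parent, child)
```

Because From Value and To Value are identical here, the de-duplication collapses the leaf rows back to one entry each, just as UNION does in the view.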

The Flattening Function in Detail

Here is the complete code for the flattening function.

CREATE OR REPLACE FUNCTION UN5871.MASTERDATA.FLATTEN_COMBINED_FUNC (
    CHILDID VARCHAR,
    PARENTID VARCHAR,
    LEVEL VARCHAR DEFAULT '2'
)
RETURNS TABLE (
    PARENT VARCHAR,
    RESULT VARCHAR
)
LANGUAGE PYTHON
RUNTIME_VERSION = '3.10'
HANDLER = 'get_combined'
AS
$$

class get_combined:
    """
    Combined hierarchy flattening handler class.
    Supports two modes:
    - level='2': Returns only leaf nodes (like original FLATTEN_FUNC)
    - level='all': Returns full hierarchy paths with uniform depth padding
    """
    
    def __init__(self):
        # Use an instance attribute so state is not shared if the handler is reused
        self.data = []
    
    def process(self, childid: str, parentid: str, level: str = '2'):
        """Process each input row and store parent-child relationships."""
        self.data.append([parentid, childid, level])
    
    def end_partition(self):
        """
        Process the complete partition and return results based on level parameter.
        """
        # Get level parameter from first row (should be consistent across partition)
        level = self.data[0][2] if self.data else '2'
        
        # Build tree structure: parent -> [list of children]
        tree = {}
        for parent, child, _ in self.data:
            tree.setdefault(parent, []).append(child)

        # ==========================================
        # LEVEL = '2': Return only leaf nodes
        # ==========================================
        if level == '2':
            def acct_tree_desc(parent):
                """Recursively collect all leaf descendants of a parent node."""
                if parent not in tree:  # No children: the node is itself a leaf
                    return [parent]
                leaves = []
                for child in tree[parent]:
                    leaves.extend(acct_tree_desc(child))
                return leaves
            
            result = []
            for parent in set([i[0] for i in self.data]):
                if parent != 'NA':
                    leaf_children = acct_tree_desc(parent)
                    for child in leaf_children:
                        result.append((parent, child))
            return result

        # ==========================================
        # LEVEL = 'all': Return full hierarchy paths
        # ==========================================
        elif level == 'all':
            def build_paths(parent, path):
                """Recursively build all paths from parent to leaf nodes."""
                children = tree.get(parent, [])
                if not children:  # Leaf node reached
                    yield path
                else:
                    for child in children:
                        yield from build_paths(child, path + [child])

            # Collect all paths grouped by parent
            parent_paths = {}
            for parent in set([i[0] for i in self.data]):
                if parent != 'NA':
                    paths_for_parent = []
                    for path in build_paths(parent, [parent]):
                        if len(path) > 1:  # Only include paths with actual hierarchy
                            paths_for_parent.append(path)
                    if paths_for_parent:
                        parent_paths[parent] = paths_for_parent
            
            # Pad paths to uniform depth per parent and format results
            result = []
            for parent, paths in parent_paths.items():
                # Find maximum depth for this specific parent
                max_depth_for_parent = max(len(path) for path in paths)
                
                for path in paths:
                    # Right-pad with leaf value to reach maximum depth
                    padding_needed = max_depth_for_parent - len(path)
                    leaf_value = path[-1]  # Rightmost (leaf) value
                    padded_path = path + [leaf_value] * padding_needed
                    result.append((parent, '|'.join(padded_path)))
            return result
        
        # ==========================================
        # Invalid level parameter
        # ==========================================
        else:
            return [('ERROR', f'Invalid level parameter: {level}. Use "2" or "all"')]

$$;

The FLATTEN_COMBINED_FUNC is a Python UDTF that processes hierarchical data in two modes:

Mode 1: Leaf Nodes Only (level = '2')

def acct_tree_desc(parent):
    """Recursively collect all leaf descendants of a parent node."""
    if parent not in tree:  # No children: the node is itself a leaf
        return [parent]
    leaves = []
    for child in tree[parent]:
        leaves.extend(acct_tree_desc(child))
    return leaves

This mode traverses the hierarchy recursively, collecting only the leaf nodes for each parent. It's efficient for scenarios where only end-level data is needed.
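The same logic can be exercised outside Snowflake. A standalone sketch of the leaf collection, using the KCA01 sample tree from the introduction:

```python
# The KCA01 sample hierarchy from the introduction, as a parent -> children map.
tree = {
    "KCA01_HIER": ["KCA01_L010", "KCA01_N020", "KCA01_L030", "KCA01_N040"],
    "KCA01_N020": ["KCA01_L021", "KCA01_L022", "KCA01_L023", "KCA01_L024"],
    "KCA01_N040": ["KCA01_L041", "KCA01_L042", "KCA01_L043", "KCA01_L044", "KCA01_L045"],
}

def leaf_descendants(parent):
    """Return every leaf reachable from parent, depth-first."""
    if parent not in tree:        # A node with no children is a leaf
        return [parent]
    leaves = []
    for child in tree[parent]:
        leaves.extend(leaf_descendants(child))
    return leaves

print(leaf_descendants("KCA01_N020"))
# ['KCA01_L021', 'KCA01_L022', 'KCA01_L023', 'KCA01_L024']
print(len(leaf_descendants("KCA01_HIER")))  # 11 leaves in total
```

The intermediate nodes KCA01_N020 and KCA01_N040 disappear from the output; only posting-level members remain.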

Mode 2: Full Hierarchy Paths (level = 'all')

def build_paths(parent, path):
    """Recursively build all paths from parent to leaf nodes."""
    children = tree.get(parent, [])
    if not children:
        yield path
    else:
        for child in children:
            yield from build_paths(child, path + [child])

The 'all' mode constructs complete paths from each parent to all its leaf descendants. To handle variable depths, it implements intelligent padding:

# Find maximum depth for this specific parent
max_depth_for_parent = max(len(path) for path in paths)

for path in paths:
    # Right-pad with leaf value to reach maximum depth
    padding_needed = max_depth_for_parent - len(path)
    leaf_value = path[-1]
    padded_path = path + [leaf_value] * padding_needed
    result.append((parent, '|'.join(padded_path)))

This ensures all paths for a given parent have uniform length, simplifying downstream processing.
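Putting the two pieces together, here is a minimal standalone run of the 'all' mode logic on a cut-down KCA01 sample (illustrative data, same padding rule):

```python
# Cut-down KCA01 sample: one direct leaf and one intermediate node.
tree = {
    "KCA01_HIER": ["KCA01_L010", "KCA01_N020"],
    "KCA01_N020": ["KCA01_L021", "KCA01_L022"],
}

def build_paths(parent, path):
    """Yield every root-to-leaf path as a list of node names."""
    children = tree.get(parent, [])
    if not children:              # Leaf reached: the path is complete
        yield path
    else:
        for child in children:
            yield from build_paths(child, path + [child])

paths = list(build_paths("KCA01_HIER", ["KCA01_HIER"]))
max_depth = max(len(p) for p in paths)

# Right-pad shorter paths by repeating the leaf value, as the UDTF does.
rows = [(p[0], "|".join(p + [p[-1]] * (max_depth - len(p)))) for p in paths]
for parent, child in rows:
    print(parent, child)
# KCA01_HIER KCA01_HIER|KCA01_L010|KCA01_L010
# KCA01_HIER KCA01_HIER|KCA01_N020|KCA01_L021
# KCA01_HIER KCA01_HIER|KCA01_N020|KCA01_L022
```

The direct leaf KCA01_L010 sits one level higher than the others, so its own value is repeated to bring the path to the uniform depth of three.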

To test it, run the following code:

create or replace view cost_center_group as (
    with cost_center as (
        select set_class
                ,organizational_unit_as_set_subclass
                ,set_name as parent_id
                ,LTRIM(subordinate_set_id, '0') as child_id
        from INPUT_HIERARCHY
        where organizational_unit_as_set_subclass = 'ORG0'   --replace with your own organizational unit
        and set_class = '0101'  --0101 is the set class for cost center groups
    )
    SELECT
        PARENT
        ,RESULT AS CHILD
    FROM cost_center,
        TABLE(UN5871.MASTERDATA.FLATTEN_COMBINED_FUNC(CHILD_ID, PARENT_ID, 'all') OVER())    --call the flattening function in 'all' mode
    WHERE PARENT = '1000000000' --we are interested in the cost center group with ID 1000000000 as the highest node
    ORDER BY PARENT, CHILD
);

A two-column view, cost_center_group, is created; it serves as input to the second step, a stored procedure that splits the child path of dynamic depth.

Path Splitting: Stored Procedure vs. User-Defined Function

While a User-Defined Function (UDF) could split the paths, we chose a Stored Procedure for several reasons:

  1. Dynamic Schema Creation: Stored Procedures can execute dynamic SQL to create tables with variable numbers of columns, which UDFs cannot do.
  2. State Management: Procedures can maintain state across operations and perform multiple SQL statements.
  3. Error Handling: Better error handling and transaction management in procedures.
  4. Performance: For bulk operations, a procedure issuing set-based SQL statements avoids row-by-row UDF overhead.

The Path Splitting Stored Procedure

The split_child_column procedure dynamically determines the number of path segments and creates appropriately sized tables:

CREATE OR REPLACE PROCEDURE split_child_column(table_name STRING)
RETURNS STRING
LANGUAGE PYTHON
RUNTIME_VERSION = '3.10'
PACKAGES = ('snowflake-snowpark-python')
HANDLER = 'main'
AS
$$
import snowflake.snowpark as snowpark

def main(session: snowpark.Session, table_name: str) -> str:
    # Determine the number of path segments across the whole table.
    # MAX() guards against tables holding multiple parents with different path depths.
    rows = session.sql(
        f"SELECT MAX(ARRAY_SIZE(SPLIT(child, '|'))) AS NUM_PARTS FROM {table_name}"
    ).collect()
    if not rows or rows[0]['NUM_PARTS'] is None:
        return "Table is empty."
    num_parts = rows[0]['NUM_PARTS']
    
    # Build column list
    col_list = ['parent STRING']
    for i in range(1, num_parts + 1):
        col_list.append(f'level{i} STRING')
    columns_str = ', '.join(col_list)
    
    # Create the new table
    create_sql = f"CREATE OR REPLACE TABLE {table_name}_split ({columns_str})"
    session.sql(create_sql).collect()
    
    # Build SELECT list for INSERT
    select_parts = ['parent']
    for i in range(1, num_parts + 1):
        select_parts.append(f"SPLIT_PART(child, '|', {i}) AS level{i}")
    select_str = ', '.join(select_parts)
    insert_cols = ['parent'] + [f'level{i}' for i in range(1, num_parts + 1)]
    insert_cols_str = ', '.join(insert_cols)
    
    # Insert data
    insert_sql = f"INSERT INTO {table_name}_split ({insert_cols_str}) SELECT {select_str} FROM {table_name}"
    session.sql(insert_sql).collect()
    
    return f"Table {table_name}_split created successfully with {num_parts} child columns."
$$;

This procedure:

  1. Inspects the data to determine the path depth
  2. Dynamically creates a table with the correct number of level columns
  3. Uses SPLIT_PART to extract each segment
  4. Inserts the flattened data
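The SPLIT_PART step can be mimicked in plain Python to preview the column layout it produces. Note that Snowflake's SPLIT_PART is 1-based and returns an empty string past the last segment; the path value below is an illustrative stand-in:

```python
def split_part(s, delim, i):
    """Plain-Python stand-in for Snowflake's 1-based SPLIT_PART."""
    parts = s.split(delim)
    return parts[i - 1] if 1 <= i <= len(parts) else ""

# Illustrative padded path as produced by the UDTF's 'all' mode
child = "GRP_TOP|GRP_SUB|CC_1000"
num_parts = 3
row = [split_part(child, "|", i) for i in range(1, num_parts + 1)]
print(row)                        # ['GRP_TOP', 'GRP_SUB', 'CC_1000']
print(split_part(child, "|", 4))  # '' (beyond the last segment, like SPLIT_PART)
```

Each list element lands in its own levelN column of the _split table.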

To use it, simply do:

CALL split_child_column('cost_center_group'); --replace cost_center_group with your own table or view name; it should be the output of the previous Python UDTF step

Implementation Example

Consider this simplified hierarchy:

TOP_IRP
├── LEAF_0
├── LEVEL1_A
│   ├── LEAF_1
│   └── LEAF_2
└── LEVEL1_B
    └── LEAF_3

The flattening function (all mode) produces:

PARENT CHILD
TOP_IRP TOP_IRP|LEAF_0|LEAF_0
TOP_IRP TOP_IRP|LEVEL1_A|LEAF_1
TOP_IRP TOP_IRP|LEVEL1_A|LEAF_2
TOP_IRP TOP_IRP|LEVEL1_B|LEAF_3

The splitting procedure creates:

parent level1 level2 level3
TOP_IRP TOP_IRP LEAF_0 LEAF_0
TOP_IRP TOP_IRP LEVEL1_A LEAF_1
TOP_IRP TOP_IRP LEVEL1_A LEAF_2
TOP_IRP TOP_IRP LEVEL1_B LEAF_3

If we instead run the flattening function with level = '2', it produces the following result, which is perfect for quick aggregations:

PARENT CHILD
TOP_IRP LEAF_0
TOP_IRP LEAF_1
TOP_IRP LEAF_2
TOP_IRP LEAF_3

Conclusion

This approach transforms complex SAP hierarchical data into PowerBI-friendly formats by:

  • Handling dynamic depths through intelligent padding
  • Using Python UDTF for flexible processing
  • Leveraging stored procedures for dynamic schema creation
  • Maintaining data integrity throughout the transformation

The solution scales across different hierarchy complexities and provides a foundation for advanced PowerBI visualizations and analytics. It works with any hierarchical data structure, including the ones defined by SAP SETNODE and SETLEAF tables.


This technical guide demonstrates a production-ready solution for flattening SAP hierarchical data in Snowflake, optimized for PowerBI integration.
