Flattening Hierarchical Data: Working with SAP SETNODE and SETLEAF
Introduction
Every once in a while we have to work with hierarchical data structures. In accounting, for example, we have account hierarchies that look like the one below:
ACCT_ID | PARENT_ACCT_ID |
---|---|
300000NS | TIRPOPIN |
310000NS | 300000NS |
311000NS | 310000NS |
312000NS | 310000NS |
320000NS | 300000NS |
321000NS | 320000NS |
322000NS | 320000NS |
322000NS_A | |
330000PC | ERPFCOSA |
330000PS | ERPFCOSA |
331000CG | 400000ST |
331000CG_A | |
331000PS | 330000PS |
In SAP we might create a cost center group like this:
KCA01_HIER
├── KCA01_L010
├── KCA01_N020 (Logistics)
│ ├── KCA01_L021
│ ├── KCA01_L022
│ ├── KCA01_L023
│ └── KCA01_L024
├── KCA01_L030
└── KCA01_N040 (Production)
├── KCA01_L041
├── KCA01_L042
├── KCA01_L043
├── KCA01_L044
└── KCA01_L045
While this may seem like a distinctly different object, SAP stores the data in a similar child-to-parent fashion, only split across two tables.
The node-to-node relationship is stored in SETNODE:
CHILD_ID | PARENT_ID |
---|---|
KCA01_N020 | KCA01_HIER |
KCA01_N040 | KCA01_HIER |
The node-to-leaf relationship is stored in SETLEAF. A LEAF is the lowest level of the hierarchy: in the context of general ledger account groups it is the posting-level account, whereas for cost center groups it is the individual cost center.
CHILD_ID | PARENT_ID |
---|---|
KCA01_L010 | KCA01_HIER |
KCA01_L021 | KCA01_N020 |
KCA01_L022 | KCA01_N020 |
KCA01_L023 | KCA01_N020 |
KCA01_L024 | KCA01_N020 |
KCA01_L030 | KCA01_HIER |
KCA01_L041 | KCA01_N040 |
KCA01_L042 | KCA01_N040 |
KCA01_L043 | KCA01_N040 |
KCA01_L044 | KCA01_N040 |
KCA01_L045 | KCA01_N040 |
The above description is a bit oversimplified, as SAP has other tables related to hierarchies, but for the purpose of this blog we are primarily interested in these two tables.
The real column names are also different, as we will see soon.
What we should also note is that SAP stores leaf information as a range, with a "From Value" and a "To Value". How to handle the range so that our flattening function can work with it deserves a separate blog of its own; here we will focus on our main task. We will assume that our leaves contain individual values and no ranges are used (so From Value is the same as To Value). In practice this holds most of the time, and the simplification does not affect the generality of our solution.
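If you want to verify how often that assumption holds in your system, a minimal sketch like the one below can help. It is plain Python over exported SETLEAF rows, with a hypothetical tuple layout, not part of the solution itself:

# Minimal sketch: check the "no ranges" assumption on SETLEAF rows.
# Each row is assumed to be a (set_name, from_value, to_value) tuple.
def rows_with_real_ranges(setleaf_rows):
    """Return the rows whose from_value and to_value differ."""
    return [row for row in setleaf_rows if row[1] != row[2]]

sample_rows = [
    ("KCA01_N020", "KCA01_L021", "KCA01_L021"),  # single value: fine for our purposes
    ("KCA01_N020", "KCA01_L022", "KCA01_L024"),  # a real range: would need separate handling
]
print(rows_with_real_ranges(sample_rows))  # only the second row is returned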
It is easy to see that we need to union SETNODE with SETLEAF so that we have a complete set of input data to flatten:
CHILD_ID | PARENT_ID |
---|---|
KCA01_L010 | KCA01_HIER |
KCA01_N020 | KCA01_HIER |
KCA01_L021 | KCA01_N020 |
KCA01_L022 | KCA01_N020 |
KCA01_L023 | KCA01_N020 |
KCA01_L024 | KCA01_N020 |
KCA01_L030 | KCA01_HIER |
KCA01_N040 | KCA01_HIER |
KCA01_L041 | KCA01_N040 |
KCA01_L042 | KCA01_N040 |
KCA01_L043 | KCA01_N040 |
KCA01_L044 | KCA01_N040 |
KCA01_L045 | KCA01_N040 |
Integrating this kind of hierarchical data into tools like PowerBI presents unique challenges, particularly when dealing with variable-depth hierarchies. This post explores a robust solution using Snowflake's Python capabilities to flatten and process this data, enabling seamless visualization in PowerBI.
Understanding SAP SETNODE and SETLEAF Tables
SAP uses two primary tables to store hierarchical relationships:
- SETNODE: Contains node-level information including set class, organizational unit, set name, and subordinate set IDs. This table represents the structural elements of the hierarchy.
- SETLEAF: Stores leaf-level details with set class, organizational unit, set name, and value ranges (from_value, to_value). These represent the actual data points at the bottom of the hierarchy.
The relationship between these tables creates a tree structure where SETNODE entries can have multiple children, some of which may be other SETNODE entries or SETLEAF entries.
The same set of tables is used for cost center groups, profit center groups, cost element groups, internal order groups, etc., and the different hierarchy types are differentiated by set_class.
The Challenge: Dynamic Hierarchy Depths
The primary challenge arises from the variable depth of these hierarchies. Unlike fixed-depth structures, SAP hierarchies can have anywhere from 2 to 10+ levels depending on the organizational complexity. This variability creates several issues:
- PowerBI Limitations: Traditional flattening methods require predefined column structures, making dynamic depths difficult to handle.
- Performance Issues: Recursive queries can become inefficient with deep hierarchies.
- Data Consistency: Ensuring all paths are properly represented across different depths.
- Maintenance Complexity: Changes in hierarchy depth require schema modifications.
Solution: Python-Based Hierarchy Flattening
We use a two-step approach to flatten any hierarchical data.
First, we use a Python User-Defined Table Function (UDTF) to dynamically flatten the hierarchy.
The input to this UDTF is a two-column data set mapping a child to a parent, just like the table we saw at the beginning.
The output of this UDTF also contains two columns, parent and child, where the child column contains pipe-delimited paths representing the complete traversal from that parent to a leaf.
This solution is versatile and can be applied to any hierarchical data structure, including any relationship represented by SAP SETNODE and SETLEAF tables, such as:
- Cost element groups
- Cost center groups
- Order groups
- Profit center hierarchies
- Organizational unit structures
The set_class field in these tables determines the type of hierarchy being represented, making the solution adaptable to various SAP data models.
Before we dive into the details of the flattening function, we have to union the two tables, SETNODE and SETLEAF, as explained at the beginning of this article.
The following code shows how to prepare the data as input to the flattening function:
create or replace view INPUT_HIERARCHY as (
with setnode as (
select set_class
,organizational_unit_as_set_subclass
,set_name
,subordinate_set_id
from my_warehouse.my_schema.setnode
),
setleaf as (
select set_class
,organizational_unit_as_set_subclass
,set_name
,set_line_counter
,option_field_in_structure_of_select_options_tables
,from_value
,to_value
from my_warehouse.my_schema.setleaf
),
union_all as (
select set_class
,organizational_unit_as_set_subclass
,set_name
,subordinate_set_id
from setnode
union -- UNION (not UNION ALL): with from_value equal to to_value, the two SETLEAF selects produce identical rows, which are deduplicated here
select set_class
,organizational_unit_as_set_subclass
,set_name
,from_value as subordinate_set_id
from setleaf
union
select set_class
,organizational_unit_as_set_subclass
,set_name
,to_value as subordinate_set_id
from setleaf
)
select set_class
,organizational_unit_as_set_subclass
,set_name
,subordinate_set_id
from union_all
);
The Flattening Function in Detail
Here is the complete code for the flattening function.
CREATE OR REPLACE FUNCTION UN5871.MASTERDATA.FLATTEN_COMBINED_FUNC (
CHILDID VARCHAR,
PARENTID VARCHAR,
LEVEL VARCHAR DEFAULT '2'
)
RETURNS TABLE (
PARENT VARCHAR,
RESULT VARCHAR
)
LANGUAGE PYTHON
RUNTIME_VERSION = '3.10'
PACKAGES = ('pandas', 'numpy')
HANDLER = 'get_combined'
AS
$$
import pandas
class get_combined:
"""
Combined hierarchy flattening handler class.
Supports two modes:
- level='2': Returns only leaf nodes (like original FLATTEN_FUNC)
- level='all': Returns full hierarchy paths with uniform depth padding
"""
    def __init__(self):
        # Instance-level state so rows do not leak between handler instances/partitions
        self.data = []
def process(self, childid: str, parentid: str, level: str = '2'):
"""Process each input row and store parent-child relationships."""
self.data.append([parentid, childid, level])
def end_partition(self):
"""
Process the complete partition and return results based on level parameter.
"""
# Get level parameter from first row (should be consistent across partition)
level = self.data[0][2] if self.data else '2'
# Build tree structure: parent -> [list of children]
tree = {}
for parent, child, _ in self.data:
tree.setdefault(parent, []).append(child)
# ==========================================
# LEVEL = '2': Return only leaf nodes
# ==========================================
if level == '2':
def acct_tree_desc(parent):
"""Recursively find all leaf descendants of a parent node."""
children = tree.get(parent, [parent])
# Process children in reverse order for safe list modification
for child in reversed(children):
if child in tree: # Child has its own children
lower_children = acct_tree_desc(child)
children.extend(lower_children)
children.remove(child)
return children
result = []
for parent in set([i[0] for i in self.data]):
if parent != 'NA':
leaf_children = acct_tree_desc(parent)
for child in leaf_children:
result.append((parent, child))
return result
# ==========================================
# LEVEL = 'all': Return full hierarchy paths
# ==========================================
elif level == 'all':
def build_paths(parent, path):
"""Recursively build all paths from parent to leaf nodes."""
children = tree.get(parent, [])
if not children: # Leaf node reached
yield path
else:
for child in children:
yield from build_paths(child, path + [child])
# Collect all paths grouped by parent
parent_paths = {}
for parent in set([i[0] for i in self.data]):
if parent != 'NA':
paths_for_parent = []
for path in build_paths(parent, [parent]):
if len(path) > 1: # Only include paths with actual hierarchy
paths_for_parent.append(path)
if paths_for_parent:
parent_paths[parent] = paths_for_parent
# Pad paths to uniform depth per parent and format results
result = []
for parent, paths in parent_paths.items():
# Find maximum depth for this specific parent
max_depth_for_parent = max(len(path) for path in paths)
for path in paths:
# Right-pad with leaf value to reach maximum depth
padding_needed = max_depth_for_parent - len(path)
leaf_value = path[-1] # Rightmost (leaf) value
padded_path = path + [leaf_value] * padding_needed
result.append((parent, '|'.join(padded_path)))
return result
# ==========================================
# Invalid level parameter
# ==========================================
else:
return [('ERROR', f'Invalid level parameter: {level}. Use "2" or "all"')]
$$;
The FLATTEN_COMBINED_FUNC is a Python UDTF that processes hierarchical data in two modes:
Mode 1: Leaf Nodes Only (level = '2')
def acct_tree_desc(parent):
"""Recursively find all leaf descendants of a parent node."""
children = tree.get(parent, [parent])
for child in reversed(children):
if child in tree:
lower_children = acct_tree_desc(child)
children.extend(lower_children)
children.remove(child)
return children
This mode traverses the hierarchy recursively, collecting only the leaf nodes for each parent. It's efficient for scenarios where only end-level data is needed.
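As a standalone illustration (plain Python, an iterative reformulation of the same idea rather than the UDTF code itself), here is the leaf collection applied to the simplified hierarchy used in the Implementation Example further below:

# Sketch of the level='2' behaviour: collect only leaf descendants.
tree = {
    "TOP_IRP": ["LEAF_0", "LEVEL1_A", "LEVEL1_B"],
    "LEVEL1_A": ["LEAF_1", "LEAF_2"],
    "LEVEL1_B": ["LEAF_3"],
}

def leaf_descendants(parent):
    """Depth-first walk that returns every leaf reachable from parent."""
    leaves, stack = [], [parent]
    while stack:
        node = stack.pop()
        children = tree.get(node)
        if children:
            stack.extend(children)   # internal node: keep descending
        else:
            leaves.append(node)      # no children: this is a leaf
    return leaves

print(sorted(leaf_descendants("TOP_IRP")))
# ['LEAF_0', 'LEAF_1', 'LEAF_2', 'LEAF_3']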
Mode 2: Full Hierarchy Paths (level = 'all')
def build_paths(parent, path):
"""Recursively build all paths from parent to leaf nodes."""
children = tree.get(parent, [])
if not children:
yield path
else:
for child in children:
yield from build_paths(child, path + [child])
The 'all' mode constructs complete paths from each parent to all its leaf descendants. To handle variable depths, it implements intelligent padding:
# Find maximum depth for this specific parent
max_depth_for_parent = max(len(path) for path in paths)
for path in paths:
# Right-pad with leaf value to reach maximum depth
padding_needed = max_depth_for_parent - len(path)
leaf_value = path[-1]
padded_path = path + [leaf_value] * padding_needed
result.append((parent, '|'.join(padded_path)))
This ensures all paths for a given parent have uniform length, simplifying downstream processing.
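Here is a minimal standalone sketch of the padding step, using the paths of the example hierarchy shown in the Implementation Example section below:

# Sketch: right-pad every path with its leaf so all paths share the same depth.
paths = [
    ["TOP_IRP", "LEAF_0"],              # depth 2
    ["TOP_IRP", "LEVEL1_A", "LEAF_1"],  # depth 3
    ["TOP_IRP", "LEVEL1_B", "LEAF_3"],  # depth 3
]

max_depth = max(len(path) for path in paths)
padded = [path + [path[-1]] * (max_depth - len(path)) for path in paths]
for path in padded:
    print("|".join(path))
# TOP_IRP|LEAF_0|LEAF_0
# TOP_IRP|LEVEL1_A|LEAF_1
# TOP_IRP|LEVEL1_B|LEAF_3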
To test it, run the following code:
create or replace view cost_center_group as (
with cost_center as (
select set_class
,organizational_unit_as_set_subclass
,set_name as parent_id
,LTRIM(subordinate_set_id,'0') as child_id
from INPUT_HIERARCHY
where organizational_unit_as_set_subclass = 'ORG0' --replace with your own organizational unit
and set_class = '0101' --0101 is for cost center group
)
SELECT
PARENT
,RESULT AS CHILD
FROM cost_center,
TABLE(UN5871.MASTERDATA.FLATTEN_COMBINED_FUNC(CHILD_ID, PARENT_ID, 'all') OVER()) --call the flatten function with all mode
WHERE PARENT = '1000000000' --we are interested in cost center group with ID 1000000000 as highest node
ORDER BY PARENT, CHILD
);
A two-column view cost_center_group is created, which serves as input to the second step: a stored procedure that splits the child path of dynamic depth.
Path Splitting: Stored Procedure vs. User-Defined Function
While a User-Defined Function (UDF) could split the paths, we chose a Stored Procedure for several reasons:
- Dynamic Schema Creation: Stored Procedures can execute dynamic SQL to create tables with variable numbers of columns, which UDFs cannot do.
- State Management: Procedures can maintain state across operations and perform multiple SQL statements.
- Error Handling: Better error handling and transaction management in procedures.
- Performance: For bulk operations, procedures are more efficient than row-by-row UDF processing.
The Path Splitting Stored Procedure
The split_child_column procedure dynamically determines the number of path segments and creates appropriately sized tables:
CREATE OR REPLACE PROCEDURE split_child_column(table_name STRING)
RETURNS STRING
LANGUAGE PYTHON
RUNTIME_VERSION = '3.10'
PACKAGES = ('snowflake-snowpark-python')
HANDLER = 'main'
AS
$$
import snowflake.snowpark as snowpark
def main(session: snowpark.Session, table_name: str) -> str:
# Get a sample child value to determine number of parts
sample_df = session.sql(f"SELECT child FROM {table_name} LIMIT 1").collect()
if not sample_df:
return "Table is empty."
sample_child = sample_df[0]['CHILD']
num_parts = len(sample_child.split('|'))
# Build column list
col_list = ['parent STRING']
for i in range(1, num_parts + 1):
col_list.append(f'level{i} STRING')
columns_str = ', '.join(col_list)
# Create the new table
create_sql = f"CREATE OR REPLACE TABLE {table_name}_split ({columns_str})"
session.sql(create_sql).collect()
# Build SELECT list for INSERT
select_parts = ['parent']
for i in range(1, num_parts + 1):
select_parts.append(f"SPLIT_PART(child, '|', {i}) AS level{i}")
select_str = ', '.join(select_parts)
insert_cols = ['parent'] + [f'level{i}' for i in range(1, num_parts + 1)]
insert_cols_str = ', '.join(insert_cols)
# Insert data
insert_sql = f"INSERT INTO {table_name}_split ({insert_cols_str}) SELECT {select_str} FROM {table_name}"
session.sql(insert_sql).collect()
return f"Table {table_name}_split created successfully with {num_parts} child columns."
$$;
This procedure:
- Samples the data to determine path depth
- Dynamically creates a table with the correct number of level columns
- Uses SPLIT_PART to extract each segment
- Inserts the flattened data
To use it, simply do:
CALL split_child_column('cost_center_group'); -- replace cost_center_group with your table name; it should be the output of the previous Python UDTF step
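One thing to keep in mind: the procedure infers the number of segments from a single sampled row. Because the flattening function pads paths per parent, this is reliable when the input view is filtered down to one top node, as in the test run above. If you feed it several parents whose hierarchies have different depths, a more defensive variant is to derive the depth from all rows. The helper below (a sketch with a hypothetical name, not part of the original procedure) shows one way to do that:

import snowflake.snowpark as snowpark

def max_path_depth(session: snowpark.Session, table_name: str) -> int:
    """Return the largest number of '|'-separated segments in the CHILD column."""
    # SPLIT returns an ARRAY of segments; ARRAY_SIZE counts them.
    row = session.sql(
        f"SELECT MAX(ARRAY_SIZE(SPLIT(child, '|'))) AS num_parts FROM {table_name}"
    ).collect()[0]
    return int(row['NUM_PARTS'])

The value returned here could then replace the sampled num_parts inside split_child_column without any other changes.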
Implementation Example
Consider this simplified hierarchy:
TOP_IRP
├── LEAF_0
├── LEVEL1_A
│ ├── LEAF_1
│ └── LEAF_2
└── LEVEL1_B
└── LEAF_3
The flattening function (all mode) produces:
PARENT | CHILD |
---|---|
TOP_IRP | TOP_IRP|LEAF_0|LEAF_0 |
TOP_IRP | TOP_IRP|LEVEL1_A|LEAF_1 |
TOP_IRP | TOP_IRP|LEVEL1_A|LEAF_2 |
TOP_IRP | TOP_IRP|LEVEL1_B|LEAF_3 |
The splitting procedure creates:
parent | level1 | level2 | level3 |
---|---|---|---|
TOP_IRP | TOP_IRP | LEAF_0 | LEAF_0 |
TOP_IRP | TOP_IRP | LEVEL1_A | LEAF_1 |
TOP_IRP | TOP_IRP | LEVEL1_A | LEAF_2 |
TOP_IRP | TOP_IRP | LEVEL1_B | LEAF_3 |
If we choose to run the flattening function with level = '2' instead, it produces only parent-to-leaf pairs, which is perfect for quick calculations:
parent | level1 |
---|---|
TOP_IRP | LEAF_0 |
TOP_IRP | LEAF_1 |
TOP_IRP | LEAF_2 |
TOP_IRP | LEAF_3 |
Conclusion
This approach transforms complex SAP hierarchical data into PowerBI-friendly formats by:
- Handling dynamic depths through intelligent padding
- Using Python UDTF for flexible processing
- Leveraging stored procedures for dynamic schema creation
- Maintaining data integrity throughout the transformation
The solution scales across different hierarchy complexities and provides a foundation for advanced PowerBI visualizations and analytics. It works with any hierarchical data structure, including the ones defined by SAP SETNODE and SETLEAF tables.
This technical guide demonstrates a production-ready solution for flattening SAP hierarchical data in Snowflake, optimized for PowerBI integration.