SanjayPanda changed the title from "[Bug]: Periodic Impulse with fixed window getting stuck after sometime when used side input to a session window main pipeline" to "[Bug]: Periodic Impulse with Fixed Window Stalls When Used as a Side Input in a Session Window Pipeline" on Feb 18, 2025.
Also, to check whether the Session Window main input and the Fixed Window side input work together, you can probably test it by removing the windowing. I do not see why it shouldn't work.
I tried processing the data without windowing, but it still stalls at the Enrich ParDo. Using triggers with a custom window seems to work, but it consumes too many resources.
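For context, the trigger-based workaround mentioned above is essentially Beam's documented "slowly updating global window side inputs" pattern: keep the side input in the global window and re-fire it periodically, instead of putting it in fixed windows. Below is a sketch of the side-input branch only, not a runnable example: `p` is the streaming pipeline object and `fetch_lookup` is a hypothetical loader returning `(key, value)` pairs.

```python
import apache_beam as beam
from apache_beam.transforms import trigger, window
from apache_beam.transforms.periodicsequence import PeriodicImpulse

side = (
    p  # the streaming pipeline object
    | 'Impulse' >> PeriodicImpulse(fire_interval=2 * 3600,
                                   apply_windowing=False)
    # fetch_lookup() is a hypothetical function returning (key, value) pairs.
    | 'Fetch' >> beam.FlatMap(lambda _: fetch_lookup())
    # Global window, re-fired every 2 hours; DISCARDING keeps only the
    # latest firing instead of accumulating all past lookups.
    | 'GlobalWindow' >> beam.WindowInto(
        window.GlobalWindows(),
        trigger=trigger.Repeatedly(trigger.AfterProcessingTime(2 * 3600)),
        accumulation_mode=trigger.AccumulationMode.DISCARDING))
```

With this shape every main-input window maps to the single global side-input window, so the window-mapping interaction between Sessions and FixedWindows is avoided.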
Here are screenshots of where it stalls. After this step, no data is flowing.
This is the data freshness for this step:
Here is an example of one periodic impulse step:
I have around 21 sources, each under 30 MB, and the data is fairly static over a 24-hour period. Do you think an enrichment approach would be a better fit than a side input?
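Given roughly 21 sources under 30 MB each that change about once a day, one alternative worth considering is skipping side inputs entirely and caching the lookup data inside the enrich DoFn, refreshing it on a TTL. Below is a minimal sketch of that refresh logic; in a real pipeline the class would subclass `apache_beam.DoFn` (it is shown as a plain class only to keep the sketch self-contained), and `fetch_fn` is a hypothetical user-supplied loader returning a dict.

```python
import time


class CachedLookupEnricher:  # would subclass apache_beam.DoFn in a real pipeline
    """Caches a small lookup dict per worker, refreshing at most once per TTL."""

    def __init__(self, fetch_fn, ttl_seconds=24 * 3600):
        self._fetch_fn = fetch_fn  # hypothetical loader: () -> dict
        self._ttl = ttl_seconds
        self._cache = None
        self._loaded_at = 0.0

    def _maybe_refresh(self):
        # Reload the lookup data only when it is missing or older than the TTL.
        now = time.time()
        if self._cache is None or now - self._loaded_at > self._ttl:
            self._cache = self._fetch_fn()
            self._loaded_at = now

    def setup(self):
        # Beam calls setup() once per DoFn instance; warm the cache here.
        self._maybe_refresh()

    def process(self, element):
        # Called per element; refreshes only when the TTL has expired.
        self._maybe_refresh()
        key, value = element
        yield (key, value, self._cache.get(key))
```

This avoids the streaming side-input and windowing interaction entirely, at the cost of some per-worker memory (at most ~21 × 30 MB if all lookups are cached) and up to one TTL of staleness.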
What happened?
I am using the Apache Beam Python SDK with the Dataflow runner. Our pipeline is designed to periodically fetch lookup data and use it to enrich the main processing pipeline. The main pipeline is configured as a session window, while the lookup side-input is provided using Periodic Impulse with a fixed window of 2 hours.
Currently, every time we run the pipeline, it executes for approximately 45 minutes before stopping at the enrich ParDo step, where the side input is passed using AsMultiMap.
I investigated the issue and found two Stack Overflow discussions that describe similar behaviour:
google cloud dataflow - Apache beam blocked on unbounded side input - Stack Overflow
python - Apache Beam Cloud Dataflow Streaming Stuck Side Input - Stack Overflow
Would appreciate any insights or suggestions on resolving this problem.

Here is the data freshness graph

The Enrich ParDo stops processing until the next impulse is triggered.
Questions:
Appreciate your help and support. Thanks 😊
I attempted to email using the provided email ID ('[email protected]'), but it didn't work, so I decided to create this issue instead.
Issue Priority
Priority: 2 (default / most bugs should be filed as P2)
Issue Components