20-08-2024.WorkflowEngine RFC #89
JaktensTid started this conversation in RFCs
UPDATE
Prefect was chosen as the solution.
Workflow overview
NoLabs is an open-source platform designed for collaborative bioinformatics research and AI model inference. A key feature of NoLabs is its workflow engine, which allows Python components to be created and orchestrated as directed acyclic graphs (DAGs). Components, and the jobs inside each component, can run in parallel on distributed clusters such as Kubernetes (k8s), using containerized environments.
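As a rough illustration of orchestrating components as a DAG, the sketch below orders a hypothetical component graph and runs each "wave" of independent components concurrently. It uses only the Python standard library (`graphlib`, `asyncio`), not Prefect or Airflow. The component names and the `run_component` stand-in are assumptions made for this example and are not taken from the NoLabs codebase.

```python
import asyncio
from graphlib import TopologicalSorter

# Hypothetical component graph: each key depends on the components in its set.
DEPENDENCIES = {
    "proteins_list": set(),
    "ligands_list": set(),
    "folding": {"proteins_list"},
    "binding": {"folding", "ligands_list"},
}


async def run_component(name: str) -> str:
    """Stand-in for real component work (e.g. submitting jobs to a cluster)."""
    await asyncio.sleep(0)
    return name


async def run_dag(dependencies: dict) -> list:
    """Run components wave by wave; components within a wave run concurrently."""
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()
    waves = []
    while sorter.is_active():
        ready = list(sorter.get_ready())  # components whose dependencies are done
        finished = await asyncio.gather(*(run_component(n) for n in ready))
        waves.append(sorted(finished))
        sorter.done(*ready)  # unblock downstream components
    return waves


waves = asyncio.run(run_dag(DEPENDENCIES))
print(waves)
# → [['ligands_list', 'proteins_list'], ['folding'], ['binding']]
```

A real engine would replace `run_component` with container or cluster submission and would track per-job state. The wave-by-wave scheduling is only one possible strategy.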
Workflow engine RFC
Workflow Engine Features
Component Management:
Jobs Management:
Example Workflow:
.fasta files from the Proteins List component and generates N folding jobs (N = number of files).
K jobs (K = number of protein files * number of ligand files) to perform molecule-protein binding.
Requirements
Functional Requirements
Input/Output Mapping:
Job Parallelization:
Component and Job Execution and Monitoring:
Job Execution Types:
Inter-Component Communication:
Non-Functional Requirements
Suggested Approach
Execution Framework
Custom Workflow Engine
Purpose: Acts as a facade for Airflow operators, validating mappings and ensuring input/output correctness.
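To make the validation role concrete, here is a minimal sketch of how a facade might check output-to-input mappings before any operator runs. `ComponentSpec`, `Mapping`, `validate_mappings`, and the field names are all hypothetical, introduced only for this illustration. The RFC does not specify the actual data model, and no Airflow API is used.

```python
from dataclasses import dataclass, field


@dataclass
class ComponentSpec:
    """Hypothetical description of a component's declared inputs and outputs."""
    name: str
    inputs: dict = field(default_factory=dict)   # input name -> expected type
    outputs: dict = field(default_factory=dict)  # output name -> produced type


@dataclass(frozen=True)
class Mapping:
    """Hypothetical edge: source.output feeds target.input."""
    source: str
    output: str
    target: str
    input: str


def validate_mappings(components: dict, mappings: list) -> list:
    """Return a list of human-readable errors; an empty list means valid."""
    errors = []
    for m in mappings:
        src, dst = components.get(m.source), components.get(m.target)
        if src is None or dst is None:
            errors.append(f"{m.source} -> {m.target}: unknown component")
        elif m.output not in src.outputs:
            errors.append(f"{m.source}: no output named {m.output!r}")
        elif m.input not in dst.inputs:
            errors.append(f"{m.target}: no input named {m.input!r}")
        elif src.outputs[m.output] != dst.inputs[m.input]:
            errors.append(
                f"{m.source}.{m.output} -> {m.target}.{m.input}: type mismatch"
            )
    return errors


components = {
    "proteins_list": ComponentSpec("proteins_list", outputs={"fasta_files": list}),
    "folding": ComponentSpec("folding", inputs={"fasta_files": list}),
}

good = [Mapping("proteins_list", "fasta_files", "folding", "fasta_files")]
bad = [Mapping("proteins_list", "fasta_paths", "folding", "fasta_files")]

print(validate_mappings(components, good))  # → []
print(validate_mappings(components, bad))
# → ["proteins_list: no output named 'fasta_paths'"]
```

Collecting every error instead of raising on the first one lets a user fix a whole workflow definition in one pass. Whether the real engine should fail fast is a separate design choice.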
Task Modules:
pyproject.toml
Dockerfile for creating the task environment (including Apache Airflow Celery worker).
ExecuteJobOperator from the main project.
Pros:
Cons:
Component Classes
Each component consists of three operators. Component operators must inherit from these three classes and override the async def execute_async function, which contains the operator's code.
SetupOperator:
ExecuteJobOperator:
ExecuteJobOperator and overrides execute_async.
Inter-Component Communication
Module Structure