Runtime Parameters in Kedro Not Working as Expected #4437

Open
fpiedrah opened this issue Jan 22, 2025 · 5 comments
Labels
Community Issue/PR opened by the open-source community

Comments

@fpiedrah

Description

Runtime parameters in Kedro are not being recognized when passed via --params in the CLI.

Context

I am trying to pass runtime parameters for model_name and model_identifier, with default values specified in a global file. However, Kedro does not recognize the runtime parameters and always falls back to the global defaults. If only the runtime parameter is provided, Kedro returns an error.

My parameters.yml file:

model:
  name: "${runtime_params:model_name}"
  identifier: "${runtime_params:model_identifier}"

When running:

kedro run --params model_name=llama,model_identifier=meta-llama/Llama-3.1-8

I get the error:

InterpolationResolutionError: Runtime parameter 'model_name' not found and no default value provided.

This prevents overriding these values dynamically at runtime.

Steps to Reproduce

  1. Define parameters.yml with runtime parameters:

    model:
      name: "${runtime_params:model_name}"
      identifier: "${runtime_params:model_identifier}"
  2. Run the pipeline with:

    kedro run --params model_name=llama,model_identifier=meta-llama/Llama-3.1-8
  3. Get the following error:

    InterpolationResolutionError: Runtime parameter 'model_name' not found and no default value provided.
    

Expected Result

Runtime parameters should be injected when passed via --params in the CLI.

Actual Result

InterpolationResolutionError: Runtime parameter 'model_name' not found and no default value provided.

Your Environment

  • Conda version: 24.9.2
  • Kedro version (kedro -V): 0.19.10
  • Python version (python -V): 3.12.7
  • Operating system: macOS Sonoma

I also commented out parts of the settings.py file. I don't believe this is related, but I'm including it here:

from pathlib import Path  # noqa: E402
from kedro.config import OmegaConfigLoader  # noqa: E402

CONF_SOURCE = "configuration"
CONFIG_LOADER_CLASS = OmegaConfigLoader

CONFIG_LOADER_ARGS = {
    "base_env": "base",
    "default_run_env": "local",
}
@merelcht merelcht added the Community Issue/PR opened by the open-source community label Jan 22, 2025
@github-project-automation github-project-automation bot moved this to Wizard inbox in Kedro Wizard 🪄 Jan 22, 2025
@Galileo-Galilei
Member

I think, though we should confirm with @ankatiyar (if I remember correctly, she was the one who implemented the runtime_params resolver), that this specific resolver does not work inside parameters.yml but only inside catalog.yml. I don't remember the rationale and hopefully someone can step in to explain, but I understand this behaviour is confusing.

I also tag @astrojuanlu as I think this is part of another discussion about unifying globals / runtime / "normal" parameters and clarifying what is allowed or not (preferably before 0.20 😉)

@ankatiyar
Contributor

Hi, sorry I missed the tag earlier. The runtime_params: resolver should work in the parameters.yml file. The only place it doesn't work is in globals.yml, and in that case the error message should say something like: UnsupportedInterpolationType: The runtime_params: resolver is not supported for globals.
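
For reference, the resolver also accepts an inline default, which is what the error message is alluding to. A minimal parameters.yml sketch (assuming the ${runtime_params:key, default} form; the default values here are just placeholders), so the interpolation still resolves when no --params override is passed:

model:
  # overridden by --params at run time; otherwise falls back to the inline default
  name: "${runtime_params:model_name, default_model}"
  identifier: "${runtime_params:model_identifier, default_identifier}"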

I've tried to reproduce the error with the information mentioned above, and the runtime parameters work for me as expected. Have you been able to resolve this issue in the meantime, @fpiedrah? If not, could you share the full stack trace?

@ankatiyar ankatiyar moved this from Wizard inbox to Needs more info in Kedro Wizard 🪄 Feb 3, 2025
@fpiedrah
Author

Sorry for the delay—I went on a bit of a deep dive trying to track down the source of this weird behavior, and it took longer than expected.

@ankatiyar, you’re absolutely right. If I start with an empty project and use the same example I provided, everything works as expected. However, the issue arises when I try to load parameters from code as described in the documentation.

Here’s how I’m loading the parameters:

First, I add the following to {{project_name}}/src/{{project_name}}/__init__.py:

import os

CODE_SOURCE = os.path.dirname(os.path.abspath(__file__))
PROJECT_SOURCE = os.path.join(CODE_SOURCE, "../../")

BASE_ENVIRONMENT = "base"

__version__ = "0.1"

Then, in one of my pipelines, I load the parameters like this:

import os

from kedro.config import OmegaConfigLoader
from kedro.framework.project import settings
from kedro.pipeline import Pipeline, pipeline

from instructple import BASE_ENVIRONMENT, PROJECT_SOURCE
from .nodes import ...

base_pipeline = pipeline(
    [...]
)

def build_pipeline(model_parameters: dict) -> Pipeline:
    return pipeline(
        [base_pipeline],
        namespace="pipeline",
        inputs={...},
        outputs={...},
        parameters={...},
    )

def create_pipeline(**kwargs) -> Pipeline:
    configuration = OmegaConfigLoader(
        base_env=BASE_ENVIRONMENT,
        conf_source=os.path.join(PROJECT_SOURCE, settings.CONF_SOURCE),
    )["parameters"]

    model_parameters = configuration["model"]
    
    return build_pipeline(model_parameters)

The problem happens when I use OmegaConfigLoader this way—I start getting the error:

InterpolationResolutionError: Runtime parameter 'model_name' not found and no default value provided.

If I remove the code-based parameter loading, everything works fine. Any ideas on what might be going wrong?

@ankatiyar
Contributor

Hey @fpiedrah, thanks for getting back on this! Is there a reason you need to load the parameters yourself in your pipeline.py?
When a Kedro pipeline is run, a "session" is created that loads the configuration from conf/ (the conf_source) and then merges it with the runtime parameters, which the pipeline receives as input. When you load the configuration by initialising OmegaConfigLoader yourself here, it only loads what is available in the YAML files (and doesn't have access to the runtime parameters, hence the error).

If you specify params:model as input to your pipeline, it would be the resolved config you expect, which is also the correct/recommended way to pass parameters to node functions.
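
Roughly, that pattern looks like this (a minimal sketch; the node and function names are illustrative, not from your project):

from kedro.pipeline import Pipeline, node, pipeline

def report_model(model: dict) -> None:
    # receives the fully resolved "model" parameter group,
    # including any --params overrides merged in by the session
    print(model["name"], model["identifier"])

def create_pipeline(**kwargs) -> Pipeline:
    return pipeline(
        [
            node(
                func=report_model,
                inputs="params:model",  # resolved at run time, not at pipeline-creation time
                outputs=None,
            ),
        ]
    )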

@fpiedrah
Author

I see, that makes sense. My goal is to add new LLMs into my pipelines seamlessly without needing to explicitly add them to the catalog. To achieve this, I dynamically create the pipeline using a dataset factory for the LLMs.

Here’s my current approach:

base_pipeline = pipeline(
    [
        node(
            func=to_zero_shot,
            inputs=[
                "tasks",
                "tokenizer",
                "params:to_zero_shot.use_chat_formatting",
                "params:to_zero_shot.max_prompts",
                "params:to_zero_shot.random_seed",
            ],
            outputs="zero_shot",
        ),
        ...
    ]
)

def build_pipeline(model_parameters: dict) -> Pipeline:
    return pipeline(
        [base_pipeline],
        namespace="prompting",
        inputs={
            "tasks": "tasks",
            "tokenizer": f"{model_parameters['identifier']}#HFTokenizer",
        },
        outputs={...},
        parameters={...},
    )

def create_pipeline(**kwargs) -> Pipeline:
    configuration = OmegaConfigLoader(
        base_env=BASE_ENVIRONMENT,
        conf_source=os.path.join(PROJECT_SOURCE, settings.CONF_SOURCE),
    )["parameters"]

    model_parameters = configuration["model"]
    
    return build_pipeline(model_parameters)

In the data catalog, I currently have:

"{organization}/{model}#HFTokenizer":
  type: instructple.datasets.HFTokenizer
  model_identifier: "{organization}/{model}"

I’m thinking that an alternative approach would be modifying this entry to:

HFTokenizer:
  type: instructple.datasets.HFTokenizer
  model_identifier: "${runtime_params:model_identifier}, ${globals:model.identifier}}"

This makes sense to me. However, I believe it would be helpful to include a warning in the documentation clarifying that runtime_params cannot be resolved when configuration is loaded from code with a manually instantiated OmegaConfigLoader. That way, users can avoid this kind of confusion.
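
If that nested default resolves the way plain parameter defaults do (something I still need to double-check), I could then override the identifier per run from the CLI without touching the catalog:

# falls back to the value from globals
kedro run

# overrides the identifier for this run only
kedro run --params model_identifier=meta-llama/Llama-3.1-8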
