Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

.Net Processes - Map Step Feature #9339

Open
wants to merge 93 commits into
base: main
Choose a base branch
from
Open

.Net Processes - Map Step Feature #9339

wants to merge 93 commits into from

Conversation

crickman
Copy link
Contributor

@crickman crickman commented Oct 20, 2024

Motivation and Context

Fixes: #9193

Description

Map each value from a set to a map-operation and present the results as a set for potential reduction.

Includes:

  • ProcessMapBuilder (Core)
  • KernelProcessMap / KernelProcessMapState (Abstractions)
  • LocalMap (LocalRuntime)
  • MapActor (DaprRuntime)
  • Serialization
  • Unit Tests (exhaustive)
  • Integration Tests
  • Sample

Features:

  • Executes map operations in parallel
  • Handles when output type has been transformed from input type
  • Accepts either step or subprocess for map operation
  • Proxies events for edges defined directly on the map-step
  • Allows additional (non proxied) edges for the map-operation
  • Participates in state serialization

Follow-up:

Contribution Checklist

@crickman crickman added PR: in progress Under development and/or addressing feedback .NET Issue or Pull requests regarding .NET code experimental Associated with an experimental feature enhancement processes labels Oct 20, 2024
@crickman crickman self-assigned this Oct 20, 2024
@github-actions github-actions bot changed the title .NET Processes - Map Step Feature .Net Processes - Map Step Feature Oct 20, 2024
public const string MapEventId = "StartMap";

/// <summary>
/// The map operation.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[nit] perhaps this description should be more detailed - mention that is the step/subprocess that will be applied to the array received as input

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now that we have our constants file, I've moved this definition (and updated comment).

@@ -66,6 +66,8 @@ public ProcessStepEdgeBuilder OnFunctionError(string functionName)

#endregion

internal const char EventIdSeparator = '.';
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[nit] remove since it seems unused (?)

Copy link
Contributor Author

@crickman crickman Nov 8, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using now, as you identified...also moved to our constants file.

Verify.NotNullOrWhiteSpace(process.State?.Name);
Verify.NotNull(kernel);
Verify.NotNull(initialEvent);
Verify.NotNull(initialEvent, nameof(initialEvent));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in general wondering if for initialEvent there should also be a check for the Id not not be empty

Copy link
Contributor Author

@crickman crickman Nov 8, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a good point. I've updated KernelProcessEvent.Id so its no longer nullable, but it might be whitespace or empty string. I suppose my thought is that this condition is relevant for any event that is emitted, so perhaps the check might be more useful as part of the processing loop. I've updated LocalProcess.RunOnceAsync to check for this (as opposed to the factory) and also LocalStep.EmitEventAsync and KernelProcessStepContext.EmitEventAsync.

Just to be clear: This change didn't introduce or regress the ability for null or empty event-id to be created, but its a good improvment. As you seen, I've been incrementally adjusting assumptions around our internal data flow (removing !, for example). I expect there's still more gaps to discover and address.

ProcessBuilder process = new(nameof(TestMapAsync));

ProcessStepBuilder computeStep = process.AddStepFromType<ComputeStep>();
ProcessMapBuilder mapStep = process.AddMapForTarget(new ProcessFunctionTargetBuilder(computeStep));
Copy link
Contributor

@esttenorio esttenorio Nov 8, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[brainstorming hypothetical scenario]
wondering if these 2[line 40-41] should be a single function? since compute is not actually used by the root process directly having it instantiated on the level computeStep could in theory also be used as a regular step:

graph TD
root["root process"]-->map-->compute1["compute"]
root-->compute2["compute"]
Loading

Copy link
Contributor Author

@crickman crickman Nov 8, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good question. Ben and I were discussing this also. There is a case where an edge may be defined outside of the map-operation: https://github.com/microsoft/semantic-kernel/blob/processes-map-step/dotnet/src/Experimental/Process.UnitTests/Runtime.Local/LocalMapTests.cs#L295

graph TD
root["root process"]--N-->map
subgraph map
compute1["compute"]
end
compute1["compute"]--1-->orthogonal
compute1["compute"]--N-->union
root--extra/noharm-->compute2["compute"]
Loading

The other factor to consider is when the map operation is a subprocess...such a pattern would also result in being explicitly handled by the caller (first create sub-process, then associate with map). The difference for step is that the builders only allow creating one from the process builder...which in turn always adds it to the step collection of the process.

I'm am certainly open to exploring any potential improvements, but it also feels like we need to be careful to consider the breadth of patterns that make sense and avoid disrupting others that are useful.

FYI - I'll planning on spending some time exploring alternative patterns here...just want to be careful.

{
this._map = map;
this._logger = this._kernel.LoggerFactory?.CreateLogger(this._map.State.Name) ?? new NullLogger<LocalStep>();
this._mapEvents = [.. map.Edges.Keys.Select(key => key.Split('.').Last())];
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

was this supposed to use the EventIdSeparator I saw somewhere?

Copy link
Contributor Author

@crickman crickman Nov 8, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I think so...thanks for spotting this. (Plus I'll move this to our constants file now)...updated several usage points.

LocalKernelProcessContext processContext = new(process, this._kernel, context.Filter);
Task processTask =
processContext.StartWithEventAsync(
new KernelProcessEvent
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so all events emitted from a map are internal right? maybe if one was marked as public at this level a warning can be raised so people know

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Correct, events specific to the map-operation (map-step edges) are internal as they target the process in which the map step is defined. Any other proxied event will retain its original visibility.

@@ -0,0 +1,223 @@
// Copyright (c) Microsoft. All rights reserved.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

wondering how the error messages are dealt with the map step?
same as you get an [] of nullable results, when subscribing to onError you get an [] of errors?

Copy link
Contributor Author

@crickman crickman Nov 8, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, I believe so. if out of 5 steps 3 result in some type of error and 2 don't, three invocations of the error handler seems natural. i'm hestant to intercept and interpret external failure. If the mapstep itself experience an exception, I'd expect that to raise a single error event.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement experimental Associated with an experimental feature .NET Issue or Pull requests regarding .NET code PR: ready for review All feedback addressed, ready for reviews processes
Projects
Status: Sprint: In Review
Development

Successfully merging this pull request may close these issues.

.Net: Process Framework: Map Step
2 participants