
Add proposal architecture document for compiling shaders out of process #71

Open · coopp wants to merge 16 commits into main

Conversation

@coopp (Collaborator) commented Sep 18, 2024

This is an initial commit with an out-of-process design for compiling shaders. This design will be exposed via a low-level C API that high-level API implementations can use to compile.

This document proposes a more detailed architecture for the approved
proposal: [0005](0005-inproc-outofproc-compiler-api-support.md).

The c api [0008](0008-c-interface-compiler-library.md) for compiling a shader
Collaborator:

Suggested change
The c api [0008](0008-c-interface-compiler-library.md) for compiling a shader
The C API [0008](0008-c-interface-compiler-library.md) for compiling a shader

I think we should be case sensitive. C and API should always be uppercase when writing English text.

Collaborator Author:

I think I fixed these up, but I will do another pass.


The c api [0008](0008-c-interface-compiler-library.md) for compiling a shader
will be implemented using this out of process architecture.

Collaborator:

The "Motivation" section is missing - and I was actually looking for it:

  • why are we building an out of proc system?
  • what problem does using process pools and things solve?

Collaborator Author:

+1 Added.

Comment on lines 34 to 35
The process pool is owned by a singleton object and shared between each api
instance created in the calling process.
Collaborator:

"singleton object" is a bit of a implementation detail that I'm not sure belongs here.

Suggested change
The process pool is owned by a singleton object and shared between each api
instance created in the calling process.
Each process that uses the API has a single process pool that is shared between all instances of the API that process creates.
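
If it helps to visualize the intended semantics, here is a minimal sketch (all names hypothetical; lazy initialization shown with C11 threads.h):

```c
#include <threads.h>

/* Hypothetical: one pool per calling process, shared by all API instances. */
static ProcessPool g_pool;
static once_flag g_pool_once = ONCE_FLAG_INIT;

static void init_pool(void) { process_pool_init(&g_pool); }

ProcessPool *get_shared_pool(void) {
  call_once(&g_pool_once, init_pool); /* first API instance initializes */
  return &g_pool;                     /* every instance shares it */
}
```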

Collaborator:

Possible alternative: there's a machine-global process pool that is shared between all processes that use the API.

(Might be worth including in the Alternatives Considered section)

Collaborator Author:

I removed wording around the singleton implementation detail. I will add your possible alternative suggestion as it is worth a conversation at least.

Comment on lines 37 to 38
A set of low-level apis will be designed that will drive compilation work
across multiple processes.
Collaborator:

A set of low-level APIs will be designed that will drive compilation work across multiple processes.

IMO we should avoid saying things like "will be designed" and instead have the proposal be more definitive. However, I had trouble making a small change to make it that way.

Since we've discussed this offline I know what this is intended to say, but I'm not convinced that is clear here. For example, taking that sentence at face value leads to the question: "How does designing a set of low-level APIs drive compilation work across multiple processes?" And that's not what's intended here.

I think my overall suggestion here is that the low/high-level split of the libraries is a pretty fundamental part of your proposal and you should probably start the Proposed Solution section with an explanation of this.

I don't think you're intending this proposal to be the design of the low-level API, so I think you should just have a hand-wavy description of it and a note that its design is out of scope for this proposal (in fact, the design should be guided by the requirements discovered while designing the high-level API).

After that you can go into more detail about the high-level API.

Collaborator Author:

The high-level/low-level API thing was really confusing. I removed this content.

Comment on lines 45 to 50
The high-level HLSL compiler c api implementation will be built on top of the
low-level c apis and implement any required synchronous and asynchronous api
policy behaviors. The high-level c api call will be blocked either waiting for
a worker process to become available or an already dispatched work to be
completed. This behavior depends on the type of call used
(synchronous/asynchronous).
Collaborator:

Suggested change
The high-level HLSL compiler c api implementation will be built on top of the
low-level c apis and implement any required synchronous and asynchronous api
policy behaviors. The high-level c api call will be blocked either waiting for
a worker process to become available or an already dispatched work to be
completed. This behavior depends on the type of call used
(synchronous/asynchronous).
The high-level API supports both synchronous and asynchronous calling patterns.

?

Collaborator Author:

The high-level/low-level API thing was really confusing. I removed this content.

Comment on lines 71 to 73
Worker monitoring threads use named pipes as IPC to communicate with their
workers and a different method (TBD) to communicate with the singleton's
api dispatching system.
Collaborator:

I don't think we should have a design that requires a 1:1 mapping between threads in the calling process and the clang processes.

Collaborator Author:

+1 I agree.

Comment on lines 130 to 133
A high-level api call starts in the api entrypoint implementation which will
call into low-level apis and implement sync/async behaviors. The low-level
c-api is responsible for finding an available worker process (clang.exe) to
perform the work. If a worker is not available the high-level api behavior
Collaborator:

So I'm trying to understand the proposed design here since it is quite different from what I was expecting (see above). I can't quite understand what the difference is between the low- and high-level APIs in this design.

Collaborator Author:

Yes. I changed this document. It was going in multiple directions at the same time which was really confusing. I removed lots of this type of content.

Comment on lines 142 to 145
The dispatching system uses IPC and a message protocol to communicate with
each worker process. Each worker process will know how to package up results
and communicate them back over the IPC mechanism to their monitoring threads
which will bubble up those results via the low-level apis.
Collaborator:

So all source files and generated binaries are transferred across this IPC channel?

Collaborator Author:

I was thinking that the caller would specify output paths as part of their command-line args packed into the compile request. So clang would just do what it does today with generated binaries etc.
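
For example, a request might just forward the caller's command line, output path included (field spelling here is hypothetical; -T/-Fo as in clang's DXC driver mode):

---> REQUEST  id:1 compile args:["-T", "cs_6_0", "-Fo", "out/shader.dxil", "shader.hlsl"]
<--- RESPONSE id:1 exit:0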

Comment on lines 362 to 365
An alternate design could be to reduce the total monitor threads down by
using one thread to monitor all clang.exe processes. This would centralize all
rpc traffic into a single place, but could bottleneck performance if
operations are waiting on workers to respond.
Collaborator:

Why was this alternative rejected?

could bottleneck performance if operations are waiting on workers to respond

This would only happen if we had synchronous communication with the workers.

Collaborator Author:

Yeah, I changed my mind on this. I wasn't really thinking through how the dispatcher could be built; a monitor thread for multiple workers is totally doable.
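
A minimal sketch of a single monitor thread multiplexing several workers (POSIX poll(); the dispatch helper is hypothetical):

```c
#include <poll.h>

void dispatch_worker_message(int worker); /* hypothetical: read one framed
                                             message and route the result */

/* One thread servicing the pipes of all resident worker processes. */
void monitor_workers(struct pollfd *fds, int nworkers) {
  for (;;) {
    if (poll(fds, nworkers, -1) < 0)
      break;                              /* interrupted or shutting down */
    for (int i = 0; i < nworkers; ++i)
      if (fds[i].revents & POLLIN)
        dispatch_worker_message(i);
  }
}
```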


( compiler needs an include, calls IPC wrapped included handler callback )

RESPONSE - "needinclude"
Collaborator:

This "needinclude" response feels more like clang wanting to call back into the calling process. Any reason not to model this in that way, having a more consistent REQUEST/RESPONSE matching to a function and a return value?

eg:

---> REQUEST  id:1 compile...
<--- REQUEST  id:2 need include...
---> RESPONSE id:2 include result
<--- REQUEST  id:3 need include...
---> RESPONSE id:3 include result
<--- RESPONSE id:1 compilation result

Collaborator Author:

I like this idea. I misunderstood the JSON-RPC protocol on my first read-through. I thought the server was not allowed to send a request/response pair to the client, but I was wrong. That is totally supported.
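
For illustration, the same exchange in JSON-RPC 2.0 framing might look like this (method and field names are hypothetical):

---> {"jsonrpc": "2.0", "id": 1, "method": "compile", "params": { "args": [...] }}
<--- {"jsonrpc": "2.0", "id": 2, "method": "needInclude", "params": { "path": "common.hlsli" }}
---> {"jsonrpc": "2.0", "id": 2, "result": { "contents": "..." }}
<--- {"jsonrpc": "2.0", "id": 1, "result": { "exitCode": 0 }}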

@tex3d (Collaborator) left a comment:

To me, this looks ready to merge in the Under Consideration state, with follow-ups to refine the detailed design as necessary after the document is created.

Don't forget to update NNNN in filename and Proposal: property to the next number just before merging.

This proxy callback will be used during compilation when an include is
required. Calling the proxy callback results in an IPC communication with
the client's dispatcher in the form of a `Request`/`Response` pair asking for a
buffer for the specified include.
Collaborator:

The current include handler architecture does a round-trip through the include handler for each potential include path constructed.

I think we will likely want to locate this include path search sequence on the client so it does not need to round trip through RPC for each search possibility. That brings the round-trip frequency down from includes * locations to just includes.

That would mean capturing the path for the file containing the include in the "need include" request, then cycling through this and the locally tracked include paths in the client process and calling the include handler for each one until a match is found (or not), then issuing the response.
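
A rough sketch of the client-side search loop this implies (all names hypothetical; the including file's directory would be tried first, omitted here for brevity):

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical include-handler signature: returns a buffer or NULL. */
typedef char *(*IncludeHandlerFn)(const char *candidate_path, size_t *size);

char *resolve_include(const char *requested, const char *const *search_dirs,
                      size_t ndirs, IncludeHandlerFn handler, size_t *size) {
  char path[1024];
  for (size_t i = 0; i < ndirs; ++i) {
    snprintf(path, sizeof path, "%s/%s", search_dirs[i], requested);
    char *buf = handler(path, size); /* call the handler per candidate path */
    if (buf)
      return buf;  /* found: this buffer goes back in the single response */
  }
  return NULL;     /* not found: the single response reports an error */
}
```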

Collaborator:

I think there are a couple different reasons why custom include handlers are useful:

  1. They identify include dependencies.
  2. They allow remapping directory structures.
  3. They allow treating in-memory objects as files.

For (1), we should discourage using include handlers as the solution and instead have users adopt the compiler-generated dep files.

For (2), clang has a poorly-documented "header maps" feature that we could use, which would radically simplify our implementation needs.

For (3), having the server process do file IO defeats the point, so we'll likely need some model where we can communicate buffers of data from the host process to the compiler process. Streaming the data across a named pipe might be an acceptable initial implementation, but we likely need a shared-memory solution to get decent performance and avoid redundant copies.
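
As a sketch of what (3) might look like on the wire (field names hypothetical), the response could hand back a shared-memory region instead of inlining the bytes:

<--- REQUEST  id:2 need include path:"common.hlsli"
---> RESPONSE id:2 shmem:"hlsl-include-1f3a" size:18742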

@llvm-beanz (Collaborator) left a comment:

I think this is a great start and we should merge it. We'll need to do some more thinking and design work around the details, but this gives us a really solid starting point for further discussion.


* Cleanup between compiler invocations. Destroying a process and bringing it
back up ensures that any state leaked or leftover from a previous compilation
will be cleaned up.
Collaborator:

Alternatively, being able to reuse processes can avoid process launch cost in environments where launching a process is expensive.

You talk about process pooling below, which is really only valuable if you keep processes resident for multiple jobs.
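
A sketch of the reuse-oriented pool shape this suggests (all names hypothetical):

```c
/* Hypothetical types: a pool of resident worker processes. */
typedef struct Worker Worker;
typedef struct Pool Pool;

Worker *pop_idle(Pool *p);     /* take a resident worker, or NULL */
void    push_idle(Pool *p, Worker *w);
Worker *spawn_worker(Pool *p); /* pay the launch cost only on a miss */

Worker *acquire_worker(Pool *p) {
  Worker *w = pop_idle(p);        /* reuse a resident process if any */
  return w ? w : spawn_worker(p);
}

void release_worker(Pool *p, Worker *w) {
  push_idle(p, w); /* keep the process resident for the next job */
}
```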
