Add proposal architecture document for compiling shaders out of process #71
base: main
Conversation
> This document is to propose a more detailed architecture for the approved
> proposal: [0005](0005-inproc-outofproc-compiler-api-support.md).
>
> The c api [0008](0008-c-interface-compiler-library.md) for compiling a shader
```diff
- The c api [0008](0008-c-interface-compiler-library.md) for compiling a shader
+ The C API [0008](0008-c-interface-compiler-library.md) for compiling a shader
```
I think we should be case sensitive. `C` and `API` should always be uppercase when writing English text.
I think I fixed these up, but I will do another pass.
> The c api [0008](0008-c-interface-compiler-library.md) for compiling a shader
> will be implemented using this out of process architecture.
The "Motivation" section is missing - and I was actually looking for it:
- why are we building an out of proc system?
- what problem does using process pools and things solve?
+1 Added.
> The process pool is owned by a singleton object and shared between each api
> instance created in the calling process.
"singleton object" is a bit of a implementation detail that I'm not sure belongs here.
```diff
- The process pool is owned by a singleton object and shared between each api
- instance created in the calling process.
+ Each process that uses the API has a single process pool that is shared between all instances of the API that process creates.
```
Possible alternative: there's a machine-global process pool that is shared between all processes that use the API.
(Might be worth including in the Alternatives Considered section)
I removed wording around the singleton implementation detail. I will add your possible alternative suggestion as it is worth a conversation at least.
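For illustration only, the reworded text implies a per-process pool accessor shaped something like the sketch below, assuming a hypothetical opaque `WorkerPool` type and Win32 one-time initialization; none of these names come from the proposal itself:

```c
#include <windows.h>

typedef struct WorkerPool WorkerPool;            /* opaque; owns the clang.exe workers */

extern WorkerPool *worker_pool_create(unsigned max_workers);  /* assumed helper */

static INIT_ONCE   g_pool_once = INIT_ONCE_STATIC_INIT;
static WorkerPool *g_pool      = NULL;

static BOOL CALLBACK init_pool(PINIT_ONCE once, PVOID param, PVOID *ctx)
{
    (void)once; (void)param; (void)ctx;
    g_pool = worker_pool_create(8);              /* e.g. bounded by core count */
    return g_pool != NULL;
}

/* Every API instance created in this process resolves to the same pool. */
WorkerPool *get_shared_pool(void)
{
    InitOnceExecuteOnce(&g_pool_once, init_pool, NULL, NULL);
    return g_pool;
}
```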
> A set of low-level apis will be designed that will drive compilation work
> across multiple processes.
> A set of low-level APIs will be designed that will drive compilation work across multiple processes.
IMO we should avoid saying things like "will be designed" and instead have the proposal be more definitive. However, I had trouble making a small change to make it that way.
Since we've discussed this offline I know what this is intended to say, but I'm not convinced that is clear here. For example, taking that sentence at face value leads to the question: "How does designing a set of low-level APIs drive compilation work across multiple processes?" And that's not what's intended here.
I think my overall suggestion here is that the low/high-level split of the libraries is a pretty fundamental part of your proposal and you should probably start the Proposed Solution section with an explanation of this.
I don't think you're intending this proposal to be the design of the low-level API, so I think you should just have a hand-wavy description of it and a note that its design is out of scope for this proposal (in fact, the design should be guided by the requirements discovered while designing the high-level API).
After that you can go into more detail about the high-level API.
The high-level/low-level API thing was really confusing. I removed this content.
> The high-level HLSL compiler c api implementation will be built on top of the
> low-level c apis and implement any required synchronous and asynchronous api
> policy behaviors. The high-level c api call will be blocked either waiting for
> a worker process to become available or an already dispatched work to be
> completed. This behavior depends on the type of call used
> (synchronous/asynchronous).
```diff
- The high-level HLSL compiler c api implementation will be built on top of the
- low-level c apis and implement any required synchronous and asynchronous api
- policy behaviors. The high-level c api call will be blocked either waiting for
- a worker process to become available or an already dispatched work to be
- completed. This behavior depends on the type of call used
- (synchronous/asynchronous).
+ The high-level API supports both synchronous and asynchronous calling patterns.
```
?
The high-level/low-level API thing was really confusing. I removed this content.
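As a sketch of what the suggested one-liner could map onto, the two calling patterns might be exposed through entry points shaped roughly like this; the names and types below are hypothetical, not taken from proposal 0008:

```c
typedef struct CompileResult CompileResult;  /* opaque result object    */
typedef struct CompileHandle CompileHandle;  /* pending async operation */

/* Synchronous: blocks until a worker is free and the job completes. */
CompileResult *Compile(const char *const *args, int argc);

/* Asynchronous: returns immediately with a handle the caller waits on. */
CompileHandle *CompileAsync(const char *const *args, int argc);
CompileResult *CompileWait(CompileHandle *pending);  /* blocks until done */
```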
> Worker monitoring threads use named pipes as IPC to communicate with their
> workers and a different method (TBD) to communicate with the singleton's
> api dispatching system.
I don't think we should have a design that requires a 1:1 mapping between threads in the calling process and the clang processes.
+1 I agree.
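For reference, a per-worker duplex channel could be created with real Win32 named-pipe calls as sketched below; the pipe naming scheme is an assumption, and whether one thread or a multiplexing dispatcher services these pipes is exactly the open question in this thread:

```c
#include <windows.h>
#include <stdio.h>

HANDLE create_worker_pipe(unsigned worker_id)
{
    char name[64];
    /* Hypothetical naming scheme: one uniquely named pipe per worker. */
    snprintf(name, sizeof(name), "\\\\.\\pipe\\hlsl-worker-%u", worker_id);
    return CreateNamedPipeA(
        name,
        PIPE_ACCESS_DUPLEX | FILE_FLAG_OVERLAPPED, /* async duplex channel */
        PIPE_TYPE_MESSAGE | PIPE_READMODE_MESSAGE | PIPE_WAIT,
        1,             /* single instance: one worker per pipe */
        64 * 1024,     /* outbound buffer size                 */
        64 * 1024,     /* inbound buffer size                  */
        0,             /* default timeout                      */
        NULL);         /* default security attributes          */
}
```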
> A high-level api call starts in the api entrypoint implementation which will
> call into low-level apis and implement sync/async behaviors. The low-level
> c-api is responsible for finding an available worker process (clang.exe) to
> perform the work. If a worker is not available the high-level api behavior
So I'm trying to understand the proposed design here since it is quite different to what I was expecting (see above). I can't quite understand what the difference is between the low- and high-level APIs in this design?
Yes. I changed this document. It was going in multiple directions at the same time which was really confusing. I removed lots of this type of content.
> The dispatching system uses IPC and a message protocol to communicate with
> each worker process. Each worker process will know how to package up results
> and communicate them back over the IPC mechanism to their monitoring threads
> which will bubble up those results via the low-level apis.
So all source files and generated binaries are transferred across this IPC channel?
I was thinking that the caller would specify output paths as part of their command line args packed into the compile request. So clang would just do what it does today with generated binaries etc.
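Under that model, the request that crosses the pipe could be as small as the caller's argv plus an id; a sketch with hypothetical field names:

```c
/* Hypothetical compile request: the worker receives the caller's full
 * command line and writes output wherever the caller asked, exactly as
 * clang does when invoked directly. */
typedef struct CompileRequest {
    unsigned     id;     /* request id used to match the eventual response */
    int          argc;
    const char **argv;   /* e.g. {"-x", "hlsl", "-o", "out.obj",
                            "shader.hlsl"} (illustrative arguments) */
} CompileRequest;
```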
> An alternate design could be to reduce the total monitor threads down by
> using one thread to monitor all clang.exe processes. This would centralize all
> rpc traffic into a single place, but could bottleneck performance if
> operations are waiting on workers to respond.
Why was this alternative rejected?

> could bottleneck performance if operations are waiting on workers to respond

This would only happen if we had synchronous communication with the workers.
Yeah, I changed my mind on this. I wasn't really thinking through how the dispatcher could be built, so a monitor thread for multiple workers is totally doable.
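A minimal sketch of how one monitor thread could service every worker with overlapped reads on the Win32 pipes sketched earlier; error handling and message parsing are omitted and all names are illustrative:

```c
#include <windows.h>

#define MAX_WORKERS 8

typedef struct Worker {
    HANDLE     pipe;              /* duplex named pipe to one clang.exe */
    OVERLAPPED ov;                /* carries the event signaled on read */
    char       buf[64 * 1024];
} Worker;

static void arm_read(Worker *w)
{
    /* Begin an async read; ReadFile resets ov.hEvent to nonsignaled. */
    ReadFile(w->pipe, w->buf, sizeof(w->buf), NULL, &w->ov);
}

void monitor_loop(Worker workers[], DWORD count)
{
    HANDLE events[MAX_WORKERS];
    for (DWORD i = 0; i < count; ++i) {
        workers[i].ov.hEvent = CreateEventA(NULL, TRUE, FALSE, NULL);
        events[i] = workers[i].ov.hEvent;
        arm_read(&workers[i]);
    }
    for (;;) {
        /* One wait covers every worker: no thread-per-process needed. */
        DWORD i = WaitForMultipleObjects(count, events, FALSE, INFINITE)
                  - WAIT_OBJECT_0;
        DWORD n = 0;
        GetOverlappedResult(workers[i].pipe, &workers[i].ov, &n, FALSE);
        /* ... dispatch the message in workers[i].buf[0..n) ... */
        arm_read(&workers[i]);    /* queue the next read for this worker */
    }
}
```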
> ( compiler needs an include, calls IPC wrapped include handler callback )
>
> RESPONSE - "needinclude"
This "needinclude" response feels more like clang wanting to call back into the calling process. Any reason not to model this in that way, having a more consistent REQUEST/RESPONSE matching to a function and a return value?
eg:
---> REQUEST id:1 compile...
<--- REQUEST id:2 need include...
---> RESPONSE id:2 include result
<--- REQUEST id:3 need include...
---> RESPONSE id:3 include result
<--- RESPONSE id:1 compilation result
I like this idea. I misunderstood the JSON-RPC protocol on my first read-through. I thought the server was not allowed to send a request/response pair to the client, but I was wrong. That is totally supported.
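A sketch of the client-side pump that this id-matched nesting implies: while waiting on the compile response, any interleaved worker request is answered inline. The `recv_message`, `send_response`, and `resolve_include` helpers are hypothetical:

```c
typedef enum { MSG_REQUEST, MSG_RESPONSE } MsgKind;

typedef struct Message {
    MsgKind     kind;
    unsigned    id;        /* pairs a response with its originating request */
    const char *method;    /* e.g. "compile" or "needinclude"               */
    const char *payload;
} Message;

/* Hypothetical transport and include-resolution helpers. */
extern Message     recv_message(void);
extern void        send_response(unsigned id, const char *payload);
extern const char *resolve_include(const char *include_spec);

const char *wait_for_compile_result(unsigned compile_id)
{
    for (;;) {
        Message m = recv_message();        /* blocking read from the pipe  */
        if (m.kind == MSG_RESPONSE && m.id == compile_id)
            return m.payload;              /* final result for our request */
        if (m.kind == MSG_REQUEST)         /* nested call back into client */
            send_response(m.id, resolve_include(m.payload));
    }
}
```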
To me, this looks ready to merge in the Under Consideration state, with follow-ups to refine the detailed design as necessary after the document is created.
Don't forget to update NNNN in the filename and the `Proposal:` property to the next number just before merging.
> This proxy callback will be used during compilation when an include is
> required. Calling the proxy callback results in an IPC communication with
> the client's dispatcher in the form of a `Request`/`Response` pair asking for a
> buffer for the specified include.
The current include handler architecture does a round-trip through the include handler for each potential include path constructed.
I think we will likely want to locate this include path search sequence on the client so it does not need to round trip through RPC for each search possibility. That brings the round-trip frequency down from `includes * locations` to just `includes`.
That would mean capturing the path for the file containing the include in the "need include" request, then cycling through this and the locally tracked include paths in the client process and calling the include handler for each one until a match is found (or not), then issuing the response.
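A sketch of what that client-side search could look like, with hypothetical types and a hypothetical `join_path` helper; one IPC response per include rather than per probed location:

```c
typedef struct IncludeHandler {
    /* Returns file contents, or NULL if nothing exists at this path. */
    const char *(*open)(struct IncludeHandler *self, const char *path);
} IncludeHandler;

extern char *join_path(const char *dir, const char *file);  /* assumed */

const char *resolve_include_locally(IncludeHandler *h,
                                    const char *including_file_dir,
                                    const char **search_dirs, int ndirs,
                                    const char *include_name)
{
    /* Probe the including file's directory first, then each search path;
       only the final answer crosses the IPC boundary. */
    const char *data = h->open(h, join_path(including_file_dir, include_name));
    for (int i = 0; !data && i < ndirs; ++i)
        data = h->open(h, join_path(search_dirs[i], include_name));
    return data;  /* a single response carries this buffer (or NULL) */
}
```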
I think there are a few different reasons why custom include handlers are useful:
- They identify include dependencies.
- They allow remapping directory structures.
- They allow treating in-memory objects as files.
For (1), we should discourage using include handlers as the solution and instead have users adopt the compiler-generated dep files.
For (2), clang has a poorly-documented "header maps" feature that we could use, which would radically simplify our implementation needs.
For (3), having the server process do file IO defeats the point, so we'll likely need some model where we can communicate buffers of data from the host process to the compiler process. Streaming the data across a named pipe might be an acceptable initial implementation, but we likely need some shared memory solution to get decent performance and avoid redundant copies.
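For (3), the shared-memory direction might look like this Win32 sketch; the mapping-name scheme and lifetime management are assumptions:

```c
#include <windows.h>
#include <string.h>

/* Publish an in-memory include buffer under a named mapping so the worker
 * can map it by name instead of streaming it down the pipe. */
HANDLE share_buffer(const char *name, const void *data, DWORD size)
{
    HANDLE map = CreateFileMappingA(INVALID_HANDLE_VALUE, NULL,
                                    PAGE_READWRITE, 0, size, name);
    if (!map) return NULL;
    void *view = MapViewOfFile(map, FILE_MAP_ALL_ACCESS, 0, 0, size);
    if (!view) { CloseHandle(map); return NULL; }
    memcpy(view, data, size);   /* one copy in; the worker maps by name */
    UnmapViewOfFile(view);
    return map;                 /* keep alive until the worker has mapped it */
}
```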
I think this is a great start and we should merge it. We'll need to do some more thinking and design work around the details, but this gives us a really solid starting point for further discussion.
> * Cleanup between compiler invocations. Destroying a process and bringing it
>   back up ensures that any state leaked or leftover from a previous compilation
>   will be cleaned up.
Alternatively, being able to reuse processes can avoid process launch cost in environments where that cost is high.
You talk about process pooling below, which is really only valuable if you keep processes resident for multiple jobs.
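One way to get both properties is a recycle threshold: reuse a resident worker for most jobs, but tear it down periodically so leaked state stays bounded. A sketch with hypothetical helpers and an illustrative threshold:

```c
#define MAX_JOBS_PER_WORKER 64   /* assumed tuning knob */

typedef struct Worker Worker;
extern Worker *spawn_worker(void);
extern void    kill_worker(Worker *w);
extern int     worker_job_count(Worker *w);

Worker *checkout_worker(Worker *w)
{
    if (w && worker_job_count(w) >= MAX_JOBS_PER_WORKER) {
        kill_worker(w);   /* periodic teardown cleans up leftover state */
        w = NULL;
    }
    return w ? w : spawn_worker();  /* reuse when possible, respawn when not */
}
```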
This is an initial commit with an out-of-process design for compiling shaders. This design will be exposed via a low-level C API that high-level API implementations can use to compile.