
Add proposal architecture document for compiling shaders out of process #71

Open · coopp wants to merge 16 commits into main

Conversation

@coopp (Collaborator) commented Sep 18, 2024

This is an initial commit with an out-of-process design for compiling shaders. This design will be exposed via a low-level C API that high-level API implementations can use to compile.

This document proposes a more detailed architecture for the approved
proposal: [0005](0005-inproc-outofproc-compiler-api-support.md).

The c api [0008](0008-c-interface-compiler-library.md) for compiling a shader
Collaborator:

Suggested change
The c api [0008](0008-c-interface-compiler-library.md) for compiling a shader
The C API [0008](0008-c-interface-compiler-library.md) for compiling a shader

I think we should be case sensitive. C and API should always be uppercase when writing English text.

Collaborator Author:

I think I fixed these up, but I will do another pass.


The c api [0008](0008-c-interface-compiler-library.md) for compiling a shader
will be implemented using this out of process architecture.

Collaborator:

The "Motivation" section is missing - and I was actually looking for it:

  • why are we building an out of proc system?
  • what problem does using process pools and things solve?

Collaborator Author:

+1 Added.

Comment on lines 34 to 35
The process pool is owned by a singleton object and shared between each api
instance created in the calling process.
Collaborator:

"singleton object" is a bit of a implementation detail that I'm not sure belongs here.

Suggested change
The process pool is owned by a singleton object and shared between each api
instance created in the calling process.
Each process that uses the API has a single process pool that is shared between all instances of the API that process creates.
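
If it helps to visualize the intended semantics, here is a minimal sketch (all names hypothetical; lazy initialization shown with C11 threads.h):

```c
#include <threads.h>

/* Hypothetical: one pool per calling process, shared by all API instances. */
static ProcessPool g_pool;
static once_flag g_pool_once = ONCE_FLAG_INIT;

static void init_pool(void) { process_pool_init(&g_pool); }

ProcessPool *get_shared_pool(void) {
  call_once(&g_pool_once, init_pool); /* first API instance initializes */
  return &g_pool;                     /* every instance shares it */
}
```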

Collaborator:

Possible alternative: there's a machine-global process pool that is shared between all processes that use the API.

(Might be worth including in the Alternatives Considered section)

Collaborator Author:

I removed wording around the singleton implementation detail. I will add your possible alternative suggestion as it is worth a conversation at least.

Comment on lines 37 to 38
A set of low-level apis will be designed that will drive compilation work
across multiple processes.
Collaborator:

A set of low-level APIs will be designed that will drive compilation work across multiple processes.

IMO we should avoid saying things like "will be designed" and instead have the proposal be more definitive. However, I had trouble making a small change to make it that way.

Since we've discussed this offline I know what this is intended to say, but I'm not convinced that is clear here. For example, taking that sentence at face value leads to the question: "How does designing a set of low-level APIs drive compilation work across multiple processes?" And that's not what's intended here.

I think my overall suggestion here is that the low/high-level split of the libraries is a pretty fundamental part of your proposal and you should probably start the Proposed Solution section with an explanation of this.

I don't think you're intending this proposal to be the design of the low-level API, so I think you should just have a hand-wavy description of it and a note that its design is out of scope for this proposal (in fact, the design should be guided by the requirements discovered while designing the high-level API).

After that you can go into more detail about the high-level API.

Collaborator Author:

The high-level/low-level API thing was really confusing. I removed this content.

Comment on lines 45 to 50
The high-level HLSL compiler c api implementation will be built on top of the
low-level c apis and implement any required synchronous and asynchronous api
policy behaviors. The high-level c api call will be blocked either waiting for
a worker process to become available or an already dispatched work to be
completed. This behavior depends on the type of call used
(synchronous/asynchronous).
Collaborator:

Suggested change
The high-level HLSL compiler c api implementation will be built on top of the
low-level c apis and implement any required synchronous and asynchronous api
policy behaviors. The high-level c api call will be blocked either waiting for
a worker process to become available or an already dispatched work to be
completed. This behavior depends on the type of call used
(synchronous/asynchronous).
The high-level API supports both synchronous and asynchronous calling patterns.

?

Collaborator Author:

The high-level/low-level API thing was really confusing. I removed this content.

Comment on lines 71 to 73
Worker monitoring threads use named pipes as IPC to communicate with their
workers and a different method (TBD) to communicate with the singleton's
api dispatching system.
Collaborator:

I don't think we should have a design that requires a 1:1 mapping between threads in the calling process and the clang processes.

Collaborator Author:

+1 I agree.

Comment on lines 130 to 133
A high-level api call starts in the api entrypoint implementation which will
call into low-level apis and implement sync/async behaviors. The low-level
c-api is responsible for finding an available worker process (clang.exe) to
perform the work. If a worker is not available the high-level api behavior
Collaborator:

So I'm trying to understand the proposed design here since it is quite different from what I was expecting (see above). I can't quite understand what the difference is between the low- and high-level APIs in this design.

Collaborator Author:

Yes. I changed this document. It was going in multiple directions at the same time which was really confusing. I removed lots of this type of content.

Comment on lines 142 to 145
The dispatching system uses IPC and a message protocol to communicate with
each worker process. Each worker process will know how to package up results
and communicate them back over the IPC mechanism to their monitoring threads
which will bubble up those results via the low-level apis.
Collaborator:

So all source files and generated binaries are transferred across this IPC channel?

Collaborator Author:

I was thinking that the caller would specify output paths as part of their command-line args packed into the compile request. So clang would just do what it does today with generated binaries etc.
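
For example, a request might just forward the caller's command line, output path included (field spelling here is hypothetical; -T/-Fo as in clang's DXC driver mode):

---> REQUEST  id:1 compile args:["-T", "cs_6_0", "-Fo", "out/shader.dxil", "shader.hlsl"]
<--- RESPONSE id:1 exit:0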

Comment on lines 362 to 365
An alternate design could be to reduce the total monitor threads down by
using one thread to monitor all clang.exe processes. This would centralize all
rpc traffic into a single place, but could bottleneck performance if
operations are waiting on workers to respond.
Collaborator:

Why was this alternative rejected?

could bottleneck performance if operations are waiting on workers to respond

This would only happen if we had synchronous communication with the workers.

Collaborator Author:

Yeah, I changed my mind on this. I wasn't really thinking through how the dispatcher could be built; a monitor thread for multiple workers is totally doable.
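
A minimal sketch of a single monitor thread multiplexing several workers (POSIX poll(); the dispatch helper is hypothetical):

```c
#include <poll.h>

void dispatch_worker_message(int worker); /* hypothetical: read one framed
                                             message and route the result */

/* One thread servicing the pipes of all resident worker processes. */
void monitor_workers(struct pollfd *fds, int nworkers) {
  for (;;) {
    if (poll(fds, nworkers, -1) < 0)
      break;                              /* interrupted or shutting down */
    for (int i = 0; i < nworkers; ++i)
      if (fds[i].revents & POLLIN)
        dispatch_worker_message(i);
  }
}
```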


( compiler needs an include, calls IPC wrapped included handler callback )

RESPONSE - "needinclude"
Collaborator:

This "needinclude" response feels more like clang wanting to call back into the calling process. Any reason not to model this in that way, having a more consistent REQUEST/RESPONSE matching to a function and a return value?

eg:

---> REQUEST  id:1 compile...
<--- REQUEST  id:2 need include...
---> RESPONSE id:2 include result
<--- REQUEST  id:3 need include...
---> RESPONSE id:3 include result
<--- RESPONSE id:1 compilation result

Collaborator Author:

I like this idea. I misunderstood the JSON-RPC protocol on my first read-through. I thought the server was not allowed to send a request/response pair to the client, but I was wrong. That is totally supported.
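
For illustration, the same exchange in JSON-RPC 2.0 framing might look like this (method and field names are hypothetical):

---> {"jsonrpc": "2.0", "id": 1, "method": "compile", "params": { "args": [...] }}
<--- {"jsonrpc": "2.0", "id": 2, "method": "needInclude", "params": { "path": "common.hlsli" }}
---> {"jsonrpc": "2.0", "id": 2, "result": { "contents": "..." }}
<--- {"jsonrpc": "2.0", "id": 1, "result": { "exitCode": 0 }}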

@tex3d (Collaborator) left a comment:

To me, this looks ready to merge in the Under Consideration state, with follow-ups to refine the detailed design as necessary after the document is created.

Don't forget to update NNNN in filename and Proposal: property to the next number just before merging.

This proxy callback will be used during compilation when an include is
required. Calling the proxy callback results in an IPC communication with
the client's dispatcher in the form of a `Request`/`Response` pair asking for a
buffer for the specified include.
Collaborator:

The current include handler architecture does a round-trip through the include handler for each potential include path constructed.

I think we will likely want to locate this include path search sequence on the client so it does not need to round trip through RPC for each search possibility. That brings the round-trip frequency down from includes * locations to just includes.

That would mean capturing the path for the file containing the include in the "need include" request, then cycling through this and the locally tracked include paths in the client process and calling the include handler for each one until a match is found (or not), then issuing the response.
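
A rough sketch of the client-side search loop this implies (all names hypothetical; the including file's directory would be tried first, omitted here for brevity):

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical include-handler signature: returns a buffer or NULL. */
typedef char *(*IncludeHandlerFn)(const char *candidate_path, size_t *size);

char *resolve_include(const char *requested, const char *const *search_dirs,
                      size_t ndirs, IncludeHandlerFn handler, size_t *size) {
  char path[1024];
  for (size_t i = 0; i < ndirs; ++i) {
    snprintf(path, sizeof path, "%s/%s", search_dirs[i], requested);
    char *buf = handler(path, size); /* call the handler per candidate path */
    if (buf)
      return buf;  /* found: this buffer goes back in the single response */
  }
  return NULL;     /* not found: the single response reports an error */
}
```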

Collaborator:

I think there are a couple different reasons why custom include handlers are useful:

  1. They identify include dependencies.
  2. They allow remapping directory structures.
  3. They allow treating in-memory objects as files.

For (1), we should discourage using include handlers as the solution and instead have users adopt the compiler-generated dep files.

For (2), clang has a poorly-documented "header maps" feature that we could use, which would radically simplify our implementation needs.

For (3), having the server process do file IO defeats the point, so we'll likely need some model where we can communicate buffers of data from the host process to the compiler process. Streaming the data across a named pipe might be an acceptable initial implementation, but we likely need a shared-memory solution to get decent performance and avoid redundant copies.
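
As a sketch of what (3) might look like on the wire (field names hypothetical), the response could hand back a shared-memory region instead of inlining the bytes:

<--- REQUEST  id:2 need include path:"common.hlsli"
---> RESPONSE id:2 shmem:"hlsl-include-1f3a" size:18742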

@llvm-beanz (Collaborator) left a comment:

I think this is a great start and we should merge it. We'll need to do some more thinking and design work around the details, but this gives us a really solid starting point for further discussion.


* Cleanup between compiler invocations. Destroying a process and bringing it
back up ensures that any state leaked or leftover from a previous compilation
will be cleaned up.
Collaborator:

Alternatively, being able to reuse processes can avoid process launch cost in environments where launching a process is expensive.

You talk about process pooling below, which is really only valuable if you keep processes resident for multiple jobs.
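
A sketch of the reuse-oriented pool shape this suggests (all names hypothetical):

```c
/* Hypothetical types: a pool of resident worker processes. */
typedef struct Worker Worker;
typedef struct Pool Pool;

Worker *pop_idle(Pool *p);     /* take a resident worker, or NULL */
void    push_idle(Pool *p, Worker *w);
Worker *spawn_worker(Pool *p); /* pay the launch cost only on a miss */

Worker *acquire_worker(Pool *p) {
  Worker *w = pop_idle(p);        /* reuse a resident process if any */
  return w ? w : spawn_worker(p);
}

void release_worker(Pool *p, Worker *w) {
  push_idle(p, w); /* keep the process resident for the next job */
}
```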
