Experiment with mutating kernels #130

Open
eholk opened this issue Feb 17, 2014 · 0 comments

eholk (Owner) commented Feb 17, 2014

Many GPU algorithms are formulated with lots of mutation, and Harlan doesn't handle these with the greatest performance. We should consider some restricted forms of mutation; some of them may not even be hard to implement, so we could experiment with them fairly cheaply.

Here's an example of one that we were talking about earlier today:

(kernel-update! ((x xs) <- (y ys))
  (* x y))

This is sort of like (set! xs (kernel ((x xs) (y ys)) (* x y))), but it would overwrite the contents of xs with the result of this kernel (perhaps with a check to make sure we don't write the exact same value), rather than making xs point to a new vector.
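To make the difference concrete, here's a rough CUDA sketch of the two strategies at the GPU level (illustration only, not Harlan-generated code; the kernel names are made up). The first writes into a fresh buffer that xs would then point to; the second overwrites xs in place, with the optional check-before-write mentioned above.

// What (set! xs (kernel ((x xs) (y ys)) (* x y))) amounts to today:
// write the results into a fresh buffer, then make xs point at it.
__global__ void mul_into_fresh(const float *xs, const float *ys,
                               float *out, int n) {
  int i = blockIdx.x * blockDim.x + threadIdx.x;
  if (i < n) out[i] = xs[i] * ys[i];
}

// What kernel-update! would do: overwrite xs in place, optionally
// skipping the store when the value hasn't changed.
__global__ void mul_in_place(float *xs, const float *ys, int n) {
  int i = blockIdx.x * blockDim.x + threadIdx.x;
  if (i < n) {
    float v = xs[i] * ys[i];
    if (v != xs[i]) xs[i] = v;  // the "check before write" mentioned above
  }
}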

There are a couple of issues to discuss with a form like this.

I think at the very least we need to keep type safety, meaning basically that if you read a value that's supposed to have a certain type, the thing you actually get has that type. As long as all the updates are atomic, I don't think this will be a problem.
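To illustrate the hazard that atomicity rules out, here's a hypothetical CUDA-flavored sketch (the Tagged layout is made up and almost certainly not how Harlan represents its data): if the tag and payload of a sum-type value are updated with two separate stores, a concurrent reader can see the new tag paired with the old payload and interpret the bits at the wrong type.

struct Tagged {
  int tag;                       // hypothetical: 0 = IntVal, 1 = PtrVal
  union { int i; int *p; } payload;
};

// Non-atomic, two-store update: another thread can read between the two
// stores, see tag == 1 with the old integer payload still in place, and
// dereference it as a pointer -- exactly the type-safety hole to avoid.
__device__ void unsafe_update(Tagged *cell, int *p) {
  cell->tag = 1;
  cell->payload.p = p;
}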

The next issue is how much determinism we need. So far Harlan is basically deterministic (modulo I/O, etc), and this is a nice property to have. However, nondeterminism is nice too. For example, EigenCFA tolerates monotonic races to get much better performance.

One hopefully-not-too-difficult-to-enforce option would be that kernel-update! cannot do arbitrary reads from its destination. I think it's safe for each thread to get the previous value (which is why the example binds x to an element in xs), and that this should not break determinism. This seems safe because we implicitly synchronize at the end of a kernel-update!.
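In GPU terms the restriction looks something like this (a sketch, not Harlan's planned implementation): each thread reads only its own previous element and writes only its own slot, so even with the update done in place, no thread can observe another thread's write.

__global__ void update_step(float *xs, const float *ys, int n) {
  int i = blockIdx.x * blockDim.x + threadIdx.x;
  if (i < n) {
    float prev = xs[i];    // the old value, like x in the example above
    xs[i] = prev * ys[i];  // only thread i ever touches xs[i]
  }
}
// Allowing an arbitrary read such as xs[j] for some j != i would
// reintroduce nondeterminism: whether thread i sees xs[j]'s old or new
// value would depend on scheduling.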

Another option would be to allow monotonic reads/updates. This seems way harder to enforce, but doing so would allow us to relax synchronization some for possibly even better performance. For example, consider a BFS traversal like this:

(kernel-update! ((_ depths) <- (v vertices))
  (+ 1 (reduce min (kernel ((i (incoming-edges v)))
                      (vector-ref depths i)))))

This version could race, but since we'd use it as part of a fixpoint algorithm, we should get the same answer eventually anyway. This works because the body of the kernel is monotonically decreasing. I feel like proving that the body is monotonically decreasing would be really hard though.
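For comparison, a rough CUDA analogue of the racy-but-monotonic version might look like the following (a sketch under an assumed CSR-style layout of incoming edges, edge_offsets/edge_srcs; none of these names come from Harlan). Because depths only ever shrinks, a stale read can delay convergence but can't change the fixpoint.

#include <limits.h>

// depths[v] is the current best-known BFS depth of v (INT_MAX = unknown).
// edge_offsets/edge_srcs: assumed CSR layout of each vertex's incoming edges.
__global__ void relax(const int *edge_offsets, const int *edge_srcs,
                      int *depths, int *changed, int n_vertices) {
  int v = blockIdx.x * blockDim.x + threadIdx.x;
  if (v >= n_vertices) return;
  int best = depths[v];
  for (int e = edge_offsets[v]; e < edge_offsets[v + 1]; e++) {
    int d = depths[edge_srcs[e]];            // may race with other writers
    if (d != INT_MAX && d + 1 < best) best = d + 1;
  }
  if (best < depths[v]) {
    atomicMin(&depths[v], best);             // monotone: depths only shrinks
    *changed = 1;                            // tell the host to run another pass
  }
}

The host loop would keep launching relax until changed stays zero; different interleavings only change how many passes that takes, not the final depths.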

Finally, we need to consider how much this impacts other optimizations. We want to avoid more chances to introduce bugs like #56.

eholk added a commit that referenced this issue Feb 25, 2014