What's Changed
- Add C++ executor test by @chhwang in #304
- Cumulative Updates by @Binyang2014 in #309
- Add NPKit GPU event support by @yzygitzh in #310
- Fix NPKit support for AMD by @yzygitzh in #312
- Add "packet type" option for executor test by @Binyang2014 in #313
- Add support for multicast reduce instruction by @roshandathathri in #316
- Update quickstart.md by @angelica-moreira in #314
- Simplify/improve barrier in AllReduce6 by @roshandathathri in #317
- Support NCCL APIs by @caiomcbr in #319
- Update allreduce_bench.py by @angelica-moreira in #318
- Separate NPKit CPU timestamp access from different blocks for AMD platform by @yzygitzh in #321
- AllReduce Kernel for Small Messages by @caiomcbr in #322
- Resolve clang++ warnings by @chhwang in #325
- Support to write packets via uint2 by @Binyang2014 in #327
- Double buffering for NCCL APIs by @caiomcbr in #324
- v0.5.2 by @chhwang in #328
New Contributors
- @angelica-moreira made their first contribution in #314
- @caiomcbr made their first contribution in #319
Full Changelog: v0.5.1...v0.5.2