Insights: NVIDIA/cccl
Overview
38 Pull requests merged by 12 people
-
[STF] Implement CUDASTF_DOT_TIMING for the ctx.cuda_kernel construct
#2950 merged
Nov 25, 2024 -
Drop some of the mdspan fold implementation
#2949 merged
Nov 25, 2024 -
minor consistency improvements in concepts macros
#2928 merged
Nov 24, 2024 -
Try to fix a clang warning:
#2941 merged
Nov 23, 2024 -
Add missing qualifier for cuda namespace
#2940 merged
Nov 23, 2024 -
Add tuple protocol to `cuda::std::complex` from C++26
#2882 merged
Nov 22, 2024 -
implement C++26 `std::span`'s constructor from `std::initializer_list`
#2923 merged
Nov 22, 2024 -
Reorganize PTX headers to match generator
#2925 merged
Nov 22, 2024 -
Improve build instructions for libcu++
#2881 merged
Nov 22, 2024 -
Reorganize PTX docs to match generator
#2929 merged
Nov 22, 2024 -
Reorganize PTX tests to match generator
#2930 merged
Nov 22, 2024 -
move msvc compiler macros to new version
#2885 merged
Nov 22, 2024 -
implement C++26 `std::span::at`
#2924 merged
Nov 22, 2024 -
Replace inconsistent Doxygen macros with `_CCCL_DOXYGEN_INVOKED`
#2921 merged
Nov 21, 2024 -
add "`interface`" to `_CCCL_PUSH_MACROS`
#2919 merged
Nov 21, 2024 -
Refactor nvbench helper less_t
#2905 merged
Nov 21, 2024 -
Try to work around issue with NVHPC in conjunction with older CTK versions
#2889 merged
Nov 21, 2024 -
Move implementation of `_LIBCUDACXX_TEMPLATE` to CCCL
#2832 merged
Nov 21, 2024 -
Fix old gcc version check
#2904 merged
Nov 20, 2024 -
for_each_in_extent
#2518 merged
Nov 20, 2024 -
Add `thrust_create_target` `DISPATCH` option.
#2844 merged
Nov 19, 2024 -
Automate release branch creation
#2685 merged
Nov 19, 2024 -
move `_CCCL_COMPILER_CLANG` to the new macro
#2859 merged
Nov 19, 2024 -
backport `to_underlying`
#2853 merged
Nov 19, 2024 -
Include use of NVHPC in CUB/Thrust magic namespace
#2771 merged
Nov 19, 2024 -
Fix rst typos in benchmarking.html
#2868 merged
Nov 19, 2024 -
[Docs/PTX] Add device tensor map init example
#1983 merged
Nov 19, 2024 -
correct the names of `shared_resource`'s async allocate/deallocate members
#2880 merged
Nov 19, 2024 -
add missing `DOXYGEN_*` predefined macros when building the cudax docs
#2858 merged
Nov 19, 2024 -
[Backport] Fix cluster launch error in branch/2.7.x
#2866 merged
Nov 18, 2024 -
Make discovery mechanism for `cuda/_include` directory compatible with `pip install --editable`
#2846 merged
Nov 18, 2024 -
Fix DeviceSegmentedSort NVTX range name
#2857 merged
Nov 18, 2024 -
Add MatX build to CCCL CI
#2682 merged
Nov 18, 2024 -
Fix race condition in block-RLD test harness.
#2706 merged
Nov 18, 2024 -
Add benchmarking and tuning presets
#2856 merged
Nov 18, 2024 -
Move `_CCCL_COMPILER_GCC` to the new macro
#2850 merged
Nov 18, 2024 -
Add missing include
#2855 merged
Nov 18, 2024 -
Fix wrong include in Thrust benchmark
#2854 merged
Nov 18, 2024
16 Pull requests opened by 9 people
-
Drop memory resources in libcu++
#2860 opened
Nov 18, 2024 -
Disable execution checks for tuple (#2780)
#2867 opened
Nov 18, 2024 -
Tweak tuning database plot and comparison scripts
#2883 opened
Nov 19, 2024 -
Add more CUB transform benchmarks
#2906 opened
Nov 20, 2024 -
implement C++26 `std::ignore`
#2922 opened
Nov 21, 2024 -
Add "Open with Codeanywhere" badge to .devcontainer/README.md
#2926 opened
Nov 21, 2024 -
Reduce number of per-PR CI jobs.
#2931 opened
Nov 21, 2024 -
[CUDAX] Add copy_bytes and fill_bytes overloads for mdspan
#2932 opened
Nov 22, 2024 -
unify implementation of `[[deprecated]]` attribute
#2934 opened
Nov 22, 2024 -
Regenerate `cuda::ptx` files and run format
#2937 opened
Nov 22, 2024 -
cudax compiler bump
#2943 opened
Nov 22, 2024 -
fix thread-reduce performance regression
#2944 opened
Nov 22, 2024 -
add a `_CCCL_NO_CONCEPTS` config macro
#2945 opened
Nov 23, 2024 -
add a `__type_switch` utility and use it in the ptx generator
#2946 opened
Nov 24, 2024 -
Avoid potential null dereference in `annotated_ptr`
#2951 opened
Nov 25, 2024 -
make compiler version comparison utility generic
#2952 opened
Nov 25, 2024
16 Issues closed by 9 people
-
Ambiguous cuda:: namespace issue
#2939 closed
Nov 23, 2024 -
[BUG]: thrust::tuple<Eigen::Vector3f> copy constructor fails in template deduction
#2936 closed
Nov 22, 2024 -
[BUG]: inconsistent use of macros to suppress doxygen document generation
#2362 closed
Nov 21, 2024 -
Determine and finalize design for large input support in CUB
#1454 closed
Nov 21, 2024 -
[EPIC] Self-hosted Windows Runners
#523 closed
Nov 20, 2024 -
Get sccache working in Windows
#312 closed
Nov 20, 2024 -
Initial version of "Finalize Release" workflow
#2917 closed
Nov 20, 2024 -
Initial version of "New Release" workflow
#2915 closed
Nov 20, 2024 -
MSVC: Compilation of <tuple> issues in VS 2017 and VS 2019
#955 closed
Nov 20, 2024 -
Gather benchmark results for each CUB algorithm using different offset types
#1787 closed
Nov 20, 2024 -
Extend CUB/Thrust magic namespace with NVHPC
#2770 closed
Nov 19, 2024 -
[DOC]: PTX: Document how to initialize/modify a tensor map in device memory
#1982 closed
Nov 19, 2024 -
[FEA]: Build MatX as part of our third-party CI testing
#2264 closed
Nov 18, 2024 -
[BUG]: Race reported in cub::BlockStore
#1903 closed
Nov 18, 2024 -
Create CMake presets for benchmarking and tuning
#2839 closed
Nov 18, 2024
50 Issues opened by 15 people
-
[BUG]: Suboptimal swap performance on universal vectors
#2948 opened
Nov 25, 2024 -
[BUG]: libcu++ should support the three-way comparison operator
#2947 opened
Nov 24, 2024 -
[BUG]: UB in annotated_ptr
#2942 opened
Nov 22, 2024 -
[FEA]: `cuda::span_collection`
#2938 opened
Nov 22, 2024 -
Towards a libcudacxx developer guide
#2935 opened
Nov 22, 2024 -
[BUG]: cuda::ptx takes long to compile
#2933 opened
Nov 22, 2024 -
[FEA]: Add `__pipeline_arrive_on_noinc`
#2927 opened
Nov 21, 2024 -
[FEA] Provide templates for device-side wrappers that can be type-checked on the host
#2918 opened
Nov 20, 2024 -
Initial version of "Update RC" workflow
#2916 opened
Nov 20, 2024 -
[FEA]: Come up with a plan for consolidating CCCL namespaces
#2914 opened
Nov 20, 2024 -
[Theme]: Namespace Unification
#2913 opened
Nov 20, 2024 -
add copy/fill_bytes overload for mdspan
#2912 opened
Nov 20, 2024 -
[DOC]: Improve thrust API Reference list appearance
#2911 opened
Nov 20, 2024 -
[FEA]: Implement `_CCCL_MAYBE_UNUSED`?
#2910 opened
Nov 20, 2024 -
[DOC]: Clean-up reference example code snippets in CUB
#2909 opened
Nov 20, 2024 -
[DOC]: Improve reference example code snippets in Thrust
#2908 opened
Nov 20, 2024 -
[DOC]: Provide a Tutorials top level page in our CCCL docs
#2903 opened
Nov 20, 2024 -
[DOC]: Extract from source and create a libcu++ examples standalone subsection under the libcu++ section
#2902 opened
Nov 20, 2024 -
[DOC]: Extract from source and create a Thrust examples standalone subsection under the Thrust section
#2901 opened
Nov 20, 2024 -
[DOC]: Extract CUB examples into a standalone subsection under the CUB section
#2900 opened
Nov 20, 2024 -
[DOC]: Remove Releases/Contributing subsections from Thrust and libcu++
#2899 opened
Nov 20, 2024 -
[DOC]: Consolidate and flatten the two Thrust API subsections
#2898 opened
Nov 20, 2024 -
[DOC]: Unify Docs hierarchy tree
#2897 opened
Nov 20, 2024 -
[DOC]: Flatten CUB API documentation section
#2896 opened
Nov 20, 2024 -
Split Device-Wide Primitives into Reference and Explanation pages
#2895 opened
Nov 20, 2024 -
Split Block-Wide Primitives into Reference and Explanation pages
#2894 opened
Nov 20, 2024 -
Split Warp-Wide Primitives into Reference and Explanation pages
#2893 opened
Nov 20, 2024 -
[DOC]: Split CUB primitives pages into Reference and Explanation pages [diataxis]
#2892 opened
Nov 20, 2024 -
Explore a stricter separation of synchronous and asynchronous containers and APIs
#2891 opened
Nov 19, 2024 -
[BUG]: sm_100 missing from <nv/target>
#2890 opened
Nov 19, 2024 -
mdarray multi-dimensional owning abstraction
#2888 opened
Nov 19, 2024 -
Investigate interaction between `cuda::vector` and `cuda::launch`
#2887 opened
Nov 19, 2024 -
Provide a PoC implementation within cudax
#2886 opened
Nov 19, 2024 -
[FEA]: Store peak memory bandwidth in tuning database
#2884 opened
Nov 19, 2024 -
[FEA]: Migrate fail test utilities to top-level; reuse between projects
#2879 opened
Nov 19, 2024 -
Design for runtime compilation of kernel templates with dynamically supplied types
#2878 opened
Nov 19, 2024 -
Adjust Jitify2 interface to the rest of CUDAX / Design NVRTC interface for CUDAX
#2877 opened
Nov 19, 2024 -
Meet with Jitify2 maintainers and discuss potential exposure in CUDAX
#2876 opened
Nov 19, 2024 -
Expose runtime compilation of kernels
#2875 opened
Nov 19, 2024 -
Explore integration of host side execution with container APIs
#2874 opened
Nov 19, 2024 -
CUDAX stream host callback API
#2873 opened
Nov 19, 2024 -
Stream ordered host execution
#2872 opened
Nov 19, 2024 -
get stream priority range
#2871 opened
Nov 19, 2024 -
get and set device flags
#2870 opened
Nov 19, 2024 -
get/set device limit
#2869 opened
Nov 19, 2024 -
[FEA]: Combine c2h and benchmark data generators.
#2865 opened
Nov 18, 2024 -
[FEA]: Migrate nvbench utilities to top-level; reuse between projects
#2864 opened
Nov 18, 2024 -
[DOC]: Convert to single sphinx project so full ToC is visible from subprojects
#2863 opened
Nov 18, 2024 -
[DOC]: Migrate to nvidia's sphinx theme.
#2862 opened
Nov 18, 2024 -
[EXTERNAL] Numba pointer arithmetic
#2861 opened
Nov 18, 2024
138 Unresolved conversations
Sometimes conversations happen on old items that aren’t yet closed. Here is a list of all the Issues and Pull Requests with unresolved conversations.
-
Start reworking our math functions
#2749 commented on
Nov 25, 2024 • 20 new comments -
backport `unreachable`
#2852 commented on
Nov 22, 2024 • 13 new comments -
[STF] reduce access mode
#2830 commented on
Nov 25, 2024 • 3 new comments -
Add environment to encapsulate information needed for `cudax::vector`
#2775 commented on
Nov 25, 2024 • 1 new comment -
Higher-level, `cuda::vector<T, Properties...>` (like `thrust::host_vector/device_vector`)
#2057 commented on
Nov 20, 2024 • 0 new comments -
Provide Run-Length Decode API
#599 commented on
Nov 20, 2024 • 0 new comments -
[FEA]: Port libcxx <ranges> views and backport to C++17
#93 commented on
Nov 20, 2024 • 0 new comments -
[DOC]: Write dev docs for how to use Windows images for local development
#94 commented on
Nov 20, 2024 • 0 new comments -
[FEA]: Extend devcontainer/launch.sh to build custom containers on the fly
#126 commented on
Nov 20, 2024 • 0 new comments -
Thrust large input support
#49 commented on
Nov 20, 2024 • 0 new comments -
[FEA]: Use P2322R6 to determine accumulator type in Thrust
#153 commented on
Nov 20, 2024 • 0 new comments -
[FEA]: Intrusive Decoupled Look-Back
#220 commented on
Nov 20, 2024 • 0 new comments -
[FEA]: Relax shuffle-based warp exchange requirements
#271 commented on
Nov 20, 2024 • 0 new comments -
[EPIC]: Thrust/CUB shouldn't invoke user-defined operators on out-of-bounds data
#459 commented on
Nov 20, 2024 • 0 new comments -
Specialize relevant `cuda::(std::)` types for `__half/bfloat16/fp8`
#525 commented on
Nov 20, 2024 • 0 new comments -
[FEA]: Transition to new Thrust benchmarks
#557 commented on
Nov 20, 2024 • 0 new comments -
Use parallel implementations for CPU execution policies
#831 commented on
Nov 20, 2024 • 0 new comments -
[FEA]: Remove CDP (RDC) architecture filtering logic from Thrust/CUB tests
#1137 commented on
Nov 20, 2024 • 0 new comments -
Replace mdspan with upstream libc++ implementation
#1185 commented on
Nov 20, 2024 • 0 new comments -
Proof-of-concept parallel ranges algorithms
#1213 commented on
Nov 20, 2024 • 0 new comments -
[DOC]: Add CUB examples utilizing multi-dimensional thread blocks
#1659 commented on
Nov 20, 2024 • 0 new comments -
expose device properties, meminfo etc
#2081 commented on
Nov 20, 2024 • 0 new comments -
[FEA]: Improve performance of `thrust::all_of`.
#2113 commented on
Nov 20, 2024 • 0 new comments -
[FEA]: Synchronous data structures that use a cuda::mr::resource to allocate their memory
#2129 commented on
Nov 20, 2024 • 0 new comments -
[FEA]: Asynchronous data structures that use a cuda::mr::asyncresource to allocate their memory
#2131 commented on
Nov 20, 2024 • 0 new comments -
[FEA]: Non-owning view types with properties to ensure type safe access to allocated memory
#2132 commented on
Nov 20, 2024 • 0 new comments -
[FEA]: Prototype a design to ensure asynchronous operations on different streams work nicely with `cuda::launch`
#2143 commented on
Nov 20, 2024 • 0 new comments -
Publish `cuda-cooperative` Python package
#2148 commented on
Nov 20, 2024 • 0 new comments -
Roll out address stability optimization
#2403 commented on
Nov 20, 2024 • 0 new comments -
[FEA]: Multi-dimensional TMA exposure
#39 commented on
Nov 20, 2024 • 0 new comments -
[EPIC] std::simd support in libcu++
#30 commented on
Nov 20, 2024 • 0 new comments -
[FEA]: Refactor user-facing Thrust types to use libcu++
#34 commented on
Nov 20, 2024 • 0 new comments -
[FEA]: Evaluate and retune radix sort and merge sort
#55 commented on
Nov 20, 2024 • 0 new comments -
[FEA]: Implement 1D TMA support in memcpy_async
#58 commented on
Nov 20, 2024 • 0 new comments -
[FEA]: Explore potential improvements for DeviceMemcpy::Batched
#59 commented on
Nov 20, 2024 • 0 new comments -
[FEA]: Add `thread_scope_cluster` and implement `atomic(_ref)<thread_scope_cluster>`
#73 commented on
Nov 20, 2024 • 0 new comments -
[FEA]: Add support for remote shared memory to barrier<thread_scope_block>
#75 commented on
Nov 20, 2024 • 0 new comments -
[FEA]: Provide `std::dims`
#2810 commented on
Nov 20, 2024 • 0 new comments -
[FEA]: Provide `submdspan` for padded layouts
#2809 commented on
Nov 20, 2024 • 0 new comments -
[EPIC]: Add Hopper features to `cuda::ptx`
#1340 commented on
Nov 21, 2024 • 0 new comments -
Benchmark and plot before/after performance of BabelStream based on `thrust::transform`
#2717 commented on
Nov 21, 2024 • 0 new comments -
[Do Not Merge] Implement `<ranges>`
#198 commented on
Nov 19, 2024 • 0 new comments -
Add compute-sanitizer testing to CI.
#1879 commented on
Nov 20, 2024 • 0 new comments -
Add simple kernel for deterministic reduction
#2234 commented on
Nov 21, 2024 • 0 new comments -
`basic_any`: a utility for defining type-erasing wrappers in terms of an interface description
#2633 commented on
Nov 22, 2024 • 0 new comments -
Add CI with STF MathLib builds
#2651 commented on
Nov 19, 2024 • 0 new comments -
Configure `workflow-run-job-linux` to use sccache-dist build cluster
#2672 commented on
Nov 25, 2024 • 0 new comments -
Add missing template parameter to BlockRadixRank example.
#2736 commented on
Nov 19, 2024 • 0 new comments -
[WIP] Support fancy iterators in cuda.parallel
#2788 commented on
Nov 23, 2024 • 0 new comments -
backport std integer comparison functions to C++11
#2805 commented on
Nov 22, 2024 • 0 new comments -
Try to avoid UB in `thrust::reference`
#2813 commented on
Nov 25, 2024 • 0 new comments -
new type-erased memory resources
#2824 commented on
Nov 22, 2024 • 0 new comments -
Test proclaim_copyable_arguments for lambdas
#2833 commented on
Nov 23, 2024 • 0 new comments -
[BUG]: MSVC < 2022 doesn't properly handle thrust's member function detector.
#1731 commented on
Nov 20, 2024 • 0 new comments -
[EPIC] RAPIDS Should not need to patch CCCL
#1939 commented on
Nov 20, 2024 • 0 new comments -
[FEA]: Extend cub::DeviceSegmentedSort to support custom comparators
#1577 commented on
Nov 20, 2024 • 0 new comments -
Benchmark regression testing
#2011 commented on
Nov 20, 2024 • 0 new comments -
Standard abstraction for specifying thread grid hierarchy and dimensions
#2037 commented on
Nov 20, 2024 • 0 new comments -
Provide cmath for floating_point<M,E>
#2161 commented on
Nov 20, 2024 • 0 new comments -
Make bfloat alias for `floating_point<8,7>` that dispatches to accelerated instructions where possible
#2182 commented on
Nov 20, 2024 • 0 new comments -
[FEA]: Consider compiling CCCL tests with enabled warnings for local memory and register spilling
#2253 commented on
Nov 20, 2024 • 0 new comments -
[DOC]: Clarity check of CUB device-scope docs
#2323 commented on
Nov 20, 2024 • 0 new comments -
Return algorithm diagrams to CUB docs
#2319 commented on
Nov 20, 2024 • 0 new comments -
Make half_t alias for `floating_point<5,10>` that dispatches to accelerated instructions where possible
#2181 commented on
Nov 20, 2024 • 0 new comments -
[EPIC] CUDA Next asynchronous programming model
#2041 commented on
Nov 20, 2024 • 0 new comments -
[DOC]: Clarity check of CUB block-scope docs
#2324 commented on
Nov 20, 2024 • 0 new comments -
[DOC]: Clarity check of CUB warp-scope docs
#2325 commented on
Nov 20, 2024 • 0 new comments -
[THEME] CUDA Runtime Modernization
#1646 commented on
Nov 20, 2024 • 0 new comments -
[FEA]: Abstract build step in cccl.c.parallel
#2525 commented on
Nov 20, 2024 • 0 new comments -
[EXTERNAL] Contribute back Numba documentation updates
#2722 commented on
Nov 18, 2024 • 0 new comments -
[DOC]: `copy_n` docs formatting bug
#2748 commented on
Nov 20, 2024 • 0 new comments -
[EPIC]: Improve documentation content by adopting diataxis
#2841 commented on
Nov 20, 2024 • 0 new comments -
[EPIC]: Fix sphinx
#2807 commented on
Nov 20, 2024 • 0 new comments -
[EPIC] make the Thrust parallel algorithms available in `cuda::std`
#2818 commented on
Nov 20, 2024 • 0 new comments -
[FEA]: Implement cuda.parallel.device_for
#2537 commented on
Nov 20, 2024 • 0 new comments -
[DOC]: Design approach to CCCL documentation authoring
#2327 commented on
Nov 20, 2024 • 0 new comments -
[FEA]: Add more benchmarks for `thrust::transform`
#2814 commented on
Nov 20, 2024 • 0 new comments -
[THEME] Add support for 128b `atomic_ref` in device code.
#2048 commented on
Nov 20, 2024 • 0 new comments -
[EPIC]: `mdspan`-based algorithms
#2471 commented on
Nov 20, 2024 • 0 new comments -
[EPIC] Improve separation and eliminate redundancy between Thrust and CUB
#24 commented on
Nov 20, 2024 • 0 new comments -
[EPIC] Consolidate kernels between Thrust and CUB
#26 commented on
Nov 20, 2024 • 0 new comments -
[EPIC] CUB Performance Tuning
#27 commented on
Nov 20, 2024 • 0 new comments -
[EPIC] CUB Test Catch2 Migration
#28 commented on
Nov 20, 2024 • 0 new comments -
[EPIC] Port CUB device-scope algorithm tests to use Catch2
#29 commented on
Nov 20, 2024 • 0 new comments -
[EPIC] Extended Floating-Point Support
#31 commented on
Nov 20, 2024 • 0 new comments -
[EPIC] Replace & refactor Thrust/CUB types w/ libcu++
#33 commented on
Nov 20, 2024 • 0 new comments -
[THEME] Hopper CUDA C++ Feature Exposure
#35 commented on
Nov 20, 2024 • 0 new comments -
[EPIC]: Unify testing infrastructure
#2806 commented on
Nov 19, 2024 • 0 new comments -
[FEA]: Make cuda.parallel available on PyPI
#2555 commented on
Nov 19, 2024 • 0 new comments -
[FEA]: Match cuda.parallel support matrix with CuPy
#2554 commented on
Nov 19, 2024 • 0 new comments -
[FEA]: Implement cuda.parallel version of radix sort
#2552 commented on
Nov 19, 2024 • 0 new comments -
[FEA]: Implement cuda.parallel version of partition
#2551 commented on
Nov 19, 2024 • 0 new comments -
[FEA]: Implement cuda.parallel version of select
#2550 commented on
Nov 19, 2024 • 0 new comments -
[FEA]: Implement cuda.parallel.segmented_reduce algorithm
#2549 commented on
Nov 19, 2024 • 0 new comments -
[FEA]: Implement cuda.parallel.merge_sort algorithm
#2546 commented on
Nov 19, 2024 • 0 new comments -
[FEA]: Implement cuda.parallel version of segmented radix sort
#2553 commented on
Nov 19, 2024 • 0 new comments -
[FEA]: Provide `mdarray`
#2474 commented on
Nov 19, 2024 • 0 new comments -
[DOC]: Format function signatures
#2339 commented on
Nov 19, 2024 • 0 new comments -
[FEA]: Research stateful operators for cuda.parallel
#2538 commented on
Nov 19, 2024 • 0 new comments -
[FEA]: Implement cuda.parallel version of `cub::DoubleBuffer`
#2548 commented on
Nov 19, 2024 • 0 new comments -
[FEA]: Support fancy iterators in cuda.parallel
#2479 commented on
Nov 20, 2024 • 0 new comments -
[BUG]: Proclaiming copyable arguments for lambdas fails to compile
#2834 commented on
Nov 20, 2024 • 0 new comments -
[DOC]: `thrust::partition_copy` docs are wrong
#2792 commented on
Nov 20, 2024 • 0 new comments -
[DOC]: `thrust::stable_sort` docs are wrong
#2747 commented on
Nov 20, 2024 • 0 new comments -
[EPIC]: Improve usability of architecture specific features in libcudacxx
#1083 commented on
Nov 20, 2024 • 0 new comments -
Redesign libcudacxx architecture specific testing
#1084 commented on
Nov 20, 2024 • 0 new comments -
[FEA]: Add SM90 to architecture list for CI
#1092 commented on
Nov 20, 2024 • 0 new comments -
[EPIC] Expand CUB APIs for user-specified algorithmic properties
#1186 commented on
Nov 20, 2024 • 0 new comments -
[EPIC] Clarify support for CCCL headers with host-only translation units
#1374 commented on
Nov 20, 2024 • 0 new comments -
[FEA]: Move rapids repositories away from thrust types in API interfaces
#1382 commented on
Nov 20, 2024 • 0 new comments -
[EPIC] Roadmap for cuda/memory_resource
#1502 commented on
Nov 20, 2024 • 0 new comments -
[EPIC] Third-party testing in CI
#1507 commented on
Nov 20, 2024 • 0 new comments -
[EPIC]: Reproducible floating-point reductions
#1558 commented on
Nov 20, 2024 • 0 new comments -
[FEA]: Run tests through `compute-sanitizer` in CI
#1618 commented on
Nov 20, 2024 • 0 new comments -
[EPIC]: Setup nightly/weekly CI
#1619 commented on
Nov 20, 2024 • 0 new comments -
Provide generic implementation of `floating_point<M, E>` limited to arithmetic operations
#1666 commented on
Nov 20, 2024 • 0 new comments -
[EPIC] Optimize `thrust::transform` for newer architectures
#1947 commented on
Nov 20, 2024 • 0 new comments -
Create generic floating-point wrapping type for <NumExponentBits, NumMantissaBits>
#1665 commented on
Nov 20, 2024 • 0 new comments -
Modern versions of cudaMemset and cudaMemcpy
#2000 commented on
Nov 20, 2024 • 0 new comments -
`cuda::launch` kernel-launch API
#2038 commented on
Nov 20, 2024 • 0 new comments -
[EPIC]: Atomics Improvements
#2047 commented on
Nov 20, 2024 • 0 new comments -
[EPIC]: TMA Exposure
#36 commented on
Nov 20, 2024 • 0 new comments -
[EPIC]: Hopper Cluster support
#37 commented on
Nov 20, 2024 • 0 new comments -
[EPIC]: 1D TMA/BULK exposure via `memcpy_async`
#38 commented on
Nov 20, 2024 • 0 new comments -
[EPIC] Fork/Join Parallel Ranges
#44 commented on
Nov 20, 2024 • 0 new comments -
[EPIC] Heterogeneous, sequential ranges support
#45 commented on
Nov 20, 2024 • 0 new comments -
[THEME] Asynchronous Parallel Algorithms
#46 commented on
Nov 20, 2024 • 0 new comments -
[EPIC] Universal 64-bit index type support in Thrust/CUB algorithms
#47 commented on
Nov 20, 2024 • 0 new comments -
[FEA]: CUB large input support
#50 commented on
Nov 20, 2024 • 0 new comments -
[EPIC] Docs Overhaul
#51 commented on
Nov 20, 2024 • 0 new comments -
[EPIC]: Implement `<cuda/std/ranges>` and associated headers
#61 commented on
Nov 20, 2024 • 0 new comments -
[EPIC] GitHub Actions CI
#68 commented on
Nov 20, 2024 • 0 new comments -
[EPIC] Track future breaking changes
#101 commented on
Nov 20, 2024 • 0 new comments -
[EPIC] Setup Windows container and script infrastructure
#248 commented on
Nov 20, 2024 • 0 new comments -
[FEA]: Vectorize memory operations in CUB algorithms
#307 commented on
Nov 20, 2024 • 0 new comments -
[FEA]: Automate release process
#653 commented on
Nov 20, 2024 • 0 new comments -
`thrust::all_of` is slower than a naive reduction
#720 commented on
Nov 20, 2024 • 0 new comments -
Add additional protections against undefined uses of CUDA extended lambdas
#1004 commented on
Nov 20, 2024 • 0 new comments