Skip to content

Conversation

@rwgk
Copy link
Collaborator

@rwgk rwgk commented Dec 2, 2025

This was the initial attempt — continuing the work under #1377 (starting over fresh from main)

rwgk added 8 commits December 1, 2025 16:24
[UNTESTED] This commit moves all implementation files from the experimental
namespace to the core namespace. The experimental/ directory is kept for
backward compatibility stubs (to be added in a later commit).

Files moved:
- All .py, .pyx, .pxd files from experimental/ to core/
- _memory/, _utils/, include/ subdirectories
- __init__.pxd renamed to __init__experimental.pxd

This is a file move only - imports will be updated in the next commit.
[UNTESTED] This commit updates all import statements in the moved files
to reference the new cuda.core.* paths instead of cuda.core.experimental.*.

Updated:
- Python files (.py): _module.py, _graph.py, _linker.py, _program.py, _system.py, utils.py
- Cython files (.pyx): _device.pyx, _stream.pyx, _launcher.pyx, _launch_config.pyx, _event.pyx
- Memory subdirectory: all .py, .pyx, and .pxd files
- Utils subdirectory: cuda_utils.pyx
- Other files: _kernel_arg_handler.pyx, _memoryview.pyx

All imports now use the new non-experimental paths.
[UNTESTED] This commit updates cuda/core/__init__.py to export all the
symbols that were moved from experimental namespace. The imports now
reference the new cuda.core.* paths instead of cuda.core.experimental.*.

Exported symbols:
- Device, Event, EventOptions
- Graph, GraphBuilder, GraphCompleteOptions, GraphDebugPrintOptions
- LaunchConfig, launch
- Linker, LinkerOptions
- Memory classes: Buffer, DeviceMemoryResource, etc.
- Kernel, ObjectCode
- Program, ProgramOptions
- Stream, StreamOptions
- System, system
- utils module
…warnings

[UNTESTED] This commit creates backward compatibility stubs in the
experimental namespace that forward imports to the new cuda.core.*
locations and emit deprecation warnings.

Features:
- All public symbols are re-exported from new locations
- DeprecationWarning is emitted when the experimental module is imported
- __getattr__ provides lazy forwarding for submodules
- Clear migration message in docstring

The stubs ensure backward compatibility while encouraging users to
migrate to the new import paths.
[UNTESTED] This commit updates all test files to import from the new
cuda.core.* paths instead of cuda.core.experimental.*.

Updated 28 test files:
- All test_*.py files
- Helper files in tests/helpers/
- Memory IPC test files
- Example test files

Tests now exercise the new API paths directly, ensuring the migration
is complete. Backward compatibility will be tested separately.
[UNTESTED] Update the remaining .pyx test file to use cuda.core.* imports.
[UNTESTED] This commit adds comprehensive tests for the experimental
namespace backward compatibility stubs.

Tests verify:
- Experimental imports still work and emit deprecation warnings
- Symbols from experimental are the same objects as from core
- Direct imports from experimental work correctly
- Utils module and options classes are accessible
- Memory classes and other public APIs work
- Objects can be instantiated through experimental namespace

These tests ensure backward compatibility while the migration happens.
[UNTESTED] Update all example files to import from cuda.core instead
of cuda.core.experimental. Examples should demonstrate the new
recommended import paths.
@copy-pr-bot
Copy link
Contributor

copy-pr-bot bot commented Dec 2, 2025

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

rwgk added 8 commits December 1, 2025 16:48
[UNTESTED] Update build_hooks.py to look for Cython extensions in
cuda/core/ instead of cuda/core/experimental/. This fixes the build
system to work with the moved files.
[UNTESTED] Update _context.pyx to import from cuda.core._utils.cuda_utils
instead of cuda.core.experimental._utils.cuda_utils. This fixes a
circular import issue during module initialization.
[UNTESTED] Update _memory/_legacy.py to import from cuda.core._utils.cuda_utils
instead of cuda.core.experimental._utils.cuda_utils. This fixes a
circular import issue.
[UNTESTED] Update _memory/_virtual_memory_resource.py to import from
cuda.core._utils.cuda_utils instead of cuda.core.experimental._utils.cuda_utils.
This fixes a circular import issue.
[UNTESTED] Update cuda_python_test_helpers to import from cuda.core._utils
instead of cuda.core.experimental._utils.
[UNTESTED] Update test files to access internal modules via cuda.core._*
instead of cuda.core.experimental._*. This fixes test failures after
the namespace migration.
[UNTESTED] Make utils accessible as cuda.core.experimental.utils submodule
for backward compatibility. Also fix test_device_id to use cuda.core.system
instead of cuda.core.experimental.system.
[UNTESTED] Clear cached experimental module before testing to ensure
warnings are emitted. Also improve assertion message.

# Should emit deprecation warning
assert len(w) >= 1
assert issubclass(w[0].category, DeprecationWarning)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should be able to leverage with pytest.deprecated_call https://docs.pytest.org/en/stable/reference/reference.html#pytest.deprecated_call

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cursor-agent:

Good catch! I've updated the tests to use pytest.deprecated_call() instead of manual warning catching. This is cleaner and more idiomatic pytest usage.


human:

I'll work in this auto-pilot mode until I get everything passing here (CI). Then I'll review myself.

"_linker", "_memory", "_module", "_program", "_stream", "_system",
"_utils", "_context", "_dlpack", "_kernel_arg_handler",
"_launch_config", "_memoryview"):
_warn_deprecated()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this trigger a warning on each call as opposed to just at import time?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cursor-agent:

The deprecation warning is only emitted once at import time when cuda.core.experimental is first imported, not on each attribute access. This is because:

  • The warning is emitted at module initialization (line 64: _warn_deprecated())
  • utils is already set as an attribute in the module namespace (line 68), so accessing it doesn't go through __getattr__
  • Only accessing submodules via __getattr__ (like _device, _utils) triggers additional warnings

I've added documentation to clarify this behavior.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. The call to _warn_deprecated in this function will never cause a warning to be issued because the warning was already issued at module scope when loading this module. It should be removed.
  2. Do we really intend to publish all submodules by stripping the leading underscores? That would make public many things that are currently private.
  3. This implementation pays for the extra attribute look-up with each access. What about something like the following?
            # Fallback to underscore-prefixed name
            attr = importlib.import_module(f"mymodule.{name}")
            # Set the attribute globally to avoid __getattr__ next time
            setattr(sys.modules[__name__], name, attr)
            # alternative: globals()[name] = attr
            return attr

However, at that point, it might be better to just say this:

from .. import _device, _event, # etc.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I prefer strongly that we just remove this entirely. Underscored modules are never considered public APIs.

This merge brings in the latest changes from main (commit a4b285a)
including the peer access control for DeviceMemoryResource (NVIDIA#1289).

The merge combines:
- Latest main changes (a4b285a)
- Experimental namespace deprecation work from this branch

All files remain moved from cuda.core.experimental.* to cuda.core.*
with backward compatibility stubs maintained.
@rwgk
Copy link
Collaborator Author

rwgk commented Dec 2, 2025

Remark: I haven't looked at any of the changes myself yet. Everything to this point was done by cursor-agent. I just gave it a few prompts to explain what I want to do.

rwgk added 3 commits December 1, 2025 21:00
…ng behavior

- Replace manual warning catching with pytest.deprecated_call() for cleaner tests
- Add documentation clarifying that deprecation warnings are only emitted
  at module import time, not on each attribute access
- Update test_experimental_utils_module docstring to explain warning behavior
Update import from cuda.core.experimental._utils.cuda_utils to
cuda.core._utils.cuda_utils to match the new module structure.
@rwgk rwgk self-assigned this Dec 2, 2025
@rwgk
Copy link
Collaborator Author

rwgk commented Dec 2, 2025

/ok to test

- Update cython/build_tests.sh to use cuda/core/include instead of cuda/core/experimental/include
- Update test_memory_peer_access.py to import from cuda.core instead of cuda.core.experimental
- Update test_module.py to use cuda.core.LaunchConfig instead of cuda.core.experimental.LaunchConfig
@rwgk rwgk force-pushed the cuda_core_experimental_deprecation_v0 branch from 45fa0bc to dfa0f0e Compare December 2, 2025 06:00
@rwgk
Copy link
Collaborator Author

rwgk commented Dec 2, 2025

/ok to test

@github-actions
Copy link

github-actions bot commented Dec 2, 2025

@Andy-Jost
Copy link
Contributor

Andy-Jost commented Dec 3, 2025

Another approach would be to change absolute imports to relative, e.g.:

- from cuda.core.experimental._utils.cuda_utils import driver
+ from ._utils.cuda_utils import driver

An advantage of this is that it decouples the code changes from the file movements, so it could be checked into the current main without eliminating the experimental namespace. After that, the files could be moved without changing any import statements.

I've been planning to propose this change but haven't had time to ask the team for input yet.

Resolved conflicts by updating imports from cuda.core.experimental.* to cuda.core.*
to match the branch's deprecation work. Updated all source files and test files
to use the new import paths.
@rwgk
Copy link
Collaborator Author

rwgk commented Dec 9, 2025

/ok to test

@rwgk
Copy link
Collaborator Author

rwgk commented Dec 9, 2025

Regarding absolute vs relative imports:

I've been following the direct guidance of @kkraus14, which he gave me/us around the time when I started working on cuda-pathfinder (~spring this year). It was that we should use absolute imports throughout.

For things like this, I usually put my personal preference aside completely. There will always be valid arguments one way or the other. In the end what I think is most valuable is consistency, ideally as enforced by tooling (with overrides that must be explicit, e.g. noqa).

I'd be a big fan of adding TID252 to our ruff.toml, and then agree that everything that passes that check is fine.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We probably need to document that everything under cuda_core/cuda/core/include/ are considered internal (for now). There is currently no plan to maintain any non-Python (such as C, C++, and Cython) public APIs in cuda-core, and when we do we will revise this document.

Resolved conflicts by updating imports from cuda.core.experimental.* to cuda.core.*
to match the branch's deprecation work. Updated all source files and test files
to use the new import paths. Added new StridedLayout imports from experimental
namespace as they are new features from main that haven't been migrated yet.
@rwgk
Copy link
Collaborator Author

rwgk commented Dec 10, 2025

/ok to test

rwgk added 5 commits December 15, 2025 08:31
Merged latest changes from main including:
- Added size property to StridedMemoryView (NVIDIA#1363)
- NVML bindings updates (NVIDIA#1360)
- cuda_pathfinder pixi configuration (NVIDIA#1365, NVIDIA#1361)

All changes merged cleanly with no conflicts.
- Moved _layout.pxd and _layout.pyx from experimental/ to cuda/core/
- Updated all imports from cuda.core.experimental._layout to cuda.core._layout
- Added StridedLayout to cuda.core.__init__.py exports
- Added backward compatibility stub in experimental/__init__.py
- Fixed docstring references in _linker.py and _module.py
- Updated pyproject.toml package-data path
- Updated docs conf.py excluded_dirs

All code has been migrated from cuda.core.experimental except backward
compatibility stubs.
The TYPE_CHECKING block was unnecessary since all symbols are already
imported at runtime below. Type checkers will see the types from the
runtime imports, making the TYPE_CHECKING block redundant duplication.
- Update issue templates to use cuda.core.Program instead of cuda.core.experimental.Program
- Update merge script to place versioned directories (cu12, cu13) under cuda/core/ instead of cuda/core/experimental/
- Fix duplicate line in merge script
@rwgk
Copy link
Collaborator Author

rwgk commented Dec 15, 2025

/ok to test

The merge script was copying the entire cuda/core directory into versioned
subdirectories and then removing all Python modules from cuda/core, causing
ModuleNotFoundError when importing cuda.core.

Fix: Only copy version-specific binaries (.so, .pyd, .dll) into versioned
directories (cu12/, cu13/), and keep all Python modules in cuda/core/ where
they belong.

Since wheels no longer have experimental/ directory, we can simplify the merge
logic compared to the original approach which assumed experimental/ contained
only binaries.
@rwgk
Copy link
Collaborator Author

rwgk commented Dec 16, 2025

/ok to test

The rglob("*") was recursively searching through all subdirectories,
including the cu12/ and cu13/ directories we just created. This caused
files from cuda/core/cu13/ to be copied to cuda/core/cu13/cu13/, creating
infinite recursion.

Fix: Skip any files that are in versioned directories (cu12/, cu13/)
when searching for binaries to copy.
@rwgk
Copy link
Collaborator Author

rwgk commented Dec 16, 2025

/ok to test

Comment on lines +44 to +53
from cuda.core._memory import ( # noqa: E402
Buffer,
DeviceMemoryResource,
DeviceMemoryResourceOptions,
GraphMemoryResource,
LegacyPinnedMemoryResource,
MemoryResource,
VirtualMemoryResource,
VirtualMemoryResourceOptions,
)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This list needs to be updated to capture a few new things.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cython needs an exact filename __init__.pxd so as to find the current modules. This renaming does not make sense to me. What does it do?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably an accident.

I had to git merge main already several times, a likely cause of accidents. The merges are messy. I'm using Cursor and it seems to miss some details sometimes.

This PR has become much more complex than it looked like initially (tests passed a couple weeks ago).

E.g. I don't understand why ci/tools/merge_cuda_core_wheels.py (last changed in Oct) worked before but now needs a change (it makes sense that it needs a change). Testing it locally isn't easy because the input wheels only live on the runners and disappear when the jobs finish. So I'd have to figure out how to build them locally (currently I'm not trying to).

I'll keep beating on it until I'll get all tests to pass again. Maybe I'll start over fresh. LLMs seems to be less successful when working incrementally.

Thanks for the hints.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah! Leaving a quick note before signing off...

The merge script can be tested locally easily. Create two conda envs, one with CUDA 13 and another with CUDA 12, and build cuda.core wheels in each env via pip wheel --no-deps --no-build-isolation -v .. The cuda-bindings package can just be conda-installed together with the CTK for simplicity. No need to build it from source. Then, we take the two wheels and rename them before running the merge script. This logic is relevant:

# Rename wheel to include CUDA version suffix
mkdir -p "${{ env.CUDA_CORE_ARTIFACTS_DIR }}/cu${BUILD_PREV_CUDA_MAJOR}"
for wheel in ${{ env.CUDA_CORE_ARTIFACTS_DIR }}/*.whl; do
if [[ -f "${wheel}" ]]; then
base_name=$(basename "${wheel}" .whl)
new_name="${base_name}.cu${BUILD_PREV_CUDA_MAJOR}.whl"
mv "${wheel}" "${{ env.CUDA_CORE_ARTIFACTS_DIR }}/cu${BUILD_PREV_CUDA_MAJOR}/${new_name}"
echo "Renamed wheel to: ${new_name}"
fi
done

Once this PR (#1085) is revived/merged, we will be able to build cuda-core in a pure-wheel env and no need to use conda.

Comment on lines +87 to +88
# Add local include directory for cuda/core/include
local_include_dirs = ["cuda/core"]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Q: Why is this needed? We do not keep any headers under cuda/core or cuda/core/experimental. If without adding this Cython runs into compilation errors, I suspect this has to do with the __init_experimental.pxd file below.

@rwgk
Copy link
Collaborator Author

rwgk commented Dec 16, 2025

I'm continuing the work under the new PR #1377

I think it was the right decision to start over. Many things got lost in the repeated git merge main steps here:

Summary of Differences: cuda_core_experimental_deprecation_v0 vs cuda_core_experimental_deprecation_v1
========================================================================================================

This summary compares the cuda_core/ directory between the two branches to identify
important changes that might need to be transferred from v0 to v1.

KEY DIFFERENCES IDENTIFIED:
============================

1. **StridedLayout Naming**
   - v0: Uses public `StridedLayout` class
   - v1: Uses private `_StridedLayout` class (with leading underscore)
   - Impact: This is intentional - v1 follows main branch which privatized StridedLayout
   - Status: v1 is correct (matches main branch)

2. **New Memory Resources (from main)**
   - v1 has: `ManagedMemoryResource`, `ManagedMemoryResourceOptions`, 
             `PinnedMemoryResource`, `PinnedMemoryResourceOptions`
   - v0: Missing these
   - Impact: v1 includes newer features from main branch
   - Status: v1 is ahead (has newer features)

3. **StridedMemoryView Exports**
   - v1: Exports `StridedMemoryView` and `args_viewable_as_strided_memory` from `cuda.core`
   - v0: Missing these exports
   - Impact: v1 makes these utilities available at top level
   - Status: v1 is correct

4. **Device.get_all_devices() Method**
   - v1: Has `Device.get_all_devices()` classmethod
   - v0: Missing this method
   - Impact: v1 provides convenience method to get all devices
   - Status: v1 is ahead (has newer feature)

5. **CUDA_CORE_BUILD_MAJOR Type**
   - v1: Uses `int(_get_proper_cuda_bindings_major_version())` (integer comparison)
   - v0: Uses `_get_proper_cuda_bindings_major_version()` (string comparison)
   - v1: Cython code uses `IF CUDA_CORE_BUILD_MAJOR == 12:` (integer)
   - v0: Cython code uses `IF CUDA_CORE_BUILD_MAJOR == "12":` (string)
   - Impact: Type consistency fix
   - Status: v1 is correct (integer comparison is more appropriate)

6. **Include Directory Path**
   - v1: Uses `_include/` (with leading underscore for implementation detail)
   - v0: Uses `include/` (without underscore)
   - Impact: v1 follows the design decision to mark headers as implementation detail
   - Status: v1 is correct (matches user's requirement)

7. **pyproject.toml Package Data**
   - v1: `"cuda.core._include"` 
   - v0: `"cuda.core.include"`
   - Impact: Matches the _include directory rename
   - Status: v1 is correct

8. **build_hooks.py Comments**
   - v1: Updated comment to reference `_include/` instead of `include/`
   - v0: Older comment
   - Impact: Documentation accuracy
   - Status: v1 is correct

STATISTICS:
==========
- ~131 Python/Cython files changed
- Most changes are due to:
  * File moves (experimental/ -> root)
  * StridedLayout -> _StridedLayout renaming
  * Include path updates (include/ -> _include/)
  * New memory resource types added

CONCLUSION:
===========

The v1 branch appears to be MORE UP-TO-DATE than v0, incorporating:
- New memory resource types (ManagedMemoryResource, PinnedMemoryResource)
- Device.get_all_devices() convenience method
- Proper type handling for CUDA_CORE_BUILD_MAJOR
- Correct include directory naming (_include/)
- Proper exports of StridedMemoryView utilities

The main difference is that v0 still uses public `StridedLayout` while v1 uses private
`_StridedLayout`, but this is intentional as v1 follows the main branch which
privatized this class.

RECOMMENDATION:
===============
No important changes need to be transferred from v0 to v1. The v1 branch is
more complete and up-to-date with main branch changes. The v0 branch appears
to be an older incremental migration attempt, while v1 is a fresh, complete
migration that properly incorporates all the latest changes from main.

@leofang leofang removed this from the cuda.core beta 10 milestone Dec 16, 2025
@leofang
Copy link
Member

leofang commented Dec 17, 2025

Closing in favor of #1377.

@leofang leofang closed this Dec 17, 2025
@rwgk rwgk deleted the cuda_core_experimental_deprecation_v0 branch December 18, 2025 03:39
github-actions bot pushed a commit that referenced this pull request Dec 18, 2025
Removed preview folders for the following PRs:
- PR #1298
- PR #1397
- PR #1399
- PR #1401
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

cuda.core Everything related to the cuda.core module enhancement Any code-related improvements P0 High priority - Must do!

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants