
Conversation

@Andy-Jost (Contributor) commented Dec 17, 2025

Summary

  • Fixes build failures when the inferred CUDA major version does not match the CUDA headers being compiled against
  • Introduces _get_cuda_major_version() which is used for both:
    1. Determining which cuda-bindings version to install as a build dependency
    2. Setting CUDA_CORE_BUILD_MAJOR for Cython compile-time conditionals
  • This ensures consistency: the installed cuda-bindings always matches the compile target

Changes

The version is derived from (in order of priority):

  1. CUDA_CORE_BUILD_MAJOR env var (explicit override, e.g. in CI)
  2. CUDA_VERSION macro in cuda.h from CUDA_PATH or CUDA_HOME

Since CUDA_PATH or CUDA_HOME is required for the build anyway (to provide include directories), the cuda.h header should always be available.

If neither the env var nor the headers are available, the build fails with a clear error message.

Test plan

  • Unit tests: pytest tests/test_build_hooks.py -v --noconftest
  • CI tests pass

@Andy-Jost Andy-Jost added bug Something isn't working P0 High priority - Must do! cuda.core Everything related to the cuda.core module labels Dec 17, 2025
@Andy-Jost Andy-Jost self-assigned this Dec 17, 2025
@copy-pr-bot (bot) commented Dec 17, 2025

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@Andy-Jost (Contributor, Author)

/ok to test 0957f91

@Andy-Jost Andy-Jost added enhancement Any code-related improvements P1 Medium priority - Should do and removed bug Something isn't working P0 High priority - Must do! labels Dec 17, 2025
@Andy-Jost Andy-Jost added this to the cuda.core beta 11 milestone Dec 17, 2025
@Andy-Jost Andy-Jost force-pushed the build-major-from-headers branch from 0957f91 to ff5644a Compare December 17, 2025 16:37
@Andy-Jost (Contributor, Author)

/ok to test ff5644a

@kkraus14 (Collaborator)

  • Fixes build failures when cuda-bindings reports major version 13 but CUDA headers are version 12, causing missing enum errors for CU_MEM_LOCATION_TYPE_NONE and CU_MEM_ALLOCATION_TYPE_MANAGED

Is this not a broken environment? cuda-bindings would presumably end up calling into v12.x DSOs which have a different abi than v13.x? What situation are we looking to support here?

@Andy-Jost (Contributor, Author)

  • Fixes build failures when cuda-bindings reports major version 13 but CUDA headers are version 12, causing missing enum errors for CU_MEM_LOCATION_TYPE_NONE and CU_MEM_ALLOCATION_TYPE_MANAGED

Is this not a broken environment? cuda-bindings would presumably end up calling into v12.x DSOs which have a different abi than v13.x? What situation are we looking to support here?

When creating an environment with conda create -n test cuda-version=12 and then running pip install cuda-bindings, I end up with cuda-bindings 13.x:

% conda list cuda
# packages in environment at /home/scratch.ajost_sw/miniforge3/envs/test:
#
# Name                    Version                   Build  Channel
cuda-bindings             13.1.1                   pypi_0    pypi
cuda-version              12.9                 h4f385c5_3    conda-forge

(As an aside, if I specify both packages up front with conda create -n test cuda-version=12 cuda-bindings I get cuda-bindings 12.x instead. I wouldn’t have expected a difference between installing it during or after environment creation, but that’s what happens.)

This setup shouldn’t inherently be a problem. Users generally expect that newer releases (like cuda-bindings 13.x) work with older CUDA toolkits due to backward compatibility guarantees. In practice, cuda-bindings should detect and adapt to the underlying CUDA 12 APIs.

Anecdotally, this configuration has worked fine for me for months with no runtime instability, though it may not be explicitly supported. However, a recent change broke this workflow, requiring either cuda-bindings 12.x or setting CUDA_CORE_BUILD_MAJOR=12 manually when building cuda-core.

Because cuda-core discovers cuda.h relative to CUDA_HOME or CUDA_PATH, it doesn’t make sense to tie CUDA_CORE_BUILD_MAJOR to the cuda-bindings version. It’s more consistent to derive it from the version indicated by the headers.

So the case we want to support is:

  • The user has an older CUDA toolkit (e.g. 12.x).
  • The user installs the latest cuda-bindings and expects it to work due to backward compatibility.

The proposed fix ensures cuda-core builds correctly in this situation by decoupling its build version logic from the installed cuda-bindings.

@kkraus14 (Collaborator)

When creating an environment with conda create -n test cuda-version=12 and then running pip install cuda-bindings, I end up with cuda-bindings 13.x:

% conda list cuda
# packages in environment at /home/scratch.ajost_sw/miniforge3/envs/test:
#
# Name                    Version                   Build  Channel
cuda-bindings             13.1.1                   pypi_0    pypi
cuda-version              12.9                 h4f385c5_3    conda-forge

(As an aside, if I specify both packages up front with conda create -n test cuda-version=12 cuda-bindings I get cuda-bindings 12.x instead. I wouldn’t have expected a difference between installing it during or after environment creation, but that’s what happens.)

Unfortunately, the Python packaging ecosystem is a mess, but this is expected. Conda packages and pip packages are two entirely separate things that aren't necessarily equivalent or compatible with each other. In our case, conda packages can be used for packaging non-python code, i.e. for the CUDA Toolkit native libraries. The cuda-version conda package has a constraint on the __cuda virtual conda package which detects the version of the toolkit that is compatible with the driver running on the system. Pip unfortunately doesn't have these capabilities (we are trying to change that with https://wheelnext.dev/) so there's no way to control the version of cuda-bindings resolved from a pip install command based on the driver version.

This setup shouldn’t inherently be a problem. Users generally expect that newer releases (like cuda-bindings 13.x) work with older CUDA toolkits due to backward compatibility guarantees. In practice, cuda-bindings should detect and adapt to the underlying CUDA 12 APIs.

How do we handle API breaking changes across major versions like 12.x and 13.x? The underlying CTK libraries only guarantee their API and ABI stability within a major version. If any API has a signature change from 12.x --> 13.x, which flavor of the API should we have for Python? Should we dynamically adjust our Python API at runtime based on the detected driver version available on the system? What if someone wants to specifically target the 12.x API and run on a 13.x+ driver? There's a lot of open questions here where the supported path for now is that the cuda-bindings package version follows the API and ABI of same major version of the CTK.

Anecdotally, this configuration has worked fine for me for months with no runtime instability, though it may not be explicitly supported. However, a recent change broke this workflow, requiring either cuda-bindings 12.x or setting CUDA_CORE_BUILD_MAJOR=12 manually when building cuda-core.

Because cuda-core discovers cuda.h relative to CUDA_HOME or CUDA_PATH, it doesn’t make sense to tie CUDA_CORE_BUILD_MAJOR to the cuda-bindings version. It’s more consistent to derive it from the version indicated by the headers.

The problem with this is that cuda-core uses the cuda-bindings Cython implementation within it, i.e. in your environment as described above, I imagine this would cause an issue: https://github.com/NVIDIA/cuda-python/blob/main/cuda_core/cuda/core/experimental/_device.pyx#L1097-L1100 since it's trying to use an externed cuDeviceGetUuid_v2 API from cuda.h, which exists in CUDA 12.9 but doesn't exist as of CUDA 13.0 in either cuda.h or cydriver.pxd.
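The gating at issue uses Cython's compile-time conditionals; a hypothetical illustration of the pattern (not the actual `_device.pyx` source) looks roughly like this:

```cython
# Hypothetical sketch: CUDA_CORE_BUILD_MAJOR is injected as a Cython
# compile-time constant when cuda-core is built.
IF CUDA_CORE_BUILD_MAJOR >= 13:
    # CUDA 13.x: cuDeviceGetUuid_v2 no longer exists in cuda.h/cydriver.pxd
    err = cydriver.cuDeviceGetUuid(&uuid, dev)
ELSE:
    # CUDA 12.x: the _v2 variant is still externed from cuda.h
    err = cydriver.cuDeviceGetUuid_v2(&uuid, dev)
```

If the build major says 13 but the headers (and cydriver.pxd) are 12.x, or vice versa, the branch selected at compile time references symbols that don't exist, which is the class of failure this PR addresses.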


So the case we want to support is:

  • The user has an older CUDA toolkit (e.g. 12.x).
  • The user installs the latest cuda-bindings and expects it to work due to backward compatibility.

The proposed fix ensures cuda-core builds correctly in this situation by decoupling its build version logic from the installed cuda-bindings.

cuda-core only uses cuda.h indirectly via the cuda-bindings Cython APIs, which extern APIs from cuda.h and other CUDA headers. But again, as described above, we currently need to match the cuda-bindings and cuda.h (and other CUDA headers) major versions in order to match the APIs.

The backward compatibility guarantees that CUDA makes and we follow are the following:

  • For the driver library, API backward and forward compatibility within a major version
  • For the driver library, ABI backward compatibility forever and forward compatibility within a major version
    • We don't currently support ABI backward compatibility across major versions in cuda.bindings driver modules, but hope to in the future
  • For the toolkit libraries, API backward and forward compatibility within a major version
  • For the toolkit libraries, ABI backward and forward compatibility within a major version

@Andy-Jost (Contributor, Author) commented Dec 17, 2025

@kkraus14 Thanks for the additional details. In my view, deriving CUDA_CORE_BUILD_MAJOR from the headers that cuda-core actually compiles against is a strict improvement, since it allows previously failing environments to build without weakening the official guidance about matching major versions.

I'd like to suggest the following:

  1. We commit this change because it turns a hard build failure into a successful build that likely produces a working configuration in an environment users can realistically end up in.
  2. As a follow-on change, we add import-time checking to flag unsupported version combinations and issue an appropriate warning.

WDYT?

Edit: For (2) please see #1412

Fixes build failures when cuda-bindings reports a different major version
than the CUDA headers being compiled against.

The new _get_cuda_major_version() function is used for both:
1. Determining which cuda-bindings version to install as a build dependency
2. Setting CUDA_CORE_BUILD_MAJOR for Cython compile-time conditionals

Version is derived from (in order of priority):
1. CUDA_CORE_BUILD_MAJOR env var (explicit override, e.g. in CI)
2. CUDA_VERSION macro in cuda.h from CUDA_PATH or CUDA_HOME

Since CUDA_PATH or CUDA_HOME is required for the build anyway, the
cuda.h header should always be available, ensuring consistency between
the installed cuda-bindings and the compile-time conditionals.
@Andy-Jost Andy-Jost force-pushed the build-major-from-headers branch from a669246 to 1fb09d4 Compare December 19, 2025 19:22
@Andy-Jost (Contributor, Author)

/ok to test 1fb09d4

@Andy-Jost Andy-Jost changed the title fix: derive CUDA_CORE_BUILD_MAJOR from headers instead of bindings version fix: derive CUDA major version from headers for build Dec 19, 2025
@Andy-Jost (Contributor, Author)

I took another stab at this after carefully going through the build logic. The key takeaway is that the major build version must always match the version in cuda.h or else the build breaks. There can be no deviation from this rule.

Because one of CUDA_PATH or CUDA_HOME is required, and because cuda.h for Cython compilation is always found relative to that path, we have no choice other than to derive the CUDA major version from cuda.h. The CUDA_CORE_BUILD_MAJOR environment variable still exists to force a specific version when needed.

We don't need to inspect the current cuda-bindings or nvidia-smi versions, because if they disagree with cuda.h we will get a compile error anyway.

Note on runtime vs. build-time versions: In an isolated build environment (the default for pip), cuda-bindings is installed into an isolated area while building cuda-core. However, at runtime cuda-bindings is loaded from the user's environment. These versions might not match. We should follow up with additional changes to inspect versions at runtime and issue warnings as needed.
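A hedged sketch of what such a runtime check could look like (function name, signature, and warning text are assumptions, not the follow-up's actual code):

```python
import warnings


def check_bindings_match(build_major, bindings_version):
    # Hypothetical runtime check: compare the major version cuda-core was
    # built against with the cuda-bindings version loaded at runtime, and
    # warn on mismatch rather than failing outright.
    runtime_major = int(bindings_version.split(".")[0])
    if runtime_major != build_major:
        warnings.warn(
            f"cuda-core was built against cuda-bindings {build_major}.x but "
            f"cuda-bindings {bindings_version} is installed; this combination "
            "is unsupported and may fail at runtime."
        )
    return runtime_major == build_major
```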

@Andy-Jost (Contributor, Author)

/ok to test af0f397

@leofang (Member) commented Dec 23, 2025

Sorry for my late reply. I'd like to provide some context for this PR before the holidays. This is unfortunately not a code review.

As mentioned in the last Thursday team-sync, I suggested to Andy that we should revive the closed PR that Ralf and I worked on (#1085). Ultimately, it'd allow us to use pathfinder at build time to get the CUDA headers (#692).

We have two classes of UX to address:

  1. pip install .: This is the most intuitive UX for public developers wanting to quickly spin up a local build.
    • In this case, the build frontend creates a build-isolation environment and installs cuda-bindings and other build-time dependencies therein
    • IIRC we reached a wrong conclusion in the meeting. The editable installation (-e) still creates a build isolation environment, so pip install -e . also belongs to this category.
    • Make cuda-bindings buildable against CTK wheels? #692 will make this UX very very good, because one can build from source without any local CTK, but it also means that setting CUDA_PATH is not meaningful. In fact, users would not know where the build-isolation env and therefore the CUDA headers reside (but within the build system we can apply tricks to locate them).
  2. pip install --no-build-isolation .: This is suitable for experienced developers (ex: our team) who already have everything installed in the local environment and want to instruct the build system to reuse what's out there.
    • In this case, we want to use the local cuda.h and cuda-bindings, which should be assumed version-locked.

In any case, we want to build some heuristics to install everything consistently, and give users an escape hatch to override manually. Part of the heuristics is to do one thing that pip cannot do today (which led to a long discussion above): detect the local driver version so that we use cuda.h/cuda-bindings from the same major CTK version. The driver can be newer than the CTK or any CUDA sw stack, but not the other way around (with big asterisks).

For example, installing cuda.bindings 13.x and building cuda-core against it, then running cuda-core on a system with CUDA 12.x driver, is a user error. In this case, we want a build-time check in addition to run-time check (#1412) to at least warn users early. The escape hatch can then be used to ignore the warning and/or the compile-time error.
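The driver-version detection mentioned above could be sketched via the driver API's cuDriverGetVersion, which does not require cuInit (a hypothetical helper; library names and error handling are assumptions, and the result depends on the local environment):

```python
import ctypes


def detect_driver_cuda_major():
    # Hypothetical sketch: ask the installed CUDA driver which CUDA
    # version it supports; returns None when no driver is available.
    for name in ("libcuda.so.1", "nvcuda.dll"):
        try:
            lib = ctypes.CDLL(name)
            break
        except OSError:
            continue
    else:
        return None  # no CUDA driver on this system
    version = ctypes.c_int()
    # cuDriverGetVersion encodes the version as major*1000 + minor*10
    if lib.cuDriverGetVersion(ctypes.byref(version)) != 0:
        return None
    return version.value // 1000
```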
