fix: derive CUDA major version from headers for build #1395
base: main
Conversation
/ok to test 0957f91
(force-pushed from 0957f91 to ff5644a)
/ok to test ff5644a

Is this not a broken environment?
When creating an environment with …:

```
% conda list cuda
# packages in environment at /home/scratch.ajost_sw/miniforge3/envs/test:
#
# Name           Version   Build        Channel
cuda-bindings    13.1.1    pypi_0       pypi
cuda-version     12.9      h4f385c5_3   conda-forge
```

(As an aside, if I specify both packages up front with …)

This setup shouldn't inherently be a problem. Users generally expect that newer releases (like …) … Anecdotally, this configuration has worked fine for me for months with no runtime instability, though it may not be explicitly supported. However, a recent change broke this workflow, requiring either … Because …

So the case we want to support is: …

The proposed fix ensures …
Unfortunately, the Python packaging ecosystem is a mess, but this is expected. Conda packages and pip packages are two entirely separate things that aren't necessarily equivalent or compatible with each other. In our case, conda packages can be used for packaging non-Python code, i.e. the CUDA Toolkit native libraries. The …
How do we handle API breaking changes across major versions like 12.x and 13.x? The underlying CTK libraries only guarantee their API and ABI stability within a major version. If any API has a signature change from 12.x to 13.x, which flavor of the API should we have for Python? Should we dynamically adjust our Python API at runtime based on the detected driver version available on the system? What if someone wants to specifically target the 12.x API and run on a 13.x+ driver? There are a lot of open questions here; the supported path for now is that the …
The problem with this is that …

The backward compatibility guarantees that CUDA makes and we follow are the following: …
@kkraus14 Thanks for the additional details. In my view, deriving … I'd like to suggest the following: …

WDYT?

Edit: For (2) please see #1412
Fixes build failures when cuda-bindings reports a different major version than the CUDA headers being compiled against.

The new _get_cuda_major_version() function is used for both:
1. Determining which cuda-bindings version to install as a build dependency
2. Setting CUDA_CORE_BUILD_MAJOR for Cython compile-time conditionals

Version is derived from (in order of priority):
1. CUDA_CORE_BUILD_MAJOR env var (explicit override, e.g. in CI)
2. CUDA_VERSION macro in cuda.h from CUDA_PATH or CUDA_HOME

Since CUDA_PATH or CUDA_HOME is required for the build anyway, the cuda.h header should always be available, ensuring consistency between the installed cuda-bindings and the compile-time conditionals.
(force-pushed from a669246 to 1fb09d4)
/ok to test 1fb09d4
I took another stab at this after carefully going through the build logic. The key takeaway is that the major build version must always match the version in … Because one of …, we don't need to inspect the current …

Note on runtime vs. build-time versions: in an isolated build environment (the default for pip), …
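For concreteness, here is a minimal sketch of the header-derivation step being described, assuming `cuda.h` lives under `$CUDA_PATH/include` (or `$CUDA_HOME/include`); the helper name is hypothetical and this is not the code in this PR:

```python
# Hypothetical sketch: derive the CUDA major version from cuda.h at build time.
import os
import re
from pathlib import Path


def _parse_cuda_major_from_header():
    """Return the CUDA major version from cuda.h, or None if it cannot be found."""
    cuda_root = os.environ.get("CUDA_PATH") or os.environ.get("CUDA_HOME")
    if not cuda_root:
        return None
    header = Path(cuda_root) / "include" / "cuda.h"
    if not header.is_file():
        return None
    # cuda.h defines e.g. "#define CUDA_VERSION 12090" (major * 1000 + minor * 10),
    # so integer division by 1000 yields the major version (12 in this example).
    match = re.search(r"#define\s+CUDA_VERSION\s+(\d+)", header.read_text())
    if match is None:
        return None
    return int(match.group(1)) // 1000
```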
/ok to test af0f397
Sorry for my late reply. I'd like to provide some context for this PR before the holidays. This is unfortunately not a code review. As mentioned in last Thursday's team sync, I suggested to Andy that we should revive the closed PR that Ralf and I worked on (#1085). Ultimately, it'd allow us to use pathfinder at build time to get the CUDA headers (#692). We have two classes of UX to address: …
In any case, we want to build some heuristics to install everything consistently, and give users an escape hatch to override manually. Part of the heuristics is to do one thing that … For example, installing …
Summary
Adds `_get_cuda_major_version()`, which is used for both:
1. Determining which `cuda-bindings` version to install as a build dependency
2. Setting `CUDA_CORE_BUILD_MAJOR` for Cython compile-time conditionals

This ensures the installed `cuda-bindings` always matches the compile target.

Changes
The version is derived from (in order of priority):
1. `CUDA_CORE_BUILD_MAJOR` env var (explicit override, e.g. in CI)
2. `CUDA_VERSION` macro in `cuda.h` from `CUDA_PATH` or `CUDA_HOME`

Since `CUDA_PATH` or `CUDA_HOME` is required for the build anyway (to provide include directories), the `cuda.h` header should always be available. If neither the env var nor the headers are available, the build fails with a clear error message.
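For illustration only, the priority order and the two consumers listed above could be wired together roughly as follows; the PEP 517 hook shapes and the `cuda-bindings==N.*` pin are assumptions, and the header-parsing fallback is abbreviated (see the sketch earlier in the thread), so this is not the PR's actual build backend:

```python
# Illustrative sketch only -- not the actual cuda-core build hooks.
import os


def _get_cuda_major_version() -> int:
    # 1. Explicit override, e.g. set by CI.
    override = os.environ.get("CUDA_CORE_BUILD_MAJOR")
    if override:
        return int(override)
    # 2. Fall back to the CUDA_VERSION macro in cuda.h under CUDA_PATH/CUDA_HOME
    #    (parsing omitted here; see the header-parsing sketch above). If neither
    #    source is available, fail loudly.
    raise RuntimeError(
        "Set CUDA_CORE_BUILD_MAJOR or point CUDA_PATH/CUDA_HOME at a CUDA Toolkit"
    )


def get_requires_for_build_wheel(config_settings=None):
    # Consumer 1: pin the build-time cuda-bindings to the same major as the headers.
    return [f"cuda-bindings=={_get_cuda_major_version()}.*"]


def build_wheel(wheel_directory, config_settings=None, metadata_directory=None):
    # Consumer 2: expose the same major to the Cython compile-time conditionals.
    os.environ["CUDA_CORE_BUILD_MAJOR"] = str(_get_cuda_major_version())
    ...  # delegate to the real backend (e.g. setuptools.build_meta) from here
```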
Test plan
```
pytest tests/test_build_hooks.py -v --noconftest
```