[BUG]: cufile test_set_stats_level test fails in CI #1147

@mdboom

Is this a duplicate?

Type of Bug

Something else

Component

cuda.pathfinder

Describe the bug

The test test_set_stats_level fails when run in CI. @chloechia4 and I tried to get more information by dumping cufile.log after the run, but the log appears to be empty. To avoid holding up #1060 any longer, I have marked this test as skipped and am creating this issue so we can come back to it later.

=================================== FAILURES ===================================
_____________________________ test_set_stats_level _____________________________

    @pytest.mark.skipif(
        cufileVersionLessThan(1140), reason="cuFile parameter APIs require cuFile library version 13.0 or later"
    )
    def test_set_stats_level():
        """Test cuFile statistics level configuration."""
        # Initialize CUDA
        (err,) = cuda.cuInit(0)
        assert err == cuda.CUresult.CUDA_SUCCESS
    
        err, device = cuda.cuDeviceGet(0)
        assert err == cuda.CUresult.CUDA_SUCCESS
    
        err, ctx = cuda.cuDevicePrimaryCtxRetain(device)
        assert err == cuda.CUresult.CUDA_SUCCESS
        (err,) = cuda.cuCtxSetCurrent(ctx)
        assert err == cuda.CUresult.CUDA_SUCCESS
    
        # Open cuFile driver
>       cufile.driver_open()

tests/test_cufile.py:1926: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
cuda/bindings/cufile.pyx:2571: in cuda.bindings.cufile.driver_open
    cpdef driver_open():
cuda/bindings/cufile.pyx:2578: in cuda.bindings.cufile.driver_open
    check_status(__status__)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

>   raise cuFileError(status.err, status.cu_err)
E   cuda.bindings.cufile.cuFileError: DRIVER_NOT_INITIALIZED (5001): nvidia-fs driver is not loaded. Set allow_compat_mode to true in cufile.json file to enable compatible mode; CUDA status: CUDA_SUCCESS (0)

cuda/bindings/cufile.pyx:2467: cuFileError
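
The error message itself suggests enabling compatibility mode in cufile.json. Below is a minimal sketch of what that could look like in a test setup. It assumes that cuFile picks up its configuration from the path in the `CUFILE_ENV_PATH_JSON` environment variable, that `allow_compat_mode` lives under the `properties` section, and that omitted keys fall back to defaults. The helper name `write_compat_cufile_config` is hypothetical, and whether this actually resolves the CI failure has not been verified.

```python
import json
import os
import tempfile


def write_compat_cufile_config(directory):
    """Write a minimal cufile.json that enables compatibility mode.

    Hypothetical helper: only the allow_compat_mode key is set; all other
    settings are assumed to fall back to cuFile defaults.
    """
    config = {"properties": {"allow_compat_mode": True}}
    path = os.path.join(directory, "cufile.json")
    with open(path, "w") as f:
        json.dump(config, f, indent=2)
    return path


config_dir = tempfile.mkdtemp()
config_path = write_compat_cufile_config(config_dir)

# Assumption: the variable must be set before the cuFile library is
# initialized, i.e. before cufile.driver_open() is called.
os.environ["CUFILE_ENV_PATH_JSON"] = config_path
```

In a pytest suite this would most naturally live in a session-scoped fixture, so that the variable is in place before any test opens the driver.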

How to Reproduce

Run the test_cufile.py test suite.
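
Since the failure reports that the nvidia-fs driver is not loaded, it may help to confirm that on the CI runner before running the suite. The following is a rough, Linux-only sketch that looks for the `nvidia_fs` kernel module in `/proc/modules`. The helper names `module_listed` and `nvidia_fs_loaded` are hypothetical and not part of the test suite.

```python
from pathlib import Path


def module_listed(modules_text, name="nvidia_fs"):
    """Return True if `name` is the first field of any line in /proc/modules text."""
    return any(
        line.split()[0] == name
        for line in modules_text.splitlines()
        if line.strip()
    )


def nvidia_fs_loaded(proc_modules="/proc/modules"):
    """Best-effort check for the nvidia-fs kernel module (Linux only)."""
    try:
        return module_listed(Path(proc_modules).read_text())
    except OSError:
        # File missing or unreadable (e.g. non-Linux host): treat as not loaded.
        return False


if __name__ == "__main__":
    print("nvidia-fs loaded:", nvidia_fs_loaded())
```

A check like this could also back a `pytest.mark.skipif` condition, so the test is skipped only on runners without the driver rather than unconditionally.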

Expected behavior

Tests pass.

Operating System

No response

nvidia-smi output

No response

Metadata

Labels

bug (Something isn't working)
