
lowoha_params.num_threads Parameter not properly used in multiple locations #20

@vishalMCE

Description


The lowoha_params structure includes a num_threads field, which is used for optimization decisions (such as selecting tile sizes and partitioning strategies), but it is not passed to the actual thread management in several parallel code paths. When I set params.num_threads = 2, the optimization logic sees and uses this value, but the backend library still uses all available CPU cores.

Problem
I'm trying to control the thread count for LOWOHA matmul, but setting params.num_threads has no effect. The code always falls back to omp_get_max_threads(), regardless of what I set.

There are many places in the code where thread control is missing. Here are 4 examples:

  1. zendnnl_parallel_for utility function
    File: lowoha_matmul_utils.hpp:41
  2. Batch parallel partitioning (high FLOPS path)
    File: lowoha_matmul.cpp:253
  3. Batch parallel partitioning (low FLOPS path)
    File: lowoha_matmul.cpp:298
  4. AOCL-DLP and BLIS calls
    File: aocl_kernel.cpp:1050-1095

The AOCL and BLIS libraries use OpenMP internally and respect omp_set_num_threads(), but we never call it before invoking their GEMM functions:

// run_blis function - no thread control!
aocl_gemm_f32f32f32of32(...);
aocl_gemm_bf16s4f32of32(...);
aocl_gemm_bf16bf16f32obf16(...);
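One way to add that control around these calls without permanently changing the caller's OpenMP setting is a small RAII guard. This is a minimal sketch, assuming (as stated above) that the libraries honor omp_set_num_threads() and that num_threads is an int; the class name scoped_omp_threads is hypothetical and not part of ZenDNN.

```cpp
#include <cassert>
#ifdef _OPENMP
#include <omp.h>
#else
// Stand-ins that only record the requested value, so the sketch also
// builds without -fopenmp.
static int omp_stub_nthreads = 1;
inline int omp_get_max_threads() { return omp_stub_nthreads; }
inline void omp_set_num_threads(int n) { if (n > 0) omp_stub_nthreads = n; }
#endif

// Hypothetical RAII guard (not part of ZenDNN). omp_set_num_threads()
// changes the thread count for every later parallel region started by the
// calling thread, so the guard restores the previous value on scope exit.
class scoped_omp_threads {
 public:
  explicit scoped_omp_threads(int num_threads)
      : saved_(omp_get_max_threads()), active_(num_threads > 0) {
    // A non-positive value means "not set": keep the current OpenMP setting.
    if (active_) omp_set_num_threads(num_threads);
  }
  ~scoped_omp_threads() {
    if (active_) omp_set_num_threads(saved_);
  }
  scoped_omp_threads(const scoped_omp_threads &) = delete;
  scoped_omp_threads &operator=(const scoped_omp_threads &) = delete;

 private:
  int saved_;
  bool active_;
};
```

Inside run_blis this would be a single line, e.g. `scoped_omp_threads guard(lowoha_param.num_threads);` placed before the aocl_gemm_* calls, so the per-call count applies to the GEMM and the previous value is restored when the function returns.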

Suggested Fix

  • for OpenMP loops
    #pragma omp parallel for collapse(2) num_threads(num_threads)
  • for parallel-for utility
    template <class F>
    inline void zendnnl_parallel_for(const int64_t begin, const int64_t end,
                                     const int64_t grain_size, 
                                     const int64_t max_num_threads,  // Add this
                                     const F &f);
  • for AOCL/BLIS
    void run_blis(...) {
      if (lowoha_param.num_threads > 0)  // Add this (0 means "not set")
        omp_set_num_threads(lowoha_param.num_threads);
      ...
    }
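To make the proposed signature concrete, here is a minimal sketch of a zendnnl_parallel_for body that honors max_num_threads. The existing implementation in lowoha_matmul_utils.hpp is not shown in this issue, so the details below are assumptions rather than the actual ZenDNN code: f is called as f(chunk_begin, chunk_end), grain_size only caps the number of threads, and a non-positive max_num_threads falls back to omp_get_max_threads() (today's behavior).

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#ifdef _OPENMP
#include <omp.h>
#else
// Single-threaded stand-ins so the sketch also builds without -fopenmp.
inline int omp_get_max_threads() { return 1; }
inline int omp_get_thread_num() { return 0; }
inline int omp_get_num_threads() { return 1; }
#endif

// Sketch only: f is assumed to be called as f(chunk_begin, chunk_end) on
// disjoint, contiguous sub-ranges of [begin, end).
template <class F>
inline void zendnnl_parallel_for(const int64_t begin, const int64_t end,
                                 const int64_t grain_size,
                                 const int64_t max_num_threads,
                                 const F &f) {
  if (begin >= end) return;
  const int64_t range = end - begin;
  const int64_t grain = std::max<int64_t>(grain_size, 1);
  // Non-positive max_num_threads: fall back to the current OpenMP default.
  const int64_t requested =
      max_num_threads > 0 ? max_num_threads : omp_get_max_threads();
  // Do not start more threads than there are grain-sized chunks of work.
  const int64_t chunks = (range + grain - 1) / grain;
  const int64_t nthr = std::max<int64_t>(1, std::min(requested, chunks));

  // The num_threads clause limits only this parallel region.
#pragma omp parallel num_threads(static_cast<int>(nthr))
  {
    // The runtime may grant fewer threads than requested, so split by the
    // actual team size: thread tid gets a contiguous, near-equal slice.
    const int64_t team = omp_get_num_threads();
    const int64_t tid = omp_get_thread_num();
    const int64_t per = range / team;
    const int64_t rem = range % team;
    const int64_t lo = begin + tid * per + std::min(tid, rem);
    const int64_t hi = lo + per + (tid < rem ? 1 : 0);
    assert(lo <= hi && hi <= end);
    if (lo < hi) f(lo, hi);
  }
}
```

Unlike omp_set_num_threads(), the num_threads clause affects only the one parallel region it is attached to, so threading this parameter through the utility does not change the thread count seen by any code that runs afterwards.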

Change default behavior
Consider changing the default num_threads value in the constructor from 0 to 1:

// in lowoha_common.hpp
lowoha_params() : dtypes(), postop_(), quant_params(), mem_format_a('n'),
    mem_format_b('n'), lowoha_algo(matmul_algo_t::none), num_threads(1) {}  // Changed from 0 to 1

This makes more sense: when the thread count isn't explicitly set, the operator runs single-threaded rather than using all available cores.
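Whichever default is chosen, it may help if every backend derives its thread count from one place, so the optimization logic and the thread management cannot disagree. A minimal sketch, assuming num_threads is an int and that a non-positive value keeps its current meaning of "use the OpenMP default"; lowoha_params_sketch and effective_num_threads are hypothetical names, not ZenDNN code.

```cpp
#include <cassert>
#ifdef _OPENMP
#include <omp.h>
#else
// Single-threaded stand-in so the sketch also builds without -fopenmp.
inline int omp_get_max_threads() { return 1; }
#endif

// Simplified stand-in for lowoha_params: only the field relevant here.
struct lowoha_params_sketch {
  int num_threads = 1;  // proposed default (currently 0)
};

// Hypothetical helper: the thread count a backend would actually use.
// A non-positive value falls back to omp_get_max_threads(), which reflects
// OMP_NUM_THREADS if set and is typically all cores otherwise.
inline int effective_num_threads(const lowoha_params_sketch &p) {
  return p.num_threads > 0 ? p.num_threads : omp_get_max_threads();
}
```

The same value could then feed the tile-size heuristics, the num_threads clause of the OpenMP loops, and the thread setting applied before the AOCL/BLIS calls.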

Additional Info

  • Component: LOWOHA matmul operators
  • Affects: All backends (AOCL-DLP, BLIS, LibXSMM, oneDNN)

The purpose of this issue is to enable API-level thread control. Currently, the thread count can only be set via the OMP_NUM_THREADS environment variable; making params.num_threads work properly would allow per-call thread control.

Metadata

  • Assignees: No one assigned
  • Labels: No labels
  • Type: No type
  • Projects: No projects
  • Milestone: No milestone
  • Relationships: None yet
  • Development: No branches or pull requests
