Add support for runtime color/gamma correction and new color profiles. Drastically optimized color LUT generation.#187

Merged
Gericom merged 26 commits into Gericom:develop from VeaNika:feature/dynamic-LUT-color-profiles on May 25, 2025

Conversation

@VeaNika
Contributor

VeaNika commented May 1, 2025

This introduces a complete rewrite of the color correction LUT system in GBARunner3 with the following key improvements.

Summary of Changes

New runtime color correction system:

  • Replaces the old hardcoded matrix logic with selectable color profiles stored in ColorProfiles.h.
  • The color LUT is now generated at runtime by calling clut_generateColorLut(), and is initialized at boot by clut_initColorCorrection through GbaDisplayConfigurationService.cpp. This allows switching color profiles without recompiling.

Gamma handling separated and optimized:

  • Introduces GammaLut.cpp with precomputed gamma encode and decode LUTs for optimal speed.
  • Gamma correction is still precalculated with floating-point functions: fixed-point math was not precise enough and led to poor visual results.

Pipeline:

  • RGB color correction flows similarly to the previous implementation:

Extract RGB channels → Gamma encode (linear gamma) → Apply luminance → Apply RGB color profile from matrix → Gamma decode (non-linear gamma) → Pack back to RGB555 + GBA green bit.

  • All components are easy to modify, test and extend.
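
The stages above can be sketched roughly as follows. This is an illustration only, not the PR's code: the names SketchProfile, IDENTITY_SKETCH and correctColorSketch are hypothetical, the sketch uses float math and std::pow directly where the PR uses precomputed LUTs, it assumes each matrix row produces one output channel, and it leaves out the GBA green bit.

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>

// Hypothetical profile layout for this sketch (floats for readability).
struct SketchProfile
{
    float matrix[3][3]; // assumption: row i produces output channel i (R, G, B)
    float luminance;
};

// Placeholder profile: identity matrix, full luminance. Not a real profile.
const SketchProfile IDENTITY_SKETCH = {
    { { 1.f, 0.f, 0.f }, { 0.f, 1.f, 0.f }, { 0.f, 0.f, 1.f } },
    1.f
};

// Runs one RGB555 color through the stages listed above.
inline std::uint16_t correctColorSketch(std::uint16_t rgb555, const SketchProfile& profile,
                                        float encodeGamma, float decodeGamma)
{
    // Extract each 5-bit channel, gamma encode it, then apply luminance.
    float linear[3];
    for (int i = 0; i < 3; ++i)
    {
        float channel = ((rgb555 >> (5 * i)) & 0x1F) / 31.0f;
        linear[i] = std::pow(channel, encodeGamma) * profile.luminance;
    }

    std::uint16_t packed = 0;
    for (int row = 0; row < 3; ++row)
    {
        // Apply the color profile matrix.
        float mixed = 0.f;
        for (int col = 0; col < 3; ++col)
            mixed += profile.matrix[row][col] * linear[col];
        mixed = std::min(1.0f, std::max(0.0f, mixed));

        // Gamma decode, quantize to 5 bits and pack back into RGB555.
        float display = std::pow(mixed, decodeGamma);
        int quantized = static_cast<int>(std::lround(display * 31.0f));
        packed = static_cast<std::uint16_t>(packed | (quantized << (5 * row)));
    }
    return packed;
}
```

With the identity profile, an encode exponent of 2.0 and a decode exponent of 0.5, the two gamma stages cancel and the input color comes back unchanged, which makes a convenient sanity check.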

Improved precision:

  • Slight improvement of RGB8↔RGB5 conversion formulas (Thanks to @folf20).
  • New gamma encode/decode steps preserve the color quality and prevent banding or blacks being crushed (Thanks to @folf20).

New Features

Selectable color profiles:

  • Each profile contains:

    • A 3x3 color correction matrix (1000-based fixed point), taken from the RGB conversion values of the libretro GBA shaders.
    • A luminance factor.
    • Color profiles follow this format:
const ColorProfile PRESET_NAME = 
{
    {
        { [r], [gr], [br] },
        { [rg], [g], [bg] },
        { [rb], [gb], [b] }
    },
    luminance
};
  • Profiles can be swapped by calling clut_initColorCorrection(profile), passing a pointer to the desired ColorProfile.
  • The default profile, Agb001, produces the same result as the previous precomputed color LUT, and can now be changed at runtime.
  • Available color profile options:
    Agb001: AGB-001 GBA screen.
    Ags101: AGS-101 GBA SP bright screen.
    Oxy001: OXY-001 Game Boy Micro screen.
    Ntr001: NTR-001 NDS Phat screen.
    Usg001: USG-001 NDS Lite screen.
    PspO1g: PSP-1000 screen.
    NswIps: Nintendo Switch Classics GBA shader.
    NswOle: Nintendo Switch Classics GBA shader in OLED screen.
    VbaEmu: VisualBoy Advance shader.
    NoCash: No$GBA DS Phat shader.
    mGba01: mGBA default shader.
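
To make the profile format concrete, here is a minimal sketch of how such a profile could be declared and applied. Everything in it is an assumption for illustration: the struct layout is only inferred from the initializer format above, the luminance is assumed to use the same 1000-based scale as the matrix, applyProfileSketch is a hypothetical helper, and the identity values are placeholders rather than numbers from ColorProfiles.h.

```cpp
#include <cstdint>

// Assumed layout, inferred from the initializer format shown above.
struct ColorProfile
{
    std::int32_t matrix[3][3]; // 1000-based fixed point: 1000 == 1.0
    std::int32_t luminance;    // assumption: same 1000-based scale
};

// Placeholder profile for illustration only (identity matrix, full luminance).
constexpr ColorProfile IDENTITY_SKETCH = {
    {
        { 1000, 0, 0 },
        { 0, 1000, 0 },
        { 0, 0, 1000 }
    },
    1000
};

// Applies luminance and the matrix to three linear 8-bit channels,
// clamping each result to [0, 255]. Hypothetical helper, not the PR's code.
inline void applyProfileSketch(const ColorProfile& profile, const int in[3], int out[3])
{
    for (int row = 0; row < 3; ++row)
    {
        std::int64_t acc = 0;
        for (int col = 0; col < 3; ++col)
            acc += static_cast<std::int64_t>(profile.matrix[row][col]) * in[col];

        // Undo both 1000-based scales (matrix and luminance) in one division.
        acc = acc * profile.luminance / (1000 * 1000);

        if (acc < 0) acc = 0;
        if (acc > 255) acc = 255;
        out[row] = static_cast<int>(acc);
    }
}
```

Row 0 of the matrix is read here as the weights that produce the output red channel, which is how the { [r], [gr], [br] } row appears to be laid out; treat that orientation as an assumption as well.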

New gamma decode levels:

  • 5 precomputed gamma decode tables are available:
    • Decode Gamma was chosen as the tunable parameter because changing it yields the most perceptible difference to the human eye.
    • The gamma level is loaded from the GBARunner3.json configuration file, following this format:
Index 0 = gamma 0.5f,
Index 1 = gamma 0.6f,
Index 2 = gamma 0.7f,
Index 3 = gamma 0.8f,
Index 4 = gamma 0.9f
Default 0
  • The default index 0 produces the same result as the previous precomputed color LUT.
  • Set by calling setDisplayGammaIndex(index) via GbaDisplayConfigurationService.cpp.
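
The index-to-gamma mapping listed above is consistent with evenly spaced steps between the minimum and maximum gamma. A small sketch under that assumption follows; gammaForIndex and sanitizeGammaIndex are hypothetical helpers, and falling back to index 0 for out-of-range values is a guess at reasonable behavior, not something the PR states.

```cpp
#include <cassert>

constexpr int GAMMA_STEPS = 5;
constexpr float GAMMA_MIN = 0.5f;
constexpr float GAMMA_MAX = 0.9f;

// Falls back to the default index 0 when the configured value is out of range.
// Assumption: the PR may handle invalid values differently.
inline int sanitizeGammaIndex(int index)
{
    return (index >= 0 && index < GAMMA_STEPS) ? index : 0;
}

// Maps a gbaDisplayGamma index to its decode exponent, assuming the steps
// are evenly spaced between GAMMA_MIN and GAMMA_MAX.
inline float gammaForIndex(int index)
{
    assert(index >= 0 && index < GAMMA_STEPS);
    if (GAMMA_STEPS == 1)
        return GAMMA_MIN;
    return GAMMA_MIN + (GAMMA_MAX - GAMMA_MIN) * index / (GAMMA_STEPS - 1);
}
```

For example, gammaForIndex(2) evaluates to roughly 0.7, matching "Index 2 = gamma 0.7f" in the list above.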

Technical

Old Behavior

  • All the color correction LUT generation was hardcoded at compile-time.
  • Used floating-point std::pow() directly in the main process, making it difficult to replace with fixed point math.
  • Changing color profiles required recompiling the binary.

New Behavior

Fixed-point math:

  • All matrix operations use 1000-based fixed-point arithmetic.
  • Improves performance and avoids costly floating point operations on the DS.

Gamma LUTs:

  • Gamma is precomputed in smaller LUTs in a separate process:

  • Gamma Encode:

    • Transforms RGB8 to linear space using pow(r, TARGET_GAMMA + DARKEN_SCREEN).
    • Precomputed into a LUT (gamma_encode_table) with TARGET_GAMMA set to 2.f, for performance.
  • Gamma Decode:

    • Converts corrected linear values back to sRGB using one of five precomputed_decode_tables.
    • More efficient and configurable than the previous single pow(x, 1.0 / DISPLAY_GAMMA) calculation.
  • The number of gamma decode steps can be changed with GAMMA_STEPS, and the gamma range covered by those steps with GAMMA_MIN and GAMMA_MAX.

constexpr float TARGET_GAMMA = 2.f; // Default 2.2f
constexpr float DARKEN_SCREEN = 0.5f; // Default 0.5f

constexpr int GAMMA_STEPS = 5; // Min 1, Max 5. Default 5
constexpr float GAMMA_MIN = 0.5f; // default 0.5f
constexpr float GAMMA_MAX = 0.9f; // default 0.9f
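
As a rough sketch of how tables like these could be precomputed with std::pow, assuming 256-entry tables with 8-bit inputs and outputs (the table size and the helper names buildGammaTable and buildEncodeTable are assumptions, not taken from GammaLut.cpp):

```cpp
#include <array>
#include <cmath>
#include <cstdint>

constexpr float TARGET_GAMMA = 2.f;
constexpr float DARKEN_SCREEN = 0.5f;

// Assumption: one entry per 8-bit input value, storing an 8-bit result.
using GammaTable = std::array<std::uint8_t, 256>;

// Builds a table mapping an 8-bit input x to pow(x / 255, exponent),
// rescaled and rounded back to 8 bits.
inline GammaTable buildGammaTable(float exponent)
{
    GammaTable table{};
    for (int i = 0; i < 256; ++i)
    {
        float normalized = i / 255.0f;
        float curved = std::pow(normalized, exponent);
        table[i] = static_cast<std::uint8_t>(std::lround(curved * 255.0f));
    }
    return table;
}

// Encode table: exponent TARGET_GAMMA + DARKEN_SCREEN, as described above.
inline GammaTable buildEncodeTable()
{
    return buildGammaTable(TARGET_GAMMA + DARKEN_SCREEN);
}
```

A decode table for a given step would then be buildGammaTable(gamma) with gamma between GAMMA_MIN and GAMMA_MAX. Exponents above 1 darken mid-tones and exponents below 1 brighten them, so the encode and decode tables pull in opposite directions.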

TL;DR

  • Significant performance boost thanks to gamma LUTs and fixed-point matrix ops.
  • Color profiles and gamma can now be changed at runtime.
  • Easier to customize and expand, ideal for users who prefer different display styles.
  • Final binary size reduced to 201 KB.

Known Issues

  • Only 5 gamma decode LUTs can fit currently due to VRAM limitations. <-- More can fit now, but 5 is sufficient.
  • Since VRAM is full, future improvements may require:
    • Moving LUTs or some code to WRAM. <-- Done
    • Benchmarking per-pixel calculations vs LUT in runtime. <-- we still could check this
  • Due to the mentioned issues, this cannot be merged into hicode yet. <-- it fits now, I've moved the LUT cache buffer to main ram.

Updated Configuration File

{
    "runSettings": {
        "enableWramICache": true,
        "enableEwramDCache": true,
        "skipBiosIntro": false
    },
    "displaySettings": {
        "gbaScreen": "top",
        "gbaColorCorrection": "Oxy001",
        "gbaDisplayGamma": 0,
        "gbaScreenBrightness": 16,
        "enableCenterAndMask": true,
        "centerOffsetX": 8,
        "centerOffsetY": 16,
        "maskWidth": 240,
        "maskHeight": 160,
        "borderImage": "default"
    } 
}

Feel free to ping me on Discord when your review notes are ready!

VeaNika added 11 commits April 25, 2025 16:18
… profiles

- Removed hardcoded static color LUT
- Reworked ColorLut to calculate color correction LUT at runtime
- Added precompiled gamma tables (0.3f – 0.7f range, 5 steps)
- Introduced dynamic color profiles selectable at runtime
- Preparations for configurable Color Profiles and [DISPLAY_GAMMA] steps
- Moved color profile definitions to Colorprofiles.h, ditched cpp
- Simplified access to profiles in header files
- 5 precomputed DISPLAY_GAMMA curve steps for now
@Gericom
Owner

Gericom commented May 6, 2025

I see you changed the rgb8ToRgb5 function. This needs to be reverted. When the 2d engine converts from 5 to 6 bit, the lsb bit will always be zero (i.e. 31 -> 62). My original function took that into account for more accurate results. I tested color conversion extensively when I was working on pico launcher.
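
To illustrate the point about the 5-to-6-bit expansion, here is one possible way to account for it. This is a sketch, not the original rgb8ToRgb5 from the repository: all three function names are hypothetical, and it assumes the goal is to pick the 5-bit value whose expanded 6-bit level (out of a 0..63 range) is closest to the requested 8-bit intensity.

```cpp
#include <cstdint>

// The expansion described above: the extra LSB is always zero (31 -> 62).
constexpr int expand5To6(int c5)
{
    return c5 << 1;
}

// Naive conversion: nearest 5-bit value, treating 31 as full intensity.
constexpr int rgb8ToRgb5Naive(int v8)
{
    return (v8 * 31 + 127) / 255;
}

// Expansion-aware conversion (assumption, see lead-in): choose c5 so that
// expand5To6(c5) / 63 is as close as possible to v8 / 255.
// Solving 2 * c5 / 63 = v8 / 255 gives c5 = v8 * 63 / 510, rounded and clamped.
constexpr int rgb8ToRgb5ExpansionAware(int v8)
{
    return ((v8 * 63 + 255) / 510 > 31) ? 31 : (v8 * 63 + 255) / 510;
}
```

The two conversions agree for most inputs but differ near some rounding boundaries. For an 8-bit value of 250, the naive version returns 30 and the expansion-aware version returns 31.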

@Gericom
Owner

Gericom commented May 6, 2025

Is there a specific reason to use 1000-based fixed point? Usually power of 2 based is more performant. I already have this fixed point class you can use for that: https://github.com/Gericom/GBARunner3/blob/develop/code/core/arm9/source/Core/Math/fixed.h
On the DS it's common to use a 1.19.12 format (i.e. base 4096)
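
For readers unfamiliar with the notation, 1.19.12 means 1 sign bit, 19 integer bits and 12 fractional bits, so 1.0 is stored as 4096. The sketch below contrasts a multiply in that format with the 1000-based equivalent. It does not use the API of the linked fixed.h class (which is not reproduced here); toFx, mulFx and mul1000 are hypothetical free functions written for illustration.

```cpp
#include <cstdint>

constexpr int FX_SHIFT = 12;
constexpr std::int32_t FX_ONE = 1 << FX_SHIFT; // 4096 represents 1.0

// Converts a real value to 1.19.12 fixed point, rounding to nearest.
constexpr std::int32_t toFx(double value)
{
    return static_cast<std::int32_t>(value * FX_ONE + (value >= 0 ? 0.5 : -0.5));
}

// Power-of-two base: the rescale after multiplying is a single shift.
// Note: right-shifting a negative value is arithmetic on common compilers,
// but only guaranteed by the standard from C++20 onwards.
constexpr std::int32_t mulFx(std::int32_t a, std::int32_t b)
{
    return static_cast<std::int32_t>((static_cast<std::int64_t>(a) * b) >> FX_SHIFT);
}

// 1000-based: the rescale after multiplying needs an integer division.
constexpr std::int32_t mul1000(std::int32_t a, std::int32_t b)
{
    return static_cast<std::int32_t>((static_cast<std::int64_t>(a) * b) / 1000);
}
```

Both mulFx(toFx(0.5), toFx(0.5)) and mul1000(500, 500) represent 0.25 in their respective scales (1024 and 250); the difference is that the first rescales with a shift, which is generally cheaper than a division.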

@VeaNika
Contributor Author

VeaNika commented May 7, 2025

I see you changed the rgb8ToRgb5 function. This needs to be reverted. When the 2d engine converts from 5 to 6 bit, the lsb bit will always be zero (i.e. 31 -> 62). My original function took that into account for more accurate results. I tested color conversion extensively when I was working on pico launcher.

Ah I see, I was thinking in terms of general practice, but that makes sense given the DS hardware behavior. Thanks for pointing it out; I’ll revert the change.

Is there a specific reason to use 1000-based fixed point? Usually power of 2 based is more performant. I already have this fixed point class you can use for that: https://github.com/Gericom/GBARunner3/blob/develop/code/core/arm9/source/Core/Math/fixed.h On the DS it's common to use a 1.19.12 format (i.e. base 4096)

Makes sense, I defaulted to 1000-based out of habit, but using your fixed-point class is clearly a better fit here. I’ll update the code accordingly.

Planning to get both changes in before the weekend!

@VeaNika
Contributor Author

VeaNika commented May 10, 2025

Is there a specific reason to use 1000-based fixed point? Usually power of 2 based is more performant. I already have this fixed point class you can use for that: https://github.com/Gericom/GBARunner3/blob/develop/code/core/arm9/source/Core/Math/fixed.h On the DS it's common to use a 1.19.12 format (i.e. base 4096)

Done! I also updated the color matrices to the most recent ones from the libretro shaders. Feel free to ping me on Discord for your review!

VeaNika added 6 commits May 18, 2025 18:43
- Gamma value from settings is now passed in `GbaDisplayConfigurationService::SetupColorCorrection`
- ColorLut remains in VRAM for now due to performance reasons; future performance testing may justify moving ColorLut as well.
VeaNika added 2 commits May 24, 2025 19:07
- Removed unnecessary parentheses.
- Changed GammaLut values to u8.
- Fixed gbarunner9.ld indentation for consistency.
@Gericom Gericom merged commit 2414cf9 into Gericom:develop May 25, 2025
1 check passed

2 participants