Skip to content

[Draft] improve package load time#647

Open
giordano-lucas wants to merge 1 commit intomainfrom
bugfix/optimise-package-load
Open

[Draft] improve package load time#647
giordano-lucas wants to merge 1 commit intomainfrom
bugfix/optimise-package-load

Conversation

@giordano-lucas
Copy link
Member

@giordano-lucas giordano-lucas commented Jan 8, 2026

Summary by CodeRabbit

Release Notes

  • Performance Improvements
    • Optimized application startup with faster module initialization, reducing initial load time for telemetry, actions, and browser automation capabilities.
    • Deferred loading of non-essential components ensures improved application responsiveness and reduced memory overhead during startup.
    • Complete backward compatibility maintained—all existing features continue to work as expected with no breaking changes.

✏️ Tip: You can customize this high-level summary in your review settings.

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Jan 8, 2026

Walkthrough

This pull request introduces lazy initialization mechanisms across four modules to defer component setup until first use. The telemetry module defers posthog/scarf client creation and installation ID generation. The SDK init module replaces eager imports of action classes with runtime lazy-loading via a module-level getattr. The client module replaces GotoAction instantiation with a lightweight dictionary-based approach. The sessions module defers Playwright sync/async module loading checks until methods are actually called, reducing initialization overhead and dependencies at import time.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~50 minutes

The changes span four distinct modules with heterogeneous lazy-loading patterns that require separate reasoning: telemetry with state-managed initialization, SDK with module-level getattr dispatch, client with action descriptor replacement, and sessions with tri-state Playwright availability tracking. The modifications introduce behavioral changes affecting import-time vs. runtime initialization, require verification of backwards compatibility (e.g., INSTALLATION_ID alias), and demand careful review of state management and error handling implications across deferred initialization paths.

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.
Title check ✅ Passed The title 'improve package load time' is fully aligned with the changeset, which introduces lazy initialization and lazy-loading mechanisms across multiple modules specifically to reduce import time and defer initialization.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing touches
  • 📝 Generate docstrings

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🤖 Fix all issues with AI agents
In @packages/notte-core/src/notte_core/common/telemetry.py:
- Around line 205-206: The call to track_package_download({_installation_id,
"triggered_by":"first_telemetry_use"}) conflates package-download semantics with
telemetry initialization; change it to explicitly represent initialization and
ensure it only fires once per installation by (a) renaming the event to
something like "first_telemetry_initialized" instead of using
track_package_download, and (b) adding a per-installation guard keyed by
_installation_id (persisted flag or lookup) so the instrumentation only emits
once; update the invocation site that currently references _installation_id and
track_package_download to call the new telemetry event function (or
track_package_download with a renamed event) and add the once-per-installation
check before calling.
- Line 210: The module currently binds INSTALLATION_ID = _installation_id at
import time which freezes it to the initial empty string; replace that alias
with a lazy accessor function so callers get the initialized value: remove the
assignment INSTALLATION_ID = _installation_id and instead define
INSTALLATION_ID() (or get_installation_id()) that calls
_ensure_telemetry_initialized() and then returns _installation_id; keep the
existing _ensure_telemetry_initialized() logic unchanged and update any internal
call sites that expect INSTALLATION_ID as a string to call the accessor (or add
a short wrapper if you must keep a string-like API).
- Around line 191-203: The lazy initializer _ensure_telemetry_initialized is not
thread-safe; add a module-level threading.Lock (e.g., _init_lock =
threading.Lock()) and wrap the initialization block in with _init_lock:, using
the double-check pattern (re-check _telemetry_initialized and DISABLE_TELEMETRY
inside the lock) before calling get_or_create_installation_id(),
setup_posthog(), and setup_scarf(), then set _telemetry_initialized = True while
still under the lock to prevent concurrent initialization and duplicate client
creation.
🧹 Nitpick comments (1)
packages/notte-sdk/src/notte_sdk/__init__.py (1)

79-107: Consider deriving _action_classes from a single source of truth.

The action class names are now defined in three places: the TYPE_CHECKING imports (lines 15-42), __all__ (lines 51-76), and _action_classes (lines 80-107). This creates maintenance overhead and risk of drift.

Consider extracting to a single source:

_action_classes = frozenset({
    "CaptchaSolve",
    # ... all action names
})

# Then in __all__:
__all__ = [
    "NotteClient",
    "RemoteSession",
    "RemoteAgent",
    "retry",
    "generate_cookies",
    *_action_classes,
]

The TYPE_CHECKING block must remain separate for type checker support, but at least __all__ could be derived from _action_classes.

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Cache: Disabled due to data retention organization setting

Knowledge base: Disabled due to data retention organization setting

📥 Commits

Reviewing files that changed from the base of the PR and between 9cd006c and 3d4e5de.

📒 Files selected for processing (4)
  • packages/notte-core/src/notte_core/common/telemetry.py
  • packages/notte-sdk/src/notte_sdk/__init__.py
  • packages/notte-sdk/src/notte_sdk/client.py
  • packages/notte-sdk/src/notte_sdk/endpoints/sessions.py
🧰 Additional context used
🧬 Code graph analysis (2)
packages/notte-sdk/src/notte_sdk/client.py (1)
packages/notte-sdk/src/notte_sdk/endpoints/sessions.py (16)
  • execute (1253-1255)
  • execute (1257-1257)
  • execute (1259-1261)
  • execute (1263-1265)
  • execute (1267-1269)
  • execute (1271-1273)
  • execute (1275-1277)
  • execute (1279-1281)
  • execute (1283-1283)
  • execute (1285-1287)
  • execute (1289-1291)
  • execute (1293-1295)
  • execute (1297-1299)
  • execute (1301-1301)
  • execute (1303-1305)
  • execute (1307-1309)
packages/notte-sdk/src/notte_sdk/endpoints/sessions.py (2)
packages/notte-browser/src/notte_browser/playwright.py (1)
  • playwright (56-59)
packages/notte-sdk/src/notte_sdk/client.py (1)
  • NotteClient (26-190)
🪛 Ruff (0.14.10)
packages/notte-sdk/src/notte_sdk/__init__.py

125-125: Avoid specifying long messages outside the exception class

(TRY003)

🔇 Additional comments (11)
packages/notte-sdk/src/notte_sdk/client.py (1)

187-189: LGTM! Clean switch to dictionary-based action invocation.

The change from GotoAction(url=url) to execute(type="goto", url=url) correctly leverages the existing GotoActionDict overload, maintaining type safety while avoiding the eager import of the action class. Error handling is preserved.

packages/notte-sdk/src/notte_sdk/__init__.py (2)

110-125: LGTM! Well-implemented lazy loading with proper caching.

The __getattr__ implementation correctly:

  • Uses O(1) set lookup for action class names
  • Imports from the actions module only when requested
  • Caches in globals() for subsequent fast access
  • Raises AttributeError following standard module convention

The static analysis hint (TRY003) about the exception message is a minor style preference; the current message follows Python's standard module AttributeError format.


13-42: Good use of TYPE_CHECKING for type-safe lazy loading.

This pattern correctly provides type hints for static analysis tools and IDEs while avoiding the runtime import cost of 26+ action classes with their Pydantic schemas.

packages/notte-sdk/src/notte_sdk/endpoints/sessions.py (7)

85-90: Lazy loading state tracking looks correct.

The tri-state pattern (bool | None) correctly distinguishes between "not yet checked", "available", and "not available". Note that there's a minor race condition if multiple threads access page or apage concurrently before the first load completes—both may attempt the import simultaneously. This is benign (imports are idempotent in Python and the GIL protects against data corruption), but worth documenting if thread safety becomes a concern.


92-106: LGTM!

The lazy loading helper correctly defers the import until first use and caches the result. The early return on _playwright_available is not None prevents redundant import attempts.


108-121: LGTM!

Symmetric implementation with the sync counterpart. Correctly handles the async Playwright import lazily.


1040-1043: LGTM!

The lazy loading call is correctly placed before checking availability, and the error message provides clear installation instructions.


1063-1065: LGTM!

The assertion correctly narrows the type for the type checker. Since _playwright_context is initialized in the preceding conditional block (lines 1055-1058), it's guaranteed to be non-None at this point, but the type checker needs the explicit assertion to track this.


1100-1103: LGTM!

Consistent with the sync page property implementation. Correctly defers the async Playwright import until first access.


1122-1124: LGTM!

Symmetric with the sync version. The assertion correctly provides type narrowing for the async playwright context.

packages/notte-core/src/notte_core/common/telemetry.py (1)

121-122: LGTM: Lazy initialization integration points are correct.

The calls to _ensure_telemetry_initialized() are properly placed before telemetry usage, and the direct use of _installation_id in event properties is correct. The guard flag prevents any circular initialization issues.

Also applies to: 230-231, 242-242, 252-252

Comment on lines +191 to +203
def _ensure_telemetry_initialized() -> None:
"""Initialize telemetry clients lazily on first use."""
global posthog_client, scarf_client, _installation_id, _telemetry_initialized

if _telemetry_initialized or DISABLE_TELEMETRY:
return

# Get installation ID (does file I/O)
_installation_id = get_or_create_installation_id()
posthog_client = setup_posthog()
scarf_client = setup_scarf()
_telemetry_initialized = True

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Description: Check if the package is used in multi-threaded contexts

# Search for threading/async usage patterns that might call telemetry concurrently
echo "=== Threading/Async patterns ===" 
rg -n --type=py -C3 'threading\.(Thread|Lock)|asyncio\.(create_task|gather)|concurrent\.futures' | head -60

echo -e "\n=== Web framework usage ===" 
# Search for web framework usage that might run in multi-threaded mode
rg -n --type=py -C3 'from\s+(flask|django|fastapi|starlette)' | head -40

Repository: nottelabs/notte

Length of output: 2969


🏁 Script executed:

#!/bin/bash
# Description: Check telemetry module context and usage

# Get package type and description
echo "=== Package setup info ===" 
cat packages/notte-core/setup.py | head -50

echo -e "\n=== Telemetry module imports and full structure ===" 
head -50 packages/notte-core/src/notte_core/common/telemetry.py

echo -e "\n=== Check for synchronization mechanisms (locks) in telemetry module ===" 
rg -n 'Lock|RLock|Semaphore|threading' packages/notte-core/src/notte_core/common/telemetry.py

Repository: nottelabs/notte

Length of output: 1908


🏁 Script executed:

#!/bin/bash
# Description: Find all calls to _ensure_telemetry_initialized

echo "=== Calls to _ensure_telemetry_initialized ===" 
rg -n '_ensure_telemetry_initialized' --type=py

echo -e "\n=== Context around calls ===" 
rg -n '_ensure_telemetry_initialized' --type=py -B2 -A2

Repository: nottelabs/notte

Length of output: 1680


🏁 Script executed:

#!/bin/bash
# Description: Examine the full telemetry module to understand initialization context

echo "=== Full telemetry module (sections around calls) ===" 
# Get the functions that call _ensure_telemetry_initialized
sed -n '100,140p' packages/notte-core/src/notte_core/common/telemetry.py
echo "---"
sed -n '220,240p' packages/notte-core/src/notte_core/common/telemetry.py

echo -e "\n=== Global variables at top of module ===" 
sed -n '40,80p' packages/notte-core/src/notte_core/common/telemetry.py

Repository: nottelabs/notte

Length of output: 3390


🏁 Script executed:

#!/bin/bash
# Description: Check if telemetry functions are used in websocket/threading contexts

echo "=== Check if telemetry used in websocket start() ===" 
rg -n 'def start|_ensure_telemetry_initialized|log_event|capture' packages/notte-sdk/src/notte_sdk/websockets/base.py -B1 -A3

echo -e "\n=== Check if telemetry decorator applied to threaded functions ===" 
rg -n '@.*telemetry|@track|@capture' --type=py | head -20

echo -e "\n=== Check notte_core package description ===" 
cat packages/notte-core/pyproject.toml | grep -A20 'description\|long_description\|\[project\]' | head -30

Repository: nottelabs/notte

Length of output: 2974


🏁 Script executed:

#!/bin/bash
# Description: Check if initialization could race from multiple entry points

echo "=== Check if decorator is applied to public-facing functions ===" 
sed -n '150,190p' packages/notte-core/src/notte_core/common/telemetry.py

echo -e "\n=== Look for the track_event decorator ===" 
rg -n 'def track_event|def log_event|def capture_event' packages/notte-core/src/notte_core/common/telemetry.py -A10

Repository: nottelabs/notte

Length of output: 2156


Add thread safety synchronization to prevent race conditions in lazy initialization.

The lazy initialization pattern is vulnerable to race conditions in multi-threaded contexts. The codebase explicitly uses threading (e.g., notte-sdk/websockets/base.py), and the @track_usage decorator is applied to 20+ public SDK methods that can be called from concurrent threads. If two threads call _ensure_telemetry_initialized() simultaneously, both will see _telemetry_initialized = False and proceed to initialize, causing duplicate client creation, race conditions in file I/O (get_or_create_installation_id()), and potential duplicate tracking events.

Use a lock to protect the initialization:

import threading

_init_lock = threading.Lock()

def _ensure_telemetry_initialized() -> None:
    """Initialize telemetry clients lazily on first use (thread-safe)."""
    global posthog_client, scarf_client, _installation_id, _telemetry_initialized

    if _telemetry_initialized or DISABLE_TELEMETRY:
        return

    with _init_lock:
        # Double-check pattern to avoid redundant work
        if _telemetry_initialized:
            return
        
        _installation_id = get_or_create_installation_id()
        posthog_client = setup_posthog()
        scarf_client = setup_scarf()
        _telemetry_initialized = True
🤖 Prompt for AI Agents
In @packages/notte-core/src/notte_core/common/telemetry.py around lines 191 -
203, The lazy initializer _ensure_telemetry_initialized is not thread-safe; add
a module-level threading.Lock (e.g., _init_lock = threading.Lock()) and wrap the
initialization block in with _init_lock:, using the double-check pattern
(re-check _telemetry_initialized and DISABLE_TELEMETRY inside the lock) before
calling get_or_create_installation_id(), setup_posthog(), and setup_scarf(),
then set _telemetry_initialized = True while still under the lock to prevent
concurrent initialization and duplicate client creation.

Comment on lines +205 to +206
if _installation_id:
track_package_download(_installation_id, {"triggered_by": "first_telemetry_use"})
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Reconsider unconditional download tracking on first telemetry use.

Calling track_package_download on every telemetry initialization conflates "first telemetry use" with "package download". This may send misleading analytics, as capturing an event isn't the same as downloading/upgrading the package. While the version-checking logic inside track_package_download guards against some duplicates, the semantics here are confusing.

Consider either:

  1. Remove this call entirely and rely on explicit download tracking at appropriate points
  2. Rename to a different event like "first_telemetry_initialized" to clarify intent
  3. Add logic to ensure this only runs once per installation (not per initialization)
🤖 Prompt for AI Agents
In @packages/notte-core/src/notte_core/common/telemetry.py around lines 205 -
206, The call to track_package_download({_installation_id,
"triggered_by":"first_telemetry_use"}) conflates package-download semantics with
telemetry initialization; change it to explicitly represent initialization and
ensure it only fires once per installation by (a) renaming the event to
something like "first_telemetry_initialized" instead of using
track_package_download, and (b) adding a per-installation guard keyed by
_installation_id (persisted flag or lookup) so the instrumentation only emits
once; update the invocation site that currently references _installation_id and
track_package_download to call the new telemetry event function (or
track_package_download with a renamed event) and add the once-per-installation
check before calling.



# Backwards compatibility alias
INSTALLATION_ID = _installation_id
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

Critical: Backwards compatibility alias is broken.

The alias INSTALLATION_ID = _installation_id is created at module import time when _installation_id is still "". In Python, this creates a binding to the empty string value, not to the variable itself. When _installation_id is later reassigned in _ensure_telemetry_initialized(), INSTALLATION_ID remains an empty string forever, completely breaking backwards compatibility for any code relying on INSTALLATION_ID.

🔧 Proposed fix

Option 1 (Recommended): Make INSTALLATION_ID a lazy property function

-# Backwards compatibility alias
-INSTALLATION_ID = _installation_id
+def get_installation_id() -> str:
+    """Get installation ID, initializing telemetry if needed."""
+    _ensure_telemetry_initialized()
+    return _installation_id
+
+# Backwards compatibility - though this returns a function, not a string
+INSTALLATION_ID = get_installation_id  # Users would need to call INSTALLATION_ID()

Option 2: Remove the alias and document the breaking change

If backwards compatibility isn't critical, document that users should access _installation_id directly after calling _ensure_telemetry_initialized(), or use functions that trigger initialization.

Option 3: Update the alias after initialization

Add this at the end of _ensure_telemetry_initialized():

     # Check for first-time package download tracking
     if _installation_id:
         track_package_download(_installation_id, {"triggered_by": "first_telemetry_use"})
+    
+    # Update backwards compatibility alias
+    global INSTALLATION_ID
+    INSTALLATION_ID = _installation_id

Note: Option 3 still has issues since INSTALLATION_ID would be empty until first use.

Committable suggestion skipped: line range outside the PR's diff.

🤖 Prompt for AI Agents
In @packages/notte-core/src/notte_core/common/telemetry.py at line 210, The
module currently binds INSTALLATION_ID = _installation_id at import time which
freezes it to the initial empty string; replace that alias with a lazy accessor
function so callers get the initialized value: remove the assignment
INSTALLATION_ID = _installation_id and instead define INSTALLATION_ID() (or
get_installation_id()) that calls _ensure_telemetry_initialized() and then
returns _installation_id; keep the existing _ensure_telemetry_initialized()
logic unchanged and update any internal call sites that expect INSTALLATION_ID
as a string to call the accessor (or add a short wrapper if you must keep a
string-like API).

@giordano-lucas giordano-lucas changed the title improve package load time [Draft] improve package load time Jan 8, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant