Conversation
WalkthroughThis pull request introduces lazy initialization mechanisms across four modules to defer component setup until first use. The telemetry module defers posthog/scarf client creation and installation ID generation. The SDK init module replaces eager imports of action classes with runtime lazy-loading via a module-level getattr. The client module replaces GotoAction instantiation with a lightweight dictionary-based approach. The sessions module defers Playwright sync/async module loading checks until methods are actually called, reducing initialization overhead and dependencies at import time. Estimated code review effort🎯 4 (Complex) | ⏱️ ~50 minutes The changes span four distinct modules with heterogeneous lazy-loading patterns that require separate reasoning: telemetry with state-managed initialization, SDK with module-level getattr dispatch, client with action descriptor replacement, and sessions with tri-state Playwright availability tracking. The modifications introduce behavioral changes affecting import-time vs. runtime initialization, require verification of backwards compatibility (e.g., INSTALLATION_ID alias), and demand careful review of state management and error handling implications across deferred initialization paths. 🚥 Pre-merge checks | ✅ 3✅ Passed checks (3 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing touches
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 3
🤖 Fix all issues with AI agents
In @packages/notte-core/src/notte_core/common/telemetry.py:
- Around line 205-206: The call to track_package_download({_installation_id,
"triggered_by":"first_telemetry_use"}) conflates package-download semantics with
telemetry initialization; change it to explicitly represent initialization and
ensure it only fires once per installation by (a) renaming the event to
something like "first_telemetry_initialized" instead of using
track_package_download, and (b) adding a per-installation guard keyed by
_installation_id (persisted flag or lookup) so the instrumentation only emits
once; update the invocation site that currently references _installation_id and
track_package_download to call the new telemetry event function (or
track_package_download with a renamed event) and add the once-per-installation
check before calling.
- Line 210: The module currently binds INSTALLATION_ID = _installation_id at
import time which freezes it to the initial empty string; replace that alias
with a lazy accessor function so callers get the initialized value: remove the
assignment INSTALLATION_ID = _installation_id and instead define
INSTALLATION_ID() (or get_installation_id()) that calls
_ensure_telemetry_initialized() and then returns _installation_id; keep the
existing _ensure_telemetry_initialized() logic unchanged and update any internal
call sites that expect INSTALLATION_ID as a string to call the accessor (or add
a short wrapper if you must keep a string-like API).
- Around line 191-203: The lazy initializer _ensure_telemetry_initialized is not
thread-safe; add a module-level threading.Lock (e.g., _init_lock =
threading.Lock()) and wrap the initialization block in with _init_lock:, using
the double-check pattern (re-check _telemetry_initialized and DISABLE_TELEMETRY
inside the lock) before calling get_or_create_installation_id(),
setup_posthog(), and setup_scarf(), then set _telemetry_initialized = True while
still under the lock to prevent concurrent initialization and duplicate client
creation.
🧹 Nitpick comments (1)
packages/notte-sdk/src/notte_sdk/__init__.py (1)
79-107: Consider deriving_action_classesfrom a single source of truth.The action class names are now defined in three places: the
TYPE_CHECKINGimports (lines 15-42),__all__(lines 51-76), and_action_classes(lines 80-107). This creates maintenance overhead and risk of drift.Consider extracting to a single source:
_action_classes = frozenset({ "CaptchaSolve", # ... all action names }) # Then in __all__: __all__ = [ "NotteClient", "RemoteSession", "RemoteAgent", "retry", "generate_cookies", *_action_classes, ]The
TYPE_CHECKINGblock must remain separate for type checker support, but at least__all__could be derived from_action_classes.
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Cache: Disabled due to data retention organization setting
Knowledge base: Disabled due to data retention organization setting
📒 Files selected for processing (4)
packages/notte-core/src/notte_core/common/telemetry.pypackages/notte-sdk/src/notte_sdk/__init__.pypackages/notte-sdk/src/notte_sdk/client.pypackages/notte-sdk/src/notte_sdk/endpoints/sessions.py
🧰 Additional context used
🧬 Code graph analysis (2)
packages/notte-sdk/src/notte_sdk/client.py (1)
packages/notte-sdk/src/notte_sdk/endpoints/sessions.py (16)
execute(1253-1255)execute(1257-1257)execute(1259-1261)execute(1263-1265)execute(1267-1269)execute(1271-1273)execute(1275-1277)execute(1279-1281)execute(1283-1283)execute(1285-1287)execute(1289-1291)execute(1293-1295)execute(1297-1299)execute(1301-1301)execute(1303-1305)execute(1307-1309)
packages/notte-sdk/src/notte_sdk/endpoints/sessions.py (2)
packages/notte-browser/src/notte_browser/playwright.py (1)
playwright(56-59)packages/notte-sdk/src/notte_sdk/client.py (1)
NotteClient(26-190)
🪛 Ruff (0.14.10)
packages/notte-sdk/src/notte_sdk/__init__.py
125-125: Avoid specifying long messages outside the exception class
(TRY003)
🔇 Additional comments (11)
packages/notte-sdk/src/notte_sdk/client.py (1)
187-189: LGTM! Clean switch to dictionary-based action invocation.The change from
GotoAction(url=url)toexecute(type="goto", url=url)correctly leverages the existingGotoActionDictoverload, maintaining type safety while avoiding the eager import of the action class. Error handling is preserved.packages/notte-sdk/src/notte_sdk/__init__.py (2)
110-125: LGTM! Well-implemented lazy loading with proper caching.The
__getattr__implementation correctly:
- Uses O(1) set lookup for action class names
- Imports from the actions module only when requested
- Caches in
globals()for subsequent fast access- Raises
AttributeErrorfollowing standard module conventionThe static analysis hint (TRY003) about the exception message is a minor style preference; the current message follows Python's standard module
AttributeErrorformat.
13-42: Good use ofTYPE_CHECKINGfor type-safe lazy loading.This pattern correctly provides type hints for static analysis tools and IDEs while avoiding the runtime import cost of 26+ action classes with their Pydantic schemas.
packages/notte-sdk/src/notte_sdk/endpoints/sessions.py (7)
85-90: Lazy loading state tracking looks correct.The tri-state pattern (
bool | None) correctly distinguishes between "not yet checked", "available", and "not available". Note that there's a minor race condition if multiple threads accesspageorapageconcurrently before the first load completes—both may attempt the import simultaneously. This is benign (imports are idempotent in Python and the GIL protects against data corruption), but worth documenting if thread safety becomes a concern.
92-106: LGTM!The lazy loading helper correctly defers the import until first use and caches the result. The early return on
_playwright_available is not Noneprevents redundant import attempts.
108-121: LGTM!Symmetric implementation with the sync counterpart. Correctly handles the async Playwright import lazily.
1040-1043: LGTM!The lazy loading call is correctly placed before checking availability, and the error message provides clear installation instructions.
1063-1065: LGTM!The assertion correctly narrows the type for the type checker. Since
_playwright_contextis initialized in the preceding conditional block (lines 1055-1058), it's guaranteed to be non-None at this point, but the type checker needs the explicit assertion to track this.
1100-1103: LGTM!Consistent with the sync
pageproperty implementation. Correctly defers the async Playwright import until first access.
1122-1124: LGTM!Symmetric with the sync version. The assertion correctly provides type narrowing for the async playwright context.
packages/notte-core/src/notte_core/common/telemetry.py (1)
121-122: LGTM: Lazy initialization integration points are correct.The calls to
_ensure_telemetry_initialized()are properly placed before telemetry usage, and the direct use of_installation_idin event properties is correct. The guard flag prevents any circular initialization issues.Also applies to: 230-231, 242-242, 252-252
| def _ensure_telemetry_initialized() -> None: | ||
| """Initialize telemetry clients lazily on first use.""" | ||
| global posthog_client, scarf_client, _installation_id, _telemetry_initialized | ||
|
|
||
| if _telemetry_initialized or DISABLE_TELEMETRY: | ||
| return | ||
|
|
||
| # Get installation ID (does file I/O) | ||
| _installation_id = get_or_create_installation_id() | ||
| posthog_client = setup_posthog() | ||
| scarf_client = setup_scarf() | ||
| _telemetry_initialized = True | ||
|
|
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Description: Check if the package is used in multi-threaded contexts
# Search for threading/async usage patterns that might call telemetry concurrently
echo "=== Threading/Async patterns ==="
rg -n --type=py -C3 'threading\.(Thread|Lock)|asyncio\.(create_task|gather)|concurrent\.futures' | head -60
echo -e "\n=== Web framework usage ==="
# Search for web framework usage that might run in multi-threaded mode
rg -n --type=py -C3 'from\s+(flask|django|fastapi|starlette)' | head -40Repository: nottelabs/notte
Length of output: 2969
🏁 Script executed:
#!/bin/bash
# Description: Check telemetry module context and usage
# Get package type and description
echo "=== Package setup info ==="
cat packages/notte-core/setup.py | head -50
echo -e "\n=== Telemetry module imports and full structure ==="
head -50 packages/notte-core/src/notte_core/common/telemetry.py
echo -e "\n=== Check for synchronization mechanisms (locks) in telemetry module ==="
rg -n 'Lock|RLock|Semaphore|threading' packages/notte-core/src/notte_core/common/telemetry.pyRepository: nottelabs/notte
Length of output: 1908
🏁 Script executed:
#!/bin/bash
# Description: Find all calls to _ensure_telemetry_initialized
echo "=== Calls to _ensure_telemetry_initialized ==="
rg -n '_ensure_telemetry_initialized' --type=py
echo -e "\n=== Context around calls ==="
rg -n '_ensure_telemetry_initialized' --type=py -B2 -A2Repository: nottelabs/notte
Length of output: 1680
🏁 Script executed:
#!/bin/bash
# Description: Examine the full telemetry module to understand initialization context
echo "=== Full telemetry module (sections around calls) ==="
# Get the functions that call _ensure_telemetry_initialized
sed -n '100,140p' packages/notte-core/src/notte_core/common/telemetry.py
echo "---"
sed -n '220,240p' packages/notte-core/src/notte_core/common/telemetry.py
echo -e "\n=== Global variables at top of module ==="
sed -n '40,80p' packages/notte-core/src/notte_core/common/telemetry.pyRepository: nottelabs/notte
Length of output: 3390
🏁 Script executed:
#!/bin/bash
# Description: Check if telemetry functions are used in websocket/threading contexts
echo "=== Check if telemetry used in websocket start() ==="
rg -n 'def start|_ensure_telemetry_initialized|log_event|capture' packages/notte-sdk/src/notte_sdk/websockets/base.py -B1 -A3
echo -e "\n=== Check if telemetry decorator applied to threaded functions ==="
rg -n '@.*telemetry|@track|@capture' --type=py | head -20
echo -e "\n=== Check notte_core package description ==="
cat packages/notte-core/pyproject.toml | grep -A20 'description\|long_description\|\[project\]' | head -30Repository: nottelabs/notte
Length of output: 2974
🏁 Script executed:
#!/bin/bash
# Description: Check if initialization could race from multiple entry points
echo "=== Check if decorator is applied to public-facing functions ==="
sed -n '150,190p' packages/notte-core/src/notte_core/common/telemetry.py
echo -e "\n=== Look for the track_event decorator ==="
rg -n 'def track_event|def log_event|def capture_event' packages/notte-core/src/notte_core/common/telemetry.py -A10Repository: nottelabs/notte
Length of output: 2156
Add thread safety synchronization to prevent race conditions in lazy initialization.
The lazy initialization pattern is vulnerable to race conditions in multi-threaded contexts. The codebase explicitly uses threading (e.g., notte-sdk/websockets/base.py), and the @track_usage decorator is applied to 20+ public SDK methods that can be called from concurrent threads. If two threads call _ensure_telemetry_initialized() simultaneously, both will see _telemetry_initialized = False and proceed to initialize, causing duplicate client creation, race conditions in file I/O (get_or_create_installation_id()), and potential duplicate tracking events.
Use a lock to protect the initialization:
import threading
_init_lock = threading.Lock()
def _ensure_telemetry_initialized() -> None:
"""Initialize telemetry clients lazily on first use (thread-safe)."""
global posthog_client, scarf_client, _installation_id, _telemetry_initialized
if _telemetry_initialized or DISABLE_TELEMETRY:
return
with _init_lock:
# Double-check pattern to avoid redundant work
if _telemetry_initialized:
return
_installation_id = get_or_create_installation_id()
posthog_client = setup_posthog()
scarf_client = setup_scarf()
_telemetry_initialized = True🤖 Prompt for AI Agents
In @packages/notte-core/src/notte_core/common/telemetry.py around lines 191 -
203, The lazy initializer _ensure_telemetry_initialized is not thread-safe; add
a module-level threading.Lock (e.g., _init_lock = threading.Lock()) and wrap the
initialization block in with _init_lock:, using the double-check pattern
(re-check _telemetry_initialized and DISABLE_TELEMETRY inside the lock) before
calling get_or_create_installation_id(), setup_posthog(), and setup_scarf(),
then set _telemetry_initialized = True while still under the lock to prevent
concurrent initialization and duplicate client creation.
| if _installation_id: | ||
| track_package_download(_installation_id, {"triggered_by": "first_telemetry_use"}) |
There was a problem hiding this comment.
Reconsider unconditional download tracking on first telemetry use.
Calling track_package_download on every telemetry initialization conflates "first telemetry use" with "package download". This may send misleading analytics, as capturing an event isn't the same as downloading/upgrading the package. While the version-checking logic inside track_package_download guards against some duplicates, the semantics here are confusing.
Consider either:
- Remove this call entirely and rely on explicit download tracking at appropriate points
- Rename to a different event like "first_telemetry_initialized" to clarify intent
- Add logic to ensure this only runs once per installation (not per initialization)
🤖 Prompt for AI Agents
In @packages/notte-core/src/notte_core/common/telemetry.py around lines 205 -
206, The call to track_package_download({_installation_id,
"triggered_by":"first_telemetry_use"}) conflates package-download semantics with
telemetry initialization; change it to explicitly represent initialization and
ensure it only fires once per installation by (a) renaming the event to
something like "first_telemetry_initialized" instead of using
track_package_download, and (b) adding a per-installation guard keyed by
_installation_id (persisted flag or lookup) so the instrumentation only emits
once; update the invocation site that currently references _installation_id and
track_package_download to call the new telemetry event function (or
track_package_download with a renamed event) and add the once-per-installation
check before calling.
|
|
||
|
|
||
| # Backwards compatibility alias | ||
| INSTALLATION_ID = _installation_id |
There was a problem hiding this comment.
Critical: Backwards compatibility alias is broken.
The alias INSTALLATION_ID = _installation_id is created at module import time when _installation_id is still "". In Python, this creates a binding to the empty string value, not to the variable itself. When _installation_id is later reassigned in _ensure_telemetry_initialized(), INSTALLATION_ID remains an empty string forever, completely breaking backwards compatibility for any code relying on INSTALLATION_ID.
🔧 Proposed fix
Option 1 (Recommended): Make INSTALLATION_ID a lazy property function
-# Backwards compatibility alias
-INSTALLATION_ID = _installation_id
+def get_installation_id() -> str:
+ """Get installation ID, initializing telemetry if needed."""
+ _ensure_telemetry_initialized()
+ return _installation_id
+
+# Backwards compatibility - though this returns a function, not a string
+INSTALLATION_ID = get_installation_id # Users would need to call INSTALLATION_ID()Option 2: Remove the alias and document the breaking change
If backwards compatibility isn't critical, document that users should access _installation_id directly after calling _ensure_telemetry_initialized(), or use functions that trigger initialization.
Option 3: Update the alias after initialization
Add this at the end of _ensure_telemetry_initialized():
# Check for first-time package download tracking
if _installation_id:
track_package_download(_installation_id, {"triggered_by": "first_telemetry_use"})
+
+ # Update backwards compatibility alias
+ global INSTALLATION_ID
+ INSTALLATION_ID = _installation_idNote: Option 3 still has issues since INSTALLATION_ID would be empty until first use.
Committable suggestion skipped: line range outside the PR's diff.
🤖 Prompt for AI Agents
In @packages/notte-core/src/notte_core/common/telemetry.py at line 210, The
module currently binds INSTALLATION_ID = _installation_id at import time which
freezes it to the initial empty string; replace that alias with a lazy accessor
function so callers get the initialized value: remove the assignment
INSTALLATION_ID = _installation_id and instead define INSTALLATION_ID() (or
get_installation_id()) that calls _ensure_telemetry_initialized() and then
returns _installation_id; keep the existing _ensure_telemetry_initialized()
logic unchanged and update any internal call sites that expect INSTALLATION_ID
as a string to call the accessor (or add a short wrapper if you must keep a
string-like API).
Summary by CodeRabbit
Release Notes
✏️ Tip: You can customize this high-level summary in your review settings.