feat: Switch to using Github app based access tokens instead of personal access token #10
Pull request overview
This pull request switches the authentication mechanism from GitHub personal access tokens to GitHub App installation access tokens. The change introduces a JWT-based authentication flow where a GitHub App JWT is exchanged for short-lived installation access tokens (valid for 1 hour) on a per-repository basis.
Changes:
- Adds a new `get_installation_access_token()` function that fetches and caches GitHub App installation tokens
- Implements token caching with automatic refresh when tokens have less than 60 seconds remaining
- Updates the main ETL loop to fetch installation-specific tokens for each repository
cgsheeh left a comment
Could you add tests for this as well?
Pull request overview
Copilot reviewed 1 out of 1 changed files in this pull request and generated 3 comments.
    Uses the JWT to look up the installation for the given repo, then exchanges
    it for an installation access token (valid for 1 hour). Tokens are cached
    per installation ID so that repos sharing an installation reuse the same token,
    while repos on different installations each get their own. The repo->installation
    ID mapping is also cached since it never changes.
The docstring and caching strategy assume the repo→installation ID mapping "never changes", but GitHub App installations can change (repo transfer, uninstall/reinstall, etc.). Caching this mapping indefinitely can leave the process stuck using a stale installation_id; consider adding a TTL / cache invalidation path (e.g., on 401/404 when exchanging tokens, evict and re-fetch the installation).
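One way the suggested invalidation path might look is sketched below. It assumes `session` behaves like a `requests.Session` that already carries the App JWT, and it reuses the `repo_installation_cache` and `github_api_url` names from the diff; `lookup_installation_id` and `exchange_for_token` are hypothetical helpers, not functions from this PR. On a 401 or 404 from the token exchange, the cached mapping is evicted and the lookup is retried once.

```python
from typing import Any, Dict

# Name taken from the diff: repo ("owner/name") -> installation ID.
repo_installation_cache: Dict[str, int] = {}


def lookup_installation_id(session: Any, github_api_url: str, repo: str) -> int:
    """Return the installation ID for repo, consulting the cache first."""
    installation_id = repo_installation_cache.get(repo)
    if installation_id is None:
        resp = session.get(f"{github_api_url}/repos/{repo}/installation")
        resp.raise_for_status()
        installation_id = resp.json()["id"]
        repo_installation_cache[repo] = installation_id
    return installation_id


def exchange_for_token(session: Any, github_api_url: str, repo: str) -> str:
    """Exchange the JWT for an installation token, retrying once on a stale mapping."""
    installation_id = lookup_installation_id(session, github_api_url, repo)
    url = f"{github_api_url}/app/installations/{installation_id}/access_tokens"
    resp = session.post(url)
    if resp.status_code in (401, 404):
        # The cached installation may be stale (uninstall/reinstall, repo transfer):
        # evict it, look it up again, and retry the exchange a single time.
        repo_installation_cache.pop(repo, None)
        installation_id = lookup_installation_id(session, github_api_url, repo)
        url = f"{github_api_url}/app/installations/{installation_id}/access_tokens"
        resp = session.post(url)
    resp.raise_for_status()
    return resp.json()["token"]
```

Retrying only once keeps a genuinely revoked installation from looping; the second failure surfaces through `raise_for_status()`.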
    installation_id = repo_installation_cache.get(repo)
    if installation_id is None:
        resp = session.get(f"{github_api_url}/repos/{repo}/installation")
        if (
The new installation/token HTTP calls don't set a request timeout. Without a timeout, the ETL can hang indefinitely on network issues; pass an explicit timeout (and ideally reuse a shared default) for these GitHub API calls.
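A shared default timeout could be applied through thin wrappers like the ones below. This is only a sketch: `DEFAULT_TIMEOUT`, `get_with_timeout`, and `post_with_timeout` are hypothetical names, the specific values are assumptions rather than anything from this PR, and `session` is assumed to be a `requests.Session`, whose `get` and `post` accept a `timeout` keyword as either a single number or a `(connect, read)` tuple.

```python
from typing import Any, Tuple, Union

Timeout = Union[float, Tuple[float, float]]

# Assumed values: 5 s to connect, 30 s to read. Tune for the actual deployment.
DEFAULT_TIMEOUT: Timeout = (5.0, 30.0)


def get_with_timeout(
    session: Any, url: str, timeout: Timeout = DEFAULT_TIMEOUT, **kwargs: Any
) -> Any:
    """GET via session, always passing an explicit timeout."""
    return session.get(url, timeout=timeout, **kwargs)


def post_with_timeout(
    session: Any, url: str, timeout: Timeout = DEFAULT_TIMEOUT, **kwargs: Any
) -> Any:
    """POST via session, always passing an explicit timeout."""
    return session.post(url, timeout=timeout, **kwargs)
```

With wrappers like these, the lookup in the diff would become `get_with_timeout(session, f"{github_api_url}/repos/{repo}/installation")`, and a caller can still override the default per request.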
    try:
        return _main()
    except RuntimeError as e:
        logger.error(str(e))
        return 1
main() only catches RuntimeError, but other failures in the pipeline (e.g., BigQuery client exceptions or the generic Exception raised in load_data()) will bypass this handler and produce a stack trace / non-uniform exit behavior. Either broaden this to catch Exception (while allowing KeyboardInterrupt/SystemExit through), or consistently raise RuntimeError for expected operational failures so main() can handle them uniformly.
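The first option, broadening the handler, might be sketched as follows. `_main` here is a stand-in for the real pipeline entry point, and the `run` parameter is a hypothetical addition that exists only so the handler can be exercised in isolation. Catching `Exception` leaves `KeyboardInterrupt` and `SystemExit` untouched, because both derive from `BaseException` rather than `Exception`.

```python
import logging

logger = logging.getLogger(__name__)


def _main() -> int:
    """Stand-in for the real ETL entry point (assumption for this sketch)."""
    return 0


def main(run=_main) -> int:
    """Run the pipeline and map any ordinary failure to exit code 1."""
    try:
        return run()
    except Exception:
        # logger.exception records the message together with the traceback,
        # so the failure stays diagnosable while the exit code stays uniform.
        logger.exception("ETL run failed")
        return 1
```

A caller would typically end the script with `sys.exit(main())` so the return value becomes the process exit status.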
No description provided.