
Add auto-install for SCIP indexer tools#237

Open
wende wants to merge 1 commit into main from feat/auto-install-scip-tools


@wende (Owner) commented Feb 7, 2026

Summary

  • When a SCIP indexer binary isn't found, automatically install it using the language's native package manager (go install, gem install, dart pub, dotnet tool, rustup, coursier, npm)
  • Adds generic SCIPToolInstaller class with 7 install methods and InstallConfig dataclass
  • Integrates auto-install into ConfigurableSCIPIndexer._find_executable() and RustSCIPIndexer._ensure_rust_analyzer_installed()

Test plan

  • All 3680 existing tests pass
  • New unit tests for each install method (success, runtime missing, subprocess failure)
  • Integration tests for _find_executable auto-install path
  • Integration tests for Rust auto-install path
  • Verified C/C++ correctly have no auto-install config

When a SCIP indexer binary isn't found, automatically install it using
the language's native package manager (go install, gem install, dart pub,
dotnet tool, rustup, coursier, npm). This removes a major accessibility
barrier: users no longer need to manually install indexer tools.

- Add generic SCIPToolInstaller with 7 install methods
- Add install_config field to SCIPLanguageConfig for go, ruby, dart,
  java, scala, csharp, vb (c/cpp have no auto-install)
- Integrate auto-install into ConfigurableSCIPIndexer._find_executable()
- Add rust-analyzer auto-install via rustup in RustSCIPIndexer

@gemini-code-assist (Contributor)

Summary of Changes

Hello @wende, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request introduces a robust auto-installation system for SCIP indexer tools, significantly improving the developer experience by automatically resolving missing dependencies. By centralizing installation logic and integrating it into the core indexer framework, it streamlines the setup process for various programming languages, making the system more self-sufficient and user-friendly.

Highlights

  • Generic Auto-Installation Framework: A new SCIPToolInstaller class and InstallConfig dataclass have been introduced to provide a generic mechanism for automatically installing SCIP indexer tools using various native package managers (npm, go, gem, dart pub, dotnet, rustup, coursier).
  • Integrated Auto-Install Logic: The auto-installation functionality is integrated into ConfigurableSCIPIndexer._find_executable() and RustSCIPIndexer._ensure_rust_analyzer_installed(), allowing the system to automatically install missing indexers for supported languages like Go, Ruby, Dart, Java, Scala, C#, VB, and Rust.
  • Enhanced SCIPLanguageConfig: The SCIPLanguageConfig now includes an install_config field, enabling language-specific auto-installation settings to be defined centrally.
  • Non-Interactive npx Commands: Existing npx commands used for JavaScript and TypeScript SCIP index generation have been updated with the --yes flag to ensure non-interactive execution.
  • Comprehensive Testing: New unit tests cover each installation method's success and failure scenarios, along with integration tests for the auto-install paths in _find_executable and RustSCIPIndexer.
Changelog
  • Makefile
    • Added --yes flag to npx command for JavaScript SCIP index generation to ensure non-interactive execution.
  • cicada/languages/rust/indexer.py
    • Modified _ensure_rust_analyzer_installed to use the new SCIPToolInstaller for automatic installation of rust-analyzer if it's not found.
  • cicada/languages/scip/configs.py
    • Added install_config: InstallConfig | None field to SCIPLanguageConfig.
    • Introduced _make_install_configs() and _get_install_config() to lazily define and retrieve auto-install configurations for Go, Ruby, Dart, Java, Scala, C#, and VB.
    • Updated LANGUAGE_CONFIGS to include the new install_config for supported languages.
  • cicada/languages/scip/configurable_indexer.py
    • Modified _find_executable to check for an install_config and attempt auto-installation via SCIPToolInstaller.try_install() if the required executable is not found. If successful, the command is updated with the installed path.
  • cicada/languages/scip/installer.py
    • New file: Implemented InstallMethod enum, InstallConfig dataclass, and the SCIPToolInstaller class with methods for installing tools via npm, go, gem, dart pub, dotnet, rustup, and coursier.
  • cicada/languages/typescript/indexer.py
    • Added --yes flag to npx command for TypeScript SCIP index generation.
  • tests/fixtures/.cicada/hashes.json
    • Updated last_updated timestamp.
  • tests/languages/scip/test_scip_tool_installer.py
    • New file: Added extensive unit tests for InstallConfig and SCIPToolInstaller, covering runtime checks, successful installations, and failure scenarios for all supported package managers.
    • Included integration tests for the auto-install logic within ConfigurableSCIPIndexer and RustSCIPIndexer.
  • tests/languages/typescript/test_javascript_indexer.py
    • Updated npx command in tests to include --yes.
  • tests/languages/typescript/test_javascript_integration.py
    • Updated npx command in test setup and skip messages to include --yes.
  • tests/languages/typescript/test_typescript_indexer.py
    • Updated npx command in tests to include --yes.
  • tests/setup_fixtures.sh
    • Added --yes flag to npx command for SCIP TypeScript index generation in the setup script.
Activity
  • The author, wende, has introduced a significant feature to automatically install missing SCIP indexer tools, enhancing the system's robustness and ease of use.
  • This pull request includes the creation of a new Python module (installer.py) dedicated to handling various package manager installations.
  • The core logic for finding and executing indexers has been updated to incorporate this new auto-installation capability.
  • Comprehensive test coverage has been added to ensure the reliability of the new installation methods across different languages and scenarios.
  • Minor adjustments were made to existing scripts and test files to align with non-interactive npx command execution.

sourcery-ai bot (Contributor) left a comment

Hey, I've found 1 security issue and 2 other issues.

Security issues:

  • Detected subprocess function 'run' without a static string. If this data can be controlled by a malicious actor, it may be an instance of command injection. Audit the use of this call to ensure it is not controllable by an external resource. You may consider using 'shlex.quote()'.
Prompt for AI Agents
Please address the comments from this code review:

## Individual Comments

### Comment 1
<location> `cicada/languages/scip/installer.py:135-58` </location>
<code_context>
+        return None
+
+    @classmethod
+    def _install_gem(cls, config: InstallConfig, verbose: bool) -> str | None:
+        """Install via gem install."""
+        result = subprocess.run(
+            ["gem", "install", config.package, "--no-document"],
+            capture_output=True,
+            text=True,
+            timeout=180,
+        )
+
+        if result.returncode != 0:
+            if verbose:
+                print(f"  gem install failed: {result.stderr}")
+            return None
+
+        path = shutil.which(config.executable)
</code_context>

<issue_to_address>
**suggestion:** Gem-installed binaries may not be found via PATH; consider probing standard gem bin directories.

Unlike the Go/Dart/Dotnet helpers, this relies only on `shutil.which(config.executable)` after `gem install`. If the gem bin dir isn’t on PATH (common on some distros/CI images), this will look like a failed install even when it succeeded. Consider checking standard gem bin locations (e.g., from `gem environment` or `$HOME/.gem/.../bin`) before falling back to `which` so installs work reliably across environments.

Suggested implementation:

```python
    @classmethod
    def _install_gem(cls, config: InstallConfig, verbose: bool) -> str | None:
        """Install via gem install."""
        result = subprocess.run(
            ["gem", "install", config.package, "--no-document"],
            capture_output=True,
            text=True,
            timeout=180,
        )

        if result.returncode != 0:
            if verbose:
                print(f"  gem install failed: {result.stderr}")
            return None

        # Probe standard gem bin directories first, then fall back to PATH.
        gem_bin_dirs = set()

        # Prefer gem-reported directories for robustness across environments.
        for key in ("gemdir", "user_gemdir"):
            try:
                env = subprocess.run(
                    ["gem", "environment", key],
                    capture_output=True,
                    text=True,
                    timeout=30,
                )
                if env.returncode == 0:
                    value = env.stdout.strip()
                    if value:
                        gem_bin_dirs.add(Path(value) / "bin")
            except Exception as exc:  # pragma: no cover - defensive
                if verbose:
                    print(f"  failed to query gem environment {key}: {exc}")

        # Common fallback: ~/.gem/.../bin
        try:
            home = Path.home()
            gem_bin_dirs.add(home / ".gem" / "bin")
        except Exception:
            # If we can't resolve $HOME, just skip this fallback.
            pass

        for bin_dir in gem_bin_dirs:
            candidate = bin_dir / config.executable
            if candidate.is_file() and os.access(candidate, os.X_OK):
                if verbose:
                    print(f"  Installed {config.executable} to {candidate}")
                return str(candidate)

        path = shutil.which(config.executable)
        if path:
            return path

        return None

```

This edit assumes `Path` from `pathlib` and `os` are already imported in `cicada/languages/scip/installer.py`. If they are not, add:

- `from pathlib import Path`
- `import os`

to the import section near the top of the file, following existing style and ordering.
</issue_to_address>

### Comment 2
<location> `cicada/languages/typescript/indexer.py:51` </location>
<code_context>
         # Security audit: Command uses list-form arguments (not shell=True),
         # so no command injection risk. All arguments are hardcoded strings.
-        cmd = ["npx", "@sourcegraph/scip-typescript", "index"]
+        cmd = ["npx", "--yes", "@sourcegraph/scip-typescript", "index"]
         scip_file = repo_path / "index.scip"

</code_context>

<issue_to_address>
**question:** Using `npx --yes` can break on older npm versions that don’t support this flag.

This flag avoids prompts but is unsupported in older npm/npx versions (e.g., npm < 7), causing the command to fail there. If this tool is expected to run on varied environments, consider either checking the npm/npx version before adding `--yes` or falling back to plain `npx` when the flag isn’t supported to avoid failures on older toolchains.
</issue_to_address>

### Comment 3
<location> `cicada/languages/scip/installer.py:243-248` </location>
<code_context>
        result = subprocess.run(
            [cs_cmd, "install", config.package],
            capture_output=True,
            text=True,
            timeout=180,
        )
</code_context>

<issue_to_address>
**security (python.lang.security.audit.dangerous-subprocess-use-audit):** Detected subprocess function 'run' without a static string. If this data can be controlled by a malicious actor, it may be an instance of command injection. Audit the use of this call to ensure it is not controllable by an external resource. You may consider using 'shlex.quote()'.

*Source: opengrep*
</issue_to_address>


if config.runtime_check and not shutil.which(config.runtime_check):
    if verbose:
        print(f"  {config.runtime_check} not found - cannot install {config.executable}")
    return None

suggestion: Gem-installed binaries may not be found via PATH; consider probing standard gem bin directories.

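Rather than hard-coding `~/.gem/bin` (user installs typically land in a versioned directory such as `~/.gem/ruby/<version>/bin`), the candidate directories can be requested from RubyGems itself via `Gem.bindir` and `Gem.user_dir`. A sketch, assuming a `ruby` executable is available wherever `gem` is; `gem_bin_dirs` is a hypothetical helper. It returns an ordered list because a `set`, as used in the suggested code, does not preserve probe order.

```python
import subprocess
from pathlib import Path
from typing import Callable

# Gem.bindir is the executable directory RubyGems actually installs to, which
# can differ from <gemdir>/bin on some distributions; Gem.user_dir is the
# --user-install root. Both are RubyGems APIs queried through `ruby -e`.
_RUBY_PROBE = 'puts Gem.bindir; puts File.join(Gem.user_dir, "bin")'


def gem_bin_dirs(run: Callable[..., object] = subprocess.run) -> list[Path]:
    """Return candidate gem bin directories in a stable, de-duplicated order."""
    try:
        result = run(["ruby", "-e", _RUBY_PROBE], capture_output=True, text=True, timeout=30)
    except (OSError, subprocess.TimeoutExpired):
        return []  # no ruby, or it hung: caller falls back to shutil.which
    if result.returncode != 0:
        return []
    dirs: list[Path] = []
    for line in result.stdout.splitlines():
        line = line.strip()
        if line and Path(line) not in dirs:
            dirs.append(Path(line))
    return dirs
```

The returned directories can then be probed with the same `is_file()` and `os.access(..., os.X_OK)` check the suggestion uses, before falling back to `shutil.which`.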

 # Security audit: Command uses list-form arguments (not shell=True),
 # so no command injection risk. All arguments are hardcoded strings.
-cmd = ["npx", "@sourcegraph/scip-typescript", "index"]
+cmd = ["npx", "--yes", "@sourcegraph/scip-typescript", "index"]

question: Using npx --yes can break on older npm versions that don’t support this flag.


Comment on lines +243 to +248
result = subprocess.run(
    [cs_cmd, "install", config.package],
    capture_output=True,
    text=True,
    timeout=180,
)

security (python.lang.security.audit.dangerous-subprocess-use-audit): Detected subprocess function 'run' without a static string. If this data can be controlled by a malicious actor, it may be an instance of command injection. Audit the use of this call to ensure it is not controllable by an external resource. You may consider using 'shlex.quote()'.

Source: opengrep
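The flagged call passes a list to `subprocess.run` without `shell=True`, so no shell parses the arguments, and `shlex.quote()` (which builds strings for a POSIX shell command line) does not apply; quoting a list element could pass quote characters through literally. The residual risk is argument injection, for example a `package` value starting with `-` being read as an option by `cs`. A minimal allowlist sketch, assuming package specs only ever look like the ones in `configs.py`; `is_safe_package`, `install_argv` and the pattern are hypothetical.

```python
import re

# Hypothetical allowlist matching specs such as "scip-ruby", "scip_dart" and
# "github.com/sourcegraph/scip-go/cmd/scip-go@latest". The first character must
# be alphanumeric, so option-like values such as "--force" are rejected.
_PACKAGE_SPEC = re.compile(r"[A-Za-z0-9][A-Za-z0-9._/@:+-]*")


def is_safe_package(package: str) -> bool:
    """True if the whole spec matches the allowlist pattern."""
    return _PACKAGE_SPEC.fullmatch(package) is not None


def install_argv(tool: str, subcommand: list[str], package: str) -> list[str]:
    """Build a list-form argv, refusing package specs outside the allowlist."""
    if not is_safe_package(package):
        raise ValueError(f"refusing suspicious package spec: {package!r}")
    return [tool, *subcommand, package]
```

Since the specs are hard-coded in `configs.py` today, the check mainly documents the assumption and guards against future configs loaded from user-controlled files.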

gemini-code-assist bot (Contributor) left a comment

Code Review

This pull request introduces an excellent feature for automatically installing missing SCIP indexer tools, which will significantly improve the user experience. The implementation is well-structured, with a new generic SCIPToolInstaller that cleanly handles different package managers. The integration into the existing indexers is also well-done, and the new unit tests are comprehensive. I have a couple of minor suggestions to improve code consistency and efficiency, but overall this is a high-quality contribution.

Comment on lines +75 to +126
def _make_install_configs() -> dict[str, InstallConfig]:
    """Build install configs lazily to avoid circular imports."""
    from cicada.languages.scip.installer import InstallConfig, InstallMethod

    return {
        "go": InstallConfig(
            method=InstallMethod.GO,
            package="github.com/sourcegraph/scip-go/cmd/scip-go@latest",
            executable="scip-go",
            runtime_check="go",
        ),
        "ruby": InstallConfig(
            method=InstallMethod.GEM,
            package="scip-ruby",
            executable="scip-ruby",
            runtime_check="gem",
        ),
        "dart": InstallConfig(
            method=InstallMethod.DART_PUB,
            package="scip_dart",
            executable="scip_dart",
            runtime_check="dart",
        ),
        "java": InstallConfig(
            method=InstallMethod.COURSIER,
            package="scip-java",
            executable="scip-java",
        ),
        "scala": InstallConfig(
            method=InstallMethod.COURSIER,
            package="scip-java",
            executable="scip-java",
        ),
        "csharp": InstallConfig(
            method=InstallMethod.DOTNET,
            package="scip-dotnet",
            executable="scip-dotnet",
            runtime_check="dotnet",
        ),
        "vb": InstallConfig(
            method=InstallMethod.DOTNET,
            package="scip-dotnet",
            executable="scip-dotnet",
            runtime_check="dotnet",
        ),
    }


def _get_install_config(language: str) -> InstallConfig | None:
    """Get install config for a language, or None if not supported."""
    configs = _make_install_configs()
    return configs.get(language)

medium

The _make_install_configs() function is called every time _get_install_config() is invoked. Since _get_install_config() is called for each language config at module load time, this leads to the dictionary of install configs being created multiple times unnecessarily. It would be more efficient to create it once and cache it in a module-level variable.

_INSTALL_CONFIGS: dict[str, InstallConfig] | None = None


def _make_install_configs() -> dict[str, InstallConfig]:
    """Build install configs lazily to avoid circular imports."""
    from cicada.languages.scip.installer import InstallConfig, InstallMethod

    return {
        "go": InstallConfig(
            method=InstallMethod.GO,
            package="github.com/sourcegraph/scip-go/cmd/scip-go@latest",
            executable="scip-go",
            runtime_check="go",
        ),
        "ruby": InstallConfig(
            method=InstallMethod.GEM,
            package="scip-ruby",
            executable="scip-ruby",
            runtime_check="gem",
        ),
        "dart": InstallConfig(
            method=InstallMethod.DART_PUB,
            package="scip_dart",
            executable="scip_dart",
            runtime_check="dart",
        ),
        "java": InstallConfig(
            method=InstallMethod.COURSIER,
            package="scip-java",
            executable="scip-java",
        ),
        "scala": InstallConfig(
            method=InstallMethod.COURSIER,
            package="scip-java",
            executable="scip-java",
        ),
        "csharp": InstallConfig(
            method=InstallMethod.DOTNET,
            package="scip-dotnet",
            executable="scip-dotnet",
            runtime_check="dotnet",
        ),
        "vb": InstallConfig(
            method=InstallMethod.DOTNET,
            package="scip-dotnet",
            executable="scip-dotnet",
            runtime_check="dotnet",
        ),
    }


def _get_install_config(language: str) -> InstallConfig | None:
    """Get install config for a language, or None if not supported."""
    global _INSTALL_CONFIGS
    if _INSTALL_CONFIGS is None:
        _INSTALL_CONFIGS = _make_install_configs()
    return _INSTALL_CONFIGS.get(language)

Comment on lines +128 to +130
path = shutil.which(config.executable)
if path:
    return path

medium

The fallback to shutil.which() does not log a success message when verbose is true. This is inconsistent with other installer methods like _install_gem which do log a message. Adding a log message here would improve consistency and aid in debugging. This also applies to _install_dart_pub and _install_dotnet.

Suggested change
 path = shutil.which(config.executable)
 if path:
+    if verbose:
+        print(f"  Installed {config.executable} at {path}")
     return path

