Skip to content

Conversation

@DFanso
Copy link
Owner

@DFanso DFanso commented Oct 9, 2025

Description

Add redaction patterns for cloud credentials

Type of Change

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Documentation update
  • Code refactoring
  • Performance improvement
  • Other (please describe):

Related Issue

Fixes #(issue number)

Changes Made

  • Expands scrubber to cover Azure, GCP, Stripe, Twilio,
    and other sensitive keys for enhanced security.
  • Updates tests for new redaction markers.
  • Adds AGENTS.md for project guidelines.

Testing

  • Tested with Gemini API
  • Tested with Grok API
  • Tested on Windows
  • Tested on Linux
  • Tested on macOS
  • Added/updated tests (if applicable)

Checklist

  • My code follows the project's code style
  • I have performed a self-review of my code
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings or errors
  • I have tested this in a real Git repository
  • I have read the CONTRIBUTING.md guidelines

Screenshots (if applicable)

Additional Notes


For Hacktoberfest Participants

  • This PR is submitted as part of Hacktoberfest 2025

Thank you for your contribution! 🎉

Summary by CodeRabbit

  • New Features
    • Significantly expanded sensitive data detection and redaction, covering major cloud credentials, API tokens, private keys (RSA, EC, DSA, OpenSSH, PKCS#8, PGP), and credit cards.
    • More specific redaction labels (e.g., per-keytype) for clearer outputs.
    • Added scrubbing for multi-line content and diffs for broader protection.
  • Tests
    • Added comprehensive test suites validating new redaction cases and updated expectations.
  • Chores
    • Updated ignore list to exclude additional local files.
  • Style
    • Removed a non-functional header comment.

DFanso added 4 commits October 8, 2025 09:10
Azure credentials
Google Cloud credentials
SSH keys in different formats
API keys from more providers (Stripe, Twilio, etc.)
Connection strings for various databases
- Expands scrubber to cover Azure, GCP, Stripe, Twilio,
and other sensitive keys for enhanced security.
- Updates tests for new redaction markers.
- Adds AGENTS.md for project guidelines.
@coderabbitai
Copy link
Contributor

coderabbitai bot commented Oct 9, 2025

Walkthrough

Expands .gitignore entries. Removes a header comment from cmd/cli/root.go. Significantly broadens internal/scrubber credential and secret redaction patterns, adds specific private-key handling, and introduces ScrubLines and ScrubDiff. Updates and adds tests to cover new redaction cases and adjusted placeholders.

Changes

Cohort / File(s) Summary
Repository config
\.gitignore
Adds ignore rules for AGENTS.md and files matching commit~.
CLI header cleanup
cmd/cli/root.go
Removes non-functional copyright header comment; no logic changes.
Scrubber implementation
internal/scrubber/scrubber.go
Adds extensive redaction patterns: Azure, GCP, various API keys/tokens, credit cards, social auth, and multiple private-key formats (RSA/EC/DSA/OpenSSH/PKCS#8/PGP). Introduces ScrubLines and ScrubDiff. Expands redaction placeholders (e.g., [REDACTED_RSA_PRIVATE_KEY]). Core flow remains; functionality broadened.
Scrubber tests
internal/scrubber/scrubber_test.go
Updates private-key expectation to [REDACTED_RSA_PRIVATE_KEY]. Adds tests for credit cards, Azure, GCP, additional API keys (Stripe/Twilio/DigitalOcean/SendGrid), and SSH keys. Uses table-driven cases with expected outputs.

Sequence Diagram(s)

sequenceDiagram
  autonumber
  participant C as Caller
  participant S as Scrubber
  participant P as Pattern Engine

  rect rgba(230, 240, 255, 0.5)
    Note over C,S: Existing flow
    C->>S: Scrub(text)
    S->>P: Detect + Redact matches
    P-->>S: Redacted text
    S-->>C: Result
  end

  rect rgba(235, 255, 235, 0.5)
    Note over C,S: New entry points
    C->>S: ScrubLines(lines) [New]
    S->>P: Detect + Redact per line
    P-->>S: Redacted lines
    S-->>C: Result

    C->>S: ScrubDiff(diff) [New]
    S->>P: Detect + Redact within diff hunks
    P-->>S: Redacted diff
    S-->>C: Result
  end
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

Suggested labels

go

Poem

I twitch my whiskers at secrets that hide,
Sniffing keys, I hop from side to side.
With paws of regex, I mask and I mend,
From diffs to lines, I scrub to the end.
Thump-thump—no leaks on this garden ride! 🥕

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. You can run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (2 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title Check ✅ Passed The title clearly identifies the primary change of adding redaction patterns for cloud credentials, which aligns with the main additions of Azure and Google Cloud detection in the scrubber and reflects the PR’s stated objectives.
✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch dev

Comment @coderabbitai help to get the list of available commands and usage tips.

@gitguardian
Copy link

gitguardian bot commented Oct 9, 2025

⚠️ GitGuardian has uncovered 3 secrets following the scan of your pull request.

Please consider investigating the findings and remediating the incidents. Failure to do so may lead to compromising the associated services or software components.

🔎 Detected hardcoded secrets in your pull request
GitGuardian id GitGuardian status Secret Commit Filename
- - OpenSSH Private Key dd2a098 internal/scrubber/scrubber_test.go View secret
- - Stripe Keys dd2a098 internal/scrubber/scrubber_test.go View secret
- - Generic High Entropy Secret dd2a098 internal/scrubber/scrubber_test.go View secret
🛠 Guidelines to remediate hardcoded secrets
  1. Understand the implications of revoking this secret by investigating where it is used in your code.
  2. Replace and store your secrets safely. Learn here the best practices.
  3. Revoke and rotate these secrets.
  4. If possible, rewrite git history. Rewriting git history is not a trivial act. You might completely break other contributing developers' workflow and you risk accidentally deleting legitimate data.

To avoid such incidents in the future consider


🦉 GitGuardian detects secrets in your source code to help developers and security teams secure the modern development process. You are seeing this because you or someone else with access to this repository has authorized GitGuardian to scan your pull request.

@DFanso DFanso added bug Something isn't working enhancement New feature or request hacktoberfest Eligible for Hacktoberfest hacktoberfest-accepted Approved Hacktoberfest contribution go Pull requests that update go code labels Oct 9, 2025
@DFanso DFanso self-assigned this Oct 9, 2025
@DFanso DFanso linked an issue Oct 9, 2025 that may be closed by this pull request
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
internal/scrubber/scrubber.go (1)

30-33: Fix Authorization header redaction: stray quote and mismatched quoting

Current replacement always appends a double-quote, which can produce mismatched or extra quotes. Capture the optional closing quote and re-insert it.

Apply:

-        {
-            Name:    "Authorization Header",
-            Pattern: regexp.MustCompile(`(?i)(authorization\s*[=:]\s*["\']?)([a-zA-Z0-9_\-\.]{20,})["\']?`),
-            Redact:  "${1}[REDACTED_AUTH_TOKEN]\"",
-        },
+        {
+            Name:    "Authorization Header",
+            Pattern: regexp.MustCompile(`(?i)(authorization\s*[=:]\s*["\']?)([a-zA-Z0-9_\-\.]{20,})(["\']?)`),
+            Redact:  "${1}[REDACTED_AUTH_TOKEN]${3}",
+        },
🧹 Nitpick comments (9)
internal/scrubber/scrubber.go (5)

47-68: Preserve delimiters/quotes and avoid partial leaks in Azure patterns

Capture the assignment + opening quote as prefix and the optional closing quote as suffix, then reinsert them. This standardizes formatting and prevents mismatched quotes.

         {
             Name:    "Azure Client Secret",
-            Pattern: regexp.MustCompile(`(?i)(azure[_-]?client[_-]?secret|AZURE_CLIENT_SECRET)\s*[=:]\s*["\']?([a-zA-Z0-9_\-\.~]{20,})["\']?`),
-            Redact:  "${1}=\"[REDACTED_AZURE_CLIENT_SECRET]\"",
+            Pattern: regexp.MustCompile(`(?i)((?:azure[_-]?client[_-]?secret|AZURE_CLIENT_SECRET)\s*[=:]\s*["\']?)([a-zA-Z0-9_\-\.~]{20,})(["\']?)`),
+            Redact:  "${1}[REDACTED_AZURE_CLIENT_SECRET]${3}",
         },
         {
             Name:    "Azure Subscription Key",
-            Pattern: regexp.MustCompile(`(?i)(azure[_-]?subscription[_-]?key|AZURE_SUBSCRIPTION_KEY)\s*[=:]\s*["\']?([a-zA-Z0-9]{32})["\']?`),
-            Redact:  "${1}=\"[REDACTED_AZURE_SUBSCRIPTION_KEY]\"",
+            Pattern: regexp.MustCompile(`(?i)((?:azure[_-]?subscription[_-]?key|AZURE_SUBSCRIPTION_KEY)\s*[=:]\s*["\']?)([a-zA-Z0-9]{32})(["\']?)`),
+            Redact:  "${1}[REDACTED_AZURE_SUBSCRIPTION_KEY]${3}",
         },
         {
             Name:    "Azure Storage Key",
-            Pattern: regexp.MustCompile(`(?i)(azure[_-]?storage[_-]?key|AZURE_STORAGE_KEY|AccountKey)\s*[=:]\s*["\']?([a-zA-Z0-9/+=]{88})["\']?`),
-            Redact:  "${1}=\"[REDACTED_AZURE_STORAGE_KEY]\"",
+            Pattern: regexp.MustCompile(`(?i)((?:azure[_-]?storage[_-]?key|AZURE_STORAGE_KEY|AccountKey)\s*[=:]\s*["\']?)([a-zA-Z0-9/+=]{88})(["\']?)`),
+            Redact:  "${1}[REDACTED_AZURE_STORAGE_KEY]${3}",
         },
         {
             Name:    "Azure Service Principal",
-            Pattern: regexp.MustCompile(`(?i)(azure[_-]?service[_-]?principal|AZURE_SERVICE_PRINCIPAL)\s*[=:]\s*["\']?([a-f0-9-]{36})["\']?`),
-            Redact:  "${1}=\"[REDACTED_AZURE_SERVICE_PRINCIPAL]\"",
+            Pattern: regexp.MustCompile(`(?i)((?:azure[_-]?service[_-]?principal|AZURE_SERVICE_PRINCIPAL)\s*[=:]\s*["\']?)([a-f0-9-]{36})(["\']?)`),
+            Redact:  "${1}[REDACTED_AZURE_SERVICE_PRINCIPAL]${3}",
         },

Note: If any real-world value exceeds the fixed lengths, consider relaxing the value class to [A-Za-z0-9/_+=.-]{20,} to avoid partial redactions.


69-90: Narrow GCP JSON redaction to only the private_key value

Current pattern replaces a large JSON chunk and may drop other fields. Replace only the private_key value to preserve structure.

         {
             Name:    "Google Cloud JSON Credentials",
-            Pattern: regexp.MustCompile(`(?s)"type":\s*"service_account".*?"private_key":\s*"-----BEGIN PRIVATE KEY-----.*?-----END PRIVATE KEY-----`),
-            Redact:  "\"type\": \"service_account\",\n\"private_key\": \"[REDACTED_GCP_PRIVATE_KEY]\"",
+            Pattern: regexp.MustCompile(`(?s)("private_key"\s*:\s*")-----BEGIN PRIVATE KEY-----.*?-----END PRIVATE KEY-----(")`),
+            Redact:  "${1}[REDACTED_GCP_PRIVATE_KEY]${2}",
         },

Optionally, align the other GCP key patterns to the prefix/suffix-quote style used for Azure to keep formatting consistent.


135-161: Include Stripe test keys too (not just live)

Cover both live and test variants to avoid misses in non-production diffs.

         {
             Name:    "Stripe API Key",
-            Pattern: regexp.MustCompile(`(?i)(stripe[_-]?api[_-]?key|STRIPE_API_KEY)\s*[=:]\s*["\']?(sk_live_[a-zA-Z0-9]{24})["\']?`),
-            Redact:  "${1}=\"[REDACTED_STRIPE_KEY]\"",
+            Pattern: regexp.MustCompile(`(?i)((?:stripe[_-]?api[_-]?key|STRIPE_API_KEY)\s*[=:]\s*["\']?)(sk_(?:live|test)_[A-Za-z0-9]{24,})(["\']?)`),
+            Redact:  "${1}[REDACTED_STRIPE_KEY]${3}",
         },
         {
             Name:    "Stripe Publishable Key",
-            Pattern: regexp.MustCompile(`(?i)(stripe[_-]?publishable[_-]?key|STRIPE_PUBLISHABLE_KEY)\s*[=:]\s*["\']?(pk_live_[a-zA-Z0-9]{24})["\']?`),
-            Redact:  "${1}=\"[REDACTED_STRIPE_PUBLISHABLE_KEY]\"",
+            Pattern: regexp.MustCompile(`(?i)((?:stripe[_-]?publishable[_-]?key|STRIPE_PUBLISHABLE_KEY)\s*[=:]\s*["\']?)(pk_(?:live|test)_[A-Za-z0-9]{24,})(["\']?)`),
+            Redact:  "${1}[REDACTED_STRIPE_PUBLISHABLE_KEY]${3}",
         },

162-183: Heroku API key pattern is likely too strict

Using a UUID format may miss real Heroku API keys (commonly 32+ chars random). Consider a broader value class rather than UUID only, or add an alternate pattern.

Example:

-            Pattern: regexp.MustCompile(`(?i)(heroku[_-]?api[_-]?key|HEROKU_API_KEY)\s*[=:]\s*["\']?([a-f0-9]{8}-[a-f0-9]{4}-[a-f0-9]{4}-[a-f0-9]{4}-[a-f0-9]{12})["\']?`),
+            Pattern: regexp.MustCompile(`(?i)((?:heroku[_-]?api[_-]?key|HEROKU_API_KEY)\s*[=:]\s*["\']?)([A-Za-z0-9_-]{24,})(["\']?)`),

This reduces false negatives while still avoiding obvious noise.


223-283: Multi-line keys won’t be scrubbed by ScrubLines

These patterns are multi-line ((?s)), but ScrubLines operates per-line. Consider documenting this or adding a fallback to apply ScrubDiff when multi-line markers (BEGIN/END … PRIVATE KEY) are present.

Option: in ScrubLines, short-circuit:

 func ScrubLines(content string) string {
+    // Fallback for multi-line secrets (private keys)
+    if strings.Contains(content, "-----BEGIN ") && strings.Contains(content, "PRIVATE KEY") {
+        return ScrubDiff(content)
+    }

This keeps per-line behavior while handling multi-line secrets.

internal/scrubber/scrubber_test.go (4)

438-459: Strengthen assertions to ensure no leakage

Currently only checks for presence of a [REDACTED_*] marker. Add a negative assertion that a recognizable slice of the original value doesn’t remain (especially for long AccountKey).

Example:

         t.Run(tt.name, func(t *testing.T) {
             result := ScrubDiff(tt.input)
             if !strings.Contains(result, "[REDACTED_AZURE") {
                 t.Errorf("ScrubDiff() failed to redact Azure credentials.\nInput: %s\nOutput: %s", tt.input, result)
             }
+            // Ensure no obvious part of the secret remains
+            if strings.Contains(result, "abcdefghijklmnopqrstuvwxyz123456") {
+                t.Errorf("Redaction leaked part of the Azure secret.\nOutput: %s", result)
+            }
         })

Also consider using an 88-char AccountKey fixture to mirror real Azure lengths.


461-489: Add negative assertions for GCP fixtures

Verify that raw values like my-service-account-key.json or a substring of the API key do not appear post-scrub.

Example snippets:

             result := ScrubDiff(tt.input)
             if !strings.Contains(result, "[REDACTED_") {
                 t.Errorf("ScrubDiff() failed to redact Google Cloud credentials.\nInput: %s\nOutput: %s", tt.input, result)
             }
+            if strings.Contains(result, "service-account-key.json") ||
+               strings.Contains(result, "abcdefghijklmnopqrstuvwxyz123456") {
+                t.Errorf("Redaction leaked part of the GCP secret.\nOutput: %s", result)
+            }

490-521: Also assert original tokens aren’t present for Stripe/Twilio/DO/SendGrid

Presence of a [REDACTED_*] marker isn’t sufficient; add a couple of negative contains checks per case.

Example:

             result := ScrubDiff(tt.input)
             if !strings.Contains(result, "[REDACTED_") {
                 t.Errorf("ScrubDiff() failed to redact additional API keys.\nInput: %s\nOutput: %s", tt.input, result)
             }
+            if strings.Contains(result, "sk_live_") ||
+               strings.Contains(result, "twilio_auth_token") ||
+               strings.Contains(result, "digitalocean_api_key") ||
+               strings.Contains(result, "SG.") {
+                t.Errorf("Redaction may have leaked raw values.\nOutput: %s", result)
+            }

523-550: Add leakage checks for SSH fixtures

Assert that BEGIN/END blocks (or a known base64 slice) aren’t present after scrubbing.

Example:

             result := ScrubDiff(tt.input)
             if !strings.Contains(result, "[REDACTED_") {
                 t.Errorf("ScrubDiff() failed to redact SSH keys.\nInput: %s\nOutput: %s", tt.input, result)
             }
+            if strings.Contains(result, "-----BEGIN ") || strings.Contains(result, " OPENSSH PRIVATE KEY-----") {
+                t.Errorf("SSH private key material still present.\nOutput: %s", result)
+            }

Additionally, consider adding a test for quoted Authorization headers to guard against quoting issues:

func TestAuthorizationHeaderWithQuotes(t *testing.T) {
    in := `authorization="sometokenwithmorethan20chars"`
    out := ScrubDiff(in)
    if !strings.Contains(out, "[REDACTED_AUTH_TOKEN]") || strings.Count(out, `"`) != 2 {
        t.Errorf("Authorization redaction with quotes failed: %s", out)
    }
}
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between d1227f6 and dd2a098.

📒 Files selected for processing (4)
  • .gitignore (1 hunks)
  • cmd/cli/root.go (0 hunks)
  • internal/scrubber/scrubber.go (2 hunks)
  • internal/scrubber/scrubber_test.go (2 hunks)
💤 Files with no reviewable changes (1)
  • cmd/cli/root.go
🧰 Additional context used
🧬 Code graph analysis (1)
internal/scrubber/scrubber_test.go (1)
internal/scrubber/scrubber.go (1)
  • ScrubDiff (330-339)
🪛 Gitleaks (8.28.0)
internal/scrubber/scrubber.go

[high] 231-236: Identified a Private Key, which may compromise cryptographic security and sensitive data encryption.

(private-key)


[high] 241-246: Identified a Private Key, which may compromise cryptographic security and sensitive data encryption.

(private-key)


[high] 251-281: Identified a Private Key, which may compromise cryptographic security and sensitive data encryption.

(private-key)

internal/scrubber/scrubber_test.go

[high] 439-439: Detected a Generic API Key, potentially exposing access to various services and sensitive operations.

(generic-api-key)


[high] 501-501: Detected a Generic API Key, potentially exposing access to various services and sensitive operations.

(generic-api-key)


[high] 505-505: Detected a Generic API Key, potentially exposing access to various services and sensitive operations.

(generic-api-key)


[high] 509-509: Detected a Generic API Key, potentially exposing access to various services and sensitive operations.

(generic-api-key)


[high] 476-476: Identified a Private Key, which may compromise cryptographic security and sensitive data encryption.

(private-key)


[high] 530-530: Identified a Private Key, which may compromise cryptographic security and sensitive data encryption.

(private-key)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Build Go Binary (windows-latest)
🔇 Additional comments (3)
.gitignore (1)

31-31: Confirm intent to ignore AGENTS.md.

We’re versioning AGENTS.md in this PR, yet it’s also being added to .gitignore. That makes future edits harder (and confusing, since contributors must git add -f). Please confirm whether the doc should remain tracked; if so, drop it from .gitignore.

internal/scrubber/scrubber_test.go (2)

137-139: LGTM: RSA-specific placeholder expectation

Matches the new per-type key redaction behavior.


399-430: Credit card tests look good

Covers spaces, dashes, and contiguous digits.

@DFanso DFanso merged commit f6aad8b into main Oct 9, 2025
11 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bug Something isn't working enhancement New feature or request go Pull requests that update go code hacktoberfest Eligible for Hacktoberfest hacktoberfest-accepted Approved Hacktoberfest contribution

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[FEATURE] Enhance Scrubber Patterns

2 participants