Enable optional symlink following for fs source #4565

dustin-decker · 2025-11-20T18:05:12Z

Description:

Adds an option to follow symlinks in the fs source.

Checklist:

Tests passing (make test-community)?
Lint passing (make lint; this requires golangci-lint)?

mcastorina · 2025-11-20T21:29:15Z

pkg/sources/filesystem/filesystem.go

 		}

-		if fileInfo.Mode()&os.ModeSymlink != 0 {
+		if !s.followSymlinks && fileInfo.Mode()&os.ModeSymlink != 0 {


If os.Stat is used, fileInfo.Mode() would never report a symlink, right? Stat follows symlinks and describes the target.

I have redone this to make it clearer: it keeps the original Lstat first, then calls Stat afterwards only if followSymlinks is enabled and the path is a symlink.

Unresolving because I don't see the original Lstat anywhere. Am I missing it?
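
For reference, the Lstat-then-Stat ordering described in this thread could look roughly like the sketch below. It is not the PR's code: statPath and demo are hypothetical names, and the only assumption carried over from the thread is that Lstat runs first and Stat runs only when following is enabled and the path is a symlink.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// statPath is a hypothetical helper, not the PR's code. It always calls
// os.Lstat first so the symlink bit is visible, and only calls os.Stat
// (which follows the link) when following is enabled and the path is a symlink.
func statPath(path string, followSymlinks bool) (info os.FileInfo, wasSymlink bool, err error) {
	info, err = os.Lstat(path)
	if err != nil {
		return nil, false, err
	}
	wasSymlink = info.Mode()&os.ModeSymlink != 0
	if wasSymlink && followSymlinks {
		// Replace the link's own info with the target's info.
		info, err = os.Stat(path)
		if err != nil {
			return nil, true, err
		}
	}
	return info, wasSymlink, nil
}

// demo builds a temp dir with a regular file and a symlink to it, then reports
// whether the returned info still carries the symlink bit in each mode.
func demo() (bool, bool, error) {
	dir, err := os.MkdirTemp("", "statpath")
	if err != nil {
		return false, false, err
	}
	defer os.RemoveAll(dir)

	target := filepath.Join(dir, "real_file")
	link := filepath.Join(dir, "link")
	if err := os.WriteFile(target, []byte("data"), 0o600); err != nil {
		return false, false, err
	}
	if err := os.Symlink(target, link); err != nil {
		return false, false, err
	}

	noFollow, _, err := statPath(link, false)
	if err != nil {
		return false, false, err
	}
	follow, _, err := statPath(link, true)
	if err != nil {
		return false, false, err
	}
	return noFollow.Mode()&os.ModeSymlink != 0, follow.Mode()&os.ModeSymlink != 0, nil
}

func main() {
	bitNoFollow, bitFollow, err := demo()
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("symlink bit without following:", bitNoFollow)
	fmt.Println("symlink bit with following:", bitFollow)
}
```

With this ordering the symlink check happens on the Lstat result, so it stays meaningful whether or not following is enabled; the info returned after following describes the target and no longer carries the symlink bit.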

mcastorina · 2025-12-15T23:31:57Z

pkg/sources/filesystem/filesystem.go

 		}

-		if fileInfo.Mode()&os.ModeSymlink != 0 {
+		if !s.followSymlinks && fileInfo.Mode()&os.ModeSymlink != 0 {


Unresolving because I don't see the original Lstat anywhere. Am I missing it?

mcastorina · 2025-12-15T23:34:24Z

pkg/sources/filesystem/filesystem.go

+	// Why LRU cache instead of a map:
+	// - Bounded memory: Limits to 10k paths (~1MB) even for massive directory trees
+	// - Per-path reset: Cache is recreated for each scan path to prevent accumulation
+	// - Loop detection: Prevents scanning the same file multiple times via different symlinks
+	//
+	// Why depth-1 limiting:
+	// - Prevents infinite loops: Symlink chains (A->B->C->...) are limited
+	// - Predictable behavior: Users know exactly which symlinks will be followed


What does a cache get us that Stat / EvalSymlinks does not already address? It looks like those functions return an error when there is a cycle: too many links

Ah, the cycle we're worried about is a symlink to a directory and then recursively scanning that. Do you think we could get away with saving the visited directories only?
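
To make the "visited directories only" idea concrete, here is a minimal sketch under stated assumptions. walkFollowing and demo are hypothetical names and this is not the PR's implementation; it keys the visited set on the path returned by filepath.EvalSymlinks and does not try to deduplicate files reached through different file symlinks.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// walkFollowing is a hypothetical sketch, not the PR's implementation.
// It visits files under root, follows symlinks, and records only the
// resolved directories it has entered so that a directory cycle terminates.
func walkFollowing(root string, visited map[string]struct{}, visit func(path string)) error {
	resolved, err := filepath.EvalSymlinks(root)
	if err != nil {
		return err
	}
	if _, seen := visited[resolved]; seen {
		return nil // already scanned this directory via another path
	}
	visited[resolved] = struct{}{}

	entries, err := os.ReadDir(root)
	if err != nil {
		return err
	}
	for _, entry := range entries {
		path := filepath.Join(root, entry.Name())
		info, err := os.Stat(path) // Stat follows symlinks
		if err != nil {
			continue // broken link or unreadable entry: skipped in this sketch
		}
		if info.IsDir() {
			if err := walkFollowing(path, visited, visit); err != nil {
				return err
			}
			continue
		}
		visit(path)
	}
	return nil
}

// demo creates root/a/file.txt plus a symlink root/a/loop -> root, which would
// recurse without the visited set, and returns the number of files visited.
func demo() (int, error) {
	root, err := os.MkdirTemp("", "walk")
	if err != nil {
		return 0, err
	}
	defer os.RemoveAll(root)

	sub := filepath.Join(root, "a")
	if err := os.Mkdir(sub, 0o700); err != nil {
		return 0, err
	}
	if err := os.WriteFile(filepath.Join(sub, "file.txt"), []byte("data"), 0o600); err != nil {
		return 0, err
	}
	if err := os.Symlink(root, filepath.Join(sub, "loop")); err != nil {
		return 0, err
	}

	count := 0
	err = walkFollowing(root, map[string]struct{}{}, func(string) { count++ })
	return count, err
}

func main() {
	count, err := demo()
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("files visited:", count)
}
```

The set grows with the number of directories entered rather than the number of files, which is the trade-off the question above is pointing at.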

mcastorina · 2025-12-15T23:40:11Z

pkg/sources/filesystem/filesystem.go

+		// If followSymlinks is enabled and this is a symlink, check for loops
+		if s.followSymlinks && fileInfo.Mode()&os.ModeSymlink != 0 {


I believe this is always false since fileInfo comes from Stat when followSymlinks is true. Is there an example or test case that exercises this scenario?

mcastorina · 2025-12-15T23:49:21Z

pkg/sources/filesystem/filesystem.go

+// isDirectChild checks if a path is a direct child of any scan root path.
+// This enforces depth-1 symlink following to prevent:
+// - Infinite symlink loops
+// - Deep directory traversal through symlinks
+//
+// Returns true only if the symlink's parent directory matches a scan root path.
+func (s *Source) isDirectChild(path string) bool {
+	dir := filepath.Clean(filepath.Dir(path))
+	_, isRoot := s.scanRootPaths[dir]
+	return isRoot
+}


When I read "depth-1 limiting" I thought it meant we allow only one symlink depth, but this looks like it is limiting symbolic link targets to a very specific directory structure.

Why are we doing this? It seems easy to trip over, and I'm not seeing why it needs to be that way.
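
A standalone sketch of the check makes the restriction easier to see. The function body mirrors the diff above, but it is rewritten as a free function over an explicit set of roots; the map type and the example paths are assumptions made for illustration only.

```go
package main

import (
	"fmt"
	"path/filepath"
)

// isDirectChild mirrors the logic of the method in the diff, rewritten as a
// free function so it can run on its own. The map value type is an assumption;
// the diff only shows that scanRootPaths is indexed by directory path.
func isDirectChild(scanRootPaths map[string]struct{}, path string) bool {
	dir := filepath.Clean(filepath.Dir(path))
	_, isRoot := scanRootPaths[dir]
	return isRoot
}

func main() {
	// Hypothetical scan root and candidate symlink locations.
	roots := map[string]struct{}{filepath.Clean("/repo"): {}}
	for _, p := range []string{"/repo/link", "/repo/sub/link", "/other/link"} {
		fmt.Printf("%-16s direct child: %v\n", p, isDirectChild(roots, p))
	}
}
```

Only a symlink that sits directly inside a scan root passes; a symlink one directory deeper is rejected regardless of how many hops its target is away, so the check constrains where the link lives rather than how long the chain is.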

mcastorina · 2025-12-15T23:53:55Z

pkg/sources/filesystem/filesystem_test.go

+	t.Run("only follow first level symlink in chain", func(t *testing.T) {
+		conn, err := anypb.New(&sourcespb.Filesystem{
+			Paths:          []string{tempDir},
+			FollowSymlinks: true,
+		})
+		require.NoError(t, err)
+
+		s := Source{}
+		err = s.Init(ctx, "test symlink chain", 0, 0, true, conn, 1)
+		require.NoError(t, err)
+
+		reporter := sourcestest.TestReporter{}
+		err = s.ChunkUnit(ctx, sources.CommonSourceUnit{
+			ID: tempDir,
+		}, &reporter)
+		require.NoError(t, err)
+
+		// Should have 2 chunks: real_file and symlink1 (which resolves to real_file)
+		// symlink2 also resolves to the same real_file, so loop detection prevents duplicate scanning
+		// This is correct behavior - we don't want to scan the same content multiple times
+		assert.Equal(t, 2, len(reporter.Chunks), "Expected two chunks from real file and first symlink")
+	})


I'm not sure I agree that this is testing "only follow first level symlink in chain".

It scans the entire directory rather than just symlink1.
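
For comparison, the sketch below shows what following only the first level of a chain would mean in isolation. It assumes a layout of real_file <- symlink1 <- symlink2 (the test's actual fixture setup is not shown in the hunk), and resolveOneHop and demo are hypothetical names.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// resolveOneHop reads a single level of a symlink without following any
// further links. It is a hypothetical helper for illustration.
func resolveOneHop(link string) (string, error) {
	target, err := os.Readlink(link)
	if err != nil {
		return "", err
	}
	if !filepath.IsAbs(target) {
		// Relative targets are interpreted relative to the link's directory.
		target = filepath.Join(filepath.Dir(link), target)
	}
	return target, nil
}

// demo builds real_file <- symlink1 <- symlink2 and returns the base names of
// symlink2's one-hop target and of its fully resolved target.
func demo() (string, string, error) {
	dir, err := os.MkdirTemp("", "chain")
	if err != nil {
		return "", "", err
	}
	defer os.RemoveAll(dir)

	realFile := filepath.Join(dir, "real_file")
	symlink1 := filepath.Join(dir, "symlink1")
	symlink2 := filepath.Join(dir, "symlink2")
	if err := os.WriteFile(realFile, []byte("data"), 0o600); err != nil {
		return "", "", err
	}
	if err := os.Symlink(realFile, symlink1); err != nil {
		return "", "", err
	}
	if err := os.Symlink(symlink1, symlink2); err != nil {
		return "", "", err
	}

	hop, err := resolveOneHop(symlink2)
	if err != nil {
		return "", "", err
	}
	full, err := filepath.EvalSymlinks(symlink2)
	if err != nil {
		return "", "", err
	}
	return filepath.Base(hop), filepath.Base(full), nil
}

func main() {
	hop, full, err := demo()
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("one hop from symlink2:", hop)
	fmt.Println("fully resolved:", full)
}
```

A test aimed at chain depth could scan symlink2 alone and assert on which of these two targets is reached, instead of counting chunks for the whole directory.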

dustin-decker requested a review from a team November 20, 2025 18:05

dustin-decker requested review from a team as code owners November 20, 2025 18:05

mcastorina reviewed Nov 20, 2025


mcastorina reviewed Nov 20, 2025


dustin-decker requested a review from a team as a code owner November 21, 2025 19:04

dustin-decker added 2 commits December 12, 2025 08:28

Enable optional symlink following for fs source

48945e6

make approach more robust

b8dddcd

dustin-decker force-pushed the opt-follow-symlinks branch from 9f9d140 to b8dddcd Compare December 12, 2025 16:28

mcastorina reviewed Dec 15, 2025
