Conversation

@mattiasgeniar
Contributor

Before, this fell back to the robots-txt package's own implementation, which fetches the remote URL with file_get_contents().

Now, it reuses the HTTP client already configured in the crawler package to fetch the contents of robots.txt. This ensures the custom headers and User-Agent set in the Crawler settings are also applied when fetching the initial robots.txt.
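
For illustration, here is a minimal sketch of the idea, not the actual diff: it assumes Guzzle as the crawler's configured HTTP client, and fetchRobotsTxt() is a hypothetical helper name. RobotsTxt::create() is the robots-txt package's string-based constructor, so the package never has to make its own unconfigured request.

```php
<?php

require 'vendor/autoload.php';

use GuzzleHttp\Client;
use GuzzleHttp\Exception\GuzzleException;
use Spatie\Robots\RobotsTxt;

// Hypothetical helper sketching the new flow: fetch robots.txt through the
// crawler's already-configured HTTP client instead of file_get_contents(),
// so the crawl's custom headers and User-Agent apply to this request too.
function fetchRobotsTxt(Client $client, string $baseUrl): RobotsTxt
{
    try {
        $response = $client->request('GET', rtrim($baseUrl, '/') . '/robots.txt');
        $contents = (string) $response->getBody();
    } catch (GuzzleException $e) {
        // Missing or unreachable robots.txt: fall back to an empty rule set.
        $contents = '';
    }

    // Build the parser from the raw string rather than a URL, so the
    // robots-txt package does not issue a second, unconfigured request.
    return RobotsTxt::create($contents);
}

// Usage: the same client (with its headers/User-Agent) the crawler itself uses.
$client = new Client(['headers' => ['User-Agent' => 'MyBot/1.0 (+https://example.com/bot)']]);
$robots = fetchRobotsTxt($client, 'https://example.com');
```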

@freekmurze merged commit a9c432c into spatie:main on Oct 28, 2025
18 checks passed
@freekmurze
Member

Thanks!

@mattiasgeniar deleted the robots-txt-user-agent-header branch on November 3, 2025 at 15:23