Conversation

@mattiasgeniar
Contributor

Before, this fell back to the robots-txt package's own implementation, which fetches the remote URL with file_get_contents().

Now, it reuses the HTTP client already configured in the crawler package to fetch the contents of robots.txt. This ensures the custom headers and User-Agent set in the Crawler settings are also applied when fetching the initial robots.txt.
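
For illustration, here is a minimal sketch of the idea, not the actual diff: it assumes Guzzle as the crawler's configured HTTP client, and fetchRobotsTxt() is a hypothetical helper name. RobotsTxt::create() is the robots-txt package's string-based constructor, so the package never has to make its own unconfigured request.

```php
<?php

require 'vendor/autoload.php';

use GuzzleHttp\Client;
use GuzzleHttp\Exception\GuzzleException;
use Spatie\Robots\RobotsTxt;

// Hypothetical helper sketching the new flow: fetch robots.txt through the
// crawler's already-configured HTTP client instead of file_get_contents(),
// so the crawl's custom headers and User-Agent apply to this request too.
function fetchRobotsTxt(Client $client, string $baseUrl): RobotsTxt
{
    try {
        $response = $client->request('GET', rtrim($baseUrl, '/') . '/robots.txt');
        $contents = (string) $response->getBody();
    } catch (GuzzleException $e) {
        // Missing or unreachable robots.txt: fall back to an empty rule set.
        $contents = '';
    }

    // Build the parser from the raw string rather than a URL, so the
    // robots-txt package does not issue a second, unconfigured request.
    return RobotsTxt::create($contents);
}

// Usage: the same client (with its headers/User-Agent) the crawler itself uses.
$client = new Client(['headers' => ['User-Agent' => 'MyBot/1.0 (+https://example.com/bot)']]);
$robots = fetchRobotsTxt($client, 'https://example.com');
```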

@freekmurze merged commit a9c432c into spatie:main on Oct 28, 2025
18 checks passed
@freekmurze
Member

Thanks!

@mattiasgeniar deleted the robots-txt-user-agent-header branch on November 3, 2025 at 15:23