jocel1 commented Dec 4, 2022
- simplify / use more generic bot rules
- add extra bots (ia_archiver, gtmetrix, lighthouse)
Bonjour @jocel1, Since your change is doing two distinct things, I would rather see two commits. There's also no explanation or justification for why we should generalize certain rules. Not being a historical maintainer of this project, I can't tell why choices were made and whether it's a good idea to challenge them. One thing you could do for example is share a list of user agents to add test coverage, to make sure we don't break previous expectations.
Hi @dridi!

For the first one: the main reason to add "google" is to cover the Google AdSense user agent, Mediapartners-Google. I also checked that Google Pixel devices don't have "google" in their user agent, but we could perhaps add just this one.

For spider, I often discover new bots; ia_archiver, for example, is a common one: https://user-agents.net/string/ia-archiver

I also changed facebook to match

For the last one, (?i)(web)crawler: the syntax suggests that (?i)(web)?crawler was intended, to match for example:
For gtmetrix / lighthouse, I don't know whether we should treat them as bots or not. Perhaps we could create a new category for them, like "synthetic-bot"? (We could also add "Synthetic" to it, to match dynatrace as well.)
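
To illustrate the `(?i)(web)crawler` versus `(?i)(web)?crawler` point, and the kind of user-agent list that could back test coverage, here is a minimal sketch. It uses Python's `re` module purely for illustration (the project's own regex engine may behave differently in other cases), and the sample user agents are hypothetical strings invented for this example, not real bot signatures.

```python
import re

# The pattern as quoted in the discussion, and the variant with an optional group.
STRICT = re.compile(r"(?i)(web)crawler")
OPTIONAL = re.compile(r"(?i)(web)?crawler")

# Hypothetical user agents, made up for illustration only.
SAMPLES = [
    "ExampleWebCrawler/1.0",
    "examplecrawler/2.0 (+http://example.com/bot)",
    "Mozilla/5.0 (X11; Linux x86_64) ExampleBrowser/1.0",
]


def matches(pattern, user_agent):
    """Return True if the pattern is found anywhere in the user agent."""
    return pattern.search(user_agent) is not None


# Map each sample to (matched by strict pattern, matched by optional pattern).
results = {ua: (matches(STRICT, ua), matches(OPTIONAL, ua)) for ua in SAMPLES}

for ua, (strict, optional) in results.items():
    print(f"{ua!r}: (web)crawler={strict}, (web)?crawler={optional}")
```

With the group required, only user agents containing "webcrawler" match; making it optional also catches a bare "crawler". A list like `SAMPLES`, filled with real user agents, could serve as the regression check requested earlier in the thread.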