Undocumented behaviour: duplicated input names have their IDs lumped together #15

@Rekyt

I found an undocumented behavior of the package that I wasn't expecting.
I understand why, from an API standpoint, it is important to avoid unnecessary duplicate queries and save resources.

However, TNRS doesn't document the fact that identical input names are lumped together in the query, with their IDs concatenated into a single row.
It would be nice to document this behavior to avoid bad surprises when using the ID column to make joins after matching names.

reprex:

# Test the same name twice with different IDs
taxa_frame = data.frame(
  ID = paste0("test-", 1:2),
  name = c("Helianthus", "Helianthus")
)

matched = TNRS::TNRS(taxa_frame)

# IDs are mixed
matched[, 1:5]
#>              ID Name_submitted Overall_score Name_matched_id Name_matched
#> 1 test-2,test-1     Helianthus             1          668749   Helianthus

# It's the same for sequential match
seq_match = TNRS::TNRS(taxa_frame$name)
seq_match[, 1:5]
#>    ID Name_submitted Overall_score Name_matched_id Name_matched
#> 1 2,1     Helianthus             1          668749   Helianthus

Created on 2023-02-14 with reprex v2.0.2
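
In the meantime, a possible workaround is to split the lumped IDs back into one row per ID before joining. This is only a minimal sketch in base R: `matched` below is a hand-written mock of the columns shown in the reprex (not a live TNRS call), and it assumes the separator is always a plain comma and that the IDs themselves never contain commas.

```r
# Mock of the relevant columns from the reprex output above (assumption:
# this mirrors what TNRS::TNRS() returned; no query is made here)
matched <- data.frame(
  ID = "test-2,test-1",
  Name_submitted = "Helianthus",
  Name_matched = "Helianthus",
  stringsAsFactors = FALSE
)

# Original input, as in the reprex
taxa_frame <- data.frame(
  ID = paste0("test-", 1:2),
  name = c("Helianthus", "Helianthus"),
  stringsAsFactors = FALSE
)

# Split each lumped ID string on commas (assumes IDs contain no commas)
id_list <- strsplit(matched$ID, ",", fixed = TRUE)

# Repeat each matched row once per ID it contains, then assign the single IDs
expanded <- matched[rep(seq_len(nrow(matched)), lengths(id_list)), ]
expanded$ID <- unlist(id_list)
rownames(expanded) <- NULL

# The join on ID now recovers one row per input row
joined <- merge(taxa_frame, expanded, by = "ID")
```

Joining on `Name_submitted` instead of `ID` would avoid the split entirely, but only works if the submitted names are returned unchanged.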
