Sometimes empty string as input moves entire names #16

I also ran into a strange bug when the input contains an empty string (distinct from #14).

Sometimes, when the dataset is "big enough", submitting a data.frame that contains an empty string removes that entry from the input and moves the next name into its ID instead. This makes it very hard to match the results back to the input names through the ID.

See my reproducible example:

``` r
library("dplyr")
#> Warning: package 'dplyr' was built under R version 4.2.1
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union

ex_data = structure(
  list(
    ID = c(
      "splot-1", "splot-2", "splot-3", "splot-4", 
      "splot-5", "splot-6", "splot-7", "splot-8", "splot-9", "splot-10", 
      "splot-11", "splot-12", "splot-13", "splot-14", "splot-15", "splot-16", 
      "splot-17", "splot-18", "splot-19", "splot-20", "splot-21", "splot-22", 
      "splot-23", "splot-24", "splot-25", "splot-26", "splot-27", "splot-28", 
      "splot-29", "splot-30"
    ),
    Name_submitted = c(
      "Chlorophytum platt", 
      "Echinochloa", "Indigofera lange bl-stiele", "Polygala", "", 
      "Species", "Fabaceae", "Stumpf", "Schwarz pkt.", "-wirtelig b", 
      "Borstig", "-kantig", "Glatt langweilig", "Herzblatt", "Oval", 
      "Versetzt lanzettlich", "Species", "Kn verdichtet", "Wirtelig nadelig", 
      "Aa", "Aaaaaaa", "Aa achalensis", "Aa argyrolepis", "Aa aurantiaca", 
      "Aa calceata", "Achnella", "Aa colombiana", "Aa denticulata", 
      "Aa erosa", "Aa fiebrigii"
    )
  ),
  row.names = c(NA, 30L),
  class = "data.frame"
)

# Note that the name for splot-5 is empty
head(ex_data)
#>        ID             Name_submitted
#> 1 splot-1         Chlorophytum platt
#> 2 splot-2                Echinochloa
#> 3 splot-3 Indigofera lange bl-stiele
#> 4 splot-4                   Polygala
#> 5 splot-5                           
#> 6 splot-6                    Species

# Many of the names are quite messy, but some are genuine
ex_data[["Name_submitted"]]
#>  [1] "Chlorophytum platt"         "Echinochloa"               
#>  [3] "Indigofera lange bl-stiele" "Polygala"                  
#>  [5] ""                           "Species"                   
#>  [7] "Fabaceae"                   "Stumpf"                    
#>  [9] "Schwarz pkt."               "-wirtelig b"               
#> [11] "Borstig"                    "-kantig"                   
#> [13] "Glatt langweilig"           "Herzblatt"                 
#> [15] "Oval"                       "Versetzt lanzettlich"      
#> [17] "Species"                    "Kn verdichtet"             
#> [19] "Wirtelig nadelig"           "Aa"                        
#> [21] "Aaaaaaa"                    "Aa achalensis"             
#> [23] "Aa argyrolepis"             "Aa aurantiaca"             
#> [25] "Aa calceata"                "Achnella"                  
#> [27] "Aa colombiana"              "Aa denticulata"            
#> [29] "Aa erosa"                   "Aa fiebrigii"

# There are 30 names from 'splot-1' to 'splot-30'
dim(ex_data)
#> [1] 30  2

# Matching
ex_match = TNRS::TNRS(ex_data)

# Only 28 rows
dim(ex_match)
#> [1] 28 45

# Two IDs are missing: 'splot-6' and 'splot-17'
setdiff(ex_data$ID, ex_match$ID)
#> [1] "splot-6"  "splot-17"

# They both correspond to the two names that are 'Species'
ex_data %>%
  filter(ID %in% setdiff(ex_data$ID, ex_match$ID))
#>         ID Name_submitted
#> 1  splot-6        Species
#> 2 splot-17        Species

# However, 'Species' has been returned under 'splot-5'
head(ex_match)[,1:3]
#>        ID             Name_submitted Overall_score
#> 1 splot-1         Chlorophytum platt     0.5000000
#> 2 splot-2                Echinochloa     1.0000000
#> 3 splot-3 Indigofera lange bl-stiele     0.4004262
#> 4 splot-4                   Polygala     1.0000000
#> 5 splot-5                    Species            NA
#> 6 splot-7                   Fabaceae     1.0000000

# And the original (empty) 'splot-5' entry is nowhere to be seen
ex_match[,1:5] %>%
  filter(ID == "splot-5")
#>        ID Name_submitted Overall_score Name_matched_id     Name_matched
#> 1 splot-5        Species            NA                 [No match found]
```

<sup>Created on 2023-02-14 with [reprex v2.0.2](https://reprex.tidyverse.org)</sup>
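To spot this kind of misalignment without eyeballing the tables, a small base-R check can compare, for every ID, the name that was submitted with the name that came back, and list the IDs that disappeared. This is a minimal sketch: `check_id_alignment()` is a hypothetical helper (not part of TNRS), and it assumes both data frames have `ID` and `Name_submitted` columns with one row per ID, as in the reprex above.

```r
# Hypothetical helper (not part of TNRS): compare submitted vs returned names
# per ID, to flag rows that were dropped or shifted onto another ID.
# Assumes both data frames have 'ID' and 'Name_submitted' columns.
check_id_alignment <- function(input, output) {
  # IDs that were submitted but are absent from the result
  missing_ids <- setdiff(input$ID, output$ID)

  # Join on ID; the suffixes distinguish the two name columns
  joined <- merge(
    input[, c("ID", "Name_submitted")],
    output[, c("ID", "Name_submitted")],
    by = "ID", suffixes = c("_input", "_output")
  )

  # A returned name that differs from (or is NA instead of) the submitted one
  same <- joined$Name_submitted_input == joined$Name_submitted_output
  shifted <- joined[is.na(same) | !same, ]
  rownames(shifted) <- NULL

  list(missing_ids = missing_ids, shifted = shifted)
}
```

Calling `check_id_alignment(ex_data, ex_match)` on the data above should, judging from the output shown, at least report `splot-6` and `splot-17` as missing and `splot-5` as shifted (submitted `""`, returned `"Species"`).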

I dug into the issue by inspecting the query built by `TNRS_core()`, and it looks fine.
The data sent as JSON is:
```json
[["splot-1","Chlorophytum platt"],["splot-2","Echinochloa"],["splot-3","Indigofera lange bl-stiele"],
["splot-4","Polygala"],["splot-5",""],["splot-6","Species"],["splot-7","Fabaceae"],["splot-8","Stumpf"],
["splot-9","Schwarz pkt."],["splot-10","-wirtelig b"],["splot-11","Borstig"],["splot-12","-kantig"],
["splot-13","Glatt langweilig"],["splot-14","Herzblatt"],["splot-15","Oval"],["splot-16","Versetzt lanzettlich"],
["splot-17","Species"],["splot-18","Kn verdichtet"],["splot-19","Wirtelig nadelig"],
["splot-20","Aa"],["splot-21","Aaaaaaa"],["splot-22","Aa achalensis"],["splot-23","Aa argyrolepis"],
["splot-24","Aa aurantiaca"],["splot-25","Aa calceata"],["splot-26","Achnella"],
["splot-27","Aa colombiana"],["splot-28","Aa denticulata"],["splot-29","Aa erosa"],["splot-30","Aa fiebrigii"]] 
```
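The payload is correct: every ID is paired with its original name, including the empty string for `splot-5`. Yet the API returns the same shifted table as above, with the names moved up.
So the problem seems to lie with the API rather than with the R package.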
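Until this is fixed upstream, one possible workaround is to never send blank names at all, and re-attach the results to the full input by ID afterwards, so blank entries simply get `NA` results. This is a sketch under assumptions: `match_without_blanks()` is a hypothetical helper, it assumes the input has `ID` and `Name_submitted` columns as in the reprex, and it assumes the shift is triggered by the empty string (it does not address the two dropped `Species` rows if those have another cause). The `match_fun` argument defaults to `TNRS::TNRS` but can be swapped for a stub when testing offline.

```r
# Hypothetical workaround (not part of TNRS): only send non-blank names to the
# matching service, then re-attach the results to the full input by ID.
match_without_blanks <- function(df, match_fun = TNRS::TNRS) {
  # Treat NA and whitespace-only names as blank too
  is_blank <- is.na(df$Name_submitted) | !nzchar(trimws(df$Name_submitted))
  matched <- match_fun(df[!is_blank, c("ID", "Name_submitted")])

  # Keep the name the service echoed back under a separate column,
  # so it can still be compared with the submitted one
  names(matched)[names(matched) == "Name_submitted"] <- "Name_returned"

  # all.x = TRUE keeps blank rows, with NA in every result column
  out <- merge(df, matched, by = "ID", all.x = TRUE, sort = FALSE)

  # Restore the original input order (merge does not guarantee it)
  out <- out[order(match(out$ID, df$ID)), ]
  rownames(out) <- NULL
  out
}
```

With the reprex data this would be called as `match_without_blanks(ex_data)`.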
