@CarolineDenis
Contributor

@CarolineDenis CarolineDenis commented Apr 18, 2025

Fixes #6294, #7488, #7558

Adds tree/create_default_tree/ endpoint to create or populate a tree with records from a CSV retrieved from a URL.
Also fetches and displays a list of available default taxon trees in the Tree Viewer tree creation dialog.

TODO:

Details
  • Make /create_default_tree/ accept a CSV URL and discipline name. (Right now the frontend fetches a list of CSV files and sends a filename in the request; the backend then extracts the discipline from the filename and chooses a URL for that discipline from its own list.)
  • Make the default tree creation progress bar functional
  • Make default tree creation reliable. Retry failed rows? Show detailed errors if something goes wrong?
  • Allow the backend to trigger default tree creation (in order to do it within the setup process).
  • Move tree mapping to hosted .json files

Checklist

  • Self-review the PR after opening it to make sure the changes look good and
    self-explanatory (or properly documented)
  • Add relevant issue to release milestone
  • Add relevant documentation (Tester - Dev)
  • Add automated tests
  • Add a reverse migration if a migration is present in the PR

Testing instructions

  • Go to tree viewer for Taxon
  • Click on the plus button to create a new tree
  • See that a list of default Taxon trees is displayed
  • Click on a default tree and see you get a notification that it is being created.
  • Wait and make sure the default tree is successfully created
  • Refresh the page
  • Create a new empty tree and make sure it is created successfully.
  • Make sure you can start creating a tree and cancel it with the Cancel button in the popup.
  • Make sure you can cancel the tree from the "in progress" notification as well.
  • Create a Geography tree:
    • Go to yourdb.test.specifysystems.org/documentation/api/operations/all/
    • Scroll down to /api/create_default_tree/ under the api section and click on it.
    • Click "Try it out" and type the following (and fill in your collection name):
{
  "url": "https://files.specifysoftware.org/geographyfiles/geonames.csv",
  "mappingUrl": "https://files.specifysoftware.org/treerows/geography.json",
  "disciplineName": "geography",
  "collection": "Your Collection Name",
  "rowCount": 42266
}
    • Click execute and make note of the response you got below.
    • The new secondary Geography tree should be created after a few minutes.
    • You can track the progress by going to yourdb.test.specifysystems.org/api/create_default_tree/status/YOUR_TASK_ID/. You should see your task id in the response you got.
  • Create a Chronostratigraphy tree:
{
  "url": "https://files.specifysoftware.org/chronostratfiles/GeologicTimePeriod.csv",
  "mappingUrl": "https://files.specifysoftware.org/treerows/geologictimeperiod.json",
  "disciplineName": "geologictimeperiod",
  "collection": "Your Collection Name",
  "rowCount": 165
}
    • Click execute and make note of the response you got below.
    • The new Chronostrat tree should be created after a few minutes.
  • Feedback on the new tree menu is also appreciated.
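
The two API requests above could also be sent from a script instead of the documentation page. The following is a minimal sketch using only the Python standard library. The helper names (`build_payload`, `status_url`, `post_json`) are illustrative and not part of this PR, the host is the placeholder from the instructions, and authentication (session cookie, CSRF token) is not handled even though a real instance would likely require it. The shape of the response is not described above, so `post_json` simply returns whatever JSON comes back.

```python
import json
import urllib.request

# Placeholder host taken from the testing instructions; replace with your instance.
BASE_URL = "https://yourdb.test.specifysystems.org"


def build_payload(url, mapping_url, discipline_name, collection, row_count):
    """Assemble the request body using the field names shown in the instructions."""
    return {
        "url": url,
        "mappingUrl": mapping_url,
        "disciplineName": discipline_name,
        "collection": collection,
        "rowCount": row_count,
    }


def status_url(task_id, base_url=BASE_URL):
    """Build the status URL for a task id taken from the creation response."""
    return f"{base_url}/api/create_default_tree/status/{task_id}/"


def post_json(endpoint, payload):
    """POST a JSON body and decode the JSON reply (hypothetical helper, no auth)."""
    request = urllib.request.Request(
        endpoint,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


# Same values as the Geography example in the instructions.
geography = build_payload(
    url="https://files.specifysoftware.org/geographyfiles/geonames.csv",
    mapping_url="https://files.specifysoftware.org/treerows/geography.json",
    discipline_name="geography",
    collection="Your Collection Name",
    row_count=42266,
)
```

To actually send the request, something like `post_json(f"{BASE_URL}/api/create_default_tree/", geography)` would be called, and the task id from the reply passed to `status_url`.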

@CarolineDenis
Contributor Author

NOTES:

  • There are no defaults for geology trees

@CarolineDenis
Contributor Author

CarolineDenis commented Sep 2, 2025

TODO:

  • When testing the feature, it fails when running
def create_default_trees_view(request):

because:

logged_in_collection_name = request.user.logincollectionname 

returns None for me
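
One way to make this failure easier to diagnose might be to resolve the collection name defensively. The sketch below is an assumption about how that could look, not code from this PR: `resolve_collection_name` is a hypothetical helper, and falling back to the `collection` field of the request body is a guess based on the payloads in the testing instructions.

```python
from types import SimpleNamespace


def resolve_collection_name(user, payload):
    """Return the collection to create the tree in.

    Prefers the user's login collection, falls back to the "collection"
    field of the request body (an assumed fallback), and fails loudly
    instead of passing None further down.
    """
    name = getattr(user, "logincollectionname", None) or payload.get("collection")
    if not name:
        raise ValueError(
            "No collection available: the user has no login collection "
            "and the request did not name one"
        )
    return name


# Stand-in for request.user in the failing case described above.
user_without_login_collection = SimpleNamespace(logincollectionname=None)
resolved = resolve_collection_name(
    user_without_login_collection, {"collection": "Your Collection Name"}
)
```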

@alesan99
Contributor

whenever i execute the API command I get the following bad request response when trying to create the secondary geography tree

Whoops, should work now 👍 I gave it a quick test to confirm

Collaborator

@emenslin emenslin left a comment

  • See that a list of default Taxon trees is displayed
  • Click on a default tree and see you get a notification that it is being created.
  • Wait and make sure the default tree is successfully created
  • Create a new empty tree and make sure it is created successfully.
  • Make sure you can start creating a tree and cancel it with the Cancel button in the popup.
  • Make sure you can cancel the tree from the "in progress" notification as well.
    • The new secondary Geography tree should be created after a few minutes.
    • The new Chronostrat tree should be created after a few minutes.

  • I tried uploading all taxon trees, everything uploads successfully except the ornithology tree.
  • The root node for geography is just called 'Root' but should be called 'Earth'
  • The root node for Chronostrat is called 'Root' but should be called 'Time'
  • The chronostratigraphy tree doesn't have an Eon rank. I'm not sure if this is necessary but I wanted to mention it in case it is.
  • The full name separator for the root ranks for both chronostrat and geology are set to but should be set to , .

@alesan99
Contributor

  • I tried uploading all taxon trees, everything uploads successfully except the ornithology tree.

Hm, not sure why it's failing. I increased the re-connection delay, so it might work now?

@emenslin
Collaborator

emenslin commented Jan 2, 2026

Hm, not sure why it's failing. I increased the re-connection delay, so it might work now?

It's still failing at the end of the tree creation

@alesan99
Contributor

alesan99 commented Jan 6, 2026

TODO:

  • Remove cancel button from progress dialog
  • Move ornithology tree (and all trees) away from the specify file server

@alesan99 alesan99 requested a review from emenslin January 6, 2026 20:21
@alesan99
Contributor

alesan99 commented Jan 6, 2026

@emenslin Ready to be tested again 👍

The bird tree has been moved onto the same server as the other trees, so it should now download reliably

@alesan99 alesan99 requested a review from a team January 6, 2026 20:23
Collaborator

@emenslin emenslin left a comment

  • See that a list of default Taxon trees is displayed
  • Click on a default tree and see you get a notification that it is being created.
  • Wait and make sure the default tree is successfully created
  • Create a new empty tree and make sure it is created successfully.
  • Make sure you can start creating a tree and cancel it with the Cancel button in the popup.
  • Make sure you can cancel the tree from the "in progress" notification as well.
    • The new secondary Geography tree should be created after a few minutes.
    • The new Chronostrat tree should be created after a few minutes.

The ornithology tree uploaded successfully, looks good! I did notice that the geography root rank is called "Earth" when I believe it should be "Planet". However, I know it's called Earth in some trees and Planet in others, so it's probably not a big deal.

@emenslin emenslin requested a review from a team January 6, 2026 21:39
Contributor

@Iwantexpresso Iwantexpresso left a comment

  • See that a list of default Taxon trees is displayed
  • Click on a default tree and see you get a notification that it is being created.
  • Wait and make sure the default tree is successfully created
  • Create a new empty tree and make sure it is created successfully.
  • Make sure you can start creating a tree and cancel it with the Cancel button in the popup.
  • Make sure you can cancel the tree from the "in progress" notification as well.
  • The new secondary Geography tree should be created after a few minutes.
  • The new Chronostrat tree should be created after a few minutes.

Just tested this on Calvert and it looks great! Nice job y'all!

Member

@grantfitzsimmons grantfitzsimmons left a comment

  • See that a list of default Taxon trees is displayed

There is almost no space between the "Populated Trees" and "Empty Trees" sections, making it difficult to differentiate between the list of trees that are pre-loaded with data and the list of empty trees that can be chosen.

Image

More importantly, the source data we are providing says it is from 2008, which is incredibly out of date (for most disciplines). The defaults we provide in Specify 6 today are from Catalog of Life 2021 in most cases, see https://files.specifysoftware.org/taxonfiles/

https://files.specifysoftware.org/taxonfiles/taxonfiles_2.xml

Image

Collections curators and managers want to use the most up-to-date taxonomic data possible when using default data, so we need to make sure these updated spreadsheets are being used.

  • Click on a default tree and see you get a notification that it is being created.
Image

Some suggestions:

  • The dialog header "The Default Tree creation process has started." should not have a period in it (to be stylistically consistent with other dialogs in the app) and should either be entirely proper case or 'default tree' should be lower case (as it is not a proper noun).
    • Better yet, it should tell the user which tree is being created: while the 20,000+ records are added, nothing in this dialog confirms they made the correct selection. The notifications do include the name of the tree, but they are not visible to the user while this process runs.
  • There is no "Close" button so the user can resume their work, only a red Cancel button. Since users instinctively click the primary action button, they may accidentally cancel the creation without realizing it would otherwise continue in the background. We could add a Close button after the Cancel button, following the [Cancel] [Primary Action] layout, to prevent this.
    • In fairness, we might not even need to show the user progress at all: as long as the worker is functional, everything should just work.
  • Wait and make sure the default tree is successfully created

This is quite slow... I chose the 'Ichthyology' tree and began populating it at 1:18 (according to the network requests). It crawled along, finishing populating entirely at 1:47 (29 minutes total) on the test panel.

Thinking of the (expectedly common) case of someone creating a database with several disciplines with a large number of trees, they would be forced to queue and wait for these jobs to finish over a long period of time.

It would make a lot of sense to batch these and make this action more efficient to prevent frustration.

The notification showing "Cancel" persists regardless of the state (success|failure).

Image
  • Create a new empty tree and make sure it is created successfully.

That was fast 😄

  • Make sure you can start creating a tree and cancel it with the Cancel button in the popup.
  • Make sure you can cancel the tree from the "in progress" notification as well.

This technically works, but–

Cancelling the tree still sends 3 notifications:

  1. The Default Tree creation process has started.
  2. Default Tree creation in progress.
  3. Default Tree creation failure.
    Aborted

My suggestion:

  • Remove the 'Task ID' collapsible section from all notifications– the users should not see this.

  • Remove the giant red "Cancel" button from the notification. If the user wants to remove this tree, they should simply be able to do so after it is created (#7594)

  • Send only two notifications, one saying the process has started and one saying the tree creation was cancelled. Saying "Default Tree creation failure. Aborted" is overwhelmingly negative and says:

    • The software failed unintentionally ("creation failure")
    • We use programmer lingo ("aborted" instead of "cancelled")
    Image
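
The two-notification idea could be expressed as a small mapping from the final outcome of a task to the messages the user sees. This is only a sketch of the suggestion above: `notifications_for`, the outcome names, and the message wording are all hypothetical and not taken from the PR.

```python
def notifications_for(outcome, tree_name):
    """Return the user-facing messages for one finished task.

    `outcome` is one of "completed", "cancelled" or "failed" (assumed names).
    Every outcome yields exactly two messages: a start and an ending.
    """
    started = f"Creating default tree: {tree_name}"
    endings = {
        "completed": f"Default tree created: {tree_name}",
        # A user-initiated cancel is reported as a cancel, not as a failure.
        "cancelled": f"Default tree creation cancelled: {tree_name}",
        "failed": f"Default tree creation failed: {tree_name}",
    }
    return [started, endings[outcome]]


cancelled_messages = notifications_for("cancelled", "Ichthyology")
```

Keeping the cancel and failure endings separate means a deliberate cancel never produces the "creation failure" wording.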
  • Create a Geography tree:
    • Click execute and make note of the response you got below.
Image
  • The new secondary Geography tree should be created after a few minutes.
  • You can track the progress by going to yourdb.test.specifysystems.org/api/create_default_tree/status/YOUR_TASK_ID/. You should see your task id in the response you got.
  • Create a Chronostratigraphy tree:
    • Click execute and make note of the response you got below.
Image
  • The new Chronostrat tree should be created after a few minutes.

Other Thoughts

The core issue, as I see it, is that the user needs to create a taxon tree populated with our default data.

From my conversation with @alesan99, I understand that a big challenge we face is the time it takes to build these trees. This PR so far is attempting to develop a robust system that will inform users about the current status of the background process, including notifications, status updates, a progress bar, and options to cancel the action, to address this slowdown.

My (perhaps naïve) perspective is that we should re-evaluate how we provide the default trees to make this process nearly instantaneous and eliminate previous complications. If we can download the full tree CSV file and import it all at once, the extra work may be unnecessary. The faster this process completes, the better the user experience. Fewer messages for the user also would improve the experience.

In Specify 6, this took only a few seconds (the file downloaded locally, the tree was populated, the user moved on). Maybe there are lessons there we can use to make this more elegant.

This PR is loading the data row-by-row while Specify 6 used batch database operations, resulting in many more database round-trips.

  • We could use bulk_create() instead of individual saves
  • Seems pertinent to pre-fetch all existing nodes and ranks
  • Batch inserts in groups?
  • Maybe we need to consider raw SQL for the import portion if that's feasible...
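
To illustrate the batching idea, here is a minimal sketch that pre-fetches existing nodes once and then inserts level by level with one batched statement per chunk. It uses the standard library `sqlite3` module and `executemany` as a stand-in so that it runs without Django; in the real code the equivalent would presumably be `bulk_create()` with a `batch_size`. The `node` table, its columns, and the `import_tree` and `chunks` helpers are hypothetical and do not reflect the Specify schema. Assigning ids in Python is a simplification to avoid reading generated keys back.

```python
import sqlite3


def chunks(items, size):
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def import_tree(conn, rows, batch_size=500):
    """Insert tree rows level by level using batched inserts.

    Each row is a tuple of names from the top rank down, for example
    ("Animalia", "Chordata", "Aves"). Returns the number of nodes inserted.
    """
    cur = conn.cursor()
    # Pre-fetch all existing nodes once instead of querying per row.
    node_ids = {}
    for node_id, parent_id, name in cur.execute("SELECT id, parent_id, name FROM node"):
        node_ids[(parent_id, name)] = node_id
    next_id = max(node_ids.values(), default=0) + 1

    path_ids = {(): None}  # path of names -> node id; the empty path has no parent
    inserted = 0
    max_depth = max((len(row) for row in rows), default=0)
    for depth in range(1, max_depth + 1):
        pending = []
        for path in sorted({tuple(row[:depth]) for row in rows if len(row) >= depth}):
            parent_id = path_ids[path[:-1]]
            key = (parent_id, path[-1])
            if key not in node_ids:
                node_ids[key] = next_id
                pending.append((next_id, parent_id, path[-1], depth))
                next_id += 1
            path_ids[path] = node_ids[key]
        # One executemany call per batch rather than one statement per node.
        for batch in chunks(pending, batch_size):
            cur.executemany(
                "INSERT INTO node (id, parent_id, name, depth) VALUES (?, ?, ?, ?)",
                batch,
            )
            inserted += len(batch)
    conn.commit()
    return inserted


SCHEMA = "CREATE TABLE node (id INTEGER PRIMARY KEY, parent_id INTEGER, name TEXT, depth INTEGER)"
SAMPLE_ROWS = [
    ("Animalia", "Chordata", "Aves"),
    ("Animalia", "Chordata", "Mammalia"),
    ("Animalia", "Arthropoda", "Insecta"),
]
```

Parents are always inserted before their children because each level is completed before the next one starts, which is what allows a whole level to go out in a few statements.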

grantfitzsimmons and others added 2 commits January 9, 2026 13:49
Add back in close button to tree creation dialog.
Add spacing between populated trees and empty trees
@alesan99
Contributor

@grantfitzsimmons Addressed UI issues 👍
I will attempt to speed up tree creation as discussed here: #7641

I can also update the tree files independently of this PR, so I will be working on updating those soon as well.

Member

@grantfitzsimmons grantfitzsimmons left a comment

One usability concern I have is that if you dismiss the creation dialog, you are still shown the 'Add Tree' dialog, which may lead users to click the tree again instinctively, creating two trees:

Screen.Recording.2026-01-13.at.11.06.52.mov

Solution: Hide the 'Add Tree' dialog once creation begins

Since the tree creation is not fast yet (see #7641), the user is left wondering if the tree they started creating is being made again. I was not sent any notification for either of the trees I made. In the logs for the worker or specify7 container, I don't see an indication either of whether or not the tree is being created.

Until creation becomes fast enough that the gap between initiating the tree creation and the tree existing is negligible, we need some mechanism to show the user this is happening.

I imagine this is just a bug? It looks like the intention was for this notification to be sent @alesan99

"Everything should be as simple as it can be, but not simpler"

@alesan99
Contributor

Solution: Hide the 'Add Tree' dialog once creation begins

Done 👍

Until creation becomes fast enough that the gap between initiating the tree creation and the tree existing is negligible, we need some mechanism to show the user this is happening.

I imagine this is just a bug? It looks like the intention was for this notification to be sent @alesan99

It is indeed a bug; a notification should be sent out as soon as the worker receives the task.
Since nothing is being shown in the logs either, I assume there was an issue with the worker receiving the task.

I could move the notification trigger to the main process, rather than the worker, but that risks lying in cases where the task didn't actually start (which looks like what's happening in this case maybe?). Perhaps I can at least add a failure notification if there is some way to detect if the worker didn't start the task.
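
One possible shape for that detection is sketched below: the main process records when it queued the task, the worker records when it actually picks the task up, and a periodic check flags tasks that were never acknowledged within a grace period. Everything here (`TaskRecord`, `classify`, the 30-second default) is hypothetical and not from this PR, and how the worker would report the acknowledgement is left open.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TaskRecord:
    task_id: str
    queued_at: float                    # seconds; set by the main process
    started_at: Optional[float] = None  # seconds; set by the worker on pickup


def classify(record, now, grace_seconds=30.0):
    """Classify a task as "started", "queued" or "not_started".

    "not_started" means the grace period elapsed without the worker
    acknowledging the task, which is when a failure notification
    could be sent instead of a premature "started" one.
    """
    if record.started_at is not None:
        return "started"
    if now - record.queued_at > grace_seconds:
        return "not_started"
    return "queued"


stuck = TaskRecord(task_id="example-task", queued_at=100.0)
picked_up = TaskRecord(task_id="example-task", queued_at=100.0, started_at=101.5)
```

With this split, the "started" notification still only goes out once the worker has really begun, so the main process never claims progress that did not happen.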

Labels

None yet

Projects

Status: Dev Attention Needed

Development

Successfully merging this pull request may close these issues.

  • Add default Chronostratigraphy tree
  • Add default Geography tree
  • [Guided Setup] - Add a tool to import a default tree

7 participants