Description
Background
Predictive mathematical modeling is an essential part of systems biology and is closely interconnected with information management. Systems biology information is stored in specialized formats (http://co.mbine.org/) that facilitate data storage and analysis (e.g., http://sbgn.org/, http://sbml.org/). These formats are not designed for human readability, so specialized software is needed to visualize and interpret the results. Each type of data is stored according to its own RDF/XML schema. Traditionally, understanding this data requires dedicated tools that run a fixed set of queries. AI allows users to explore different types of data directly (https://www.nature.com/articles/s41540-025-00496-z).
AI can help with understanding, designing, implementing, and simulating mathematical models. A simple example is a ChatGPT chatbot (https://chatgpt.com/g/g-n3asZvWaM-vcell-models-explorer) that can explore VCell models (http://vcell.org/). The design of the chatbot is described in the following presentation: https://drive.google.com/file/d/1jLNjl-ZxZDeGcRxVL754Oao4GYsM4u0Y/view. It uses the VCell API https://vcell.cam.uchc.edu/api/v0/biomodel.
A more complete AI site is implemented at https://github.com/virtualcell/VCell-AI. It can be easily tested using Docker or installed locally (an API key is required).
Finally, the non-AI interface for accessing and displaying VCell models is deployed at http://www.vcelldb.org (it may currently not work properly); its source code is available at https://github.com/virtualcell/modelbricks-webapp. It is implemented in Handlebars, with most information retrieved via the API and some stored locally as JSON.
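A minimal sketch of how a backend tool might call the VCell biomodel endpoint mentioned above and turn the result into text a chatbot can quote. The query parameter names (`bmName`, `owner`, `maxRows`) and the JSON field names (`name`, `ownerName`) are assumptions for illustration and should be checked against the live API before use.

```python
import json
import urllib.parse
import urllib.request

# Endpoint named in the project description.
VCELL_BIOMODEL_API = "https://vcell.cam.uchc.edu/api/v0/biomodel"


def build_query_url(name=None, owner=None, max_rows=10):
    """Build a query URL for the biomodel endpoint.

    ASSUMPTION: the parameter names bmName, owner and maxRows are
    illustrative; verify them against the actual API.
    """
    params = {"maxRows": max_rows}
    if name:
        params["bmName"] = name
    if owner:
        params["owner"] = owner
    return VCELL_BIOMODEL_API + "?" + urllib.parse.urlencode(params)


def fetch_models(url, timeout=30):
    """Download and decode the JSON list returned by the API (network call)."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


def summarize(models):
    """Turn raw model records into short lines a chatbot could quote.

    ASSUMPTION: each record carries "name" and "ownerName" fields.
    """
    return [f"{m.get('name', '?')} (owner: {m.get('ownerName', '?')})" for m in models]
```

A typical call would be `summarize(fetch_models(build_query_url(name="calcium")))`. Keeping URL construction and formatting separate from the network call makes both easy to test without hitting the server.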
Goal
Complete the AI chatbot and integrate it with the vcelldb website, significantly extending and modifying the site. The web interface should be able to query VCell resources and provide relevant information about VCell models, modeling techniques, and the use of VCell software.
Here are sample queries the chatbot should answer:
- Exploring VCell database, e.g.
* List all models by a certain user
* List all models that have a specific type of geometry (e.g., analytic, constructed solid geometry, etc.) or that use a specific solver (e.g., CVODE)
* List all models that deal with calcium
- Exploring individual models (both from the database, using Auth0 authorization, and uploaded by a user):
* How many reactions are in the model? Describe the mathematics, parameters, and simulations.
* What biological papers have similar modeling mechanisms?
* Draw the reaction diagram in SBGN format
- Assisting the user in using VCell:
* How to model an enzymatic reaction?
* How to define an analytic geometry?
* How to plot a histogram for multiple simulations?
* What do the colors mean in a spatial simulation plot?
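The three query categories above (database exploration, model inspection, VCell usage help) could each be dispatched to a different backend tool before the LLM answers. Below is a minimal keyword-based routing sketch; the category names and keyword lists are hypothetical, and a production chatbot would more likely let the LLM choose a tool via function calling.

```python
# Hypothetical keyword lists for the three query categories.
# Dict order matters: the first matching category wins.
ROUTES = {
    "database_search": ["list all models", "models by", "which models", "solver", "geometry type"],
    "model_inspection": ["this model", "the model", "reactions are", "parameters", "sbgn"],
    "vcell_help": ["how to", "how do i", "what do the colors", "plot"],
}


def route_query(query: str) -> str:
    """Return the first category whose keywords appear in the query.

    Matching is a case-insensitive substring test; queries that match
    nothing fall back to "general".
    """
    text = query.lower()
    for category, keywords in ROUTES.items():
        if any(keyword in text for keyword in keywords):
            return category
    return "general"
```

Keyword routing is brittle but cheap and fully deterministic, which makes it a useful baseline to compare against LLM-based tool selection when improving prompt handling.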
The tasks for this summer:
- Improve prompt handling
- Improve outputs formatting
- Add Auth0 authentication (using Google, Facebook, ORCID, etc.)
- Add AI usage control:
* Enforce per-user token limits
* Allow administrators to manage and adjust limits
- Add a local LLM option
- Add search in Biomodels DB https://www.biomodels.org/docs/
- Add search in PubMed https://pmc.ncbi.nlm.nih.gov/tools/developers/
- Assist a user in designing new models, e.g.
* Generate a model for ligand-binding to receptors
* Combine two existing models into a unified one
- Expand the project to SBML/BioPAX data, answering the same queries
- Visualize outputs as implemented in https://bnglviz.github.io/examples.html (https://github.com/bnglViz/bnglViz.github.io)
- (optional) AI prompts to invoke pyVCell https://github.com/virtualcell/pyvcell
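The "AI usage control" task (per-user token limits that administrators can adjust) could start from logic like the sketch below. It is a hypothetical in-memory design: user ids would presumably come from the Auth0 login, and a real deployment would persist usage in a database and reset it on a schedule.

```python
from dataclasses import dataclass, field


@dataclass
class TokenBudget:
    """In-memory per-user token accounting (hypothetical design)."""

    default_limit: int = 100_000
    limits: dict = field(default_factory=dict)  # per-user overrides set by admins
    used: dict = field(default_factory=dict)    # tokens consumed per user

    def limit_for(self, user_id: str) -> int:
        return self.limits.get(user_id, self.default_limit)

    def remaining(self, user_id: str) -> int:
        return max(0, self.limit_for(user_id) - self.used.get(user_id, 0))

    def try_consume(self, user_id: str, tokens: int) -> bool:
        """Record usage and return True only if it fits within the limit."""
        if tokens > self.remaining(user_id):
            return False
        self.used[user_id] = self.used.get(user_id, 0) + tokens
        return True

    def set_limit(self, is_admin: bool, user_id: str, new_limit: int) -> None:
        """Administrator-only adjustment of a user's limit."""
        if not is_admin:
            raise PermissionError("only administrators can change limits")
        self.limits[user_id] = new_limit
```

Checking the budget before forwarding a prompt to the LLM (rather than after) keeps a single user from exceeding the limit with one large request.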
Difficulty Level: Medium/Hard
Size and Length of Project
- medium: 175 hours
- 12 weeks
Skills
The suggested technologies (to be discussed) are:
- Frontend: Next.js 15 with TypeScript, Tailwind CSS, and Radix UI components
- Backend: FastAPI with Python 3.12+, Poetry for dependency management
- Vector Database: Qdrant for knowledge base storage and retrieval
- Containerization: Docker and Docker Compose
- LLM engineering: Langfuse
- User Authentication: Auth0
- (optional) Hugging Face for building, training, and deploying ML models
- (optional) Llama (https://www.llama.com/) open-source AI model
- (optional) Unsloth AI (https://unsloth.ai/) for fine-tuning Llama models
- (optional) NVIDIA GPU acceleration for faster answers
- (optional) Agentic AI
- (optional) Java to work with pyVCell https://github.com/virtualcell/pyvcell
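The vector database (Qdrant) stores embeddings of knowledge-base documents and returns the nearest neighbours of a query embedding. The standard-library sketch below illustrates only that underlying idea, cosine-similarity top-k search, using made-up 3-dimensional vectors and hypothetical document ids; it is not Qdrant's API, and in the project the vectors would come from an embedding model.

```python
import math
from typing import Dict, List, Tuple


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query: List[float], docs: Dict[str, List[float]], k: int = 2) -> List[Tuple[str, float]]:
    """Return the k (doc_id, score) pairs most similar to the query vector."""
    scored = [(doc_id, cosine(query, vec)) for doc_id, vec in docs.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]


# Toy knowledge base: hypothetical help-article ids -> made-up embeddings.
KNOWLEDGE_BASE = {
    "enzymatic-reactions": [0.9, 0.1, 0.0],
    "analytic-geometry": [0.1, 0.9, 0.1],
    "histogram-plots": [0.0, 0.2, 0.9],
}
```

The retrieved documents would then be added to the LLM prompt as context; a vector database does the same search at scale with approximate-nearest-neighbour indexes.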
Proposal
Please leave a quick note here, but do not share technical details here; send them to me by email instead. When writing your proposal, use the template below as a guide. Following this structure will help you better understand the coding process and project requirements. Your proposal should accomplish two key objectives:
• Demonstrate your technical skills required to complete the project (with mentor assistance if necessary).
• Highlight your personal skills essential for project success, such as time management, honesty, communication, and accountability.
Your proposal must be as specific as possible.
• If you mention a software framework, include a link to it.
• If you refer to your past work, provide a link to a specific GitHub repository (or another relevant source) where I can review your code.
Honesty is crucial. If you lack experience in a particular area, state that you are willing to learn and provide a reference to your learning source (e.g., another repo with a similar project). I will conduct video interviews with the authors of the proposals that stand out.
Personal Background: Briefly introduce yourself, including relevant details such as your GitHub, LinkedIn, and any software you have developed, or courses you took. If you mention a specific project, please provide a link to it.
Why Are You Interested in This Project? What excites you about this project? Why do you specifically want to work on it?
Technical Approach: For each deliverable, outline your approach (or multiple approaches, as experimentation is encouraged). Mention the programming languages, existing libraries, frameworks, or other tools you plan to use. Provide specific links to everything non-trivial.
Milestones and Expected Timeline: Provide your estimated milestones and timeline for project completion.
Challenges and Pitfalls: Identify potential challenges you foresee and suggest strategies to overcome them.
Documentation: Specify where you will track documentation (e.g., Google Docs, README files, GitHub issues). Will you use any AI-assisted tools for documentation?
IDE: Which IDE do you use?
Working Routine: Explain how you prefer to provide updates and quickly resolve questions (e.g., via email, Slack, etc.). Describe how you intend to track progress (e.g., Google Docs, GitHub issues, Trello). If you already have a repository where you’ve opened and closed multiple issues, feel free to share it.
Availability: List the days you will be available and any days you will be unavailable. It’s completely fine to take days off—just communicate them ahead of time.
Communication Routine: Indicate your preferred (and unpreferred) communication platforms (e.g., Zoom, Webex, Slack). Would you be comfortable with daily meetings initially? The default schedule is daily at the start, becoming less frequent over time. However, if you prefer a different pace, such as spending more time on tasks before meeting, please specify. Also, provide your general availability in EST.
What Do You Expect to Gain: Beyond monetary rewards, what do you hope to gain from this project and from GSoC in general? What skills do you want to develop?
Public Repository
https://github.com/virtualcell/VCell-AI
Potential Mentors
Michael Blinov (blinov@uchc.edu), James Schaff (schaff@uchc.edu), Ming Zhang (mingzh@lanl.gov)
AI usage policy
AI tools can be used to help with coding, provided all code is .