
Feature/ibis execution api #13

Open
egillax wants to merge 5 commits into OHDSI:develop from egillax:feature/ibis-execution-api

Conversation


egillax commented Feb 11, 2026

Implements an experimental native Ibis execution API for Circepy with lazy relation building and
explicit materialization actions.

  • Adds IbisExecutor with build(), to_pandas(), to_polars(), and write() methods
  • Adds convenience wrappers: build_ibis, to_polars, write_cohort
  • Introduces ExecutionOptions for execution/schema/materialization/SQL-capture controls
  • Ports and simplifies executor logic from Mitos into Circepy, using a registry-based factory
  • Adds tests for API behavior and write-option validation
  • Validated DuckDB parity against CirceR row counts across phenotype-library cohorts (local integration
    run)

Sorry for how large this is. The API is in circe/execution/ibis.py

There is some stuff in there I'm not completely satisfied with, but it's good enough for you to have a look, @azimov.


azimov commented Feb 18, 2026

@egillax So I have finally had a chance to test this with Databricks, and overall it seems positive. There is a lot of code to unpack, so I will need to do a deeper review later, but here are my initial notes (mainly so I don't forget, but also to stimulate discussion):

  • Maybe we need to implement some CohortGenerator-style functions for execution, or track what the cohort ids are? The current executor.write method seems to work well once, but will it work with repeated uses/incremental execution? I'm not sure what Ibis is doing under the hood, and I would like to investigate in more detail. My preferred behaviour: if the cohort definition for a given id changes, it automatically overwrites; if it's the same, it either throws an exception that can be handled (e.g. skips generation) or just carries on.

  • On this note - would we want to implement subset cohorts here? Maybe ibis can help with handling these relations?

  • Clearer parameterization - perhaps convenience classes on top of our R standards?

  • I'm not sure how often users should use polars/pandas objects when working with cohorts. I can see some use cases for it, but it won't scale well, and users generally don't think about the impact this sort of thing has.
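The overwrite/skip behaviour described in the first bullet could be driven by a checksum of the cohort definition. A minimal stdlib sketch, where `definition_checksum` and `decide_action` are hypothetical helpers, not part of the current API:

```python
import hashlib
import json


def definition_checksum(expression: dict) -> str:
    """Stable hash of a cohort definition (key order normalized)."""
    canonical = json.dumps(expression, sort_keys=True)
    return hashlib.sha256(canonical.encode()).hexdigest()


def decide_action(expression: dict, stored_checksums: dict, cohort_id: int) -> str:
    """Overwrite when the definition changed, skip when unchanged."""
    new = definition_checksum(expression)
    old = stored_checksums.get(cohort_id)
    if old is None:
        return "generate"
    if old != new:
        return "overwrite"
    return "skip"
```

On a real backend the stored checksums would live in a small bookkeeping table next to the cohort table, much like CohortGenerator's incremental mode.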

Test code I used to get it working:

from dotenv import load_dotenv
import os
import ibis
from circe.execution import ExecutionOptions, IbisExecutor
from examples.basic_cohort import create_diabetes_cohort

load_dotenv()

host = os.getenv("DATABRICKS_HOST")
http_path = os.getenv("DATABRICKS_HTTP_PATH")
token = os.getenv("DATABRICKS_TOKEN")
scratch_schema = os.getenv("DATABRICKS_SCRATCH_SCHEMA")


# Normalize common user input issues.
host = host.removeprefix("https://").removeprefix("http://").rstrip("/")
if not http_path.startswith("/"):
    http_path = f"/{http_path}"

# Split catalog.schema for Unity Catalog-aware connections.
scratch_catalog = None
scratch_db = scratch_schema
if "." in scratch_schema:
    scratch_catalog, scratch_db = scratch_schema.split(".", 1)

cdm_schema = "<hidden>"

con = ibis.databricks.connect(
    server_hostname=host,
    http_path=http_path,
    token=token,
    catalog=scratch_catalog,
    schema=scratch_db,
)

options = ExecutionOptions(
    cdm_schema=cdm_schema,
    vocabulary_schema=cdm_schema,
    result_schema=scratch_schema,
    temp_emulation_schema=scratch_schema,
    cohort_id=2,
)

executor = IbisExecutor(conn=con, options=options)

cohort = create_diabetes_cohort()
executor.write(cohort, table="ibis_test_cohort", overwrite=False)

Overall this was pretty straightforward, and I will play around more to give you more feedback soon.

ibis-postgres = [
"ibis-framework[postgres]>=11.0.0; python_version >= '3.9'",
]
ibis-databricks = [
azimov (Collaborator):

Might be how I installed it, but I got some errors with polars/pandas after the install.

egillax (Author):

I think we could avoid depending on polars, which would hopefully fix that. Otherwise please let me know the particulars.
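If we go that way, polars could become an optional extra alongside the existing backend extras. A sketch following the pattern already in pyproject.toml; the extra name and version bound are assumptions:

```toml
[project.optional-dependencies]
polars = [
    "polars>=1.0.0; python_version >= '3.9'",
]
```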


class TestPackageStructure:
"""Test basic package structure and imports."""
azimov (Collaborator):

@egillax I think we need to agree on some IDE standards, as these look like auto-formatting changes. I'm just using the PyCharm defaults, but happy to change to something else if you're using neovim?

(All this code was written by the AI though)

egillax (Author):

I use neovim. But I actually didn't even open it for this - just codex and git diff when I wanted to look (and this PR).

I think this happened because I told my agent to format according to the standards in the repo, and it ran black on this file. So either my agent messed up by not using it correctly, or yours did.

But I agree - we should standardize and enforce conventions, perhaps with pre-commit hooks, CI, or both. I haven't used pre-commit hooks before, so I'm not sure if they're more trouble than they're worth.
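For reference, a minimal .pre-commit-config.yaml that runs black on every commit looks roughly like this (the pinned rev is only an example):

```yaml
repos:
  - repo: https://github.com/psf/black
    rev: 24.8.0
    hooks:
      - id: black
```

After adding the file, each developer runs `pre-commit install` once; the same hooks can also be run in CI with `pre-commit run --all-files`.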

ExpressionInput = Union[CohortExpression, Mapping[str, Any], str, Path]


def load_expression(value: ExpressionInput) -> CohortExpression:
azimov (Collaborator):

No objection to this function name, but we should probably have a single API standard for loading the expressions in the .api file?

egillax (Author):

Yes, agreed. This pass was mostly ripping stuff out of Mitos and putting it here. I think the agent didn't look closely enough at the current API.

cohort_id: Optional[int] = None

materialize_stages: bool = False
materialize_codesets: bool = True
azimov (Collaborator):

One thing that would be good to implement is taking a checksum of a cohort and then re-using the materialization over and over. Not sure if that's possible in this branch, but it would be a pretty useful improvement over the Java version.
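A checksum-keyed cache along those lines could look like the stdlib sketch below; `MaterializationCache` and the `materialize` callable are illustrative names, not part of this branch:

```python
import hashlib
import json
from typing import Any, Callable


class MaterializationCache:
    """Re-use a materialized result as long as the cohort definition is unchanged."""

    def __init__(self) -> None:
        self._by_checksum: dict[str, Any] = {}

    def get_or_materialize(
        self, expression: dict, materialize: Callable[[dict], Any]
    ) -> Any:
        # Identical definitions hash to the same key, so materialize runs once.
        key = hashlib.sha256(
            json.dumps(expression, sort_keys=True).encode()
        ).hexdigest()
        if key not in self._by_checksum:
            self._by_checksum[key] = materialize(expression)
        return self._by_checksum[key]
```

On a database backend the cached value would be a table name rather than an in-memory object, with the checksum stored alongside it.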

egillax (Author):

Yes, I think we can do that. Or do you mean between executions of a relation, or within one?


egillax commented Feb 19, 2026

> Maybe we need to implement some CohortGenerator-style functions for execution, or track what the cohort ids are? The current executor.write method seems to work well once, but will it work with repeated uses/incremental execution? I'm not sure what Ibis is doing under the hood, and I would like to investigate in more detail. My preferred behaviour: if the cohort definition for a given id changes, it automatically overwrites; if it's the same, it either throws an exception that can be handled (e.g. skips generation) or just carries on.

I think that sounds good.

> On this note - would we want to implement subset cohorts here? Maybe ibis can help with handling these relations?

I think that should be easy with ibis. I haven't actually used subsets before, so I'm not sure how it's done in CohortGenerator or what the API is. But with ibis, if you do build, you can still add any kind of operation to the expression tree and it should work: executor.build(cohort).filter(subset).write(). I need to play around with it.
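The build-then-compose idea can be illustrated with a tiny lazy pipeline in pure Python; this is only an analogy for how ibis expressions compose, and `LazyRelation` is a hypothetical class, not part of the PR:

```python
from typing import Callable, Iterable


class LazyRelation:
    """Lazy pipeline: composing filters only builds a plan; execute() runs it."""

    def __init__(self, rows: Iterable[dict], plan: tuple = ()) -> None:
        self._rows = list(rows)
        self._plan = plan

    def filter(self, predicate: Callable[[dict], bool]) -> "LazyRelation":
        # Returns a new relation with an extended plan; nothing is evaluated yet.
        return LazyRelation(self._rows, self._plan + (predicate,))

    def execute(self) -> list[dict]:
        out = list(self._rows)
        for predicate in self._plan:
            out = [r for r in out if predicate(r)]
        return out
```

In ibis the same shape holds: `build()` would return an unexecuted table expression, a subset is just another `.filter(...)` on it, and only `write()` or an explicit materialization triggers SQL execution on the backend.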

> I'm not sure how often users should use polars/pandas objects when working with cohorts. I can see some use cases for it, but it won't scale well, and users generally don't think about the impact this sort of thing has.

Agreed. These are already operations in ibis, so first of all there's no need to have them here. I was thinking too much of downstream use, where I want to stream the results to file/memory. It could be useful for testing, but then again they're already in ibis, so you can use them there. And we get rid of the polars dependency here.

> Clearer parameterization - perhaps convenience classes on top of our R standards?

I was actually working on a different design, since I felt this one had a lot of repetition. But that's mostly internal organization. I think you mean the user-facing API above? Could you explain?
