Conversation
|
@egillax So I have finally had a chance to give a test with databricks and overall this seems positive. There is a lot of code to unpack so i will need to do a deeper review later but here are my initial notes (mainly so I don't forget but also to stimulate discussion):
Test code I used to get it working from dotenv import load_dotenv
import os
import ibis
from circe.execution import ExecutionOptions, IbisExecutor
from examples.basic_cohort import create_diabetes_cohort
load_dotenv()
host = os.getenv("DATABRICKS_HOST")
http_path = os.getenv("DATABRICKS_HTTP_PATH")
token = os.getenv("DATABRICKS_TOKEN")
scratch_schema = os.getenv("DATABRICKS_SCRATCH_SCHEMA")
# Normalize common user input issues.
host = host.removeprefix("https://").removeprefix("http://").rstrip("/")
if not http_path.startswith("/"):
http_path = f"/{http_path}"
# Split catalog.schema for Unity Catalog-aware connections.
scratch_catalog = None
scratch_db = scratch_schema
if "." in scratch_schema:
scratch_catalog, scratch_db = scratch_schema.split(".", 1)
cdm_schema = "<hidden>"
con = ibis.databricks.connect(
server_hostname=host,
http_path=http_path,
token=token,
catalog=scratch_catalog,
schema=scratch_db,
)
options = ExecutionOptions(
cdm_schema=cdm_schema,
vocabulary_schema = cdm_schema,
result_schema = scratch_schema,
temp_emulation_schema = scratch_schema,
cohort_id = 2
)
executor = IbisExecutor(conn=con, options=options)
cohort = create_diabetes_cohort()
executor.write(cohort, table="ibis_test_cohort", overwrite=False)Overall this was pretty straightforward and I will play around more to give you more feedback soon |
| ibis-postgres = [ | ||
| "ibis-framework[postgres]>=11.0.0; python_version >= '3.9'", | ||
| ] | ||
| ibis-databricks = [ |
There was a problem hiding this comment.
Might be how I installed but I got some install errors with polars/pandas after the install
There was a problem hiding this comment.
I think we can not depend on polars, which hopefully would fix that. Otherwise please let me know the particulars.
|
|
||
| class TestPackageStructure: | ||
| """Test basic package structure and imports.""" | ||
There was a problem hiding this comment.
@egillax I think we need to agree on some IDE standards as these look like auto changes. I'm just using the PyCharm defaults but happy to change to something if you're using neovim?
(All this code was written by the ai though)
There was a problem hiding this comment.
I use neovim. But I actually didn't even open it for this.. just codex and git diff when I wanted to look (and this PR).
I think this is because I told my agent to format according to standards in repo. And it called black on this file. So either my agent messed up not using it correctly or your agent.
But I agree, standardize and enforce conventions. And perhaps consider pre-commit hooks or CI to enforce (or both). Have not used pre-commit hooks before so not sure if it's more trouble than not.
| ExpressionInput = Union[CohortExpression, Mapping[str, Any], str, Path] | ||
|
|
||
|
|
||
| def load_expression(value: ExpressionInput) -> CohortExpression: |
There was a problem hiding this comment.
No objection to this function name, but we should probably have a single api standard for loading the expressions in the .api file?
There was a problem hiding this comment.
Yes agree. This pass was mostly ripping stuff out of mitos and putting here. I think the agent didn't look enough at the current api.
| cohort_id: Optional[int] = None | ||
|
|
||
| materialize_stages: bool = False | ||
| materialize_codesets: bool = True |
There was a problem hiding this comment.
One thing would be good to implement would be taking a checksum of a cohort and then re-using the materialization over and over. Not sure if that would be possible in this branch but it would be a pretty useful change over java.
There was a problem hiding this comment.
Yes I think we can do that. Or you mean between execution of a relation or within?
I think that sounds good.
I think that should be easy with ibis. Haven't actually used subsets before so not sure how it's done in CohortGenerator, or what's the api. But I think with ibis if you do build, then you can still add any kind of operations to the expression tree and it should work.
Agree, so these are already operations in ibis, so first of all no need to have them here. I was to much thinking of downstream where I want to stream the results to file/memory. I think it could be useful for testing, but then again they are already in ibis so you can use it. And we get rid of polar dependency here.
I was actually working on a different design since I felt this had a lot of repetition. But that's mostly for internal organization. But I think you mean the api for users above ? Could you explain ? |
Implements an experimental native Ibis execution API for Circepy with lazy relation building and
explicit materialization actions.
run)
Sorry for how large this is. The API is in circe/execution/ibis.py
There is some stuff in there I'm not completely satisifed with, but good enough for you to have a look @azimov