-
Notifications
You must be signed in to change notification settings - Fork 18
chore: add pre commit #112
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: develop
Are you sure you want to change the base?
Conversation
<!-- Thank you for contributing to OceanBase! Please feel free to ping the maintainers for the review! --> ## Summary <!-- Please clearly and concisely describe the purpose of this pull request. If this pull request resolves an issue, please link it via "close #xxx" or "fix #xxx". --> The rag demo has supported hybrid search.The readme need to be updated. close oceanbase#60 ## Solution Description <!-- Please clearly and concisely describe your solution. -->
Added comprehensive PR review report and design document template.
…ceanbase#72) - Add test_detect_db_basic.py with basic functionality tests - Test SeekDB and OceanBase server detection - Test connection establishment and return format validation <!-- Thank you for contributing to OceanBase! Please feel free to ping the maintainers for the review! --> ## Summary <!-- Please clearly and concisely describe the purpose of this pull request. If this pull request resolves an issue, please link it via "close #xxx" or "fix #xxx". --> Supports database type and version judgment for seekdk and oceanbase Close oceanbase#71 ## Solution Description <!-- Please clearly and concisely describe your solution. --> Co-authored-by: xxsc0529 <xxsc0529@users.noreply.github.com>
## Summary <!-- Please clearly and concisely describe the purpose of this pull request. If this pull request resolves an issue, please link it via "close #xxx" or "fix #xxx". --> close oceanbase#98 If python dlopen pylibseekdb and then other libraries with ABI=1, then it would coredump. For details, please refer to oceanbase#98 ## Solution Description <!-- Please clearly and concisely describe your solution. --> Force pyseekdb load a library with ABI=1 flag before opening pylibseekdb.
📝 WalkthroughWalkthroughAdds CI quality job and pre-commit tooling, applies repository-wide style normalization, introduces defensive embedding-dimension inference and OpenAI embedding refinements, and refactors JSON/filter SQL construction and embedded-client cursor/metadata handling. Changes
Sequence Diagram(s)(omitted) Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes Possibly related PRs
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 3✅ Passed checks (3 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing touches
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 6
🤖 Fix all issues with AI agents
In @.pre-commit-config.yaml:
- Around line 1-20: Update the pre-commit hook revisions to the requested newer
releases by changing the rev for https://github.com/pre-commit/pre-commit-hooks
from "v5.0.0" to "v6.0.0" and the rev for
https://github.com/astral-sh/ruff-pre-commit from "v0.12.7" to "v0.14.11" in
.pre-commit-config.yaml, then run pre-commit autoupdate or validate by running
pre-commit run --all-files to ensure the new hooks install and pass.
In @demo/rag/embedding_function_factory.py:
- Around line 103-110: The _ensure_model_loaded method is missing a guard and
currently calls the Embeddings API on every access; update _ensure_model_loaded
to first check if self._dimension is None and only run the OpenAI client call
and set self._dimension when it is None (i.e., wrap the existing try/except body
in "if self._dimension is None:"), so subsequent accesses to the dimension
property return the cached value without repeated API requests.
- Around line 15-30: The fallback logic in __init__ is wrong because the default
parameters are non-empty strings so the environment variables never get used;
change the signature to accept None (e.g., model_name: Optional[str] = None,
device: Optional[str] = None) and then set self.model_name = model_name or
os.environ.get("SENTENCE_TRANSFORMERS_MODEL_NAME") or "all-mpnet-base-v2" and
self.device = device or os.environ.get("SENTENCE_TRANSFORMERS_DEVICE") or "cpu"
(or equivalently keep defaults but use a conditional that prefers env vars when
arguments are None/empty) so the environment variable fallbacks actually take
effect.
In @src/pyseekdb/client/client_seekdb_embedded.py:
- Around line 291-292: The get_database SQL concatenates the unescaped parameter
name into the query causing potential SQL injection; update the SQL construction
in get_database to escape the name before interpolation (use the
already-imported escape_string on the name) or switch to a parameterized query
if _execute supports parameters, then pass the escaped/parameterized value into
the SQL call (refer to get_database, variable name, and the _execute method).
In @src/pyseekdb/client/client_seekdb_server.py:
- Around line 194-199: The get_database() method constructs SQL with
single-quoted string interpolation which risks SQL injection and is inconsistent
with create_database()/delete_database(); update the SQL in get_database() to
quote the identifier with backticks like the others (e.g., change WHERE
SCHEMA_NAME = '{name}' to WHERE SCHEMA_NAME = `{name}`) so the generated query
matches the identifier quoting pattern used elsewhere before calling
self._execute(sql).
In @tests/integration_tests/test_empty_value_handling.py:
- Around line 246-248: The loop that iterates with enumerate over
zip(updated_docs, updated_metas) should use zip(strict=True) to ensure both
iterables have the same length; update the call in the for loop (the
enumerate(zip(...)) usage) to enumerate(zip(updated_docs, updated_metas,
strict=True)) so mismatched test data fails fast.
🧹 Nitpick comments (20)
demo/rag/.env.example (1)
25-25: LGTM! EOF newline formatting applied correctly.The addition of the end-of-file newline is consistent with standard formatting practices and aligns with the pre-commit standardization goals of this PR.
Note: The static analysis tool suggests reordering
COLLECTION_NAMEalphabetically beforeSEEKDB_DIR. However, the current grouping of seekdb-related configurations together (lines 22-25) is logical and aids readability, so maintaining the current order is reasonable.demo/rag/llm.py (1)
14-21: Type hint improvement for future consideration.Static analysis flags line 18's implicit
Optionalas non-compliant with PEP 484. The parameter should use explicit optional syntax:model: str | None = Noneormodel: Optional[str] = None.Since this PR intentionally excludes type-checking tools to limit scope, this can be addressed in a follow-up PR focused on type annotations.
Based on static analysis hints.
tests/integration_tests/test_security_sql_injection.py (1)
97-97: Optional: Remove unnecessary f-string prefixes.Several print statements use f-strings without any placeholders. While harmless, removing the
fprefix improves code cleanliness.♻️ Example fix
- print(f"\n🔒 Running security SQL injection tests") + print("\n🔒 Running security SQL injection tests")Apply similar changes to lines 127, 150, 170, 209, 254, 299, and 337.
Also applies to: 127-127, 150-150, 170-170, 209-209, 254-254, 299-299, 337-337
demo/rag/seekdb_utils.py (1)
18-23: Mutable default argument:DefaultEmbeddingFunction()instantiated at function definition time.Using a function call as a default argument value means the same instance is shared across all calls. While this may work for stateless embedding functions, it's a known anti-pattern (Ruff B008). Consider using
Noneand instantiating inside the function:Suggested fix
def get_seekdb_collection( client, collection_name: str = "embeddings", - embedding_function: Optional[EmbeddingFunction] = DefaultEmbeddingFunction(), + embedding_function: Optional[EmbeddingFunction] = None, drop_if_exists: bool = True, ): + if embedding_function is None: + embedding_function = DefaultEmbeddingFunction()demo/rag/seekdb_insert.py (1)
68-70: Consider using explicitOptionalor| Nonetype hints.The parameters
db_dir,db_name, andcollection_namedefault toNonebut lack explicitOptionaltype annotations, which violates PEP 484.♻️ Suggested fix
+from typing import Optional + def process_and_insert_data( - data_path: str, db_dir: str = None, db_name: str = None, collection_name: str = None + data_path: str, + db_dir: Optional[str] = None, + db_name: Optional[str] = None, + collection_name: Optional[str] = None, ):Or using Python 3.10+ syntax:
def process_and_insert_data( - data_path: str, db_dir: str = None, db_name: str = None, collection_name: str = None + data_path: str, + db_dir: str | None = None, + db_name: str | None = None, + collection_name: str | None = None, ):src/pyseekdb/client/sql_utils.py (1)
52-55: Consider usingTypeErrorfor type validation.When checking
isinstance()and the type doesn't match,TypeErroris the idiomatic exception rather thanValueError.♻️ Suggested fix
if not isinstance(id_name, str): - raise ValueError( + raise TypeError( f"Identifier should be string type, but got {type(id_name).__name__}" )tests/integration_tests/test_empty_value_handling.py (1)
49-49: Unnecessary f-string prefix.This string has no placeholders; the
fprefix can be removed.♻️ Suggested fix
- print(f"✅ All empty value tests passed") + print("✅ All empty value tests passed")Similar occurrences at lines 61, 109, 150, and 192.
tests/integration_tests/test_detect_db_type_and_version.py (1)
41-43: Remove unnecessaryfprefix from strings without placeholders.Lines 41, 64, 87, and 106 use f-strings that contain only static text (emoji) without any variable placeholders. The
fprefix is unnecessary.🔧 Suggested fix
- print(f"\n✅ Version comparison tests passed") + print("\n✅ Version comparison tests passed")Similarly for lines 64, 87, and 106.
tests/integration_tests/test_fulltext_parser_config.py (2)
15-17: Consider using explicitOptionalfor theparamsparameter.The type hint
params: dict = Noneuses implicitOptional, which PEP 484 discourages. For consistency and clarity, consider making it explicit.🔧 Suggested fix
- def _test_fulltext_parser_config( - self, client, parser_name: str, params: dict = None - ): + def _test_fulltext_parser_config( + self, client, parser_name: str, params: dict | None = None + ):
188-190: Remove unnecessaryfprefix.The f-string on line 189 has no variable placeholders.
🔧 Suggested fix
- print( - f"\n✅ Backward compatibility: HNSWConfiguration defaults to ik parser" - ) + print( + "\n✅ Backward compatibility: HNSWConfiguration defaults to ik parser" + )tests/integration_tests/test_admin_database_management.py (2)
77-77: Remove extraneousfprefix from string without placeholders.Per static analysis (Ruff F541), this f-string has no placeholders and can be a regular string.
Suggested fix
- print(f"\n📋 Step 5: List all databases to verify deletion") + print("\n📋 Step 5: List all databases to verify deletion")
88-88: Remove extraneousfprefix from string without placeholders.Per static analysis (Ruff F541), this f-string has no placeholders and can be a regular string.
Suggested fix
- print(f"\n🎉 All database management operations completed successfully!") + print("\n🎉 All database management operations completed successfully!")Makefile (1)
4-19: Good additions for code quality workflow.The new targets appropriately integrate pre-commit hooks into the development workflow. The
checktarget combines lock file verification with pre-commit, suitable for CI, whilepre-commitprovides a standalone hook runner.Consider making
checkdepend onpre-committo reduce duplication:Optional: DRY improvement
.PHONY: check check: ## Run code quality tools @echo ">> Checking lock file consistency with 'pyproject.toml'" @$(UV) lock --locked - @echo ">> Running pre-commit hooks" - @$(UV) run prek run -a + @$(MAKE) pre-committests/integration_tests/test_collection_hybrid_search.py (1)
18-20: Consider explicitOptionaltype hints.The type hints use implicit
Optional(e.g.,dimension: int = None), which PEP 484 discourages. This is a minor style nitpick.Optional: Update to explicit Optional types
- def _create_test_collection( - self, client, collection_name: str, dimension: int = None - ): + def _create_test_collection( + self, client, collection_name: str, dimension: int | None = None + ):Similarly for
_generate_query_vector:- def _generate_query_vector( - self, dimension: int, base_vector: List[float] = None - ) -> List[float]: + def _generate_query_vector( + self, dimension: int, base_vector: List[float] | None = None + ) -> List[float]:tests/integration_tests/test_collection_hybrid_search_builder_integration.py (1)
35-158: Consider consolidating duplicated test helpers.The helper methods
_create_test_collection,_generate_query_vector, and_insert_test_dataare nearly identical to those intest_collection_hybrid_search.py. This duplication increases maintenance burden.Consider extracting these helpers into a shared module (e.g.,
tests/integration_tests/helpers.pyor adding them toconftest.pyas fixtures) to reduce duplication and ensure consistency across test files.# Example: tests/integration_tests/test_helpers.py def create_test_collection(client, collection_name: str, dimension: int | None = None): ... def generate_query_vector(dimension: int, base_vector: list[float] | None = None) -> list[float]: ... def insert_test_data(client, collection_name: str, dimension: int = 3) -> list[str]: ...src/pyseekdb/client/embedding_function.py (2)
88-97: Fallback dimension inference may have unintended side effects.The fallback logic calls
embedding_function.__call__("seekdb")to infer the dimension when nodimensionattribute exists. This can trigger model initialization (e.g., downloading ONNX models) or network calls for API-based embedding functions, which may be unexpected for users just querying the dimension.Consider documenting this behavior or providing an alternative that doesn't require invoking the embedding function.
275-275: Fullwidth comma in comment.This comment contains an ambiguous fullwidth comma (,). Consider using a regular comma (,) for consistency, as flagged by static analysis (RUF003).
demo/rag/embedding_function_factory.py (1)
136-138: Consider caching the OpenAI client.A new
OpenAIclient is instantiated on every__call__invocation. Caching it as an instance attribute (initialized once in_ensure_model_loaded) would reduce overhead.♻️ Suggested improvement
def _ensure_model_loaded(self): """Lazy load the Embedding API model""" + if self._client is not None: + return try: - client = OpenAI(api_key=self.api_key, base_url=self.base_url) - response = client.embeddings.create(model=self.model_name, input=["test"]) + self._client = OpenAI(api_key=self.api_key, base_url=self.base_url) + response = self._client.embeddings.create(model=self.model_name, input=["test"]) self._dimension = len(response.data[0].embedding) except Exception as e: raise ValueError(f"Failed to load Embedding API model: {e}")Then in
__call__:# Call Embedding API - client = OpenAI(api_key=self.api_key, base_url=self.base_url) - response = client.embeddings.create(model=self.model_name, input=input) + self._ensure_model_loaded() + response = self._client.embeddings.create(model=self.model_name, input=input)Don't forget to initialize
self._client = Nonein__init__.src/pyseekdb/client/filters.py (2)
18-32: Consider annotating class-level constants withClassVar.These mutable class attributes are used as constants. Adding
ClassVarannotation would make the intent explicit and satisfy static analysis tools.♻️ Suggested improvement
+from typing import Any, ClassVar, Dict, List, Optional, Tuple -from typing import Any, Dict, List, Optional, Tuple class FilterBuilder: """Build SQL WHERE clauses from filter dictionaries""" # Comparison operators mapping - COMPARISON_OPS = { + COMPARISON_OPS: ClassVar[Dict[str, str]] = { "$eq": "=", "$lt": "<", "$gt": ">", "$lte": "<=", "$gte": ">=", "$ne": "!=", } # Logical operators - LOGICAL_OPS = ["$and", "$or", "$not"] + LOGICAL_OPS: ClassVar[List[str]] = ["$and", "$or", "$not"] # Document operators - DOCUMENT_OPS = ["$contains", "$regex"] + DOCUMENT_OPS: ClassVar[List[str]] = ["$contains", "$regex"]
410-416: Minor: Prefernext(iter(...))over list slicing for single element.Creating a list just to access the first element is less efficient than using an iterator.
♻️ Suggested improvement
# If only one type of condition, simplify if len(result["bool"]) == 1: - key = list(result["bool"].keys())[0] + key = next(iter(result["bool"].keys())) conditions = result["bool"][key] if len(conditions) == 1: return conditions[0] return {"bool": {key: conditions}}
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
uv.lockis excluded by!**/*.lock
📒 Files selected for processing (55)
.github/workflows/ci.yml.github/workflows/python-publish.yml.pre-commit-config.yamlLICENSEMakefiledemo/rag/.env.exampledemo/rag/.streamlit/config.tomldemo/rag/README.mddemo/rag/README_CN.mddemo/rag/embedding_function_factory.pydemo/rag/llm.pydemo/rag/seekdb_app.pydemo/rag/seekdb_insert.pydemo/rag/seekdb_utils.pydocs/api/index.rstdocs/conf.pyexamples/complete_example.pyexamples/hybrid_search_example.pyexamples/simple_example.pypyproject.tomlsrc/pyseekdb/__init__.pysrc/pyseekdb/client/__init__.pysrc/pyseekdb/client/admin_client.pysrc/pyseekdb/client/base_connection.pysrc/pyseekdb/client/client_base.pysrc/pyseekdb/client/client_seekdb_embedded.pysrc/pyseekdb/client/client_seekdb_server.pysrc/pyseekdb/client/collection.pysrc/pyseekdb/client/configuration.pysrc/pyseekdb/client/database.pysrc/pyseekdb/client/embedding_function.pysrc/pyseekdb/client/filters.pysrc/pyseekdb/client/hybrid_search.pysrc/pyseekdb/client/meta_info.pysrc/pyseekdb/client/sql_utils.pysrc/pyseekdb/client/version.pysrc/pyseekdb/utils/embedding_functions/sentence_transformer_embedding_function.pytests/integration_tests/conftest.pytests/integration_tests/test_admin_database_management.pytests/integration_tests/test_client_creation.pytests/integration_tests/test_collection_dml.pytests/integration_tests/test_collection_embedding_function.pytests/integration_tests/test_collection_get.pytests/integration_tests/test_collection_hybrid_search.pytests/integration_tests/test_collection_hybrid_search_builder_integration.pytests/integration_tests/test_collection_query.pytests/integration_tests/test_default_embedding_function.pytests/integration_tests/test_detect_db_type_and_version.pytests/integration_tests/test_empty_value_handling.pytests/integration_tests/test_fulltext_parser_config.pytests/integration_tests/test_offical_case.pytests/integration_tests/test_security_sql_injection.pytests/unit_tests/test_collection_name_validation.pytests/unit_tests/test_configuration.pytests/unit_tests/test_version.py
💤 Files with no reviewable changes (1)
- docs/api/index.rst
🧰 Additional context used
🧠 Learnings (6)
📓 Common learnings
Learnt from: CR
Repo: oceanbase/pyseekdb PR: 0
File: .github/instructions/*.instructions.md:0-0
Timestamp: 2025-12-19T08:03:58.903Z
Learning: Perform comprehensive PR reviews following a structured format including PR Overview Statistics, Design Documentation, Background & Problem Statement, Architecture Design Changes, and Detailed Change Analysis
📚 Learning: 2025-12-22T12:44:52.456Z
Learnt from: longbingljw
Repo: oceanbase/pyseekdb PR: 91
File: tests/integration_tests/test_collection_dml.py:36-36
Timestamp: 2025-12-22T12:44:52.456Z
Learning: In integration tests under tests/integration_tests, it is acceptable to use the internal db_client._server._execute() to run raw SQL (e.g., CREATE TABLE) when needing custom table setup with specific vector index configurations. Prefer using public/test harness APIs for common setup, and clearly document and limit the use of internal calls to ensure test reliability. Ensure proper cleanup and isolation for each test, and avoid leaking internal APIs into production or non-test code.
Applied to files:
tests/integration_tests/test_collection_dml.pytests/integration_tests/test_collection_query.pytests/integration_tests/conftest.pytests/integration_tests/test_client_creation.pytests/integration_tests/test_empty_value_handling.pytests/integration_tests/test_default_embedding_function.pytests/integration_tests/test_collection_hybrid_search.pytests/integration_tests/test_collection_embedding_function.pytests/integration_tests/test_detect_db_type_and_version.pytests/integration_tests/test_offical_case.pytests/integration_tests/test_security_sql_injection.pytests/integration_tests/test_collection_get.pytests/integration_tests/test_collection_hybrid_search_builder_integration.pytests/integration_tests/test_fulltext_parser_config.pytests/integration_tests/test_admin_database_management.py
📚 Learning: 2025-12-22T12:45:51.412Z
Learnt from: longbingljw
Repo: oceanbase/pyseekdb PR: 91
File: tests/integration_tests/test_collection_hybrid_search_builder_integration.py:127-132
Timestamp: 2025-12-22T12:45:51.412Z
Learning: In integration test Python files under tests/integration_tests, it is acceptable to use string interpolation to construct SQL INSERT statements for test data when the data is controlled and internal to the test. This pattern should only be used for trusted, test-only data and not with untrusted input; ensure the injection risk is mitigated by keeping inputs deterministic, non-user-supplied, and isolated to the test environment. If possible, prefer parameterized queries in non-test code, but in these tests, interpolate values selectively when you can guarantee safety.
Applied to files:
tests/integration_tests/test_collection_dml.pytests/integration_tests/test_collection_query.pytests/integration_tests/conftest.pytests/integration_tests/test_client_creation.pytests/integration_tests/test_empty_value_handling.pytests/integration_tests/test_default_embedding_function.pytests/integration_tests/test_collection_hybrid_search.pytests/integration_tests/test_collection_embedding_function.pytests/integration_tests/test_detect_db_type_and_version.pytests/integration_tests/test_offical_case.pytests/integration_tests/test_security_sql_injection.pytests/integration_tests/test_collection_get.pytests/integration_tests/test_collection_hybrid_search_builder_integration.pytests/integration_tests/test_fulltext_parser_config.pytests/integration_tests/test_admin_database_management.py
📚 Learning: 2025-12-22T12:33:52.528Z
Learnt from: longbingljw
Repo: oceanbase/pyseekdb PR: 91
File: tests/integration_tests/test_admin_database_management.py:26-39
Timestamp: 2025-12-22T12:33:52.528Z
Learning: In tests/integration_tests/test_admin_database_management.py, using `isinstance(admin_client._server, pyseekdb.SeekdbEmbeddedClient)` to detect client type is acceptable. AdminClient should not expose a public `mode` property.
Applied to files:
tests/integration_tests/conftest.pytests/integration_tests/test_client_creation.pyexamples/simple_example.pytests/integration_tests/test_detect_db_type_and_version.pysrc/pyseekdb/client/admin_client.pysrc/pyseekdb/client/client_seekdb_embedded.pyexamples/complete_example.pysrc/pyseekdb/client/__init__.py
📚 Learning: 2025-12-22T12:44:57.424Z
Learnt from: longbingljw
Repo: oceanbase/pyseekdb PR: 91
File: tests/integration_tests/test_collection_dml.py:36-36
Timestamp: 2025-12-22T12:44:57.424Z
Learning: In tests/integration_tests/test_collection_dml.py and similar test files, using `db_client._server._execute()` to run SQL commands directly (e.g., CREATE TABLE statements) is acceptable for integration tests that need custom table setup with specific vector index configurations.
Applied to files:
src/pyseekdb/client/client_seekdb_embedded.py
📚 Learning: 2025-12-22T12:33:44.844Z
Learnt from: longbingljw
Repo: oceanbase/pyseekdb PR: 91
File: tests/integration_tests/test_admin_database_management.py:26-39
Timestamp: 2025-12-22T12:33:44.844Z
Learning: In tests/integration_tests/test_admin_database_management.py, it is acceptable to detect the client type with isinstance(admin_client._server, pyseekdb.SeekdbEmbeddedClient). Do not expose a public AdminClient 'mode' property; keep it private (e.g., _mode) or implement it without a public accessor. If needed, document the private attribute and ensure tests rely only on internal type checks rather than public configuration.
Applied to files:
tests/integration_tests/test_admin_database_management.py
🧬 Code graph analysis (24)
tests/integration_tests/test_collection_dml.py (2)
src/pyseekdb/client/admin_client.py (1)
get_collection(200-210)src/pyseekdb/client/client_base.py (2)
get_collection(209-221)get_collection(574-698)
src/pyseekdb/client/collection.py (1)
src/pyseekdb/client/client_base.py (1)
_collection_count(2918-2957)
demo/rag/seekdb_app.py (6)
demo/rag/seekdb_utils.py (3)
get_seekdb_client(8-15)get_database_stats(80-94)seekdb_query(97-127)demo/rag/llm.py (2)
get_llm_answer(14-82)get_llm_client(6-11)src/pyseekdb/client/admin_client.py (1)
get_collection(200-210)src/pyseekdb/client/client_base.py (2)
get_collection(209-221)get_collection(574-698)src/pyseekdb/client/collection.py (3)
name(63-65)embedding_function(88-90)get(364-428)demo/rag/embedding_function_factory.py (1)
create_embedding_function(145-159)
src/pyseekdb/client/embedding_function.py (3)
src/pyseekdb/client/collection.py (2)
embedding_function(88-90)dimension(73-75)demo/rag/embedding_function_factory.py (2)
dimension(49-52)dimension(113-116)tests/integration_tests/test_default_embedding_function.py (1)
dimension(183-184)
tests/integration_tests/test_client_creation.py (5)
src/pyseekdb/client/configuration.py (3)
HNSWConfiguration(40-59)Configuration(62-77)FulltextParserConfig(26-36)src/pyseekdb/client/admin_client.py (5)
create_collection(179-198)has_collection(220-222)list_collections(216-218)delete_collection(212-214)count_collection(245-247)src/pyseekdb/client/client_base.py (9)
create_collection(184-206)create_collection(359-572)has_collection(234-236)has_collection(794-814)list_collections(229-231)list_collections(722-778)delete_collection(224-226)delete_collection(700-720)count_collection(780-792)src/pyseekdb/client/meta_info.py (2)
CollectionNames(15-44)table_name(20-22)src/pyseekdb/client/client_seekdb_server.py (1)
_execute(101-116)
tests/integration_tests/test_empty_value_handling.py (6)
tests/integration_tests/conftest.py (1)
db_client(145-179)src/pyseekdb/client/meta_info.py (1)
collection_name(25-29)src/pyseekdb/client/admin_client.py (2)
get_or_create_collection(224-243)delete_collection(212-214)src/pyseekdb/client/client_base.py (3)
get_or_create_collection(816-864)delete_collection(224-226)delete_collection(700-720)src/pyseekdb/client/collection.py (4)
name(63-65)embedding_function(88-90)upsert(194-234)get(364-428)src/pyseekdb/client/embedding_function.py (1)
DefaultEmbeddingFunction(100-506)
tests/integration_tests/test_default_embedding_function.py (1)
src/pyseekdb/client/embedding_function.py (2)
dimension(162-164)dimension_of(78-97)
examples/simple_example.py (2)
src/pyseekdb/client/collection.py (2)
client(78-80)get(364-428)src/pyseekdb/client/__init__.py (1)
Client(80-194)
tests/integration_tests/test_detect_db_type_and_version.py (5)
tests/unit_tests/test_version.py (1)
test_version_comparison(14-42)src/pyseekdb/client/version.py (1)
Version(8-122)tests/integration_tests/conftest.py (2)
server_client(195-203)oceanbase_client(207-215)src/pyseekdb/client/base_connection.py (1)
is_connected(23-25)src/pyseekdb/client/client_seekdb_embedded.py (1)
is_connected(100-102)
tests/unit_tests/test_configuration.py (1)
src/pyseekdb/client/configuration.py (3)
Configuration(62-77)HNSWConfiguration(40-59)FulltextParserConfig(26-36)
tests/integration_tests/test_offical_case.py (2)
src/pyseekdb/client/admin_client.py (1)
get_or_create_collection(224-243)src/pyseekdb/client/client_base.py (1)
get_or_create_collection(816-864)
tests/integration_tests/test_security_sql_injection.py (5)
src/pyseekdb/client/collection.py (7)
get(364-428)client(78-80)name(63-65)add(103-149)update(151-191)upsert(194-234)query(276-362)tests/integration_tests/conftest.py (1)
db_client(145-179)src/pyseekdb/client/meta_info.py (1)
collection_name(25-29)src/pyseekdb/client/admin_client.py (1)
get_or_create_collection(224-243)src/pyseekdb/client/hybrid_search.py (1)
query(279-318)
tests/integration_tests/test_collection_get.py (4)
src/pyseekdb/client/embedding_function.py (1)
dimension(162-164)src/pyseekdb/client/meta_info.py (3)
collection_name(25-29)CollectionNames(15-44)table_name(20-22)src/pyseekdb/client/configuration.py (1)
HNSWConfiguration(40-59)tests/integration_tests/test_collection_embedding_function.py (1)
Simple3DEmbeddingFunction(16-38)
src/pyseekdb/client/configuration.py (1)
src/pyseekdb/client/collection.py (1)
distance(93-95)
src/pyseekdb/client/base_connection.py (2)
src/pyseekdb/client/client_seekdb_embedded.py (6)
_ensure_connection(70-91)is_connected(100-102)_cleanup(93-98)_execute(104-122)get_raw_connection(124-126)mode(129-130)src/pyseekdb/client/client_seekdb_server.py (6)
_ensure_connection(70-88)is_connected(97-99)_cleanup(90-95)_execute(101-116)get_raw_connection(118-120)mode(123-124)
src/pyseekdb/client/client_seekdb_server.py (5)
src/pyseekdb/client/base_connection.py (6)
_cleanup(28-30)is_connected(23-25)_execute(33-35)_ensure_connection(18-20)get_raw_connection(38-40)mode(44-46)src/pyseekdb/client/client_seekdb_embedded.py (10)
_cleanup(93-98)is_connected(100-102)_execute(104-122)_ensure_connection(70-91)get_raw_connection(124-126)mode(129-130)create_database(269-280)get_database(282-307)delete_database(309-320)list_databases(322-363)src/pyseekdb/client/admin_client.py (8)
create_database(51-59)create_database(126-128)get_database(62-73)get_database(130-132)delete_database(76-84)delete_database(134-136)list_databases(87-104)list_databases(138-145)src/pyseekdb/client/collection.py (1)
name(63-65)src/pyseekdb/client/database.py (1)
Database(8-55)
src/pyseekdb/client/client_seekdb_embedded.py (4)
src/pyseekdb/client/base_connection.py (3)
_cleanup(28-30)mode(44-46)_execute(33-35)src/pyseekdb/client/client_seekdb_server.py (4)
_cleanup(90-95)mode(123-124)get_database(175-207)_execute(101-116)src/pyseekdb/client/client_base.py (1)
_use_context_manager_for_cursor(1831-1840)src/pyseekdb/client/database.py (1)
Database(8-55)
src/pyseekdb/utils/embedding_functions/sentence_transformer_embedding_function.py (2)
src/pyseekdb/client/collection.py (2)
client(78-80)embedding_function(88-90)src/pyseekdb/client/embedding_function.py (1)
EmbeddingFunction(48-75)
demo/rag/embedding_function_factory.py (2)
src/pyseekdb/client/embedding_function.py (2)
EmbeddingFunction(48-75)dimension(162-164)tests/integration_tests/test_default_embedding_function.py (1)
dimension(183-184)
demo/rag/seekdb_utils.py (3)
src/pyseekdb/client/collection.py (7)
client(78-80)embedding_function(88-90)name(63-65)get(364-428)add(103-149)hybrid_search(430-523)query(276-362)src/pyseekdb/client/admin_client.py (2)
has_collection(220-222)get_or_create_collection(224-243)src/pyseekdb/client/client_base.py (3)
has_collection(234-236)has_collection(794-814)get_or_create_collection(816-864)
examples/complete_example.py (3)
src/pyseekdb/client/configuration.py (1)
HNSWConfiguration(40-59)src/pyseekdb/client/collection.py (9)
dimension(73-75)distance(93-95)client(78-80)name(63-65)embedding_function(88-90)add(103-149)update(151-191)query(276-362)get(364-428)src/pyseekdb/client/admin_client.py (1)
get_or_create_collection(224-243)
src/pyseekdb/client/__init__.py (2)
src/pyseekdb/client/configuration.py (3)
Configuration(62-77)HNSWConfiguration(40-59)FulltextParserConfig(26-36)src/pyseekdb/client/embedding_function.py (2)
EmbeddingFunction(48-75)DefaultEmbeddingFunction(100-506)
tests/integration_tests/test_fulltext_parser_config.py (3)
src/pyseekdb/client/configuration.py (3)
Configuration(62-77)HNSWConfiguration(40-59)FulltextParserConfig(26-36)src/pyseekdb/client/meta_info.py (2)
table_name(20-22)CollectionNames(15-44)src/pyseekdb/client/client_seekdb_server.py (1)
_execute(101-116)
tests/integration_tests/test_admin_database_management.py (3)
src/pyseekdb/client/admin_client.py (8)
create_database(51-59)create_database(126-128)get_database(62-73)get_database(130-132)delete_database(76-84)delete_database(134-136)list_databases(87-104)list_databases(138-145)src/pyseekdb/client/client_seekdb_embedded.py (4)
create_database(269-280)get_database(282-307)delete_database(309-320)list_databases(322-363)src/pyseekdb/client/client_seekdb_server.py (4)
create_database(154-173)get_database(175-207)delete_database(209-228)list_databases(230-278)
🪛 dotenv-linter (4.0.0)
demo/rag/.env.example
[warning] 25-25: [UnorderedKey] The COLLECTION_NAME key should go before the SEEKDB_DIR key
(UnorderedKey)
🪛 Ruff (0.14.10)
tests/integration_tests/test_collection_dml.py
45-45: f-string without any placeholders
Remove extraneous f prefix
(F541)
82-82: f-string without any placeholders
Remove extraneous f prefix
(F541)
97-97: f-string without any placeholders
Remove extraneous f prefix
(F541)
113-113: f-string without any placeholders
Remove extraneous f prefix
(F541)
131-131: f-string without any placeholders
Remove extraneous f prefix
(F541)
150-150: f-string without any placeholders
Remove extraneous f prefix
(F541)
162-162: f-string without any placeholders
Remove extraneous f prefix
(F541)
165-165: f-string without any placeholders
Remove extraneous f prefix
(F541)
172-172: f-string without any placeholders
Remove extraneous f prefix
(F541)
191-191: f-string without any placeholders
Remove extraneous f prefix
(F541)
src/pyseekdb/client/sql_utils.py
53-55: Prefer TypeError exception for invalid type
(TRY004)
53-55: Avoid specifying long messages outside the exception class
(TRY003)
demo/rag/seekdb_app.py
140-140: Do not catch blind exception: Exception
(BLE001)
155-155: Do not catch blind exception: Exception
(BLE001)
tests/integration_tests/test_collection_query.py
93-94: Possible SQL injection vector through string-based query construction
(S608)
125-125: f-string without any placeholders
Remove extraneous f prefix
(F541)
137-137: f-string without any placeholders
Remove extraneous f prefix
(F541)
146-146: f-string without any placeholders
Remove extraneous f prefix
(F541)
159-159: f-string without any placeholders
Remove extraneous f prefix
(F541)
170-170: f-string without any placeholders
Remove extraneous f prefix
(F541)
187-187: f-string without any placeholders
Remove extraneous f prefix
(F541)
207-207: f-string without any placeholders
Remove extraneous f prefix
(F541)
217-217: f-string without any placeholders
Remove extraneous f prefix
(F541)
228-228: f-string without any placeholders
Remove extraneous f prefix
(F541)
tests/integration_tests/conftest.py
65-65: Do not catch blind exception: Exception
(BLE001)
88-88: Do not catch blind exception: Exception
(BLE001)
119-119: Do not catch blind exception: Exception
(BLE001)
137-137: Do not catch blind exception: Exception
(BLE001)
171-171: Avoid specifying long messages outside the exception class
(TRY003)
214-214: Do not use bare except
(E722)
214-215: try-except-pass detected, consider logging the exception
(S110)
246-246: Avoid specifying long messages outside the exception class
(TRY003)
src/pyseekdb/client/embedding_function.py
95-97: Avoid specifying long messages outside the exception class
(TRY003)
253-255: try-except-pass detected, consider logging the exception
(S110)
253-253: Do not catch blind exception: Exception
(BLE001)
271-271: Do not catch blind exception: Exception
(BLE001)
275-275: Comment contains ambiguous , (FULLWIDTH COMMA). Did you mean , (COMMA)?
(RUF003)
289-289: Consider moving this statement to an else block
(TRY300)
291-291: Do not catch blind exception: Exception
(BLE001)
292-292: Use logging.exception instead of logging.error
Replace with exception
(TRY400)
tests/integration_tests/test_client_creation.py
119-119: Do not catch blind exception: Exception
(BLE001)
123-124: try-except-pass detected, consider logging the exception
(S110)
123-123: Do not catch blind exception: Exception
(BLE001)
140-140: f-string without any placeholders
Remove extraneous f prefix
(F541)
145-145: f-string without any placeholders
Remove extraneous f prefix
(F541)
155-155: f-string without any placeholders
Remove extraneous f prefix
(F541)
199-199: f-string without any placeholders
Remove extraneous f prefix
(F541)
214-214: f-string without any placeholders
Remove extraneous f prefix
(F541)
243-243: Standard pseudo-random generators are not suitable for cryptographic purposes
(S311)
284-284: Do not catch blind exception: Exception
(BLE001)
292-292: Do not catch blind exception: Exception
(BLE001)
tests/integration_tests/test_empty_value_handling.py
49-49: f-string without any placeholders
Remove extraneous f prefix
(F541)
54-54: Do not catch blind exception: Exception
(BLE001)
61-61: f-string without any placeholders
Remove extraneous f prefix
(F541)
80-80: Use explicit conversion flag
Replace with conversion flag
(RUF010)
97-97: Use explicit conversion flag
Replace with conversion flag
(RUF010)
97-97: Use explicit conversion flag
Replace with conversion flag
(RUF010)
105-105: Use explicit conversion flag
Replace with conversion flag
(RUF010)
109-109: f-string without any placeholders
Remove extraneous f prefix
(F541)
121-121: Use explicit conversion flag
Replace with conversion flag
(RUF010)
138-138: Use explicit conversion flag
Replace with conversion flag
(RUF010)
138-138: Use explicit conversion flag
Replace with conversion flag
(RUF010)
146-146: Use explicit conversion flag
Replace with conversion flag
(RUF010)
150-150: f-string without any placeholders
Remove extraneous f prefix
(F541)
192-192: f-string without any placeholders
Remove extraneous f prefix
(F541)
203-203: Use explicit conversion flag
Replace with conversion flag
(RUF010)
220-220: Use explicit conversion flag
Replace with conversion flag
(RUF010)
220-220: Use explicit conversion flag
Replace with conversion flag
(RUF010)
223-223: Use explicit conversion flag
Replace with conversion flag
(RUF010)
247-247: zip() without an explicit strict= parameter
Add explicit value for parameter strict=
(B905)
253-253: Use explicit conversion flag
Replace with conversion flag
(RUF010)
253-253: Use explicit conversion flag
Replace with conversion flag
(RUF010)
tests/integration_tests/test_default_embedding_function.py
42-42: f-string without any placeholders
Remove extraneous f prefix
(F541)
80-80: f-string without any placeholders
Remove extraneous f prefix
(F541)
84-84: f-string without any placeholders
Remove extraneous f prefix
(F541)
94-94: f-string without any placeholders
Remove extraneous f prefix
(F541)
118-118: Do not catch blind exception: Exception
(BLE001)
123-123: f-string without any placeholders
Remove extraneous f prefix
(F541)
138-138: Do not catch blind exception: Exception
(BLE001)
143-143: f-string without any placeholders
Remove extraneous f prefix
(F541)
159-159: Do not catch blind exception: Exception
(BLE001)
170-170: f-string without any placeholders
Remove extraneous f prefix
(F541)
180-180: f-string without any placeholders
Remove extraneous f prefix
(F541)
234-234: Unused method argument: input
(ARG002)
tests/integration_tests/test_collection_hybrid_search.py
19-19: PEP 484 prohibits implicit Optional
Convert to T | None
(RUF013)
34-34: PEP 484 prohibits implicit Optional
Convert to T | None
(RUF013)
140-141: Possible SQL injection vector through string-based query construction
(S608)
162-162: f-string without any placeholders
Remove extraneous f prefix
(F541)
213-213: f-string without any placeholders
Remove extraneous f prefix
(F541)
255-255: f-string without any placeholders
Remove extraneous f prefix
(F541)
293-293: f-string without any placeholders
Remove extraneous f prefix
(F541)
342-342: f-string without any placeholders
Remove extraneous f prefix
(F541)
388-388: f-string without any placeholders
Remove extraneous f prefix
(F541)
403-403: f-string without any placeholders
Remove extraneous f prefix
(F541)
418-418: f-string without any placeholders
Remove extraneous f prefix
(F541)
tests/integration_tests/test_collection_embedding_function.py
51-51: f-string without any placeholders
Remove extraneous f prefix
(F541)
68-69: try-except-pass detected, consider logging the exception
(S110)
68-68: Do not catch blind exception: Exception
(BLE001)
78-78: f-string without any placeholders
Remove extraneous f prefix
(F541)
96-97: try-except-pass detected, consider logging the exception
(S110)
96-96: Do not catch blind exception: Exception
(BLE001)
106-106: f-string without any placeholders
Remove extraneous f prefix
(F541)
127-128: try-except-pass detected, consider logging the exception
(S110)
127-127: Do not catch blind exception: Exception
(BLE001)
137-137: f-string without any placeholders
Remove extraneous f prefix
(F541)
164-164: f-string without any placeholders
Remove extraneous f prefix
(F541)
184-185: try-except-pass detected, consider logging the exception
(S110)
184-184: Do not catch blind exception: Exception
(BLE001)
194-194: f-string without any placeholders
Remove extraneous f prefix
(F541)
215-215: f-string without any placeholders
Remove extraneous f prefix
(F541)
219-219: Local variable created_collection is assigned to but never used
Remove assignment to unused variable created_collection
(F841)
240-241: try-except-pass detected, consider logging the exception
(S110)
240-240: Do not catch blind exception: Exception
(BLE001)
250-250: f-string without any placeholders
Remove extraneous f prefix
(F541)
254-254: Local variable created_collection is assigned to but never used
Remove assignment to unused variable created_collection
(F841)
273-274: try-except-pass detected, consider logging the exception
(S110)
273-273: Do not catch blind exception: Exception
(BLE001)
283-283: f-string without any placeholders
Remove extraneous f prefix
(F541)
299-300: try-except-pass detected, consider logging the exception
(S110)
299-299: Do not catch blind exception: Exception
(BLE001)
309-309: f-string without any placeholders
Remove extraneous f prefix
(F541)
313-313: Local variable created_collection is assigned to but never used
Remove assignment to unused variable created_collection
(F841)
330-331: try-except-pass detected, consider logging the exception
(S110)
330-330: Do not catch blind exception: Exception
(BLE001)
340-340: f-string without any placeholders
Remove extraneous f prefix
(F541)
361-362: try-except-pass detected, consider logging the exception
(S110)
361-361: Do not catch blind exception: Exception
(BLE001)
372-372: f-string without any placeholders
Remove extraneous f prefix
(F541)
tests/integration_tests/test_detect_db_type_and_version.py
41-41: f-string without any placeholders
Remove extraneous f prefix
(F541)
60-60: Possible binding to all interfaces
(S104)
64-64: f-string without any placeholders
Remove extraneous f prefix
(F541)
83-83: Possible binding to all interfaces
(S104)
87-87: f-string without any placeholders
Remove extraneous f prefix
(F541)
106-106: f-string without any placeholders
Remove extraneous f prefix
(F541)
125-125: Possible binding to all interfaces
(S104)
tests/integration_tests/test_security_sql_injection.py
79-79: Consider moving this statement to an else block
(TRY300)
81-81: Do not catch blind exception: Exception
(BLE001)
97-97: f-string without any placeholders
Remove extraneous f prefix
(F541)
127-127: f-string without any placeholders
Remove extraneous f prefix
(F541)
150-150: f-string without any placeholders
Remove extraneous f prefix
(F541)
170-170: f-string without any placeholders
Remove extraneous f prefix
(F541)
209-209: f-string without any placeholders
Remove extraneous f prefix
(F541)
254-254: f-string without any placeholders
Remove extraneous f prefix
(F541)
290-290: Do not catch blind exception: Exception
(BLE001)
299-299: f-string without any placeholders
Remove extraneous f prefix
(F541)
330-330: Do not catch blind exception: Exception
(BLE001)
337-337: f-string without any placeholders
Remove extraneous f prefix
(F541)
tests/integration_tests/test_collection_get.py
103-104: Possible SQL injection vector through string-based query construction
(S608)
tests/integration_tests/test_collection_hybrid_search_builder_integration.py
36-36: PEP 484 prohibits implicit Optional
Convert to T | None
(RUF013)
51-51: PEP 484 prohibits implicit Optional
Convert to T | None
(RUF013)
174-174: f-string without any placeholders
Remove extraneous f prefix
(F541)
220-220: f-string without any placeholders
Remove extraneous f prefix
(F541)
257-257: f-string without any placeholders
Remove extraneous f prefix
(F541)
293-293: f-string without any placeholders
Remove extraneous f prefix
(F541)
335-335: f-string without any placeholders
Remove extraneous f prefix
(F541)
377-377: f-string without any placeholders
Remove extraneous f prefix
(F541)
src/pyseekdb/client/hybrid_search.py
379-379: Avoid specifying long messages outside the exception class
(TRY003)
src/pyseekdb/client/configuration.py
57-59: Avoid specifying long messages outside the exception class
(TRY003)
src/pyseekdb/client/client_seekdb_server.py
195-195: Possible SQL injection vector through string-based query construction
(S608)
199-199: Avoid specifying long messages outside the exception class
(TRY003)
src/pyseekdb/client/client_seekdb_embedded.py
34-34: Unused method argument: kwargs
(ARG002)
140-140: Unused method argument: use_context_manager
(ARG002)
269-269: Unused method argument: tenant
(ARG002)
282-282: Unused method argument: tenant
(ARG002)
291-291: Possible SQL injection vector through string-based query construction
(S608)
295-295: Avoid specifying long messages outside the exception class
(TRY003)
309-309: Unused method argument: tenant
(ARG002)
326-326: Unused method argument: tenant
(ARG002)
src/pyseekdb/client/version.py
37-37: Avoid specifying long messages outside the exception class
(TRY003)
41-43: Avoid specifying long messages outside the exception class
(TRY003)
48-50: Avoid specifying long messages outside the exception class
(TRY003)
demo/rag/embedding_function_factory.py
101-101: Avoid specifying long messages outside the exception class
(TRY003)
157-159: Avoid specifying long messages outside the exception class
(TRY003)
demo/rag/seekdb_utils.py
21-21: Do not perform function call DefaultEmbeddingFunction in argument defaults; instead, perform the call within the function, or read the default from a module-level singleton variable
(B008)
41-41: Avoid specifying long messages outside the exception class
(TRY003)
92-92: Do not catch blind exception: Exception
(BLE001)
examples/complete_example.py
137-137: Standard pseudo-random generators are not suitable for cryptographic purposes
(S311)
169-169: Standard pseudo-random generators are not suitable for cryptographic purposes
(S311)
demo/rag/llm.py
18-18: PEP 484 prohibits implicit Optional
Convert to T | None
(RUF013)
78-78: Consider moving this statement to an else block
(TRY300)
124-124: Consider moving this statement to an else block
(TRY300)
demo/rag/seekdb_insert.py
69-69: PEP 484 prohibits implicit Optional
Convert to T | None
(RUF013)
69-69: PEP 484 prohibits implicit Optional
Convert to T | None
(RUF013)
69-69: PEP 484 prohibits implicit Optional
Convert to T | None
(RUF013)
124-124: Do not catch blind exception: Exception
(BLE001)
tests/integration_tests/test_fulltext_parser_config.py
16-16: PEP 484 prohibits implicit Optional
Convert to T | None
(RUF013)
189-189: f-string without any placeholders
Remove extraneous f prefix
(F541)
src/pyseekdb/client/filters.py
19-26: Mutable class attributes should be annotated with typing.ClassVar
(RUF012)
29-29: Mutable class attributes should be annotated with typing.ClassVar
(RUF012)
32-32: Mutable class attributes should be annotated with typing.ClassVar
(RUF012)
412-412: Prefer next(iter(result["bool"].keys())) over single element slice
Replace with next(iter(result["bool"].keys()))
(RUF015)
tests/integration_tests/test_admin_database_management.py
77-77: f-string without any placeholders
Remove extraneous f prefix
(F541)
88-88: f-string without any placeholders
Remove extraneous f prefix
(F541)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: integration-test (server)
- GitHub Check: integration-test (embedded)
- GitHub Check: integration-test (oceanbase)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/pyseekdb/utils/embedding_functions/sentence_transformer_embedding_function.py (1)
83-87: Pre-existing bug:list(input)incorrectly handles single string input.While not introduced by this PR,
list(input)on line 84 will convert a single string like"Hello"into["H", "e", "l", "l", "o"]instead of["Hello"].Compare with
DefaultEmbeddingFunction.__call__inembedding_function.py(lines 488-490) which correctly handles this:if isinstance(input, str): input = [input]Consider addressing this in a follow-up:
Suggested fix
def __call__(self, input: Documents) -> Embeddings: + # Handle single string input + if isinstance(input, str): + input = [input] + embeddings = self._model.encode( - list(input), + input, convert_to_numpy=True, normalize_embeddings=self.normalize_embeddings, )
🧹 Nitpick comments (4)
src/pyseekdb/utils/embedding_functions/openai_base_embedding_function.py (1)
151-153: Useraise ... from eto preserve the exception chain.When re-raising a new exception within an
exceptblock, usefrom eto maintain the original traceback for better debugging.Proposed fix
except Exception as e: - raise RuntimeError( - f"Failed to determine embedding dimension via API call: {e}" - ) + raise RuntimeError( + f"Failed to determine embedding dimension via API call: {e}" + ) from etests/unit_tests/test_openai_embedding_function.py (2)
119-121: Remove extraneousfprefix from string without placeholders.Line 121 has an f-string with no placeholders.
Proposed fix
ef = OpenAIEmbeddingFunction(timeout=30, max_retries=3) assert ef is not None - print(f" Initialized with timeout and max_retries") + print(" Initialized with timeout and max_retries")
244-248: Minor cleanup: rename unused loop variable and remove extraneousfprefix.The loop variable
iis unused, and the f-string on line 247 has no placeholders.Proposed fix
- for i, emb in enumerate(embeddings): + for _i, emb in enumerate(embeddings): assert isinstance(emb, list) - assert len(emb) == len(embeddings[0]), ( - f"All embeddings should have same dimension" - ) + assert len(emb) == len(embeddings[0]), ( + "All embeddings should have same dimension" + )tests/unit_tests/test_qwen_embedding_function.py (1)
230-234: Minor cleanup: rename unused loop variable and remove extraneousfprefix.The loop variable
iis unused, and the f-string on line 233 has no placeholders.Proposed fix
- for i, emb in enumerate(embeddings): + for _i, emb in enumerate(embeddings): assert isinstance(emb, list) - assert len(emb) == len(embeddings[0]), ( - f"All embeddings should have same dimension" - ) + assert len(emb) == len(embeddings[0]), ( + "All embeddings should have same dimension" + )
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (10)
.pre-commit-config.yamlsrc/pyseekdb/client/embedding_function.pysrc/pyseekdb/utils/embedding_functions/__init__.pysrc/pyseekdb/utils/embedding_functions/litellm_embedding_function.pysrc/pyseekdb/utils/embedding_functions/openai_base_embedding_function.pysrc/pyseekdb/utils/embedding_functions/openai_embedding_function.pysrc/pyseekdb/utils/embedding_functions/qwen_embedding_function.pysrc/pyseekdb/utils/embedding_functions/sentence_transformer_embedding_function.pytests/unit_tests/test_openai_embedding_function.pytests/unit_tests/test_qwen_embedding_function.py
✅ Files skipped from review due to trivial changes (2)
- src/pyseekdb/utils/embedding_functions/qwen_embedding_function.py
- src/pyseekdb/utils/embedding_functions/init.py
🚧 Files skipped from review as they are similar to previous changes (1)
- .pre-commit-config.yaml
🧰 Additional context used
🧬 Code graph analysis (5)
src/pyseekdb/utils/embedding_functions/sentence_transformer_embedding_function.py (2)
src/pyseekdb/client/collection.py (2)
client(78-80)embedding_function(88-90)src/pyseekdb/client/embedding_function.py (1)
EmbeddingFunction(48-75)
src/pyseekdb/utils/embedding_functions/openai_embedding_function.py (1)
src/pyseekdb/utils/embedding_functions/openai_base_embedding_function.py (1)
OpenAIBaseEmbeddingFunction(6-201)
src/pyseekdb/client/embedding_function.py (1)
src/pyseekdb/client/collection.py (1)
embedding_function(88-90)
tests/unit_tests/test_openai_embedding_function.py (3)
src/pyseekdb/utils/embedding_functions/openai_embedding_function.py (2)
OpenAIEmbeddingFunction(15-121)_get_model_dimensions(115-121)src/pyseekdb/utils/embedding_functions/openai_base_embedding_function.py (2)
dimension(123-160)_get_model_dimensions(112-120)src/pyseekdb/utils/embedding_functions/qwen_embedding_function.py (1)
_get_model_dimensions(113-119)
tests/unit_tests/test_qwen_embedding_function.py (3)
src/pyseekdb/utils/embedding_functions/qwen_embedding_function.py (1)
QwenEmbeddingFunction(16-119)src/pyseekdb/utils/embedding_functions/openai_base_embedding_function.py (1)
dimension(123-160)src/pyseekdb/client/embedding_function.py (1)
dimension(162-164)
🪛 Ruff (0.14.11)
src/pyseekdb/utils/embedding_functions/openai_base_embedding_function.py
151-153: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling
(B904)
151-153: Avoid specifying long messages outside the exception class
(TRY003)
src/pyseekdb/utils/embedding_functions/litellm_embedding_function.py
148-150: Avoid specifying long messages outside the exception class
(TRY003)
160-162: Avoid specifying long messages outside the exception class
(TRY003)
171-173: Avoid specifying long messages outside the exception class
(TRY003)
175-177: Avoid specifying long messages outside the exception class
(TRY003)
src/pyseekdb/client/embedding_function.py
95-97: Avoid specifying long messages outside the exception class
(TRY003)
271-271: Do not catch blind exception: Exception
(BLE001)
tests/unit_tests/test_openai_embedding_function.py
121-121: f-string without any placeholders
Remove extraneous f prefix
(F541)
244-244: Loop control variable i not used within loop body
Rename unused i to _i
(B007)
247-247: f-string without any placeholders
Remove extraneous f prefix
(F541)
tests/unit_tests/test_qwen_embedding_function.py
230-230: Loop control variable i not used within loop body
Rename unused i to _i
(B007)
233-233: f-string without any placeholders
Remove extraneous f prefix
(F541)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: integration-test (embedded)
- GitHub Check: integration-test (server)
- GitHub Check: integration-test (oceanbase)
🔇 Additional comments (16)
src/pyseekdb/utils/embedding_functions/openai_base_embedding_function.py (3)
85-85: LGTM!Formatting change consolidates the OpenAI client instantiation to a single line without altering behavior.
98-98: LGTM!Formatting simplification of the
NotImplementedErrormessage.
154-160: LGTM!The validation logic is correctly reformatted across multiple lines while preserving the same semantic checks.
src/pyseekdb/utils/embedding_functions/openai_embedding_function.py (2)
1-3: LGTM!Import reformatted to multi-line parenthesized form with trailing comma for cleaner diffs.
91-97: LGTM!Trailing comma added to
**kwargsfor consistent formatting.src/pyseekdb/utils/embedding_functions/litellm_embedding_function.py (3)
92-96: LGTM!The
isinstancecheck is reformatted for readability while preserving the validation logic.
117-117: LGTM!Dictionary construction collapsed to a single line for conciseness.
138-177: LGTM!Response handling block reformatted with consistent multi-line error messages. The detailed error messages including
type(item)andtype(response)are helpful for debugging unexpected API response formats.tests/unit_tests/test_openai_embedding_function.py (2)
12-12: LGTM!Blank line added for module-level formatting consistency.
21-377: LGTM!The remaining formatting changes (blank lines, wrapped print statements, parenthesized assertions) are consistent style normalization from pre-commit without any behavioral impact.
tests/unit_tests/test_qwen_embedding_function.py (3)
12-12: LGTM!Blank line added for module-level formatting consistency.
64-69: LGTM!Model list reformatted to multi-line for readability.
342-344: LGTM!Remaining formatting changes (blank lines, wrapped statements, parenthesized assertions) are consistent style normalization.
src/pyseekdb/client/embedding_function.py (2)
78-98: Defensive fallback for dimension inference looks good.The fallback logic to infer embedding dimension via a sample call is a reasonable approach for custom embedding functions that don't expose a
dimensionattribute.Minor observation: The condition on line 92
if test_embeddings and len(test_embeddings) > 0is slightly redundant—a non-empty list is truthy, soif test_embeddingswould suffice. However, the explicit length check adds clarity for readers.
181-190: The parenthesizedwithstatement syntax (lines 181-190) is compatible with the project's minimum Python version requirement of 3.11+.src/pyseekdb/utils/embedding_functions/sentence_transformer_embedding_function.py (1)
1-7: Formatting changes look good.Import reordering and multi-line formatting are consistent with the pre-commit style enforcement objectives of this PR.
✏️ Tip: You can disable this entire section by setting review_details to false in your review settings.
Summary
Add pre-commit to unify styles in code
Solution Description
Primarily code formatting, requiring review of pre-commit.yml, Makefile, and ci.yml.
Usage
For every developer:
However, I did not add tools like ruff-check and ty for type checking to avoid further code review requirements. A separate pull request will be created for that.
Summary by CodeRabbit
Bug Fixes
Chores
Documentation
✏️ Tip: You can customize this high-level summary in your review settings.