Merged
- Upgraded the python-semantic-release action from v9.15.0 to v10.5.2 to leverage new features and improvements.
- Updated the LDB client to support a more flexible configuration input, allowing for `None` and dictionary types.
- Introduced an enrichment registry for managing data sources and improved the access layer to return DataFrames.
- Added a sentinel value in `LDBConfig` to differentiate between "not provided" and "None" for the API key.
- Enhanced quota handling in the API client to support custom quotas and improved rate limiting logic for registered and anonymous users.
- Added .envrc and .vscode/ to the .gitignore to prevent tracking of environment configuration and IDE-specific files.
- Included dev/ directory to ignore development-related files.
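The sentinel idea in the commit above can be illustrated with a minimal sketch. It is not taken from the pyldb source: `_Unset`, `UNSET`, `ExampleConfig`, and the `EXAMPLE_API_KEY` variable are hypothetical names, and the environment fallback is only one plausible reason to tell "not provided" apart from an explicit `None`.

```python
import os
from dataclasses import dataclass
from typing import Optional, Union


class _Unset:
    """Marker type meaning 'argument not provided' (hypothetical name)."""

    def __repr__(self) -> str:
        return "<UNSET>"


UNSET = _Unset()


@dataclass
class ExampleConfig:
    # UNSET: the caller said nothing. None: the caller explicitly wants no key.
    api_key: Union[str, None, _Unset] = UNSET

    def resolved_api_key(self, env=None) -> Optional[str]:
        """Consult the environment only when no value was provided at all."""
        env = os.environ if env is None else env
        if isinstance(self.api_key, _Unset):
            return env.get("EXAMPLE_API_KEY")  # hypothetical variable name
        return self.api_key
```

With a plain `None` default, these two cases would be indistinguishable, so an explicit `api_key=None` could not opt out of a fallback.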
…uest handling
- Updated API client methods across multiple modules to include new parameters for language, format, and conditional request headers.
- Introduced centralized handling of API parameters and headers to streamline request preparation.
- Enhanced list and get methods to support pagination and sorting options, improving data retrieval flexibility.
- Updated documentation strings to reflect new parameters and usage examples for better clarity.
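A rough sketch of what centralized request preparation could look like is shown here. The function name `prepare_request` and the query keys `lang` and `format` are assumptions for illustration, not the library's actual API; `If-None-Match` and `If-Modified-Since` are the standard HTTP conditional request headers.

```python
from typing import Any, Optional


def prepare_request(
    params: Optional[dict[str, Any]] = None,
    *,
    lang: Optional[str] = None,
    fmt: Optional[str] = None,
    if_none_match: Optional[str] = None,
    if_modified_since: Optional[str] = None,
) -> tuple[dict[str, Any], dict[str, str]]:
    """Build the query parameters and headers for one request (hypothetical helper)."""
    # Drop parameters the caller left as None so they are not sent at all.
    query = {key: value for key, value in (params or {}).items() if value is not None}
    if lang is not None:
        query["lang"] = lang  # assumed query key
    if fmt is not None:
        query["format"] = fmt  # assumed query key

    headers: dict[str, str] = {}
    if if_none_match is not None:
        headers["If-None-Match"] = if_none_match
    if if_modified_since is not None:
        headers["If-Modified-Since"] = if_modified_since
    return query, headers
```

Keeping this logic in one place means every list and get method can forward the same keyword arguments without repeating the filtering.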
…t handling
- Added a new `Format` enum to define supported response formats (JSON, JSONAPI, XML).
- Updated `LDBConfig` to include a default response format, enhancing configuration flexibility.
- Modified API client methods across various modules to utilize the new format handling, defaulting to the config settings.
- Improved documentation to reflect changes in expected parameters for language and format in API methods.
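For illustration, an enum with a config-level default might be sketched as follows. Only the member names JSON, JSONAPI, and XML come from the commit message; the string values, the JSON default, and the `resolve_format` helper are assumptions.

```python
from enum import Enum
from typing import Optional


class Format(str, Enum):
    # Member names follow the commit message; the string values are assumed.
    JSON = "json"
    JSONAPI = "jsonapi"
    XML = "xml"


def resolve_format(requested: Optional[Format], default: Format = Format.JSON) -> str:
    """Use the per-call format if given, otherwise the configured default."""
    chosen = requested if requested is not None else default
    return chosen.value
```

A method would then accept `format=None` and pass the config's default as the second argument.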
…stency
- Updated the parameter name from 'year' to 'years' across multiple API methods in the DataAPI, UnitsAPI, and VariablesAPI classes to better reflect that multiple years can be specified.
- Adjusted corresponding documentation strings to ensure clarity regarding the new parameter name.
- Enhanced consistency in parameter naming across the codebase.
- Introduced a new constant `DEFAULT_PAGE_SIZE` set to 100 for pagination.
- Updated `LDBConfig` to include a `page_size` attribute, allowing customization of the default page size.
- Enhanced environment variable handling to allow overriding the default page size, with error handling for invalid values.
- Updated documentation to reflect the new `page_size` parameter in the configuration.
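One way the environment override with validation could work is sketched below. The variable name `EXAMPLE_PAGE_SIZE` and the choice to raise `ValueError` (rather than fall back silently) are assumptions; the commit only states that invalid values are handled.

```python
import os

DEFAULT_PAGE_SIZE = 100  # value stated in the commit message


def page_size_from_env(env=None, var_name="EXAMPLE_PAGE_SIZE"):
    """Read the page size from the environment (variable name is hypothetical)."""
    env = os.environ if env is None else env
    raw = env.get(var_name)
    if raw is None or raw.strip() == "":
        return DEFAULT_PAGE_SIZE
    try:
        value = int(raw)
    except ValueError:
        raise ValueError(f"{var_name} must be an integer, got {raw!r}") from None
    if value <= 0:
        raise ValueError(f"{var_name} must be positive, got {value}")
    return value
```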
…gination logic
- Eliminated the 'all_pages' parameter from DataAPI, SubjectsAPI, UnitsAPI, and VariablesAPI classes to simplify pagination handling.
- Updated methods to use 'max_pages' for controlling pagination, with clear documentation on its usage.
- Adjusted logic to fetch results based on 'max_pages' value, ensuring consistent behavior across API methods.
- Enhanced documentation to clarify the new pagination approach and parameters.
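The `max_pages` control can be sketched as a simple loop. The semantics used here are assumptions, since the commit does not spell them out: `None` fetches every page, `0` fetches nothing, and a negative value is rejected. `fetch_pages` and its `fetch_page` callback are hypothetical names.

```python
from typing import Any, Callable, Optional


def fetch_pages(
    fetch_page: Callable[[int], list[Any]],
    max_pages: Optional[int] = None,
) -> list[Any]:
    """Collect results page by page until the limit or an empty page is reached."""
    if max_pages is not None and max_pages < 0:
        raise ValueError("max_pages must be None or a non-negative integer")

    results: list[Any] = []
    page = 0
    # None means "no limit"; otherwise stop after max_pages pages.
    while max_pages is None or page < max_pages:
        items = fetch_page(page)
        if not items:  # an empty page signals the end of the data
            break
        results.extend(items)
        page += 1
    return results
```

A single integer-or-`None` parameter covers what `all_pages` plus a page count would otherwise need two arguments to express.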
- Introduced a new access layer for various API endpoints, including aggregates, attributes, data, levels, measures, subjects, units, variables, and years.
- Each access class is designed to convert API responses into pandas DataFrames, enhancing data manipulation capabilities.
- Added methods for listing and retrieving data, with support for pagination and metadata retrieval.
- Improved documentation to clarify usage and functionality of the new access layer classes.
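The kind of normalization such an access layer performs before building a DataFrame can be sketched in plain Python. This is an illustration under assumptions, not the `BaseAccess` implementation: the helper names and the sample field names are hypothetical, and the real code may rely on pandas utilities instead.

```python
import re


def camel_to_snake(name: str) -> str:
    """Convert a camelCase field name to snake_case."""
    # Insert an underscore wherever a lowercase letter or digit meets an uppercase letter.
    return re.sub(r"(?<=[a-z0-9])(?=[A-Z])", "_", name).lower()


def flatten_record(record: dict, parent: str = "") -> dict:
    """Flatten nested dictionaries into one level with prefixed column names."""
    flat = {}
    for key, value in record.items():
        column = camel_to_snake(key)
        if parent:
            column = f"{parent}_{column}"
        if isinstance(value, dict):
            flat.update(flatten_record(value, column))
        else:
            flat[column] = value
    return flat
```

A list of records flattened this way can be passed directly to `pandas.DataFrame(rows)`, which accepts a list of dictionaries.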
…lients
- Introduced comprehensive end-to-end tests for access layer workflows, ensuring correct data retrieval and handling.
- Added integration tests for various access classes, including AggregatesAccess, AttributesAccess, DataAccess, LevelsAccess, MeasuresAccess, SubjectsAccess, UnitsAccess, and VariablesAccess, validating their functionality with sample data.
- Implemented unit tests for API clients, enhancing coverage for asynchronous and synchronous operations.
- Included sample data files to support integration tests, ensuring realistic scenarios for testing.
- Improved overall test structure and organization for better maintainability and clarity.
- Included MyST Notebook as a dependency for documentation.
- Introduced custom test markers for unit, integration, and end-to-end tests to enhance test categorization and organization.
- Updated dependencies.
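Custom markers are typically registered so that pytest does not warn about unknown marks. The sketch below registers them from a `conftest.py` hook; the PR may instead declare them in its project configuration, and the marker descriptions here are invented for illustration.

```python
# conftest.py (sketch): register the three custom markers with pytest.
MARKERS = {
    "unit": "fast tests that need no network access",
    "integration": "tests that run against sample data files",
    "e2e": "end-to-end workflow tests",
}


def pytest_configure(config):
    """pytest hook: declare each marker via the 'markers' ini option."""
    for name, description in MARKERS.items():
        config.addinivalue_line("markers", f"{name}: {description}")
```

A subset can then be selected on the command line, for example `pytest -m unit` or `pytest -m "not e2e"`.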
- Introduced detailed documentation for the new access layer, highlighting its features such as automatic DataFrame conversion, column name normalization, and nested data flattening.
- Updated API clients documentation to clarify the distinction between the access layer and API layer, emphasizing the benefits of using the access layer for data analysis.
- Added examples and usage scenarios to enhance user understanding and facilitate quick start with the library.
- Included technical implementation details in the appendix for developers and power users.
…ethods
- Changed the parameter name from `variable_id` to `variable_ids` in the `get_data_by_unit` and `aget_data_by_unit` methods to support multiple variable IDs as a list.
- Updated corresponding documentation and test cases to reflect this change, ensuring consistency across the API.
- Cleaned up unnecessary whitespace in several files for improved code readability.
Test Results (Python 3.13)
426 tests +176, 417 ✅ +167, 4s ⏱️ ±0s
Results for commit 75f84e6. ± Comparison against base commit defc104.
This pull request removes 250 and adds 426 tests. Note that renamed tests count towards both.
Test Results (Python 3.11)
426 tests +176, 417 ✅ +167, 5s ⏱️ -1s
Results for commit 75f84e6. ± Comparison against base commit defc104.
This pull request removes 250 and adds 426 tests. Note that renamed tests count towards both.
📋 Summary
This PR introduces a comprehensive access layer that automatically converts API responses to pandas DataFrames, significantly improving the developer experience for data analysis workflows. The access layer sits on top of the existing API clients and provides automatic data normalization, column renaming, and type inference, making LDB data immediately ready for analysis. Additionally, this PR enhances API clients with improved parameter handling, pagination controls, and configuration flexibility.
🎯 Purpose & Context
The Local Data Bank (LDB) API returns data in JSON format with camelCase field names and nested structures, which requires manual conversion and normalization before analysis. This PR addresses this by introducing a dedicated access layer that converts API responses to pandas DataFrames and handles data normalization, column renaming, and type inference.
This change enables users to work with LDB data more efficiently, reducing boilerplate code and making the library more accessible to data analysts and scientists.
🔧 Changes Made
Access Layer Implementation
- `pyldb.access` module: Introduced a complete access layer with classes for all API endpoints: `AggregatesAccess`, `AttributesAccess`, `DataAccess`, `LevelsAccess`, `MeasuresAccess`, `SubjectsAccess`, `UnitsAccess`, `VariablesAccess`, `YearsAccess`
- `LDB` client now exposes both:
  - `ldb.levels`, `ldb.data`, etc. → Returns DataFrames
  - `ldb.api.levels`, `ldb.api.data`, etc. → Returns raw dictionaries
API Client Enhancements
- `year` → `years` across methods for consistency (supports multiple years)
- `variable_id` → `variable_ids` in data retrieval methods (supports lists)
- `all_pages` parameter removed in favor of `max_pages` for clearer pagination control
- `Format` enum (JSON, JSONAPI, XML) for response format handling
- `page_size` parameter (default: 100) for paginated requests
Configuration & Client Updates
- `LDB` client now accepts `None`, `dict`, or `LDBConfig` instances
- `page_size` and default `format` added to `LDBConfig`
Testing Infrastructure
- `unit/`, `integration/`, and `e2e/` directories
- Test markers (`@pytest.mark.unit`, `@pytest.mark.integration`, `@pytest.mark.e2e`)
Documentation
- `access_layer.rst` documentation
- `main_client.rst`, `api_clients.rst`, and `config.rst`
- `examples.ipynb` with practical usage examples
Dependencies & Infrastructure
✅ Testing
Test Coverage
Test Execution
Manual Testing
Access layer DataFrame conversion:
API layer still returns raw dicts:
Parameter changes:
- `years` parameter accepts lists
- `variable_ids` parameter accepts lists
- `max_pages` controls pagination correctly
🚨 Breaking Changes & Migration Notes
Parameter Renames
- `year` → `years`: Update calls to `get_data_by_variable()`, `get_data_by_unit()`, and related methods
- `variable_id` → `variable_ids`: Update calls to `get_data_by_unit()` and `aget_data_by_unit()`
Removed Parameters
- `all_pages` parameter: Removed from `DataAPI`, `SubjectsAPI`, `UnitsAPI`, and `VariablesAPI`
Migration Path
- `ldb.api.*` for raw dictionary access
🔍 Review Focus Areas
Critical Review Points
- `BaseAccess._to_dataframe()`
- `_column_renames` mappings are correctly applied across all access classes
- `max_pages` logic correctly handles edge cases (None, 0, negative values)
Performance Considerations
- `max_pages` parameter
Security & Configuration
📦 Dependencies & Side Effects
New Dependencies
Updated Dependencies
Side Effects
- `page_size` and `format` config parameters (backward compatible)
- `tests/unit/` directory (does not affect runtime)
🚀 Deployment Notes
Pre-Deployment Checklist
Post-Deployment
- `/docs/access_layer.html`
- `docs/examples.ipynb`
Environment Considerations
📊 Statistics