Create Mapping Info from Official Docs to Knowledge Files #12
Open: kiyotis wants to merge 12 commits into develop from feature/issue-10-create-mapping-info
This plan outlines the approach to create comprehensive mapping information that maps Nablarch official documentation files to nabledge knowledge files. The mapping will be used by skills to automatically generate knowledge files.
Key points:
- Script-based automation to process 1,400+ documentation files
- Path-based categorization with exclusion/inclusion rules
- Separate mappings for v6 and v5
- Validation and verification process
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Addressed critical issues and design improvements:
1. Complete category taxonomy coverage
- Added all categories from Issue #10: processing patterns, components, setup, guides, checks, about
- Defined categories-vX.json structure with id, name, description, default_in_scope, type
2. Improved file scanning logic
- RST: Regex to detect ==== pattern for title extraction
- MD: First line starting with # as title
- Archetype: Directory name as title, scan pom.xml + README.md
3. Expanded path-based categorization rules
- Added http-messaging, handlers/common, adaptors, tool
- Added setup categories (blank_project, archetype)
- Added dev-guide patterns (filename-based matching)
- Added check categories (published-api, deprecated, security)
- Added about categories (about, migration)
4. Fixed out-of-scope verification process
- Changed from "sample 10%" to "ALL files" per Issue #10 requirement
- Emphasized preventing false negatives (in-scope marked as out-of-scope)
5. Updated file counts and deliverables
- V6: 667 RST + 158 MD + 10 archetypes = 835 entries
- V5: 772 RST + 9 archetypes = 781 entries
- Removed speculative statistics, will generate actual distribution
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
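The title-extraction rules in this commit (RST via the ==== pattern, MD via the first # line) could be sketched roughly as follows. This is not the code in scan-sources.py; the function names and the exact regex are assumptions made for illustration, and the sketch does not compare underline length against title length.

```python
import re

# Assumption: a line made only of '=' characters (three or more) marks an
# RST title underline or overline. The real script may use a stricter rule.
RST_RULE = re.compile(r"^={3,}\s*$")

def extract_rst_title(text):
    """Return the first text line that is directly followed by a ==== line."""
    lines = text.splitlines()
    for i in range(1, len(lines)):
        candidate = lines[i - 1].strip()
        # Skip blank lines and the overline itself; accept real text only.
        if candidate and not RST_RULE.match(candidate) and RST_RULE.match(lines[i]):
            return candidate
    return None

def extract_md_title(text):
    """Return the first line starting with '#', stripped of the marker."""
    for line in text.splitlines():
        if line.startswith("#"):
            return line.lstrip("#").strip()
    return None
```

Both functions return None when no title is found, so a caller can fall back to another source such as the file name.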
Completed Phase 1-2 and initial Phase 3:
1. Category definitions created
- categories-v6.json: 23 categories (processing patterns, components,
setup, guides, checks, about)
- categories-v5.json: 23 categories (same structure as v6)
2. Path-based categorization rules
- Exclusion patterns for out-of-scope files (Jakarta Batch, messaging, web)
- Inclusion patterns for in-scope files (batch, REST, handlers, libraries, etc.)
- Dev guide filename patterns for MD files
- Archetype patterns for Maven archetype projects
3. Source file scanning
- scan-sources.py: Extracts metadata from RST/MD/archetype files
- RST title extraction using ==== pattern detection
- MD title extraction from # headers
- Archetype handling with pom.xml + README.md
4. Categorization implementation
- apply-categorization.py: Applies path rules to all source files
- V6: 687 files (546 in scope, 141 out of scope, 0 needs review)
- V5: 792 files (649 in scope, 143 out of scope, 0 needs review)
Results:
- sources-v6.json: 687 source files with metadata
- sources-v5.json: 792 source files with metadata
- categorized-v6.json: All files categorized with in_scope determination
- categorized-v5.json: All files categorized with in_scope determination
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
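As a rough illustration of the path-based categorization described above, the sketch below checks exclusion patterns first and then collects matching inclusion categories. The patterns, the rule order, and the record fields other than in_scope are hypothetical; the actual rules live in apply-categorization.py and may differ.

```python
from fnmatch import fnmatchcase

# Hypothetical rules for illustration only; not the PR's real patterns.
EXCLUDE_RULES = [
    ("*/batch/jsr352/*", "Jakarta Batch (JSR 352)"),
    ("*/web/*", "Web applications (JSP/UI)"),
]
INCLUDE_RULES = [
    ("*/batch/*", "batch-nablarch"),
    ("*/handlers/*", "handler"),
    ("*/libraries/*", "library"),
]

def categorize(path):
    """Classify one source path as in scope, out of scope, or needing review."""
    # Assumption: exclusions take precedence over inclusions.
    for pattern, reason in EXCLUDE_RULES:
        if fnmatchcase(path, pattern):
            return {"source": path, "in_scope": False, "categories": [], "reason": reason}
    categories = [cat for pattern, cat in INCLUDE_RULES if fnmatchcase(path, pattern)]
    if categories:
        return {"source": path, "in_scope": True, "categories": categories, "reason": None}
    # No rule matched: leave the decision to a human reviewer.
    return {"source": path, "in_scope": None, "categories": [], "reason": "needs review"}
```

A file that matches no rule is reported with in_scope set to None, which corresponds to the "needs review" count in the results above.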
Completed Phase 3-6 for both v6 and v5:
1. Target mapping
- map-targets.py: Maps categorized files to target knowledge file paths
- Category-based directory mapping (processing, handlers, libraries, tools, etc.)
- Generates target file paths for all in-scope files
2. Validation script
- validate-mapping.sh: Comprehensive validation checks
- Verifies 100% file coverage
- Validates all category references
- Ensures in-scope files have targets
- Ensures out-of-scope files have exclusion reasons
- Confirms statistics accuracy
3. Out-of-scope verification
- generate-out-of-scope-report.py: Generates review documentation
- Groups files by exclusion reason
- Provides detailed file listings for manual review
Results:
- mapping-v6.json: 687 entries (546 in scope, 141 out of scope)
- mapping-v5.json: 792 entries (649 in scope, 143 out of scope)
- out-of-scope-v6.md: 141 files across 6 exclusion reasons
- out-of-scope-v5.md: 143 files across 6 exclusion reasons
- All validation checks passed (10/10)
Exclusion reasons:
- Jakarta Batch (JSR 352): 26 files (v6)
- Web applications (JSP/UI): 86 files (v6)
- Messaging (MOM/DB queue): 28 files (v6)
- Test/tooling files: 1 file (v6)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
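The validation checks listed in this commit are implemented in validate-mapping.sh, which is not shown here. A minimal Python sketch of the same kinds of checks, assuming each mapping entry is a dict with hypothetical fields source, in_scope, categories, target, and reason, might look like this:

```python
def validate(entries, source_paths, category_ids):
    """Return a list of problems; an empty list means every check passed."""
    problems = []
    # Coverage: every scanned source file must appear in the mapping.
    mapped = {e["source"] for e in entries}
    for missing in sorted(set(source_paths) - mapped):
        problems.append(f"not mapped: {missing}")
    for e in entries:
        # Every referenced category must be defined.
        for cat in e.get("categories", []):
            if cat not in category_ids:
                problems.append(f"unknown category {cat!r}: {e['source']}")
        # In-scope files need a target; out-of-scope files need a reason.
        if e["in_scope"] and not e.get("target"):
            problems.append(f"in-scope file without target: {e['source']}")
        if not e["in_scope"] and not e.get("reason"):
            problems.append(f"out-of-scope file without reason: {e['source']}")
    return problems
```

Returning the problems as a list, rather than stopping at the first failure, lets a single run report everything that needs fixing.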
Completed all phases of mapping info creation:
1. Documentation
- README.md: Complete work log with statistics and usage instructions
- Comprehensive statistics for v6 (687 files) and v5 (792 files)
- Category distribution analysis
- Validation results summary
- Usage examples and next steps
2. Statistics Summary
V6: 546 in scope (79.5%), 141 out of scope (20.5%)
V5: 649 in scope (81.9%), 143 out of scope (18.1%)
Total: 1,479 files with 100% coverage
3. Top Categories
- library: 134 files (v6), 152 files (v5)
- handler: 89 files (v6), 94 files (v5)
- batch-nablarch: 67 files (v6), 70 files (v5)
- rest: 45 files (v6), 48 files (v5)
4. Validation Status
All 10 validation checks passed
- 100% file coverage
- All categories defined
- All in-scope files have targets
- All out-of-scope files have exclusion reasons
- Statistics accuracy confirmed
Implementation complete. Ready for:
- Manual review of out-of-scope files
- Integration into nabledge skills
- Automated knowledge file generation
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
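The percentages in the statistics summary follow directly from the in-scope and out-of-scope counts. A small sketch for re-checking them (the helper name scope_stats is hypothetical, not part of the PR):

```python
def scope_stats(in_scope, out_of_scope):
    """Return (total, in-scope %, out-of-scope %) rounded to one decimal."""
    total = in_scope + out_of_scope
    return (
        total,
        round(100 * in_scope / total, 1),
        round(100 * out_of_scope / total, 1),
    )

v6 = scope_stats(546, 141)   # counts reported for V6
v5 = scope_stats(649, 143)   # counts reported for V5
combined_total = v6[0] + v5[0]
```

With the reported counts this gives 687 and 792 files, and a combined total of 1,479, matching the summary.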
Generated table format for easier review:
1. Summary tables (mapping-summary-vX.md)
- Simplified view: file path, status (✓/✗), categories
- V6: 687 entries (96KB)
- V5: 792 entries (109KB)
2. Full tables (mapping-table-vX.md)
- Complete view: all columns including titles, reasons, targets
- V6: 687 entries (151KB)
- V5: 792 entries (171KB)
Features:
- Sorted by source file path (alphabetical order)
- In-scope files marked with ✓
- Out-of-scope files marked with ✗
- Easy to scan for specific file paths
- Can visually identify scope boundaries
Usage: Open mapping-summary-vX.md for quick review of all files
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
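A summary table of this shape (path, ✓/✗ status, categories, sorted by source path) could be produced along these lines. The entry fields and column headings are assumptions; the generator script itself is not part of this page.

```python
def summary_table(entries):
    """Render mapping entries as a Markdown table sorted by source path."""
    rows = ["| Source | Status | Categories |", "| --- | --- | --- |"]
    for e in sorted(entries, key=lambda e: e["source"]):
        mark = "✓" if e["in_scope"] else "✗"  # in scope vs. out of scope
        rows.append(f"| {e['source']} | {mark} | {', '.join(e['categories'])} |")
    return "\n".join(rows)
```

Sorting by path keeps files from the same directory adjacent, which is what makes scope boundaries easy to spot by eye.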
Fixed bug where .lw directory was being skipped as hidden directory.
Changes:
- Modified scan-sources.py to only skip hidden directories within the scanned tree, not parent path components
- Re-scanned all sources
- Re-ran categorization, mapping, validation, and report generation
Results:
- V6: 687 → 845 files (+158 MD files from system development guide)
- V6 in scope: 546 → 704 (+158)
- All validation checks pass
System development guide contents now included:
- Implementation patterns (Nablarchでの非同期処理, Nablarchバッチ処理パターン)
- Anti-patterns (Nablarchアンチパターン)
- ArchUnit guides (ArchUnitガイド, 利用ガイド, 運用ガイド)
- Checkstyle guides (Checkstyleガイド, ルール解説)
- CI/CD documentation
- Project setup guides
- Categorized as: dev-guide-pattern, dev-guide-anti, dev-guide-project, dev-guide-other
Note: Security matrix files (Excel format) exist but are not included in scan scope
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
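The fix described here amounts to testing for dot-prefixed directories only in the part of the path below the scan root. A minimal sketch of that idea, with a hypothetical function name and example paths (the actual change in scan-sources.py is not shown on this page):

```python
from pathlib import Path

def is_hidden_within(root, path):
    """True if a directory component of `path` below `root` starts with a dot."""
    relative = Path(path).relative_to(root)
    # Only directory components below the root are checked. Components of
    # `root` itself (such as a parent named ".lw") are deliberately ignored.
    return any(part.startswith(".") for part in relative.parts[:-1])
```

Checking every component of the absolute path instead would reject every file under a root that sits inside .lw, which is the behavior the commit reports as the bug.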
Removed duplicate English versions when Japanese versions exist.
Changes:
- Added deduplicate-ja-en.py script
- Deduplicated sources before categorization
- Re-ran full pipeline (categorization, mapping, validation, reports)
Results:
V6:
- Total files: 845 → 514 (-331 en duplicates)
- In scope: 704 → 443 (-261)
- Out of scope: 141 → 71 (-70)
V5:
- Total files: 792 → 453 (-339 en duplicates)
- In scope: 649 → 381 (-268)
- Out of scope: 143 → 72 (-71)
Notes:
- Kept ja versions when ja/en pair exists
- Kept en version when no ja version exists (1 file in v6)
- All validation checks pass
- Unique content count matches actual documentation
Statistics after deduplication:
- V6: 514 unique files (443 in scope, 71 out of scope)
- V5: 453 unique files (381 in scope, 72 out of scope)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
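The deduplication rule (keep ja when a ja/en pair exists, keep en when it has no ja counterpart) can be sketched as below. The assumption that the language appears as a leading "ja/" or "en/" path component is purely illustrative; deduplicate-ja-en.py may pair files differently.

```python
def deduplicate_ja_en(paths):
    """Drop en files that have a ja twin; keep everything else in order."""
    def split_lang(path):
        # Hypothetical layout: "ja/<rest>" or "en/<rest>".
        lang, _, rest = path.partition("/")
        return (lang, rest) if lang in ("ja", "en") else (None, path)

    ja_rests = {rest for lang, rest in map(split_lang, paths) if lang == "ja"}
    kept = []
    for path in paths:
        lang, rest = split_lang(path)
        if lang == "en" and rest in ja_rests:
            continue  # English duplicate of an existing Japanese file
        kept.append(path)
    return kept
```

Paths without a recognized language component, such as archetype directories, pass through unchanged.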
Generated Excel workbooks with multiple sheets for easier review.
Files generated:
- mapping-v6.xlsx (63KB)
- mapping-v5.xlsx (58KB)
Excel features:
1. Multiple sheets:
- Summary: Statistics overview with color coding
- All Files: Complete mapping with all columns
- In Scope: Only in-scope files (green header)
- Out of Scope: Only out-of-scope files (red header)
2. User-friendly features:
- Color coding: Green (in scope) / Red (out of scope)
- Auto filters on all data sheets
- Frozen header rows for easy scrolling
- Optimized column widths
- Sorted by file path (All Files, In Scope)
- Sorted by reason then path (Out of Scope)
3. Easy filtering and sorting:
- Filter by categories
- Filter by reason for exclusion
- Sort by any column
- Search within columns
Usage: Open mapping-v6.xlsx in Excel/LibreOffice for review
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
All 967 files (514 V6 + 453 V5) were reviewed by AI agents to assign proper processing pattern categories based on actual content, not just path patterns.
Agent Review Process:
- 10 specialized agents reviewed files in parallel
- Each agent read actual file content to determine applicable patterns
- 100 category additions made (53 V6 + 47 V5)
Key Changes:
- Universal libraries (database, validation, logging): Added to both batch and REST
- Batch-specific libraries (data_io, format): Added to batch-nablarch
- REST-specific libraries (jaxrs_access_log): Added to rest
- Common handlers: Many support both batch and REST patterns
Final Statistics:
- V6: 62 batch-nablarch files, 56 REST files, 6 http-messaging files
- V5: 57 batch-nablarch files, 55 REST files, 6 http-messaging files
Agent review results preserved in *-review*.json files for auditability.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Summary
This PR contains the implementation plan for creating comprehensive mapping information that maps Nablarch official documentation files to nabledge knowledge files.
Approach
Key Deliverables
Scope
In Scope:
Out of Scope:
Request for Review
Please review the implementation approach, especially:
🤖 Generated with Claude Code