Create Mapping Info from Official Docs to Knowledge Files #12

kiyotis · 2026-02-13T01:29:52Z

Summary

This PR contains the implementation plan for creating comprehensive mapping information that maps Nablarch official documentation files to nabledge knowledge files.

Approach

Script-based automation: Process 1,400+ documentation files efficiently using Python scripts
Path-based categorization: Automatic categorization with exclusion/inclusion rules
Separate mappings: Generate mappings for both v6 and v5
Validation process: Verify integrity and generate statistics

Key Deliverables

Python scripts for scanning, categorization, and target mapping
Category definitions (categories-v6.json, categories-v5.json)
Path rules configuration (path-rules.json)
Complete mappings (mapping-v6.json, mapping-v5.json)
Out-of-scope verification reports
Validation script

Scope

In Scope:

Nablarch Batch (on-demand): FILE to DB, DB to DB, DB to FILE
RESTful Web Services: JAX-RS support, REST API implementation

Out of Scope:

Jakarta Batch (JSR 352)
Resident Batch (Table Queue)
Web Applications (JSP/UI)
Messaging (MOM/DB Queue)

Request for Review

Please review the implementation approach, especially:

The script-based automation strategy
Path-based categorization rules
Mapping file structure
Validation approach

🤖 Generated with Claude Code

This plan outlines the approach to create comprehensive mapping information that maps Nablarch official documentation files to nabledge knowledge files. The mapping will be used by skills to automatically generate knowledge files.

Key points:
- Script-based automation to process 1,400+ documentation files
- Path-based categorization with exclusion/inclusion rules
- Separate mappings for v6 and v5
- Validation and verification process

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Addressed critical issues and design improvements:

1. Complete category taxonomy coverage
   - Added all categories from Issue #10: processing patterns, components, setup, guides, checks, about
   - Defined categories-vX.json structure with id, name, description, default_in_scope, type

2. Improved file scanning logic
   - RST: Regex to detect ==== pattern for title extraction
   - MD: First line starting with # as title
   - Archetype: Directory name as title, scan pom.xml + README.md

3. Expanded path-based categorization rules
   - Added http-messaging, handlers/common, adaptors, tool
   - Added setup categories (blank_project, archetype)
   - Added dev-guide patterns (filename-based matching)
   - Added check categories (published-api, deprecated, security)
   - Added about categories (about, migration)

4. Fixed out-of-scope verification process
   - Changed from "sample 10%" to "ALL files" per Issue #10 requirement
   - Emphasized preventing false negatives (in-scope marked as out-of-scope)

5. Updated file counts and deliverables
   - V6: 667 RST + 158 MD + 10 archetypes = 835 entries
   - V5: 772 RST + 9 archetypes = 781 entries
   - Removed speculative statistics, will generate actual distribution

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
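The title-extraction rules in item 2 can be illustrated with a minimal sketch. It is not the code in scan-sources.py: the function names are hypothetical, and it assumes an RST title is any non-empty line immediately followed by a line of three or more `=` characters.

```python
import re
from typing import Optional

# Assumption: a title underline (or overline) is a line made only of '=' (3 or more).
_RST_RULE = re.compile(r"^={3,}\s*$")


def extract_rst_title(text: str) -> Optional[str]:
    """Return the first line that is directly followed by an '====' underline."""
    lines = text.splitlines()
    for i in range(len(lines) - 1):
        candidate = lines[i].strip()
        # Skip blank lines and the overline itself, then require an underline next.
        if candidate and not _RST_RULE.match(candidate) and _RST_RULE.match(lines[i + 1]):
            return candidate
    return None


def extract_md_title(text: str) -> Optional[str]:
    """Return the text of the first line starting with '#', without the markers."""
    for line in text.splitlines():
        if line.startswith("#"):
            return line.lstrip("#").strip()
    return None
```

Both functions return None when no title is found, so a caller could fall back to the file name in that case.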

Completed Phase 1-2 and initial Phase 3:

1. Category definitions created
   - categories-v6.json: 23 categories (processing patterns, components, setup, guides, checks, about)
   - categories-v5.json: 23 categories (same structure as v6)

2. Path-based categorization rules
   - Exclusion patterns for out-of-scope files (Jakarta Batch, messaging, web)
   - Inclusion patterns for in-scope files (batch, REST, handlers, libraries, etc.)
   - Dev guide filename patterns for MD files
   - Archetype patterns for Maven archetype projects

3. Source file scanning
   - scan-sources.py: Extracts metadata from RST/MD/archetype files
   - RST title extraction using ==== pattern detection
   - MD title extraction from # headers
   - Archetype handling with pom.xml + README.md

4. Categorization implementation
   - apply-categorization.py: Applies path rules to all source files
   - V6: 687 files (546 in scope, 141 out of scope, 0 needs review)
   - V5: 792 files (649 in scope, 143 out of scope, 0 needs review)

Results:
- sources-v6.json: 687 source files with metadata
- sources-v5.json: 792 source files with metadata
- categorized-v6.json: All files categorized with in_scope determination
- categorized-v5.json: All files categorized with in_scope determination

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
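As an illustration of how exclusion and inclusion rules might interact, here is a minimal sketch. The rule layout, the path fragments, and the field names are hypothetical (the schema of path-rules.json is not shown in this PR), and it assumes that an exclusion match takes precedence over any inclusion match.

```python
# Hypothetical rule set; the real path-rules.json may be structured differently.
RULES = {
    "exclude": [
        {"pattern": "/batch/jsr352/", "reason": "Jakarta Batch (JSR 352)"},
        {"pattern": "/web/", "reason": "Web applications (JSP/UI)"},
    ],
    "include": [
        {"pattern": "/batch/nablarch_batch/", "categories": ["batch-nablarch"]},
        {"pattern": "/handlers/", "categories": ["handler"]},
        {"pattern": "/libraries/", "categories": ["library"]},
    ],
}


def categorize(path, rules=RULES):
    """Classify one source path as in scope, out of scope, or needing review."""
    # Assumption: exclusion rules are checked first and win over inclusion rules.
    for rule in rules["exclude"]:
        if rule["pattern"] in path:
            return {"source": path, "in_scope": False,
                    "categories": [], "reason": rule["reason"]}

    # Collect categories from every matching inclusion rule, without duplicates.
    categories = []
    for rule in rules["include"]:
        if rule["pattern"] in path:
            for category in rule["categories"]:
                if category not in categories:
                    categories.append(category)
    if categories:
        return {"source": path, "in_scope": True,
                "categories": categories, "reason": None}

    # No rule matched: leave the decision open for manual review.
    return {"source": path, "in_scope": None,
            "categories": [], "reason": "needs review"}
```

Returning `in_scope=None` for unmatched paths keeps the "needs review" count explicit, so a result of 0 means every file was matched by some rule.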

Completed Phase 3-6 for both v6 and v5:

1. Target mapping
   - map-targets.py: Maps categorized files to target knowledge file paths
   - Category-based directory mapping (processing, handlers, libraries, tools, etc.)
   - Generates target file paths for all in-scope files

2. Validation script
   - validate-mapping.sh: Comprehensive validation checks
   - Verifies 100% file coverage
   - Validates all category references
   - Ensures in-scope files have targets
   - Ensures out-of-scope files have exclusion reasons
   - Confirms statistics accuracy

3. Out-of-scope verification
   - generate-out-of-scope-report.py: Generates review documentation
   - Groups files by exclusion reason
   - Provides detailed file listings for manual review

Results:
- mapping-v6.json: 687 entries (546 in scope, 141 out of scope)
- mapping-v5.json: 792 entries (649 in scope, 143 out of scope)
- out-of-scope-v6.md: 141 files across 6 exclusion reasons
- out-of-scope-v5.md: 143 files across 6 exclusion reasons
- All validation checks passed (10/10)

Exclusion reasons:
- Jakarta Batch (JSR 352): 26 files (v6)
- Web applications (JSP/UI): 86 files (v6)
- Messaging (MOM/DB queue): 28 files (v6)
- Test/tooling files: 1 file (v6)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
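The validation checks listed in item 2 could look roughly like the sketch below. The actual validator is a shell script (validate-mapping.sh) whose contents are not shown here; this Python version only illustrates the checks, and the entry fields `source`, `in_scope`, `categories`, `target`, and `reason` are assumed names.

```python
def validate_mapping(entries, known_categories, source_paths):
    """Return a list of human-readable problems; an empty list means all checks pass."""
    errors = []

    # Coverage: every scanned source file must appear in the mapping.
    mapped = {entry["source"] for entry in entries}
    for path in sorted(set(source_paths) - mapped):
        errors.append(f"{path}: not covered by mapping")

    for entry in entries:
        src = entry["source"]
        # Every referenced category must be defined in the category file.
        for category in entry.get("categories", []):
            if category not in known_categories:
                errors.append(f"{src}: unknown category {category!r}")
        # In-scope entries need a target knowledge file path.
        if entry.get("in_scope") is True and not entry.get("target"):
            errors.append(f"{src}: in-scope entry has no target")
        # Out-of-scope entries need an exclusion reason.
        if entry.get("in_scope") is False and not entry.get("reason"):
            errors.append(f"{src}: out-of-scope entry has no exclusion reason")
    return errors
```

Collecting all problems instead of stopping at the first one makes a single run sufficient to see everything that needs fixing.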

Completed all phases of mapping info creation:

1. Documentation
   - README.md: Complete work log with statistics and usage instructions
   - Comprehensive statistics for v6 (687 files) and v5 (792 files)
   - Category distribution analysis
   - Validation results summary
   - Usage examples and next steps

2. Statistics Summary
   V6: 546 in scope (79.5%), 141 out of scope (20.5%)
   V5: 649 in scope (81.9%), 143 out of scope (18.1%)
   Total: 1,479 files with 100% coverage

3. Top Categories
   - library: 134 files (v6), 152 files (v5)
   - handler: 89 files (v6), 94 files (v5)
   - batch-nablarch: 67 files (v6), 70 files (v5)
   - rest: 45 files (v6), 48 files (v5)

4. Validation Status
   All 10 validation checks passed
   - 100% file coverage
   - All categories defined
   - All in-scope files have targets
   - All out-of-scope files have exclusion reasons
   - Statistics accuracy confirmed

Implementation complete. Ready for:
- Manual review of out-of-scope files
- Integration into nabledge skills
- Automated knowledge file generation

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
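The percentages in the statistics summary follow directly from the in-scope and out-of-scope counts. A small sketch of that arithmetic, with a hypothetical helper name, is:

```python
def scope_stats(in_scope, out_of_scope):
    """Compute the total and the in/out-of-scope percentages, rounded to one decimal."""
    total = in_scope + out_of_scope
    return {
        "total": total,
        "in_scope_pct": round(100 * in_scope / total, 1),
        "out_of_scope_pct": round(100 * out_of_scope / total, 1),
    }


v6 = scope_stats(546, 141)  # counts reported for v6 in this commit
v5 = scope_stats(649, 143)  # counts reported for v5 in this commit
combined_total = v6["total"] + v5["total"]
```

With the counts reported above, this reproduces the 687 and 792 totals, the stated percentages, and the combined total of 1,479 files.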

Generated table format for easier review:

1. Summary tables (mapping-summary-vX.md)
   - Simplified view: file path, status (✓/✗), categories
   - V6: 687 entries (96KB)
   - V5: 792 entries (109KB)

2. Full tables (mapping-table-vX.md)
   - Complete view: all columns including titles, reasons, targets
   - V6: 687 entries (151KB)
   - V5: 792 entries (171KB)

Features:
- Sorted by source file path (alphabetical order)
- In-scope files marked with ✓
- Out-of-scope files marked with ✗
- Easy to scan for specific file paths
- Can visually identify scope boundaries

Usage: Open mapping-summary-vX.md for quick review of all files

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
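A summary table of this shape could be rendered along the following lines. This is only a sketch: the function name, the column headings, and the entry fields `source`, `in_scope`, and `categories` are assumptions, not taken from the generated files.

```python
def render_summary_table(entries):
    """Render mapping entries as a Markdown table sorted by source path."""
    lines = ["| Source | Status | Categories |", "| --- | --- | --- |"]
    for entry in sorted(entries, key=lambda e: e["source"]):
        mark = "✓" if entry["in_scope"] else "✗"  # in scope / out of scope
        categories = ", ".join(entry.get("categories", []))
        lines.append(f"| {entry['source']} | {mark} | {categories} |")
    return "\n".join(lines)
```

Sorting by source path keeps files from the same directory adjacent, which is what makes scope boundaries visible when scanning the table.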

Fixed bug where .lw directory was being skipped as hidden directory.

Changes:
- Modified scan-sources.py to only skip hidden directories within the scanned tree, not parent path components
- Re-scanned all sources
- Re-ran categorization, mapping, validation, and report generation

Results:
- V6: 687 → 845 files (+158 MD files from system development guide)
- V6 in scope: 546 → 704 (+158)
- All validation checks pass

System development guide contents now included:
- Implementation patterns: asynchronous processing in Nablarch (Nablarchでの非同期処理), Nablarch batch processing patterns (Nablarchバッチ処理パターン)
- Anti-patterns: Nablarch anti-patterns (Nablarchアンチパターン)
- ArchUnit guides: ArchUnit guide (ArchUnitガイド), usage guide (利用ガイド), operations guide (運用ガイド)
- Checkstyle guides: Checkstyle guide (Checkstyleガイド), rule explanations (ルール解説)
- CI/CD documentation
- Project setup guides
- Categorized as: dev-guide-pattern, dev-guide-anti, dev-guide-project, dev-guide-other

Note: Security matrix files (Excel format) exist but are not included in scan scope

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
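The fix described here, skipping hidden directories only below the scan root, can be sketched as follows. This is an illustration rather than the actual scan-sources.py code; the function name and the suffix list are assumptions.

```python
import os
from pathlib import Path


def iter_source_files(root, suffixes=(".rst", ".md")):
    """Yield source files under root, ignoring hidden directories below root only."""
    root = Path(root)
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune hidden subdirectories in place so os.walk does not descend into them.
        # The components of root itself are never inspected, so a root located
        # under a dot-directory such as ".lw" is still scanned.
        dirnames[:] = sorted(d for d in dirnames if not d.startswith("."))
        for name in sorted(filenames):
            if name.endswith(suffixes):
                yield Path(dirpath) / name
```

Checking only the directory names that `os.walk` reports, instead of every component of each file's full path, is what avoids the original bug.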

Removed duplicate English versions when Japanese versions exist.

Changes:
- Added deduplicate-ja-en.py script
- Deduplicated sources before categorization
- Re-ran full pipeline (categorization, mapping, validation, reports)

Results:

V6:
- Total files: 845 → 514 (-331 en duplicates)
- In scope: 704 → 443 (-261)
- Out of scope: 141 → 71 (-70)

V5:
- Total files: 792 → 453 (-339 en duplicates)
- In scope: 649 → 381 (-268)
- Out of scope: 143 → 72 (-71)

Notes:
- Kept ja versions when ja/en pair exists
- Kept en version when no ja version exists (1 file in v6)
- All validation checks pass
- Unique content count matches actual documentation

Statistics after deduplication:
- V6: 514 unique files (443 in scope, 71 out of scope)
- V5: 453 unique files (381 in scope, 72 out of scope)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
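The deduplication rule in the notes (keep ja when a ja/en pair exists, keep en when it has no ja counterpart) can be sketched as below. This is not the deduplicate-ja-en.py script itself; it assumes, hypothetically, that the language appears as a `ja` or `en` directory component in otherwise identical paths.

```python
from pathlib import PurePosixPath


def _split_lang(path):
    """Return (language, language-neutral key) for a path; language is None if absent."""
    parts = PurePosixPath(path).parts
    for i, part in enumerate(parts):
        if part in ("ja", "en"):
            # Replace the language component with a placeholder to pair ja/en files.
            return part, "/".join(parts[:i] + ("*",) + parts[i + 1:])
    return None, path


def deduplicate_ja_en(paths):
    """Drop en files that have a ja counterpart; keep everything else in order."""
    paths = list(paths)
    ja_keys = {key for lang, key in map(_split_lang, paths) if lang == "ja"}
    kept = []
    for path in paths:
        lang, key = _split_lang(path)
        if lang == "en" and key in ja_keys:
            continue  # an equivalent ja version exists
        kept.append(path)
    return kept
```

Files with no language component pass through unchanged, which matches the note that only en duplicates were removed.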

Generated Excel workbooks with multiple sheets for easier review.

Files generated:
- mapping-v6.xlsx (63KB)
- mapping-v5.xlsx (58KB)

Excel features:

1. Multiple sheets:
   - Summary: Statistics overview with color coding
   - All Files: Complete mapping with all columns
   - In Scope: Only in-scope files (green header)
   - Out of Scope: Only out-of-scope files (red header)

2. User-friendly features:
   - Color coding: Green (in scope) / Red (out of scope)
   - Auto filters on all data sheets
   - Frozen header rows for easy scrolling
   - Optimized column widths
   - Sorted by file path (All Files, In Scope)
   - Sorted by reason then path (Out of Scope)

3. Easy filtering and sorting:
   - Filter by categories
   - Filter by reason for exclusion
   - Sort by any column
   - Search within columns

Usage: Open mapping-v6.xlsx in Excel/LibreOffice for review

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

All 967 files (514 V6 + 453 V5) were reviewed by AI agents to assign proper processing pattern categories based on actual content, not just path patterns.

Agent Review Process:
- 10 specialized agents reviewed files in parallel
- Each agent read actual file content to determine applicable patterns
- 100 category additions made (53 V6 + 47 V5)

Key Changes:
- Universal libraries (database, validation, logging): Added to both batch and REST
- Batch-specific libraries (data_io, format): Added to batch-nablarch
- REST-specific libraries (jaxrs_access_log): Added to rest
- Common handlers: Many support both batch and REST patterns

Final Statistics:
- V6: 62 batch-nablarch files, 56 REST files, 6 http-messaging files
- V5: 57 batch-nablarch files, 55 REST files, 6 http-messaging files

Agent review results preserved in *-review*.json files for auditability.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

kiyotis changed the base branch from main to develop February 13, 2026 01:35

kiyotis changed the title ~~Plan: Create Mapping Info from Official Docs to Knowledge Files~~ Create Mapping Info from Official Docs to Knowledge Files Feb 13, 2026

kiyotis and others added 11 commits February 13, 2026 11:17

Update README with deduplicated statistics

75f9e8a

Remove Markdown table files (replaced by Excel)

e075d14
