Skip to content

Conversation

@kiyotis
Copy link
Contributor

@kiyotis kiyotis commented Feb 13, 2026

Summary

This PR contains the implementation plan for creating comprehensive mapping information that maps Nablarch official documentation files to nabledge knowledge files.

Approach

  • Script-based automation: Process 1,400+ documentation files efficiently using Python scripts
  • Path-based categorization: Automatic categorization with exclusion/inclusion rules
  • Separate mappings: Generate mappings for both v6 and v5
  • Validation process: Verify integrity and generate statistics

Key Deliverables

  • Python scripts for scanning, categorization, and target mapping
  • Category definitions (categories-v6.json, categories-v5.json)
  • Path rules configuration (path-rules.json)
  • Complete mappings (mapping-v6.json, mapping-v5.json)
  • Out-of-scope verification reports
  • Validation script

Scope

In Scope:

  • Nablarch Batch (on-demand): FILE to DB, DB to DB, DB to FILE
  • RESTful Web Services: JAX-RS support, REST API implementation

Out of Scope:

  • Jakarta Batch (JSR 352)
  • Resident Batch (Table Queue)
  • Web Applications (JSP/UI)
  • Messaging (MOM/DB Queue)

Request for Review

Please review the implementation approach, especially:

  1. The script-based automation strategy
  2. Path-based categorization rules
  3. Mapping file structure
  4. Validation approach

🤖 Generated with Claude Code

This plan outlines the approach to create comprehensive mapping information
that maps Nablarch official documentation files to nabledge knowledge files.
The mapping will be used by skills to automatically generate knowledge files.

Key points:
- Script-based automation to process 1,400+ documentation files
- Path-based categorization with exclusion/inclusion rules
- Separate mappings for v6 and v5
- Validation and verification process

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@kiyotis kiyotis changed the base branch from main to develop February 13, 2026 01:35
@kiyotis kiyotis changed the title Plan: Create Mapping Info from Official Docs to Knowledge Files Create Mapping Info from Official Docs to Knowledge Files Feb 13, 2026
kiyotis and others added 11 commits February 13, 2026 11:17
Addressed critical issues and design improvements:

1. Complete category taxonomy coverage
   - Added all categories from Issue #10: processing patterns, components,
     setup, guides, checks, about
   - Defined categories-vX.json structure with id, name, description,
     default_in_scope, type

2. Improved file scanning logic
   - RST: Regex to detect ==== pattern for title extraction
   - MD: First line starting with # as title
   - Archetype: Directory name as title, scan pom.xml + README.md

3. Expanded path-based categorization rules
   - Added http-messaging, handlers/common, adaptors, tool
   - Added setup categories (blank_project, archetype)
   - Added dev-guide patterns (filename-based matching)
   - Added check categories (published-api, deprecated, security)
   - Added about categories (about, migration)

4. Fixed out-of-scope verification process
   - Changed from "sample 10%" to "ALL files" per Issue #10 requirement
   - Emphasized preventing false negatives (in-scope marked as out-of-scope)

5. Updated file counts and deliverables
   - V6: 667 RST + 158 MD + 10 archetypes = 835 entries
   - V5: 772 RST + 9 archetypes = 781 entries
   - Removed speculative statistics, will generate actual distribution

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Completed Phase 1-2 and initial Phase 3:

1. Category definitions created
   - categories-v6.json: 23 categories (processing patterns, components,
     setup, guides, checks, about)
   - categories-v5.json: 23 categories (same structure as v6)

2. Path-based categorization rules
   - Exclusion patterns for out-of-scope files (Jakarta Batch, messaging, web)
   - Inclusion patterns for in-scope files (batch, REST, handlers, libraries, etc.)
   - Dev guide filename patterns for MD files
   - Archetype patterns for Maven archetype projects

3. Source file scanning
   - scan-sources.py: Extracts metadata from RST/MD/archetype files
   - RST title extraction using ==== pattern detection
   - MD title extraction from # headers
   - Archetype handling with pom.xml + README.md

4. Categorization implementation
   - apply-categorization.py: Applies path rules to all source files
   - V6: 687 files (546 in scope, 141 out of scope, 0 needs review)
   - V5: 792 files (649 in scope, 143 out of scope, 0 needs review)

Results:
- sources-v6.json: 687 source files with metadata
- sources-v5.json: 792 source files with metadata
- categorized-v6.json: All files categorized with in_scope determination
- categorized-v5.json: All files categorized with in_scope determination

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Completed Phase 3-6 for both v6 and v5:

1. Target mapping
   - map-targets.py: Maps categorized files to target knowledge file paths
   - Category-based directory mapping (processing, handlers, libraries, tools, etc.)
   - Generates target file paths for all in-scope files

2. Validation script
   - validate-mapping.sh: Comprehensive validation checks
   - Verifies 100% file coverage
   - Validates all category references
   - Ensures in-scope files have targets
   - Ensures out-of-scope files have exclusion reasons
   - Confirms statistics accuracy

3. Out-of-scope verification
   - generate-out-of-scope-report.py: Generates review documentation
   - Groups files by exclusion reason
   - Provides detailed file listings for manual review

Results:
- mapping-v6.json: 687 entries (546 in scope, 141 out of scope)
- mapping-v5.json: 792 entries (649 in scope, 143 out of scope)
- out-of-scope-v6.md: 141 files across 6 exclusion reasons
- out-of-scope-v5.md: 143 files across 6 exclusion reasons
- All validation checks passed (10/10)

Exclusion reasons:
- Jakarta Batch (JSR 352): 26 files (v6)
- Web applications (JSP/UI): 86 files (v6)
- Messaging (MOM/DB queue): 28 files (v6)
- Test/tooling files: 1 file (v6)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Completed all phases of mapping info creation:

1. Documentation
   - README.md: Complete work log with statistics and usage instructions
   - Comprehensive statistics for v6 (687 files) and v5 (792 files)
   - Category distribution analysis
   - Validation results summary
   - Usage examples and next steps

2. Statistics Summary
   V6: 546 in scope (79.5%), 141 out of scope (20.5%)
   V5: 649 in scope (81.9%), 143 out of scope (18.1%)
   Total: 1,479 files with 100% coverage

3. Top Categories
   - library: 134 files (v6), 152 files (v5)
   - handler: 89 files (v6), 94 files (v5)
   - batch-nablarch: 67 files (v6), 70 files (v5)
   - rest: 45 files (v6), 48 files (v5)

4. Validation Status
   All 10 validation checks passed
   - 100% file coverage
   - All categories defined
   - All in-scope files have targets
   - All out-of-scope files have exclusion reasons
   - Statistics accuracy confirmed

Implementation complete. Ready for:
- Manual review of out-of-scope files
- Integration into nabledge skills
- Automated knowledge file generation

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Generated table format for easier review:

1. Summary tables (mapping-summary-vX.md)
   - Simplified view: file path, status (✓/✗), categories
   - V6: 687 entries (96KB)
   - V5: 792 entries (109KB)

2. Full tables (mapping-table-vX.md)
   - Complete view: all columns including titles, reasons, targets
   - V6: 687 entries (151KB)
   - V5: 792 entries (171KB)

Features:
- Sorted by source file path (alphabetical order)
- In-scope files marked with ✓
- Out-of-scope files marked with ✗
- Easy to scan for specific file paths
- Can visually identify scope boundaries

Usage: Open mapping-summary-vX.md for quick review of all files

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Fixed bug where .lw directory was being skipped as hidden directory.

Changes:
- Modified scan-sources.py to only skip hidden directories within the scanned tree, not parent path components
- Re-scanned all sources
- Re-ran categorization, mapping, validation, and report generation

Results:
- V6: 687 → 845 files (+158 MD files from system development guide)
- V6 in scope: 546 → 704 (+158)
- All validation checks pass

System development guide contents now included:
- Implementation patterns (Nablarchでの非同期処理, Nablarchバッチ処理パターン)
- Anti-patterns (Nablarchアンチパターン)
- ArchUnit guides (ArchUnitガイド, 利用ガイド, 運用ガイド)
- Checkstyle guides (Checkstyleガイド, ルール解説)
- CI/CD documentation
- Project setup guides
- Categorized as: dev-guide-pattern, dev-guide-anti, dev-guide-project, dev-guide-other

Note: Security matrix files (Excel format) exist but are not included in scan scope

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Removed duplicate English versions when Japanese versions exist.

Changes:
- Added deduplicate-ja-en.py script
- Deduplicated sources before categorization
- Re-ran full pipeline (categorization, mapping, validation, reports)

Results:
V6:
- Total files: 845 → 514 (-331 en duplicates)
- In scope: 704 → 443 (-261)
- Out of scope: 141 → 71 (-70)

V5:
- Total files: 792 → 453 (-339 en duplicates)
- In scope: 649 → 381 (-268)
- Out of scope: 143 → 72 (-71)

Notes:
- Kept ja versions when ja/en pair exists
- Kept en version when no ja version exists (1 file in v6)
- All validation checks pass
- Unique content count matches actual documentation

Statistics after deduplication:
- V6: 514 unique files (443 in scope, 71 out of scope)
- V5: 453 unique files (381 in scope, 72 out of scope)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Generated Excel workbooks with multiple sheets for easier review.

Files generated:
- mapping-v6.xlsx (63KB)
- mapping-v5.xlsx (58KB)

Excel features:
1. Multiple sheets:
   - Summary: Statistics overview with color coding
   - All Files: Complete mapping with all columns
   - In Scope: Only in-scope files (green header)
   - Out of Scope: Only out-of-scope files (red header)

2. User-friendly features:
   - Color coding: Green (in scope) / Red (out of scope)
   - Auto filters on all data sheets
   - Frozen header rows for easy scrolling
   - Optimized column widths
   - Sorted by file path (All Files, In Scope)
   - Sorted by reason then path (Out of Scope)

3. Easy filtering and sorting:
   - Filter by categories
   - Filter by reason for exclusion
   - Sort by any column
   - Search within columns

Usage: Open mapping-v6.xlsx in Excel/LibreOffice for review

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
All 967 files (514 V6 + 453 V5) were reviewed by AI agents to assign
proper processing pattern categories based on actual content, not just
path patterns.

Agent Review Process:
- 10 specialized agents reviewed files in parallel
- Each agent read actual file content to determine applicable patterns
- 100 category additions made (53 V6 + 47 V5)

Key Changes:
- Universal libraries (database, validation, logging): Added to both batch and REST
- Batch-specific libraries (data_io, format): Added to batch-nablarch
- REST-specific libraries (jaxrs_access_log): Added to rest
- Common handlers: Many support both batch and REST patterns

Final Statistics:
- V6: 62 batch-nablarch files, 56 REST files, 6 http-messaging files
- V5: 57 batch-nablarch files, 55 REST files, 6 http-messaging files

Agent review results preserved in *-review*.json files for auditability.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant