I-TRACING-ASO/CIS-Converter

CIS Converter

A Python tool that converts CIS benchmark PDF documents into structured data formats (CSV or Excel) for easier analysis.

Key Features

  • Intelligent PDF Parsing: Automatically extracts and structures CIS recommendation data from PDF documents
  • Multiple Output Formats: Supports both CSV and Excel (.xlsx) output formats
  • Table of Contents Processing: Leverages TOC to accurately identify and extract individual recommendations
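  • Rich Data Extraction: Captures the standard CIS fields for each recommendation, such as description, rationale, audit, remediation, and impact (full list under Output Structure)
  • Excel Enhancements: Creates formatted Excel files with: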
    • Data validation dropdowns for compliance tracking (OK, KO, Partial, N/A, ?)
    • Conditional formatting for visual status indicators
    • Named tables for easy data manipulation
  • Batch Processing: Process multiple CIS benchmark PDFs in a single operation
  • Debugging Support: Comprehensive logging and debug output for troubleshooting

Installation

Prerequisites

  • Python 3.6 or higher
  • pip package manager

Install Dependencies

pip install -r requirements.txt

Required Dependencies

  • PyMuPDF: For PDF text extraction and processing
  • xlsxwriter: For Excel file generation (required only for Excel output format)

Usage

Basic Usage

Convert a CIS benchmark PDF to Excel format:

python cis-converter.py path/to/cis-benchmark.pdf

Convert to CSV format:

python cis-converter.py -f CSV path/to/cis-benchmark.pdf

Process multiple PDF files:

python cis-converter.py -f EXCEL -o output/ file1.pdf file2.pdf file3.pdf

Examples

Convert single PDF to Excel with custom output directory:

python cis-converter.py -f EXCEL -o ./results/ CIS_Ubuntu_Linux_20.04_Benchmark_v1.1.0.pdf

Convert to CSV with custom delimiter and debug logging:

python cis-converter.py -f CSV --csv-delimiter ";" -l DEBUG benchmark.pdf

Batch process multiple benchmarks:

python cis-converter.py -o ./compliance-data/ *.pdf
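The *.pdf pattern above relies on the shell to expand the wildcard, which some shells (for example Windows cmd.exe) do not do. As a hedged alternative, the batch run can be driven from Python. This sketch only uses the -f and -o options documented in this README; the helper names build_command and convert_all are hypothetical, and it assumes cis-converter.py is in the current directory.

```python
import subprocess
import sys
from pathlib import Path


def build_command(pdf_paths, output_dir="./", fmt="EXCEL", script="cis-converter.py"):
    """Build the argument list for one cis-converter.py invocation."""
    if fmt not in ("CSV", "EXCEL"):
        raise ValueError(f"unsupported format: {fmt!r}")
    return [sys.executable, script, "-f", fmt, "-o", str(output_dir),
            *(str(path) for path in pdf_paths)]


def convert_all(input_dir, output_dir, fmt="EXCEL"):
    """Convert every PDF found in input_dir; return None if there are none."""
    pdfs = sorted(Path(input_dir).glob("*.pdf"))
    if not pdfs:
        return None
    # Creating the folder up front is a precaution; whether the tool
    # creates it on its own is not stated in this README.
    Path(output_dir).mkdir(parents=True, exist_ok=True)
    return subprocess.run(build_command(pdfs, output_dir, fmt), check=True)


command = build_command(["file1.pdf", "file2.pdf"], "output/", "CSV")
print(" ".join(command[1:]))
```

Calling convert_all("./benchmarks", "./compliance-data") would then run the converter once over all PDFs in the folder.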

Output Structure

The tool extracts the following information from each CIS recommendation:

Field                   Description
Benchmark               Source PDF filename
CIS #                   Recommendation number (e.g., 2.3.1.6)
Scored                  Scoring type (Scored/Not Scored/Manual/Automated)
Type                    Profile level (L1/L2) or applicability
Policy                  Recommendation title/name
Profile Applicability   Target systems and environments
Description             Detailed explanation of the control
Rationale               Why this control is important
Audit                   Steps to verify compliance
Result                  Compliance status (for tracking)
Comments                Additional notes (for tracking)
Remediation             Steps to implement the control
Impact                  Potential effects of implementation
Default Value           System default configuration
References              Related documentation and resources
Additional Information  Extra context and notes
CIS Controls            Mapping to CIS Controls framework
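Once a benchmark has been converted to CSV, the fields above can be analyzed with Python's standard csv module. The following is a minimal sketch, assuming the CSV header row uses the field names listed above. The sample rows are invented for illustration, and the "untracked" label for an empty Result is a choice made here, not something the tool produces.

```python
import csv
import io
from collections import Counter

# Hypothetical excerpt of a converted benchmark (only some columns shown).
sample_csv = (
    '"Benchmark","CIS #","Type","Policy","Result"\n'
    '"example.pdf","1.1.1","L1","Hypothetical policy A","OK"\n'
    '"example.pdf","1.1.2","L2","Hypothetical policy B","KO"\n'
    '"example.pdf","1.2.1","L1","Hypothetical policy C",""\n'
)


def summarize(csv_file):
    """Count recommendations per Result value and collect failing numbers."""
    counts = Counter()
    failing = []
    for row in csv.DictReader(csv_file):
        status = row["Result"] or "untracked"
        counts[status] += 1
        if status == "KO":
            failing.append(row["CIS #"])
    return counts, failing


counts, failing = summarize(io.StringIO(sample_csv))
print(dict(counts))
print("Failing recommendations:", failing)
```

For a real output file, open it with open(path, newline="", encoding="utf-8-sig") so the byte order mark is stripped from the first column name.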

Command Line Options

usage: cis-converter.py [-h] [-l {DEBUG,INFO,WARNING,ERROR,CRITICAL}] [--debug-file DEBUG_FILE] [-f {CSV,EXCEL}]
                        [-o OUTPUT_DIR] [--csv-quoting {ALL,MINIMAL,NONNUMERIC,NONE,NOTNULL,STRINGS}]
                        [--csv-delimiter CSV_DELIMITER] [--csv-quotechar CSV_QUOTECHAR]
                        [--csv-escapechar CSV_ESCAPECHAR]
                        input_files [input_files ...]

positional arguments:
  input_files           path to the input file(s)

options:
  -h, --help            show this help message and exit
  -l {DEBUG,INFO,WARNING,ERROR,CRITICAL}, --log-level {DEBUG,INFO,WARNING,ERROR,CRITICAL}
                        set the logging level (default: INFO)
  --debug-file DEBUG_FILE
                        output file for TXT extract from PDF, only if --log-level=DEBUG (default: cis-debug.txt)
  -f {CSV,EXCEL}, --format {CSV,EXCEL}
                        set the output format (default: EXCEL)
  -o OUTPUT_DIR, --output-folder OUTPUT_DIR
                        path to the folder for storing files generated by the script (default: ./)

CSV options:
  --csv-quoting {ALL,MINIMAL,NONNUMERIC,NONE,NOTNULL,STRINGS}
                        set the CSV quoting style (default: ALL)
  --csv-delimiter CSV_DELIMITER
                        set the CSV delimiter (default: ,)
  --csv-quotechar CSV_QUOTECHAR
                        set the CSV quote character (default: ")
  --csv-escapechar CSV_ESCAPECHAR
                        set the CSV escape character (default: \)
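The --csv-quoting choices share their names with the QUOTE_* constants of Python's csv module. The sketch below shows one way such a name could be resolved, assuming the tool maps the choices this way (the source does not confirm it); resolve_quoting is a hypothetical helper. Note that csv.QUOTE_NOTNULL and csv.QUOTE_STRINGS were added in Python 3.12, so those two styles are unavailable on older interpreters.

```python
import csv


def resolve_quoting(name):
    """Map a quoting style name such as "ALL" to the csv module constant."""
    constant = getattr(csv, f"QUOTE_{name.upper()}", None)
    if constant is None:
        # Reached for unknown names, and for NOTNULL/STRINGS before Python 3.12.
        raise ValueError(f"quoting style {name!r} is not available in this Python version")
    return constant


print(resolve_quoting("ALL") == csv.QUOTE_ALL)
print(resolve_quoting("MINIMAL") == csv.QUOTE_MINIMAL)
```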

Output Files

Excel Format (.xlsx)

  • Main Worksheet: Contains all extracted CIS recommendations with formatted columns
  • Data Worksheet: Provides validation lists for compliance tracking
  • Features:
    • Data validation dropdowns in the "Result" column
    • Conditional formatting with color-coded compliance status
    • Text wrapping and proper cell formatting
    • Named tables for easy filtering and sorting
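The Excel features listed above map onto standard xlsxwriter calls. The sketch below is an illustration under stated assumptions, not the tool's implementation: the colors, the table name, and the function names are hypothetical, and the dropdown list is passed inline rather than read from a separate Data worksheet as the tool does. xlsxwriter is imported inside write_workbook because it is only needed for Excel output.

```python
# Status values taken from the feature list in this README.
STATUS_CHOICES = ["OK", "KO", "Partial", "N/A", "?"]
# Hypothetical colors; the tool's actual palette is not documented here.
STATUS_COLORS = {"OK": "#C6EFCE", "KO": "#FFC7CE", "Partial": "#FFEB9C"}


def validation_rule():
    """Options for a dropdown restricted to the status values."""
    return {"validate": "list", "source": STATUS_CHOICES}


def color_rule(status, cell_format):
    """Options that apply cell_format when the cell equals status."""
    # xlsxwriter expects string values wrapped in double quotes.
    return {"type": "cell", "criteria": "==", "value": f'"{status}"',
            "format": cell_format}


def write_workbook(path, headers, rows, result_column="Result"):
    """Write rows as a named table with a validated, color-coded column."""
    import xlsxwriter  # lazy import: only required for Excel output

    workbook = xlsxwriter.Workbook(path)
    sheet = workbook.add_worksheet("Recommendations")
    last_row = max(len(rows), 1)  # a table needs at least one data row
    last_col = len(headers) - 1
    sheet.add_table(0, 0, last_row, last_col, {
        "name": "CISRecommendations",
        "data": rows,
        "columns": [{"header": header} for header in headers],
    })
    col = headers.index(result_column)
    sheet.data_validation(1, col, last_row, col, validation_rule())
    for status, color in STATUS_COLORS.items():
        cell_format = workbook.add_format({"bg_color": color})
        sheet.conditional_format(1, col, last_row, col,
                                 color_rule(status, cell_format))
    workbook.close()
```

A call such as write_workbook("out.xlsx", ["CIS #", "Policy", "Result"], [["1.1.1", "Hypothetical policy", "OK"]]) would produce a single-sheet workbook with a dropdown in the Result column.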

CSV Format (.csv)

  • UTF-8 encoded with a byte order mark (BOM) so that spreadsheet applications detect the encoding and display non-ASCII characters correctly
  • Customizable delimiters and quoting options
  • Compatible with spreadsheet applications and data analysis tools
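As a rough illustration of the CSV properties above, the following sketch writes rows with Python's csv module using the utf-8-sig codec, which emits the BOM, and the documented default quoting style (ALL). The function name write_rows and the sample row are hypothetical; this is not the tool's own writer.

```python
import csv
import io


def write_rows(stream, rows, fieldnames, delimiter=",", quotechar='"', escapechar="\\"):
    """Write dict rows with every field quoted (the documented default style)."""
    writer = csv.DictWriter(
        stream,
        fieldnames=fieldnames,
        delimiter=delimiter,
        quotechar=quotechar,
        escapechar=escapechar,
        quoting=csv.QUOTE_ALL,
    )
    writer.writeheader()
    writer.writerows(rows)


# Write into memory; for a real file use
# open(path, "w", newline="", encoding="utf-8-sig") instead.
buffer = io.BytesIO()
text = io.TextIOWrapper(buffer, encoding="utf-8-sig", newline="")
write_rows(
    text,
    [{"CIS #": "1.1.1", "Policy": "Hypothetical policy title"}],
    ["CIS #", "Policy"],
    delimiter=";",
)
text.flush()
data = buffer.getvalue()
print(data[:3])  # the three BOM bytes
```

Passing newline="" leaves line endings to the csv module, which is the usage recommended in the Python documentation.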

Troubleshooting

Common Issues

"Table of Contents could not be found"

  • Ensure the PDF follows standard CIS benchmark format
  • Check if the PDF has a proper Table of Contents section
  • Run with --log-level=DEBUG to see detailed extraction information

Missing or incomplete data

  • Some CIS PDFs deviate from the standard benchmark layout, which can leave fields empty or incomplete
  • Use debug mode to examine the extracted text: --log-level=DEBUG
  • Check the debug output file (default: cis-debug.txt) for parsing details

Debug Mode

Enable debug logging to troubleshoot parsing issues:

python cis-converter.py --log-level=DEBUG --debug-file=debug.txt input.pdf

This will create a detailed log file showing:

  • Raw text extraction from each page
  • Formatted and cleaned text
  • Table of contents parsing results
  • Section identification and data extraction

Contributing

Contributions are welcome! Please feel free to submit pull requests or open issues for:

  • Bug fixes and improvements
  • Support for additional CIS benchmark formats
  • Enhanced parsing algorithms
  • New output formats

License

This project is licensed under the MIT License.

Acknowledgments

  • Based on the original CISConverter by Fragtastic
  • Enhanced for improved parsing performance and reliability
