groupdocs-search/GroupDocs.Search-for-PythonNet


How to run GroupDocs.Search for .NET in Python

Product Page | Docs | Blog | Free Support | Temporary License

📖 About This Repository

This repository demonstrates how to use GroupDocs.Search for .NET in Python applications using pythonnet. It provides two distinct implementation approaches to overcome the challenges of loading .NET assemblies with embedded dependencies in Python:

🎯 Two Implementation Approaches

1. Wrapper-Based Approach (run_search_wrapper.py)

  • Uses a custom C# wrapper library that encapsulates common search operations
  • Provides simplified static methods for building indexes and performing searches
  • Ideal for straightforward search tasks with minimal Python/.NET interop complexity
  • Best for: Quick prototyping, simple search workflows, and users who prefer high-level APIs

2. Manual Type Resolution Approach (run_search_manual.py)

  • Uses the wrapper only as a dependency resolver for embedded assemblies
  • Provides direct access to GroupDocs.Search types and methods
  • Offers full control over index creation and search customization
  • Best for: Complex search scenarios, advanced customization, and developers who need fine-grained control

Both approaches solve the core challenge of loading GroupDocs.Search's obfuscated and embedded dependencies in Python environments.

🚧 The Challenge: Dependency Resolution in Python

Why Direct Import Fails

GroupDocs.Search for .NET uses obfuscation and embedded dependencies to protect intellectual property. This creates a fundamental challenge when trying to use it directly with pythonnet:

# ❌ This approach WILL NOT work
import os
import sys

# Load coreclr first
from pythonnet import load
load("coreclr")

import clr

# Add folder with the library and dependencies to the system path
dll_dir = os.path.abspath(os.path.join(os.path.dirname(__file__), "dlls"))
sys.path.append(dll_dir)

# Add reference to the library
clr.AddReference("GroupDocs.Search")
# Import the Index class
from GroupDocs.Search import Index
index = Index("search_index")
index.Add("documents_folder")

🔍 Root Cause Analysis

The Problem: GroupDocs.Search embeds referenced assemblies (like Aspose.* libraries) directly into the main DLL with obfuscation. When pythonnet tries to load the assembly:

  1. Type Enumeration Phase: pythonnet attempts to enumerate all public types to build Python module proxies
  2. Dependency Resolution: During enumeration, the CLR tries to resolve embedded dependencies
  3. Failure Point: The default .NET assembly resolver cannot extract obfuscated, embedded DLLs from resources
  4. Result: ReflectionTypeLoadException is thrown, causing pythonnet to fail creating the Python module

Why This Happens:

  • Most obfuscators rely on a bootstrap/resolver that runs in your entry assembly
  • Since Python is the host (not a .NET executable), the bootstrap never executes
  • The embedded dependencies remain inaccessible to the standard .NET assembly resolver
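
One way to observe the failure described above without crashing the script is to wrap the direct import in a guard and report what went wrong. This is a minimal sketch, not code from the repository: `try_direct_import` is a hypothetical helper name, and the exact exception type and message will depend on the pythonnet and GroupDocs.Search versions in use.

```python
import os
import sys


def try_direct_import(dll_dir):
    """Attempt the direct import and return (ok, message) instead of raising."""
    if not os.path.isdir(dll_dir):
        return False, f"DLL folder not found: {dll_dir}"
    try:
        # Imports are deferred so the helper still runs where pythonnet is absent.
        from pythonnet import load
        load("coreclr")
        import clr

        if dll_dir not in sys.path:
            sys.path.append(dll_dir)
        clr.AddReference("GroupDocs.Search")
        from GroupDocs.Search import Index  # expected to fail, per the analysis above
        return True, f"Imported {Index}"
    except Exception as exc:  # ImportError or a .NET exception surfaced by pythonnet
        return False, f"{type(exc).__name__}: {exc}"


if __name__ == "__main__":
    ok, message = try_direct_import(os.path.join(os.getcwd(), "dlls"))
    print("direct import succeeded" if ok else "direct import failed", "-", message)
```

Returning a message instead of raising makes it easier to log the failure and then fall back to one of the two approaches in this repository.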

💡 The Solution

This repository provides two approaches to solve this challenge:

  1. Wrapper Library: A C# wrapper that handles dependency resolution and exposes simplified APIs
  2. Manual Resolution: Direct type resolution using reflection to bypass import issues

Both methods ensure the embedded dependencies are properly resolved before attempting to use GroupDocs.Search types.
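
The manual resolution idea can be sketched as follows: load the wrapper assembly first, then ask the GroupDocs.Search assembly for a single type by name instead of letting pythonnet enumerate every type. This is an illustration under assumptions, not the contents of run_search_manual.py: the helper names are hypothetical, the DLL names are taken from the repository structure, and the wrapper may need an explicit call before its resolver becomes active.

```python
import os


def resolve_index_type(dll_dir):
    """Load the wrapper first, then look up GroupDocs.Search.Index via reflection."""
    wrapper_dll = os.path.abspath(os.path.join(dll_dir, "GroupDocs.Search.Wrapper.dll"))
    search_dll = os.path.abspath(os.path.join(dll_dir, "GroupDocs.Search.dll"))
    for path in (wrapper_dll, search_dll):
        if not os.path.isfile(path):
            raise FileNotFoundError(f"Missing assembly: {path}")

    from pythonnet import load
    load("coreclr")
    import clr  # noqa: F401  (importing clr enables "from System import ...")
    from System.Reflection import Assembly

    # Assumption: the wrapper registers whatever resolver the embedded
    # dependencies need; the real script may call into the wrapper first.
    Assembly.LoadFrom(wrapper_dll)
    search_assembly = Assembly.LoadFrom(search_dll)

    # Request one type by name rather than importing the whole namespace.
    index_type = search_assembly.GetType("GroupDocs.Search.Index")
    if index_type is None:
        raise LookupError("GroupDocs.Search.Index was not found in the assembly")
    return index_type


def create_index(index_type, index_folder):
    """Instantiate the resolved type through System.Activator."""
    from System import Activator, Array, Object
    return Activator.CreateInstance(index_type, Array[Object]([index_folder]))
```

With the assemblies in place, `create_index(resolve_index_type("dlls"), "index")` would be expected to return an object on which the `Add` method from the failing snippet can be called.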

What is GroupDocs.Search?

GroupDocs.Search for .NET is a comprehensive document search library that allows you to:

  • Create search indexes for 50+ document formats (PDF, Word, Excel, PowerPoint, images, etc.)
  • Perform full-text search across multiple document types simultaneously
  • Search with various query types: simple text, boolean queries, regular expressions, and fuzzy search
  • Get detailed search results with relevance scores, occurrence counts, and term highlighting
  • Work with multiple indexes for large-scale document collections
  • Customize search behavior with synonyms, stop words, and character replacements

Key Features:

  • Support for 50+ document formats
  • Multiple search query types (text, boolean, regex, fuzzy)
  • Cross-platform .NET support
  • High-performance indexing and searching
  • Flexible licensing options

What is PythonNet?

PythonNet is a package that provides near-seamless integration between Python and the .NET Common Language Runtime (CLR). It allows you to:

  • Call .NET assemblies directly from Python code
  • Use .NET types and methods as if they were native Python objects
  • Access the full .NET ecosystem from Python applications
  • Maintain performance with minimal overhead

Key Benefits:

  • Direct access to .NET libraries from Python
  • No need for separate .NET applications or services
  • Maintains Python's simplicity while leveraging .NET's power
  • Cross-platform support (Windows, Linux, macOS)

Official Repository: pythonnet/pythonnet

📋 Prerequisites

System Requirements

  • Operating System: Windows 10/11 (x64), Linux, or macOS
  • Python: 3.8+ (recommended: 3.11 or 3.12)
  • .NET Runtime: .NET 6.0 or later
  • Memory: Minimum 4GB RAM (8GB+ recommended for large documents)
  • Disk Space: 500MB+ for dependencies and temporary files
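
The .NET runtime requirement can be checked from Python before installing anything else. The sketch below parses the output of `dotnet --list-runtimes`; the helper names are hypothetical, and it assumes the `dotnet` CLI is on PATH (an empty list is returned when it is not).

```python
import re
import shutil
import subprocess


def parse_runtimes(listing):
    """Parse `dotnet --list-runtimes` output into (name, (major, minor, patch)) tuples."""
    runtimes = []
    for line in listing.splitlines():
        match = re.match(r"^(\S+)\s+(\d+)\.(\d+)\.(\d+)", line.strip())
        if match:
            version = tuple(int(part) for part in match.group(2, 3, 4))
            runtimes.append((match.group(1), version))
    return runtimes


def has_net6_or_later(runtimes):
    """True if a Microsoft.NETCore.App runtime with major version >= 6 is listed."""
    return any(name == "Microsoft.NETCore.App" and version[0] >= 6
               for name, version in runtimes)


def installed_runtimes():
    """Run the dotnet CLI if it is on PATH; return an empty list otherwise."""
    dotnet = shutil.which("dotnet")
    if dotnet is None:
        return []
    result = subprocess.run([dotnet, "--list-runtimes"],
                            capture_output=True, text=True, check=False)
    return parse_runtimes(result.stdout)


if __name__ == "__main__":
    print(".NET 6+ runtime found:", has_net6_or_later(installed_runtimes()))
```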

Python ↔ pythonnet ↔ .NET Compatibility Matrix

| Python Version | pythonnet Version | .NET Runtime | Supported Target Frameworks | Notes |
|---|---|---|---|---|
| 3.7 – 3.10 | 2.5.x | .NET Framework 4.6.2 – 4.8 | net40, net45, net462, net48 | ✅ Best for legacy .NET Framework DLLs (e.g., GroupDocs.Annotation net462). Requires 64-bit Python + .NET Framework runtime |
| 3.7 – 3.10 | 2.5.x | Limited .NET Core 3.1 / .NET 5 | Some .NET Standard 2.0 DLLs | ⚠️ Unstable, not guaranteed. Use only if DLL explicitly supports .NET Standard |
| 3.8 – 3.12 | 3.x (≥3.0.0) | .NET 6 / .NET 7 / .NET 8 | net6.0, net7.0, net8.0, netstandard2.0/2.1 | ✅ Best for modern .NET builds. Requires .NET Desktop Runtime 6+ |
| 3.13+ | 3.x (≥3.0.3) | .NET 6 / .NET 7 / .NET 8 | Same as above | ✅ Supported. Recommended for latest Python versions |
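
The matrix can also be expressed as a small lookup that reports which pythonnet line matches the running interpreter. This is a sketch that only transcribes the rows above (the "Limited .NET Core" row is left out because it is marked unstable); `MATRIX` and `options_for` are hypothetical names.

```python
import sys

# Rows transcribed from the matrix: (min Python, max Python or None, pythonnet, runtime)
MATRIX = [
    ((3, 7), (3, 10), "2.5.x", ".NET Framework 4.6.2 - 4.8"),
    ((3, 8), (3, 12), "3.x (>=3.0.0)", ".NET 6 / .NET 7 / .NET 8"),
    ((3, 13), None, "3.x (>=3.0.3)", ".NET 6 / .NET 7 / .NET 8"),
]


def options_for(python_version=None):
    """Return the (pythonnet, runtime) pairs whose Python range covers the version."""
    version = tuple((python_version or sys.version_info)[:2])
    return [
        (pythonnet_line, runtime)
        for lowest, highest, pythonnet_line, runtime in MATRIX
        if lowest <= version and (highest is None or version <= highest)
    ]


if __name__ == "__main__":
    for pythonnet_line, runtime in options_for():
        print(f"pythonnet {pythonnet_line} with {runtime}")
```

Python versions covered by two rows, such as 3.9, return both options, which mirrors the overlap in the table.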

Recommended Configuration

For this repository, we recommend:

  • Python 3.11 with pythonnet 3.0.5
  • .NET 6.0 Desktop Runtime
  • Windows x64 environment

🚀 Getting Started / Installation

Step 1: Python Environment Setup

# Create Python 3.11 virtual environment
py -3.11 -m venv venv311

# Activate virtual environment (Windows)
venv311\Scripts\activate

# Verify Python version
python --version

Step 2: Install Dependencies

# Upgrade pip and essential tools
python -m ensurepip --upgrade
python -m pip install --upgrade pip setuptools wheel

# Install pythonnet 3.0.5
python -m pip install pythonnet==3.0.5

# Install project requirements
python -m pip install -r requirements.txt

Step 3: Verify Installation

# Test pythonnet and .NET integration
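import sys
# Select the .NET (Core) runtime before importing clr, as in the earlier example
from pythonnet import load
load("coreclr")
import clr
print("Python:", sys.version)
print("pythonnet imported OK:", clr.__version__)
clr.AddReference("System")
import System
print("CLR OK, .NET version:", System.Environment.Version)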

Step 4: Build the Wrapper Library

# Navigate to wrapper directory
cd wrapper

# Build and publish the wrapper
dotnet publish -c Release -r win-x64 --self-contained false -o ./../dlls

# Return to root directory
cd ..
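
After publishing, it can help to confirm that the output folder contains the assemblies the examples rely on. The check below is a sketch: the two file names come from the repository structure, `missing_assemblies` is a hypothetical helper, and it is assumed (not verified here) that `dotnet publish` copies GroupDocs.Search.dll next to the wrapper.

```python
from pathlib import Path

# File names taken from the dlls/ entries in the repository structure.
EXPECTED_DLLS = ("GroupDocs.Search.dll", "GroupDocs.Search.Wrapper.dll")


def missing_assemblies(dll_dir, expected=EXPECTED_DLLS):
    """Return the expected file names that are not present in dll_dir."""
    folder = Path(dll_dir)
    return [name for name in expected if not (folder / name).is_file()]


if __name__ == "__main__":
    missing = missing_assemblies("dlls")
    if missing:
        print("Publish output is incomplete, missing:", ", ".join(missing))
    else:
        print("All expected assemblies are present in dlls/")
```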

Step 5: Run the Examples

# Activate virtual environment (if not already active)
venv311\Scripts\activate

# Run wrapper-based approach
python run_search_wrapper.py

# Run manual type resolution approach
python run_search_manual.py

📁 Repository Structure

GroupDocs.Search-for-PythonNet/
├── 📁 lics/                         # Place the GroupDocs.Search.lic license file here
├── 📁 dlls/                         # Compiled .NET assemblies and dependencies
│   ├── [GroupDocs.Search.dll]        # Main GroupDocs.Search library (not delivered in repository)
│   ├── GroupDocs.Search.Wrapper.dll  # Custom wrapper library (not delivered in repository)
│   └── [other dependencies]          # Additional .NET dependencies
├── 📁 files/                        # Sample documents for testing
│   ├── invoice.01.txt               # Input document for search indexing
│   └── sample.api.01.json           # Additional sample document
├── 📁 index/                        # Search index files (generated)
│   ├── index.info                   # Index metadata
│   └── [index files]                # Index data files
├── 📁 wrapper/                      # C# wrapper library source code
│   ├── SearchWrapper.cs             # Main wrapper implementation
│   ├── GroupDocs.Search.Wrapper.csproj  # Project file
│   └── bin/                         # Build output directory
├── 📄 run_search_wrapper.py         # Example: Wrapper-based approach
├── 📄 run_search_manual.py          # Example: Manual type resolution
├── 📄 requirements.txt              # Python dependencies
└── 📄 README.md                     # This documentation

📂 Folder Descriptions

| Folder/File | Purpose | Contents |
|---|---|---|
| lics/ | Licenses folder | This repository does not contain any license. |
| dlls/ | Compiled assemblies | Contains all .NET DLLs required for runtime, including GroupDocs.Search and the custom wrapper |
| files/ | Sample documents | Test documents for search indexing examples (input documents) |
| index/ | Search index | Generated search index files for document searching |
| wrapper/ | C# source code | Custom wrapper library that simplifies GroupDocs.Search usage |
| run_search_wrapper.py | Wrapper example | Demonstrates simplified search using the wrapper library |
| run_search_manual.py | Manual example | Shows direct type resolution and advanced search control |
| requirements.txt | Dependencies | Python package requirements (pythonnet) |
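
Because the lics/ folder ships empty, a script may want to detect whether a license file has been added before running. The helper below is a sketch with a hypothetical name (`find_license`); it only locates the file and does not apply it, since how the license is applied is left to the library's own licensing API.

```python
from pathlib import Path


def find_license(repo_root, folder="lics", pattern="*.lic"):
    """Return the first license file under repo_root/lics, or None if there is none."""
    license_dir = Path(repo_root, folder)
    if not license_dir.is_dir():
        return None
    candidates = sorted(license_dir.glob(pattern))
    return candidates[0] if candidates else None


if __name__ == "__main__":
    found = find_license(Path.cwd())
    print("License file:", found if found else "none found in lics/")
```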

🔧 Key Components

Wrapper Library (wrapper/SearchWrapper.cs)

  • Provides simplified static methods for common search tasks
  • Handles dependency resolution internally
  • Exposes high-level APIs for Python consumption
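
Since this README does not list the wrapper's method names, one way to discover them is to enumerate the public static methods of the built assembly through standard .NET reflection. The sketch below uses hypothetical helper names and assumes the wrapper was published to dlls/GroupDocs.Search.Wrapper.dll; enumerating the wrapper's types is assumed to succeed because the wrapper itself is not obfuscated.

```python
import os


def public_static_methods(assembly):
    """Return sorted 'Type.Method' names for public static methods in an assembly."""
    names = set()
    for exported_type in assembly.GetExportedTypes():
        for method in exported_type.GetMethods():
            if method.IsStatic and method.IsPublic:
                names.add(f"{exported_type.FullName}.{method.Name}")
    return sorted(names)


def load_wrapper(dll_path):
    """Load the wrapper assembly with pythonnet (requires a .NET runtime)."""
    from pythonnet import load
    load("coreclr")
    import clr  # noqa: F401  (importing clr enables "from System import ...")
    from System.Reflection import Assembly
    return Assembly.LoadFrom(os.path.abspath(dll_path))


if __name__ == "__main__":
    wrapper_path = os.path.join("dlls", "GroupDocs.Search.Wrapper.dll")
    if os.path.isfile(wrapper_path):
        for name in public_static_methods(load_wrapper(wrapper_path)):
            print(name)
    else:
        print("Wrapper not built yet:", wrapper_path)
```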

Python Examples

  • Wrapper approach: Simple, high-level API for basic search needs
  • Manual approach: Full control over index creation and search customization

💼 Use Cases

🏒 Business Applications

Document Discovery & Knowledge Management

  • Legal firms: Search through contracts, agreements, and legal documents for specific clauses
  • Healthcare: Find patient records and medical documents using keywords and terms
  • Education: Search through course materials, research papers, and educational content
  • Real Estate: Locate property documents, contracts, and specifications using search terms

Enterprise Content Search

  • Manufacturing: Search technical documentation, specifications, and quality control documents
  • Financial Services: Find compliance documents, audit reports, and financial records
  • Government: Search policy documents, regulations, and administrative materials
  • Insurance: Locate claim documents, policy information, and risk assessments

Content Management & Publishing

  • Publishing houses: Search through manuscripts, research materials, and editorial content
  • Marketing agencies: Find campaign materials, brand guidelines, and creative assets
  • Technical writing: Search technical documentation and knowledge bases
  • Translation services: Find reference materials and translation glossaries

🔧 Technical Use Cases

Automated Document Processing

  • Batch indexing: Process hundreds of documents and create searchable indexes
  • API integration: Add search capabilities as part of document processing workflows
  • Cloud services: Integrate search functionality into cloud-based applications
  • Microservices: Deploy search services as part of larger document processing systems

Custom Search Workflows

  • Form processing: Search through form submissions and responses
  • Report analysis: Find specific data and patterns in generated reports
  • Document comparison: Search for differences between document versions
  • Template matching: Find documents matching specific criteria or templates

🎯 This Repository Solves

The Core Problem: Python developers need to implement document search functionality but face challenges with:

  • Loading .NET libraries with embedded dependencies
  • Complex type resolution in pythonnet environments
  • Maintaining compatibility across different Python/.NET versions

Our Solution Provides:

  • ✅ Simplified Integration: Easy-to-use wrapper for common search tasks
  • ✅ Full Control: Direct access to all GroupDocs.Search features
  • ✅ Dependency Resolution: Automatic handling of embedded .NET dependencies
  • ✅ Cross-Platform: Works on Windows, Linux, and macOS
  • ✅ Tested Approaches: Both approaches validated with basic search scenarios; additional testing is needed before production use

⚠️ Important Notes

🚧 Early Implementation Status

This solution represents an early implementation for using GroupDocs.Search with pythonnet. While it successfully demonstrates both wrapper-based and manual type resolution approaches, please note:

Current Status:

  • ✅ Functional: Both implementation approaches work as demonstrated
  • ✅ Tested: Examples have been validated with basic search scenarios
  • ⚠️ Limited Testing: Not yet extensively tested across all GroupDocs.Search features
  • ⚠️ Production Readiness: Requires additional testing for production environments

🔍 Recommended Next Steps

Before Production Use:

  1. Comprehensive Testing: Test with your specific document types and search requirements
  2. Performance Validation: Evaluate performance with large document collections and complex queries
  3. Error Handling: Implement robust error handling for edge cases
  4. Security Review: Ensure compliance with your security and data protection requirements
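
For the performance and error-handling steps above, a small harness that times a call and captures any exception can serve as a starting point. This is a generic sketch, not part of the repository; `timed` is a hypothetical helper, and the callable passed in would be whatever indexing or search function your integration exposes.

```python
import time


def timed(operation, *args, **kwargs):
    """Run a callable and return (result, error, seconds) without raising."""
    started = time.perf_counter()
    try:
        result, error = operation(*args, **kwargs), None
    except Exception as exc:  # includes .NET exceptions surfaced by pythonnet
        result, error = None, exc
    return result, error, time.perf_counter() - started


if __name__ == "__main__":
    result, error, seconds = timed(sum, range(1_000_000))
    status = "ok" if error is None else f"failed: {error!r}"
    print(f"{status} in {seconds:.3f}s")
```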

For Development:

  • Use the wrapper approach for quick prototyping and simple search tasks
  • Use the manual approach when you need full control over index properties and search customization
  • Consider extending the wrapper with additional methods for your specific use cases

🀝 Contributing & Feedback

We welcome your feedback, test results, and suggestions for improvements! Your input will help us:

  • Refine the implementation approaches
  • Add more comprehensive examples
  • Improve error handling and edge cases
  • Explore additional GroupDocs.Search features

How to Contribute:

  • Test the examples with your documents and use cases
  • Report any issues or unexpected behavior
  • Suggest additional wrapper methods or examples
  • Share your successful integration stories

Related Topics to Investigate

🏷️ Keywords

Core Technologies: pythonnet, GroupDocs.Search, .NET, Python, document search, CLR integration, assembly loading, dependency resolution

Document Processing: document search, full-text search, search indexing, document indexing, search queries, search results, document discovery, content search, enterprise search

Technical Implementation: wrapper library, type resolution, reflection, embedded dependencies, obfuscated assemblies, pythonnet integration, .NET interop, cross-platform

Business Applications: document discovery, knowledge management, enterprise search, content management, legal document search, healthcare documentation, educational search, technical documentation search

Development & Integration: API integration, microservices, cloud services, batch processing, automated workflows, search automation, enterprise solution, production deployment
