[CFP] Is ColPali the new OCR!?

Even today, document retrieval systems struggle with PDFs or scanned files that have complex layouts — think tables, charts, images, or multi-column structures. The standard approach involves OCR → layout detection → chunking → embedding → search. It works… but it’s clunky, brittle, and doesn’t scale well across real-world data.

ColPali introduces a new method: skip OCR completely. Instead, it uses a Vision-Language Model (VLM) to directly process the document image and generate multi-vector embeddings that capture both the content and the layout in a single pass.

This is particularly useful for documents where structure matters: contracts, forms, invoices, academic papers. On the ViDoRe (Visual Document Retrieval) benchmark, the ColPali authors report stronger retrieval quality on such visually rich documents than OCR-and-text-embedding pipelines.

## Example scenarios:

- A user wants to search across scanned contracts for a clause that appears in a footnote or table.

- A company wants to make old regulatory PDFs searchable without reformatting or running OCR on thousands of pages.

- You’re building a chatbot that needs to retrieve information from visual documents like forms or handwritten PDFs.

Traditional pipelines would require several fragile steps. ColPali simplifies this by doing everything (layout understanding, text encoding, and visual structure) in one shot using PaliGemma and a late interaction retrieval mechanism.
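To make "late interaction" concrete, here is a minimal sketch of MaxSim scoring in plain Python. It is not ColPali's actual implementation: the 2-D vectors, the page names, and the helper names (`maxsim_score`, `rank_pages`) are invented for illustration, and a real model would emit many higher-dimensional vectors per page.

```python
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def dot(a: Vector, b: Vector) -> float:
    """Plain dot product between two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def maxsim_score(query_vecs: Sequence[Vector], page_vecs: Sequence[Vector]) -> float:
    """Late-interaction score: for every query vector, keep only its
    best-matching page vector, then sum those maxima."""
    return sum(max(dot(q, p) for p in page_vecs) for q in query_vecs)


def rank_pages(
    query_vecs: Sequence[Vector], pages: Dict[str, Sequence[Vector]]
) -> List[Tuple[str, float]]:
    """Score every page against the query and sort best-first."""
    scored = [(name, maxsim_score(query_vecs, vecs)) for name, vecs in pages.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Toy data (hypothetical): two query-token vectors, two pages of patch vectors.
query = [[1.0, 0.0], [0.0, 1.0]]
pages = {
    "invoice_p1": [[0.9, 0.1], [0.2, 0.8], [0.5, 0.5]],
    "contract_p7": [[0.1, 0.2], [0.3, 0.1]],
}

ranking = rank_pages(query, pages)
for name, score in ranking:
    print(f"{name}: {score:.2f}")
```

Because each query vector picks its own best patch, a single query term can match a small region such as a footnote or table cell without being averaged away by the rest of the page.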

## In this session, I’ll walk through:

- The limitations of traditional OCR-based document retrieval

- ColPali’s architecture: patch-based visual encoding, MaxSim-based similarity, and embedding search

- How these components work together

- A live example
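As a rough illustration of the patch-based encoding listed above, the sketch below maps a patch index back to the page region it covers. The 448-pixel input size and 14-pixel patch size are assumptions based on the PaliGemma checkpoint commonly paired with ColPali, and `patch_box` is a hypothetical helper, not part of any library.

```python
from typing import Tuple


def patch_box(
    index: int, image_size: int = 448, patch_size: int = 14
) -> Tuple[int, int, int, int]:
    """Return (left, top, right, bottom) pixel bounds of a patch,
    assuming a square image cut into a row-major grid of square patches."""
    per_side = image_size // patch_size
    if not 0 <= index < per_side * per_side:
        raise IndexError(f"patch index {index} is outside the grid")
    row, col = divmod(index, per_side)
    left, top = col * patch_size, row * patch_size
    return (left, top, left + patch_size, top + patch_size)


# Under the assumed sizes, each page becomes a 32 x 32 grid of patches,
# and every patch gets its own embedding vector.
num_patches = (448 // 14) ** 2
print(num_patches)
print(patch_box(0))
print(patch_box(num_patches - 1))
```

Combined with MaxSim, this kind of mapping is what lets you highlight which region of a page matched a given query term.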

## Key Takeaways from this talk
- Understand why OCR-based document retrieval breaks down in complex real-world scenarios

- Learn how ColPali uses vision-language models to represent documents as layout-aware embeddings

- See how multi-vector search improves retrieval performance

- Get a working sense of how to use ColPali in your own projects
