Skip to content

gyj155/SearchPaperByEmbedding

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

2 Commits
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Paper Semantic Search

Find similar papers using semantic search. Supports both local models (free) and OpenAI API (better quality).

Features

  • Request for papers from OpenReview (e.g., ICLR2026 submissions)
  • Semantic search with example papers or text queries
  • Support embedding caching
  • Embed model support: Open-source (e.g., all-MiniLM-L6-v2) or OpenAI

Quick Start

pip install -r requirements.txt

1. Prepare Papers

from crawl import crawl_papers

crawl_papers(
    venue_id="ICLR.cc/2026/Conference/Submission",
    output_file="iclr2026_papers.json"
)

2. Search Papers

from search import PaperSearcher

# Local model (free)
searcher = PaperSearcher('iclr2026_papers.json', model_type='local')

# OpenAI model (better, requires API key)
# export OPENAI_API_KEY='your-key'
# searcher = PaperSearcher('iclr2026_papers.json', model_type='openai')

searcher.compute_embeddings()

# Search with example papers that you are interested in
examples = [
    {
        "title": "Your paper title",
        "abstract": "Your paper abstract..."
    }
]

results = searcher.search(examples=examples, top_k=100)

# Or search with text query
results = searcher.search(query="interesting topics", top_k=100)

searcher.display(results, n=10)
searcher.save(results, 'results.json')

How It Works

  1. Paper titles and abstracts are converted to embeddings
  2. Embeddings are cached automatically
  3. Your query is embedded using the same model
  4. Cosine similarity finds the most similar papers
  5. Results are ranked by similarity score

Cache

Embeddings are cached as cache_<filename>_<hash>_<model>.npy. Delete to recompute.

Example Output

================================================================================
Top 100 Results (showing 10)
================================================================================

1. [0.8456] Paper a
   #12345 | foundation or frontier models, including LLMs
   https://openreview.net/forum?id=xxx

2. [0.8234] Paper b
   #12346 | applications to robotics, autonomy, planning
   https://openreview.net/forum?id=yyy

Tips

  • Use 1-5 example papers for best results, or a paragraph of description of your interested topic
  • Local model is good enough for most cases
  • OpenAI model for critical search (~$1 for 18k queries)

If it's useful, please consider giving a star~

About

No description, website, or topics provided.

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages