
soonhp/GraphRAG

Implementing Microsoft GraphRAG in Neo4j (GraphRAG)

1. DATA LOAD

source code : database_generate.py, data_load.py, community_summary.py

Details

Microsoft์‚ฌ์—์„œ ๋ฐœํ‘œํ•œ Community Detection์„ ํ™œ์šฉํ•œ GraphRAG ๋ฐฉ๋ฒ•๋ก ์„ Neo4j(Graph Database)๋ฅผ ํ†ตํ•ด ๊ตฌํ˜„ํ•˜์—ฌ ๋ชจ๋“ˆํ™”ํ•˜์˜€์Šต๋‹ˆ๋‹ค. ์ฃผ์š” ๋‚ด์šฉ์€ LLMGraphTransformer ๋ผ์ด๋ธŒ๋Ÿฌ๋ฆฌ๋ฅผ ํ™œ์šฉํ•˜์—ฌ entity์™€ relation์„ ์ถ”์ถœํ•˜๊ณ  ์ดํ›„์— entity ๋…ธ๋“œ id ํ˜น์€ discription์˜ ํ…์ŠคํŠธ ์ž„๋ฒ ๋”ฉ ๊ฐ’์„ neo4j gds๋ผ์ด๋ธŒ๋Ÿฌ๋ฆฌ์˜ KNN๊ณผ WCC(Weakly Connected Components)๋ฅผ ํ™œ์šฉํ•˜์—ฌ ์œ ์‚ฌํ•œ ์—”ํ‹ฐํ‹ฐ๋Š” ๋ณ‘ํ•ฉ์„ ํ•ด์ฃผ๊ณ  Leiden ์•Œ๊ณ ๋ฆฌ์ฆ˜์œผ๋กœ ์ปค๋ฎค๋‹ˆํ‹ฐ๋ฅผ ๊ตฌ์„ฑํ•ฉ๋‹ˆ๋‹ค.

How to Run Files

  1. Create the database
python database_generate.py --DATABASE [name]

[name] ์€ ์˜ˆ๋ฅผ ๋“ค์–ด graphrag-doc-test ๋กœ ๋ณธ์ธ์ด ์ƒ์„ฑํ•˜๊ณ  ์‹ถ์€ DB ์ด๋ฆ„์„ ์ž…๋ ฅ.(๋”ฐ์˜ดํ‘œ ๋„ฃ์„ ํ•„์š” ์—†์Œ)

์‹œ์ž‘ํ•˜๊ธฐ ์•ž์„œ ์ฒซ ๋ฒˆ์งธ๋กœ DB๋ฅผ ์ƒ์„ฑํ•˜๋Š” ์ด์œ ๋Š” database๋ฅผ ์„ค์ •์„ ์•ˆํ•ด์ฃผ๋ฉด default ๊ฐ’์ธ neo4j ๋กœ ์žกํžˆ๊ธฐ ๋•Œ๋ฌธ์— ํ…Œ์ŠคํŠธ ๋‹จ๊ณ„์—์„œ๋Š” ์ƒ์„ฑ์„ ํ•ด์ค˜์•ผ ํ•œ๋‹ค.
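A minimal sketch of what this step can look like with the official `neo4j` Python driver, assuming the command is run against the `system` database on a Neo4j Enterprise server (multiple databases are an Enterprise feature). It is not the contents of database_generate.py; `build_create_database_query`, `create_database` and the connection values are hypothetical. Names containing dashes, such as graphrag-doc-test, must be backtick-quoted in Cypher.

```python
from __future__ import annotations

import argparse


def quote_identifier(name: str) -> str:
    """Backtick-quote a Cypher identifier, doubling any backtick inside it."""
    return "`" + name.replace("`", "``") + "`"


def build_create_database_query(name: str) -> str:
    """Admin command that creates the database only if it does not exist yet."""
    return f"CREATE DATABASE {quote_identifier(name)} IF NOT EXISTS"


def parse_args(argv: list[str] | None = None) -> argparse.Namespace:
    parser = argparse.ArgumentParser(description="Create a Neo4j database for a GraphRAG test run.")
    parser.add_argument("--DATABASE", required=True, help="name of the database to create")
    return parser.parse_args(argv)


def create_database(uri: str, auth: tuple[str, str], name: str) -> None:
    """Run CREATE DATABASE against the `system` database."""
    from neo4j import GraphDatabase  # imported here so the helpers above work without the driver

    with GraphDatabase.driver(uri, auth=auth) as driver:
        with driver.session(database="system") as session:
            session.run(build_create_database_query(name)).consume()
```

Usage would be along the lines of `create_database("neo4j://localhost:7687", ("neo4j", "<password>"), parse_args().DATABASE)`, where the URI and credentials are placeholders for your own server.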

  2. Preprocess the data, load it into the DB, and run Community Detection
python data_load.py --DATABASE [name] --DATA_PATH [File Directory]

Set [File Directory] to the path of the data file(s).
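Before entity extraction, documents are typically split into smaller chunks. The sketch below shows one way to do that for a data directory, assuming plain-text `.txt` files and using word counts as a rough stand-in for tokens. The chunk size, overlap, and function names are hypothetical; the README does not state how data_load.py preprocesses its input.

```python
from __future__ import annotations

from pathlib import Path


def split_into_chunks(text: str, chunk_size: int = 200, overlap: int = 20) -> list[str]:
    """Split text into overlapping windows of `chunk_size` words."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")
    words = text.split()
    if not words:
        return []
    step = chunk_size - overlap
    # Stop before a window that would contain only words already covered by the previous overlap.
    return [" ".join(words[i:i + chunk_size]) for i in range(0, max(len(words) - overlap, 1), step)]


def load_chunks(data_path: str, **kwargs) -> list[dict]:
    """Read every .txt file under data_path and return chunk records tagged with their source file."""
    records = []
    for path in sorted(Path(data_path).rglob("*.txt")):
        text = path.read_text(encoding="utf-8")
        for i, chunk in enumerate(split_into_chunks(text, **kwargs)):
            records.append({"source": path.name, "index": i, "text": chunk})
    return records
```

The overlap keeps a sentence that straddles a chunk boundary visible to the extractor in at least one chunk.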

  1. ํƒ์ง€๋œ ์ปค๋ฎค๋‹ˆํ‹ฐ ๋ณ„ Summary ์ƒ์„ฑ
python community_summary.py --level1 [int] --level2 [int] --level3 [int]

[int]์—๋Š” ์–ด๋– ํ•œ level ๋‹จ์˜ summary๋ฅผ ์ƒ์„ฑํ•  ์ง€ ์„ค์ •

2. Retrieval

source code : retrieval.py

Details

This part describes how retrieval is performed over the communities built in 1. DATA LOAD. Retrieval is split into Local and Global search, and the two have different architectures, so it is worth reading each implementation directly.

For reference, an example unstructured PDF document has been loaded into 'graphrag-doc-02-ms' using ms_graphrag_import.ipynb and ms_graphrag_retriever.ipynb.

How to Run Files

  1. Retrieval & Answer Generation
python retrieval.py --index_name [vector_index_name] --question [query]

[vector_index_name] ์€ Neo4j๋ฅผ ํ†ตํ•ด ์ž„๋ฒ ๋”ฉ ๋ฒกํ„ฐ์— ๋Œ€ํ•ด ์ธ๋ฑ์‹ฑํ•  ์ด๋ฆ„ ์„ค์ •

[query] is the user's question.
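Inside Neo4j, the vector index lookup is done with a query such as `CALL db.index.vector.queryNodes($index_name, $k, $embedding) YIELD node, score`. The sketch below shows the equivalent computation in plain Python, assuming the index covers entity embeddings and uses cosine similarity; `top_k_entities` is a hypothetical name and not part of retrieval.py.

```python
from __future__ import annotations

import math


def top_k_entities(
    query_vec: list[float], entity_vecs: dict[str, list[float]], k: int = 3
) -> list[tuple[str, float]]:
    """Return the k entities whose embeddings are most cosine-similar to the query embedding."""

    def cosine(a: list[float], b: list[float]) -> float:
        dot = sum(x * y for x, y in zip(a, b))
        norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
        return dot / norm if norm else 0.0

    scored = [(name, cosine(query_vec, vec)) for name, vec in entity_vecs.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]
```

An index avoids this brute-force scan over every entity, which is why the real lookup goes through Neo4j.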

Note that Retrieval & Answer Generation generates its answer with the Local retriever shown in the figure below.

Local retriever

[Figure: Local retriever architecture]

Local Search ๋ฐฉ๋ฒ•๋ก ์€ ์œ ์ € ์ฟผ๋ฆฌ๋กœ๋ถ€ํ„ฐ ์—”ํ‹ฐํ‹ฐ๋ฅผ ์ธ์‹ํ•˜์—ฌ ๊ทธ ์—”ํ‹ฐํ‹ฐ์™€ ์œ ์‚ฌํ•œ ์—”ํ‹ฐํ‹ฐ๊ฐ€ ํฌํ•จ๋œ 1. Source chunk, 2. Community Summary, 3. Entities, 4. Relationship description, 5. Covariate ๋ฅผ ๊ฐ€์ ธ์™€์„œ ์งˆ๋ฌธ์— ๋„์›€์ด ๋˜์ง€ ์•Š๋Š” ๋‹ต๋ณ€์„ ํ•„ํ„ฐ๋งํ•˜๊ณ  ๊ฐ ๊ธฐ์ค€์— ๋”ฐ๋ผ ์ ์ˆ˜๋ฅผ ๋งค๊ฒจ(Ranking) ๋‚ด๋ฆผ์ฐจ์ˆœ์œผ๋กœ ํ† ํฐ ์ œํ•œ์— ๋„๋‹ฌํ•˜๊ธฐ ์ „๊นŒ์ง€ context๋ฅผ ๊ตฌ์„ฑํ•˜์—ฌ LLM์— ๋ณด๋‚ด ์ตœ์ข… ๋‹ต๋ณ€์„ ์ƒ์„ฑํ•ฉ๋‹ˆ๋‹ค.

Global retriever

[Figure: Global retriever architecture]

Global Search ๋ฐฉ๋ฒ•๋ก ์€ ์šฐ์„  ๋ชจ๋“  Community Summary๋ฅผ ๋žœ๋คํ•˜๊ฒŒ ์„ž์€ ํ›„ LLM์—๊ฒŒ 0~100์ ๊นŒ์ง€ ์ ์ˆ˜๋ฅผ ๋งค๊ฒจ์„œ ๋„์›€์ด ๋˜์ง€ ์•Š๋Š”๋‹ค๊ณ  ํŒ๋‹จ๋œ Summary(์ ์ˆ˜ 0)๋Š” ํ•„ํ„ฐ๋งํ•ฉ๋‹ˆ๋‹ค. ์ด ํ›„ ์ƒ์„ฑ๋œ ์ค‘๊ฐ„ ์ปค๋ฎค๋‹ˆํ‹ฐ ๋‹ต๋ณ€(Rated Intermediate Response)๋“ค์„ ์œ ์šฉ์„ฑ ์ ์ˆ˜์— ๋”ฐ๋ผ ๋‚ด๋ฆผ์ฐจ์ˆœ์œผ๋กœ ์ •๋ ฌํ•˜๊ณ , ํ† ํฐ ์ œํ•œ์— ๋„๋‹ฌํ•  ๋•Œ๊นŒ์ง€ ๊ฐ€์žฅ ์œ ์šฉํ•œ ๋‹ต๋ณ€๋“ค์„ ์„ ํƒํ•˜์—ฌ ์ƒˆ๋กœ์šด context ์œˆ๋„์šฐ์— ์ถ”๊ฐ€ํ•ฉ๋‹ˆ๋‹ค. ์ด ์ตœ์ข… context๋ฅผ ์‚ฌ์šฉํ•˜์—ฌ LLM์— ๋ณด๋‚ด ์œ ์ €์—๊ฒŒ ๊ธ€๋กœ๋ฒŒ ๋‹ต๋ณ€์„ ์ƒ์„ฑํ•ฉ๋‹ˆ๋‹ค. ์ด ๋ฐฉ์‹์ด ์œ„ ๋…ผ๋ฌธ์—์„œ ์ œ์•ˆํ•œ ๋ฐฉ๋ฒ•์ด๊ณ  ๊ทธ์— ๋Œ€ํ•œ ์•„ํ‚คํ…์ฒ˜์ž…๋‹ˆ๋‹ค.

Search is split into Local and Global so that each question can be routed to the appropriate method depending on whether it is micro-level (about specific entities) or macro-level (about the corpus as a whole). Used well, these methods make it possible to build a GraphRAG system that answers questions at any level of granularity.
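The routing idea can be sketched as a dispatcher that takes a classifier and the two search functions as parameters. `route` and `keyword_classifier` are hypothetical, and the keyword heuristic is only a toy placeholder; the README does not say how routing is decided, and an LLM prompt could make the decision instead.

```python
from __future__ import annotations

from typing import Callable


def route(
    question: str,
    classify: Callable[[str], str],
    local_search: Callable[[str], str],
    global_search: Callable[[str], str],
) -> str:
    """Send micro-level questions to local search and macro-level ones to global search."""
    kind = classify(question)
    if kind == "local":
        return local_search(question)
    if kind == "global":
        return global_search(question)
    raise ValueError(f"unknown route: {kind!r}")


def keyword_classifier(question: str) -> str:
    """Toy heuristic: phrases that ask about the whole corpus go to global search."""
    macro_cues = ("overall", "main themes", "summarize", "in general", "across")
    return "global" if any(cue in question.lower() for cue in macro_cues) else "local"
```

Passing the classifier in as a parameter keeps the dispatcher unchanged when the toy heuristic is replaced by a better one.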

Reference

1. DATA LOAD : Implementing ‘From Local to Global’ GraphRAG with Neo4j and LangChain: Constructing the Graph

2. Retrieval : Integrating Microsoft GraphRAG into Neo4j

Paper : From Local to Global: A Graph RAG Approach to Query-Focused Summarization

Microsoft Github Blog : Microsoft_GraphRAG

About

A repository that organizes what I have studied and applied about GraphRAG.