keyword_extract_copy, support textrank keyword extract by LoskiClaw · Pull Request #224 · apache/hugegraph-ai

LoskiClaw · 2025-05-09T10:47:25Z

No description provided.

Copilot

Pull Request Overview

This PR introduces a new keyword extraction module with support for TextRank-based extraction alongside LLM-based extraction.

Adds a new module "keyword_extract_copy.py" with support for both English and Chinese text.
Implements language-specific pre-processing and includes basic test functions.
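
For readers unfamiliar with TextRank, the idea can be sketched in plain Python: build a co-occurrence graph over the words of a text and run a PageRank-style iteration on it. This is an illustrative sketch only, not the code in this PR (the discussion further down suggests the PR depends on gensim < 4.0.0). The function name `textrank_keywords`, the tiny stopword list, and the default parameters are all assumptions, and the sketch handles English only; Chinese text would first need a word segmenter.

```python
import re
from collections import defaultdict

# Tiny illustrative stopword list (an assumption; a real module would use a fuller one).
STOPWORDS = {"a", "an", "the", "of", "and", "is", "in", "to", "with", "for", "on"}


def textrank_keywords(text, top_k=5, window=3, damping=0.85, iterations=50):
    """Return up to top_k words ranked by a PageRank-style score."""
    words = [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]

    # Undirected co-occurrence graph: words within `window` positions are linked.
    neighbors = defaultdict(set)
    for i, word in enumerate(words):
        for other in words[i + 1 : i + window]:
            if other != word:
                neighbors[word].add(other)
                neighbors[other].add(word)
    if not neighbors:
        return []

    # Power iteration: each word shares its score equally among its neighbors.
    scores = {w: 1.0 for w in neighbors}
    for _ in range(iterations):
        scores = {
            w: (1 - damping)
            + damping * sum(scores[n] / len(neighbors[n]) for n in neighbors[w])
            for w in neighbors
        }

    # Highest score first; ties broken alphabetically for stable output.
    return sorted(scores, key=lambda w: (-scores[w], w))[:top_k]
```

A word that co-occurs with many different words (a hub in the graph) ends up with the highest score, which is why frequent, well-connected terms surface as keywords.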

Comments suppressed due to low confidence (2)

hugegraph-llm/src/hugegraph_llm/operators/llm_op/keyword_extract_copy.py:159

Replace print statements with formal assertions in test cases to ensure that failures are automatically detected during testing.

print( any(k in ["processing", "language", "human"] for k in result["keywords"]))

hugegraph-llm/src/hugegraph_llm/operators/llm_op/keyword_extract_copy.py:173

Consider using assertion statements instead of print statements in the test function for a more robust and automated testing approach.

print( any(k in expected_keywords for k in result["keywords"]))
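
As a rough sketch of what both suppressed comments ask for, the `print(any(...))` checks could become assertions so that a test runner reports a failure automatically. The extractor below is a hypothetical stub that only mimics the `result["keywords"]` shape seen in the quoted lines; the real function in the PR is not shown here, so the names `extract_keywords_stub` and `test_english_keywords` are assumptions.

```python
def extract_keywords_stub(text):
    """Hypothetical stand-in for the PR's extractor; returns a fixed result."""
    return {"keywords": ["language", "processing"]}


def test_english_keywords():
    result = extract_keywords_stub("Natural language processing studies human language.")
    expected = ["processing", "language", "human"]
    # Fails loudly (AssertionError) instead of printing False.
    assert any(k in expected for k in result["keywords"]), (
        f"none of {expected} found in {result['keywords']}"
    )
```

With pytest, any function named `test_*` is collected and run, so no `print` output has to be inspected by hand.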

Copilot · 2025-05-09T10:54:14Z

hugegraph-llm/src/hugegraph_llm/operators/llm_op/keyword_extract_copy.py

+sys.path.append('/mnt/WD4T/workspace/hs/incubator-hugegraph-ai/hugegraph-llm/src')
+


Avoid using hardcoded absolute paths to modify the module search path; consider configuring paths through environment variables or project configuration to ensure portability.

Suggested change

-sys.path.append('/mnt/WD4T/workspace/hs/incubator-hugegraph-ai/hugegraph-llm/src')
+import os
+# Dynamically determine the base directory of the project
+base_dir = os.path.abspath(os.path.join(os.path.dirname(__file__), "../../../.."))
+sys.path.append(base_dir)
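
One detail worth checking before adopting the suggestion: assuming the file really lives at `hugegraph-llm/src/hugegraph_llm/operators/llm_op/keyword_extract_copy.py` (the path shown in the review), four `..` segments from its directory resolve to `hugegraph-llm`, whereas the original hardcoded path pointed at `hugegraph-llm/src`, which is three levels up. A minimal `pathlib` sketch of the three-level variant follows; the helper name `find_src_root` and the `/repo` prefix in the usage note are hypothetical.

```python
from pathlib import PurePosixPath


def find_src_root(module_file, levels_above_dir=3):
    """Return the directory `levels_above_dir` levels above the module's directory.

    parents[0] is the module's own directory, so parents[3] is three levels above it.
    """
    return PurePosixPath(module_file).parents[levels_above_dir]


# Intended use inside the module (left as a comment so this sketch has no side effects):
#   import sys
#   sys.path.append(str(find_src_root(__file__)))
```

For a file at `/repo/hugegraph-llm/src/hugegraph_llm/operators/llm_op/keyword_extract_copy.py`, the helper returns `/repo/hugegraph-llm/src`. Installing the package (for example in editable mode) would avoid touching `sys.path` at all.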

MrJs133 · 2025-05-14T07:01:35Z

Do we need to install gensim < 4.0.0?

LoskiClaw · 2025-05-14T07:05:24Z

> Do we need to install gensim < 4.0.0?

Yes, I use version 3.8.1.

MrJs133 · 2025-05-14T08:45:48Z

> Do we need to install gensim < 4.0.0?
>
> Yes, I use version 3.8.1.

I want to run this code, but I'm having issues related to package versions. What are the versions of scipy, numpy, and Python that you're using?

LoskiClaw · 2025-05-14T09:00:48Z

> Do we need to install gensim < 4.0.0?
>
> Yes, I use version 3.8.1.
>
> I want to run this code, but I'm having issues related to package versions. What are the versions of scipy, numpy, and Python that you're using?

scipy=1.12.0, numpy=1.26.4, python=3.10.16
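
To compare a local environment against the versions reported in this thread, a small check along the following lines may help. It is only a sketch: the helper name `find_mismatches` is an assumption, and the version table simply restates what was posted above (gensim 3.8.1, scipy 1.12.0, numpy 1.26.4).

```python
from importlib.metadata import PackageNotFoundError, version

# Versions reported in this thread (Python itself was reported as 3.10.16).
REPORTED = {"gensim": "3.8.1", "scipy": "1.12.0", "numpy": "1.26.4"}


def find_mismatches(expected):
    """Map each package whose installed version differs to that version (None if missing)."""
    mismatches = {}
    for name, wanted in expected.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            installed = None  # package is not installed at all
        if installed != wanted:
            mismatches[name] = installed
    return mismatches
```

Calling `find_mismatches(REPORTED)` returns an empty dict when the environment matches the reported versions exactly.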

imbajin · 2025-07-25T11:09:52Z

addressed by #282

BREAKING CHANGE: you **MUST** update your "keyword extract prompt" to the latest version.

Fixes the #224 problem and updates the UI to support changing the keyword extraction method.

**Main changes**

Added options to the RAG interface for selecting the keyword extraction method (LLM, TextRank, or Hybrid) and the maximum number of keywords.

<img width="619" height="145" alt="QQ20250818-193453" src="https://github.com/user-attachments/assets/3c0d21f0-82bb-4176-bfe2-1b0744c06b6d" />

A 'TextRank mask words' setting has also been added. It lets users manually enter specific phrases composed of letters and symbols so that they are not split during word segmentation. The input is also saved.

<img width="1207" height="263" alt="QQ20250818-193518" src="https://github.com/user-attachments/assets/6366789a-f87d-46a4-a85a-9f3b4d9ce9a5" />

**Test results**

TextRank method:

- Input:

<img width="363" height="144" alt="image" src="https://github.com/user-attachments/assets/4a6267f7-3982-4fca-82df-60cd55bed6af" />

- Result:

<img width="232" height="118" alt="image" src="https://github.com/user-attachments/assets/54a34d00-e588-44ad-9eff-d7281d7d93e5" />

Hybrid method:

<img width="710" height="129" alt="QQ20250818-193508" src="https://github.com/user-attachments/assets/541534fd-cec0-4002-9967-e49954a6c19e" />

---------

Co-authored-by: imbajin <jin@apache.org>
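
The 'TextRank mask words' idea described above (keeping phrases made of letters and symbols from being split during word segmentation) can be illustrated with a small tokenizer. This is a sketch under assumptions, not the implementation merged in #282: the function name `tokenize_with_mask` and the regex-based approach are hypothetical.

```python
import re


def tokenize_with_mask(text, mask_words=()):
    """Split text into word tokens, keeping each mask phrase as a single token."""
    # Try longer phrases first so a short phrase cannot pre-empt a longer one.
    alternatives = [re.escape(p) for p in sorted(mask_words, key=len, reverse=True)]
    alternatives.append(r"\w+")  # fallback: ordinary word characters
    return re.findall("|".join(alternatives), text)
```

Without mask words, `tokenize_with_mask("Deploy C++ and node.js")` yields `["Deploy", "C", "and", "node", "js"]`; passing `["C++", "node.js"]` keeps both phrases intact, so a later ranking step sees them as single candidates.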

keyword_extract_copy, support textrank keyword extract

cacc119

dosubot bot added the size:L label (This PR changes 100-499 lines, ignoring generated files.) May 9, 2025

github-actions bot added the llm label May 9, 2025

dosubot bot added the enhancement label (New feature or request) May 9, 2025

imbajin mentioned this pull request Mar 6, 2025

Support BM25(TF-IDF) / TextRank/LexRank / RAKE (Rapid Automatic Keyword Extraction) in keywords matching step #193

Closed

Merge branch 'main' into addtextrank

bf2e261

imbajin requested a review from Copilot May 9, 2025 10:53

Copilot AI reviewed May 9, 2025

Merge branch 'main' into addtextrank

08f1dd5

Gfreely added a commit to Gfreely/incubator-hugegraph-ai that referenced this pull request Jun 27, 2025

TextRank-fix

11c211d

fix apache#224 problem, update new UI to support change keyword extracion method

Gfreely mentioned this pull request Jun 27, 2025

feat(llm): update keyword extraction method (BREAKING CHANGE) #282

Merged

imbajin closed this Jul 25, 2025

LoskiClaw wants to merge 3 commits into apache:main from LoskiClaw:addtextrank


5 participants
