keyword_extract_copy, support textrank keyword extract #224
LoskiClaw wants to merge 3 commits into apache:main
Pull Request Overview
This PR introduces a new keyword extraction module with support for TextRank-based extraction alongside LLM-based extraction.
- Adds a new module "keyword_extract_copy.py" with support for both English and Chinese text.
- Implements language-specific pre-processing and includes basic test functions.
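As a rough illustration of the TextRank technique the overview mentions (this is not the code in this PR), here is a minimal pure-Python sketch: words are linked when they co-occur within a small sliding window, and a PageRank-style power iteration ranks them. The stop-word list, window size, fixed iteration count, and English-only regex tokenizer are all assumptions; Chinese input would first need a word segmenter such as jieba.

```python
import re
from collections import defaultdict

# Tiny illustrative stop-word list (assumption); real pipelines use a fuller list
STOP_WORDS = {"a", "an", "the", "and", "or", "of", "to", "in", "is", "are",
              "for", "on", "with", "as", "by", "that", "this", "it", "be"}


def tokenize(text: str) -> list[str]:
    """Lowercase English tokenizer; drops stop words and very short tokens."""
    return [w for w in re.findall(r"[a-z]+", text.lower())
            if w not in STOP_WORDS and len(w) > 2]


def textrank_keywords(text: str, window: int = 3, top_k: int = 5,
                      damping: float = 0.85, iterations: int = 50) -> list[str]:
    words = tokenize(text)
    # Undirected co-occurrence graph: link each word to the next window-1 words
    graph: dict[str, set[str]] = defaultdict(set)
    for i, w in enumerate(words):
        for other in words[i + 1:i + window]:
            if other != w:
                graph[w].add(other)
                graph[other].add(w)
    if not graph:
        return []
    # PageRank update: S(v) = (1 - d) + d * sum(S(u) / deg(u) for u adjacent to v)
    scores = {node: 1.0 for node in graph}
    for _ in range(iterations):
        scores = {
            node: (1 - damping) + damping * sum(
                scores[nb] / len(graph[nb]) for nb in graph[node])
            for node in graph
        }
    # Highest score first; ties broken alphabetically for stable output
    ranked = sorted(scores, key=lambda n: (-scores[n], n))
    return ranked[:top_k]
```

A fixed iteration count keeps the sketch short; a production version would usually stop once scores change by less than a small threshold.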
Comments suppressed due to low confidence (2)
hugegraph-llm/src/hugegraph_llm/operators/llm_op/keyword_extract_copy.py:159
- Replace print statements with formal assertions in test cases to ensure that failures are automatically detected during testing.
print( any(k in ["processing", "language", "human"] for k in result["keywords"]))
hugegraph-llm/src/hugegraph_llm/operators/llm_op/keyword_extract_copy.py:173
- Consider using assertion statements instead of print statements in the test function for a more robust and automated testing approach.
print( any(k in expected_keywords for k in result["keywords"]))
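Following the suggestion above, the `print(any(...))` checks can become assertions so a test runner such as pytest reports failures automatically. A minimal sketch, assuming the extractor returns a dict with a `"keywords"` list as in the quoted lines; the helper name and the stand-in result are hypothetical.

```python
def assert_contains_any(result: dict, expected: list[str]) -> None:
    """Raise AssertionError (instead of printing a bool) if no expected keyword is present."""
    keywords = result["keywords"]
    assert any(k in expected for k in keywords), (
        f"expected at least one of {expected}, got {keywords}")


def test_english_keywords() -> None:
    # Hypothetical stand-in for the extractor's output; a real test would call the module
    result = {"keywords": ["language", "processing", "model"]}
    assert_contains_any(result, ["processing", "language", "human"])
```

pytest collects `test_*` functions automatically, so no manual calls or printed booleans are needed.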
sys.path.append('/mnt/WD4T/workspace/hs/incubator-hugegraph-ai/hugegraph-llm/src')
Avoid using hardcoded absolute paths to modify the module search path; consider configuring paths through environment variables or project configuration to ensure portability.
Suggested change:
import os
import sys

# This file lives in src/hugegraph_llm/operators/llm_op/, so three levels up is src/
base_dir = os.path.abspath(os.path.join(os.path.dirname(os.path.abspath(__file__)), "../../.."))
sys.path.append(base_dir)
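The review also mentions configuring the path through an environment variable. A minimal sketch combining both ideas, assuming a hypothetical `HUGEGRAPH_LLM_SRC` variable (not something the project defines) and falling back to walking up from the file with `pathlib`:

```python
import os
from pathlib import Path

# Hypothetical override variable; not defined by the project
ENV_VAR = "HUGEGRAPH_LLM_SRC"


def resolve_src_dir(file_path: str, environ=os.environ) -> Path:
    """Prefer an explicit environment override, else derive src/ from the file location.

    keyword_extract_copy.py sits in src/hugegraph_llm/operators/llm_op/,
    so parents[3] of the resolved file path is the src/ directory.
    """
    override = environ.get(ENV_VAR)
    if override:
        return Path(override).resolve()
    return Path(file_path).resolve().parents[3]
```

Inside the module this would be used as `sys.path.append(str(resolve_src_dir(__file__)))`. Installing the package in editable mode (`pip install -e .`) avoids touching `sys.path` at all.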
Do we need to install gensim < 4.0.0?
Yes, I use version 3.8.1.
I want to run this code, but I'm having issues related to package versions. What versions of scipy, numpy, and Python are you using?
scipy 1.12.0, numpy 1.26.4, Python 3.10.16
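The thread does not say why gensim must be older than 4.0.0; one likely reason is that gensim 4.0 removed the `gensim.summarization` module, which included a TextRank-based `keywords` function. A minimal sketch of an import-time guard that fails with a clear message instead of an obscure `ImportError`; the function names are hypothetical.

```python
from importlib import metadata


def gensim_major_version(version: str) -> int:
    """Parse the major version from strings like '3.8.1' or '4.3.2'."""
    return int(version.split(".", 1)[0])


def check_gensim(required_below: int = 4) -> str:
    """Return the installed gensim version, or raise if it is missing or too new."""
    try:
        installed = metadata.version("gensim")
    except metadata.PackageNotFoundError as exc:
        raise ImportError("gensim is not installed; try: pip install 'gensim<4.0.0'") from exc
    if gensim_major_version(installed) >= required_below:
        raise ImportError(
            f"gensim {installed} found, but this module expects gensim<4.0.0 "
            "(the author tested with 3.8.1)")
    return installed
```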
fix apache#224 problem, update new UI to support changing the keyword extraction method
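Switching the extraction method from the UI could be wired up as a simple dispatch with a cap on the number of keywords. A minimal sketch, assuming hypothetical `llm_extract` and `textrank_extract` callables; the "hybrid" merge rule here (interleave both lists, drop duplicates) is an illustrative assumption, not necessarily what the merged change does.

```python
from typing import Callable, Sequence

Extractor = Callable[[str], Sequence[str]]


def extract_keywords(text: str, method: str, max_keywords: int,
                     llm_extract: Extractor, textrank_extract: Extractor) -> list[str]:
    """Dispatch on the selected method and return at most max_keywords keywords."""
    method = method.lower()
    if method == "llm":
        candidates = list(llm_extract(text))
    elif method == "textrank":
        candidates = list(textrank_extract(text))
    elif method == "hybrid":
        llm_kw, tr_kw = list(llm_extract(text)), list(textrank_extract(text))
        candidates = []
        # Interleave LLM and TextRank results so both sources contribute early
        for i in range(max(len(llm_kw), len(tr_kw))):
            candidates.extend(kw[i] for kw in (llm_kw, tr_kw) if i < len(kw))
    else:
        raise ValueError(f"unknown keyword extraction method: {method!r}")
    # dict.fromkeys keeps first-occurrence order while removing duplicates
    return list(dict.fromkeys(candidates))[:max_keywords]
```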
Addressed by #282.
BREAKING CHANGE **MUST**: UPDATE YOUR "KEYWORD EXTRACT PROMPT" TO THE LATEST VERSION

Fix #224 problem, update the new UI to support changing the keyword extraction method.

**Main changes**

Added options to the RAG interface for selecting the keyword extraction method (LLM, TextRank, or Hybrid) and the maximum number of keywords.

<img width="619" height="145" alt="QQ20250818-193453" src="https://github.com/user-attachments/assets/3c0d21f0-82bb-4176-bfe2-1b0744c06b6d" />

A 'TextRank mask words' setting has also been added. It lets users manually enter specific phrases composed of letters and symbols so that they are not split during word segmentation. The input is also saved.

<img width="1207" height="263" alt="QQ20250818-193518" src="https://github.com/user-attachments/assets/6366789a-f87d-46a4-a85a-9f3b4d9ce9a5" />

**Test results**

TextRank Method:

- Input:
<img width="363" height="144" alt="image" src="https://github.com/user-attachments/assets/4a6267f7-3982-4fca-82df-60cd55bed6af" />
- Result:
<img width="232" height="118" alt="image" src="https://github.com/user-attachments/assets/54a34d00-e588-44ad-9eff-d7281d7d93e5" />

Hybrid Method:

<img width="710" height="129" alt="QQ20250818-193508" src="https://github.com/user-attachments/assets/541534fd-cec0-4002-9967-e49954a6c19e" />

---------

Co-authored-by: imbajin <jin@apache.org>
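The 'TextRank mask words' idea, keeping phrases such as `C++` or `Node.js` intact through segmentation, can be sketched by swapping each phrase for an alphanumeric placeholder before splitting and restoring it afterwards. This is a minimal sketch under stated assumptions: the naive `\w+` tokenizer stands in for the real segmenter, and the placeholder format is hypothetical.

```python
import re


def tokenize_with_masks(text: str, mask_words: list[str]) -> list[str]:
    """Split text into word tokens without breaking any phrase in mask_words."""
    # Longest phrases first so "C++17" wins over "C++" at the same position
    phrases = sorted({w for w in mask_words if w}, key=len, reverse=True)
    placeholders = {f"maskph{i}x": p for i, p in enumerate(phrases)}
    to_placeholder = {p: t for t, p in placeholders.items()}
    if phrases:
        pattern = re.compile("|".join(re.escape(p) for p in phrases))
        # One substitution pass, so inserted placeholders are never re-matched
        text = pattern.sub(lambda m: f" {to_placeholder[m.group(0)]} ", text)
    tokens = re.findall(r"\w+", text)
    # Restore masked phrases; a literal "maskph0x" in the input would also be mapped back
    return [placeholders.get(t, t) for t in tokens]
```

Without masks, `"C++ and Node.js"` splits into `C`, `and`, `Node`, `js`; with `["C++", "Node.js"]` as mask words both phrases survive as single tokens.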
No description provided.