Conversation
Signed-off-by: Manali Latkar <manali.latkar@ibm.com>
Force-pushed from d9e3136 to 3a0bc7e.
MAX_WORDS_BEST_PERFORMANCE = 2000
MAX_WORDS_DEGRADED_PERFORMANCE = 21500
mkumatag left a comment:
Let us push the design doc somewhere in this repo itself so that it becomes handy while reviewing the PR
default_max_input_length = 6000
default_prompt_template_token_count = 250
default_max_summary_length = 1000
default_max_file_size_mb = 10
10MB is a really huge file; have you tested with a 10MB file?
We are handling file size separately for PDF and text files; more details are added in the proposal PR. A rough sketch of per-type limits follows below.
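For illustration, a minimal sketch of what per-type size limits could look like; the mapping name, supported extensions, and the specific byte limits here are assumptions, not the values chosen in the proposal doc.

```python
import os

# Hypothetical per-type limits; the specific values are illustrative only.
MAX_FILE_SIZE_BYTES_BY_TYPE = {
    ".txt": 1 * 1024 * 1024,   # plain text packs far more words per byte
    ".pdf": 10 * 1024 * 1024,  # PDFs also carry layout, fonts, and images
}

def validate_file_size(filename: str, content: bytes) -> None:
    """Reject files whose size exceeds the limit for their extension."""
    ext = os.path.splitext(filename)[1].lower()
    limit = MAX_FILE_SIZE_BYTES_BY_TYPE.get(ext)
    if limit is None:
        raise ValueError(f"Unsupported file type: {ext}")
    if len(content) > limit:
        raise ValueError(f"{ext} files are limited to {limit // (1024 * 1024)}MB")
```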
default_temperature = 0.0
default_max_input_length = 6000
default_prompt_template_token_count = 250
default_max_summary_length = 1000
Wanted a big enough number for our maximum input size, which is 21500 words: condensing 21500 words into 1000 words still yields a good summary, while anything longer than 1000 words wouldn't really be a summary.
default_max_summary_length = 1000
default_max_file_size_mb = 10
default_summarization_temperature = 0.3
default_summarization_stop_words = "\n\n,Note,Word Count,Revised Summary"
what is the purpose of these stop words?
@mkumatag When I tested this, the LLM was responding with extra data that started with some of these stop words. Once I started sending them, the response was always clean.
this could be because of the prompt: Summarize the following text in approximately {summary_length} words
we are supposed to take the summary length as input from the user, so we have to include it in the prompt for the output to stick to that limit.
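For context, a hedged sketch of how stop sequences can be passed to a vLLM OpenAI-compatible completions endpoint; the helper name, request shape, and timeout are assumptions standing in for this PR's query_vllm_completions, not its actual implementation.

```python
import requests

def query_completions(endpoint: str, prompt: str, model: str, max_tokens: int,
                      temperature: float, stop: list[str]) -> str:
    """Query a vLLM OpenAI-compatible /v1/completions endpoint. The `stop`
    sequences end generation before the model can append trailers such as
    "Note" or "Word Count" after the summary."""
    resp = requests.post(
        f"{endpoint}/v1/completions",
        json={
            "model": model,
            "prompt": prompt,
            "max_tokens": max_tokens,
            "temperature": temperature,
            "stop": stop,  # e.g. ["\n\n", "Note", "Word Count", "Revised Summary"]
        },
        timeout=120,
    )
    resp.raise_for_status()
    return resp.json()["choices"][0]["text"].strip()
```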
if __name__ == "__main__":
    initialize_models()
    uvicorn.run(app, host="0.0.0.0", port=8000)
Also, how will the user run this service alone vs. with the rag service?
Currently this needs to be run along with the rag service, as per Sebastian's slide. Running it alone would require vllm as a prerequisite; once we figure out the custom usecase/template problem, we can address it, I believe.
I'm talking about an individual summarisation service itself; maybe we will explore that after merging this.
@@ -0,0 +1,200 @@
import os
add a README in this directory
# Log warning if text exceeds the best performance threshold
if word_count > MAX_WORDS_BEST_PERFORMANCE:
    logger.info(f"Input text exceeds maximum word limit of {MAX_WORDS_BEST_PERFORMANCE} words. Performance may be degraded.")
I'm not sure how helpful this logging message alone, without displaying anything else, is going to be for the admin who is running this solution.
@mkumatag should we send a "degraded_performance": True in the response?
I think we should log other details to identify the request, like what type of content the user has passed & the word count.
Q: What will be the admin/user action on this? IMO we should consider collecting this via some perf counter for monitoring purposes; exposing such information to end users won't be very helpful. If necessary, we should limit the input character count so that we can be efficient and return an error if it is exceeded.
when we say degraded_performance, do we mean qualitatively? shouldn't max_model_len = 32k take care of this?
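A minimal sketch of the richer log line suggested above, assuming hypothetical content_type and word_count values extracted from the request:

```python
import logging

logger = logging.getLogger("summarizer")

def log_degraded_input(content_type: str, word_count: int, threshold: int) -> None:
    """Warn with enough context (content type, word count) for an admin to
    tie the log line back to a specific request."""
    if word_count > threshold:
        logger.warning(
            "Input exceeds best-performance threshold: "
            "content_type=%s word_count=%d threshold=%d",
            content_type, word_count, threshold,
        )
```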
    detail="Server busy. Try again shortly."
)
try:
    summary = query_vllm_completions(llm_endpoint, prompt, llm_model, settings.llm_max_tokens, settings.summarization_temperature)
llm_max_tokens is set to 512 tokens, which is too low if you are considering 1000 output words.
@mkumatag you are right, I had changed the max_tokens while testing larger inputs but missed adding the change here. I will probably come up with a max_tokens value based on the summary length given by the user (one possible derivation is sketched below).
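One hedged way to derive max_tokens from the user-supplied summary length; the ~1.3 tokens-per-word ratio and the 20% headroom below are assumptions, not values from this PR.

```python
# Rough tokens-per-word ratio for English text; real tokenizers vary by model.
TOKENS_PER_WORD = 1.3

def max_tokens_for_summary(summary_length_words: int, headroom: float = 1.2) -> int:
    """Size the completion budget from the user-requested summary length,
    with headroom so the model is not cut off mid-sentence."""
    return int(summary_length_words * TOKENS_PER_WORD * headroom)

print(max_tokens_for_summary(1000))  # ~1560 tokens, well above the current 512
```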
file_content = file.file.read()

# Validate file size
if len(file_content) > MAX_FILE_SIZE_BYTES:
I think the file size should vary based on file type; .txt would consume less space compared to pdf. Please do some exercise to come up with valid file sizes.
I have addressed the separate max file sizes in my document proposal.
except requests.exceptions.RequestException as e:
    concurrency_limiter.release()
    error_details = str(e)
    if e.response is not None:
        error_details += f", Response Text: {e.response.text}"
    logger.error(f"Error calling vLLM API: {error_details}")

    raise HTTPException(
        status_code=500,
        detail=error_details
    )
except Exception as e:
    concurrency_limiter.release()
    logger.error(f"Error calling vLLM API: {e}")
    raise HTTPException(
        status_code=500,
        detail=str(e)
    )
Can we move this block to llm_utils itself, like how it's handled in the other query methods?
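A minimal sketch of that consolidation, assuming the error translation moves into llm_utils; the request shape mirrors the handler quoted above. Releasing the concurrency limiter would then stay at the call site, e.g. in a finally block.

```python
# llm_utils.py -- hypothetical consolidation of the error handling above
import logging

import requests
from fastapi import HTTPException

logger = logging.getLogger("summarizer")

def query_vllm_completions(endpoint: str, prompt: str, model: str,
                           max_tokens: int, temperature: float) -> str:
    """Query vLLM and translate any failure into an HTTPException, so the
    route handlers no longer need their own try/except blocks."""
    try:
        resp = requests.post(
            f"{endpoint}/v1/completions",
            json={"model": model, "prompt": prompt,
                  "max_tokens": max_tokens, "temperature": temperature},
            timeout=120,
        )
        resp.raise_for_status()
        return resp.json()["choices"][0]["text"]
    except requests.exceptions.RequestException as e:
        error_details = str(e)
        if e.response is not None:
            error_details += f", Response Text: {e.response.text}"
        logger.error(f"Error calling vLLM API: {error_details}")
        raise HTTPException(status_code=500, detail=error_details)
```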

This PR adds a separate microservice for summarization. The input to be summarized can be given as a file or as plaintext, and a summary length has to be given as well.
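For illustration, a hedged example of calling such a service from a client; the /summarize route, field names, and port are assumptions, not confirmed from this PR.

```python
import requests

# Assumed route and field names -- adjust to the actual API once merged.
with open("notes.txt", "rb") as f:
    resp = requests.post(
        "http://localhost:8000/summarize",
        data={"summary_length": 200},
        files={"file": ("notes.txt", f, "text/plain")},
    )
print(resp.json())
```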