Description
I am using the meta-llama/Llama-2-70b-chat-hf model on a data frame with 3000 rows, each containing a ~500-token text. After about 10 rows are processed, I get the following error:
```
in call_llama2_api(self, messages)
     79 def call_llama2_api(self, messages):
     80     huggingface.prompt_builder = "llama2"
---> 81     response = huggingface.ChatCompletion.create(
     82         model="meta-llama/Llama-2-70b-chat-hf",
     83         messages=messages,

/usr/local/lib/python3.10/dist-packages/easyllm/clients/huggingface.py in create(messages, model, temperature, top_p, top_k, n, max_tokens, stop, stream, frequency_penalty, debug)
    205     generated_tokens = 0
    206     for _i in range(request.n):
--> 207         res = client.text_generation(
    208             prompt,
    209             details=True,

/usr/local/lib/python3.10/dist-packages/huggingface_hub/inference/_client.py in text_generation(self, prompt, details, stream, model, do_sample, max_new_tokens, best_of, repetition_penalty, return_full_text, seed, stop_sequences, temperature, top_k, top_p, truncate, typical_p, watermark, decoder_input_details)
   1063         decoder_input_details=decoder_input_details,
   1064     )
-> 1065     raise_text_generation_error(e)
   1066
   1067 # Parse output

/usr/local/lib/python3.10/dist-packages/huggingface_hub/inference/_text_generation.py in raise_text_generation_error(http_error)
    472         raise IncompleteGenerationError(message) from http_error
    473     if error_type == "overloaded":
--> 474         raise OverloadedError(message) from http_error
    475     if error_type == "validation":
    476         raise ValidationError(message) from http_error

OverloadedError: Model is overloaded
```
Is there any way to fix this problem, for example by increasing the rate limit?
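The only workaround I can think of so far is retrying with exponential backoff whenever the endpoint reports it is overloaded. A minimal sketch (the wrapper name `call_with_backoff` and its parameters are hypothetical; `OverloadedError` is imported from the `huggingface_hub` module that appears in the traceback above):

```python
import time

from huggingface_hub.inference._text_generation import OverloadedError


def call_with_backoff(api, messages, max_retries=5, base_delay=2.0):
    """Retry the Llama 2 call while the model is overloaded."""
    for attempt in range(max_retries):
        try:
            # call_llama2_api is the method from my code in the traceback above
            return api.call_llama2_api(messages)
        except OverloadedError:
            # Wait 2 s, 4 s, 8 s, ... before retrying so the hosted
            # endpoint has time to shed load.
            time.sleep(base_delay * (2 ** attempt))
    raise RuntimeError(f"Model still overloaded after {max_retries} retries")
```

Even with backoff, this slows processing of 3000 rows considerably, so a way to raise the rate limit itself would be preferable.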