
Add .en model support #4

@Marble879

Description


User story

As a user, I want to be able to use the English-only (.en) models, so that I get better transcription performance on English audio.

Acceptance criteria

  • The system should be able to download .en models if they do not already exist locally.
  • The system should be able to use .en models that have already been downloaded.
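Together, the two criteria describe a check-then-download step. Below is a minimal sketch of that step. It assumes the hypothetical helpers model_path and ensure_model, which are not part of model_handler.rs, and it passes in a closure that stands in for the real network download so the logic can run offline.

```rust
use std::path::{Path, PathBuf};

/// Hypothetical helper: where a model file lives on disk,
/// e.g. ("models", "ggml-tiny.en") -> "models/ggml-tiny.en.bin".
fn model_path(models_dir: &str, file_stem: &str) -> PathBuf {
    Path::new(models_dir).join(format!("{}.bin", file_stem))
}

/// Hypothetical helper: returns the model path and whether a download happened.
/// `download` is a stand-in for the real HTTP download and is only called
/// when the file is missing.
fn ensure_model<F>(models_dir: &str, file_stem: &str, download: F) -> std::io::Result<(PathBuf, bool)>
where
    F: FnOnce(&Path) -> std::io::Result<()>,
{
    let path = model_path(models_dir, file_stem);
    if path.exists() {
        // Criterion 2: reuse a model that is already on disk.
        return Ok((path, false));
    }
    // Criterion 1: make sure the directory exists, then fetch the missing model.
    std::fs::create_dir_all(models_dir)?;
    download(&path)?;
    Ok((path, true))
}
```

One caveat with this shape: a partially written file from an interrupted download would also count as "existing", so a real implementation may want to download to a temporary name and rename it on success.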

Development information

model_handler.rs contains the code responsible for downloading models by name.

A model is downloaded as follows:

  1. Instantiate the model handler:
let m = model_handler::ModelHandler::new("tiny", "models/").await;
  2. The model handler then maps the requested model name to a file name using a compile-time hashmap (a phf::Map):
const MODEL_MAP: phf::Map<&'static str, &'static str> = phf::phf_map! {
    "tiny" => "ggml-tiny",
    "base" => "ggml-base",
    "small" => "ggml-small",
    "medium" => "ggml-medium",
    "large" => "ggml-large",
};

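impl ModelHandler {
    pub async fn new(model_name: &str, models_dir: &str) -> ModelHandler {
        let model_handler = ModelHandler {
            // Look up the file name for the requested model. A name that is not
            // in MODEL_MAP (currently any ".en" name) makes unwrap() panic here.
            model_name: MODEL_MAP
                .get(model_name.to_lowercase().as_str())
                .copied()
                .unwrap()
                .to_string(),
            models_dir: models_dir.to_string(),
        };
        // ... (rest of new() omitted from this excerpt)
    }
}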
  3. The download function then uses this name to build both the download URL and the local file path:
    async fn download_model(&self) -> Result<(), Box<dyn std::error::Error>> {
        if !self.is_model_existing() {
            self.setup_directory()?;
        }
        let base_url = "https://huggingface.co/ggerganov/whisper.cpp/resolve/main";
        // Fail on HTTP errors (e.g. a 404 for a wrong file name) instead of
        // saving the error page to disk as if it were a model.
        let response = reqwest::get(format!("{}/{}.bin", base_url, &self.model_name))
            .await?
            .error_for_status()?;
        let mut file =
            std::fs::File::create(format!("{}/{}.bin", &self.models_dir, &self.model_name))?;
        let mut content = std::io::Cursor::new(response.bytes().await?);
        std::io::copy(&mut content, &mut file)?;
        Ok(())
    }
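Since download_model derives both the remote URL and the local path from the same name, you can check which file a given map entry will request without any network access. The sketch below assumes a hypothetical download_targets function that mirrors the two format! calls above. Unlike the original, it trims a trailing "/" from models_dir, so "models/" does not produce a double slash. The whisper.cpp repository names the English-only files with a dot, for example ggml-tiny.en.bin.

```rust
/// Base URL used by download_model in model_handler.rs.
const BASE_URL: &str = "https://huggingface.co/ggerganov/whisper.cpp/resolve/main";

/// Hypothetical helper mirroring download_model: returns (remote URL, local path)
/// for a given file stem such as "ggml-tiny.en".
fn download_targets(models_dir: &str, file_stem: &str) -> (String, String) {
    let url = format!("{}/{}.bin", BASE_URL, file_stem);
    // Assumption: trailing slashes are trimmed here; the original code keeps them.
    let local = format!("{}/{}.bin", models_dir.trim_end_matches('/'), file_stem);
    (url, local)
}
```

For example, download_targets("models/", "ggml-tiny.en") points at .../resolve/main/ggml-tiny.en.bin and models/ggml-tiny.en.bin.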

Potential solution

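A possible solution would be to add the .en variants to the MODEL_MAP constant in model_handler.rs. For example, if the user instantiates ModelHandler with "tiny.en", the map needs the entry "tiny.en" => "ggml-tiny.en". The English-only files in the whisper.cpp Hugging Face repository are named ggml-tiny.en.bin, ggml-base.en.bin, ggml-small.en.bin and ggml-medium.en.bin, so the dot must be kept. There is no large.en variant.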
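Below is a minimal sketch of the extended mapping. It uses a match on the lowercased name so it runs without the phf crate, and the same pairs could go straight into MODEL_MAP. The function name model_file_stem is hypothetical. It keeps the dot in the .en stems to match whisper.cpp's file names (e.g. ggml-tiny.en.bin), and it returns an error for unknown names instead of panicking through unwrap().

```rust
/// Hypothetical lookup: user-facing model name -> file stem in the
/// whisper.cpp repository. Unknown names yield an error instead of a panic.
fn model_file_stem(model_name: &str) -> Result<&'static str, String> {
    match model_name.to_lowercase().as_str() {
        "tiny" => Ok("ggml-tiny"),
        "tiny.en" => Ok("ggml-tiny.en"),
        "base" => Ok("ggml-base"),
        "base.en" => Ok("ggml-base.en"),
        "small" => Ok("ggml-small"),
        "small.en" => Ok("ggml-small.en"),
        "medium" => Ok("ggml-medium"),
        "medium.en" => Ok("ggml-medium.en"),
        "large" => Ok("ggml-large"),
        // No large.en is listed, so it falls through to the error case.
        other => Err(format!("unknown model name: {other}")),
    }
}
```

With this shape, ModelHandler::new could report an unsupported name such as "large.en" clearly instead of panicking.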

Metadata


Assignees: No one assigned
Projects: No projects
Milestone: No milestone
Relationships: None yet
Development: No branches or pull requests
