Switch audio transcription from openai-whisper to HuggingFace transformers by Copilot · Pull Request #535 · saebyn/glowing-telegram

Copilot · 2025-11-17T05:05:55Z

Migration from openai-whisper to Hugging Face transformers

Successfully switched from the prepackaged openai-whisper pip package to using the Hugging Face transformers library with the whisper-large-v3 model to address issues with long silences breaking/hanging transcription.

✅ Changes Completed:

Documentation Updates:

Added comprehensive documentation for EFS model caching including:

How EFS model caching works (file system, access point, volume mount, environment)
First run behavior and expected timing
Cached model structure
Troubleshooting guide for common issues:
- Mount target not found
- Job hanging during model download
- Permission denied errors
- Corrupted/incomplete models
Model reset options (delete specific model, clear all cache, force re-download)
EFS monitoring (storage size, CloudWatch metrics)
Comparison of deployment approaches

Original prompt

can you switch from using the prepackaged openai whisper pip package to using the model from https://huggingface.co/openai/whisper-large-v3 ? as this is suggested as something that doesn't have the same problems we're seeing now with long silences breaking/hanging the transcription

💬 We'd love your input! Share your thoughts on Copilot coding agent in our 2 minute survey.

saebyn · 2025-11-28T21:59:14Z

@copilot 1. is there a better way to handle downloading the model rather than baking it into the docker image, given this image is used in an AWS Batch environment? 2. are there any good ways to provide the initial context to the model?

Copilot · 2025-11-28T22:05:52Z

@copilot 1. is there a better way to handle downloading the model rather than baking it into the docker image, given this image is used in an AWS Batch environment? 2. are there any good ways to provi...

Addressed in commit 9f98217:

1. Model downloading for AWS Batch:
Added DOWNLOAD_MODEL_AT_BUILD build arg (default: true). For AWS Batch with EFS:

# Build without baking model
docker buildx bake audio_transcriber --set audio_transcriber.args.DOWNLOAD_MODEL_AT_BUILD=false

# Mount EFS and set HF_HOME in job definition
{"environment": [{"name": "HF_HOME", "value": "/mnt/efs/models"}]}

Model downloads once on first run, then cached. See MIGRATION_GUIDE.md "AWS Batch Deployment Options" section for EFS/S3 details.

2. Initial prompt support:
Now implemented using processor.get_prompt_ids(initial_prompt) which converts the prompt text to token IDs that condition the model's generation. This helps with consistent terminology and style.

Co-authored-by: saebyn <185030+saebyn@users.noreply.github.com>

saebyn · 2025-11-28T23:43:31Z

@copilot can we update the CDK code that deploys this for use in Batch to provision and provide HF_HOME?

Co-authored-by: saebyn <185030+saebyn@users.noreply.github.com>

Copilot · 2025-11-28T23:50:23Z

@copilot can we update the CDK code that deploys this for use in Batch to provision and provide HF_HOME?

Updated CDK in commit 9e2f41f:

Created EFS file system in BatchEnvironmentConstruct for model caching
Added EFS volume mount at /mnt/efs/models in audio transcriber job
Set HF_HOME=/mnt/efs/models environment variable
Configured access point with correct POSIX permissions (UID 10001)
Added security group rules for NFS traffic from Batch instances
EFS has RETAIN policy and 30-day lifecycle for infrequently accessed files

First job downloads the ~3GB model to EFS, subsequent jobs reuse the cache.

saebyn · 2025-11-29T16:45:58Z

@copilot can you add documentation about how EFS is used, how to troubleshoot if there are issues with the job, and how to reset/redownload the model?

Co-authored-by: saebyn <185030+saebyn@users.noreply.github.com>

Copilot · 2025-11-29T16:48:34Z

@copilot can you add documentation about how EFS is used, how to troubleshoot if there are issues with the job, and how to reset/redownload the model?

Added comprehensive EFS documentation in commit 79e43fa to MIGRATION_GUIDE.md:

EFS Usage:

How model caching works (file system, access point, volume mount)
First run behavior and expected timing (~5-10 min initial download)
Cached model directory structure

Troubleshooting:

Mount target not found errors
Job hanging during model download
Permission denied errors (POSIX user mismatch)
Corrupted/incomplete model detection

Model Reset Options:

Delete specific model: rm -rf /mnt/efs/models/models--openai--whisper-large-v3/
Clear all cache: rm -rf /mnt/efs/models/*
Using EFS access point with mount helper
Force re-download via environment override

Monitoring:

Check EFS storage size via AWS CLI
CloudWatch metrics to monitor (BurstCreditBalance, ClientConnections, I/O)

Removed the '--system' flag from pipenv install command.

Copilot

Pull request overview

This PR migrates the audio transcription system from the openai-whisper pip package to HuggingFace's transformers library to address issues with long silences breaking or hanging transcription. The migration includes implementing EFS-based model caching for AWS Batch deployments to avoid baking the ~3GB model into Docker images.

Key Changes:

Replaced openai-whisper with HuggingFace transformers (whisper-large-v3 model)
Added EFS file system infrastructure for model caching in AWS Batch
Created Python wrapper script (whisper_hf.py) for CLI compatibility with existing Rust code
Added comprehensive migration guide with EFS troubleshooting documentation

Reviewed changes

Copilot reviewed 9 out of 10 changed files in this pull request and generated 12 comments.

Show a summary per file

File	Description
cdk/lib/batch/environment.ts	Added EFS file system, security group, and access point for model caching
cdk/lib/batch/audioTranscriberJob.ts	Configured EFS volume mount and HF_HOME environment variable for batch jobs
cdk/lib/appStack.ts	Passed EFS resources to audio transcriber job construct
audio_transcriber/whisper_hf.py	New Python wrapper providing CLI-compatible interface to HuggingFace transformers
audio_transcriber/src/whisper.rs	Updated command invocation to call Python script instead of whisper CLI
audio_transcriber/download_model.py	Switched from openai-whisper to transformers for model downloading
audio_transcriber/Pipfile	Added transformers and dependencies
audio_transcriber/Pipfile.lock	Locked dependency versions for reproducible builds
Dockerfile	Updated to install dependencies via pipenv and support conditional model download
audio_transcriber/MIGRATION_GUIDE.md	Comprehensive documentation for migration, EFS setup, and troubleshooting

Copilot · 2025-12-24T19:56:06Z

Dockerfile

+
+# Conditionally download model at build time
+# If DOWNLOAD_MODEL_AT_BUILD=false, model will be downloaded at runtime
+RUN if [ "$DOWNLOAD_MODEL_AT_BUILD" = "true" ]; then python3 /app/download_model.py; fi


The conditional download logic uses shell string comparison, but the build argument should use the proper shell test. The current implementation with [ "$DOWNLOAD_MODEL_AT_BUILD" = "true" ] will fail if the variable is unset. Consider using [ "${DOWNLOAD_MODEL_AT_BUILD}" = "true" ] for safer variable expansion.

Suggested change

RUN if [ "$DOWNLOAD_MODEL_AT_BUILD" = "true" ]; then python3 /app/download_model.py; fi

RUN if [ "${DOWNLOAD_MODEL_AT_BUILD}" = "true" ]; then python3 /app/download_model.py; fi

Copilot · 2025-12-24T19:56:07Z

audio_transcriber/whisper_hf.py

+    # This conditions the model to generate text consistent with the prompt style
+    if initial_prompt:
+        try:
+            prompt_ids = processor.get_prompt_ids(initial_prompt, return_tensors="pt")


The prompt_ids should be moved to the correct device before being used in generation. When using CUDA, the prompt_ids tensor is created on CPU but needs to be moved to the GPU device. Add .to(device_str) after getting the prompt_ids to ensure they're on the correct device.

Suggested change

prompt_ids = processor.get_prompt_ids(initial_prompt, return_tensors="pt")

prompt_ids = processor.get_prompt_ids(initial_prompt, return_tensors="pt").to(device_str)

Copilot · 2025-12-24T19:56:07Z

cdk/lib/batch/audioTranscriberJob.ts

+      containerPath: '/mnt/efs/models',
+      fileSystem: props.modelCacheFileSystem,
+      accessPointId: props.modelCacheAccessPoint.accessPointId,
+      rootDirectory: '/',


The rootDirectory parameter is set to / which means the volume mount starts at the EFS root. However, the access point already specifies path: '/models' in environment.ts (line 60). This double configuration could lead to confusion about the actual path. When using an access point, the rootDirectory should typically be / (which is correct), but the documentation should clarify that the access point's path configuration takes precedence.

Copilot · 2025-12-24T19:56:07Z

audio_transcriber/whisper_hf.py

+    parser.add_argument("--output_format", default="json", help="Output format")
+    parser.add_argument("--output_dir", required=True, help="Output directory")
+    parser.add_argument("--task", default="transcribe", help="Task to perform")


The --task and --output_format arguments are defined but never used in the script. These should either be implemented or removed from the argument parser to maintain a clean API surface.

cdk/lib/batch/environment.ts

Copilot · 2025-12-24T19:56:08Z

cdk/lib/batch/environment.ts

+    // Create EFS for model caching (used by audio transcriber for HuggingFace models)
+    const modelCacheSecurityGroup = new ec2.SecurityGroup(this, 'ModelCacheSecurityGroup', {
+      vpc,
+      allowAllOutbound: true,


The EFS security group allows all outbound traffic but this is unnecessary for an EFS file system which only needs to accept inbound NFS connections. Consider removing allowAllOutbound: true to follow the principle of least privilege, as EFS doesn't initiate outbound connections.

Suggested change

allowAllOutbound: true,

allowAllOutbound: false,

Copilot · 2025-12-24T19:56:08Z

audio_transcriber/whisper_hf.py

+    parser = argparse.ArgumentParser(description="Transcribe audio using Hugging Face Whisper")
+    parser.add_argument("audio_file", help="Audio file to transcribe (use '-' for stdin)")
+    parser.add_argument("--model", required=True, help="Model name (e.g., 'large-v3')")
+    parser.add_argument("--model_dir", default="/model", help="Directory containing models")


The --model_dir argument is defined but never used in the script. The cache_dir parameter uses os.environ.get("HF_HOME", "/model") instead of args.model_dir. Either use the command-line argument or remove it from the argument parser to avoid confusion.

Copilot · 2025-12-24T19:56:09Z

audio_transcriber/whisper_hf.py

+            # Clean up temporary file
+            os.unlink(audio_path)


The temporary file cleanup in the finally block could fail if os.unlink() raises an exception (e.g., file already deleted or permission denied). Consider wrapping the cleanup in a try-except block to prevent masking the original exception from the transcription.

Suggested change

# Clean up temporary file

os.unlink(audio_path)

# Clean up temporary file without masking any prior exceptions

try:

os.unlink(audio_path)

except OSError as cleanup_err:

# Best-effort cleanup; log in verbose mode but do not raise

if verbose:

print(

f"Warning: failed to delete temporary file {audio_path}: {cleanup_err}",

file=sys.stderr,

)

Copilot · 2025-12-24T19:56:09Z

audio_transcriber/src/whisper.rs

                AudioTranscriberError::WhisperExecutableNotFoundError(
-                    "whisper executable not found in PATH".to_string(),
+                    "python3 or whisper_hf.py script not found".to_string(),
                )


The error message should be more specific. When python3 is not found versus when whisper_hf.py is missing, these are different issues requiring different troubleshooting. Consider checking for python3 availability first and providing a more helpful error message that distinguishes between missing Python interpreter and missing script file.

Copilot · 2025-12-24T19:56:09Z

Dockerfile


-RUN python3 /app/download_model.py
+WORKDIR /app
+RUN pipenv install --deploy


The pipenv install --deploy command should be run with the --system flag when installing in a Docker container. Without this flag, packages are installed in a virtual environment which may not be accessible to the Python interpreter when running scripts directly with python3. This could cause import errors at runtime.

Suggested change

RUN pipenv install --deploy

RUN pipenv install --deploy --system

Copilot AI assigned Copilot and saebyn Nov 17, 2025

Copilot started work on behalf of saebyn November 17, 2025 05:06 View session

Copilot AI changed the title ~~[WIP] Switch to Hugging Face Whisper model for transcription~~ Switch audio transcription from openai-whisper to HuggingFace transformers Nov 17, 2025

Copilot AI requested a review from saebyn November 17, 2025 05:23

Copilot finished work on behalf of saebyn November 17, 2025 05:23

Copilot started work on behalf of saebyn November 28, 2025 21:59 View session

Copilot finished work on behalf of saebyn November 28, 2025 22:07

Copilot AI and others added 4 commits November 28, 2025 15:42

Switch from openai-whisper to Hugging Face transformers library

5dd1716

Co-authored-by: saebyn <185030+saebyn@users.noreply.github.com>

Improve error handling and device detection in whisper_hf.py

6e648f6

Co-authored-by: saebyn <185030+saebyn@users.noreply.github.com>

Add migration guide documentation

00e83ee

Co-authored-by: saebyn <185030+saebyn@users.noreply.github.com>

Add runtime model download option and initial_prompt support

4a50b46

Co-authored-by: saebyn <185030+saebyn@users.noreply.github.com>

saebyn force-pushed the copilot/switch-to-huggingface-whisper branch from 9f98217 to 4a50b46 Compare November 28, 2025 23:42

Copilot started work on behalf of saebyn November 28, 2025 23:43 View session

Update CDK to provision EFS for HuggingFace model cache with HF_HOME

9e2f41f

Co-authored-by: saebyn <185030+saebyn@users.noreply.github.com>

Copilot finished work on behalf of saebyn November 28, 2025 23:51

Copilot started work on behalf of saebyn November 29, 2025 16:46 View session

Add comprehensive EFS documentation for troubleshooting and model reset

79e43fa

Co-authored-by: saebyn <185030+saebyn@users.noreply.github.com>

Copilot finished work on behalf of saebyn November 29, 2025 16:49

saebyn added 3 commits December 3, 2025 09:58

Use Pipfile for hf_whisper script

dd3ae52

Merge branch 'main' into copilot/switch-to-huggingface-whisper

a6a72cf

Update pipenv install command in Dockerfile

04a6052

Removed the '--system' flag from pipenv install command.

saebyn marked this pull request as ready for review December 24, 2025 19:49

Copilot AI review requested due to automatic review settings December 24, 2025 19:49

Copilot started reviewing on behalf of saebyn December 24, 2025 19:50 View session

Copilot AI reviewed Dec 24, 2025

View reviewed changes

	RUN if [ "$DOWNLOAD_MODEL_AT_BUILD" = "true" ]; then python3 /app/download_model.py; fi
	RUN if [ "${DOWNLOAD_MODEL_AT_BUILD}" = "true" ]; then python3 /app/download_model.py; fi

	prompt_ids = processor.get_prompt_ids(initial_prompt, return_tensors="pt")
	prompt_ids = processor.get_prompt_ids(initial_prompt, return_tensors="pt").to(device_str)

-            # Clean up temporary file
-            os.unlink(audio_path)
+            # Clean up temporary file without masking any prior exceptions
+            try:
+                os.unlink(audio_path)
+            except OSError as cleanup_err:
+                # Best-effort cleanup; log in verbose mode but do not raise
+                if verbose:
+                    print(
+                        f"Warning: failed to delete temporary file {audio_path}: {cleanup_err}",
+                        file=sys.stderr,
+                    )

	RUN pipenv install --deploy
	RUN pipenv install --deploy --system

Conversation

Copilot AI commented Nov 17, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Migration from openai-whisper to Hugging Face transformers

✅ Changes Completed:

Documentation Updates:

Uh oh!

saebyn commented Nov 28, 2025

Uh oh!

Copilot AI commented Nov 28, 2025

Uh oh!

saebyn commented Nov 28, 2025

Uh oh!

Copilot AI commented Nov 28, 2025

Uh oh!

saebyn commented Nov 29, 2025

Uh oh!

Copilot AI commented Nov 29, 2025

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Copilot AI Dec 24, 2025

Choose a reason for hiding this comment

Uh oh!

Copilot AI Dec 24, 2025

Choose a reason for hiding this comment

Uh oh!

Copilot AI Dec 24, 2025

Choose a reason for hiding this comment

Uh oh!

Copilot AI Dec 24, 2025

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Copilot AI Dec 24, 2025

Choose a reason for hiding this comment

Uh oh!

Copilot AI Dec 24, 2025

Choose a reason for hiding this comment

Uh oh!

Copilot AI Dec 24, 2025

Choose a reason for hiding this comment

Uh oh!

Copilot AI Dec 24, 2025

Choose a reason for hiding this comment

Uh oh!

Copilot AI Dec 24, 2025

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Copilot AI commented Nov 17, 2025 •

edited

Loading