The app integrates the video-chaptering model from the paper into the CLAMS pipeline by specifying its I/O to follow the MMIF schema and delivering it as a Docker image.
Chapter-Llama is a multimodal model that divides an input video into contiguous chapters, each described with a few keywords. The model consists of the following components:
- An ASR model that automatically transcribes the video's audio
- An LLM fine-tuned to generate (chapter) boundaries from the transcribed text; these boundaries are used for key-frame selection
- A video captioner that generates short descriptions for the key frames
- Another fine-tuned LLM that generates the final predicted boundaries in a key-value format, where the key is a timestamp and the value is the chapter title
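The four stages above could be wired together roughly as follows. This is an illustrative sketch only: every function name and return value here is a hypothetical stand-in, not the model's actual API.

```python
# Sketch of the Chapter-Llama pipeline described above.
# All functions are hypothetical stand-ins with hard-coded dummy data.

def transcribe(video_path):
    """Stage 1 (ASR): return (timestamp_sec, text) pairs."""
    return [(0.0, "Good evening."), (455.0, "In other news...")]

def propose_boundaries(transcript):
    """Stage 2 (fine-tuned LLM): pick chapter-boundary timestamps."""
    return [t for t, _ in transcript]

def caption_keyframes(video_path, boundaries):
    """Stage 3 (video captioner): describe the key frame at each boundary."""
    return {t: f"frame at {t:.0f}s" for t in boundaries}

def finalize_chapters(transcript, captions):
    """Stage 4 (second fine-tuned LLM): map timestamps to chapter titles."""
    return {t: f"Chapter starting at {t:.0f}s" for t in captions}

def chapter(video_path):
    transcript = transcribe(video_path)
    boundaries = propose_boundaries(transcript)
    captions = caption_keyframes(video_path, boundaries)
    return finalize_chapters(transcript, captions)

print(chapter("example.mp4"))
```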
We recommend running the app as a container (Docker or Podman). To run it with Docker:

```shell
docker pull clamsproject/app-chapter-llama

docker run \
  --device nvidia.com/gpu=all \
  --network host \
  --security-opt=label=disable \
  -v <path-to-videos>:/data \
  -e HF_TOKEN=<HuggingFace token> \
  -e TRITON_LIBCUDA_PATH=/lib64/libcuda.so.1 \
  clamsproject/app-chapter-llama:0.1
```

To pin the app to a specific GPU, add, for example, `-e CUDA_VISIBLE_DEVICES=2 \` on its own line before the image name.

To run without a container instead, set up a local environment and install the dependencies:

```shell
conda create -yn app-chapter-llama python=3.12
conda activate app-chapter-llama
pip install -e ".[inference]"
pip install -r requirements.txt
```

To get a better sense of how to use CLAMS apps, please visit this site.
Since a CLAMS app is launched as an HTTP server, the input and any configuration are sent via `curl`. Thus, to use this app, run:

```shell
curl -X POST -d@<input.mmif> -s http://localhost:5000 > output.mmif
```

Without the container, executing `python app.py` launches the same HTTP server (in debug mode by default); then send the input `.mmif` request to the server the same way as when running the container.

The output MMIF contains `Chapter` annotations such as:
```json
[
  {
    "@type": "http://mmif.clams.ai/vocabulary/Chapter/v6",
    "properties": {
      "start": 0,
      "end": 455000,
      "title": "News Summary",
      "id": "v_0:c_1"
    }
  },
  {
    "@type": "http://mmif.clams.ai/vocabulary/Chapter/v6",
    "properties": {
      "start": 455000,
      "end": 1464000,
      "title": "Army Sex Scandal",
      "id": "v_0:c_2"
    }
  }
]
```
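For instance, the chapter list can be turned into a human-readable table of contents. The sketch below assumes the `start`/`end` values are millisecond offsets (consistent with the sample values above); in a real MMIF file, check the view's declared time unit before converting.

```python
import json

def fmt_ms(ms):
    """Format a millisecond offset as HH:MM:SS (assumes start/end are in ms)."""
    s = ms // 1000
    return f"{s // 3600:02d}:{s % 3600 // 60:02d}:{s % 60:02d}"

# The sample Chapter annotations from the output above.
sample = """[
  {"@type": "http://mmif.clams.ai/vocabulary/Chapter/v6",
   "properties": {"start": 0, "end": 455000,
                  "title": "News Summary", "id": "v_0:c_1"}},
  {"@type": "http://mmif.clams.ai/vocabulary/Chapter/v6",
   "properties": {"start": 455000, "end": 1464000,
                  "title": "Army Sex Scandal", "id": "v_0:c_2"}}
]"""

for ann in json.loads(sample):
    p = ann["properties"]
    print(f'{fmt_ms(p["start"])} - {fmt_ms(p["end"])}  {p["title"]}')
# prints:
# 00:00:00 - 00:07:35  News Summary
# 00:07:35 - 00:24:24  Army Sex Scandal
```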