AI-Box is an encapsulated solution for general AI tasks, including:

- Generative AI with function calling (in progress)
  - Proxy to OpenAI
  - Open-source models:
    - gorilla-openfunctions-v2
- Audio processing, including:
  - Segmentation
  - Diarization
  - Realtime audio stream processing (in progress)
To build the Docker image locally, run:

```sh
docker build -t ai-box:latest .
```

To run the Docker container, use the following command:
```sh
docker run -it \
  -v local_path_for_video_files:/usr/src/app/download \
  -p 8765:8765 \
  ghcr.io/symfa-inc/ai-box:latest
```

Mount your local directory to the container's `/usr/src/app/download` directory so that the transcriber can access the files for processing.
version: "3.8"
services:
aibox:
image: ghcr.io/symfa-inc/ai-box:latest
container_name: aibox
ports:
- "8765:8765"
environment:
SPEAKER: segmentation
MODE: CPU
QUALITY: LOW
PARALLELISM: 1
volumes:
- local_path_for_video_files:/usr/src/app/downloadYou can change parameters of the server to find the optimal performance/quality comprise for you solution with the following parameters:
- `SPEAKER`: `segmentation` or `diarization`. `diarization` gives better results, as it can tell you who said what and when; `segmentation` only splits the audio into segments by speaker, but it can be the better choice for `CPU` processing.
- `MODE`: `CPU` or `GPU`.
- `QUALITY`: transcription quality level, one of:
  - `DEBUG`: not an acceptable level of quality for most cases, but can be useful in debug environments.
  - `LOW`: the optimal level for CPU.
  - `MEDIUM`
  - `HIGH`
- `PARALLELISM`: integer, default `1`. How many files the transcriber can process in parallel.
All requests must be sent in JSON format. Below is an example of a request to process a file:
```json
{
  "file_path": "video.mp4",
  "speaker": "diarization",
  "mode": "cpu",
  "quality": "medium"
}
```

- `file_path` (required): the path to the input file.
- `speaker` (optional): the desired speaker-processing mode, either `diarization` or `segmentation`.
- `mode` (optional): the processing mode, either `gpu` or `cpu`.
- `quality` (optional): the processing quality, one of `debug`, `low`, `medium`, or `high`.
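If you build these requests programmatically, a small helper can guard against typos in the enum values. The sketch below is illustrative and not part of AI-Box; the `build_request` function and its validation sets are hypothetical, derived only from the fields documented above.

```python
import json

# Allowed values as documented above (hypothetical helper, not part of AI-Box)
SPEAKERS = {"diarization", "segmentation"}
MODES = {"gpu", "cpu"}
QUALITIES = {"debug", "low", "medium", "high"}

def build_request(file_path, speaker=None, mode=None, quality=None):
    """Build a JSON request string, validating the optional enum fields."""
    if speaker is not None and speaker not in SPEAKERS:
        raise ValueError(f"speaker must be one of {SPEAKERS}")
    if mode is not None and mode not in MODES:
        raise ValueError(f"mode must be one of {MODES}")
    if quality is not None and quality not in QUALITIES:
        raise ValueError(f"quality must be one of {QUALITIES}")
    # Only include the optional fields that were actually set
    request = {"file_path": file_path}
    for key, value in (("speaker", speaker), ("mode", mode), ("quality", quality)):
        if value is not None:
            request[key] = value
    return json.dumps(request)

print(build_request("video.mp4", speaker="diarization", quality="medium"))
```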
Responses from the server are also in JSON format. Below is an example response to a processing request:
```json
{
  "type": "recording_processed",
  "file_name": "video.wav",
  "data": "{transcriptionText}"
}
```

- `type`: the status of the processing: `"recording_queued"` when the file is queued, `"recording_processed"` when the file is processed successfully, or `"recording_errored"` if processing failed.
- `file_name`: the name of the input file.
- `data` (optional): additional information. If processing is successful, it contains the result of the processing; if processing fails, it contains an error message explaining the failure.
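A client will typically branch on the `type` field. The handler below is a hypothetical sketch built only from the three statuses documented above; the function name and print messages are illustrative.

```python
import json

def handle_response(raw_message):
    """Dispatch on the documented response statuses (illustrative only)."""
    response = json.loads(raw_message)
    status = response["type"]
    file_name = response["file_name"]
    if status == "recording_queued":
        print(f"{file_name}: queued, waiting for processing to finish")
    elif status == "recording_processed":
        # On success, "data" holds the processing result
        print(f"{file_name}: done\n{response.get('data', '')}")
    elif status == "recording_errored":
        # On failure, "data" holds an error message
        print(f"{file_name}: failed: {response.get('data')}")
    else:
        print(f"{file_name}: unknown status {status!r}")
```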
Example client in JavaScript (Node.js, using the `ws` package):

```js
const WebSocket = require('ws');

// Connect to the WebSocket server
const ws = new WebSocket('ws://localhost:8765');

ws.on('open', function open() {
  console.log('Connected to server');
  // Send a request to process the file. The video file is expected
  // to be in {local_path_for_video_files}
  ws.send(JSON.stringify({"file_path": "video.mp4"}));
});

ws.on('message', function incoming(message) {
  // e.g. {"type": "recording_processed", "file_name": "video.wav", "data": "transcriptionText"}
  console.log('Message from server: %s', message);
});
```

The same client in Python (using the `websockets` package):

```python
import asyncio
import json

import websockets


async def talk():
    uri = "ws://localhost:8765"
    async with websockets.connect(uri) as websocket:
        print("Connected to server")
        message = {
            "file_path": "video.mp4",
            "speaker": "segmentation",
            "quality": "medium",
        }
        await websocket.send(json.dumps(message))
        response = await websocket.recv()
        response_json = json.loads(response)
        # e.g. {"type": "recording_processed", "file_name": "video.wav", "data": "transcriptionText"}
        print(f"Message from server: {response_json['data']}")

asyncio.run(talk())
```
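Since the server can answer first with `recording_queued` and only later with `recording_processed` or `recording_errored`, a more robust client keeps reading until it sees a final status. A minimal sketch, assuming the same server address and message shapes documented above; the `transcribe` helper is illustrative, not part of AI-Box:

```python
import asyncio
import json

import websockets


async def transcribe(file_path, uri="ws://localhost:8765"):
    """Send one request and wait for a final status (illustrative sketch)."""
    async with websockets.connect(uri) as websocket:
        await websocket.send(json.dumps({"file_path": file_path}))
        while True:
            response = json.loads(await websocket.recv())
            if response["type"] == "recording_processed":
                return response["data"]  # the transcription text
            if response["type"] == "recording_errored":
                raise RuntimeError(response.get("data", "processing failed"))
            # "recording_queued": keep waiting for the final message

print(asyncio.run(transcribe("video.mp4")))
```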