AI-Box is an encapsulated solution for general AI tasks, including:

- Generative AI with function calling (in progress)
  - Proxy to OpenAI
  - Open-source models:
    - gorilla-openfunctions-v2
- Audio processing, including:
  - Segmentation
  - Diarization
  - Realtime audio stream processing (in progress)
To build the Docker image locally, run:

```sh
docker build -t ai-box:latest .
```

To run the Docker container, use the following command:
```sh
docker run -it \
  -v local_path_for_video_files:/usr/src/app/download \
  -p 8765:8765 \
  ghcr.io/symfa-inc/ai-box:latest
```

Mount your local directory to the container's `/usr/src/app/download` directory so that the transcriber can access the files for processing.
version: "3.8"
services:
aibox:
image: ghcr.io/symfa-inc/ai-box:latest
container_name: aibox
ports:
- "8765:8765"
environment:
SPEAKER: segmentation
MODE: CPU
QUALITY: LOW
PARALLELISM: 1
volumes:
- local_path_for_video_files:/usr/src/app/downloadYou can change parameters of the server to find the optimal performance/quality comprise for you solution with the following parameters:
- `SPEAKER`: `segmentation` or `diarization`. `diarization` gives better results, as it can tell you who said what and when; `segmentation` only splits the audio into segments by speaker, but it can be the better choice for `CPU` processing.
- `MODE`: `CPU` or `GPU`.
- `QUALITY`: transcription quality level, one of:
  - `DEBUG`: not an acceptable level of quality for most cases, but can be useful in debug environments.
  - `LOW`: the optimal level for CPU.
  - `MEDIUM`
  - `HIGH`
- `PARALLELISM`: integer, default `1`. How many files the transcriber can process in parallel.
All requests must be sent in JSON format. Below is an example of a request to process a file:
```json
{
  "file_path": "video.mp4",
  "speaker": "diarization",
  "mode": "cpu",
  "quality": "medium"
}
```

- `file_path` (required): the path to the input file.
- `speaker` (optional): the desired speaker-processing mode, either `diarization` or `segmentation`.
- `mode` (optional): the processing mode, either `gpu` or `cpu`.
- `quality` (optional): the processing quality, one of `debug`, `low`, `medium`, or `high`.
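If you build these requests programmatically, a small helper can guard against typos in the enum values. The sketch below is illustrative and not part of AI-Box; the `build_request` function and its validation sets are hypothetical, derived only from the fields documented above.

```python
import json

# Allowed values as documented above (hypothetical helper, not part of AI-Box)
SPEAKERS = {"diarization", "segmentation"}
MODES = {"gpu", "cpu"}
QUALITIES = {"debug", "low", "medium", "high"}

def build_request(file_path, speaker=None, mode=None, quality=None):
    """Build a JSON request string, validating the optional enum fields."""
    if speaker is not None and speaker not in SPEAKERS:
        raise ValueError(f"speaker must be one of {SPEAKERS}")
    if mode is not None and mode not in MODES:
        raise ValueError(f"mode must be one of {MODES}")
    if quality is not None and quality not in QUALITIES:
        raise ValueError(f"quality must be one of {QUALITIES}")
    # Only include the optional fields that were actually set
    request = {"file_path": file_path}
    for key, value in (("speaker", speaker), ("mode", mode), ("quality", quality)):
        if value is not None:
            request[key] = value
    return json.dumps(request)

print(build_request("video.mp4", speaker="diarization", quality="medium"))
```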
Responses from the server are also in JSON format. Below is an example response to a processing request:
```json
{
  "type": "recording_processed",
  "file_name": "video.wav",
  "data": "{transcriptionText}"
}
```

- `type`: the status of the processing: `"recording_queued"` when the file is queued, `"recording_processed"` when the file is processed successfully, or `"recording_errored"` if processing failed.
- `file_name`: the name of the input file.
- `data` (optional): additional information. If processing is successful, it contains the result of the processing; if processing fails, it contains an error message explaining the failure.
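A client will typically branch on the `type` field. The handler below is a hypothetical sketch built only from the three statuses documented above; the function name and print messages are illustrative.

```python
import json

def handle_response(raw_message):
    """Dispatch on the documented response statuses (illustrative only)."""
    response = json.loads(raw_message)
    status = response["type"]
    file_name = response["file_name"]
    if status == "recording_queued":
        print(f"{file_name}: queued, waiting for processing to finish")
    elif status == "recording_processed":
        # On success, "data" holds the processing result
        print(f"{file_name}: done\n{response.get('data', '')}")
    elif status == "recording_errored":
        # On failure, "data" holds an error message
        print(f"{file_name}: failed: {response.get('data')}")
    else:
        print(f"{file_name}: unknown status {status!r}")
```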
Example client in JavaScript (Node.js, using the `ws` package):

```js
const WebSocket = require('ws');

// Connect to the WebSocket server
const ws = new WebSocket('ws://localhost:8765');

ws.on('open', function open() {
  console.log('Connected to server');
  // Send a request to process the file. The video file is expected
  // to be in {local_path_for_video_files}
  ws.send(JSON.stringify({"file_path": "video.mp4"}));
});

ws.on('message', function incoming(message) {
  // e.g. {"type": "recording_processed", "file_name": "video.wav", "data": "transcriptionText"}
  console.log('Message from server: %s', message);
});
```

The same client in Python (using the `websockets` package):

```python
import asyncio
import json

import websockets


async def talk():
    uri = "ws://localhost:8765"
    async with websockets.connect(uri) as websocket:
        print("Connected to server")
        message = {
            "file_path": "video.mp4",
            "speaker": "segmentation",
            "quality": "medium",
        }
        await websocket.send(json.dumps(message))
        response = await websocket.recv()
        response_json = json.loads(response)
        # e.g. {"type": "recording_processed", "file_name": "video.wav", "data": "transcriptionText"}
        print(f"Message from server: {response_json['data']}")

asyncio.run(talk())
```
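Since the server can answer first with `recording_queued` and only later with `recording_processed` or `recording_errored`, a more robust client keeps reading until it sees a final status. A minimal sketch, assuming the same server address and message shapes documented above; the `transcribe` helper is illustrative, not part of AI-Box:

```python
import asyncio
import json

import websockets


async def transcribe(file_path, uri="ws://localhost:8765"):
    """Send one request and wait for a final status (illustrative sketch)."""
    async with websockets.connect(uri) as websocket:
        await websocket.send(json.dumps({"file_path": file_path}))
        while True:
            response = json.loads(await websocket.recv())
            if response["type"] == "recording_processed":
                return response["data"]  # the transcription text
            if response["type"] == "recording_errored":
                raise RuntimeError(response.get("data", "processing failed"))
            # "recording_queued": keep waiting for the final message

print(asyncio.run(transcribe("video.mp4")))
```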