This would include:

- live transcription
- voice activity detection
- barge-in support / interruptible speech synthesis
- etc.