A chrome extension that summarizes the audio from a chrome tab into text with support for multiple languages. It is primarily intended to take notes during a live session/lecture

smitdesai1010/LectureNotes

LectureNotes

A chrome extension that summarizes audio from a chrome tab into text with support for multiple languages.
Get it on Chrome Web Store

This application was developed after my friends complained about how difficult it is to understand a lecture and take notes at the same time. It is primarily intended for taking notes during a live session or lecture.

System-Design

Screenshots


Working

  • Upon initialization
    • The client registers itself with the server, retrying via short polling in case of a server error
    • The audio configuration is sent to the server
    • The server creates a session corresponding to the client

  • Upon start recording
    • A WebSocket connection is opened with the server
    • The audio stream of the tab is captured by the background script before it reaches the speakers.
    • An Audio object plays the audio from the background script
    • Every 14 seconds, a base64 string of the audio is sent to the server for transcription via Google Speech-to-Text.

  • Upon stop recording
    • The last buffered audio is sent to the server as a base64 string
    • The WebSocket connection is terminated
    • All audio streams are disconnected
    • Other objects are destroyed

  • Upon get notes
    • The server performs text summarization using DeepAI, sends back the response, and resets the transcription
    • The client converts the response into a blob and downloads it.
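
The recording and stop steps above can be sketched as a small buffer that is flushed on a timer and once more when recording ends. This is only an illustration: `AudioChunkBuffer`, `startChunking` and the `send` callback are hypothetical names, not taken from the extension's source, and Node's `Buffer` stands in for whatever base64 encoding the browser code actually uses.

```javascript
// Hypothetical helper: collects raw audio bytes and hands them off as base64.
class AudioChunkBuffer {
  constructor(send) {
    this.send = send;   // callback that receives a base64 string (e.g. a WebSocket send)
    this.parts = [];    // Uint8Array pieces recorded since the last flush
  }

  // Called whenever the recorder produces more audio bytes.
  push(bytes) {
    this.parts.push(bytes);
  }

  // Called by the timer, and once more when recording stops.
  flush() {
    if (this.parts.length === 0) return false;   // nothing buffered, nothing to send
    const base64 = Buffer.concat(this.parts).toString('base64');
    this.parts = [];
    this.send(base64);
    return true;
  }
}

const CHUNK_INTERVAL_MS = 14_000;   // the 14-second interval described above

// Wires the buffer to a timer; returns a stop function that sends the remainder.
function startChunking(buffer, intervalMs = CHUNK_INTERVAL_MS) {
  const timer = setInterval(() => buffer.flush(), intervalMs);
  return function stop() {
    clearInterval(timer);
    buffer.flush();   // the last buffered audio is still sent
  };
}
```

Calling the returned `stop` function mirrors the "stop recording" step: the timer is cancelled and whatever is still buffered goes out as one final message.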

Optimizations

  • Improved performance by ~70 ms by implementing a dual communication channel of HTTP requests and WebSockets.
    • Initially, a single WebSocket connection was used for server-client communication, and the different event data were transmitted as JSON.
    • The audio is in uncompressed .wav format; a 14-second base64 audio string is ~1.5 MB in size.
    • To avoid JSON processing of a 1.5 MB string, WebSockets now transmit the raw base64 audio string, whereas HTTP requests are used for all other types of events.
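
As a rough sanity check on the size quoted above, the sketch below estimates how large a 14-second uncompressed .wav chunk becomes once base64-encoded. The recording settings (44.1 kHz, mono, 16-bit samples, 44-byte header) are assumptions made for the arithmetic; this README does not state the actual values.

```javascript
// Size of an uncompressed PCM .wav file in bytes (44-byte canonical header assumed).
function wavBytes(seconds, sampleRate, channels, bytesPerSample) {
  return 44 + seconds * sampleRate * channels * bytesPerSample;
}

// Length of the base64 encoding of n bytes: 4 characters per 3-byte group, padded.
function base64Length(n) {
  return 4 * Math.ceil(n / 3);
}

// Assumed settings (not stated in this README): 44.1 kHz, mono, 16-bit samples.
const bytes = wavBytes(14, 44100, 1, 2);
const chars = base64Length(bytes);
console.log(`${bytes} bytes of audio -> ${chars} base64 characters`);
```

Under these assumptions the string is roughly 1.6 million characters long, the same order of magnitude as the figure above, which is why wrapping every chunk in a JSON envelope was worth avoiding.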

Future Planning

  • Use Redis for storing session data
  • Implement a heartbeat mechanism for websockets to identify and terminate broken connections
  • Capture a tab close event to stop recording.
  • Make CORS opaque.
  • Use a compressed audio format like .mp3.
  • Add unit tests.

API Reference

Register a client

  POST /register
  Return: <text/plain> unique ID, used for further communication. 
| Parameter | Type | Description |
| --- | --- | --- |
| config | JSON | Configuration for the Google Speech-to-Text API |
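
A minimal sketch of how a client might call this endpoint and retry on failure, in the spirit of the short polling mentioned under Working. The helper names, the `{ config }` body shape, the plain `http://` scheme, and the delay and attempt counts are all assumptions for illustration; check the server code for the exact request format.

```javascript
// Hypothetical helper: builds the URL and fetch options for POST /register.
function buildRegisterRequest(host, config) {
  return {
    url: `http://${host}/register`,
    options: {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      // Assumption: the server reads the settings from a "config" field.
      body: JSON.stringify({ config }),
    },
  };
}

const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Registers the client, trying again after a fixed delay when the request fails.
async function registerWithRetry(host, config, opts = {}) {
  const { delayMs = 2000, maxAttempts = 5, fetchFn = globalThis.fetch } = opts;
  const { url, options } = buildRegisterRequest(host, config);
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      const res = await fetchFn(url, options);
      if (res.ok) return (await res.text()).trim();   // plain-text client ID
    } catch (err) {
      // Network error: fall through and retry.
    }
    if (attempt < maxAttempts) await sleep(delayMs);
  }
  throw new Error(`registration failed after ${maxAttempts} attempts`);
}
```

A call could look like `registerWithRetry('localhost:8080', { encoding: 'LINEAR16', languageCode: 'en-US' })`, where the host and the config values are placeholders.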

Get notes

  POST /getNotes
  Return: <text/plain> Summarized text and transcription 
| Parameter | Type | Description |
| --- | --- | --- |
| ID | Integer | Required. ID returned from the /register API |
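
The sketch below shows one way a browser client could request the notes and save them as a file, matching the blob download described under Working. It assumes the ID is sent in a JSON body and that the server is reachable over plain HTTP; `buildGetNotesRequest`, `downloadNotes` and the `notes.txt` filename are hypothetical.

```javascript
// Hypothetical helper: builds the URL and fetch options for POST /getNotes.
function buildGetNotesRequest(host, clientId) {
  return {
    url: `http://${host}/getNotes`,
    options: {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      // Assumption: the ID travels in a JSON body; the server may expect another encoding.
      body: JSON.stringify({ ID: clientId }),
    },
  };
}

// Browser-only: fetches the notes and saves them as a text file.
async function downloadNotes(host, clientId, filename = 'notes.txt') {
  const { url, options } = buildGetNotesRequest(host, clientId);
  const res = await fetch(url, options);
  if (!res.ok) throw new Error(`getNotes failed with status ${res.status}`);
  const blob = new Blob([await res.text()], { type: 'text/plain' });

  // Point a temporary link at the blob and click it to trigger the download.
  const link = document.createElement('a');
  link.href = URL.createObjectURL(blob);
  link.download = filename;
  link.click();
  URL.revokeObjectURL(link.href);
}
```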

Send audio

  Websocket ws://${HOST}?ID=${clientID}&languageCode=${languageCode}
  Example: ws://LectureNotes:8080?ID=12&languageCode=hi-IN
  Note: Use native websockets
  
  Return: <text/plain> errors, if any
| QueryString Parameter | Type | Description |
| --- | --- | --- |
| ID | Integer | Required. ID returned from the /register API |
| languageCode | String | Language code; defaults to en-US |

| Websocket message | Type | Description |
| --- | --- | --- |
| base64 | String | Base64-encoded string of audio; must not exceed 10 MB / 15 secs |
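
A minimal client-side sketch of this endpoint, assuming a browser (or any runtime with a native `WebSocket`). The helper names are hypothetical, and treating the 10 MB limit as a limit on the base64 string length is an assumption, since the reference does not say whether it applies before or after encoding.

```javascript
const MAX_MESSAGE_BYTES = 10 * 1024 * 1024;   // the documented 10 MB limit

// Builds the connection URL; languageCode falls back to the documented default.
function buildSocketUrl(host, clientId, languageCode = 'en-US') {
  const id = encodeURIComponent(clientId);
  const lang = encodeURIComponent(languageCode);
  return `ws://${host}?ID=${id}&languageCode=${lang}`;
}

// Assumption: the limit applies to the length of the base64 string itself.
function isWithinLimit(base64Audio) {
  return base64Audio.length <= MAX_MESSAGE_BYTES;
}

// Opens a native WebSocket and returns a function that sends one audio chunk.
function openAudioSocket(host, clientId, languageCode) {
  const socket = new WebSocket(buildSocketUrl(host, clientId, languageCode));
  // The server only replies with plain-text errors, so log whatever arrives.
  socket.onmessage = (event) => console.error('server error:', event.data);
  return function sendChunk(base64Audio) {
    if (socket.readyState !== WebSocket.OPEN) return false;   // not connected yet
    if (!isWithinLimit(base64Audio)) throw new Error('audio chunk exceeds 10 MB');
    socket.send(base64Audio);   // raw base64 string, no JSON envelope
    return true;
  };
}
```

For example, `buildSocketUrl('LectureNotes:8080', 12, 'hi-IN')` produces the example URL shown above.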

Run Locally

Install node and npm

https://nodejs.org/en/download/

Clone the project

  git clone https://github.com/smitdesai1010/LectureNotes.git

Add the following environment variables to a .env file located in the ./Server folder

  DEEPAI_KEY

  GOOGLE_APPLICATION_CREDENTIALS

Go to the server directory inside the cloned project

   cd LectureNotes/Server
   npm install      # install dependencies
   npm start        # start the server

To start client (chrome-extension)

    Open Google Chrome
    Click on "Extensions" > "Manage extensions"
    Click on "Load unpacked" > select the ./client/ folder in the project directory
    Click on "Extensions" > "LectureNotes"

Acknowledgements

Feedback

If you have any feedback, please reach out to me at smitdesai1010@gmail.com
