A Chrome extension that summarizes audio from a Chrome tab into text, with support for multiple languages.
Get it on Chrome Web Store
This application was developed after my friends complained about how difficult it is to understand a lecture and take notes at the same time. It is primarily intended for taking notes during a live session or lecture.
- Upon initialization
  - The client registers itself with the server, retrying with short polling if a server error occurs
  - The audio configuration is sent to the server
  - The server creates a session for the client
- Upon start recording
  - A WebSocket connection is opened with the server
  - The background script captures the tab's audio stream before it reaches the speaker
  - An Audio object plays the audio from the background script
  - Every 14 seconds, a base64 string of the audio is sent to the server for transcription via Google Speech-to-Text
- Upon stop recording
  - The last buffered audio is sent to the server as a base64 string
  - The WebSocket connection is terminated
  - All audio streams are disconnected
  - Other objects are destroyed
- Upon get notes
  - The server summarizes the text using DeepAI, sends back the response, and resets the transcription
  - The client converts the response into a blob and downloads it
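The recording steps above can be sketched as a small buffer that collects captured audio bytes and hands them off as one base64 string. This is a minimal sketch under stated assumptions, not the extension's actual code: `AudioChunker`, `toBase64`, and the `send` callback are hypothetical names, and encoding the captured stream as WAV is left out.

```javascript
// Hypothetical helper: buffers raw audio bytes and emits them as a single
// base64 string whenever flush() is called.
class AudioChunker {
  constructor(send) {
    this.send = send; // e.g. (text) => socket.send(text)
    this.parts = [];  // Uint8Array pieces captured since the last flush
  }

  push(bytes) {
    this.parts.push(bytes);
  }

  flush() {
    if (this.parts.length === 0) return; // nothing buffered, nothing to send
    const total = this.parts.reduce((n, part) => n + part.length, 0);
    const merged = new Uint8Array(total);
    let offset = 0;
    for (const part of this.parts) {
      merged.set(part, offset);
      offset += part.length;
    }
    this.parts = [];
    this.send(toBase64(merged));
  }
}

// Convert bytes to base64 in slices so large buffers do not exceed
// the argument limit of String.fromCharCode.
function toBase64(bytes) {
  let binary = '';
  const step = 0x8000;
  for (let i = 0; i < bytes.length; i += step) {
    binary += String.fromCharCode.apply(null, bytes.subarray(i, i + step));
  }
  return btoa(binary);
}
```

While recording, something like `setInterval(() => chunker.flush(), 14000)` would send a chunk every 14 seconds. On stop, clearing the timer and calling `chunker.flush()` once more would send the last buffered audio before the socket is closed.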
- Improved performance by ~70 ms by using two communication channels: HTTP requests and WebSockets.
  - Initially, a single WebSocket connection handled all server-client communication, and the different event types were transmitted as JSON.
  - The audio is in uncompressed .wav format; a 14-second base64 audio string is ~1.5 MB.
  - To avoid JSON processing of a 1.5 MB string, the WebSocket now carries only the raw base64 audio string, while HTTP requests are used for all other event types.
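The chunk size above can be checked with a little arithmetic. The capture parameters are not stated here, so this sketch assumes 16-bit mono PCM at 44.1 kHz; `wavBytes` and `base64Length` are hypothetical helper names.

```javascript
// Size of an uncompressed PCM WAV file: 44-byte canonical header plus samples.
function wavBytes(seconds, sampleRate, channels, bitsPerSample) {
  const header = 44;
  return header + seconds * sampleRate * channels * (bitsPerSample / 8);
}

// Base64 encodes every 3 bytes as 4 characters, padding the last group.
function base64Length(byteCount) {
  return Math.ceil(byteCount / 3) * 4;
}

// Assumed capture settings: 14 s, 44.1 kHz, mono, 16-bit.
const bytes = wavBytes(14, 44100, 1, 16); // 1234844 bytes
const chars = base64Length(bytes);        // 1646460 characters
console.log(`${(chars / (1024 * 1024)).toFixed(2)} MiB of base64 text`);
```

Under these assumptions a 14-second chunk comes to roughly 1.6 million base64 characters, the same order of magnitude as the ~1.5 MB figure. Different sample rates or channel counts would scale the result proportionally.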
- Use Redis for storing session data
- Implement a heartbeat mechanism for WebSockets to identify and terminate broken connections
- Capture a tab close event to stop recording.
- Make CORS opaque.
- Use a compressed audio format like .mp3.
- Add unit tests.
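As an illustration of the heartbeat item in the list above, here is one possible shape for it, following the common ping/pong sweep pattern. It is a sketch rather than a planned implementation: `sweep` and `markAlive` are hypothetical names, and the sockets are assumed to expose `ping()` and `terminate()`.

```javascript
// Hypothetical heartbeat sweep. `clients` is any iterable of socket-like
// objects that expose ping() and terminate().
function sweep(clients) {
  for (const socket of clients) {
    if (socket.isAlive === false) {
      socket.terminate(); // no pong since the previous sweep: drop it
      continue;
    }
    socket.isAlive = false; // expect a pong before the next sweep
    socket.ping();
  }
}

// Call this when a pong arrives from the socket.
function markAlive(socket) {
  socket.isAlive = true;
}
```

If the server used the `ws` package (an assumption), this could be wired up by calling `markAlive(socket)` from each socket's `pong` event and running `setInterval(() => sweep(wss.clients), 30000)`. The 30-second interval is only an example value.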
POST /register
Return: <text/plain> unique ID, used for further communication.
| Parameter | Type | Description |
|---|---|---|
| config | json | Configuration for the Google Speech-to-Text API |
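A request to this endpoint might be built as follows. This is a sketch with assumptions: `buildRegisterRequest` is a hypothetical helper, the `{ config }` JSON body shape and the `http://localhost:8080` address are guesses, and the config fields shown are ones from Google Speech-to-Text's `RecognitionConfig`.

```javascript
// Hypothetical helper: build the POST /register request.
// The { config } body shape is an assumption about what the server expects.
function buildRegisterRequest(baseUrl, config) {
  return {
    url: `${baseUrl}/register`,
    options: {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({ config }),
    },
  };
}

// Example values only; the server address is assumed.
const request = buildRegisterRequest('http://localhost:8080', {
  encoding: 'LINEAR16',
  sampleRateHertz: 44100,
  languageCode: 'en-US',
});
```

The request could then be sent with `fetch(request.url, request.options)`, reading the plain-text ID from the response with `response.text()`.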
POST /getNotes
Return: <text/plain> Summarized text and transcription

| Parameter | Type | Description |
|---|---|---|
| ID | Integer | Required. ID returned from /register api |
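Once the plain-text response arrives, the client can wrap it in a blob for download. The helpers below are a hypothetical sketch: `notesToBlob`, `notesFileName`, and the file-name format are illustrative, not taken from the extension.

```javascript
// Hypothetical helper: wrap the plain-text /getNotes response in a Blob.
function notesToBlob(text) {
  return new Blob([text], { type: 'text/plain' });
}

// Hypothetical helper: a dated file name (UTC) for the downloaded notes.
function notesFileName(date = new Date()) {
  return `notes-${date.toISOString().slice(0, 10)}.txt`;
}

const blob = notesToBlob('Summary goes here');
```

In the extension, the blob could be turned into a link with `URL.createObjectURL(blob)` and saved through an `<a download>` element, or passed to `chrome.downloads.download`, which requires the `downloads` permission.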
Websocket ws://${HOST}?ID=${clientID}&languageCode=${languageCode}
Example: ws://LectureNotes:8080?ID=12&languageCode=hi-IN
Note: Use native websockets
Return: <text/plain> errors, if any

| QueryString Parameter | Type | Description |
|---|---|---|
| ID | Integer | Required. ID returned from the /register API |
| languageCode | String | Language code; defaults to en-US |

| Websocket message | Type | Description |
|---|---|---|
| base64 string | String | Base64-encoded audio string; must not exceed 10 MB / 15 secs |
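The connection URL can be assembled from the query string parameters above. This is a minimal sketch; `buildSocketUrl` is a hypothetical helper name.

```javascript
// Hypothetical helper: build the WebSocket URL from the documented
// query string parameters. languageCode falls back to en-US.
function buildSocketUrl(host, clientID, languageCode = 'en-US') {
  const query = new URLSearchParams({ ID: String(clientID), languageCode });
  return `ws://${host}?${query}`;
}

const url = buildSocketUrl('LectureNotes:8080', 12, 'hi-IN');
```

With a native WebSocket, the client would then open `new WebSocket(url)` and call `socket.send(base64Audio)` for each chunk.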
Install node and npm
https://nodejs.org/en/download/
Clone the project
git clone https://github.com/smitdesai1010/LectureNotes.git
Add the following environment variables to a .env file in the ./Server folder
- DEEPAI_KEY
- GOOGLE_APPLICATION_CREDENTIALS
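A quick startup check can catch a missing variable early. This is an optional sketch, not part of the server: `missingEnv` is a hypothetical helper, and it assumes the variables have already been loaded into `process.env`.

```javascript
// The two variables this project expects in ./Server/.env.
const REQUIRED_ENV = ['DEEPAI_KEY', 'GOOGLE_APPLICATION_CREDENTIALS'];

// Hypothetical helper: return the names of required variables that are unset.
function missingEnv(env = process.env) {
  return REQUIRED_ENV.filter((name) => !env[name]);
}
```

Calling `missingEnv()` before starting the server and exiting when the result is non-empty gives a clearer error than a failed API call later. `GOOGLE_APPLICATION_CREDENTIALS` is expected to hold the path to a Google Cloud service account JSON file.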
Go to the project directory
cd Server
npm install # install dependencies
npm start # start the server
To start the client (Chrome extension)
Open Google Chrome
Click on "Extensions" > "Manage extensions"
Click on "Load unpacked" > select the ./client/ folder in the project directory
Click on "Extensions" > "LectureNotes"
If you have any feedback, please reach out to me at smitdesai1010@gmail.com



