🛡️ Sentinel: Validate audio file uploads using magic bytes#67
🛡️ Sentinel: Validate audio file uploads using magic bytes#67Krosebrook wants to merge 1 commit intomainfrom
Conversation
- Implemented `detectAudioFormat` in `server/utils.ts` to inspect file buffers for valid magic bytes (MP3, WAV, OGG, FLAC, M4A/AAC). - Updated `/api/audio/generate-with-reference` in `server/routes.ts` to use this validation, rejecting files that do not match expected audio formats regardless of their extension or MIME type. - Added comprehensive tests for audio detection in `server/utils.test.ts`. This mitigates the risk of malicious file uploads disguised as audio files. Co-authored-by: Krosebrook <214532761+Krosebrook@users.noreply.github.com>
|
👋 Jules, reporting for duty! I'm here to lend a hand with this pull request. When you start a review, I'll add a 👀 emoji to each comment to let you know I've read it. I'll focus on feedback directed at me and will do my best to stay out of conversations between you and other bots or reviewers to keep the noise down. I'll push a commit with your requested changes shortly after. Please note there might be a delay between these steps, but rest assured I'm on the job! For more direct control, you can switch me to Reactive Mode. When this mode is on, I will only act on comments where you specifically mention me with New to Jules? Learn more at jules.google/docs. For security, I will only act on instructions from the user who triggered this task. |
Summary of ChangesHello @Krosebrook, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request significantly enhances the security posture of the application by introducing server-side validation for audio file uploads. By inspecting the "magic bytes" at the beginning of uploaded files, the system can now accurately determine if a file is a genuine audio format, thereby mitigating risks associated with disguised or malicious file uploads. This ensures that only expected and safe audio content is processed, improving the overall integrity and stability of the service. Highlights
Changelog
Activity
Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here. You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension. Footnotes
|
📝 WalkthroughWalkthroughThe changes add audio format detection to validate uploaded audio files before processing. A new utility function identifies audio formats via magic bytes, which is integrated into the audio generation route to reject invalid formats with a 400 error, and covered with comprehensive tests for multiple audio format signatures. Changes
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes Poem
🚥 Pre-merge checks | ✅ 2 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Code Review
The pull request introduces a robust server-side validation for audio uploads using magic bytes, which significantly improves the security of the application. The implementation of detectAudioFormat is logically sound and covers the required formats. I have suggested a few improvements regarding efficiency and ensuring consistency between the detected format and the data URL used for external API calls.
| export function detectAudioFormat(buffer: Buffer): string | null { | ||
| if (!buffer || buffer.length < 4) return null; | ||
|
|
||
| // WAV: RIFF .... WAVE | ||
| if (buffer.length >= 12 && | ||
| buffer.toString('hex', 0, 4) === '52494646' && | ||
| buffer.toString('hex', 8, 12) === '57415645') { | ||
| return 'audio/wav'; | ||
| } | ||
|
|
||
| // MP3: ID3 | ||
| if (buffer.toString('hex', 0, 3) === '494433') { | ||
| return 'audio/mpeg'; | ||
| } | ||
|
|
||
| // AAC: ADTS Header | ||
| // Sync (12 bits) means first 4 bits of byte 1 are 1. | ||
| // Layer (bits 1,2 of byte 1) is 00. | ||
| // Mask 0xF6 (1111 0110) should result in 0xF0 (1111 0000) | ||
| if (buffer[0] === 0xFF && (buffer[1] & 0xF6) === 0xF0) { | ||
| return 'audio/aac'; | ||
| } | ||
|
|
||
| // MP3: Frame Sync (11 bits) means first 3 bits of byte 1 are 1. | ||
| // Layer (bits 1,2 of byte 1) is NOT 00. | ||
| if (buffer[0] === 0xFF && (buffer[1] & 0xE0) === 0xE0 && (buffer[1] & 0x06) !== 0x00) { | ||
| return 'audio/mpeg'; | ||
| } | ||
|
|
||
| // OGG: OggS | ||
| if (buffer.toString('utf8', 0, 4) === 'OggS') { | ||
| return 'audio/ogg'; | ||
| } | ||
|
|
||
| // FLAC: fLaC | ||
| if (buffer.toString('utf8', 0, 4) === 'fLaC') { | ||
| return 'audio/flac'; | ||
| } | ||
|
|
||
| // M4A/AAC (MP4 container): ftypM4A or ftypmp42 or similar | ||
| // Looking for 'ftyp' at offset 4 | ||
| if (buffer.length >= 8 && buffer.toString('utf8', 4, 8) === 'ftyp') { | ||
| return 'audio/mp4'; // Covers m4a, mp4 audio | ||
| } | ||
|
|
||
| return null; | ||
| } |
There was a problem hiding this comment.
The current implementation of detectAudioFormat creates multiple strings (hex and utf8) for each validation check. For better performance and reduced memory allocation, especially when processing binary data, it is recommended to perform direct byte comparisons on the buffer.
export function detectAudioFormat(buffer: Buffer): string | null {
if (!buffer || buffer.length < 4) return null;
// WAV: RIFF .... WAVE
if (buffer.length >= 12 &&
buffer[0] === 0x52 && buffer[1] === 0x49 && buffer[2] === 0x46 && buffer[3] === 0x46 &&
buffer[8] === 0x57 && buffer[9] === 0x41 && buffer[10] === 0x56 && buffer[11] === 0x45) {
return 'audio/wav';
}
// MP3: ID3
if (buffer[0] === 0x49 && buffer[1] === 0x44 && buffer[2] === 0x33) {
return 'audio/mpeg';
}
// AAC: ADTS Header
if (buffer[0] === 0xFF && (buffer[1] & 0xF6) === 0xF0) {
return 'audio/aac';
}
// MP3: Frame Sync
if (buffer[0] === 0xFF && (buffer[1] & 0xE0) === 0xE0 && (buffer[1] & 0x06) !== 0x00) {
return 'audio/mpeg';
}
// OGG: OggS
if (buffer[0] === 0x4F && buffer[1] === 0x67 && buffer[2] === 0x67 && buffer[3] === 0x53) {
return 'audio/ogg';
}
// FLAC: fLaC
if (buffer[0] === 0x66 && buffer[1] === 0x4C && buffer[2] === 0x61 && buffer[3] === 0x43) {
return 'audio/flac';
}
// M4A/AAC (MP4 container): ftyp
if (buffer.length >= 8 &&
buffer[4] === 0x66 && buffer[5] === 0x74 && buffer[6] === 0x79 && buffer[7] === 0x70) {
return 'audio/mp4';
}
return null;
}There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: abec10fc9e
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| if (buffer.length >= 8 && buffer.toString('utf8', 4, 8) === 'ftyp') { | ||
| return 'audio/mp4'; // Covers m4a, mp4 audio |
There was a problem hiding this comment.
Restrict MP4 magic-byte detection to audio brands
The new detectAudioFormat logic classifies any buffer with ftyp at offset 4 as audio/mp4, but many non-audio ISO-BMFF files (e.g., HEIC/AVIF images and video MP4s) also match this signature; because this function is now the content gate for /api/audio/generate-with-reference, such files can pass the new validation and be forwarded as reference audio. This weakens the intended security check—ftyp should be combined with an allowed major brand (or deeper container inspection) before accepting as audio.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
Pull request overview
This PR adds server-side validation of audio file uploads by inspecting magic bytes to prevent malicious files disguised as audio. A new detectAudioFormat utility function validates file signatures for MP3, WAV, OGG, FLAC, M4A, and AAC formats, integrated into the /api/audio/generate-with-reference endpoint with comprehensive unit tests.
Changes:
- Added
detectAudioFormatfunction inserver/utils.tsto validate audio file formats via magic byte inspection - Integrated magic byte validation into the audio upload endpoint in
server/routes.ts - Added comprehensive unit tests in
server/utils.test.tscovering various audio formats and edge cases
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 8 comments.
| File | Description |
|---|---|
| server/utils.ts | Implements detectAudioFormat function with magic byte checks for WAV, MP3, AAC, OGG, FLAC, and M4A/MP4 formats |
| server/utils.test.ts | Adds test cases for audio format detection including valid formats and rejection of invalid inputs |
| server/routes.ts | Integrates magic byte validation into /api/audio/generate-with-reference endpoint before processing uploads |
| name: "Detect M4A/MP4 (ftypM4A)", | ||
| input: Buffer.from("00000020667479704d344120", "hex"), // ...ftypM4A | ||
| expected: "audio/mp4" | ||
| }, | ||
| { |
There was a problem hiding this comment.
The test case for M4A/MP4 detection doesn't validate the major brand checks needed to distinguish audio-only containers from video containers. According to the codebase pattern (server/utils.test.ts:110-124 from memories), tests should include cases for M4A (4d344120), M4B (4d344220), M4P (4d345020) acceptance and isom rejection. The current test only checks for 'ftyp' presence which is insufficient.
| name: "Detect M4A/MP4 (ftypM4A)", | |
| input: Buffer.from("00000020667479704d344120", "hex"), // ...ftypM4A | |
| expected: "audio/mp4" | |
| }, | |
| { | |
| name: "Detect M4A (ftypM4A)", | |
| input: Buffer.from("00000020667479704d344120", "hex"), // size(32) + ftyp + major brand "M4A " | |
| expected: "audio/mp4" | |
| }, | |
| { | |
| name: "Detect M4B (ftypM4B)", | |
| input: Buffer.from("00000020667479704d344220", "hex"), // size(32) + ftyp + major brand "M4B " | |
| expected: "audio/mp4" | |
| }, | |
| { | |
| name: "Detect M4P (ftypM4P)", | |
| input: Buffer.from("00000020667479704d345020", "hex"), // size(32) + ftyp + major brand "M4P " | |
| expected: "audio/mp4" | |
| }, | |
| { | |
| name: "Reject MP4 video (ftypisom)", | |
| input: Buffer.from("000000206674797069736f6d", "hex"), // size(32) + ftyp + major brand "isom" | |
| expected: null | |
| }, | |
| { |
| input: Buffer.from("FFF15080", "hex"), | ||
| expected: "audio/aac" | ||
| }, | ||
| { |
There was a problem hiding this comment.
The test suite lacks coverage for MP3 files without ID3 tags (frame sync only). The current test only covers MP3 with ID3 tags (line 65), but the implementation also supports MP3 frame sync detection (server/utils.ts:69-71). Add a test case with MP3 frame sync bytes (e.g., Buffer.from("FFE30000", "hex")) to ensure both detection paths work correctly and don't conflict with AAC detection.
| { | |
| { | |
| name: "Detect MP3 (frame sync)", | |
| input: Buffer.from("FFE30000", "hex"), | |
| expected: "audio/mpeg" | |
| }, | |
| { |
| return res.status(400).json({ message: "Reference audio file is required" }); | ||
| } | ||
|
|
||
| // Sentinel: Validate file content using magic bytes |
There was a problem hiding this comment.
The comment uses the "Sentinel:" branded prefix, which violates the established codebase convention. According to the style guide, comments should not use branded prefixes like "Sentinel:", "Tool:", or "AI:". Use a simple descriptive comment instead, such as "Validate file content using magic bytes".
| export function detectAudioFormat(buffer: Buffer): string | null { | ||
| if (!buffer || buffer.length < 4) return null; | ||
|
|
||
| // WAV: RIFF .... WAVE | ||
| if (buffer.length >= 12 && | ||
| buffer.toString('hex', 0, 4) === '52494646' && | ||
| buffer.toString('hex', 8, 12) === '57415645') { | ||
| return 'audio/wav'; | ||
| } | ||
|
|
||
| // MP3: ID3 | ||
| if (buffer.toString('hex', 0, 3) === '494433') { | ||
| return 'audio/mpeg'; | ||
| } | ||
|
|
||
| // AAC: ADTS Header | ||
| // Sync (12 bits) means first 4 bits of byte 1 are 1. | ||
| // Layer (bits 1,2 of byte 1) is 00. | ||
| // Mask 0xF6 (1111 0110) should result in 0xF0 (1111 0000) | ||
| if (buffer[0] === 0xFF && (buffer[1] & 0xF6) === 0xF0) { | ||
| return 'audio/aac'; | ||
| } | ||
|
|
||
| // MP3: Frame Sync (11 bits) means first 3 bits of byte 1 are 1. | ||
| // Layer (bits 1,2 of byte 1) is NOT 00. | ||
| if (buffer[0] === 0xFF && (buffer[1] & 0xE0) === 0xE0 && (buffer[1] & 0x06) !== 0x00) { | ||
| return 'audio/mpeg'; | ||
| } | ||
|
|
||
| // OGG: OggS | ||
| if (buffer.toString('utf8', 0, 4) === 'OggS') { | ||
| return 'audio/ogg'; | ||
| } | ||
|
|
||
| // FLAC: fLaC | ||
| if (buffer.toString('utf8', 0, 4) === 'fLaC') { | ||
| return 'audio/flac'; | ||
| } | ||
|
|
||
| // M4A/AAC (MP4 container): ftypM4A or ftypmp42 or similar | ||
| // Looking for 'ftyp' at offset 4 | ||
| if (buffer.length >= 8 && buffer.toString('utf8', 4, 8) === 'ftyp') { | ||
| return 'audio/mp4'; // Covers m4a, mp4 audio | ||
| } | ||
|
|
||
| return null; | ||
| } |
There was a problem hiding this comment.
This function duplicates existing functionality. There is already a detectAudioFormat function in server/replit_integrations/audio/client.ts (lines 20-47) that serves a similar purpose. The existing function returns an AudioFormat type instead of MIME type strings and includes WebM support. Consider consolidating these functions to avoid code duplication and maintenance overhead. Either reuse the existing function or refactor to have a single source of truth for audio format detection.
| // M4A/AAC (MP4 container): ftypM4A or ftypmp42 or similar | ||
| // Looking for 'ftyp' at offset 4 | ||
| if (buffer.length >= 8 && buffer.toString('utf8', 4, 8) === 'ftyp') { | ||
| return 'audio/mp4'; // Covers m4a, mp4 audio |
There was a problem hiding this comment.
The M4A/MP4 container detection accepts any file with 'ftyp' at offset 4, which is too broad and could lead to accepting non-audio MP4 files (e.g., video files). According to established patterns in the codebase (server/utils.ts:95-102 from memories), M4A validation should check the major brand at bytes 8-11 and accept only M4A (4d344120), M4B (4d344220), M4P (4d345020) brands while rejecting generic MP4 brands like isom, mp41, mp42. This prevents accepting video files disguised as audio.
| // M4A/AAC (MP4 container): ftypM4A or ftypmp42 or similar | |
| // Looking for 'ftyp' at offset 4 | |
| if (buffer.length >= 8 && buffer.toString('utf8', 4, 8) === 'ftyp') { | |
| return 'audio/mp4'; // Covers m4a, mp4 audio | |
| // M4A-family (MP4 container): ftyp + specific major brands (M4A/M4B/M4P) | |
| // Looking for 'ftyp' at offset 4 and validating major brand at bytes 8-11 | |
| if (buffer.length >= 12 && buffer.toString('utf8', 4, 8) === 'ftyp') { | |
| const majorBrandHex = buffer.toString('hex', 8, 12).toLowerCase(); | |
| if ( | |
| majorBrandHex === '4d344120' || // "M4A " | |
| majorBrandHex === '4d344220' || // "M4B " | |
| majorBrandHex === '4d345020' // "M4P " | |
| ) { | |
| return 'audio/mp4'; // Covers m4a/m4b/m4p audio containers | |
| } |
| // Sync (12 bits) means first 4 bits of byte 1 are 1. | ||
| // Layer (bits 1,2 of byte 1) is 00. | ||
| // Mask 0xF6 (1111 0110) should result in 0xF0 (1111 0000) | ||
| if (buffer[0] === 0xFF && (buffer[1] & 0xF6) === 0xF0) { | ||
| return 'audio/aac'; | ||
| } | ||
|
|
||
| // MP3: Frame Sync (11 bits) means first 3 bits of byte 1 are 1. | ||
| // Layer (bits 1,2 of byte 1) is NOT 00. |
There was a problem hiding this comment.
The AAC ADTS detection logic is too permissive and could create false positives with MP3 files. According to established patterns (server/utils.ts:96-98 from memories), AAC ADTS files should use explicit checks for 0xFF 0xF1 (MPEG-4) and 0xFF 0xF9 (MPEG-2) rather than bit masking to avoid ambiguity with MP3 sync frames. The current mask (buffer[1] & 0xF6) === 0xF0 would match bytes like 0xF1, 0xF5, 0xF9, but also 0xF0, 0xF4, 0xF8 which might not be valid AAC. Using explicit byte value checks reduces the risk of misidentification.
| // Sync (12 bits) means first 4 bits of byte 1 are 1. | |
| // Layer (bits 1,2 of byte 1) is 00. | |
| // Mask 0xF6 (1111 0110) should result in 0xF0 (1111 0000) | |
| if (buffer[0] === 0xFF && (buffer[1] & 0xF6) === 0xF0) { | |
| return 'audio/aac'; | |
| } | |
| // MP3: Frame Sync (11 bits) means first 3 bits of byte 1 are 1. | |
| // Layer (bits 1,2 of byte 1) is NOT 00. | |
| // MPEG-4: 0xFF 0xF1 | |
| // MPEG-2: 0xFF 0xF9 | |
| if (buffer[0] === 0xFF && (buffer[1] === 0xF1 || buffer[1] === 0xF9)) { | |
| return 'audio/aac'; | |
| } | |
| // MP3: Frame Sync (11 bits) means first 3 bits of byte 1 are 1. | |
| // Layer (bits 1,2 of byte 1) is NOT 00. | |
| // Layer (bits 1,2 of byte 1) is NOT 00. |
| console.error(`Invalid file format detected. Mime: ${file.mimetype}`); | ||
| return res.status(400).json({ message: "Invalid audio file format. Please upload a valid audio file." }); |
There was a problem hiding this comment.
According to the security logging memory (server/routes.ts:1151-1158), file upload validation failures should be logged with sanitizeLog() including userId, fileSize, mimeType, and originalName for security monitoring and incident response. The current implementation only logs the MIME type without using sanitizeLog() and doesn't include other security-relevant metadata like userId or fileSize. This makes it harder to track and investigate potential security incidents.
| // AAC: ADTS Header | ||
| // Sync (12 bits) means first 4 bits of byte 1 are 1. | ||
| // Layer (bits 1,2 of byte 1) is 00. | ||
| // Mask 0xF6 (1111 0110) should result in 0xF0 (1111 0000) | ||
| if (buffer[0] === 0xFF && (buffer[1] & 0xF6) === 0xF0) { | ||
| return 'audio/aac'; | ||
| } | ||
|
|
||
| // MP3: Frame Sync (11 bits) means first 3 bits of byte 1 are 1. | ||
| // Layer (bits 1,2 of byte 1) is NOT 00. | ||
| if (buffer[0] === 0xFF && (buffer[1] & 0xE0) === 0xE0 && (buffer[1] & 0x06) !== 0x00) { | ||
| return 'audio/mpeg'; | ||
| } |
There was a problem hiding this comment.
The AAC ADTS detection (lines 63-65) is placed before the MP3 frame sync detection (lines 69-71). This ordering could cause MP3 files without ID3 tags to be misidentified as AAC, since MP3 frame sync bytes (0xFF 0xE0-0xFF) could match the AAC check depending on the specific byte values. The AAC check should be made more restrictive (as noted in another comment) or moved after the MP3 frame sync check, or both checks should be mutually exclusive to prevent misidentification.
There was a problem hiding this comment.
2 issues found across 3 files
Prompt for AI agents (all issues)
Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.
<file name="server/routes.ts">
<violation number="1" location="server/routes.ts:1152">
P2: The magic‑byte detection is only used as a boolean check, but the untrusted `file.mimetype` is still used to build the data URL. A spoofed MIME can pass validation and be forwarded with the wrong type. Use the detected format (and/or reject mismatches) so downstream processing uses the verified MIME.</violation>
</file>
<file name="server/utils.ts">
<violation number="1" location="server/utils.ts:85">
P2: The `ftyp` check is too permissive for audio validation. `ftyp` is present in any MP4 container (including video), so this accepts video MP4s as `audio/mp4`, undermining the audio-only validation. Consider validating the MP4 brand/compatible brands or inspecting tracks to ensure the container is audio-only.</violation>
</file>
Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.
| } | ||
|
|
||
| // Sentinel: Validate file content using magic bytes | ||
| const detectedFormat = detectAudioFormat(file.buffer); |
There was a problem hiding this comment.
P2: The magic‑byte detection is only used as a boolean check, but the untrusted file.mimetype is still used to build the data URL. A spoofed MIME can pass validation and be forwarded with the wrong type. Use the detected format (and/or reject mismatches) so downstream processing uses the verified MIME.
Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At server/routes.ts, line 1152:
<comment>The magic‑byte detection is only used as a boolean check, but the untrusted `file.mimetype` is still used to build the data URL. A spoofed MIME can pass validation and be forwarded with the wrong type. Use the detected format (and/or reject mismatches) so downstream processing uses the verified MIME.</comment>
<file context>
@@ -1147,6 +1148,13 @@ Also suggest a fitting title for the song.`;
}
+ // Sentinel: Validate file content using magic bytes
+ const detectedFormat = detectAudioFormat(file.buffer);
+ if (!detectedFormat) {
+ console.error(`Invalid file format detected. Mime: ${file.mimetype}`);
</file context>
|
|
||
| // M4A/AAC (MP4 container): ftypM4A or ftypmp42 or similar | ||
| // Looking for 'ftyp' at offset 4 | ||
| if (buffer.length >= 8 && buffer.toString('utf8', 4, 8) === 'ftyp') { |
There was a problem hiding this comment.
P2: The ftyp check is too permissive for audio validation. ftyp is present in any MP4 container (including video), so this accepts video MP4s as audio/mp4, undermining the audio-only validation. Consider validating the MP4 brand/compatible brands or inspecting tracks to ensure the container is audio-only.
Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At server/utils.ts, line 85:
<comment>The `ftyp` check is too permissive for audio validation. `ftyp` is present in any MP4 container (including video), so this accepts video MP4s as `audio/mp4`, undermining the audio-only validation. Consider validating the MP4 brand/compatible brands or inspecting tracks to ensure the container is audio-only.</comment>
<file context>
@@ -40,3 +40,51 @@ export function sanitizeLog(data: any): any {
+
+ // M4A/AAC (MP4 container): ftypM4A or ftypmp42 or similar
+ // Looking for 'ftyp' at offset 4
+ if (buffer.length >= 8 && buffer.toString('utf8', 4, 8) === 'ftyp') {
+ return 'audio/mp4'; // Covers m4a, mp4 audio
+ }
</file context>
There was a problem hiding this comment.
Actionable comments posted: 3
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
server/routes.ts (1)
1151-1164:⚠️ Potential issue | 🟠 Major
detectedFormatis computed but then discarded — the data URL is still built from client-suppliedfile.mimetype.The magic-bytes check only acts as a boolean gate (
if (!detectedFormat)). When validation passes,file.mimetype— which is client-controlled and may have already been spoofed past the multer MIME filter — is used verbatim in the data URL at line 1164. This defeats the primary purpose of the detection: ensuring the downstream service receives a data URL whose MIME type actually matches the file content.Replace
file.mimetypewithdetectedFormatwhen constructing the data URL:🛡️ Proposed fix
const detectedFormat = detectAudioFormat(file.buffer); if (!detectedFormat) { - console.error(`Invalid file format detected. Mime: ${file.mimetype}`); + console.warn("Invalid audio file format detected", { mimetype: file.mimetype }); return res.status(400).json({ message: "Invalid audio file format. Please upload a valid audio file." }); } const { prompt, duration } = req.body; ... const base64 = file.buffer.toString("base64"); - const dataUrl = `data:${file.mimetype};base64,${base64}`; + const dataUrl = `data:${detectedFormat};base64,${base64}`;🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@server/routes.ts` around lines 1151 - 1164, detectedFormat from detectAudioFormat is validated but then ignored; replace use of the client-supplied file.mimetype when building the data URL so the MIME reflects the detected format. Update the construction of dataUrl to use detectedFormat (the value returned by detectAudioFormat) instead of file.mimetype so downstream consumers receive an accurate MIME type; ensure any subsequent logic that reads dataUrl still works with the new MIME string and keep base64 encoding of file.buffer unchanged.
🧹 Nitpick comments (4)
server/utils.ts (2)
44-44: Add JSDoc to this public exported function.
detectAudioFormatis part of the module's public surface but has no JSDoc, which violates the "Add JSDoc comments for public APIs" guideline.📝 Suggested addition
+/** + * Detects the audio format of a file buffer by inspecting its magic bytes. + * `@param` buffer - Raw file bytes to inspect (minimum 4 bytes required). + * `@returns` MIME type string (e.g. `'audio/wav'`) if a known format is detected, or `null`. + */ export function detectAudioFormat(buffer: Buffer): string | null {As per coding guidelines, "Add JSDoc comments for public APIs".
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@server/utils.ts` at line 44, Add a JSDoc block for the exported function detectAudioFormat describing its purpose, parameters and return value: document that it accepts a Node.js Buffer (parameter name buffer) containing audio data and that it returns a string with the detected audio format or null if unknown; include `@param` {Buffer} buffer and `@returns` {string|null} annotations and a short one-line description (optionally an example or notes about supported formats) so the public API is properly documented.
48-87: Inconsistent byte-inspection style across the function.Three different approaches are used to read the same underlying bytes:
buffer.toString('hex', …)for WAV and MP3-ID3 (lines 49–50, 55)buffer[n]direct byte access for AAC and MP3 frame sync (lines 63–64, 69)buffer.toString('utf8', …)for OGG, FLAC, and M4A (lines 74, 79, 85)Direct byte comparisons (
buffer[n]) are both the fastest and the most readable for this use-case. Standardising on them would also make it easier to compare with the existingclient.tsimplementation.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@server/utils.ts` around lines 48 - 87, The detection code mixes buffer.toString() and buffer[n] checks; standardize to direct byte comparisons on the same buffer variable: replace the WAV check (currently using buffer.toString('hex', 0, 4) and buffer.toString('hex', 8, 12)) with explicit byte checks for 'R','I','F','F' (0x52,0x49,0x46,0x46) and 'W','A','V','E' (0x57,0x41,0x56,0x45); replace the ID3 MP3 check (buffer.toString('hex', 0, 3) === '494433') with checks for 'I','D','3' bytes; replace OggS, fLaC and the 'ftyp' offset-4 MP4/M4A check (currently using buffer.toString('utf8', ...)) with byte comparisons for 'O','g','g','S', 'f','L','a','C', and 'f','t','y','p' at offset 4 respectively; keep the existing AAC and MP3 frame-sync logic that already uses buffer[n]; ensure bounds checks (buffer.length) remain before accessing indexes.server/utils.test.ts (2)
62-108: Missing test case for the MP3 frame-sync detection path (lines 69–71 ofutils.ts).The existing MP3 test covers only the ID3 header path (
buffer[0-2] = 0x49 0x44 0x33). The second MP3 branch — raw MPEG frame sync (0xFF 0xFBetc.) — has no coverage. A typical MPEG1 Layer III frame starts with0xFF 0xFBor0xFF 0xFA; these should be verified to return'audio/mpeg'and should also confirm the AAC gate doesn't mis-classify them.➕ Suggested additional test cases
+ { + name: "Detect MP3 (frame sync, MPEG1 Layer III)", + input: Buffer.from("FFFB9400", "hex"), // 0xFF 0xFB: MPEG1, Layer III, no CRC + expected: "audio/mpeg" + }, + { + name: "Detect MP3 (frame sync, MPEG2 Layer III)", + input: Buffer.from("FFF39400", "hex"), // 0xFF 0xF3: MPEG2, Layer III, no CRC + expected: "audio/mpeg" + },As per coding guidelines, "Test utility functions and data transformations".
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@server/utils.test.ts` around lines 62 - 108, Add test cases to cover the MP3 frame-sync branch in utils.ts (lines 69–71) by extending the existing audioTestCases: include at least two inputs whose first bytes are 0xFF 0xFB and 0xFF 0xFA (typical MPEG1 Layer III frame headers) and assert the detection returns "audio/mpeg"; place these cases alongside the other entries in audioTestCases and ensure they also verify the AAC detection path does not mis-classify these buffers (i.e., expected remains "audio/mpeg").
73-82: OGG/FLAC test buffers useutf8encoding — the trailing bytes are ASCII digit characters, not binary zeros.
Buffer.from("OggS00020000000000000000", "utf8")produces bytes where the'0'and'2'characters are0x30and0x32(ASCII), not0x00/0x02. The tests still pass since only the first 4 bytes are checked, but the intent looks like real file header data. Use hex encoding for clarity.♻️ Clearer test data construction
- input: Buffer.from("OggS00020000000000000000", "utf8"), + input: Buffer.from("4f67675300020000000000000000000000000000", "hex"), // OggS + real OGG capture page header bytes - input: Buffer.from("fLaC00000022", "utf8"), + input: Buffer.from("664c614300000022", "hex"), // fLaC + STREAMINFO block header🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@server/utils.test.ts` around lines 73 - 82, The test buffers for "Detect OGG" and "Detect FLAC" are constructed with utf8, so characters like '0' and '2' become ASCII bytes instead of the intended binary zeros and 0x02; update the Buffer.from calls in server/utils.test.ts for the "Detect OGG" and "Detect FLAC" cases to use hex-encoded literals representing the real file headers (e.g., OggS -> hex "4f676753000200000000000000" and fLaC -> hex "664c614300000022") and call Buffer.from(..., "hex") so the tests use actual binary header bytes.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@server/routes.ts`:
- Line 1154: Replace the console.error call that logs user-supplied
file.mimetype with a warning-level log and sanitize the value first: change the
call from console.error(...) to a warning (e.g., console.warn(...) or your
structured logger's warn method) and pass sanitizeLog(file.mimetype) instead of
file.mimetype so the output uses sanitized user data; keep the descriptive
message (e.g., "Invalid file format detected. Mime:") and concatenate or
template it with the sanitized value.
In `@server/utils.ts`:
- Around line 44-89: There are two divergent detectAudioFormat implementations
(this one and the one in the audio client module); consolidate by extracting a
single canonical utility, update both callers to import it, and ensure the
canonical function returns MIME types and covers all formats needed (WAV, MP3,
AAC/ADTS, MP4/M4A, OGG, FLAC, WebM if required); specifically, move logic from
this detectAudioFormat into a shared utility (e.g., shared audio detection
helper), reconcile minimum buffer checks (use safe length checks like >=12 where
needed), preserve/merge checks from the other implementation (frame sync, ADTS,
'ftyp' at offset 4, 'OggS', 'fLaC'), export the unified detectAudioFormat and
replace both local implementations with imports.
- Around line 83-87: The current detection logic that checks
buffer.toString('utf8', 4, 8) === 'ftyp' is too broad and will match video
MP4/MOV; update the check inside the MIME-detection routine (the branch that
currently returns 'audio/mp4') to also verify the brand bytes at offset 8–12 and
only treat as audio when the brand is an audio-specific brand (e.g., 'M4A ',
'M4B ', 'm4a ', 'mp42' as appropriate); specifically, update the condition
around the buffer.toString(...) ftyp check and add a secondary
buffer.toString('utf8', 8, 12) in the same function to match allowed audio
brands before returning 'audio/mp4'.
---
Outside diff comments:
In `@server/routes.ts`:
- Around line 1151-1164: detectedFormat from detectAudioFormat is validated but
then ignored; replace use of the client-supplied file.mimetype when building the
data URL so the MIME reflects the detected format. Update the construction of
dataUrl to use detectedFormat (the value returned by detectAudioFormat) instead
of file.mimetype so downstream consumers receive an accurate MIME type; ensure
any subsequent logic that reads dataUrl still works with the new MIME string and
keep base64 encoding of file.buffer unchanged.
---
Nitpick comments:
In `@server/utils.test.ts`:
- Around line 62-108: Add test cases to cover the MP3 frame-sync branch in
utils.ts (lines 69–71) by extending the existing audioTestCases: include at
least two inputs whose first bytes are 0xFF 0xFB and 0xFF 0xFA (typical MPEG1
Layer III frame headers) and assert the detection returns "audio/mpeg"; place
these cases alongside the other entries in audioTestCases and ensure they also
verify the AAC detection path does not mis-classify these buffers (i.e.,
expected remains "audio/mpeg").
- Around line 73-82: The test buffers for "Detect OGG" and "Detect FLAC" are
constructed with utf8, so characters like '0' and '2' become ASCII bytes instead
of the intended binary zeros and 0x02; update the Buffer.from calls in
server/utils.test.ts for the "Detect OGG" and "Detect FLAC" cases to use
hex-encoded literals representing the real file headers (e.g., OggS -> hex
"4f676753000200000000000000" and fLaC -> hex "664c614300000022") and call
Buffer.from(..., "hex") so the tests use actual binary header bytes.
In `@server/utils.ts`:
- Line 44: Add a JSDoc block for the exported function detectAudioFormat
describing its purpose, parameters and return value: document that it accepts a
Node.js Buffer (parameter name buffer) containing audio data and that it returns
a string with the detected audio format or null if unknown; include `@param`
{Buffer} buffer and `@returns` {string|null} annotations and a short one-line
description (optionally an example or notes about supported formats) so the
public API is properly documented.
- Around line 48-87: The detection code mixes buffer.toString() and buffer[n]
checks; standardize to direct byte comparisons on the same buffer variable:
replace the WAV check (currently using buffer.toString('hex', 0, 4) and
buffer.toString('hex', 8, 12)) with explicit byte checks for 'R','I','F','F'
(0x52,0x49,0x46,0x46) and 'W','A','V','E' (0x57,0x41,0x56,0x45); replace the ID3
MP3 check (buffer.toString('hex', 0, 3) === '494433') with checks for
'I','D','3' bytes; replace OggS, fLaC and the 'ftyp' offset-4 MP4/M4A check
(currently using buffer.toString('utf8', ...)) with byte comparisons for
'O','g','g','S', 'f','L','a','C', and 'f','t','y','p' at offset 4 respectively;
keep the existing AAC and MP3 frame-sync logic that already uses buffer[n];
ensure bounds checks (buffer.length) remain before accessing indexes.
| // Sentinel: Validate file content using magic bytes | ||
| const detectedFormat = detectAudioFormat(file.buffer); | ||
| if (!detectedFormat) { | ||
| console.error(`Invalid file format detected. Mime: ${file.mimetype}`); |
There was a problem hiding this comment.
console.error inappropriate for a validation failure; log user-controlled data through sanitizeLog.
Two issues on this line:
- A 400 validation rejection is not an unexpected error —
console.erroris the wrong level. Useconsole.warn(or a structured info-level log). file.mimetypeis client-supplied and logged directly. Per guidelines, user-controlled data should pass throughsanitizeLog()before logging.
📝 Proposed fix
- console.error(`Invalid file format detected. Mime: ${file.mimetype}`);
+ console.warn("Rejected upload: unrecognised audio format", sanitizeLog({ mimetype: file.mimetype }));As per coding guidelines, "Use sanitizeLog() before logging sensitive data".
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| console.error(`Invalid file format detected. Mime: ${file.mimetype}`); | |
| console.warn("Rejected upload: unrecognised audio format", sanitizeLog({ mimetype: file.mimetype })); |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@server/routes.ts` at line 1154, Replace the console.error call that logs
user-supplied file.mimetype with a warning-level log and sanitize the value
first: change the call from console.error(...) to a warning (e.g.,
console.warn(...) or your structured logger's warn method) and pass
sanitizeLog(file.mimetype) instead of file.mimetype so the output uses sanitized
user data; keep the descriptive message (e.g., "Invalid file format detected.
Mime:") and concatenate or template it with the sanitized value.
| export function detectAudioFormat(buffer: Buffer): string | null { | ||
| if (!buffer || buffer.length < 4) return null; | ||
|
|
||
| // WAV: RIFF .... WAVE | ||
| if (buffer.length >= 12 && | ||
| buffer.toString('hex', 0, 4) === '52494646' && | ||
| buffer.toString('hex', 8, 12) === '57415645') { | ||
| return 'audio/wav'; | ||
| } | ||
|
|
||
| // MP3: ID3 | ||
| if (buffer.toString('hex', 0, 3) === '494433') { | ||
| return 'audio/mpeg'; | ||
| } | ||
|
|
||
| // AAC: ADTS Header | ||
| // Sync (12 bits) means first 4 bits of byte 1 are 1. | ||
| // Layer (bits 1,2 of byte 1) is 00. | ||
| // Mask 0xF6 (1111 0110) should result in 0xF0 (1111 0000) | ||
| if (buffer[0] === 0xFF && (buffer[1] & 0xF6) === 0xF0) { | ||
| return 'audio/aac'; | ||
| } | ||
|
|
||
| // MP3: Frame Sync (11 bits) means first 3 bits of byte 1 are 1. | ||
| // Layer (bits 1,2 of byte 1) is NOT 00. | ||
| if (buffer[0] === 0xFF && (buffer[1] & 0xE0) === 0xE0 && (buffer[1] & 0x06) !== 0x00) { | ||
| return 'audio/mpeg'; | ||
| } | ||
|
|
||
| // OGG: OggS | ||
| if (buffer.toString('utf8', 0, 4) === 'OggS') { | ||
| return 'audio/ogg'; | ||
| } | ||
|
|
||
| // FLAC: fLaC | ||
| if (buffer.toString('utf8', 0, 4) === 'fLaC') { | ||
| return 'audio/flac'; | ||
| } | ||
|
|
||
| // M4A/AAC (MP4 container): ftypM4A or ftypmp42 or similar | ||
| // Looking for 'ftyp' at offset 4 | ||
| if (buffer.length >= 8 && buffer.toString('utf8', 4, 8) === 'ftyp') { | ||
| return 'audio/mp4'; // Covers m4a, mp4 audio | ||
| } | ||
|
|
||
| return null; |
There was a problem hiding this comment.
🛠️ Refactor suggestion | 🟠 Major
Duplicate implementation — consolidate with existing detectAudioFormat in server/replit_integrations/audio/client.ts.
A function with the same name and purpose already exists in server/replit_integrations/audio/client.ts (lines 19–46). The two implementations diverge in return type (string | null here vs. short format names there), minimum buffer length (4 vs. 12 bytes), and format coverage (FLAC/AAC here; WebM there). Having two parallel implementations will inevitably drift.
Consider extending the existing function (or extracting a shared utility in @shared/) to return the MIME type and cover the missing formats, then importing from one canonical location.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@server/utils.ts` around lines 44 - 89, There are two divergent
detectAudioFormat implementations (this one and the one in the audio client
module); consolidate by extracting a single canonical utility, update both
callers to import it, and ensure the canonical function returns MIME types and
covers all formats needed (WAV, MP3, AAC/ADTS, MP4/M4A, OGG, FLAC, WebM if
required); specifically, move logic from this detectAudioFormat into a shared
utility (e.g., shared audio detection helper), reconcile minimum buffer checks
(use safe length checks like >=12 where needed), preserve/merge checks from the
other implementation (frame sync, ADTS, 'ftyp' at offset 4, 'OggS', 'fLaC'),
export the unified detectAudioFormat and replace both local implementations with
imports.
| // M4A/AAC (MP4 container): ftypM4A or ftypmp42 or similar | ||
| // Looking for 'ftyp' at offset 4 | ||
| if (buffer.length >= 8 && buffer.toString('utf8', 4, 8) === 'ftyp') { | ||
| return 'audio/mp4'; // Covers m4a, mp4 audio | ||
| } |
There was a problem hiding this comment.
ftyp box check at offset 4 matches any ISO BMFF container, including video MP4/MOV.
The check buffer.toString('utf8', 4, 8) === 'ftyp' will match a video .mp4 file equally, because the ftyp box is present in all ISO Base Media File Format containers regardless of whether they carry audio or video. To restrict to audio-only M4A, validate the brand bytes at offset 8–12 (e.g. M4A , M4B , mp42).
🛡️ Tighter brand check
- // M4A/AAC (MP4 container): ftypM4A or ftypmp42 or similar
- // Looking for 'ftyp' at offset 4
- if (buffer.length >= 8 && buffer.toString('utf8', 4, 8) === 'ftyp') {
- return 'audio/mp4'; // Covers m4a, mp4 audio
- }
+ // M4A/AAC (MP4 container): check ftyp box and audio-only brands
+ const AUDIO_MP4_BRANDS = ['M4A ', 'M4B ', 'mp42', 'isom', 'MSNV'];
+ if (buffer.length >= 12 &&
+ buffer.toString('utf8', 4, 8) === 'ftyp' &&
+ AUDIO_MP4_BRANDS.includes(buffer.toString('utf8', 8, 12))) {
+ return 'audio/mp4';
+ }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@server/utils.ts` around lines 83 - 87, The current detection logic that
checks buffer.toString('utf8', 4, 8) === 'ftyp' is too broad and will match
video MP4/MOV; update the check inside the MIME-detection routine (the branch
that currently returns 'audio/mp4') to also verify the brand bytes at offset
8–12 and only treat as audio when the brand is an audio-specific brand (e.g.,
'M4A ', 'M4B ', 'm4a ', 'mp42' as appropriate); specifically, update the
condition around the buffer.toString(...) ftyp check and add a secondary
buffer.toString('utf8', 8, 12) in the same function to match allowed audio
brands before returning 'audio/mp4'.
Implemented server-side validation of audio file uploads by inspecting magic bytes. This ensures that only valid audio files (MP3, WAV, OGG, FLAC, M4A, AAC) are processed, preventing potential security risks associated with malicious file uploads disguised as audio. Added
detectAudioFormatutility and integrated it into the/api/audio/generate-with-referenceroute. Verified with unit tests covering various audio formats and invalid inputs.PR created automatically by Jules for task 15389321388176749818 started by @Krosebrook
Summary by cubic
Adds server-side magic byte validation for audio uploads to block disguised files. Only MP3, WAV, OGG, FLAC, M4A, and AAC are accepted in /api/audio/generate-with-reference.
Written for commit abec10f. Summary will update on new commits.
Summary by CodeRabbit