feat: extract email metadata from file-type messages #192

Open
derodero24 wants to merge 1 commit into korotovsky:master from derodero24:feat/extract-email-file-text

derodero24 commented Feb 8, 2026

Summary

  • Add FilesToText() to extract email metadata (From, CC, Subject) from files[] when msg.Text is empty
  • Email messages forwarded to Slack channels store content in files[] with filetype: "email", resulting in completely empty text output — this PR fills that gap

Ref #191 (partial fix — metadata only; email body requires upstream dependency update)

Before / After

Slack API response (forwarded email)

{
  "text": "",
  "files": [{
    "filetype": "email",
    "subject": "Meeting Tomorrow",
    "from": [{"name": "John Doe", "address": "john@example.com"}],
    "cc": [{"address": "team@example.com"}]
  }]
}

conversations_history CSV output

Before:

MsgID,UserID,UserName,RealName,Channel,ThreadTs,Text,Time,BotName,Cursor
1770523338.574369,USLACKBOT,Email,Email,C001,,,2026-02-08T04:02:18Z,Email,

After:

MsgID,UserID,UserName,RealName,Channel,ThreadTs,Text,Time,BotName,Cursor
1770523338.574369,USLACKBOT,Email,Email,C001,,Email, From: John Doe - john at example.com, CC: team at example.com, Subject: Meeting Tomorrow,2026-02-08T04:02:18Z,Email,
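
As a rough illustration of how the JSON above could map to the Text column, here is a minimal sketch. It is not the code in this PR: `EmailAddress`, `EmailFile`, and `formatAddresses` are hypothetical stand-ins for the real Slack file types, and the `"; "` separator between multiple addresses is an assumption. The sketch emits raw addresses; the `@` to ` at ` rewriting visible in the After row is assumed to happen later in the pipeline and is not reproduced here.

```go
package main

import (
	"fmt"
	"strings"
)

// EmailAddress and EmailFile are hypothetical types that mirror only the
// JSON fields shown above; the real types live in the Slack client library.
type EmailAddress struct {
	Name    string
	Address string
}

type EmailFile struct {
	Filetype string
	Subject  string
	From     []EmailAddress
	Cc       []EmailAddress
}

// formatAddresses renders each entry as "Name - address", falling back to
// whichever part is present. The "; " separator is an assumption.
func formatAddresses(addrs []EmailAddress) string {
	parts := make([]string, 0, len(addrs))
	for _, a := range addrs {
		switch {
		case a.Name != "" && a.Address != "":
			parts = append(parts, a.Name+" - "+a.Address)
		case a.Address != "":
			parts = append(parts, a.Address)
		case a.Name != "":
			parts = append(parts, a.Name)
		}
	}
	return strings.Join(parts, "; ")
}

// FilesToText summarizes every file with filetype "email" and skips the rest.
// It returns "" when no email file is present.
func FilesToText(files []EmailFile) string {
	var out []string
	for _, f := range files {
		if f.Filetype != "email" {
			continue
		}
		fields := []string{"Email"}
		if from := formatAddresses(f.From); from != "" {
			fields = append(fields, "From: "+from)
		}
		if cc := formatAddresses(f.Cc); cc != "" {
			fields = append(fields, "CC: "+cc)
		}
		if f.Subject != "" {
			fields = append(fields, "Subject: "+f.Subject)
		}
		out = append(out, strings.Join(fields, ", "))
	}
	return strings.Join(out, " ")
}

func main() {
	files := []EmailFile{{
		Filetype: "email",
		Subject:  "Meeting Tomorrow",
		From:     []EmailAddress{{Name: "John Doe", Address: "john@example.com"}},
		Cc:       []EmailAddress{{Address: "team@example.com"}},
	}}
	fmt.Println(FilesToText(files))
}
```

Omitted fields (no CC, no subject) simply drop out of the summary, and non-email files contribute nothing, so the function can safely sit at the end of a fallback chain.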

Design decisions

Conflict with #190

This PR conflicts with #190 (Block Kit text extraction) at conversations.go L702, since both modify the same fallback logic. The resolution is to chain the fallbacks:

msgText := msg.Text
if msgText == "" {
    msgText = text.BlocksToText(msg.Blocks)   // #190
}
if msgText == "" {
    msgText = text.FilesToText(msg.Files)      // this PR
}
msgText += text.AttachmentsTo2CSV(msgText, msg.Attachments)

Whichever PR merges first, the other can be rebased with this 3-line addition.
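
If more fallbacks accumulate later, the same ordering could be expressed with a small helper instead of repeated `if` blocks. This is only a sketch of an alternative shape, not what either PR does: `firstNonEmpty` is a hypothetical name, and the closures stand in for `msg.Text`, `text.BlocksToText(msg.Blocks)`, and `text.FilesToText(msg.Files)`.

```go
package main

import "fmt"

// firstNonEmpty (hypothetical helper) calls each candidate in order and
// returns the first non-empty result. Later candidates are not evaluated
// once one succeeds, matching the chained if-blocks above.
func firstNonEmpty(candidates ...func() string) string {
	for _, candidate := range candidates {
		if s := candidate(); s != "" {
			return s
		}
	}
	return ""
}

func main() {
	msgText := firstNonEmpty(
		func() string { return "" }, // stands in for msg.Text
		func() string { return "" }, // stands in for text.BlocksToText(msg.Blocks)
		func() string { return "Email, Subject: Meeting Tomorrow" }, // stands in for text.FilesToText(msg.Files)
	)
	fmt.Println(msgText)
}
```

The explicit `if` chain keeps the diff for each PR to three lines, which is probably the better trade-off while only two fallbacks exist.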

Test plan

  • 7 unit tests for FilesToText (filtering, all field combinations, edge cases)
  • 2 pipeline tests verifying output survives ProcessText/filterSpecialChars
  • Verified against real Slack email messages (with/without CC, with attachments, with inline images)

When emails are forwarded to Slack channels, message content is stored in
files[] with filetype "email" rather than in text or blocks. This adds
FilesToText() to extract From, CC, and Subject metadata as a fallback
when msg.Text is empty, so these messages no longer appear as blank rows
in conversations_history output.

Closes korotovsky#191

@korotovsky (Owner)

@derodero24 Could you please check whether this PR collides with #188?

@derodero24 (Author)

@korotovsky I checked — minimal collision. #188 modifies AttachmentToText internals, while this PR adds new functions below it. The only conflict would be a duplicate import line in the test file. Happy to rebase if #188 merges first.
