@deepfates

Summary

Adds support for importing Bluesky/AT Protocol repository exports (CAR files) alongside existing Twitter archive support.

Changes

  • New source adapter: src/sources/bluesky.ts for parsing AT Protocol CAR files
  • Dependencies: Added @atproto/repo and @atproto/api
  • Default formats: Added json to default output formats (now: markdown, oai, json)
  • Permalinks: Bluesky posts now include bsky.app links in Markdown output
  • Multi-source: Updated transforms to recognize bluesky:post for thread detection
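As a rough illustration of the permalink item above, an AT URI for a post can be mapped onto a bsky.app link. This is only a sketch: `blueskyPermalink` is a hypothetical name, not necessarily what writers.ts uses.

```typescript
// Hypothetical helper (not taken from the PR): turn an AT URI for a post
// into a bsky.app permalink.
//   at://<did-or-handle>/app.bsky.feed.post/<rkey>
//   -> https://bsky.app/profile/<did-or-handle>/post/<rkey>
function blueskyPermalink(atUri: string): string | null {
  const match = /^at:\/\/([^/]+)\/app\.bsky\.feed\.post\/([^/]+)$/.exec(atUri);
  if (!match) return null; // not a post record (e.g. a like or repost)
  const [, authority, rkey] = match;
  return `https://bsky.app/profile/${authority}/post/${rkey}`;
}
```

Returning `null` for non-post records lets a writer fall back to omitting the link instead of emitting a broken one.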

Usage

# Export your Bluesky data from the app settings, then:
splice --source ~/Downloads/your-repo.car --out ./output

Testing

  • All existing tests pass (3/3)
  • Manually tested with a 16,904-post Bluesky archive

- Add src/sources/bluesky.ts adapter for AT Protocol CAR files
- Add @atproto/repo and @atproto/api dependencies
- Add 'json' to default output formats (markdown, oai, json)
- Update writers.ts with Bluesky permalink support
- Add bluesky:post to SELF_POST_SOURCES in transforms
- Update tests for multi-source support
- Add .splice/ to .gitignore

inferRole() was using a Twitter-specific heuristic (checking for
full_text in raw) that doesn't apply to Bluesky. As a result, all
Bluesky posts were classified as 'user', causing
conversations_oai.jsonl to be empty.

Now bluesky:post source items are recognized as assistant messages.
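A minimal sketch of the role fix described in this commit message. The `Item` shape and the `ASSISTANT_SOURCES` set are assumptions made for illustration, and the real inferRole() in the repository may be structured differently.

```typescript
// Hypothetical item shape; the real type in the codebase may differ.
interface Item {
  source: string; // e.g. "twitter:tweet", "bluesky:post", "bluesky:fetched"
  raw?: Record<string, unknown>;
}

type Role = "user" | "assistant";

// Assumed set of sources that are always the archive owner's own posts.
const ASSISTANT_SOURCES = new Set<string>(["bluesky:post"]);

function inferRole(item: Item): Role {
  // Source-based check: covers Bluesky, which has no full_text field.
  if (ASSISTANT_SOURCES.has(item.source)) return "assistant";
  // Twitter heuristic mentioned in the commit: full_text present in raw.
  if (item.raw !== undefined && "full_text" in item.raw) return "assistant";
  return "user";
}
```

Checking the source tag before the Twitter heuristic keeps the existing Twitter behavior intact while giving Bluesky posts an explicit path.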

Adds the ability to fetch parent posts from Bluesky's public API when
processing CAR exports, so that OAI JSONL output can contain proper
multi-turn conversations.

Changes:
- Add --enrich flag to CLI options
- Add enrichBlueskyPosts() with rate-limited batch fetching
- Preserve bluesky:fetched posts through filtering
- Skip fetched posts as chain starters in grouping
- Mark fetched posts as 'user' role in OAI output

Results for Berduck (16.9K posts):
- Fetched 12,498 parent posts successfully
- 12,890 conversations now have user→assistant turns
- 4,013 single-post conversations (parent unavailable)
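The rate-limited batch fetching mentioned in the change list could look roughly like the sketch below. `fetchInBatches`, `chunk`, the batch size, and the delay are all illustrative assumptions, not the actual enrichBlueskyPosts() code; the network call is injected so the batching logic can be exercised offline.

```typescript
// Split a list into consecutive chunks of at most `size` items.
function chunk<T>(items: T[], size: number): T[][] {
  const out: T[][] = [];
  for (let i = 0; i < items.length; i += size) out.push(items.slice(i, i + size));
  return out;
}

const sleep = (ms: number): Promise<void> =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

// Hypothetical helper: fetch de-duplicated URIs in batches, pausing between
// requests. `fetchBatch` is supplied by the caller (assumed signature).
async function fetchInBatches<T>(
  uris: string[],
  fetchBatch: (batch: string[]) => Promise<T[]>,
  batchSize = 25, // assumed value
  delayMs = 100, // assumed value
): Promise<T[]> {
  const results: T[] = [];
  const batches = chunk(Array.from(new Set(uris)), batchSize);
  for (let i = 0; i < batches.length; i++) {
    if (i > 0) await sleep(delayMs); // simple fixed-delay rate limit
    try {
      results.push(...(await fetchBatch(batches[i])));
    } catch {
      // A failed batch is skipped; its parents simply stay unavailable.
    }
  }
  return results;
}
```

A `fetchBatch` for Bluesky might call the public `app.bsky.feed.getPosts` endpoint, which accepts up to 25 URIs per request; whether the PR uses that endpoint is not stated here.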

Changed fetchPostChain to use parentHeight=50 and walk the full
parent chain. This enables multi-turn conversations with complete
thread history.

Results for Berduck:
- 13,886 unique context posts fetched (vs 12,498 before)
- 8,939 conversations with 4+ messages
- 3,976 conversations with 3 messages
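To make the parent-chain walk concrete, here is a hedged sketch of requesting a thread with parentHeight=50 and flattening the nested parents. The `ThreadNode` type is a simplified assumption about the `app.bsky.feed.getPostThread` response, and `threadUrl` and `collectParentChain` are hypothetical names rather than the PR's fetchPostChain.

```typescript
// Simplified, assumed view of a thread node: each node may carry a post and
// a parent. Not-found or blocked parents have no `post` and end the walk.
interface ThreadNode {
  post?: { uri: string };
  parent?: ThreadNode;
}

// Build a request URL against the public AppView (host is an assumption).
function threadUrl(uri: string, parentHeight = 50): string {
  return (
    "https://public.api.bsky.app/xrpc/app.bsky.feed.getPostThread" +
    `?uri=${encodeURIComponent(uri)}&depth=0&parentHeight=${parentHeight}`
  );
}

// Walk parent links upward and return ancestor URIs, oldest first.
function collectParentChain(thread: ThreadNode): string[] {
  const chain: string[] = [];
  let node = thread.parent;
  while (node?.post) {
    chain.push(node.post.uri);
    node = node.parent;
  }
  return chain.reverse();
}
```

Returning the ancestors oldest-first matches the order in which turns would appear in a conversation, with the archived post itself appended last by the caller.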

Updated @mention regex from /@\w+/g to /@[\w.-]+/g to properly
match Bluesky handles like @berduck.deepfates.com instead of
leaving orphaned .deepfates.com fragments.
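The difference between the two patterns is easy to see on a sample string. The `stripMentions` helper below is a hypothetical stand-in, assuming mentions are removed from the text; only the two regexes come from the commit message.

```typescript
const OLD_MENTION = /@\w+/g; // stops at the first "." in a handle
const NEW_MENTION = /@[\w.-]+/g; // also consumes dots and hyphens

// Hypothetical helper: remove mentions, then tidy leftover whitespace.
function stripMentions(text: string, pattern: RegExp): string {
  return text.replace(pattern, "").replace(/\s+/g, " ").trim();
}

const sample = "@berduck.deepfates.com hello";
const before = stripMentions(sample, OLD_MENTION); // ".deepfates.com hello"
const after = stripMentions(sample, NEW_MENTION); // "hello"
```

One trade-off of the broader character class is that it also consumes a sentence-ending period written directly after a handle, which may or may not matter for this output.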

@deepfates merged commit b9ff5a2 into main on Jan 5, 2026
9 checks passed