-
Notifications
You must be signed in to change notification settings - Fork 6
[Codex] Add Rust OG metadata server for dynamic OpenGraph tags #4
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
[Codex] Add Rust OG metadata server for dynamic OpenGraph tags #4
Conversation
| } | ||
| } | ||
|
|
||
| async fn handle_request( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This function, if it's the only handler, isn't horrible in a vacuum, because if you're doing things once then it's kinda whatever in terms of manual vs not. But in general you should write or use a proper axum extractor for this sort of thing. Here's the XRPC extractor from jacquard as an example. There's also extractors for query and path params in axum you can use, that'll handle much of this for you https://docs.rs/axum/latest/axum/extract/struct.Query.html https://docs.rs/axum/latest/axum/extract/struct.Path.html. These will tend to be more efficient in terms of handling, and they'll better distinguish between "structural error in the request" versus other issues.
| text.replace(['\n', '\r'], " ") | ||
| } | ||
|
|
||
| fn escape_html_attr(text: &str) -> String { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This and the function above it are a classic LLM-ism, reimplementing a thing in a slapdash way inline, when there's a dependency they could bring in that does it well, but that's slightly more friction. https://crates.io/crates/urlencoding for example. The replacing of the newlines is functional as is, if you're okay with a \r\n turning into (two spaces). What might look better as output would be something like this, which is a bit more complex, but won't double spaces for windows newlines and still only allocates just the once at the end when it re-joins, everything before that just borrows from the original string slice.
text.lines().map(|l| l.trim_end_matches('\r')).collect::<Vec<_>>().join(" ")There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, to be fair to Codex, I did explicitly say this at the end of the instructions:
"Try to minimize the total number of crate dependencies if possible, within reason" 😅
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Coming from Ruby, it kinda terrifies me that this absolutely minimal service has already ~3x as many dependencies listed in Cargo.lock (including some stuff for Windows??) than the Gemfile.lock in my feed server…
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Totally fair. Rust dependencies are different that way, they don't really impact binary size or runtime performance the same way (compilation performance, well, ...). And the Cargo.lock includes basically everything that is in the tree, even if it's never really used in the final binary (perhaps because it's the wrong platform, or because it's gated from being compiled in by a crate feature). Linking and optimization basically drop anything that's not actually used from the resulting binary, among many other things.
Obviously there's good reasons to minimize dependencies, but equally if there's a known-good implementation of something basic, then there's little reason to re-implement locally in Rust, versus depending on the library implementation.
Rust also has a much more minimal standard library, quite deliberately. The big gap is obviously that it doesn't ship an async runtime (what tokio is), and there's been tons of arguments over that decision, but in general because the Rust standard library has very strict requirements in terms of maintaining compatibility even across editions, they're very careful about what they add to it, because then it becomes subject to those requirements and can't evolve like it might need to. That's why there's not, like, a stable randomness source in the standard library, and why std::time is quite limited (with the latter also being partially because std has to implement all this for every single target that's remotely reasonable (which is a hell of a lot), and potentially be possible to implement in an OS-agnostic way for basically anything capable of running an operating system, and how different operating systems do time is...messy.
They iirc took what is now the rand crate out of std before or around 1.0 because they weren't entirely happy with it, and knew it needed to evolve more, and the time crate was similar, though they kept some core critical stuff in std::time. That's also why Rust's Path and PathBuf types have some interesting limitations in terms of what they expose, because there's actually surprisingly little overlap in terms of how Windows and Unix-likes handle file paths, and so something that needs to provide a consistent interface for either can't necessarily expose all the stuff in a coherent way, not without causing footgun opportunities, and Rust is broadly against providing unexpected opportunities to introduce bullets to your feet at speed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right, makes sense. I mean, languages like Python or Ruby also run on many platforms and implement things like time and paths there, but I suppose maybe on fewer platforms and with less strictness and backwards compatibility... (Ruby can add breaking changes even in a minor version).
I was surprised that it made me include a crate (chrono) just to print "[2026-02-05 11:22:33] ..." to the log, does std::time not do formatting like that?
Is this set of crates here (tokio, hyper, reqwest, serde) a good "default" set to use for things like this and comparable small services at least?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
generally i would recommend something like axum over straight hyper, as axum is higher level and easier to use. but yeah my normal "this is a web service" dependency list generally starts with tokio, axum, reqwest, serde, chrono, and miette (bc i like my pretty errors).
as an aside, rust runs without modification on stuff python and ruby can't. micropython is actually very different and much more limited than normal python, for example. whereas the rust that runs on the same microcontrollers can include the std library sometimes if someone's taken the time to implement it. and rust also is more honest (see the classic Amos post about lies go makes you believe about paths, which are generally the same as the lies almost every non-rust (C and C++ also are a bit more honest here, mostly through lack of guarantees) language wants you to believe about paths and so on) about what's actually available and comparable cross-platform versus kinda glossing it over.
og_server/src/main.rs
Outdated
| Ok((post.author.handle, normalize_text(&text))) | ||
| } | ||
|
|
||
| async fn resolve_handle(client: &Client, handle: &str) -> Result<String, ()> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This and the function above will work well enough, but I am going to be selfish and direct you to jacquard's identity resolver, which will just work (the axum example actually shows it off). Bluesky's appview has a couple weird edge cases it handles badly that I know of. Jacquard falls back either to the bluesky appview or to Slingshot, depending on which you pick (the example uses Slingshot), but it will also cache resolutions locally in memory if you enable the crate feature, save you a bunch of requests. And you can use the resolver as well to just make the xrpc call for getting posts without needing to handle the other housekeeping around serialization. You'll also be able to check if a handle is valid easily and match/dispatch based on that.
| }; | ||
|
|
||
| let at_uri = format!("at://{}/app.bsky.feed.post/{}", did, rkey); | ||
| let encoded_uri: String = form_urlencoded::byte_serialize(at_uri.as_bytes()).collect(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You do a couple of extra stages of allocating and such here that are pretty redundant. you could, if you don't want to add another dependency, just do this by using the Url crate to construct it all nicely, and it'll handle everything (https://docs.rs/url/latest/url/struct.Url.html#method.parse_with_params).
| html.replacen(META_CHARSET_LINE, &insert, 1) | ||
| } | ||
|
|
||
| fn add_meta_after_title(html: &str, meta: &str) -> String { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is not really a good way to do this. I get why it's this way, because you're reading in the html file from a path passed in an environment variable, and string manipulation here will work, but it's kinda fragile and not terribly efficient. Something like the html crate will provide some good guardrails if you just want to read in the html as is and screw around with it, or you can use a templating library like Askama (if you know the contents at compile-time) or minijinja (which will work loading it at runtime as well).
orual
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Made some comments on some stylistic and structural issues I noticed.
Motivation
User-Agentcheck.index.htmlin memory.Description
og_serverthat launches an HTTP server (usinghyper/tokio) and readsindex.htmlinto memory on startup, configurable viaPORTandINDEXenvironment variables.<meta name="robots" content="noindex, nofollow">after themeta charsetline and returns the unmodified HTML immediately for likely browsers detected by User-Agent.q=(Bluesky post URL),author+post,hash,quotes, andpage(withmode=likes), resolving handles withcom.atproto.identity.resolveHandleand fetching posts withapp.bsky.feed.getPostsas needed.Starting server on http://localhost:{port}are implemented.hyper,tokio,reqwest,serde,serde_json, andurlinog_server/Cargo.toml.Testing
Codex Task