🧠 Screenshots, AI, and a Style Guide That Actually Learns: How I Accidentally Built a Taste Engine While Redesigning My Docs Site
TL;DR: I tried to redesign the [JODS](https://github.com/clamstew/jods) docs site. I ended up with:
✅ automated design feedback
✅ screenshot-based regression testing
✅ live memory of past mistakes
✅ async AI feature prototyping
✅ and a workflow I never want to build without again.
🧩 The Problem: Cursor Was Copying My Old (Bad) Code
At first, I just wanted to give the JODS docs a facelift. New header. Cleaner footer. Maybe some fun Remix stuff and a playground.
But something weird kept happening...
Every time I added a new section, Cursor would suggest styles I hated.
Not because it was wrong — but because it was doing its job: mirroring the rest of my codebase.
Unfortunately, the rest of my codebase had... questionable gradients 😬.
I wasn’t making fresh design choices — I was inheriting them from past me.
🎯 Step 1: The Screenshot Bot Awakens
It all really started with a mess of CSS.
I was throwing styles into custom.css, jamming huge blobs into React components, and just kind of hoping it’d all work. And it did... but it also made the AI worse at writing code. The more spaghetti it saw, the more spaghetti it wrote.
So I started refactoring. And I said:
“Hey AI, while you’re cleaning this up... go ahead and install Playwright, take screenshots after each change, and let’s make sure we’re not visually breaking stuff, cool?”
Boom 💥 — screenshot utility born.
Imagine an AI agent refactoring your code, then agentically using the screenshot utility and HTML-diffing tool it built to verify that the changes it just made didn't cause a visual regression. And if they did? It would fix them and run the loop again.
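The core of that check is cheap to sketch. Here's a minimal version using only the Node standard library — a byte-level hash comparison as a first pass, with anything that differs escalated to a real pixel diff (the actual setup leaned on Playwright's built-in snapshot comparison; the helper names here are hypothetical):

```typescript
import { createHash } from "node:crypto";

// Hash raw screenshot bytes so two runs can be compared cheaply.
function hashImage(bytes: Uint8Array): string {
  return createHash("sha256").update(bytes).digest("hex");
}

// True when the freshly captured screenshot is byte-identical to the
// baseline. A mismatch would escalate to a real pixel diff — Playwright's
// toHaveScreenshot() does this properly, with per-pixel thresholds.
function matchesBaseline(baseline: Uint8Array, current: Uint8Array): boolean {
  return hashImage(baseline) === hashImage(current);
}
```

Byte-identical means "no visual change"; the pixel-diff step only runs when this fast check fails.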
📸 Step 2: Visual Regression with Playwright (But Make It Human)
With Playwright wired in, I started getting side-by-side diffs after each change:
🖼️ Before on the left, 🖼️ After on the right.
If padding shifted or a font got weird — I’d know.
But here's the trick: I didn’t just want diffs. I wanted context.
So I made folders like this:
```
iteration-2024-07-01/
├── 🔍 screenshot-home.png
├── 📁 html-dump/
└── 📄 notes.md
```
Timestamps tied it all together. Every change had a trail:
- the code
- the visual output
- the reason why
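The naming scheme is trivial but worth pinning down, since the whole trail hangs off it. A tiny sketch, assuming Node (the file names are illustrative):

```typescript
// Build an iteration folder name like "iteration-2024-07-01" from a Date,
// so screenshots, HTML dumps, and notes from one pass all land together.
function iterationFolder(date: Date): string {
  const iso = date.toISOString().slice(0, 10); // YYYY-MM-DD
  return `iteration-${iso}`;
}

// The artifacts one pass produces, keyed by their role in the trail:
// the visual output, the rendered markup, and the reason why.
function iterationLayout(date: Date): string[] {
  const root = iterationFolder(date);
  return [
    `${root}/screenshot-home.png`,
    `${root}/html-dump/`,
    `${root}/notes.md`,
  ];
}
```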
😣 Step 3: Speak Now, Markdown Later
Fun fact: most of my notes.md files started as voice-to-text using Mac's built-in speech recognition. I'd ramble:
"Ugh… spacing on Remix card still sucks, maybe drop margin-top by 8px…"
Cursor and GPT would pick it up, clean it into Markdown, and by the time I came back — I had a design log that actually made sense.
This made everything searchable, shareable, and most importantly, remembered by the AI in future interactions.
⚙️ Step 4: Cursor Rules Enter the Chat
Eventually, I noticed the same themes kept showing up:
“Avoid gradients.”
“Don’t stack three buttons in a row.”
“Keep sidebar padding consistent.”
So I started formalizing them:
```
# .cursor/rules/no-gradient.mdc
description: Gradients are banned. Flat colors only. This is war.
globs: ["**/*.tsx", "**/*.css"]
```

Now every time Cursor or GPT proposed a change, it had a mini style guide in its ears.
No more ghost gradients.
No more chaos buttons.
Just ✨ taste ✨.
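For reference, a fuller rule file might look something like this — treat the exact frontmatter keys as an assumption, since Cursor's `.mdc` format has evolved; check the current Cursor docs before copying:

```
---
description: Flat colors only. No gradients, ever.
globs: ["**/*.tsx", "**/*.css"]
alwaysApply: false
---

- Never use linear-gradient or radial-gradient in new styles.
- Prefer the existing flat palette tokens over new hex values.
- Flag any inherited gradient you touch so it can be removed.
```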
🔀 The Loop: Build → Screenshot → Notes → Reflect → Refine
Once this was humming, the loop felt incredible:
- Suggest making design changes to a section
- Auto-generate screenshots
- Capture feedback (me + AI)
- Store it all in a timestamped folder
- Feed that into GPT/Claude
- Improve everything
It was collaborative. It was structured. It was ✨ self-correcting ✨.
Cursor started learning what I wanted. Claude remembered what didn’t work last time.
It was like having a junior designer with perfect memory.
🧪 Async Dev with GPT: Deep Research, Real Progress
Here’s where it got cool.
While out living my life — walking, parenting, grabbing tacos 🌮 — I could fire up GPT-4’s Deep Research from my phone.
I'd say:
“Go look at this GitHub issue on .persist(), read the thread, summarize the plan, and sketch some test cases.”
By the time I sat down again, GPT had done the work.
It was like having a ghost coworker working the night shift 👻.
And when I committed the actual code? Cursor helped me auto-generate commit messages based on the diff — clean, descriptive, and consistent.
🎨 Coloring Books, But Make It Dev
I used to make custom coloring books for my daughter with GPT.
She had rules:
- No backgrounds
- Simple lines
- Just the character she wanted
But GPT would always throw in surprises — shadows, sparkles, text boxes.
So I refined the prompt.
And refined it again.
Sound familiar?
Just like with docs design, I had to build guardrails, clarify intent, and iterate with feedback.
Both projects taught me the same thing:
You don’t prompt once.
You build taste over time.
🧠 Taste Isn’t a Talent — It’s a System
This whole setup — screenshots, rules, notes, GPT-as-researcher — wasn’t just about making the docs prettier.
It turned into a framework for design decisions that evolve.
It turned into a memory system.
It turned into a taste engine.
And now, every time I build something new — I bring this whole system with me.
🛠️ Want to Try the Stack?
Here’s what I used (and loved):
- [🧠 Cursor](https://cursor.sh) — where most of the editing, rules, and feedback live
- [📸 Playwright](https://playwright.dev) — for screenshot testing and diffing
- [🤖 GPT-4 Deep Research + Claude](https://openai.com/gpt-4) — async thinking buddies
- [📅 Markdown logs + voice-to-text](https://support.apple.com/guide/mac-help/dictate-text-mchlp2591/mac) — lightweight, human-first context
- [📦 JODS](https://github.com/clamstew/jods) — the reactive state library this site was for
🏑️ TL;DR (One More Time)
Design like a dev.
Build feedback into your tools.
Use screenshots. Use rules. Use logs.
Let AI iterate with you — not just code for you.
You’ll ship better UIs.
You’ll build smarter systems.
And hey, you might even enjoy it.
longer form
Building Taste into the Docs: How I Turned Design Feedback into a Workflow for JODS
When I set out to redesign the [JODS](https://github.com/clamstew/jods) docs site, the goal was simple: make it feel more modern. A better header. A footer that wasn’t embarrassing. A new section for Remix integration, a fun comparison table — the kind of small touches that make a library feel alive.
But things got messy. Quickly.
What started as a clean-up turned into a realization: I was unintentionally reusing patterns I didn’t even like. Not because I thought they were good — but because Cursor, reading the existing repo, kept suggesting them. It was reflecting back the past instead of helping me design something fresh.
So I did what any dev spiraling into an obsession would do: I built a system.
It Started With Some Bad CSS
Before any Cursor rules or markdown journaling, the real turning point was code quality.
I’d been adding new sections to the site and, like a lot of quick projects, dumping styles into custom.css or embedding massive blocks of CSS straight into my React components. It worked, but it created a situation where the AI had no good examples to learn from.
The output got worse over time. Layouts became fragile. Suggestions felt random.
So I paused. Refactored. And while I was doing that, I told the AI:
“Hey — since you're helping clean this up, go ahead and install Playwright, and after each code change, take a screenshot and check for visual regressions.”
That was the birth of the screenshot utility. Not because I cared about perfect regression coverage — but because I wanted an agent to catch visual weirdness before I saw it in the browser.
From Screenshots to Feedback Loops
Once screenshots were being generated on every run, a new problem emerged: I had no way to track which screenshot belonged to which change.
So I started creating folders like:
```
iteration-2024-07-01/
├── screenshot-home.png
├── html-dump/
└── notes.md
```
Each folder had a timestamp. Each file inside it tracked what changed, what it looked like, and what I (or the AI) thought about it. These weren’t just diffs — they were snapshots of intent.
Often, the notes weren’t typed at all — I’d use macOS’s built-in voice-to-text, just rambling ideas aloud. “Fix spacing on Remix section... header still too tall... try new color next run...” Cursor and GPT would pick those up, clean them into markdown, and that became the feedback I’d use in the next iteration.
Over time, I noticed a pattern: if I did 3–5 design passes like this, then reviewed what I liked or didn’t like, and followed it up with 1–2 refinements, I got much better designs than I would have landed on the first time. Plus, I never saw the same mistake twice — because I’d already explained it away in markdown.
Cursor Rules Came Later — and They Worked
Once I had all these design notes and iteration folders, Cursor started doing something smart: it noticed the patterns I’d flagged as “not great” and began incorporating those into its suggestions.
Eventually, I wrote proper .cursor/rules/ files — things like:
```
# .cursor/rules/no-gradients.mdc
description: Avoid using gradients. Stick to flat colors.
globs: ["docs/**/*.tsx", "docs/**/*.css"]
```

These weren’t just lint rules. They were design memory.
Cursor started using them as part of its context when proposing edits. It was like training a junior designer who slowly internalized your preferences — and then stopped making the same old mistakes.
Playwright Became the Truth Teller
By now, Playwright wasn’t just a novelty — it was critical. It was my objective reviewer. After every change, it captured screenshots of the homepage, docs pages, comparison grid — everything.
If something looked wrong, it flagged it.
If nothing changed, it passed.
If I intended a change, I updated the baseline and moved on.
Each screenshot had a timestamp. The folder it lived in had a timestamp. The note about the change referenced that timestamp. It all synced up — and the result was a self-documenting feedback loop that didn’t just track what changed, but why.
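Day to day, that loop boils down to two Playwright CLI invocations (shown from memory — check `npx playwright test --help` for your installed version):

```shell
# Run the screenshot tests; any visual drift fails the run
npx playwright test

# The change was intentional: accept the new look as the baseline
npx playwright test --update-snapshots
```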
Context Building Over Time
Here's what my workflow evolved into:
- Make a change
- Save screenshots, HTML output, and notes in a timestamped folder
- Feed it all into Cursor or Claude
- Get an opinion from the AI based on actual history
- Apply refinement
- Repeat
This process was more than just design iteration. It was taste iteration — not just refining the code, but refining how the code thinks about design. Cursor learned what I wanted. Claude remembered what I said three iterations ago. Markdown kept it all visible and searchable.
Even when I deleted older folders to save space, I had the artifact trail. Nothing was lost — the decisions stayed in the logs.
Async Dev, for Real
One of my favorite moments was realizing I could make real progress while away from my computer.
Using GPT-4's Deep Research mode, I’d queue up tasks on my phone:
- "Read this GitHub issue"
- "Summarize the PR discussion"
- "Compare it to how other libraries do persist()"
- "Draft a possible implementation with tests"
I'd come back to my laptop later and find that the model had done its homework — often giving me better ideas than I had going in.
When I did ship code, I'd use AI to generate commit messages based on the diff — making them more consistent, more informative, and easier to parse later when the AI had to look back and understand why something changed.
This wasn’t just pair programming. It was asynchronous collaboration with a memory.
The Coloring Book Connection
Weirdly enough, this all reminded me of something very different: making GPT-powered coloring book pages for my daughter.
She had strict requests — plain backgrounds, specific characters, no extra stuff. And yet, GPT would always sneak in something weird. So I’d refine. “Try again. Less shadow. No border.” Slowly, we’d get it right.
Same process here.
Just with gradients instead of unicorns.
The lesson? Taste takes iteration. And AI needs constraints — the right ones, in the right order, with feedback that evolves.
Final Thoughts
I didn’t start this project to invent a system.
I just wanted a better docs site.
But through screenshots, markdown, Cursor rules, and async research, I ended up with something way more powerful:
🧠 A design memory
🎯 A feedback loop that improves over time
🛠️ Tools that know what I want
⏱️ Progress that continues while I’m not working
This wasn’t just “prompt engineering.”
This was building a second brain for visual iteration.
And now that I’ve seen how it works, I can’t imagine designing without it.
longer longer form
Building the JODS Documentation: Iterative Design, AI Tools, and Visual Regression Testing
The project began with a complete redesign of the JODS documentation site’s layout. The site uses Docusaurus (a React-based static site generator), which means most content lives in Markdown (MDX) files. In practice, I rewrote the site header (navigation bar and logo), simplified the footer, and refreshed the homepage marketing sections. For example, I created new MDX pages for the landing content and adjusted the header component to match the updated branding. Because Docusaurus “uses Markdown as its main content authoring format”, authoring these changes was a matter of updating .mdx and React components together. Early iterations focused on cleaning up color schemes, typography, and layout to unify the look and feel across all pages. These initial changes set the baseline for the documentation’s new visual style.
Automating Style Consistency with Cursor Rules
As I iterated on these designs, certain style mistakes kept reappearing. For example, I found myself accidentally re-using unwanted gradients or inconsistent padding in several places. To avoid repeating these errors, I turned to Cursor AI’s rules system and wrote custom rule files. Cursor lets you encode project-specific guidelines as rule files in .cursor/rules/. I created rules that acted like linting feedback for design – for instance, a rule might state “always use flat colors; do not use background gradients.” Whenever I asked Cursor (or chatted with GPT/Claude via Cursor) about a page design, these rules would automatically be prepended to the prompt as guidance. In effect, they “standardize style or architecture decisions” across the project. Every time I reviewed a change, the AI would flag violations (e.g. “Gradient detected, which is discouraged by project rules”), helping me catch recurring issues early. Over time, the Cursor rules ensured that the visual feedback loop itself followed the evolving style guide, preventing old mistakes from creeping back in.
Logging Decisions in Markdown
To keep track of what I was trying and why, I made Markdown notes a central part of the workflow. For each design iteration, I wrote a dated markdown file summarizing the changes, context, and intent. Since the docs are Markdown-based, this felt natural – I could version-control my notes alongside the site content. For example, in iter-2024-06-01/notes.md I documented “Adjusted button color to dark blue (#1234ab) per branding” or “Removed gradient from header background after feedback”. These logs became the project’s memory: I could search them to recall why a design decision was made. Over time, the collection of markdown feedback formed a living journal of the site’s evolution. Whenever I collaborated with an AI assistant, I would often paste relevant excerpts from these logs into the prompt, so the model understood my past decisions. In short, the markdown files linked each iteration together, preserving design intent and context across time.
Visual Regression Testing with Playwright
To verify that changes didn’t break the layout, I built a screenshot-based regression test suite using Playwright. Playwright Test can take full-page screenshots and automatically compare them to baseline images. On the first run of a test, Playwright captured reference screenshots for each page; subsequent runs compared live pages to these saved images. If something changed visually, Playwright reported a test failure and generated a side-by-side diff.
Setting up Playwright also involved automating it across all pages. I wrote a small crawler (using Playwright’s global setup) to enumerate every URL in the Docusaurus sitemap, and then dynamically generated one test per page, each asserting expect(page).toHaveScreenshot(). This way, I had full coverage: each page’s new layout was compared to its own baseline image.
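The URL-enumeration half of that crawler can be sketched with nothing but a regex — good enough for the simple sitemaps Docusaurus emits, though a real XML parser is safer (the actual version ran inside Playwright's global setup; `extractUrls` is a hypothetical helper name):

```typescript
// Pull every <loc>…</loc> URL out of a sitemap.xml payload so a
// screenshot test can be generated per page.
function extractUrls(sitemapXml: string): string[] {
  const urls: string[] = [];
  const locPattern = /<loc>([^<]+)<\/loc>/g;
  let match: RegExpExecArray | null;
  while ((match = locPattern.exec(sitemapXml)) !== null) {
    urls.push(match[1].trim());
  }
  return urls;
}
```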
By keeping a library of reference screenshots, I could automatically verify that the app remained visually correct. When I implemented a design tweak that I intended to keep, I simply updated the baseline snapshot. Otherwise, any unintended change would pop up as a failure to be reviewed and fixed.
Organizing Iteration Artifacts
Each time I ran the regression tests, I archived the results with timestamps and context. In practice this meant creating a folder like iteration-2024-06-15/ containing that run’s outputs: the new HTML build artifacts, the full set of screenshots, the diff images, and a Markdown log of what was done. By grouping assets this way, I preserved a complete snapshot of each iteration. If later something broke, I could roll back to a previous folder and see exactly how the site looked and what I changed. I often annotated the folders with short notes (e.g. “fixed nav margin”), effectively creating a timeline of artifacts. In other words, my workflow itself became traceable: I had the visual history (screenshots and diffs) and textual history (markdown notes and commit messages) side by side. This structure made it easy to audit changes over time or to resurrect a prior version if needed.
Cursor and LLMs: A Dynamic Feedback Loop
With the infrastructure in place, I used both Cursor and external LLMs to evaluate and summarize iterations. For instance, after a few design passes, I would prompt the AI with the recent markdown notes and ask for an assessment: “What visual inconsistencies remain?” or “Suggest improvements based on current feedback.” Cursor’s chat mode (using either GPT or Anthropic’s Claude) would scan the rules and the notes, then highlight things I might have overlooked (e.g. “the footer links still use a smaller font, which may be hard to read”). Because Cursor included my custom rules in context, the advice automatically adhered to our style guidelines.
I also experimented with asking the LLM to summarize multiple iterations at once. By providing several dated notes, the model could trace how a theme evolved (e.g. tracking how we iterated on color choices). Over time, this loop helped me refine my own taste: the AI acted like a consistent reviewer, recalling our past decisions. In effect, the combination of Cursor rules and LLM chat gave the project a kind of memory – it wouldn’t forget what was documented in those markdown logs. This collaborative process meant each cycle of design became smarter, as the AI could recall what we had already tried and use that history to guide the next steps.
Refining Taste Through Iteration
This entire workflow ended up resembling a classic iterative design process: we built something, tested it, gathered feedback, and then refined it again. Each iteration incorporated lessons from the last. I found that my own visual taste sharpened over time – I became better at spotting misalignments or color conflicts because I had clearly logged and reviewed them before. Importantly, the system itself remembered earlier feedback thanks to the logs and rules, so I didn’t accidentally undo a fix. Instead of a one-off redesign, the project was always evolving. Some mistakes (like using that old gradient) simply never resurfaced after they were documented and ruled out. In this way, our design feedback process evolved, not just the page layouts. Every pass taught me more about what worked, which in turn informed the next round of fixes. It was a feedback loop in the truest sense: build → evaluate → adjust → repeat.
Iterative Creativity: Coloring Books and Documentation
Interestingly, I saw a parallel with an unrelated project I had done with GPT: generating coloring-book pages for my daughter. In that earlier project, I iteratively refined prompts to GPT until the images matched what I wanted (for example, removing unwanted floating butterflies or bizarre textures). It was very much a trial-and-error process: I’d ask for a simple line drawing, examine the output, then tweak the instructions to eliminate any oddities. Over many iterations I learned to structure my prompts better and gradually “nudge” the style in the right direction. The documentation redesign followed the same pattern: at first the AI-assisted designs still slipped in old patterns, but with each iteration my prompts and rules got clearer. In both cases (coloring pages or doc design), well-structured context was key. The more specific instructions and background I provided (either to GPT when asking for an image, or to Cursor’s LLM about our style guide), the fewer unwanted elements crept back in. Both processes depended on a cycle of refine, evaluate, and refine again. The main difference is that for the docs, I was improving a taste-driven design workflow rather than just generating art, but the underlying idea – repeated refinement to eliminate undesired patterns – was the same.
Asynchronous AI Collaboration: Research and Commits
One of the most powerful aspects of this workflow was that work could continue even when I was offline. For example, I used GPT’s “Deep Research” tools while away from my computer to triage GitHub issues and PR discussions. I had several open issues about the core JODS methods (like sync, stream, and persist). By feeding transcripts of those issue threads and past PR commentary into GPT (or Claude), the AI could quickly review what had been discussed and suggest concise requirements or even code sketches for the features. In essence, the AI became an asynchronous collaborator: it could read the backlog of conversation and prepare a plan of action, so that when I returned to coding I had concrete guidance ready. This felt like having a co-worker who never sleeps.
I also leveraged AI to improve our commit history. Many AI coding assistants (including Cursor) offer a feature to generate commit messages from a diff. For instance, I would stage my changes and let Cursor’s Git integration or a custom script send the diff to GPT-4. It then proposed a commit message based on the changes. I found that the AI-generated messages were often more informative and consistent than what I might have typed off the cuff. I always reviewed and tweaked the suggested text, but even the first draft saved time and clarified the intent of each change. Ultimately, having AI help draft the commit log meant that every code change was accompanied by a clear, articulate summary – strengthening the project documentation. It also enforced discipline: before accepting the AI’s message, I made sure I truly understood what I had changed.
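The plumbing side of that is simple: list the files a staged diff touches and wrap the diff in a prompt for the model. A sketch assuming Node, with the model call itself elided and `buildCommitPrompt` a hypothetical name:

```typescript
// Extract the files touched by a unified diff (the "+++ b/…" headers).
function changedFiles(diff: string): string[] {
  const files: string[] = [];
  for (const line of diff.split("\n")) {
    const m = line.match(/^\+\+\+ b\/(.+)$/);
    if (m) files.push(m[1]);
  }
  return files;
}

// Wrap the diff in a prompt asking for a concise, descriptive message.
// The response still gets reviewed and tweaked by a human before commit.
function buildCommitPrompt(diff: string): string {
  const files = changedFiles(diff).join(", ");
  return [
    `Write a concise commit message for changes to: ${files}.`,
    "Use an imperative subject line under 72 characters.",
    "Diff:",
    diff,
  ].join("\n");
}
```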
This blend of asynchronous AI assistance – whether summarizing issue conversations or drafting commit logs – kept the project momentum high. I could leave a note or a diff for GPT, continue with other tasks, and come back to refined suggestions later. Documenting decisions via AI (in the form of commit messages or PR comments) closed the loop: everything done with AI’s help was also recorded formally in the project.
This ongoing narrative captures the foundation of our process. The design and documentation will continue to evolve with each iteration, guided by the ever-improving feedback loop of rules, logs, and AI collaboration.
Sources: Standardizing style and context was enabled by Cursor rules; Docusaurus indeed relies on Markdown (MDX) for content. Playwright’s built-in screenshot testing automatically creates reference images and compares future output. Visual regression testing is broadly about comparing current UI against stored snapshots. Iterative design theory emphasizes “continuous refinement” through cycles of feedback. AI-assisted commit messages have been noted to improve project documentation by producing consistent, descriptive logs. These concepts underpin the workflow described above.