-
-
Notifications
You must be signed in to change notification settings - Fork 20
create internal links & improved sanitizeUrl for non-english characters #6
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
👷 Deploy request for musical-croissant-a0db2d pending review.Visit the deploys page to approve it
|
|
The latest updates on your projects. Learn more about Vercel for Git ↗︎
|
WalkthroughThe changes in this pull request involve enhancements to the functionality of processing Notion posts in the Changes
Sequence Diagram(s)sequenceDiagram
participant A as Notion Database
participant B as astronot.js
participant C as links.mjs
participant D as sanitize.mjs
participant E as File System
A->>B: Fetch posts
B->>B: Transform blocks to markdown
B->>C: Convert internal links
C->>B: Return modified markdown
B->>D: Sanitize URLs
D->>B: Return sanitized URLs
B->>E: Write markdown to file system
Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media? 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 4
🧹 Outside diff range and nitpick comments (2)
src/helpers/sanitize.mjs (1)
6-9: Consider using a comprehensive character mapping for special charactersThe current implementation only handles a few German characters. Consider using a more comprehensive mapping object for better maintainability and coverage of other languages.
+const SPECIAL_CHARS_MAP = { + 'ä': 'ae', 'ö': 'oe', 'ü': 'ue', 'ß': 'ss', + 'á': 'a', 'é': 'e', 'í': 'i', 'ó': 'o', 'ú': 'u', + 'à': 'a', 'è': 'e', 'ì': 'i', 'ò': 'o', 'ù': 'u', + // Add more mappings as needed +}; + export function sanitizeUrl(str) { return str .normalize('NFD') - .replace(/[\u0300-\u036f]/g, "") + .replace(/([^\u0300-\u036f]|\n)/g, "$1") .toLowerCase() - .replace('ä', 'ae') - .replace('ö', 'oe') - .replace('ü', 'ue') - .replace('ß', 'ss') + .replace(/[äöüßáéíóúàèìòù]/g, char => SPECIAL_CHARS_MAP[char] || char) .replace(/[^a-z0-9]+/g, "-") .replace(/(^-|-$)+/g, ""); }src/helpers/links.mjs (1)
39-48: Add logging for unmatched internal linksConsider adding debug logging when internal links can't be matched to help diagnose issues.
// find the page id in the pages array const page = pages.find(page => page.id === targetPageId); // if the page exists, replace the link with the page url if (page) { // this seems to be a bug in the original code. Some links have the title "link_to_page" if (title === "link_to_page") { title = page.title; } mdString = mdString.replace(link, `[${title}](/posts/${page.slug})`); + } else if (targetPageId) { + console.warn(`No matching page found for internal link: ${targetPageId}`); }
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
📒 Files selected for processing (3)
src/astronot.js(2 hunks)src/helpers/links.mjs(1 hunks)src/helpers/sanitize.mjs(1 hunks)
🧰 Additional context used
🪛 Biome (1.9.4)
src/helpers/sanitize.mjs
[error] 4-4: A character class cannot match a character and a combining character.
A character and a combining character forms a new character. Replace the character class with an alternation.
(lint/suspicious/noMisleadingCharacterClass)
| .normalize('NFD') // decompose combined graphemes into base characters and diacritical marks | ||
| .replace(/[\u0300-\u036f]/g, "") // remove diacritical marks |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fix the character class regex for diacritical marks removal
The current regex /[\u0300-\u036f]/g for removing diacritical marks has a potential issue with matching combined characters. Consider using a more robust approach.
- .normalize('NFD') // decompose combined graphemes into base characters and diacritical marks
- .replace(/[\u0300-\u036f]/g, "") // remove diacritical marks
+ .normalize('NFD')
+ .replace(/([^\u0300-\u036f]|\n)/g, "$1") // preserve base characters while removing combining marksCommittable suggestion skipped: line range outside the PR's diff.
🧰 Tools
🪛 Biome (1.9.4)
[error] 4-4: A character class cannot match a character and a combining character.
A character and a combining character forms a new character. Replace the character class with an alternation.
(lint/suspicious/noMisleadingCharacterClass)
| if (slug.startsWith("/") && slug.length === 33) { | ||
| // remove the leading "/" from the slug and add "-" like XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX | ||
| targetPageId = slug.substring(1, 9) + "-" + slug.substring(9, 13) + "-" + slug.substring(13, 17) + "-" + slug.substring(17, 21) + "-" + slug.substring(21); | ||
| } | ||
| // block links are like XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX without the leading "/" | ||
| else if (slug.length === 36) { | ||
| targetPageId = slug; | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion
Simplify UUID formatting logic using regex
The current implementation manually formats the UUID string. Consider using regex for a more maintainable solution.
- if (slug.startsWith("/") && slug.length === 33) {
- // remove the leading "/" from the slug and add "-" like XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX
- targetPageId = slug.substring(1, 9) + "-" + slug.substring(9, 13) + "-" + slug.substring(13, 17) + "-" + slug.substring(17, 21) + "-" + slug.substring(21);
- }
- // block links are like XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX without the leading "/"
- else if (slug.length === 36) {
- targetPageId = slug;
- }
+ const uuidPattern = /^(?:\/)?([0-9a-f]{8})([0-9a-f]{4})([0-9a-f]{4})([0-9a-f]{4})([0-9a-f]{12})$/i;
+ const match = slug.match(uuidPattern);
+ if (match) {
+ targetPageId = match.slice(1).join('-');
+ }📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| if (slug.startsWith("/") && slug.length === 33) { | |
| // remove the leading "/" from the slug and add "-" like XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX | |
| targetPageId = slug.substring(1, 9) + "-" + slug.substring(9, 13) + "-" + slug.substring(13, 17) + "-" + slug.substring(17, 21) + "-" + slug.substring(21); | |
| } | |
| // block links are like XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX without the leading "/" | |
| else if (slug.length === 36) { | |
| targetPageId = slug; | |
| } | |
| const uuidPattern = /^(?:\/)?([0-9a-f]{8})([0-9a-f]{4})([0-9a-f]{4})([0-9a-f]{4})([0-9a-f]{12})$/i; | |
| const match = slug.match(uuidPattern); | |
| if (match) { | |
| targetPageId = match.slice(1).join('-'); | |
| } |
| const links = mdString.match(/\[.*?\]\(.*?\)/g); | ||
| if (!links) { | ||
| return mdString; | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add error handling for malformed markdown links
The current regex pattern might fail with malformed markdown links. Consider adding error handling and using a more robust pattern.
export function convertInternalLinks(mdString, pages) {
+ if (!mdString || typeof mdString !== 'string') {
+ throw new Error('mdString must be a non-empty string');
+ }
+ if (!Array.isArray(pages)) {
+ throw new Error('pages must be an array');
+ }
+
// find all link url in markdown
- const links = mdString.match(/\[.*?\]\(.*?\)/g);
+ const links = mdString.match(/\[([^\]]*)\]\(([^)]+)\)/g);
if (!links) {
return mdString;
}📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| const links = mdString.match(/\[.*?\]\(.*?\)/g); | |
| if (!links) { | |
| return mdString; | |
| } | |
| if (!mdString || typeof mdString !== 'string') { | |
| throw new Error('mdString must be a non-empty string'); | |
| } | |
| if (!Array.isArray(pages)) { | |
| throw new Error('pages must be an array'); | |
| } | |
| // find all link url in markdown | |
| const links = mdString.match(/\[([^\]]*)\]\(([^)]+)\)/g); | |
| if (!links) { | |
| return mdString; | |
| } |
| let { parent: mdString } = n2m.toMarkdownString(mdblocks); | ||
|
|
||
| const estimatedReadingTime = readingTime(mdString || '').text; | ||
|
|
||
| mdString = convertInternalLinks(mdString, pages); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion
Add error handling for link conversion and update reading time calculation
The reading time should be calculated after converting internal links, and error handling should be added for the conversion process.
const mdblocks = await n2m.pageToMarkdown(page.id);
let { parent: mdString } = n2m.toMarkdownString(mdblocks);
- const estimatedReadingTime = readingTime(mdString || '').text;
+ try {
+ mdString = convertInternalLinks(mdString, pages);
+ } catch (error) {
+ console.error(`Error converting internal links for page ${page.id}:`, error);
+ // Continue with original mdString
+ }
- mdString = convertInternalLinks(mdString, pages);
+ const estimatedReadingTime = readingTime(mdString || '').text;📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| let { parent: mdString } = n2m.toMarkdownString(mdblocks); | |
| const estimatedReadingTime = readingTime(mdString || '').text; | |
| mdString = convertInternalLinks(mdString, pages); | |
| let { parent: mdString } = n2m.toMarkdownString(mdblocks); | |
| try { | |
| mdString = convertInternalLinks(mdString, pages); | |
| } catch (error) { | |
| console.error(`Error converting internal links for page ${page.id}:`, error); | |
| // Continue with original mdString | |
| } | |
| const estimatedReadingTime = readingTime(mdString || '').text; |
internal links from one imported notion page to another are converted to Markdown like
[link text](/XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX)for inline links and[link text](XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX)for pages linked as separate blockwhrer XXXX... are the page ids.
In this pull request I extract the page ids, check if they belong to a synced page and replace by /posts/
Summary by CodeRabbit
New Features
Bug Fixes
Documentation