Conversation
…nto 431-website-scraper
WalkthroughThis pull request integrates a new "website" platform across various modules. It updates the dependency version of Changes
Sequence Diagram(s)sequenceDiagram
participant C as Client
participant M as ModuleService
participant W as WebsiteCoreService
participant T as TemporalWebsiteService
participant P as PlatformService
C->>M: Request update for Hivemind module (Website platform)
M->>W: Invoke createWebsiteSchedule(platformId)
W->>T: Call createSchedule(platformId)
T-->>W: Return scheduleId
W-->>M: Provide scheduleId
M->>P: Retrieve platform by ID
P-->>M: Return platform details
M->>P: Update platform metadata with scheduleId
P-->>M: Confirm platform update
Possibly related PRs
Suggested reviewers
Poem
✨ Finishing Touches
Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media? 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
There was a problem hiding this comment.
Actionable comments posted: 2
🧹 Nitpick comments (8)
src/services/temporal/website.service.ts (4)
11-25: Consider user-configurable scheduling.
By tying the schedule to the current UTC date/time (lines 13–18), every new schedule adopts the weekday, hour, and minute at which the code is invoked. If your use case eventually requires more flexible scheduling or user-defined intervals, you may want to extract this logic (e.g., to environment variables or request parameters) or allow day-of-week/hour overrides.
46-48: Consider using a custom error or logging context.
Currently, the catch block rethrows a standardError. If desired, wrap it in your consistent error-handling strategy (similar toApiError) to streamline error reporting and troubleshooting across the codebase.
51-55: Pause schedule error-handling.
Interacting with a non-existent or already-paused schedule might throw runtime errors. If you need to handle or ignore those specific cases gracefully, consider adding a try/catch block.
57-61: Ensure schedule deletion is idempotent.
Similar to pausing, attempting to delete a non-existent schedule can throw. If desired, handle or log such errors more gracefully.src/services/module.service.ts (2)
66-73: Validate or safeguard platform data.
Your logic checks forupdateBody.options.platforms[0].name == undefinedand updates metadata accordingly. While functional, consider validating this data structure more robustly (e.g., ensuring the array is not empty, verifying metadata shape) to avoid potential runtime errors and improve maintainability.🧰 Tools
🪛 Biome (1.9.4)
[error] 66-66: Change to an optional chain.
Unsafe fix: Change to an optional chain.
(lint/complexity/useOptionalChain)
80-85: Handle potential errors when creating website schedule.
In this block, you invokewebsiteService.coreService.createWebsiteScheduleand then attempt to save the resulting schedule ID to the platform’s metadata. IfgetPlatformByIdreturnsnullor schedule creation fails, the subsequent code may silently do nothing. A dedicated try/catch here would allow you to manage failures (e.g., logging or reverting partial updates).Do you want me to generate an example refactor that adds local error handling for schedule creation or platform retrieval?
src/services/website/core.service.ts (1)
21-28: Unify error handling strategy.
You useApiError(590, ...)for failed schedule deletions, which is consistent with your create method. This is good for capturing system-level issues. Consider standardizing on a narrower error code or HTTP status if that suits your application’s design (e.g., 404 for a non-existent schedule).src/validations/platform.validation.ts (1)
222-231: Consider adding validation for update-specific fields.The update metadata schema could include additional fields specific to updating website resources, such as:
- Last scrape timestamp
- Success/failure metrics
- Resource status
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
package-lock.jsonis excluded by!**/package-lock.json
📒 Files selected for processing (12)
package.json(1 hunks)src/docs/module.doc.yml(2 hunks)src/docs/platform.doc.yml(3 hunks)src/services/index.ts(2 hunks)src/services/module.service.ts(2 hunks)src/services/platform.service.ts(2 hunks)src/services/temporal/discourse.service.ts(1 hunks)src/services/temporal/website.service.ts(1 hunks)src/services/website/core.service.ts(1 hunks)src/services/website/index.ts(1 hunks)src/validations/module.validation.ts(3 hunks)src/validations/platform.validation.ts(4 hunks)
✅ Files skipped from review due to trivial changes (1)
- src/services/website/index.ts
🧰 Additional context used
🪛 Biome (1.9.4)
src/validations/module.validation.ts
[error] 117-117: Do not add then to an object.
(lint/suspicious/noThenProperty)
src/validations/platform.validation.ts
[error] 147-147: Do not add then to an object.
(lint/suspicious/noThenProperty)
[error] 151-151: Do not add then to an object.
(lint/suspicious/noThenProperty)
⏰ Context from checks skipped due to timeout of 90000ms (1)
- GitHub Check: ci / lint / Lint
🔇 Additional comments (14)
src/services/temporal/website.service.ts (2)
27-45: Check handling for existing schedules.
When creating a schedule with a derived “website/” ID, if a schedule with the same ID already exists, a conflict might arise. Confirm whether you want to overwrite, update, or fail in these cases. Temporal’sclient.schedule.createcan raise errors if the ID is in use.
64-64: Singleton export confirmed.
Exporting a single instance ofTemporalWebsiteServicesupports straightforward usage across the codebase. Looks good!src/services/website/core.service.ts (2)
9-19: Confirm schedule collision behaviors.
When callingtemporalWebsite.createSchedule, you log the schedule ID and trigger it. If a schedule with the same ID already exists, or if any concurrency constraints fail, the request might raise exceptions that differ from typical flow. Ensure these potential collisions align with your larger application strategy (e.g., whether to update existing or fail fast).
30-33: Core service exports look consistent.
Your default export collects both scheduling functions in one place, keeping them easily discoverable. Great job!src/services/index.ts (1)
16-16: LGTM!The addition of
websiteServicefollows the established pattern for service imports and exports.Also applies to: 40-40
src/services/temporal/discourse.service.ts (1)
1-8: LGTM! Logger integration enhances observability.The addition of logging configuration follows best practices by:
- Using a child logger with module context
- Maintaining separation of concerns
src/services/platform.service.ts (2)
181-182: LGTM! Added support for website platform metadata.The metadata key 'resources' aligns with the metadata schema defined in the API documentation.
215-215: LGTM! Enhanced platform connection validation.Added platform name check to prevent duplicate connections across communities.
package.json (1)
31-31:Details
❓ Verification inconclusive
Verify the updated dependency version.
Let's check if version 3.3.0 of
@togethercrew.dev/dbexists and if there are any security advisories.
🏁 Script executed:
#!/bin/bash # Description: Check for security advisories and latest versions of the `@togethercrew.dev/db` package. # Check npm for latest versions npm view @togethercrew.dev/db versions --json # Check for security advisories npm audit @togethercrew.dev/db@3.3.0Length of output: 1340
Dependency Version Confirmed – Manual Security Review Needed
- The available versions list confirms that version 3.3.0 of
@togethercrew.dev/dbexists.- The npm audit check did not complete due to a missing lockfile (error ENOLOCK). Please create a lockfile (e.g., using
npm i --package-lock-only) and re-run the audit manually to verify that no security advisories affect version 3.3.0.src/docs/module.doc.yml (2)
172-173: LGTM! Added website platform to module API.The platform enum has been updated to include the new website platform.
240-242: LGTM! Added metadata description for website platform.The metadata description for the website platform is consistent with the schema defined in platform.doc.yml.
src/docs/platform.doc.yml (3)
72-72: LGTM! Added website platform to platform creation API.The platform enum has been updated to include the new website platform.
199-208: LGTM! Added metadata schema for website platform.The metadata schema for the website platform:
- Requires 'resources' array with URI format.
- Consistent with module API documentation.
241-242: LGTM! Added website platform to platform retrieval API.The platform enum has been updated to include the new website platform.
| const websiteMediaWikiMetadata = () => { | ||
| return Joi.object().keys({}); | ||
| }; |
There was a problem hiding this comment.
🛠️ Refactor suggestion
Consider enhancing website metadata validation.
The empty object schema for website metadata might be too permissive. Consider adding validation for essential website-related fields such as:
- Base URL
- Scraping configuration
- Rate limiting parameters
| const websiteUpdateMetadata = () => { | ||
| return Joi.object().keys({ | ||
| resources: Joi.array().items(Joi.string().uri({ scheme: ['http', 'https'] })), | ||
| }); | ||
| }; |
There was a problem hiding this comment.
🛠️ Refactor suggestion
Enhance website resource validation with additional safeguards.
While URI validation is good, consider adding:
- Rate limiting parameters to prevent aggressive scraping
- Allowed domains validation to prevent unauthorized access
- Maximum number of resources limit
Example enhancement:
const websiteMetadata = () => {
return Joi.object().keys({
resources: Joi.array()
.items(Joi.string().uri({ scheme: ['http', 'https'] }))
+ .max(100) // Prevent excessive resource lists
.required(),
+ rateLimit: Joi.object().keys({
+ requestsPerMinute: Joi.number().min(1).max(60).required(),
+ concurrency: Joi.number().min(1).max(10).required()
+ }).required(),
+ allowedDomains: Joi.array().items(Joi.string().domain()).required()
});
};Also applies to: 101-107
Summary by CodeRabbit