431 website scraper by Behzad-rabiei · Pull Request #433 · TogetherCrew/api

Behzad-rabiei · 2025-02-19T10:50:59Z

Summary by CodeRabbit

New Features
- Introduced a new “website” platform option, allowing users to integrate and manage website-related workflows seamlessly.
- Enabled automated scheduling capabilities for website operations, including creation, pausing, and deletion.

Documentation
- Updated API documentation with expanded platform options and enhanced metadata details to support website integrations.


coderabbitai · 2025-02-19T10:51:08Z

Walkthrough

This pull request integrates a new "website" platform across various modules. It updates the dependency version of @togethercrew.dev/db in package.json and revises API documentation to include additional platform options such as "telegram" and "website." Service logic is extended to support website scheduling in module updates, with added methods in temporal and core website services. Validation schemas are updated to accommodate website metadata requirements. Minor logging improvements were also introduced in the temporal discourse service.

Changes

| File(s) | Change Summary |
| --- | --- |
| `package.json` | Updated `@togethercrew.dev/db` dependency version from `^3.2.3` to `^3.3.0`. |
| `src/docs/{module,platform}.doc.yml` | Modified API docs to expand the `platform` enum and add new metadata descriptions for "telegram" and "website". |
| `src/services/{index, module.service.ts, platform.service.ts}` | Added website platform handling in module updates, including conditional scheduling via `websiteService` and updates to metadata key logic. |
| `src/services/temporal/{discourse.service.ts, website.service.ts}` | Introduced logger usage and added temporal scheduling methods for website operations. |
| `src/services/website/{core.service.ts, index.ts}` | Added core functions (`createWebsiteSchedule`, `deleteWebsiteSchedule`) for managing website schedules and centralized export of website services. |
| `src/validations/{module.validation.ts, platform.validation.ts}` | Enhanced metadata validation by adding functions for website-related schemas and updating platform metadata switch cases. |

Sequence Diagram(s)

sequenceDiagram
    participant C as Client
    participant M as ModuleService
    participant W as WebsiteCoreService
    participant T as TemporalWebsiteService
    participant P as PlatformService

    C->>M: Request update for Hivemind module (Website platform)
    M->>W: Invoke createWebsiteSchedule(platformId)
    W->>T: Call createSchedule(platformId)
    T-->>W: Return scheduleId
    W-->>M: Provide scheduleId
    M->>P: Retrieve platform by ID
    P-->>M: Return platform details
    M->>P: Update platform metadata with scheduleId
    P-->>M: Confirm platform update
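
The flow in the diagram can be sketched roughly as follows. This is a minimal, synchronous, in-memory illustration and not the PR's code: the `Platform` shape, the `platforms` store, and `onHivemindWebsiteUpdate` are hypothetical stand-ins for PlatformService and ModuleService, and the exact `website/<platformId>` ID format is an assumption. The real services are asynchronous and talk to Temporal and the database.

```typescript
// Hypothetical platform shape; the real model lives in @togethercrew.dev/db.
interface Platform {
  id: string;
  name: string;
  metadata: Record<string, unknown>;
}

// In-memory stand-in for PlatformService storage.
const platforms = new Map<string, Platform>([
  ["p1", { id: "p1", name: "website", metadata: { resources: ["https://example.com"] } }],
]);

// Stand-in for TemporalWebsiteService.createSchedule (ID format is assumed).
function createSchedule(platformId: string): string {
  return `website/${platformId}`;
}

// Stand-in for WebsiteCoreService.createWebsiteSchedule.
function createWebsiteSchedule(platformId: string): string {
  return createSchedule(platformId);
}

// Stand-in for the module update path: create the schedule,
// then store its ID in the platform metadata.
function onHivemindWebsiteUpdate(platformId: string): Platform {
  const scheduleId = createWebsiteSchedule(platformId);
  const platform = platforms.get(platformId);
  if (!platform) {
    throw new Error(`Platform ${platformId} not found`);
  }
  platform.metadata = { ...platform.metadata, scheduleId };
  return platform;
}
```

Note that the schedule is created before the platform lookup, as in the diagram, so a missing platform leaves a schedule behind unless the caller cleans it up.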

Possibly related PRs

chore: update validation and mongo-lib package #400: Updates the @togethercrew.dev/db dependency version, which is closely related to this PR's dependency update.

Suggested reviewers

cyri113

Poem

I’m a rabbit hopping through code so light,
With website platforms now shining bright.
Schedules and metadata dance in the flow,
New methods and logging helping us grow.
In every hop I see improvements unfold—
Celebrating changes, brave and bold!

coderabbitai

Actionable comments posted: 2

🧹 Nitpick comments (8)

src/services/temporal/website.service.ts (4)

11-25: Consider user-configurable scheduling.
By tying the schedule to the current UTC date/time (lines 13–18), every new schedule adopts the weekday, hour, and minute at which the code is invoked. If your use case eventually requires more flexible scheduling or user-defined intervals, you may want to extract this logic (e.g., to environment variables or request parameters) or allow day-of-week/hour overrides.
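
One way to read this suggestion is to treat the invocation time as a default rather than the only option. The sketch below is hypothetical: `CalendarSlot`, `deriveWeeklySlot`, and the override shape are illustrative names, not part of the PR or of the Temporal SDK.

```typescript
// Hypothetical weekly slot, all values in UTC.
interface CalendarSlot {
  dayOfWeek: number; // 0 = Sunday ... 6 = Saturday
  hour: number; // 0-23
  minute: number; // 0-59
}

// Falls back to the moment of invocation for any field that is not overridden.
function deriveWeeklySlot(now: Date, overrides: Partial<CalendarSlot> = {}): CalendarSlot {
  const slot: CalendarSlot = {
    dayOfWeek: overrides.dayOfWeek ?? now.getUTCDay(),
    hour: overrides.hour ?? now.getUTCHours(),
    minute: overrides.minute ?? now.getUTCMinutes(),
  };
  const inRange = (value: number, max: number): boolean =>
    Number.isInteger(value) && value >= 0 && value <= max;
  if (!inRange(slot.dayOfWeek, 6) || !inRange(slot.hour, 23) || !inRange(slot.minute, 59)) {
    throw new RangeError("schedule override out of range");
  }
  return slot;
}
```

Called without overrides, this reproduces the behaviour described above; an override such as `{ hour: 2, minute: 0 }` pins the run time while keeping the invocation weekday.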

46-48: Consider using a custom error or logging context.
Currently, the catch block rethrows a standard Error. If desired, wrap it in your consistent error-handling strategy (similar to ApiError) to streamline error reporting and troubleshooting across the codebase.

51-55: Pause schedule error-handling.
Interacting with a non-existent or already-paused schedule might throw runtime errors. If you need to handle or ignore those specific cases gracefully, consider adding a try/catch block.

57-61: Ensure schedule deletion is idempotent.
Similar to pausing, attempting to delete a non-existent schedule can throw. If desired, handle or log such errors more gracefully.
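
A minimal sketch of an idempotent delete, assuming the underlying call signals a missing schedule with a distinguishable error. `ScheduleNotFoundError`, the `schedules` set, and both functions are defined locally as stand-ins; they are not the Temporal client API, and the real code should match on whatever error the SDK actually raises.

```typescript
// Locally defined stand-in error type.
class ScheduleNotFoundError extends Error {
  constructor(scheduleId: string) {
    super(`Schedule ${scheduleId} not found`);
    this.name = "ScheduleNotFoundError";
    // Keeps instanceof working when compiled to older targets.
    Object.setPrototypeOf(this, new.target.prototype);
  }
}

// In-memory stand-in for the schedules known to the scheduler.
const schedules = new Set<string>(["website/p1"]);

// Stand-in for the raw delete call: throws if the schedule does not exist.
function deleteSchedule(scheduleId: string): void {
  if (!schedules.delete(scheduleId)) {
    throw new ScheduleNotFoundError(scheduleId);
  }
}

// Returns true if a schedule was deleted and false if it was already gone.
// Any other error is rethrown unchanged.
function deleteScheduleIdempotent(scheduleId: string): boolean {
  try {
    deleteSchedule(scheduleId);
    return true;
  } catch (error) {
    if (error instanceof ScheduleNotFoundError) {
      return false;
    }
    throw error;
  }
}
```

The same wrapper shape would apply to pausing, with "already paused" treated as the benign case.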

src/services/module.service.ts (2)

66-73: Validate or safeguard platform data.
Your logic checks for updateBody.options.platforms[0].name == undefined and updates metadata accordingly. While functional, consider validating this data structure more robustly (e.g., ensuring the array is not empty, verifying metadata shape) to avoid potential runtime errors and improve maintainability.

🧰 Tools

🪛 Biome (1.9.4)

[error] 66-66: Change to an optional chain.

Unsafe fix: Change to an optional chain.

(lint/complexity/useOptionalChain)
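
For illustration, an optional chain over a nested update body might look like the sketch below. The `UpdateBody` and `PlatformRef` types and the `firstPlatformName` helper are hypothetical; the actual expression at line 66 is not shown in this review.

```typescript
// Hypothetical shapes for the module update payload.
interface PlatformRef {
  name?: string;
  metadata?: Record<string, unknown>;
}

interface UpdateBody {
  options?: { platforms?: PlatformRef[] };
}

// Returns undefined instead of throwing when any level is missing,
// including an empty platforms array.
function firstPlatformName(updateBody: UpdateBody): string | undefined {
  return updateBody.options?.platforms?.[0]?.name;
}
```

This also covers the empty-array case raised in the comment above, since indexing an empty array yields `undefined` and the chain stops there.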

80-85: Handle potential errors when creating website schedule.
In this block, you invoke websiteService.coreService.createWebsiteSchedule and then attempt to save the resulting schedule ID to the platform’s metadata. If getPlatformById returns null or schedule creation fails, the subsequent code may silently do nothing. A dedicated try/catch here would allow you to manage failures (e.g., logging or reverting partial updates).

Do you want me to generate an example refactor that adds local error handling for schedule creation or platform retrieval?
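
As a rough idea of what such local error handling could look like, the sketch below wraps schedule creation and platform lookup and reverts the schedule when the platform is missing. Everything here is an assumption for illustration: `ScheduleDeps`, `ScheduleResult`, and `attachSchedule` are hypothetical names, and the dependencies are injected as plain synchronous functions rather than the PR's async services.

```typescript
// Hypothetical result type: either the stored schedule ID or a failure reason.
type ScheduleResult =
  | { ok: true; scheduleId: string }
  | { ok: false; reason: string };

// Hypothetical injected dependencies standing in for the real services.
interface ScheduleDeps {
  createSchedule: (platformId: string) => string;
  deleteSchedule: (scheduleId: string) => void;
  findPlatform: (platformId: string) => { metadata: Record<string, unknown> } | null;
}

function attachSchedule(platformId: string, deps: ScheduleDeps): ScheduleResult {
  try {
    const scheduleId = deps.createSchedule(platformId);
    const platform = deps.findPlatform(platformId);
    if (!platform) {
      // Revert the partial update so no orphaned schedule is left behind.
      deps.deleteSchedule(scheduleId);
      return { ok: false, reason: "platform not found" };
    }
    platform.metadata.scheduleId = scheduleId;
    return { ok: true, scheduleId };
  } catch (error) {
    // Covers failures in creation, lookup, or the rollback itself.
    return { ok: false, reason: `schedule setup failed: ${String(error)}` };
  }
}
```

Returning a result object makes the failure visible to the caller, instead of the update silently doing nothing.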

src/services/website/core.service.ts (1)

21-28: Unify error handling strategy.
You use ApiError(590, ...) for failed schedule deletions, which is consistent with your create method. This is good for capturing system-level issues. Consider standardizing on a narrower error code or HTTP status if that suits your application’s design (e.g., 404 for a non-existent schedule).

src/validations/platform.validation.ts (1)

222-231: Consider adding validation for update-specific fields.

The update metadata schema could include additional fields specific to updating website resources, such as:

- Last scrape timestamp
- Success/failure metrics
- Resource status

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 0e4877c and 23eb600.

⛔ Files ignored due to path filters (1)

package-lock.json is excluded by !**/package-lock.json

📒 Files selected for processing (12)

package.json (1 hunks)
src/docs/module.doc.yml (2 hunks)
src/docs/platform.doc.yml (3 hunks)
src/services/index.ts (2 hunks)
src/services/module.service.ts (2 hunks)
src/services/platform.service.ts (2 hunks)
src/services/temporal/discourse.service.ts (1 hunks)
src/services/temporal/website.service.ts (1 hunks)
src/services/website/core.service.ts (1 hunks)
src/services/website/index.ts (1 hunks)
src/validations/module.validation.ts (3 hunks)
src/validations/platform.validation.ts (4 hunks)

✅ Files skipped from review due to trivial changes (1)

src/services/website/index.ts

🧰 Additional context used

🪛 Biome (1.9.4)

src/validations/module.validation.ts

[error] 117-117: Do not add then to an object.

(lint/suspicious/noThenProperty)

src/validations/platform.validation.ts

[error] 147-147: Do not add then to an object.

(lint/suspicious/noThenProperty)

[error] 151-151: Do not add then to an object.

(lint/suspicious/noThenProperty)
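
Some background on this rule: JavaScript treats any object with a callable `then` as a "thenable", so promise resolution will call it. The snippet below demonstrates that behaviour in isolation; it is a general illustration and not code from this PR. In Joi conditionals such as `Joi.when(ref, { is, then })` the `then` key holds a schema rather than a function, so these findings may be false positives. That is an assumption, since the flagged lines are not shown here.

```typescript
// An object with a callable `then` is a thenable: promise resolution invokes it.
const lookalike = {
  then(resolve: (value: number) => void): void {
    resolve(42);
  },
};

// An object whose `then` is plain data is passed through untouched.
const plainData = { is: "website", then: "use the website schema" };

// Resolves to 42, not to the `lookalike` object itself.
const resolvedLookalike: Promise<unknown> = Promise.resolve(lookalike);

// Resolves to the `plainData` object, because its `then` is not callable.
const resolvedPlain: Promise<unknown> = Promise.resolve(plainData);
```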

⏰ Context from checks skipped due to timeout of 90000ms (1)

GitHub Check: ci / lint / Lint

🔇 Additional comments (14)

src/services/temporal/website.service.ts (2)

27-45: Check handling for existing schedules.
When creating a schedule with a derived “website/” ID, if a schedule with the same ID already exists, a conflict might arise. Confirm whether you want to overwrite, update, or fail in these cases. Temporal’s client.schedule.create can raise errors if the ID is in use.
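
The overwrite, update, or fail decision can be made explicit with a small policy switch. The following is a hypothetical in-memory sketch: `CollisionPolicy`, `ensureSchedule`, and the `website/<platformId>` ID format are assumptions for illustration, and a real implementation would check or catch the conflict through the Temporal client rather than a `Set`.

```typescript
// Hypothetical policy for what to do when the schedule ID is already taken.
type CollisionPolicy = "fail" | "reuse";

interface EnsureResult {
  scheduleId: string;
  created: boolean;
}

function ensureSchedule(
  existing: Set<string>,
  platformId: string,
  policy: CollisionPolicy,
): EnsureResult {
  const scheduleId = `website/${platformId}`; // assumed ID format
  if (existing.has(scheduleId)) {
    if (policy === "fail") {
      throw new Error(`Schedule ${scheduleId} already exists`);
    }
    // "reuse": keep the existing schedule and report that nothing was created.
    return { scheduleId, created: false };
  }
  existing.add(scheduleId);
  return { scheduleId, created: true };
}
```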

64-64: Singleton export confirmed.
Exporting a single instance of TemporalWebsiteService supports straightforward usage across the codebase. Looks good!

src/services/website/core.service.ts (2)

9-19: Confirm schedule collision behaviors.
When calling temporalWebsite.createSchedule, you log the schedule ID and trigger it. If a schedule with the same ID already exists, or if any concurrency constraints fail, the request might raise exceptions that differ from typical flow. Ensure these potential collisions align with your larger application strategy (e.g., whether to update existing or fail fast).

30-33: Core service exports look consistent.
Your default export collects both scheduling functions in one place, keeping them easily discoverable. Great job!

src/services/index.ts (1)

16-16: LGTM!

The addition of websiteService follows the established pattern for service imports and exports.

Also applies to: 40-40

src/services/temporal/discourse.service.ts (1)

1-8: LGTM! Logger integration enhances observability.

The addition of logging configuration follows best practices by:

- Using a child logger with module context
- Maintaining separation of concerns

src/services/platform.service.ts (2)

181-182: LGTM! Added support for website platform metadata.

The metadata key 'resources' aligns with the metadata schema defined in the API documentation.

215-215: LGTM! Enhanced platform connection validation.

Added platform name check to prevent duplicate connections across communities.

package.json (1)

31-31: ❓ Verification inconclusive

Verify the updated dependency version.

Let's check if version 3.3.0 of @togethercrew.dev/db exists and if there are any security advisories.

🏁 Script executed:
#!/bin/bash
# Description: Check for security advisories and latest versions of the `@togethercrew.dev/db` package.

# Check npm for latest versions
npm view @togethercrew.dev/db versions --json

# Check for security advisories
npm audit @togethercrew.dev/db@3.3.0
Length of output: 1340

Dependency Version Confirmed – Manual Security Review Needed

The available versions list confirms that version 3.3.0 of @togethercrew.dev/db exists.

The npm audit check did not complete due to a missing lockfile (error ENOLOCK). Please create a lockfile (e.g., using npm i --package-lock-only) and re-run the audit manually to verify that no security advisories affect version 3.3.0.

src/docs/module.doc.yml (2)

172-173: LGTM! Added website platform to module API.

The platform enum has been updated to include the new website platform.

240-242: LGTM! Added metadata description for website platform.

The metadata description for the website platform is consistent with the schema defined in platform.doc.yml.

src/docs/platform.doc.yml (3)

72-72: LGTM! Added website platform to platform creation API.

The platform enum has been updated to include the new website platform.

199-208: LGTM! Added metadata schema for website platform.

The metadata schema for the website platform:

- Requires 'resources' array with URI format.
- Consistent with module API documentation.

241-242: LGTM! Added website platform to platform retrieval API.

The platform enum has been updated to include the new website platform.

coderabbitai · 2025-02-19T10:57:57Z

src/validations/module.validation.ts

+const websiteMediaWikiMetadata = () => {
+  return Joi.object().keys({});
+};


🛠️ Refactor suggestion

Consider enhancing website metadata validation.

The empty object schema for website metadata might be too permissive. Consider adding validation for essential website-related fields such as:

- Base URL
- Scraping configuration
- Rate limiting parameters

coderabbitai · 2025-02-19T10:57:58Z

src/validations/platform.validation.ts

+const websiteUpdateMetadata = () => {
+  return Joi.object().keys({
+    resources: Joi.array().items(Joi.string().uri({ scheme: ['http', 'https'] })),
+  });
+};


🛠️ Refactor suggestion

Enhance website resource validation with additional safeguards.

While URI validation is good, consider adding:

- Rate limiting parameters to prevent aggressive scraping
- Allowed domains validation to prevent unauthorized access
- Maximum number of resources limit

Example enhancement:

const websiteMetadata = () => {
  return Joi.object().keys({
    resources: Joi.array()
      .items(Joi.string().uri({ scheme: ['http', 'https'] }))
+     .max(100) // Prevent excessive resource lists
      .required(),
+   rateLimit: Joi.object().keys({
+     requestsPerMinute: Joi.number().min(1).max(60).required(),
+     concurrency: Joi.number().min(1).max(10).required()
+   }).required(),
+   allowedDomains: Joi.array().items(Joi.string().domain()).required()
  });
};

Also applies to: 101-107
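
To show what the scheme check and the size cap amount to at runtime, here is a dependency-free sketch using the standard `URL` class instead of Joi. The function name `validateWebsiteResources` and the error messages are hypothetical, the cap of 100 simply mirrors the number suggested in this comment, and WHATWG URL parsing is not identical to Joi's `uri()` rule, so treat it as an approximation.

```typescript
const MAX_RESOURCES = 100; // mirrors the cap suggested in the review comment

// Returns a list of human-readable problems; an empty list means the input passed.
function validateWebsiteResources(resources: unknown): string[] {
  if (!Array.isArray(resources)) {
    return ["resources must be an array"];
  }
  const errors: string[] = [];
  if (resources.length > MAX_RESOURCES) {
    errors.push(`resources must contain at most ${MAX_RESOURCES} items`);
  }
  resources.forEach((item: unknown, index: number) => {
    if (typeof item !== "string") {
      errors.push(`resources[${index}] must be a string`);
      return;
    }
    try {
      const { protocol } = new URL(item);
      if (protocol !== "http:" && protocol !== "https:") {
        errors.push(`resources[${index}] must use http or https`);
      }
    } catch (error) {
      // The URL constructor throws on input it cannot parse.
      errors.push(`resources[${index}] is not a valid URL`);
    }
  });
  return errors;
}
```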

Behzad-rabiei added 8 commits February 16, 2025 13:46

chore: update togethercrew.dev/db package

cdf3d57

feat: update post platforms and patch module/:moduleid api to add website platform

1bb7f2a

fix: fix website platform creation bugs

7cac820

doc: update docs to add website platform

ebba36b

fix: fix bugs

41442ac

feat: add website temporal service

c53f23d

feat: trigger website temporal on hivemind module update

18cb677

Merge branch '428-move-reputation-score-from-nft-to-platform-route' into 431-website-scraper

40f72b4

style: format code

23eb600

coderabbitai bot reviewed Feb 19, 2025

Behzad-rabiei merged commit 5a1b7b7 into main Feb 19, 2025
13 checks passed

coderabbitai bot mentioned this pull request Sep 9, 2025

fix: handle hivemind website case #482

Merged

Behzad-rabiei merged 9 commits into main from 431-website-scraper
