@Shrinks99 commented May 5, 2025

PR Type

Enhancement


Description

  • Added screenshot and PDF capture capabilities

  • Added Flash content archiving via Ruffle

  • Fixed TextEncoder initialization

  • Updated UI for archiving settings

  • Improved IPFS integration


Changes walkthrough 📝

Relevant files

Enhancement (8 files)

  browser-recorder.ts: Update autopilot UI color and text (+2/-2)
  recorder.ts: Add screenshot, PDF, and Flash archiving capabilities (+111/-9)
  api.ts: Add service worker API for WACZ file handling (+483/-0)
  downloader.ts: Add WACZ file generation and download functionality (+1111/-0)
  ipfsutils.ts: Add IPFS integration utilities (+634/-0)
  keystore.ts: Add cryptographic signing capabilities for archives (+225/-0)
  recproxy.ts: Add recording proxy for capturing web content (+319/-0)
  app.ts: Update UI with new archiving options (+80/-33)

Formatting (1 file)

  localstorage.ts: Fix TypeScript formatting (+1/-1)

Configuration changes (2 files)

  globals.d.ts: Add TypeScript declarations for global constants (+8/-0)
  main.ts: Update imports for new module structure (+2/-1)

Bug fix (1 file)

  coll.ts: Fix download URL generation (+3/-3)

Dependencies (1 file)

  package.json: Bump version to 0.15.0 and update dependencies (+14/-11)

@Shrinks99 merged commit 6c4534f into main May 5, 2025 (1 of 4 checks passed)
@Shrinks99 deleted the webrecorder-awp-0.15.0 branch May 5, 2025 07:44

@pr-agent-monadical

    PR Reviewer Guide 🔍

    Here are some key observations to aid the review process:

    ⏱️ Estimated effort to review: 4 🔵🔵🔵🔵⚪
    🧪 No relevant tests
    🔒 Security concerns

    Sensitive information exposure:
In src/sw/ipfsutils.ts, the code includes a hardcoded web3StorageToken variable (line 15) that appears to be a configuration token. While this is likely substituted at build time via the WEB3_STORAGE_TOKEN constant, exposing API tokens in client-side code is a security risk: the token should be stored securely rather than shipped to clients. Additionally, the IPFS integration could expose archived content publicly without proper access controls.
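If the token must be referenced at all, one common mitigation (a sketch under assumptions: the webpack setup and variable wiring below are illustrative, not taken from this PR) is to inject it from the environment at build time instead of hardcoding it. Anything baked into the bundle is still readable by clients, so true secrecy requires a server-side proxy that holds the token:

    // webpack.config.ts (hypothetical; DefinePlugin replaces the global at
    // build time, matching the kind of constants declared in globals.d.ts)
    import webpack from "webpack";

    export default {
      // ...rest of the existing build configuration...
      plugins: [
        new webpack.DefinePlugin({
          // Read from the environment so the token never lives in source
          // control. It still ships in the client bundle, hence the proxy
          // caveat above.
          WEB3_STORAGE_TOKEN: JSON.stringify(process.env.WEB3_STORAGE_TOKEN ?? ""),
        }),
      ],
    };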

    ⚡ Recommended focus areas for review

    Potential Memory Leak

The saveScreenshot method creates an OffscreenCanvas but doesn't properly clean up resources. The bitmap is transferred to the canvas context, but there is no explicit disposal of the bitmap or canvas after use, which could leak memory across repeated screenshot captures (a cleanup sketch follows the quoted method below).

    async saveScreenshot(pageInfo: any) {
    
      // View Screenshot
      const width = 1920;
      const height = 1080;
    
      // @ts-expect-error: ignore param
      await this.send("Emulation.setDeviceMetricsOverride", {width, height, deviceScaleFactor: 0, mobile: false});
      // @ts-expect-error: ignore param
      const resp = await this.send("Page.captureScreenshot", {format: "png"});
    
      const payload = Buffer.from(resp.data, "base64");
      const blob = new Blob([payload], {type: "image/png"});
    
      await this.send("Emulation.clearDeviceMetricsOverride");
    
      const mime = "image/png";
    
      const fullData = {
        url: "urn:view:" + pageInfo.url,
        ts: new Date().getTime(),
        status: 200,
        statusText: "OK",
        pageId: pageInfo.id,
        mime,
        respHeaders: {"Content-Type": mime, "Content-Length": payload.length + ""},
        reqHeaders: {},
        payload,
        extraOpts: {resource: true},
      };
    
      const thumbWidth = 640;
      const thumbHeight = 360;
    
      const bitmap = await self.createImageBitmap(blob, {resizeWidth: thumbWidth, resizeHeight: thumbHeight});
    
      const canvas = new OffscreenCanvas(thumbWidth, thumbWidth);
      const context = canvas.getContext("bitmaprenderer")!;
      context.transferFromImageBitmap(bitmap);
    
      const resizedBlob = await canvas.convertToBlob({type: "image/png"});
    
      const thumbPayload = new Uint8Array(await resizedBlob.arrayBuffer());
    
      const thumbData = {...fullData,
        url: "urn:thumbnail:" + pageInfo.url,
        respHeaders: {"Content-Type": mime, "Content-Length": thumbPayload.length + ""},
        payload: thumbPayload
      };
    
      // @ts-expect-error - TS2339 - Property '_doAddResource' does not exist on type 'Recorder'.
      await this._doAddResource(fullData);
    
    
      // @ts-expect-error - TS2339 - Property '_doAddResource' does not exist on type 'Recorder'.
      await this._doAddResource(thumbData);
    }
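A minimal hardening of the thumbnail path (a sketch, not the PR's code; it also sizes the canvas thumbWidth x thumbHeight, correcting the thumbWidth/thumbWidth slip visible above):

    const bitmap = await self.createImageBitmap(blob, {
      resizeWidth: thumbWidth,
      resizeHeight: thumbHeight,
    });

    const canvas = new OffscreenCanvas(thumbWidth, thumbHeight);
    const context = canvas.getContext("bitmaprenderer")!;

    // transferFromImageBitmap() takes ownership of the bitmap and detaches it;
    // calling close() afterwards is a harmless no-op, but guarantees the pixel
    // data is released even if code is later added between creation and transfer.
    context.transferFromImageBitmap(bitmap);
    bitmap.close();

    // The canvas holds no further reference after convertToBlob() and can be
    // garbage-collected once it goes out of scope.
    const resizedBlob = await canvas.convertToBlob({ type: "image/png" });
    const thumbPayload = new Uint8Array(await resizedBlob.arrayBuffer());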
    Error Handling

The ipfsAdd function has insufficient error handling. If CAR generation or the IPFS connection fails, the error is caught but not propagated to the caller, which can lead to silent failures; the function should report errors more robustly (a propagation sketch follows the quoted function below).

    export async function ipfsAdd(
      coll: Collection,
      downloaderOpts: DownloaderOpts,
      replayOpts: ReplayOpts = {},
      progress: (incSize: number, totalSize: number) => void,
    ) {
      if (!autoipfs) {
        autoipfs = await createAutoIPFS(autoipfsOpts);
      }
    
      const filename = replayOpts.filename || "webarchive.wacz";
    
      if (replayOpts.customSplits) {
        const ZIP = new Uint8Array([]);
        const WARC_PAYLOAD = new Uint8Array([]);
        const WARC_GROUP = new Uint8Array([]);
        downloaderOpts.markers = { ZIP, WARC_PAYLOAD, WARC_GROUP };
      }
    
      const gzip = replayOpts.gzip !== undefined ? replayOpts.gzip : true;
    
      const dl = new Downloader({ ...downloaderOpts, coll, filename, gzip });
      const dlResponse = await dl.download();
    
      if (!(dlResponse instanceof Response)) {
        throw new Error(dlResponse.error);
      }
    
      const metadata: MetadataWithIPFS = coll.config.metadata || {};
    
      if (!metadata.ipfsPins) {
        metadata.ipfsPins = [];
      }
    
      let concur;
      let shardSize;
      let capacity;
    
      if (autoipfs.type === "web3.storage") {
        // for now, web3storage only allows a single-shard uploads, so set this high.
        concur = 1;
        shardSize = 1024 * 1024 * 10000;
        capacity = 1048576 * 200;
      } else {
        concur = 3;
        shardSize = 1024 * 1024 * 5;
        // use default capacity
        // capacity = undefined;
        capacity = 1048576 * 200;
      }
    
      const { readable, writable } = new TransformStream(
        {},
        UnixFS.withCapacity(capacity),
      );
    
      const baseUrl = replayOpts.replayBaseUrl || self.location.href;
    
      const swContent = await fetchBuffer("sw.js", baseUrl);
      const uiContent = await fetchBuffer("ui.js", baseUrl);
    
      let favicon = null;
    
      try {
        favicon = await fetchBuffer("icon.png", baseUrl);
      } catch (_e) {
        console.warn("Couldn't load favicon");
      }
    
      const htmlContent = getReplayHtml(dlResponse.filename!, replayOpts);
    
      let totalSize = 0;
    
      if (coll.config.metadata?.size) {
        totalSize =
          coll.config.metadata.size +
          swContent.length +
          uiContent.length +
          (favicon ? favicon.length : 0) +
          htmlContent.length;
      }
    
      progress(0, totalSize);
    
      let url = "";
      let cid = "";
    
      let reject: ((reason?: string) => void) | null = null;
    
      const p2 = new Promise((res, rej) => (reject = rej));
    
      const p = readable
        .pipeThrough(new ShardingStream(shardSize))
        .pipeThrough(new ShardStoringStream(autoipfs, concur, reject!))
        .pipeTo(
          new WritableStream({
            write: (res: { url: string; cid: string; size: number }) => {
              if (res.url && res.cid) {
                url = res.url;
                cid = res.cid;
              }
              if (res.size) {
                progress(res.size, totalSize);
              }
            },
          }),
        );
    
      ipfsGenerateCar(
        writable,
        dlResponse.filename || "",
        dlResponse.body!,
        swContent,
        uiContent,
        htmlContent,
        replayOpts,
        downloaderOpts.markers!,
        favicon,
      ).catch((e: unknown) => console.log("generate car failed", e));
    
      await Promise.race([p, p2]);
    
      const res = { cid: cid.toString(), url };
    
      metadata.ipfsPins.push(res);
    
      console.log("ipfs cid added " + url);
    
  return res;
}
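One way to surface the failure (a sketch against the code above; reject is the callback captured for p2) is to route the swallowed CAR-generation error into the pending rejection so Promise.race surfaces it to the caller:

    ipfsGenerateCar(
      writable,
      dlResponse.filename || "",
      dlResponse.body!,
      swContent,
      uiContent,
      htmlContent,
      replayOpts,
      downloaderOpts.markers!,
      favicon,
    ).catch((e: unknown) => {
      console.error("generate car failed", e);
      // Reject p2 so Promise.race([p, p2]) below throws instead of hanging or
      // silently returning an empty cid/url.
      if (reject) {
        reject("ipfs_generate_car_failed: " + String(e));
      }
    });

    await Promise.race([p, p2]);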
    Resource Efficiency

The downloadWACZ method creates multiple file streams and processes them sequentially. This could be optimized by processing independent entries in parallel or streaming where possible, which would improve performance for large archives; a generic sketch follows the quoted method.

    async downloadWACZ(filename: string, sizeCallback: SizeCallback | null) {
      filename = (filename || "webarchive").split(".")[0] + ".wacz";
    
      this.fileHasher = await createSHA256();
      this.recordHasher = await createSHA256();
      this.hashType = "sha256";
    
      const zip: ClientZipEntry[] = [];
    
      this.firstResources = await this.loadResourcesBlock();
    
      this.addFile(zip, "pages/pages.jsonl", this.generatePages(), sizeCallback);
      this.addFile(
        zip,
        `archive/${this.warcName}`,
        this.generateWARC(filename + `#/archive/${this.warcName}`, true),
        sizeCallback,
      );
      //this.addFile(zip, "archive/text.warc", this.generateTextWARC(filename + "#/archive/text.warc"), false);
    
      // don't use compressed index if we'll have a single block, need to have at least enough for 2 blocks
      if (this.firstResources.length < 2 * LINES_PER_BLOCK) {
        this.addFile(zip, "indexes/index.cdx", this.generateCDX(), sizeCallback);
      } else {
        this.addFile(
          zip,
          "indexes/index.cdx.gz",
          this.generateCompressedCDX("index.cdx.gz"),
          sizeCallback,
        );
        this.addFile(zip, "indexes/index.idx", this.generateIDX(), sizeCallback);
      }
    
      this.addFile(
        zip,
        DATAPACKAGE_FILENAME,
        this.generateDataPackage(),
        sizeCallback,
      );
    
      this.addFile(
        zip,
        DIGEST_FILENAME,
        this.generateDataManifest(),
        sizeCallback,
      );
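As a generic illustration of the parallelization idea (not the PR's code; note the real entry generators share this.fileHasher and this.recordHasher state, so each entry would need its own hasher before running concurrently is safe), independent entries could be pre-buffered with Promise.all:

    // Drain an async generator into a single Uint8Array.
    async function bufferEntry(gen: AsyncIterable<Uint8Array>): Promise<Uint8Array> {
      const chunks: Uint8Array[] = [];
      let total = 0;
      for await (const chunk of gen) {
        chunks.push(chunk);
        total += chunk.length;
      }
      const out = new Uint8Array(total);
      let offset = 0;
      for (const chunk of chunks) {
        out.set(chunk, offset);
        offset += chunk.length;
      }
      return out;
    }

    // Hypothetical usage: makePagesEntry() and makeCdxEntry() stand in for
    // per-entry generator factories with independent hashers.
    // const [pages, cdx] = await Promise.all([
    //   bufferEntry(makePagesEntry()),
    //   bufferEntry(makeCdxEntry()),
    // ]);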

    Comment on lines +572 to +602
    if (
      !this.gzip &&
      this.markers.WARC_PAYLOAD &&
      record.warcType !== "request" &&
      (chunks.length === 5 || chunks.length === 4)
    ) {
      if (chunks.length === 5) {
        yield chunks[0];
        yield chunks[1];
        yield chunks[2];
        yield this.markers.WARC_PAYLOAD;
        if (chunks[3].length) {
          yield chunks[3];
          yield this.markers.WARC_PAYLOAD;
        }
        yield chunks[4];
      } else {
        yield chunks[0];
        yield chunks[1];
        yield this.markers.WARC_PAYLOAD;
        if (chunks[2].length) {
          yield chunks[2];
          yield this.markers.WARC_PAYLOAD;
        }
        yield chunks[3];
      }
    } else {
      for (const chunk of chunks) {
        yield chunk;
      }
    }


    Suggestion: The current code has a potential issue with marker placement. When inserting the WARC_PAYLOAD marker after chunks[2] in the 5-chunk case, you should check if chunks[3] has length before yielding the marker, similar to how you're doing it in the 4-chunk case. [possible issue, importance: 7]

Suggested change

Before:

    if (
      !this.gzip &&
      this.markers.WARC_PAYLOAD &&
      record.warcType !== "request" &&
      (chunks.length === 5 || chunks.length === 4)
    ) {
      if (chunks.length === 5) {
        yield chunks[0];
        yield chunks[1];
        yield chunks[2];
        yield this.markers.WARC_PAYLOAD;
        if (chunks[3].length) {
          yield chunks[3];
          yield this.markers.WARC_PAYLOAD;
        }
        yield chunks[4];
      } else {
        yield chunks[0];
        yield chunks[1];
        yield this.markers.WARC_PAYLOAD;
        if (chunks[2].length) {
          yield chunks[2];
          yield this.markers.WARC_PAYLOAD;
        }
        yield chunks[3];
      }
    } else {
      for (const chunk of chunks) {
        yield chunk;
      }
    }

After:

    if (
      !this.gzip &&
      this.markers.WARC_PAYLOAD &&
      record.warcType !== "request" &&
      (chunks.length === 5 || chunks.length === 4)
    ) {
      if (chunks.length === 5) {
        yield chunks[0];
        yield chunks[1];
        yield chunks[2];
        if (chunks[3].length) {
          yield this.markers.WARC_PAYLOAD;
          yield chunks[3];
          yield this.markers.WARC_PAYLOAD;
        }
        yield chunks[4];
      } else {
        yield chunks[0];
        yield chunks[1];
        yield this.markers.WARC_PAYLOAD;
        if (chunks[2].length) {
          yield chunks[2];
          yield this.markers.WARC_PAYLOAD;
        }
        yield chunks[3];
      }
    } else {
      for (const chunk of chunks) {
        yield chunk;
      }
    }

    Comment on lines +547 to +566
    async *emitRecord(
      record: WARCRecord,
      doDigest: boolean,
      output: { length: number; digest?: string },
    ) {
      const opts = { gzip: this.gzip, digest: this.digestOpts };
      const s = new WARCSerializer(record, opts);

      const chunks = [];
      if (doDigest) {
        this.recordHasher!.init();
      }

      for await (const chunk of s) {
        if (doDigest) {
          this.recordHasher!.update(chunk as Uint8Array);
        }
        chunks.push(chunk);
        output.length += chunk.length;
      }


    Suggestion: The code assumes that chunk is a Uint8Array when updating the recordHasher, but it doesn't verify this. Since the type signature of the generator allows for both string and Uint8Array, you should ensure proper type handling before updating the hasher to prevent potential runtime errors. [possible issue, importance: 8]

Suggested change

Before:

    async *emitRecord(
      record: WARCRecord,
      doDigest: boolean,
      output: { length: number; digest?: string },
    ) {
      const opts = { gzip: this.gzip, digest: this.digestOpts };
      const s = new WARCSerializer(record, opts);
      const chunks = [];
      if (doDigest) {
        this.recordHasher!.init();
      }
      for await (const chunk of s) {
        if (doDigest) {
          this.recordHasher!.update(chunk as Uint8Array);
        }
        chunks.push(chunk);
        output.length += chunk.length;
      }

After:

    async *emitRecord(
      record: WARCRecord,
      doDigest: boolean,
      output: { length: number; digest?: string },
    ) {
      const opts = { gzip: this.gzip, digest: this.digestOpts };
      const s = new WARCSerializer(record, opts);
      const chunks = [];
      if (doDigest) {
        this.recordHasher!.init();
      }
      for await (const chunk of s) {
        if (doDigest) {
          if (typeof chunk === "string") {
            this.recordHasher!.update(encoder.encode(chunk));
          } else {
            this.recordHasher!.update(chunk as Uint8Array);
          }
        }
        chunks.push(chunk);
        output.length += chunk.length;
      }
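Note that the suggested code assumes a module-level TextEncoder instance is in scope, which the snippet does not show; something like the following (illustrative) would be needed:

    const encoder = new TextEncoder();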

    Comment on lines +897 to +910
    async createWARCRecord(resource: DLResourceEntry) {
      let url = resource.url;
      const date = new Date(resource.ts).toISOString();
      resource.timestamp = getTSMillis(date);
      const httpHeaders = resource.respHeaders || {};
      const warcVersion = this.warcVersion;

      const pageId = resource.pageId;

      let payload: Uint8Array | null | undefined = resource.payload;
      let type: "response" | "request" | "resource" | "revisit";

      let refersToUrl, refersToDate;
      let refersToDigest;


    Suggestion: The type variable is declared but not initialized with a default value. If none of the conditions in the function that set this variable are met, it will remain undefined, potentially causing runtime errors when used later in the function. [possible issue, importance: 6]

Suggested change

Before:

    async createWARCRecord(resource: DLResourceEntry) {
      let url = resource.url;
      const date = new Date(resource.ts).toISOString();
      resource.timestamp = getTSMillis(date);
      const httpHeaders = resource.respHeaders || {};
      const warcVersion = this.warcVersion;
      const pageId = resource.pageId;
      let payload: Uint8Array | null | undefined = resource.payload;
      let type: "response" | "request" | "resource" | "revisit";
      let refersToUrl, refersToDate;
      let refersToDigest;

After:

    async createWARCRecord(resource: DLResourceEntry) {
      let url = resource.url;
      const date = new Date(resource.ts).toISOString();
      resource.timestamp = getTSMillis(date);
      const httpHeaders = resource.respHeaders || {};
      const warcVersion = this.warcVersion;
      const pageId = resource.pageId;
      let payload: Uint8Array | null | undefined = resource.payload;
      let type: "response" | "request" | "resource" | "revisit" = "response"; // Default value

    Comment on lines +339 to +347
      const { coll } = await this.prepareColl(collId, request);

      if (await ipfsRemove(coll)) {
        await this.collections.updateMetadata(coll.name, coll.config.metadata);
        return { removed: true };
      }

      return { removed: false };
    }


    Suggestion: The method is missing error handling. If ipfsRemove(coll) throws an exception, it will propagate up and potentially crash the service worker. Add a try-catch block to handle potential errors. [possible issue, importance: 7]

Suggested change

Before:

      const { coll } = await this.prepareColl(collId, request);
      if (await ipfsRemove(coll)) {
        await this.collections.updateMetadata(coll.name, coll.config.metadata);
        return { removed: true };
      }
      return { removed: false };
    }

After:

    async ipfsRemove(request: Request, collId: string) {
      const { coll } = await this.prepareColl(collId, request);
      try {
        if (await ipfsRemove(coll)) {
          await this.collections.updateMetadata(coll.name, coll.config.metadata);
          return { removed: true };
        }
        return { removed: false };
      } catch (error) {
        console.error("Error removing IPFS:", error);
        return { removed: false, error: "ipfs_remove_failed" };
      }
    }

    Comment on lines +154 to +160
    if (range && !range.startsWith("bytes 0-")) {
      console.log("skip range request: " + range);
      return;
    }

    const status = response.status;
    const statusText = response.statusText;


    Suggestion: The current range check is redundant since you've already checked if the range doesn't start with "bytes 0-" earlier in the function. This creates a potential inconsistency where a request might pass the first check but fail the second, leading to confusing behavior. [general, importance: 5]

Suggested change

Before:

    if (range && !range.startsWith("bytes 0-")) {
      console.log("skip range request: " + range);
      return;
    }

    const status = response.status;
    const statusText = response.statusText;

After:

    if (range) {
      // Range was already checked at the beginning of the function
      // This ensures we only process ranges starting with "bytes 0-"
      const expectedRange = `bytes 0-${payload.length - 1}/${payload.length}`;
      if (range !== expectedRange) {
        console.log("skip range request: " + range);
        return;
      }
    }

    Comment on lines +974 to +978
    const width = 1920;
    const height = 1080;

    // @ts-expect-error: ignore param
    await this.send("Emulation.setDeviceMetricsOverride", {width, height, deviceScaleFactor: 0, mobile: false});


    Suggestion: The deviceScaleFactor is set to 0, which is an invalid value. According to Chrome DevTools Protocol, this should be a positive number where 0 is not allowed. This could lead to unexpected rendering issues in screenshots. [possible issue, importance: 8]

Suggested change

Before:

    const width = 1920;
    const height = 1080;
    // @ts-expect-error: ignore param
    await this.send("Emulation.setDeviceMetricsOverride", {width, height, deviceScaleFactor: 0, mobile: false});

After:

    async saveScreenshot(pageInfo: any) {
      // View Screenshot
      const width = 1920;
      const height = 1080;
      // @ts-expect-error: ignore param
      await this.send("Emulation.setDeviceMetricsOverride", {width, height, deviceScaleFactor: 1, mobile: false});
      // @ts-expect-error: ignore param
      const resp = await this.send("Page.captureScreenshot", {format: "png"});
      const payload = Buffer.from(resp.data, "base64");
      const blob = new Blob([payload], {type: "image/png"});
      await this.send("Emulation.clearDeviceMetricsOverride");
      const mime = "image/png";

    Comment on lines +125 to +136
    const ecdsaImportParams = {
      name: "ECDSA",
      namedCurve: "P-384",
    };

    const extractable = true;
    const usage = ["sign", "verify"] as KeyUsage[];

    const ecdsaSignParams = {
      name: "ECDSA",
      hash: "SHA-256",
    };


    Suggestion: The code uses "SHA-256" for the hash algorithm with ECDSA P-384, which is a mismatch. For P-384 curves, it's recommended to use SHA-384 to match the security level of the curve. Using a weaker hash algorithm reduces the overall security of the signature. [security, importance: 9]

Suggested change

Before:

    const ecdsaImportParams = {
      name: "ECDSA",
      namedCurve: "P-384",
    };
    const extractable = true;
    const usage = ["sign", "verify"] as KeyUsage[];
    const ecdsaSignParams = {
      name: "ECDSA",
      hash: "SHA-256",
    };

After:

    async sign(string: string, created: string): Promise<DataSignature> {
      let keyPair: CryptoKeyPair;
      let keys = await this.loadKeys();
      const ecdsaImportParams = {
        name: "ECDSA",
        namedCurve: "P-384",
      };
      const extractable = true;
      const usage = ["sign", "verify"] as KeyUsage[];
      const ecdsaSignParams = {
        name: "ECDSA",
        hash: "SHA-384",
      };
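For reference, a self-contained sketch (an illustration, not code from this PR) exercising the matched P-384/SHA-384 pairing end to end with WebCrypto:

    // Runs in any modern browser or worker context.
    async function demoP384Sign(): Promise<boolean> {
      const { privateKey, publicKey } = await crypto.subtle.generateKey(
        { name: "ECDSA", namedCurve: "P-384" },
        true,
        ["sign", "verify"],
      );
      const data = new TextEncoder().encode("hello, wacz");
      const sig = await crypto.subtle.sign(
        { name: "ECDSA", hash: "SHA-384" },
        privateKey,
        data,
      );
      // verify() must use the same hash; a mismatched hash fails verification.
      return crypto.subtle.verify(
        { name: "ECDSA", hash: "SHA-384" },
        publicKey,
        sig,
        data,
      );
    }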
