feat: Enable parsing of rich (smart) tags in google docs #215

Open
MrwanBaghdad wants to merge 3 commits into gemini-cli-extensions:main from MrwanBaghdad:main
@MrwanBaghdad

Currently, docs.getText skips over smart chip elements within Google Docs, leaving out information.
This commit adds support for rendering smart chip elements in Google Docs as simple text.
@gemini-code-assist
Contributor

Summary of Changes

Hello @MrwanBaghdad, I'm Gemini Code Assist [1]! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly enhances the DocsService by enabling it to correctly interpret and extract text from Google Docs containing rich (smart) chip elements. Previously, these elements were skipped, leading to incomplete text representations. The changes ensure that information from date elements, person mentions, and rich links is properly incorporated into the extracted text, providing a more comprehensive and accurate document content retrieval.

Highlights

  • Smart Chip Parsing: Implemented logic within the DocsService to correctly parse and extract text content from Google Docs 'smart chip' elements, including person, richLink, and dateElement types.
  • Fallback Mechanisms: Added robust fallback mechanisms for smart chips: using email if a person's name is unavailable, using the URI if a rich link's title is missing, and using the timestamp if a date element's display text is absent.
  • Comprehensive Testing: Introduced new unit tests to validate the accurate parsing of various smart chip types and to ensure the fallback logic functions as expected under different data conditions.

Changelog
  • workspace-server/src/tests/services/DocsService.test.ts
    • Added a test case to verify text extraction from a combination of date, person, and rich link smart chips.
    • Added a test case to confirm fallback to email when a person's name is not provided.
    • Added a test case to ensure fallback to URI when a rich link's title is unavailable.
    • Added a test case to validate fallback to timestamp when a date element's display text is missing.
  • workspace-server/src/services/DocsService.ts
    • Modified the text extraction logic to process person smart chip elements, prioritizing the name and falling back to email.
    • Modified the text extraction logic to process richLink smart chip elements, prioritizing the title and falling back to the URI.
    • Modified the text extraction logic to process dateElement smart chip elements, prioritizing the display text and falling back to the timestamp.
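
As a rough, self-contained illustration of the behavior summarized in this changelog, the sketch below concatenates text runs and smart chips from one paragraph. The `ParagraphElement` interface and the `paragraphToText` name are assumptions made for this example; they are simplified stand-ins, not the actual DocsService code or the client library's types.

```typescript
// Hypothetical, simplified shapes for illustration only; the real types
// come from the Google Docs API client library.
interface ParagraphElement {
  textRun?: { content?: string | null };
  person?: { personProperties?: { name?: string | null; email?: string | null } };
  richLink?: { richLinkProperties?: { title?: string | null; uri?: string | null } };
  dateElement?: {
    dateElementProperties?: { displayText?: string | null; timestamp?: string | null };
  };
}

// Hypothetical helper: renders each element as plain text, preferring the
// human-readable field and falling back to the secondary one.
function paragraphToText(elements: ParagraphElement[]): string {
  let text = '';
  for (const el of elements) {
    if (el.textRun?.content) {
      text += el.textRun.content;
    } else if (el.person?.personProperties) {
      const { name, email } = el.person.personProperties;
      text += name || email || '';
    } else if (el.richLink?.richLinkProperties) {
      const { title, uri } = el.richLink.richLinkProperties;
      text += title || uri || '';
    } else if (el.dateElement?.dateElementProperties) {
      const { displayText, timestamp } = el.dateElement.dateElementProperties;
      text += displayText || timestamp || '';
    }
  }
  return text;
}

// Made-up sample data: a person chip without a name and a date chip.
const sample: ParagraphElement[] = [
  { textRun: { content: 'Owner: ' } },
  { person: { personProperties: { email: 'jane@example.com' } } },
  { textRun: { content: ', due ' } },
  { dateElement: { dateElementProperties: { displayText: 'Jan 15, 2025' } } },
];

console.log(paragraphToText(sample)); // prints "Owner: jane@example.com, due Jan 15, 2025"
```
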

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request successfully adds support for parsing rich 'smart tags' (like person, date, and rich link elements) from Google Docs, which was previously a missing feature. The implementation correctly extracts text from these elements and includes fallback logic. The accompanying tests are thorough and cover the new functionality and its edge cases.

I've provided a couple of suggestions to improve code consistency and reduce duplication in tests, which should enhance long-term maintainability.

Comment on lines 445 to 457
    if (pElement.textRun && pElement.textRun.content) {
      text += pElement.textRun.content;
    } else if (pElement.person?.personProperties) {
      const { name, email } = pElement.person.personProperties;
      text += name || email || '';
    } else if (pElement.richLink?.richLinkProperties) {
      const { title, uri } = pElement.richLink.richLinkProperties;
      text += title || uri || '';
    } else if (pElement.dateElement?.dateElementProperties) {
      const { displayText, timestamp } =
        pElement.dateElement.dateElementProperties;
      text += displayText || timestamp || '';
    }
Contributor

medium

For improved consistency and robustness, consider using optional chaining (?.) for pElement.textRun and the nullish coalescing operator (??) for fallbacks.

Using pElement.textRun?.content aligns with how other optional properties are checked in this block.

Using ?? instead of || is generally safer as it only falls back for null or undefined, not for other falsy values like an empty string (''). While || works in this specific case, ?? is more explicit about the intended behavior and prevents potential bugs if an empty string becomes a valid, distinct value in the future.
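
To make the trade-off concrete, here is a small sketch of how the two operators differ when a field is present but empty. The `PersonProps` interface and both function names are hypothetical and exist only for this illustration; whether the API ever returns an empty `name` is not established in this thread.

```typescript
// Hypothetical shape for illustration; the real client-library type may differ.
interface PersonProps {
  name?: string | null;
  email?: string | null;
}

// Mirrors the PR as submitted: any falsy name (including '') falls through to email.
function labelWithOr({ name, email }: PersonProps): string {
  return name || email || '';
}

// Mirrors the suggested change: only null/undefined fall through, so '' is kept.
function labelWithNullish({ name, email }: PersonProps): string {
  return name ?? email ?? '';
}

const emptyName: PersonProps = { name: '', email: 'jane@example.com' };
const missingName: PersonProps = { email: 'jane@example.com' };

console.log(labelWithOr(emptyName));        // prints "jane@example.com"
console.log(labelWithNullish(emptyName));   // prints "" (the empty name wins)
console.log(labelWithOr(missingName));      // prints "jane@example.com"
console.log(labelWithNullish(missingName)); // prints "jane@example.com"
```

The two versions agree when `name` is missing or null, but with `??` an empty-string name would produce empty text instead of the email, so the switch is a behavior change rather than a pure style change.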

Suggested change
-   if (pElement.textRun && pElement.textRun.content) {
-     text += pElement.textRun.content;
-   } else if (pElement.person?.personProperties) {
-     const { name, email } = pElement.person.personProperties;
-     text += name || email || '';
-   } else if (pElement.richLink?.richLinkProperties) {
-     const { title, uri } = pElement.richLink.richLinkProperties;
-     text += title || uri || '';
-   } else if (pElement.dateElement?.dateElementProperties) {
-     const { displayText, timestamp } =
-       pElement.dateElement.dateElementProperties;
-     text += displayText || timestamp || '';
-   }
+   if (pElement.textRun?.content) {
+     text += pElement.textRun.content;
+   } else if (pElement.person?.personProperties) {
+     const { name, email } = pElement.person.personProperties;
+     text += name ?? email ?? '';
+   } else if (pElement.richLink?.richLinkProperties) {
+     const { title, uri } = pElement.richLink.richLinkProperties;
+     text += title ?? uri ?? '';
+   } else if (pElement.dateElement?.dateElementProperties) {
+     const { displayText, timestamp } =
+       pElement.dateElement.dateElementProperties;
+     text += displayText ?? timestamp ?? '';
+   }

Author

Double-check your understanding here: most of these fields "always exist" according to the client library:
https://github.com/googleapis/google-api-nodejs-client/blob/main/src/apis/docs/v1.ts#L2204

Comment on lines 576 to 681
    it('should fall back to email when person name is not available', async () => {
      const mockDoc = {
        data: {
          tabs: [
            {
              documentTab: {
                body: {
                  content: [
                    {
                      paragraph: {
                        elements: [
                          {
                            person: {
                              personProperties: {
                                email: 'jane@example.com',
                              },
                            },
                          },
                        ],
                      },
                    },
                  ],
                },
              },
            },
          ],
        },
      };
      mockDocsAPI.documents.get.mockResolvedValue(mockDoc);

      const result = await docsService.getText({ documentId: 'test-doc-id' });

      expect(result.content[0].text).toBe('jane@example.com');
    });

    it('should fall back to uri when rich link title is not available', async () => {
      const mockDoc = {
        data: {
          tabs: [
            {
              documentTab: {
                body: {
                  content: [
                    {
                      paragraph: {
                        elements: [
                          {
                            richLink: {
                              richLinkProperties: {
                                uri: 'https://docs.google.com/spreadsheets/d/xyz',
                              },
                            },
                          },
                        ],
                      },
                    },
                  ],
                },
              },
            },
          ],
        },
      };
      mockDocsAPI.documents.get.mockResolvedValue(mockDoc);

      const result = await docsService.getText({ documentId: 'test-doc-id' });

      expect(result.content[0].text).toBe(
        'https://docs.google.com/spreadsheets/d/xyz',
      );
    });

    it('should fall back to timestamp when date displayText is not available', async () => {
      const mockDoc = {
        data: {
          tabs: [
            {
              documentTab: {
                body: {
                  content: [
                    {
                      paragraph: {
                        elements: [
                          {
                            dateElement: {
                              dateElementProperties: {
                                timestamp: '1736899200',
                              },
                            },
                          },
                        ],
                      },
                    },
                  ],
                },
              },
            },
          ],
        },
      };
      mockDocsAPI.documents.get.mockResolvedValue(mockDoc);

      const result = await docsService.getText({ documentId: 'test-doc-id' });

      expect(result.content[0].text).toBe('1736899200');
    });
Contributor

medium

The tests for fallback scenarios are quite similar and contain a lot of boilerplate code. To improve maintainability and reduce duplication, you could use Jest's it.each to parameterize these tests. This would make the test suite more concise and easier to extend with new fallback cases in the future.

    it.each([
      {
        description: 'should fall back to email when person name is not available',
        elements: [
          {
            person: {
              personProperties: {
                email: 'jane@example.com',
              },
            },
          },
        ],
        expectedText: 'jane@example.com',
      },
      {
        description: 'should fall back to uri when rich link title is not available',
        elements: [
          {
            richLink: {
              richLinkProperties: {
                uri: 'https://docs.google.com/spreadsheets/d/xyz',
              },
            },
          },
        ],
        expectedText: 'https://docs.google.com/spreadsheets/d/xyz',
      },
      {
        description: 'should fall back to timestamp when date displayText is not available',
        elements: [
          {
            dateElement: {
              dateElementProperties: {
                timestamp: '1736899200',
              },
            },
          },
        ],
        expectedText: '1736899200',
      },
    ])('$description', async ({ elements, expectedText }) => {
      const mockDoc = {
        data: {
          tabs: [
            {
              documentTab: {
                body: {
                  content: [
                    {
                      paragraph: {
                        elements,
                      },
                    },
                  ],
                },
              },
            },
          ],
        },
      };
      mockDocsAPI.documents.get.mockResolvedValue(mockDoc);

      const result = await docsService.getText({ documentId: 'test-doc-id' });

      expect(result.content[0].text).toBe(expectedText);
    });
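
A complementary way to cut the same boilerplate, whether or not the tests are parameterized, is a small factory for the nested document shape. The `buildMockDoc` name is hypothetical and not part of the PR; this is only a sketch of the idea, assuming the tabs/documentTab/body/content nesting shown in the tests above.

```typescript
// Hypothetical helper, not part of the PR: wraps paragraph elements in the
// tabs -> documentTab -> body -> content nesting used by the tests above.
function buildMockDoc(elements: unknown[]) {
  return {
    data: {
      tabs: [
        {
          documentTab: {
            body: {
              content: [{ paragraph: { elements } }],
            },
          },
        },
      ],
    },
  };
}

// Each test then only spells out the elements it cares about.
const personOnlyDoc = buildMockDoc([
  { person: { personProperties: { email: 'jane@example.com' } } },
]);

console.log(JSON.stringify(personOnlyDoc, null, 2));
```

In a test, the result would be passed to `mockDocsAPI.documents.get.mockResolvedValue(...)` exactly as the hand-written `mockDoc` objects are today.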

@MrwanBaghdad changed the title from "Fix: Enable parsing of rich (smart) tags in google docs" to "feat: Enable parsing of rich (smart) tags in google docs" on Feb 12, 2026
@MrwanBaghdad
Author

Hey @allenhutchison, I am submitting this change because it's a feature my organisation requires. Please let me know if you have any comments or feedback. I'm happy to incorporate them into the PR.