Group DOM nodes content for page translation #563

# The problem

Look at this translation:

<img width="990" height="316" alt="Image" src="https://github.com/user-attachments/assets/3d71f3a5-b282-462c-aec9-ba1b4e090efe" />

The original text is:
> Speech synthesis is the artificial production of human speech. A computer system used for this purpose is called a speech synthesizer, and can be implemented in software or hardware products. A text-to-speech (TTS) system converts normal language text into speech; other systems render symbolic linguistic representations like phonetic transcriptions into speech.[1] The reverse process is speech recognition.

The first sentence comes out badly translated:
> Синтез речи является искусственным производством человеческого речь.

It reads much better if I select this text segment and translate the selection:
> Синтез речи - это искусственное воспроизведение человеческой речи

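So the problem is not the translator itself. The problem is that the words were translated separately, so the translator did not get the context. For example, in "человеческого речь" the adjective ("human") and the noun ("speech") disagree in gender and case, because "human" and the linked word "speech" sit in different text nodes and were translated independently.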

# Potential solutions

This text is placed inside a block element:

<img width="1202" height="898" alt="Image" src="https://github.com/user-attachments/assets/f1e93547-7db9-4081-b9da-7df6741e9e7a" />

We could schedule translation better, so that all the text of a block is translated in a single context.
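Finding the block for a text node could be as simple as walking up to its nearest block-level ancestor. The sketch below is only an illustration: it uses a hypothetical `DomLikeNode` shape instead of the browser DOM so it runs anywhere, and it judges "block-level" by an assumed, incomplete `BLOCK_TAGS` list. In a real page, `getComputedStyle(element).display` is likely more reliable, since CSS can make any element block or inline.

```typescript
// Hypothetical minimal node shape, so the sketch runs without a browser DOM.
interface DomLikeNode {
  tagName?: string; // undefined for text nodes
  parent: DomLikeNode | null;
}

// Assumed, incomplete list of block-level tags.
const BLOCK_TAGS = new Set([
  "P", "DIV", "LI", "TD", "TH", "H1", "H2", "H3", "H4", "H5", "H6",
  "BLOCKQUOTE", "SECTION", "ARTICLE", "BODY",
]);

// Walk up from a text node to its nearest block-level ancestor.
function findBlockContainer(node: DomLikeNode): DomLikeNode | null {
  for (let current = node.parent; current !== null; current = current.parent) {
    if (current.tagName !== undefined && BLOCK_TAGS.has(current.tagName.toUpperCase())) {
      return current;
    }
  }
  return null;
}

// e.g. the text "speech" sits inside <a>, which sits inside <p>.
const paragraph: DomLikeNode = { tagName: "P", parent: null };
const link: DomLikeNode = { tagName: "A", parent: paragraph };
const speechText: DomLikeNode = { parent: link };
console.log(findBlockContainer(speechText) === paragraph); // true
```

All text nodes that resolve to the same container would then be translated together.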

An example algorithm:
- find the parent block container of the target node
- wait some time to collect more neighbors (nodes inside the same block, not just direct siblings of the target node)
- form a single string that includes all texts of the block, with special markup that allows splitting it back
- translate this string
- parse the translated string back into segments and validate them
- if validation fails (e.g. the number of segments is wrong), run the translation step again, possibly with a modified string, to get another result
- resolve each translation request in the chunk with its corresponding translated segment
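A minimal sketch of the collecting and resolving steps, assuming each text node's block container is already known and can be used as a map key. `translateBatch` is a hypothetical callback standing for the middle steps (build the string, translate, parse, validate); the 50 ms collection window is an arbitrary assumption, not a measured value.

```typescript
// Hypothetical callback for the middle steps: translates all texts of one block in a
// single context and returns one translated string per input, in the same order.
type TranslateBatch = (texts: string[]) => Promise<string[]>;

interface PendingRequest {
  text: string;
  resolve: (translated: string) => void;
  reject: (error: unknown) => void;
}

class BlockBatcher<K> {
  private readonly queues = new Map<K, PendingRequest[]>();
  private readonly translateBatch: TranslateBatch;
  private readonly collectMs: number;

  constructor(translateBatch: TranslateBatch, collectMs = 50) {
    this.translateBatch = translateBatch;
    this.collectMs = collectMs; // assumed collection window, in milliseconds
  }

  // Called once per text node; `block` identifies the node's block container.
  translate(block: K, text: string): Promise<string> {
    return new Promise<string>((resolve, reject) => {
      let queue = this.queues.get(block);
      if (queue === undefined) {
        queue = [];
        this.queues.set(block, queue);
        // The first request for a block opens its collection window.
        setTimeout(() => void this.flush(block), this.collectMs);
      }
      queue.push({ text, resolve, reject });
    });
  }

  // Translate everything collected for one block and resolve each original request.
  private async flush(block: K): Promise<void> {
    const queue = this.queues.get(block) ?? [];
    this.queues.delete(block);
    try {
      const translated = await this.translateBatch(queue.map((request) => request.text));
      if (translated.length !== queue.length) {
        throw new Error(`expected ${queue.length} segments, got ${translated.length}`);
      }
      queue.forEach((request, i) => request.resolve(translated[i]));
    } catch (error) {
      queue.forEach((request) => request.reject(error));
    }
  }
}

// Usage with a fake translator that upper-cases its input.
const batcher = new BlockBatcher<string>(async (texts) => texts.map((t) => t.toUpperCase()));
Promise.all([batcher.translate("p#1", "human "), batcher.translate("p#1", "speech")]).then(
  (results) => console.log(JSON.stringify(results)), // ["HUMAN ","SPEECH"]
);
```

Callers keep the same one-request-per-node interface; only the scheduling underneath changes.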

## Example

For the source HTML below:
```html
Speech synthesis is the artificial production of human <a href="/wiki/Speech" title="Speech">speech</a>. A computer system used for this purpose is called a speech synthesizer, and can be implemented in <a href="/wiki/Software" title="Software">software</a> or <a href="/wiki/Computer_hardware" title="Computer hardware">hardware</a> products. A text-to-speech (TTS) system converts normal language text into speech; other systems render <a href="/wiki/Symbolic_linguistic_representation" title="Symbolic linguistic representation">symbolic linguistic representations</a> like <a href="/wiki/Phonetic_transcription" title="Phonetic transcription">phonetic transcriptions</a> into speech.<a href="#cite_note-1">[1]</a> The reverse process is <a href="/wiki/Speech_recognition" title="Speech recognition">speech recognition</a>.

```

We could generate the following markup:
```html
<pre>
 <pre>Speech synthesis</pre><pre> is the artificial production of human</pre>
 <pre>speech</pre><pre>. A computer system used for
 this purpose is called a speech synthesizer, and can be implemented in</pre>
 <pre>software</pre><pre> or</pre>
 <pre>hardware</pre><pre> products. A </pre><pre>text-to-speech</pre><pre> (</pre>TTS<pre>) system converts normal language
 text into speech; other systems render</pre>
 <pre>symbolic linguistic representations</pre><pre> like</pre><pre>phonetic transcriptions</pre><pre> into speech.</pre><pre><pre><pre>[</pre>1<pre>]</pre></pre></pre><pre> The reverse process is</pre>
 <pre>speech recognition</pre><pre>.</pre>
</pre>
```

This markup would be translated to:

```html
<предварительный>
 <pre>Синтез речи</pre><pre> - это искусственное воспроизведение человеческой</pre>
 <pre>речи</pre><pre>. Компьютерная система, используемая для 
 этой цели, называется синтезатором речи и может быть реализована в</pre>
 <pre>Программное обеспечение</pre><pre> или</pre>
 <pre>аппаратное обеспечение</pre><pre> продукты. A </pre><pre>преобразование текста в речь</pre><pre> (</pre>TTS<pre>) система преобразует обычный язык
 текст в речь; другие системы воспроизводят</pre>
 <pre>символические лингвистические представления</pre><pre> Нравится</pre><pre>фонетические транскрипции</pre><pre> в речь.</pre><pre><pre><pre>[</pre>1<pre>]</pre></pre></pre><pre> Обратный процесс - это</pre>
 <pre>распознавание речи</pre><pre>.</pre>
>
```

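As we can see, the translated result is broken: the outer `<pre>` tag is translated as `<предварительный>` (Russian for "preliminary"), and its closing `</pre>` turns into a bare `>`, so the markup can no longer be split back reliably. We have to find a proper format for translating text batches.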
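One candidate format to try, sketched under stated assumptions: wrap each segment in a numbered tag whose name is not a translatable word (the hypothetical `<x id="N">`), escape `<`, `>` and `&` inside segments, and accept the translation only if every id comes back exactly once. Ids are accepted in any order because a translator may reorder segments when word order differs between languages. Whether a given translation service keeps such tags (and the whitespace around them) intact has to be checked experimentally.

```typescript
// Escape characters that could be confused with the batch markup.
function escapeText(text: string): string {
  return text.replace(/&/g, "&amp;").replace(/</g, "&lt;").replace(/>/g, "&gt;");
}

// Reverse of escapeText; "&amp;" goes last so "&amp;lt;" becomes "&lt;", not "<".
function unescapeText(text: string): string {
  return text.replace(/&lt;/g, "<").replace(/&gt;/g, ">").replace(/&amp;/g, "&");
}

// Hypothetical format: every segment wrapped in a numbered tag with a non-word name.
function serializeSegments(segments: string[]): string {
  return segments.map((segment, i) => `<x id="${i}">${escapeText(segment)}</x>`).join("");
}

// Parse the translated string back into segments; return null if validation fails.
function parseSegments(translated: string, expectedCount: number): string[] | null {
  const result: (string | undefined)[] = Array.from({ length: expectedCount }, () => undefined);
  const pattern = /<x id="(\d+)">([\s\S]*?)<\/x>/g;
  let found = 0;
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(translated)) !== null) {
    const id = Number(match[1]);
    // An unknown or repeated id means the markup was damaged.
    if (id >= expectedCount || result[id] !== undefined) return null;
    result[id] = unescapeText(match[2]);
    found++;
  }
  // Every segment must come back exactly once (in any order).
  return found === expectedCount ? (result as string[]) : null;
}

const segments = ["Speech synthesis is the artificial production of human ", "speech", "."];
const wire = serializeSegments(segments);
console.log(wire);
// <x id="0">Speech synthesis is the artificial production of human </x><x id="1">speech</x><x id="2">.</x>
console.log(parseSegments(wire, segments.length)?.length); // 3
```

A `null` result is the signal to retry, as in the validation step of the algorithm.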

# The goals

We have to find a way to translate many texts with a single request and then parse the result back into the original segments.

The methodology must describe the approach and an algorithm to implement it, including the text format and the parsing method.
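The validate-and-retry step could be sketched as follows. `translateOnce` and `translatePlain` are hypothetical callbacks: the first serializes segments, translates them in one request and parses the result (returning `null` when validation fails); the second translates a single text without any markup. Splitting a failed batch in half is just one assumed way of "modifying the string".

```typescript
// Hypothetical: serialize, translate in one request, parse; null when validation fails.
type TranslateBatchOnce = (segments: string[]) => Promise<string[] | null>;
// Hypothetical: translate one text as-is, without batch markup.
type TranslatePlain = (text: string) => Promise<string>;

async function translateWithRetry(
  segments: string[],
  translateOnce: TranslateBatchOnce,
  translatePlain: TranslatePlain,
): Promise<string[]> {
  if (segments.length === 0) return [];
  // A single segment needs no markup, so translate it as plain text.
  if (segments.length === 1) return [await translatePlain(segments[0])];
  const result = await translateOnce(segments);
  if (result !== null && result.length === segments.length) return result;
  // Validation failed: retry with two smaller batches, which keeps part of the context.
  const middle = Math.ceil(segments.length / 2);
  const left = await translateWithRetry(segments.slice(0, middle), translateOnce, translatePlain);
  const right = await translateWithRetry(segments.slice(middle), translateOnce, translatePlain);
  return [...left, ...right];
}

// Usage with a fake batch translator that fails for batches longer than two segments.
const fakeOnce: TranslateBatchOnce = async (segs) =>
  segs.length > 2 ? null : segs.map((s) => s.toUpperCase());
const fakePlain: TranslatePlain = async (text) => text.toUpperCase();
translateWithRetry(["human ", "speech", "."], fakeOnce, fakePlain).then((r) =>
  console.log(JSON.stringify(r)), // ["HUMAN ","SPEECH","."]
);
```

Halving keeps more context than falling straight back to translating every node separately, and it always terminates because single segments are translated without markup.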
