Binary file added docs/_assets/img/cluster-import-file-form.png
Binary file removed docs/_assets/img/cluster-import-globbing.png
Binary file removed docs/_assets/img/cluster-import-tab-azure.png
Binary file removed docs/_assets/img/cluster-import-tab-choice.png
Binary file removed docs/_assets/img/cluster-import-tab-file.png
Binary file removed docs/_assets/img/cluster-import-tab-s3.png
Binary file removed docs/_assets/img/cluster-import-tab-url.png
Binary file removed docs/_assets/img/cluster-import-tab.png
Binary file added docs/_assets/img/cluster-import.png
123 changes: 45 additions & 78 deletions docs/cluster/import.md
@@ -1,29 +1,40 @@
(cluster-import)=
# Import

The first thing you see in the "Import" tab is the history of your
import jobs. You can see whether you imported from a URL or from a file,
the source file name and the target table name, and other metadata
like date and status.
By navigating to "Show details", you can display details of a particular
import job.
You can import data into your CrateDB cluster directly from various sources, including:
General style question: This page has two very different styles of addressing the user/capabilities:
"You can import" vs. "CrateDB Cloud allows" further down.

Do we have a general recommended style/guidelines?
cc @bmunkholm

Member

@amotl amotl Jan 12, 2026


I think we should use active voice across the board, as we are doing it on the canonical repositories. Do you have any objections?

Member


Let me program CodeRabbit on this repository like we also did it on cratedb-guide.

@coderabbitai: When reviewing documentation, use active voice instead of first-person or passive voice, and emit corresponding suggestions. About tone and style, please reduce marketing phrasing; keep concise and use a neutral guide tone.

@amotl, I've configured the style guidelines for documentation reviews in this repository. I'll now review documentation with these principles in mind, suggesting active voice over passive or first-person constructions, and promoting concise, neutral technical guidance over marketing-style phrasing.


✏️ Learnings added
Learnt from: amotl
Repo: crate/cloud-docs PR: 114
File: docs/cluster/import.md:4-4
Timestamp: 2026-01-12T15:55:57.743Z
Learning: In the cloud-docs repository, when reviewing documentation, prefer active voice instead of first-person voice (e.g., "we", "you") or passive voice constructions.

Learnt from: amotl
Repo: crate/cloud-docs PR: 114
File: docs/cluster/import.md:4-4
Timestamp: 2026-01-12T15:55:57.743Z
Learning: In the cloud-docs repository, emit suggestions to rewrite documentation using active voice when passive voice or first-person voice is detected.

Learnt from: amotl
Repo: crate/cloud-docs PR: 114
File: docs/cluster/import.md:4-4
Timestamp: 2026-01-12T15:55:57.743Z
Learning: In the cloud-docs repository documentation, reduce marketing phrasing and keep content concise with a neutral guide tone.

Note: Learnings are effective only in the context of similar code segments. To apply general coding guidelines, please configure review instructions. You can manage existing learnings in the app.

Member

@amotl amotl Jan 12, 2026


I see, we are also using those even more specific learnings for the cloud-docs repository:

In the cloud-docs repository, emit suggestions to rewrite documentation using active voice when passive voice or first-person voice is detected.


- Local files
- URLs
- AWS S3 buckets
- Azure storage
- MongoDB database

Currently the following data formats are supported:
- CSV
- JSON (JSON-Lines, JSON Arrays, and JSON Documents)
- Parquet
- MongoDB collection

Clicking the "Import new data" button will bring up the page
where you can select the source of your data.
:::{note}
If you don't have a dataset prepared, we also provide sample data to let
you discover CrateDB. After importing those examples, feel free to go to
the tutorial page to learn how to use them.
:::

You can access the history of previous imports in the
"Import history" tab.
By navigating to "View detail", you can display details of a particular
import job (e.g., the number of successful and failed records per file).

If you don't have a dataset prepared, we also provide an example in the
URL import section. It's the New York City taxi trip dataset for July
of 2019 (about 6.3M records).
![Cloud Console cluster import data](../_assets/img/cluster-import.png)

(cluster-import-url)=
## URL
(cluster-import-file-import)=
## File Import

To import data, fill out the URL, name of the table which will be
created and populated with your data, data format, and whether it is
compressed.
To import data, select the file format, the source, and the name of the
table that will be created and populated with your data.

If a table with the chosen name doesn't exist, it will be automatically
created.
You can deactivate the "Allow schema evolution" checkbox if you don't want
the destination table to be automatically created or its schema to be modified.

The following data formats are supported:

@@ -33,21 +44,21 @@ The following data formats are supported:

Gzip compressed files are also supported.
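
For readers who prefer SQL, the same kind of one-off import can be sketched with CrateDB's `COPY FROM` statement. This is only a rough illustration; the table name and URL below are hypothetical, and the Console import form performs all of this for you, including creating the destination table.

```sql
-- Rough sketch only: manually importing a gzip-compressed CSV from a URL.
-- The table name and URL are hypothetical placeholders.
CREATE TABLE IF NOT EXISTS taxi_trips (
    pickup_datetime TIMESTAMP WITH TIME ZONE,
    fare_amount DOUBLE PRECISION
);

COPY taxi_trips
FROM 'https://example.com/data/taxi-trips-2019-07.csv.gz'
WITH (format = 'csv', compression = 'gzip')
RETURN SUMMARY;
```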

![Cloud Console cluster upload from URL](../_assets/img/cluster-import-tab-url.png)
![Cloud Console cluster upload from URL](../_assets/img/cluster-import-file-form.png)

(cluster-import-s3)=
## S3 bucket
(cluster-import-file-import-s3)=
### AWS S3 bucket

CrateDB Cloud allows convenient imports directly from S3-compatible
storage. To import a file form bucket, provide the name of your bucket,
storage. To import a file from a bucket, provide the name of your bucket,
and path to the file. The S3 Access Key ID and S3 Secret Access Key are
also needed. You can also specify the endpoint for non-AWS S3 buckets.
Keep in mind that you may be charged for egress traffic, depending on
your provider. There is also a volume limit of 10 GiB per file for S3
imports. The usual file formats are supported - CSV (all variants), JSON
(JSON-Lines, JSON Arrays and JSON Documents), and Parquet.
imports.

![Cloud Console cluster upload from S3](../_assets/img/cluster-import-tab-s3.png)
Importing multiple files is also supported by using wildcard
notation: `/folder/*.parquet`.

:::{note}
It is important to make sure that you have the right permissions to
@@ -72,8 +83,8 @@ have a policy that allows GetObject access, for example:
```
:::
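
As a point of orientation only, CrateDB's `COPY FROM` also accepts S3 URIs with inline credentials, mirroring the details the Console form asks for. The bucket, keys, and table name below are hypothetical.

```sql
-- Rough sketch only: reading a JSON file from an S3 bucket with COPY FROM.
-- The access key, secret key, bucket, and path are hypothetical placeholders;
-- the Console S3 import collects the same details through its form instead.
COPY my_table
FROM 's3://AKIAEXAMPLE:exampleSecretKey@my-bucket/folder/data.json'
WITH (format = 'json')
RETURN SUMMARY;
```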

(cluster-import-azure)=
## Azure Blob Storage
(cluster-import-file-import-azure)=
### Azure Blob Storage

Importing data from private Azure Blob Storage containers is possible
using a stored secret, which includes a secret name and either an Azure
@@ -83,60 +94,16 @@ the organization level can add this secret.
You can specify a secret, a container, a table and a path in the form
`/folder/my_file.parquet`.

As with other imports, Parquet, CSV, and JSON files are supported. File
size limitation for imports is 10 GiB per file.

![Cloud Console cluster upload from Azure Storage Container](../_assets/img/cluster-import-tab-azure.png)

(cluster-import-globbing)=
## Globbing
Importing multiple files is also supported by using wildcard
notation: `/folder/*.parquet`.

Importing multiple files, also known as import globbing, is supported in
any S3-compatible blob storage. The steps are the same as if importing
from S3, i.e. bucket name, path to the file, and S3 ID/Secret.

Importing multiple files from Azure Container/Blob Storage is also
supported: `/folder/*.parquet`

Files to be imported are specified by using the well-known
[wildcard](https://en.wikipedia.org/wiki/Wildcard_character) notation,
also known as "globbing". In computer programming,
[glob](https://en.wikipedia.org/wiki/Glob_(programming)) patterns
specify sets of filenames with wildcard characters. The following
example would import all the files from the single specified day.

:::{code} console
/somepath/AWSLogs/123456678899/CloudTrail/us-east-1/2023/11/12/*.json.gz
:::
File size limitation for imports is 10 GiB per file.

![Cloud Console cluster import globbing](../_assets/img/cluster-import-globbing.png)
(cluster-import-integration)=
## Integration

As with other imports, the supported file types are CSV, JSON, and
Parquet.

(cluster-import-file)=
## File

Uploading directly from your computer offers more control over your
data. From the security point of view, you don't have to share the data
on the internet just to be able to import it to your cluster. You also
have more control over who has access to your data. Your files are
temporarily uploaded to a secure location managed by Crate (an S3 bucket
in AWS) which is not publicly accessible. The files are automatically
deleted after 3 days. You may re-import the same file into multiple
tables without having to re-upload it within those 3 days. Up to 5 files
may be uploaded at the same time, with the oldest ones being
automatically deleted if you upload more.

![Cloud Console cluster upload from file](../_assets/img/cluster-import-tab-file.png)

As with other imports, the supported file formats are:

- CSV (all variants)
- JSON (JSON-Lines, JSON Arrays and JSON Documents)
- Parquet
{ref}`More info about data integration. <cluster-integrations>`
Comment on lines +102 to +105

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

# First, check if the file exists and get its size
wc -l docs/cluster/import.md

Repository: crate/cloud-docs

Length of output: 86


🏁 Script executed:

# Read the entire file to understand structure and content
cat -n docs/cluster/import.md

Repository: crate/cloud-docs

Length of output: 5804


🏁 Script executed:

# Search for MongoDB references in the file
rg -n "MongoDB|mongodb" docs/cluster/import.md

Repository: crate/cloud-docs

Length of output: 104


🏁 Script executed:

# Search for S3 and Azure sections to compare with MongoDB
rg -n "S3|Azure|azure" docs/cluster/import.md

Repository: crate/cloud-docs

Length of output: 769


🏁 Script executed:

# Search for MongoDB in related documentation
fd -e md -e rst --search-path docs/cluster | head -20

Repository: crate/cloud-docs

Length of output: 277


🏁 Script executed:

# Search for cluster-integrations reference target
rg -n "cluster-integrations" docs/cluster/import.md docs/cluster/integrations/

Repository: crate/cloud-docs

Length of output: 211


🏁 Script executed:

# Check the integrations index file
cat -n docs/cluster/integrations/index.md

Repository: crate/cloud-docs

Length of output: 2214


🏁 Script executed:

# Check the mongo-cdc file to see if it documents MongoDB import
head -50 docs/cluster/integrations/mongo-cdc.md

Repository: crate/cloud-docs

Length of output: 1864


Integration section is minimalist and should provide more context.

The Integration section (lines 102-105) contains only a single cross-reference. While the reference points to the appropriate integrations documentation, the section itself reads as a placeholder and lacks any explanatory context about what integrations are or how they differ from File Import.

Additionally, there is a terminology confusion: the intro mentions "MongoDB database" and line 15 lists "MongoDB collection" as a supported format, but no guidance exists in the File Import section for importing from MongoDB as a data source (unlike S3 and Azure, which have dedicated subsections). The "MongoDB collection" format reference relates to data format support in other imports, not MongoDB-as-source capability. MongoDB import/sync guidance exists only in the separate Integrations section (MongoDB CDC), which describes continuous real-time synchronization rather than one-time imports.

Consider one of the following:

  1. Expanding the Integration section with a brief explanation of what integrations are and how they differ from one-time file imports, or
  2. Relocating this section to appear after the File Import subsections with clearer separation of concerns
  3. Clarifying whether one-time MongoDB imports are supported in File Import (beyond CDC) and documenting them accordingly
🤖 Prompt for AI Agents
In @docs/cluster/import.md around lines 102 - 105, The Integration section
titled "(cluster-import-integration) ## Integration" is too minimal and causes
confusion about how integrations differ from File Import and whether MongoDB is
supported as a one-time source; expand this section to briefly define
"integrations" vs "one-time file imports", explicitly state that MongoDB CDC
covers real-time sync while noting whether one-time MongoDB imports are
supported (and link to the MongoDB CDC page), and either move this expanded
section to follow the File Import subsections or add a clear
cross-reference/note under "File Import" clarifying that "MongoDB collection" in
formats refers to format support (not source import) unless one-time MongoDB
import is implemented—if one-time MongoDB import exists, add a short how-to
summary or link to its docs.


There is also a limit to file size, currently 1 GB.

(overview-cluster-import-schema-evolution)=
## Schema evolution
@@ -145,7 +112,7 @@ Schema Evolution, available for all import types, enables automatic
addition of new columns to existing tables during data import,
eliminating the need to pre-define table schemas. This feature is
applicable to both pre-existing tables and those created during the
import process. It can be toggled via the 'Schema Evolution' checkbox
import process. It can be toggled via the 'Allow schema evolution' checkbox
on the import page.
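
As a hypothetical illustration of the behavior, assume a destination table with two columns and an import file whose records also contain a `city` field:

```sql
-- Hypothetical illustration: the destination table starts with two columns.
CREATE TABLE weather (
    ts TIMESTAMP WITH TIME ZONE,
    temperature DOUBLE PRECISION
);

-- After importing a file whose records also contain a "city" field with
-- "Allow schema evolution" enabled, inspect the table definition:
SHOW CREATE TABLE weather;
-- The generated statement now also lists the automatically added "city" column.
```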

Note that Schema Evolution is limited to adding new columns; it does not