Conversation

@gaobinlong gaobinlong commented Jul 22, 2025

Description

The documentation for the ml command lists a category_field parameter, but the parameter currently has no effect. This PR makes the ml command support the category_field parameter.

Request:

POST _plugins/_ppl?format=jdbc
{
  "query":"source = abcd_test | eval value = cast(value as double) | fields value, category | ml action='trainandpredict' algorithm='rcf' input='value' category_field='category'"
}

Response:

{
  "schema": [
    {
      "name": "value",
      "type": "double"
    },
    {
      "name": "category",
      "type": "string"
    },
    {
      "name": "score",
      "type": "double"
    },
    {
      "name": "anomalous",
      "type": "boolean"
    }
  ],
  "datarows": [
    [
      1,
      "a",
      0,
      false
    ],
    [
      2,
      "b",
      0,
      false
    ]
  ],
  "total": 2,
  "size": 2
}

Related Issues

#3406

Check List

  • New functionality includes testing.
  • New functionality has been documented.
  • New functionality has javadoc added.
  • New functionality has a user manual doc added.
  • API changes companion pull request created.
  • Commits are signed per the DCO using --signoff.
  • Public documentation issue/PR created.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

Signed-off-by: Binlong Gao <gbinlong@amazon.com>
@gaobinlong
Author

@LantaoJin @qianheng-aws @songkant-aws please help to review this PR, thanks!

@opensearch-trigger-bot
Contributor

This PR is stalled because it has been open for 30 days with no activity.

@songkant-aws
Contributor

LGTM

@songkant-aws
Contributor

@LantaoJin @qianheng-aws @yuancu This PR needs additional reviews.

String categoryField =
    arguments.containsKey(CATEGORY_FIELD)
        ? (String) arguments.get(CATEGORY_FIELD).getValue()
        : null;
Member

If categoryField is null, generateCategorizedInputDataset will throw an NPE.

Collaborator

How so? generateCategorizedInputDataset has null checking:

ExprValue categoryValue = categoryField == null ? null : tupleValue.get(categoryField);

If we want, we can add a @Nullable annotation to that parameter to document the contract in the signature.
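
To make the null-handling contract discussed above concrete, here is a minimal, self-contained sketch. It is not the plugin's code: the class CategoryGroupingSketch, the helpers optionalArgument, groupByCategory, and row, and the NO_CATEGORY key are all hypothetical, and plain Map rows stand in for the real argument and tuple types. It also assumes that rows without a category should fall into a single group, which the thread does not confirm.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of an optional category field; not the plugin's actual classes. */
public class CategoryGroupingSketch {

  /** Assumed placeholder key for rows that have no category. */
  static final String NO_CATEGORY = "<all>";

  /** Returns the named argument as a String, or null when it was not supplied. */
  static String optionalArgument(Map<String, Object> arguments, String name) {
    return arguments.containsKey(name) ? (String) arguments.get(name) : null;
  }

  /**
   * Groups rows by the value stored under categoryField.
   * categoryField may be null (this is where a @Nullable annotation would go);
   * in that case every row is placed in the single NO_CATEGORY group.
   */
  static Map<String, List<Map<String, Object>>> groupByCategory(
      List<Map<String, Object>> rows, String categoryField) {
    Map<String, List<Map<String, Object>>> groups = new LinkedHashMap<>();
    for (Map<String, Object> row : rows) {
      // Same guard as in the review thread: never dereference a null field name.
      Object categoryValue = categoryField == null ? null : row.get(categoryField);
      String key = categoryValue == null ? NO_CATEGORY : String.valueOf(categoryValue);
      groups.computeIfAbsent(key, k -> new ArrayList<>()).add(row);
    }
    return groups;
  }

  /** Builds one illustrative row with the two fields used in the PR description. */
  static Map<String, Object> row(double value, String category) {
    Map<String, Object> r = new LinkedHashMap<>();
    r.put("value", value);
    r.put("category", category);
    return r;
  }

  public static void main(String[] args) {
    List<Map<String, Object>> rows = List.of(row(1.0, "a"), row(2.0, "b"));

    Map<String, Object> withCategory = new LinkedHashMap<>();
    withCategory.put("category_field", "category");
    Map<String, Object> withoutCategory = new LinkedHashMap<>();

    // category_field supplied: one group per distinct category value.
    System.out.println(
        groupByCategory(rows, optionalArgument(withCategory, "category_field")).keySet());
    // category_field omitted: the null field name is handled without an NPE.
    System.out.println(
        groupByCategory(rows, optionalArgument(withoutCategory, "category_field")).keySet());
  }
}
```

With the two rows from the PR description, the first call yields the groups a and b, and the second yields a single group. The point of the sketch is only that the ternary guard makes a null field name safe, so the annotation would document existing behavior rather than change it.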

@LantaoJin added the enhancement label and removed the stalled label on Oct 15, 2025
@opensearch-trigger-bot
Contributor

This PR is stalled because it has been open for 2 weeks with no activity.

@opensearch-trigger-bot
Contributor

This PR is stalled because it has been open for 2 weeks with no activity.

@Swiddis
Collaborator

Swiddis commented Nov 25, 2025

@LantaoJin can you re-review?

@opensearch-trigger-bot
Contributor

This PR is stalled because it has been open for 2 weeks with no activity.

Labels

enhancement (New feature or request), stalled

4 participants