ML Questions and Thoughts #3

@chopchop505

  1. Which AWS account should we deploy to (FireCARES or StatEngine)?

  2. What is the preferred dump format for training data from Elasticsearch? The easiest is a line-delimited JSON file via elasticdump. It would be a full dump (all fields, all departments). In your notebook (in the "Upload the data for training" section), you could then retrieve this dump and do the pre-processing/subsetting for the model in question.

  3. Do you want continuously updated training data? If you'll be tweaking models frequently, is it best practice to keep the same static training data set, or to continuously add to it? It doesn't matter to me; we can export as often as daily, but that might be overkill.

  4. Do you plan on using a different model for each department, or a single model that takes the FireCARES ID as a heavily weighted feature? A single model obviously makes deployment easier, but probably complicates the model significantly (I don't know enough about ML).

  5. For deployment, it's cheaper to do batch predictions, but easier to do on-demand (especially for future models). Not a question, just something we should chat about.

  6. We probably want to think about how to manage multiple experiments sooner rather than later. See example here: https://github.com/awslabs/amazon-sagemaker-examples/blob/master/advanced_functionality/search/ml_experiment_management_using_search.ipynb

This was a great example, and we could literally have this in production tomorrow!

Deploying your custom model is going to take a bit of lifting (https://docs.aws.amazon.com/sagemaker/latest/dg/your-algorithms.html), but should be doable. It may be easier to translate your model to a supported framework like scikit-learn or TensorFlow than to build a custom container?
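
For question 2, here is a rough sketch of how the notebook could read a line-delimited JSON dump and subset it for one model. It assumes each line is an Elasticsearch hit with the document under `_source` (which is what I'd expect from elasticdump, but worth checking against a real export). The field name `firecares_id` and the helper names are placeholders for illustration, not the actual index mapping.

```python
import json
from typing import Iterable, Iterator, List


def iter_sources(lines: Iterable[str]) -> Iterator[dict]:
    """Yield the document body from each line of a line-delimited JSON dump."""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines in the dump
        record = json.loads(line)
        # Assumption: the document lives under "_source"; if that key is
        # missing, treat the whole record as the document.
        yield record.get("_source", record)


def subset_by_department(
    docs: Iterable[dict], department_id: str, fields: List[str]
) -> Iterator[dict]:
    """Keep documents for one department, reduced to the listed fields.

    "firecares_id" is a hypothetical top-level field name.
    """
    for doc in docs:
        if doc.get("firecares_id") == department_id:
            yield {name: doc.get(name) for name in fields}
```

In the notebook this could be used roughly like `with open("dump.json") as fh: rows = list(subset_by_department(iter_sources(fh), "some-id", ["field_a", "field_b"]))`, where the file name, ID, and field names are again made up. Reading line by line keeps memory flat even when the dump covers all departments.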
