training ending prematurely

I am trying to run Axon on a Supervisely dataset with the default training parameters, but training stops shortly after it starts and reports "Training complete". Here is the command-line output from the Axon container in Docker:

[1] Successfully created the TFRecord: /wpi-data/projects/8b397942-b1e5-4dde-a69a-7a4c940396a8/train.recordLABELS ['cargo_red', 'cargo_blue', 'cargo_reflect_red', 'cargo_reflect_blue']

[1] .

[1] Successfully created the TFRecord: /wpi-data/projects/8b397942-b1e5-4dde-a69a-7a4c940396a8/eval.recordoutput_pbtxt in parse_meta.py: /wpi-data/projects/8b397942-b1e5-4dde-a69a-7a4c940396a8/map.pbtxt

[1] <open file '/wpi-data/projects/8b397942-b1e5-4dde-a69a-7a4c940396a8/map.pbtxt', mode 'w+' at 0x7f13f825be40>

[1] .

[1] Records generated

[1] 8b397942-b1e5-4dde-a69a-7a4c940396a8: Trainer extracted dataset

[1] 8b397942-b1e5-4dde-a69a-7a4c940396a8: Launching container wpilib/axon-metrics

[1] 8b397942-b1e5-4dde-a69a-7a4c940396a8: Launching container wpilib/axon-training

[1] /tensorflow/models/research/object_detection/utils/visualization_utils.py:26: UserWarning: 

[1] This call to matplotlib.use() has no effect because the backend has already

[1] been chosen; matplotlib.use() must be called *before* pylab, matplotlib.pyplot,

[1] or matplotlib.backends is imported for the first time.

[1] 

[1] The backend was *originally* set to 'TkAgg' by the following code:

[1]   File "train.py", line 7, in <module>

[1]     import modularized_model_main

[1]   File "/tensorflow/models/research/modularized_model_main.py", line 10, in <module>

[1]     from object_detection import model_lib

[1]   File "/tensorflow/models/research/object_detection/model_lib.py", line 27, in <module>

[1]     from object_detection import eval_util

[1]   File "/tensorflow/models/research/object_detection/eval_util.py", line 27, in <module>

[1]     from object_detection.metrics import coco_evaluation

[1]   File "/tensorflow/models/research/object_detection/metrics/coco_evaluation.py", line 20, in <module>

[1]     from object_detection.metrics import coco_tools

[1]   File "/tensorflow/models/research/object_detection/metrics/coco_tools.py", line 47, in <module>

[1]     from pycocotools import coco

[1]   File "/tensorflow/models/research/pycocotools/coco.py", line 49, in <module>

[1]     import matplotlib.pyplot as plt

[1]   File "/usr/local/lib/python2.7/dist-packages/matplotlib/pyplot.py", line 71, in <module>

[1]     from matplotlib.backends import pylab_setup

[1]   File "/usr/local/lib/python2.7/dist-packages/matplotlib/backends/__init__.py", line 16, in <module>

[1]     line for line in traceback.format_stack()

[1] 

[1] 

[1]   import matplotlib; matplotlib.use('Agg')  # pylint: disable=multiple-statements

[1] TensorBoard 1.12.0 at http://8975446b1d29:6006 (Press CTRL+C to quit)

[1] WARNING:tensorflow:Forced number of epochs for all eval validations to be 1.

[1] WARNING:tensorflow:Expected number of evaluation epochs is 1, but instead encountered `eval_on_train_input_config.num_epochs` = 0. Overwriting `num_epochs` to 1.

[1] WARNING:tensorflow:Estimator's model_fn (<function model_fn at 0x7f6555f308c0>) includes params argument, but params are not passed to Estimator.

[1] WARNING:tensorflow:num_readers has been reduced to 1 to match input file shards.

[1] WARNING:tensorflow:From /tensorflow/models/research/object_detection/builders/dataset_builder.py:80: parallel_interleave (from tensorflow.contrib.data.python.ops.interleave_ops) is deprecated and will be removed in a future version.

[1] Instructions for updating:

[1] Use `tf.data.experimental.parallel_interleave(...)`.

[1] WARNING:tensorflow:From /usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/sparse_ops.py:1165: sparse_to_dense (from tensorflow.python.ops.sparse_ops) is deprecated and will be removed in a future version.

[1] Instructions for updating:

[1] Create a `tf.sparse.SparseTensor` and use `tf.sparse.to_dense` instead.

[1] WARNING:tensorflow:From /tensorflow/models/research/object_detection/core/preprocessor.py:1218: calling squeeze (from tensorflow.python.ops.array_ops) with squeeze_dims is deprecated and will be removed in a future version.

[1] Instructions for updating:

[1] Use the `axis` argument instead

[1] WARNING:tensorflow:From /tensorflow/models/research/object_detection/builders/dataset_builder.py:148: batch_and_drop_remainder (from tensorflow.contrib.data.python.ops.batching) is deprecated and will be removed in a future version.

[1] Instructions for updating:

[1] Use `tf.data.Dataset.batch(..., drop_remainder=True)`.

[1] WARNING:root:Variable [BoxPredictor_0/ClassPredictor/biases] is available in checkpoint, but has an incompatible shape with model variable. Checkpoint shape: [[273]], model variable shape: [[15]]. This variable will not be initialized from the checkpoint.

[1] WARNING:root:Variable [BoxPredictor_0/ClassPredictor/weights] is available in checkpoint, but has an incompatible shape with model variable. Checkpoint shape: [[1, 1, 576, 273]], model variable shape: [[1, 1, 576, 15]]. This variable will not be initialized from the checkpoint.

[1] WARNING:root:Variable [BoxPredictor_1/ClassPredictor/biases] is available in checkpoint, but has an incompatible shape with model variable. Checkpoint shape: [[546]], model variable shape: [[30]]. This variable will not be initialized from the checkpoint.

[1] WARNING:root:Variable [BoxPredictor_1/ClassPredictor/weights] is available in checkpoint, but has an incompatible shape with model variable. Checkpoint shape: [[1, 1, 1280, 546]], model variable shape: [[1, 1, 1280, 30]]. This variable will not be initialized from the checkpoint.

[1] WARNING:root:Variable [BoxPredictor_2/ClassPredictor/biases] is available in checkpoint, but has an incompatible shape with model variable. Checkpoint shape: [[546]], model variable shape: [[30]]. This variable will not be initialized from the checkpoint.

[1] WARNING:root:Variable [BoxPredictor_2/ClassPredictor/weights] is available in checkpoint, but has an incompatible shape with model variable. Checkpoint shape: [[1, 1, 512, 546]], model variable shape: [[1, 1, 512, 30]]. This variable will not be initialized from the checkpoint.

[1] WARNING:root:Variable [BoxPredictor_3/ClassPredictor/biases] is available in checkpoint, but has an incompatible shape with model variable. Checkpoint shape: [[546]], model variable shape: [[30]]. This variable will not be initialized from the checkpoint.

[1] WARNING:root:Variable [BoxPredictor_3/ClassPredictor/weights] is available in checkpoint, but has an incompatible shape with model variable. Checkpoint shape: [[1, 1, 256, 546]], model variable shape: [[1, 1, 256, 30]]. This variable will not be initialized from the checkpoint.

[1] WARNING:root:Variable [BoxPredictor_4/ClassPredictor/biases] is available in checkpoint, but has an incompatible shape with model variable. Checkpoint shape: [[546]], model variable shape: [[30]]. This variable will not be initialized from the checkpoint.

[1] WARNING:root:Variable [BoxPredictor_4/ClassPredictor/weights] is available in checkpoint, but has an incompatible shape with model variable. Checkpoint shape: [[1, 1, 256, 546]], model variable shape: [[1, 1, 256, 30]]. This variable will not be initialized from the checkpoint.

[1] WARNING:root:Variable [BoxPredictor_5/ClassPredictor/biases] is available in checkpoint, but has an incompatible shape with model variable. Checkpoint shape: [[546]], model variable shape: [[30]]. This variable will not be initialized from the checkpoint.

[1] WARNING:root:Variable [BoxPredictor_5/ClassPredictor/weights] is available in checkpoint, but has an incompatible shape with model variable. Checkpoint shape: [[1, 1, 128, 546]], model variable shape: [[1, 1, 128, 30]]. This variable will not be initialized from the checkpoint.

[1] WARNING:root:Variable [global_step] is not available in checkpoint

[1] 2022-01-18 00:53:38.619242: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA

[1] W0118 00:53:59.300209 Reloader plugin_event_accumulator.py:286] Found more than one graph event per run, or there was a metagraph containing a graph_def, as well as one or more graph events.  Overwriting the graph with the newest event.

[1] W0118 00:53:59.300209 140395273336576 plugin_event_accumulator.py:286] Found more than one graph event per run, or there was a metagraph containing a graph_def, as well as one or more graph events.  Overwriting the graph with the newest event.

[1] 2022-01-18 00:54:19.306182: W tensorflow/core/framework/allocator.cc:122] Allocation of 46080000 exceeds 10% of system memory.

[1] 2022-01-18 00:54:19.641319: W tensorflow/core/framework/allocator.cc:122] Allocation of 46080000 exceeds 10% of system memory.

[1] 2022-01-18 00:54:19.814552: W tensorflow/core/framework/allocator.cc:122] Allocation of 46080000 exceeds 10% of system memory.

[1] 2022-01-18 00:54:19.847873: W tensorflow/core/framework/allocator.cc:122] Allocation of 46080000 exceeds 10% of system memory.

[1] 2022-01-18 00:54:20.033703: W tensorflow/core/framework/allocator.cc:122] Allocation of 46080000 exceeds 10% of system memory.

[1] 8b397942-b1e5-4dde-a69a-7a4c940396a8: Training complete

[1] Checkpoint update routine terminated.

training ending prematurely #289
