Hey! I set up the conda env, repository, and pretrained model as required. However, when I try to train the model for one extra epoch (as a sanity check), I see NaN values in the loss and updates.
I increased max_epochs to 151 but kept the other params the same:
train:
  loss: "xentropy" # must be either xentropy or iou
  max_epochs: 151
  lr: 0.05 # sgd learning rate
  wup_epochs: 0 # warmup during first XX epochs (can be float)
  momentum: 0.9 # sgd momentum
  lr_decay: 0.99 # learning rate decay per epoch after initial cycle (from min lr)
  w_decay: 0.0001 # weight decay
  batch_size: 1 # batch size
  report_batch: 50 # every x batches, report loss
  report_epoch: 1 # every x epochs, report validation set
  epsilon_w: 0.001 # class weight w = 1 / (content + epsilon_w)
  save_summary: False # Summary of weight histograms for tensorboard
  save_scans: True # False doesn't save anything, True saves some
  # sample images (one per batch of the last calculated batch)
  # in log folder
  show_scans: False # show scans during training
  workers: 1
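For reference, the class weighting described in the epsilon_w comment can be sketched as below. This is only an illustration of the formula w = 1 / (content + epsilon_w), not the repository's code: the function name `class_weights` is hypothetical, and it assumes `content` is each class's fraction of the labelled points.

```python
def class_weights(content, epsilon_w=0.001):
    """Inverse-frequency weights, w = 1 / (content + epsilon_w).

    `content` is assumed to hold one frequency per class, each in [0, 1].
    With epsilon_w > 0 the weight stays finite even for a class that
    never appears (content == 0), so this step alone should not
    produce inf or NaN.
    """
    if epsilon_w <= 0:
        raise ValueError("epsilon_w must be positive to avoid division by zero")
    return [1.0 / (c + epsilon_w) for c in content]
```

With epsilon_w = 0.001, a class that never occurs gets a weight of about 1000, while a class covering half the points gets a weight of about 2, so rare classes can dominate the loss on a single-scan batch.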
Here's the log I see:
Lr: 1.326e-03 | Update: 9.066e-01 mean,2.853e-01 std | Epoch: [150][0/4541] | Time 771.238 (771.238) | Data 0.124 (0.124) | Loss 8.3961 (8.3961) | acc 0.138 (0.138) | IoU 0.016 (0.016) | [40 days, 12:46:27]
../../tasks/semantic/modules/trainer.py:453: RuntimeWarning: invalid value encountered in float_scalars
update_ratios.append(update / max(w, 1e-10))
Lr: 2.236e-03 | Update: nan mean,nan std | Epoch: [150][50/4541] | Time 0.266 (15.390) | Data 0.039 (0.047) | Loss nan (nan) | acc 0.000 (0.004) | IoU 0.000 (0.000) | [19:15:13]
Lr: 2.236e-03 | Update: nan mean,nan std | Epoch: [150][100/4541] | Time 0.292 (7.906) | Data 0.044 (0.045) | Loss nan (nan) | acc 0.000 (0.002) | IoU 0.000 (0.000) | [9:48:24]
Lr: 2.235e-03 | Update: nan mean,nan std | Epoch: [150][150/4541] | Time 0.285 (5.380) | Data 0.050 (0.046) | Loss nan (nan) | acc 0.000 (0.001) | IoU 0.000 (0.000) | [6:36:59]
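The warning points at the update-ratio computation, so one way to narrow this down is to find the first parameter whose gradient stops being finite. The sketch below is a standalone illustration in plain Python, not the code in trainer.py: it assumes the ratio is the norm of the step (lr times gradient) divided by the norm of the weights, and the names `update_ratio` and `first_non_finite` are hypothetical.

```python
import math


def l2_norm(values):
    """Euclidean norm of a flat list of floats (NaN/inf propagate)."""
    return math.sqrt(sum(v * v for v in values))


def update_ratio(weights, grads, lr, eps=1e-10):
    """Ratio of the SGD step size to the weight norm for one parameter.

    Assumed form: ||-lr * grad|| / max(||w||, eps). A NaN anywhere in
    `grads` makes the result NaN, which matches the "Update: nan" lines.
    """
    w = l2_norm(weights)
    update = l2_norm([-lr * g for g in grads])
    return update / max(w, eps)


def first_non_finite(named_grads):
    """Return the name of the first parameter with a NaN/inf gradient.

    `named_grads` maps a parameter name to a flat list of gradient
    values; returns None if every gradient is finite.
    """
    for name, grads in named_grads.items():
        if not all(math.isfinite(g) for g in grads):
            return name
    return None
```

Calling `first_non_finite` on the gradients right after the backward pass of each batch, and stopping at the first hit, would show whether the NaN originates in the loss or in a specific layer.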
Any ideas?