
"DataLoader worker exited unexpectedly" #22

@robertskmiles


Related to #13, in the sense that this issue is made worse by the indexing process not being resumable.
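Resumability would limit the cost of a crash like the one reported here. The following is a minimal sketch of the idea, not memery's implementation. It assumes a hypothetical `encode_batch` callable that maps a list of file paths to one embedding per path (this is not memery's actual `image_encoder` signature) and uses a JSON checkpoint file purely for illustration. Files are encoded in fixed-size chunks, and results are saved after every chunk, so a rerun only redoes the chunk that was interrupted.

```python
import json
from pathlib import Path
from typing import Callable, Dict, List

# Hypothetical helper for illustration; not part of memery.
def index_resumable(
    files: List[str],
    encode_batch: Callable[[List[str]], List[List[float]]],
    checkpoint: Path,
    chunk_size: int = 1000,
) -> Dict[str, List[float]]:
    """Encode files in chunks, persisting results after every chunk."""
    # Load whatever a previous (possibly killed) run already saved.
    done: Dict[str, List[float]] = (
        json.loads(checkpoint.read_text()) if checkpoint.exists() else {}
    )
    pending = [f for f in files if f not in done]
    for start in range(0, len(pending), chunk_size):
        chunk = pending[start:start + chunk_size]
        for path, embedding in zip(chunk, encode_batch(chunk)):
            done[path] = embedding
        # Write to a temporary file, then rename it over the checkpoint,
        # so a kill in the middle of a write cannot corrupt saved progress.
        tmp = checkpoint.with_suffix(".tmp")
        tmp.write_text(json.dumps(done))
        tmp.replace(checkpoint)
    return done
```

JSON keeps the sketch dependency-free. For roughly 70k embeddings, a binary or append-only format would avoid rewriting the whole file after every chunk.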

When indexing a large directory of mixed file types (including 69834 images), I get this error:

Traceback (most recent call last):
  File "/home/rob/.local/lib/python3.9/site-packages/torch/utils/data/dataloader.py", line 872, in _try_get_data
    data = self._data_queue.get(timeout=timeout)
  File "/usr/lib/python3.9/multiprocessing/queues.py", line 113, in get
    if not self._poll(timeout):
  File "/usr/lib/python3.9/multiprocessing/connection.py", line 262, in poll
    return self._poll(timeout)
  File "/usr/lib/python3.9/multiprocessing/connection.py", line 429, in _poll
    r = wait([self], timeout)
  File "/usr/lib/python3.9/multiprocessing/connection.py", line 936, in wait
    ready = selector.select(timeout)
  File "/usr/lib/python3.9/selectors.py", line 416, in select
    fd_event_list = self._selector.poll(timeout)
  File "/home/rob/.local/lib/python3.9/site-packages/torch/utils/data/_utils/signal_handling.py", line 66, in handler
    _error_if_any_worker_fails()
RuntimeError: DataLoader worker (pid 541564) is killed by signal: Killed. 

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/rob/.local/bin/memery", line 8, in <module>
    sys.exit(__main__())
  File "/home/rob/.local/lib/python3.9/site-packages/memery/cli.py", line 30, in __main__
    app()
  File "/usr/lib/python3.9/site-packages/typer/main.py", line 214, in __call__
    return get_command(self)(*args, **kwargs)
  File "/home/rob/.local/lib/python3.9/site-packages/click/core.py", line 829, in __call__
    return self.main(*args, **kwargs)
  File "/home/rob/.local/lib/python3.9/site-packages/click/core.py", line 782, in main
    rv = self.invoke(ctx)
  File "/home/rob/.local/lib/python3.9/site-packages/click/core.py", line 1259, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/home/rob/.local/lib/python3.9/site-packages/click/core.py", line 1066, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/home/rob/.local/lib/python3.9/site-packages/click/core.py", line 610, in invoke
    return callback(*args, **kwargs)
  File "/usr/lib/python3.9/site-packages/typer/main.py", line 497, in wrapper
    return callback(**use_params)  # type: ignore
  File "/home/rob/.local/lib/python3.9/site-packages/memery/cli.py", line 17, in recall
    ranked = memery.core.queryFlow(path, query=query)
  File "/home/rob/.local/lib/python3.9/site-packages/memery/core.py", line 59, in queryFlow
    dbpath, treepath = indexFlow(root)
  File "/home/rob/.local/lib/python3.9/site-packages/memery/core.py", line 31, in indexFlow
    new_embeddings = image_encoder(crafted_files, device)
  File "/home/rob/.local/lib/python3.9/site-packages/memery/encoder.py", line 18, in image_encoder
    for images, labels in tqdm(img_loader):
  File "/home/rob/.local/lib/python3.9/site-packages/tqdm/std.py", line 1133, in __iter__
    for obj in iterable:
  File "/home/rob/.local/lib/python3.9/site-packages/torch/utils/data/dataloader.py", line 435, in __next__
    data = self._next_data()
  File "/home/rob/.local/lib/python3.9/site-packages/torch/utils/data/dataloader.py", line 1068, in _next_data
    idx, data = self._get_data()
  File "/home/rob/.local/lib/python3.9/site-packages/torch/utils/data/dataloader.py", line 1034, in _get_data
    success, data = self._try_get_data()
  File "/home/rob/.local/lib/python3.9/site-packages/torch/utils/data/dataloader.py", line 885, in _try_get_data
    raise RuntimeError('DataLoader worker (pid(s) {}) exited unexpectedly'.format(pids_str)) from e
RuntimeError: DataLoader worker (pid(s) 541564) exited unexpectedly
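The traceback contains two chained exceptions. The worker's "killed by signal: Killed" error is re-raised as "exited unexpectedly" via `raise ... from e` (dataloader.py, line 885), so the more informative message is stored on the final error's `__cause__`. The following sketch shows how to walk that chain. It uses a hypothetical `fake_worker_failure` that only mimics the two messages, with no PyTorch involved.

```python
def root_cause(exc: BaseException) -> BaseException:
    """Follow explicit `raise ... from` links (__cause__) to the innermost error."""
    while exc.__cause__ is not None:
        exc = exc.__cause__
    return exc

def fake_worker_failure() -> None:
    """Hypothetical stand-in reproducing the two chained errors from the traceback."""
    try:
        raise RuntimeError("DataLoader worker (pid 541564) is killed by signal: Killed.")
    except RuntimeError as e:
        raise RuntimeError("DataLoader worker (pid(s) 541564) exited unexpectedly") from e

try:
    fake_worker_failure()
except RuntimeError as err:
    print("raised:", err)
    print("cause: ", root_cause(err))
```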

I'm guessing that the DataLoader worker process is being killed by the Linux OOM killer, but I have no idea what I can do about that.
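Some context on the OOM guess: on Linux, "Killed" is the C library's description of `SIGKILL` (signal 9). A process cannot catch or handle this signal, and the Linux OOM killer sends it. The following is a minimal sketch for checking the guess, not a definitive tool. It assumes that the kernel log (for example, the output of `dmesg` or `journalctl -k`, which may require root) reports OOM kills on a line containing "Out of memory" and "Killed process <pid>" ("Kill process" on older kernels). The exact wording varies by kernel version, and both helper names are hypothetical.

```python
import re
import signal
from typing import Iterable, List

def describe_signal(signum: int) -> str:
    """Return e.g. 'SIGKILL (9): Killed' on Linux."""
    sig = signal.Signals(signum)
    # signal.strsignal (Python 3.8+) returns the C library's description.
    return f"{sig.name} ({sig.value}): {signal.strsignal(sig)}"

# Assumed kernel log wording, which varies between kernel versions:
# newer kernels print "Killed process <pid>", older ones "Kill process <pid>".
_KILLED_PID = re.compile(r"Kill(?:ed)? process (\d+)")

def oom_killed_pids(kernel_log_lines: Iterable[str]) -> List[int]:
    """Hypothetical helper: PIDs that the kernel log reports as OOM-killed."""
    pids = []
    for line in kernel_log_lines:
        if "out of memory" in line.lower():
            match = _KILLED_PID.search(line)
            if match:
                pids.append(int(match.group(1)))
    return pids
```

If 541564 (the worker PID from the traceback) appears in `oom_killed_pids(...)` applied to the kernel log from the time of the crash, that would support the OOM explanation.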

Let me know if there's any other information that would help.

Metadata

Assignees: No one assigned
Labels: bug (Something isn't working)
Projects: No projects
Milestone: No milestone
Relationships: None yet
Development: No branches or pull requests