Description of the problem
Hi,
I am using HOI to evaluate several higher-order metrics. I have to do this in a for loop over many sets of simulated data, including surrogate data. The problem is that memory usage grows with each successive evaluation, even if I try to release memory by deleting variables and forcing garbage collection.
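One way to narrow down where the growth lives is to compare the Python-level heap tracked by tracemalloc with the whole-process RSS reported by psutil: if RSS keeps climbing while the traced Python heap stays flat, the leak sits in native allocations (e.g. the library's backend) rather than in Python objects. A minimal sketch (the report helper is my own, not part of hoi):

import os, gc, tracemalloc
import psutil

proc = psutil.Process(os.getpid())
tracemalloc.start()

def report(tag):
    """Print Python-heap and whole-process memory side by side."""
    gc.collect()
    py_mb = tracemalloc.get_traced_memory()[0] / 1024**2  # Python allocator only
    rss_mb = proc.memory_info().rss / 1024**2             # everything the OS sees
    print(f"{tag}: python heap {py_mb:.1f} MB, process RSS {rss_mb:.1f} MB")

Calling report(...) once per surrogate in the loop below makes the two numbers easy to compare.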
The code attached below produces the following output when computing the surrogate O-information:
Get list of multiplets
surrogate 0 ready. Memory 399
surrogate 1 ready. Memory 408
surrogate 2 ready. Memory 426
surrogate 3 ready. Memory 437
(...)
surrogate 36 ready. Memory 739
surrogate 37 ready. Memory 747
surrogate 38 ready. Memory 754
surrogate 39 ready. Memory 764
If I execute the same loop again within the same kernel, memory continues to increase:
Get list of multiplets
surrogate 0 ready. Memory 802
surrogate 1 ready. Memory 809
surrogate 2 ready. Memory 822
(...)
surrogate 37 ready. Memory 1106
surrogate 38 ready. Memory 1114
surrogate 39 ready. Memory 1121
The above is with 3-dimensional data. If I do the same with 4-dimensional data, I get:
Get list of multiplets
surrogate 0 ready. Memory 6724
surrogate 1 ready. Memory 6747
(...)
surrogate 18 ready. Memory 6916
surrogate 19 ready. Memory 6925
After a couple of runs this escalates to:
(...)
surrogate 16 ready. Memory 7511
surrogate 17 ready. Memory 7517
surrogate 18 ready. Memory 7527
surrogate 19 ready. Memory 7534
I have seen this under both Windows and Linux, on a desktop PC and on an HPC server. The problem is that on the HPC server I am running several instances in parallel, and after some hours the full 256 GB of RAM is exhausted.
I know I can use alternative approaches, such as killing the process and starting a new Python kernel after some time, but I still think this is a dangerous memory leak that should be addressed in some way.
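In the meantime, a cleaner variant of the kill-and-restart workaround is to run each fit in a short-lived worker process, so that whatever the fit leaks is returned to the OS when the worker exits. A sketch under assumptions (fit_oinfo, the worker count, and the stand-in surrogate array are mine; only the Oinfo call itself comes from the reproduction below):

import numpy as np
from multiprocessing import Pool

def fit_oinfo(serie):
    import hoi  # imported in the worker so each fresh process starts clean
    o = hoi.metrics.Oinfo(serie, verbose=False)
    return o.fit(minsize=3, method='kernel').squeeze()

if __name__ == '__main__':
    # Stand-in surrogates; in practice this is the output of randomShift() below.
    surrX = np.random.randn(40, 2000, 3)
    # maxtasksperchild=1 retires each worker after one task, so any memory it
    # leaked is reclaimed by the OS when that process exits.
    with Pool(processes=4, maxtasksperchild=1) as pool:
        surrOinf = pool.map(fit_oinfo, list(surrX))

This trades some process start-up overhead for a bounded memory footprint per instance.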
Steps to reproduce
import numpy as np
import hoi
import os, psutil
import gc

proc = psutil.Process(os.getpid())

def randomShift(data, N=50, cols=True):
    """
    Random shifting of time series

    Parameters
    ----------
    data : 2D numpy array
        T x S numpy array (S x T if cols == False).
    N : int, optional
        Number of surrogates to generate. The default is 50.
    cols : boolean, optional
        If true, the first dimension of the array is time. The default is True.

    Returns
    -------
    outSeries : numpy array
        N x T x S (N x S x T if cols == False).
        N = number of surrogates, T = time points, S = number of series
    """
    if cols:
        data2 = data.T
    else:
        data2 = np.copy(data)
    D, L = data2.shape
    outSeries = []
    for i in range(N):
        shifts = np.random.randint(1, L, D)
        serie = [np.r_[d[s:], d[:s]] for d, s in zip(data2, shifts)]
        outSeries.append(serie)
    outSeries = np.array(outSeries)
    if cols:
        outSeries = np.swapaxes(outSeries, 1, 2)
    return outSeries
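# Illustrative shape check: for x_t of shape (2000, 3), as produced below,
# randomShift(x_t, N=40) returns a (40, 2000, 3) array, i.e. 40 independently
# circularly shifted copies of the three time series.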
est = 'kernel'

x_t = hoi.simulation.simulate_hoi_gauss(n_samples=2000)
ent = hoi.core.get_entropy(method=est)
H = ent(x_t.T)

Oinf = hoi.metrics.Oinfo(x_t, verbose=True).fit(minsize=3, method=est).squeeze()
Sinfo = hoi.metrics.Sinfo(x_t, verbose=False).fit(minsize=3, method=est).squeeze()
TC = (Sinfo + Oinf) / 2
DTC = (Sinfo - Oinf) / 2

#%%
surrX = randomShift(x_t, N=40)
surrOinf = []
for s, serie in enumerate(surrX):
    OinfoScalc = hoi.metrics.Oinfo(serie, verbose=False)
    surrOinf.append(OinfoScalc.fit(minsize=3, method=est).squeeze())
    print(f"surrogate {s} ready. Memory", proc.memory_info().rss // 1024 // 1024)
    del OinfoScalc
    gc.collect()
Expected results
More or less constant memory usage
Actual results
Memory usage grows steadily with each iteration, as shown in the output above.
Additional information
n/a