Successive evaluations of metrics eating RAM #82

@patoorio


Description of the problem

Hi,

I am using HOI to evaluate several higher-order metrics. I have to do it in a for loop over many sets of simulated data, also including surrogate data. The problem is that memory usage grows with each successive evaluation, even if I try to release memory by deleting variables and forcing garbage collection.
The code attached produces the following output when calculating surrogate O-information:

Get list of multiplets
surrogate 0  ready. Memory 399                             
surrogate 1  ready. Memory 408
surrogate 2  ready. Memory 426
surrogate 3  ready. Memory 437

(...)

surrogate 36  ready. Memory 739
surrogate 37  ready. Memory 747
surrogate 38  ready. Memory 754
surrogate 39  ready. Memory 764

and if I execute again within the same kernel, memory continues to increase:

Get list of multiplets
surrogate 0  ready. Memory 802                             
surrogate 1  ready. Memory 809
surrogate 2  ready. Memory 822
(...)
surrogate 37  ready. Memory 1106
surrogate 38  ready. Memory 1114
surrogate 39  ready. Memory 1121

The output above is for data of dimension 3. If I do the same with dimension-4 data, I get:

Get list of multiplets
surrogate 0  ready. Memory 6724
surrogate 1  ready. Memory 6747
(...)
surrogate 18  ready. Memory 6916
surrogate 19  ready. Memory 6925

and after a couple of runs it escalates to:

(...)
surrogate 16  ready. Memory 7511
surrogate 17  ready. Memory 7517
surrogate 18  ready. Memory 7527
surrogate 19  ready. Memory 7534

I have seen this under Windows and Linux, on a desktop PC and on an HPC server. The problem is that on the HPC server I am running several instances in parallel, and after some hours the full 256 GB of RAM is exhausted.
I know I can use workarounds such as killing the process and starting a new Python kernel after some time, but I still think this is a dangerous memory leak that should be addressed in some way.

Steps to reproduce

import numpy as np
import hoi
import os, psutil
import gc

proc=psutil.Process(os.getpid())

def randomShift(data,N=50,cols=True):
    """
    Random shifting of time series

    Parameters
    ----------
    data : 2D numpy array
        T x S numpy array (S x T if cols == False).
    N : int, optional
        Number of surrogates to generate. The default is 50.
    cols : boolean, optional
        If true, the first dimension of the array is time. The default is True.

    Returns
    -------
    outSeries : numpy array
        N x T x S.  (N x S x T if cols==False)
        N = number of surrogates, T = time points, S = number of series

    """
    if cols:
        data2=data.T
    else:
        data2=np.copy(data)
    D,L=data2.shape
    outSeries=[]
    for i in range(N):
        shifts=np.random.randint(1,L,D)
        serie=[np.r_[d[s:],d[:s]] for d,s in zip(data2,shifts)]
        outSeries.append(serie)
    outSeries=np.array(outSeries)
    if cols:
        outSeries=np.swapaxes(outSeries,1,2)
    return outSeries 
   
est = 'kernel'

x_t=hoi.simulation.simulate_hoi_gauss(n_samples=2000)

ent = hoi.core.get_entropy(method=est)
H = ent(x_t.T)
Oinf = hoi.metrics.Oinfo(x_t,verbose=True).fit(minsize=3, method=est).squeeze()
Sinfo = hoi.metrics.Sinfo(x_t,verbose=False).fit(minsize=3, method=est).squeeze()

TC = (Sinfo + Oinf) / 2
DTC = (Sinfo - Oinf) / 2


#%%
surrX=randomShift(x_t,N=40)
surrOinf = []

for s,serie in enumerate(surrX):
    OinfoScalc = hoi.metrics.Oinfo(serie,verbose=False)
    surrOinf.append(OinfoScalc.fit(minsize=3, method=est).squeeze())
    print(f"surrogate {s}  ready. Memory",proc.memory_info()[0]//1024//1024)
    
    del OinfoScalc
    gc.collect()
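One hypothesis worth testing (an assumption on my part, not a confirmed diagnosis) is that the growth comes from JAX's internal compilation caches rather than from the Python objects deleted above, since HOI is built on JAX. JAX exposes `jax.clear_caches()` to drop compiled executables between iterations; a minimal sketch of the idea on a plain jitted function:

```python
import jax
import jax.numpy as jnp


@jax.jit
def sumsq(x):
    return jnp.sum(x ** 2)


for _ in range(3):
    _ = sumsq(jnp.ones(10))
    # Drop compiled executables and other internal JAX caches. In the
    # surrogate loop above, this would sit next to gc.collect().
    jax.clear_caches()
```

If memory stays flat with this added, the leak is likely in cached compilations; if it still grows, the retained memory lives elsewhere.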

Expected results

More or less constant memory usage

Actual results

(given above)

Additional information

n/a
