`sampenc` implementation

Hi,

I am trying to understand the logic behind `sampenc`.
I figured it is a modification of douglas lake's code?
But there seems to be 2 issues:
1. the line `p <- A[2]/B[1]` I think should be `p<-A[M]/B[M-1]`

issue 2 is, I don't quite get lake's code (is it correct?).
I tried to follow his argument on https://www.physionet.org/content/sampen/1.0.0/ especially 

> If a particular run ends up being of length 4, for example, then that means that 1 is added to the count for template matches of length 4. In addition, there are 2 template matches of length 3, 3 of length 2, and 4 of length 1 that need to be added to the corresponding counts. 

But I couldn't verify this behaviour using some simple simulated series.

So I 

1. re-implement Richman and Moorman original algorithm _literally_
2. use `pracma::sample_entropy`
3. re-implement Lake's matlab _literally_
4. modify 'tsfeatures::sampenc` according to point 1 above

and did a comparison
```
my_sam_en=function(ts,M,r){  ## richmand and moormand literally
  N=length(ts)
  subts=function(.ts,l,exclude_last=T){
    N=length(.ts)
    if(exclude_last)return(lapply(1:(N-l),function(i).ts[i:(i+l-1)]))
    lapply(1:(N-l+1),function(i).ts[i:(i+l-1)])
  }
  ## all length M sequences, up to begining from N-M
  x_M=subts(ts,M,exclude_last = T)
  x_Mp1=subts(ts,M+1,exclude_last=F)
  
  count_r=function(xx){
    # browser()
    d=Vectorize(function(a,b)max(abs(xx[[a]]-xx[[b]])),c('a','b'))
    N=length(xx)
    O=outer(1:N,1:N,d)
    sum(O<(r-1e-10))-nrow(O)
  }
  C_M_r=count_r(x_M)
  C_Mp1_r=count_r(x_Mp1)
  
  -log(C_Mp1_r/C_M_r)  
}

modified_sampenc=function (y, M = 6, r = 0.3)   ## tsfeatures::sampenc
{
  N <- length(y)
  lastrun <- numeric(N)
  run <- numeric(N)
  A <- numeric(M)
  B <- numeric(M)
  for (i in 1:(N - 1)) {
    y1 <- y[i]
    for (jj in 1:(N - i)) {
      j <- i + jj
      if (isTRUE(abs(y[j] - y1) < r)) {
        run[jj] <- lastrun[jj] + 1
        M1 <- min(M, run[jj])
        A[1:M1] <- A[1:M1] + 1
        if (j < N) 
          B[1:M1] <- B[1:M1] + 1
      }
      else {
        run[jj] <- 0
      }
    }
    for (j in 1:(N - i)) {
      lastrun[j] <- run[j]
    }
  }
  p <- A[M]/B[M-1]
  e <- -log(p)
  return(e)
}
lake=function(y,M,r){
  n=length(y);
  lastrun=rep(0,n);
  run=rep(0,n);
  A=rep(0,M);
  B=rep(0,M);
  p=rep(0,M);
  e=rep(0,M);
  for(i in 1:(n-1)){
    nj=n-i;
    y1=y[i];
    for(jj in 1:nj){
      j=jj+i;      
      if(abs(y[j]-y1)<r){
        run[jj]=lastrun[jj]+1;
        M1=min(M,run[jj]);
        for(m in 1:M1){
          A[m]=A[m]+1;
          if(j<n)B[m]=B[m]+1;
        }           
      }else{
        run[jj]=0;
      }
    }
    for(j in 1:nj){
      lastrun[j]=run[j];    
    }
  }
  N=n*(n-1)/2;
  B=c(N,B[1:(M-1)]);
  p=A/B;
  e=-log(p);
  e[length(e)] ## only return the last value
}
y=sin(1:100)
M=2;r=0.3
c(pracma::sample_entropy(sin(1:100),edim=M,r=r),
  my_sam_en(y,M,r),
  modified_sampenc(y,M,r),
  lake(y,M,r),
  tsfeatures::sampenc(y,M,r))
# [1] 0.1006652 0.1006652 0.8065081 0.8065081 0.8065081
 
M=3;r=0.2
c(pracma::sample_entropy(sin(1:100),edim=M,r=r),
  my_sam_en(y,M,r),
  modified_sampenc(y,M,r),
  lake(y,M,r),
  tsfeatures::sampenc(y,M,r))
# [1] 0.0000000 0.0000000 0.1459539 0.1459539 0.9808293
```

So `pracma:;sample_entropy` (and my translation of the original paper) more or less agree while strongly disagreeing with lake's code (and thus `tsfeatures::sampenc`).

But again, I couldn't follow lake's algorithm at all. 

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

`sampenc` implementation #27

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

sampenc implementation #27

Description

Metadata

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

Issue actions

`sampenc` implementation #27