-
Notifications
You must be signed in to change notification settings - Fork 42
Description
Hi,
I am trying to understand the logic behind sampenc.
I figured it is a modification of douglas lake's code?
But there seems to be 2 issues:
- the line
p <- A[2]/B[1]I think should bep<-A[M]/B[M-1]
issue 2 is, I don't quite get lake's code (is it correct?).
I tried to follow his argument on https://www.physionet.org/content/sampen/1.0.0/ especially
If a particular run ends up being of length 4, for example, then that means that 1 is added to the count for template matches of length 4. In addition, there are 2 template matches of length 3, 3 of length 2, and 4 of length 1 that need to be added to the corresponding counts.
But I couldn't verify this behaviour using some simple simulated series.
So I
- re-implement Richman and Moorman original algorithm literally
- use
pracma::sample_entropy - re-implement Lake's matlab literally
- modify 'tsfeatures::sampenc` according to point 1 above
and did a comparison
my_sam_en=function(ts,M,r){ ## richmand and moormand literally
N=length(ts)
subts=function(.ts,l,exclude_last=T){
N=length(.ts)
if(exclude_last)return(lapply(1:(N-l),function(i).ts[i:(i+l-1)]))
lapply(1:(N-l+1),function(i).ts[i:(i+l-1)])
}
## all length M sequences, up to begining from N-M
x_M=subts(ts,M,exclude_last = T)
x_Mp1=subts(ts,M+1,exclude_last=F)
count_r=function(xx){
# browser()
d=Vectorize(function(a,b)max(abs(xx[[a]]-xx[[b]])),c('a','b'))
N=length(xx)
O=outer(1:N,1:N,d)
sum(O<(r-1e-10))-nrow(O)
}
C_M_r=count_r(x_M)
C_Mp1_r=count_r(x_Mp1)
-log(C_Mp1_r/C_M_r)
}
modified_sampenc=function (y, M = 6, r = 0.3) ## tsfeatures::sampenc
{
N <- length(y)
lastrun <- numeric(N)
run <- numeric(N)
A <- numeric(M)
B <- numeric(M)
for (i in 1:(N - 1)) {
y1 <- y[i]
for (jj in 1:(N - i)) {
j <- i + jj
if (isTRUE(abs(y[j] - y1) < r)) {
run[jj] <- lastrun[jj] + 1
M1 <- min(M, run[jj])
A[1:M1] <- A[1:M1] + 1
if (j < N)
B[1:M1] <- B[1:M1] + 1
}
else {
run[jj] <- 0
}
}
for (j in 1:(N - i)) {
lastrun[j] <- run[j]
}
}
p <- A[M]/B[M-1]
e <- -log(p)
return(e)
}
lake=function(y,M,r){
n=length(y);
lastrun=rep(0,n);
run=rep(0,n);
A=rep(0,M);
B=rep(0,M);
p=rep(0,M);
e=rep(0,M);
for(i in 1:(n-1)){
nj=n-i;
y1=y[i];
for(jj in 1:nj){
j=jj+i;
if(abs(y[j]-y1)<r){
run[jj]=lastrun[jj]+1;
M1=min(M,run[jj]);
for(m in 1:M1){
A[m]=A[m]+1;
if(j<n)B[m]=B[m]+1;
}
}else{
run[jj]=0;
}
}
for(j in 1:nj){
lastrun[j]=run[j];
}
}
N=n*(n-1)/2;
B=c(N,B[1:(M-1)]);
p=A/B;
e=-log(p);
e[length(e)] ## only return the last value
}
y=sin(1:100)
M=2;r=0.3
c(pracma::sample_entropy(sin(1:100),edim=M,r=r),
my_sam_en(y,M,r),
modified_sampenc(y,M,r),
lake(y,M,r),
tsfeatures::sampenc(y,M,r))
# [1] 0.1006652 0.1006652 0.8065081 0.8065081 0.8065081
M=3;r=0.2
c(pracma::sample_entropy(sin(1:100),edim=M,r=r),
my_sam_en(y,M,r),
modified_sampenc(y,M,r),
lake(y,M,r),
tsfeatures::sampenc(y,M,r))
# [1] 0.0000000 0.0000000 0.1459539 0.1459539 0.9808293
So pracma:;sample_entropy (and my translation of the original paper) more or less agree while strongly disagreeing with lake's code (and thus tsfeatures::sampenc).
But again, I couldn't follow lake's algorithm at all.