Skip to content

Comments

Fix topFeatures()#70

Merged
cvanderaa merged 3 commits intostatOmics:masterfrom
alyst:fix_top_features
May 28, 2025
Merged

Fix topFeatures()#70
cvanderaa merged 3 commits intostatOmics:masterfrom
alyst:fix_top_features

Conversation

@alyst
Copy link

@alyst alyst commented Apr 7, 2025

  • Fix contrast matrix filtering, so that it does not loose dimensions.
  • Significance filtering is not skipped when sorting is disabled

Copy link
Collaborator

@cvanderaa cvanderaa left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @alyst,

Thank you very much for your PR and sorry for the delayed reaction. I do agree on your fix for the alpha filtering. However, it is not clear to me what problem you attempt to solve by preserving the matrix dimensions (see also my comment in the PR below). Could you provide an example of the issue that you are solving?

R/topFeatures.R Outdated
Comment on lines 52 to 53
contrast <- contrast[rowSums(contrast) != 0, , drop = FALSE]
}

contrast <- contrast[contrast !=0]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand what you attempt to solve here. contrast can only contain 1 column, so why does it matter to use drop = FALSE?

Also, you moved the filtering out of 0 inside the if statement, meaning that now zeros are removed only if contrast is a matrix. I'm not sure whether removing 0 is really necessary, but if it is, this should happen also when contrast is a vector.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand what you attempt to solve here.

I can provide the exact errors we get before the fix, but I have thought it is quite obvious.

Essentially, it keeps rows filtering introduced in #55, but keeps contrast a matrix as it was before #55.
R automatically squeezes singleton dimensions, so without drop = FALSE contrast matrix becomes a vector and gets stripped of the row names.
The latter is critical -- while the methods like getContrast() attempt to convert the L vector back to the matrix, they still fail, because the names are lost.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, this is clear. I can reproduce your issue:

data(pe)
pe <- aggregateFeatures(pe, i = "peptide", fcol = "Proteins", name = "protein")
pe <- msqrob(pe, i = "protein", formula = ~condition)
getCoef(rowData(pe[["protein"]])$msqrobModels[[1]])
## Generate a contrast matrix with only 1 column
L <- makeContrast("conditionc - conditionb=0", c("conditionb", "conditionc"))
topDeProteins <- topFeatures(rowData(pe[["protein"]])$msqrobModels, L)

It returns a table filled with NAs. So, good catch! 🪲

However, regarding your solution, I would be more in favor of converting the matrix into a vector first, and then perform filtering, regardless whether the contrast was first a vector or a matrix. So here is my suggestion:

...
if (is(contrast, "matrix")) {
    if (ncol(contrast) > 1) {
        stop("Argument contrast is matrix with more than one column, only one contrast is allowed")
    }
    ## convert 1-column matrix into a vector (preserving names)
    contrast <- contrast[, 1]
}
## remove unused coefficients
contrast <- contrast[contrast != 0]
...

What do you think?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's quite standard for R to have contrasts either as a matrix or as a formula(s), so your suggestion would diverge from that practice.
Also, getContrast() and varContrast(), which are called for each model later in the code, will convert the contrasts vector back to a matrix generating some overhead.

Actually, I was going to suggest that it would be easy for topFeatures to support multiple contrasts
(every model-contrast pair generates a row in the unfiltered output; p-value adjustments are done per-contrast or globally).
That's quite frequent use case for more complex experimental designs, and handling it in topFeatures() will reduce overhead of calling getContrast() and varContrast() for each contrast + save the users from writing boilerplate code.
But it would require reverting matrix-to-vector conversion.

Copy link
Collaborator

@cvanderaa cvanderaa May 26, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, again I agree with your reasoning, but then we should enforce contrast to be a matrix. Something like:

...
if (!inherits(contrast, "matrix")) {
    contrast <- as.matrix(contrast)
}
if (ncol(contrast) > 1) {
    stop("Argument contrast is matrix with more than one column, only one contrast is allowed")
}
## remove unused coefficients
contrast <- contrast[contrast != 0, , drop = FALSE]
...

My point is that is doesn't make sense to me to treat a vector or a matrix differently. If we remove 0s, that should happen for both cases. Therefore, I moved the subsetting of 0s outside the if statement.

Regarding testing multiple contrasts, hypothesisTest() (which internally calls topFeatures()) already allows this. The different tables are then stored in the rowData of your input object. But let's discuss this in a dedicated issue if that's not what you meant.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense, I've updated the PR, so that contrast is converted to a matrix.

@cvanderaa cvanderaa merged commit 83338f7 into statOmics:master May 28, 2025
4 checks passed
@cvanderaa
Copy link
Collaborator

Once more, thanks a lot @alyst for your contribution!

@alyst
Copy link
Author

alyst commented May 28, 2025

@cvanderaa Happy to contribute to msqrob2, thank you for the careful in-depth review!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants