Skip to content

Include features that have 0 lines added/changed/modified as part of reporting to VUMC REDCap project #363

Open
aldefouw wants to merge 9 commits intovanderbilt-redcap:stagingfrom
aldefouw:add_zero_changes
Open

Include features that have 0 lines added/changed/modified as part of reporting to VUMC REDCap project #363
aldefouw wants to merge 9 commits intovanderbilt-redcap:stagingfrom
aldefouw:add_zero_changes

Conversation

@aldefouw
Copy link
Collaborator

@aldefouw aldefouw commented Aug 14, 2025

Pre-flight Checklist

  • Are all the features in this PR tested?
    They are not. One gap is the API upload. @mmcev106 - can you please review before I try this against API?.

I didn't want to fire off API requests to REDCap projects without another pair of eyes first. If this looks good to you, I can try a test run on one of the REDCap validation projects. My concern was I didn't want to overwrite existing data if I made a mistake.

  • Has the code been formatted for consistency and readability?
  • [N/A] Did you also update related documentation and tooling, such as .readme or tests?

Overview

I incorporated this change so that we'd also push up results where there are zero line changes on a given feature.

The approach was to look at all of the current feature files, feed those into the awk script as an array from a temporary file and loop through those instead of relying on the git log only.

Context

This is something that Alex pointed out as a gap in the current script. This is necessary because the absence of change in a feature is a data point we need for a given release. These are cases where there are no line changes are signs of a stable feature test (or perhaps stable functionality in REDCap).

Screenshots

There are now features with 0s across the board for features that have no changes present .. so it appears to be working :
Screenshot 2025-08-14 at 3 38 57 PM

@aldefouw aldefouw marked this pull request as draft August 14, 2025 20:48
@aldefouw
Copy link
Collaborator Author

aldefouw commented Aug 15, 2025

@mmcev106 - One thing I am debating in hindsight - and indeed this haunted me in the middle of the night last night - is strategy to select the feature files. I am interested to hear your input.

Current strategy was to only find the files that are currently tracked in the staging area and working tree at HEAD:

git ls-files '*.feature'

https://github.com/vanderbilt-redcap/redcap_rsvc/pull/363/files#diff-f87efe6daa40c85affdc015827e182dbeea58cadca17b433d15d441e592667b4R215

I am torn as to whether that is correct. I think it would cover cases moving forward. I was wondering about backwards compatibility for the older versions of REDCap where we also need to push data.

Would it make more sense to include ALL files that were ever feature files and then sort them out through the process I had before?

For example:
git log --pretty=format: --name-only --diff-filter=A | grep '\.feature$' | sort -u

Using git log instead would find all files in git history versus just what you see on screen ....

I think my concern is whether we will run into problems when we attempt to push to a record. Any thoughts?

mmcev106
mmcev106 previously approved these changes Aug 15, 2025
Copy link
Collaborator

@mmcev106 mmcev106 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@aldefouw, these changes look reasonable to me. Am I understanding your comment correctly that your main concern is with using the API to push data into projects for previous LTSes? If so, that does give me pause as well... Once this is tested & working on our current LTS project, I'd be happy to jump through whatever hoops we need to complete a one-time manual import for older projects.

@aldefouw
Copy link
Collaborator Author

@mmcev106 - The concern is about the data fidelity and whether the strategy is backwards compatible given the feature names change at times.

I think the answer is that it will work - but you have to be careful to checkout the correct Git tag (i.e. V14.5.5-ABC) for the project before you push anything.

Here's a specific scenario I was thinking of:

Picture a feature test that was named
X.Y.Z.1000 in tag V14.5.5-ABC was renamed to A.B.C.1000 in tag V15.0.9-ABC

(The rename is completely unrealistic and arbitrary - so don't get caught up on that.)

The point is this: in Git, that file is now tracked as A.B.C.1000.

So, when you run git ls-files '*.feature', the file will be A.B.C.1000. Therefore, the data associated with that feature file will also push to a record in REDCap called A.B.C.1000.

Why that is a problem:
If we want to go back and push data for V14.5.5-ABC, the record is called X.Y.Z.1000. So we have to be careful not to check that data from the head of the branch.

Instead, we need to checkout the tag respective first.

Reason being: If we are on the newest branch, the script will track the record as A.B.C.1000, which is a problem because it will push up data for a record that did not exist at that point in time.

I think the answer here is that the script is fine and perhaps the best we can do - but you have to be careful. I think if you check out the tag for V14.5.5-ABC before running the script, you will get the data you need.

I mostly just want a sanity check on what I'm saying and making sure I'm not overlooking something obvious.

And, yes, perhaps manually importing the line changes to the legacy projects might make sense. Alternatively, we could try pushing to a test project and see if the records match the legacy project.

@mmcev106
Copy link
Collaborator

@aldefouw, this is the type of thing for which I simply can't rely on my memory...which is why I'm quicker than most to automate such things 😂 🤦. Once this script is complete, should we use the API to pull the project name and make sure the LTS version matches the checked out tag before doing anything? If you like that idea, I'd be happy to implement it if you'd prefer.

@aldefouw
Copy link
Collaborator Author

@aldefouw, this is the type of thing for which I simply can't rely on my memory...which is why I'm quicker than most to automate such things 😂 🤦. Once this script is complete, should we use the API to pull the project name and make sure the LTS version matches the checked out tag before doing anything? If you like that idea, I'd be happy to implement it if you'd prefer.

@mmcev106 - I like that idea, and I completely agree we should automate.

If the scripts are made to follow the desired workflow it should reduce the chances of accidental error.

I think the line changes code is correct - so, if you'd like to take a crack at the REDCap project integration piece, I'm 100% in favor. It probably makes sense for you to do that piece since you'll be assisting with that component moving forward.

Do you want me to merge this or do you want to figure out that piece before we create a merge?

@mmcev106
Copy link
Collaborator

mmcev106 commented Aug 18, 2025

@aldefouw, I'm happy to take a crack at it and have you review it if you don't mind. How would you feel about me removing the Dates mode/choice so that we must use tags (to enforce verifications against the projects we're populating)?

@aldefouw
Copy link
Collaborator Author

@mmcev106 - I'm fine with removing the dates option if you think that's best. Dates seemed like a good idea but perhaps it isn't and sticking to tags is better.

I assume you'd move away from the --since= --until= to something like the following?

git log V14.5.5-ABC..V15.0.9-ABC --pretty=tformat: --numstat --ignore-space-change

Side note: I found that if you want to see what FILES were renamed between two tags, this command is useful:

git diff --name-status --diff-filter=R V14.5.5-ABC V15.0.9-ABC

I wonder if letting people know what files were renamed is useful? That was something I wondered about anyway.

@mmcev106
Copy link
Collaborator

@aldefouw, do the changes in #388 address all the concerns here? If so, is this ready for review?

@mmcev106
Copy link
Collaborator

@aldefouw, to more directly answer your question about file renaming, I'm just not sure. Rather than try to imagine all possibilities (and likely fail), I figured a fail fast approach on the unexpected scenario where a record didn't exist when expected would be a good safety net that would catch this type of issue in real-time so that we could consider actual cases when they arise (see #388). I'm curious what you think.

@aldefouw
Copy link
Collaborator Author

@mmcev106 - Great thought on the "Fail Fast" approach - completely appropriate here. I think the code you added bolsters the robustness of these scripts in a way that should block my concerns but doesn't add huge and complex technical debt for future authors to wade through.

Am I right in assuming that your proposed changes would need to be merged with this current PR in order for the "0 lines added/changed/modified" functionality to be present on Staging?

I think you took care of my concern with uploading bad data (or irrelevant data) to a REDCap project, but I wanted to ensure I understood next steps to complete this work.

I was picturing something like this:

  1. Your merge your PR to Staging
  2. I Rebase my PR against latest version of Staging (to pull in the functionality)
  3. I submit finalized PR for merge request and it's merged into Staging

Does that sound about right ? Open to a different workflow if I'm missing something.

@mmcev106
Copy link
Collaborator

@aldefouw, merging #388 then this PR makes sense to me. After #388 is merged, it's probably prudent to merge staging into this branch, but even that might be optional if there are no conflicts 🤞.

@mmcev106 mmcev106 changed the base branch from staging to v15.5.7 August 22, 2025 13:46
@mmcev106 mmcev106 changed the base branch from v15.5.7 to staging August 22, 2025 13:46
@mmcev106
Copy link
Collaborator

@aldefouw, #388 has been merged. I then merged staging into this branch and it didn't have any conflicts. I'm a little confused as to why github still says there are conflicts above 🤷. Anyway, I think this branch is ready to finalize.

@aldefouw aldefouw marked this pull request as ready for review August 22, 2025 20:53
@mmcev106
Copy link
Collaborator

I see the latest merged fixed the conflict! I must have forgotten to pull before merging staging 🤦.

mmcev106 and others added 4 commits August 27, 2025 08:41
This is POSIX-compliant so should work across all OSes.

Printf always interprets the escapes correctly.  `echo -e` on the Mac just prints "-e" to screen so this is a better approach.
Adding three spaces should do the trick in terms of aligning the column since we're counting the number of features (currently 248).  I would imagine that features should stay in the 3 digit range for a while.
@aldefouw
Copy link
Collaborator Author

@mmcev106 - I think our script is doing reasonable things now.

However, I'm running into a problem.

In my local tests (against a local VM copy of the REDCap VUMC project), I found that your fail fast approach caught a problem (which is great).

The problem caught is related to one of the _NewManual suffixed feature tests.

Error:

Failing since record with ID C.6.11.0400. not found.  
There may be a bug in this script trying to create a record that shouldn't exist in this project.

In the project, you can see this record:
Screenshot 2025-08-27 at 9 17 51 AM

Our script is expecting a record for C.6.11.0400. - however there is no record for C.6.11.0400. ... which explains why we are getting the error.

Next Steps?

I am not clear on what the fix is here. I do notice that C.6.11.0400. is a REDUNDANT test. Without more context, my initial thought is we might be able to ignore pushing data to those feature tests since they are not actually run as part of automation ....

Regardless, it appears that in most cases (if not all?) there is a "regular" record for a feature and then a duplicate version that is suffixed with _NewManual.

For example - both of these records exist in the project:
C.3.30.0900.
C.3.30.0900._NewManual

However, this is not the case for C.6.11.0400.

Do we know why that might be? Is that simply something that was missed or is that intentional part of the process?

This seems like it's quickly veering into questions that are perhaps in the realm of Baker and Bosler rather than developers ... thoughts?

@mmcev106
Copy link
Collaborator

@aldefouw, I believe the project should also contain normal records for all NewManual records. @4bbakers, it looks like we might be missing a normal record for C.3.30.0400., which is preventing our changed lines upload script from finishing. Is it OK to create that record now? If so, would you me to give it a go? I could try to complete it similarly to C.3.30.0300., which is also redundant.

@4bbakers
Copy link
Collaborator

Thanks for catching this. For C.6.11.0400., that’s expected — we didn’t release automation for that package, so there is no “normal” record (only the _NewManual). The convention is:

  • If a feature was released in automation → it will have both the normal and _NewManual records.
  • If it was not released in automation → only the _NewManual record exists.

So for now, it’s fine to skip this one. It will show up in the next package release once automation is included.

4bbakers
4bbakers previously approved these changes Aug 28, 2025
This should give us the zeroes across the board we're after for no change features
@mmcev106
Copy link
Collaborator

mmcev106 commented Oct 6, 2025

This conversation is a bit out of date. We ran this by @4bbakers again. After hearing how it affects us, she believes we should be able to find a way to avoid having NewManual records next time, and just store that information in a field on the normal record for each feature instead.

@aldefouw, let me know if there's anything I can do to help with this.

@aldefouw
Copy link
Collaborator Author

aldefouw commented Oct 6, 2025

@mmcev106 - That is good news - I was unaware that decision had changed.

Would it make sense to remove the ugly brute force conditional that is currently part of this PR then?

d0f11f9#diff-f87efe6daa40c85affdc015827e182dbeea58cadca17b433d15d441e592667b4R154

It seems like if the script fails (and "NewManual" convention is out of the discussion), we should instead fix the REDCap project records themselves - or at least investigate why it failed - rather than brute forcing it again.

@mmcev106
Copy link
Collaborator

mmcev106 commented Oct 6, 2025

@aldefouw, when we discussed this last I believe we had decided to brute force it this time around, but plan to rework the project to avoid needing new NewManuals for next the LTS.

@mmcev106
Copy link
Collaborator

mmcev106 commented Nov 3, 2025

@aldefouw, after our last weekly call I suspect NewManuals might be here to stay. I believe we can simply ignore them and only interact with records without that suffix. Let me know if you have any questions/concerns about that. Is there anything else I can do to help finalize these changes?

@mmcev106
Copy link
Collaborator

mmcev106 commented Dec 2, 2025

@aldefouw, no rush on this, but let me know if you'd like me to take it over.

@aldefouw
Copy link
Collaborator Author

aldefouw commented Dec 2, 2025

@aldefouw, no rush on this, but let me know if you'd like me to take it over.

@mmcev106 - Please feel free. I'm very underwater from the special project here at UW so I'll put this in your extremely capable hands.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants