Fix empty abstract prevents file input #161

Open
saimouu wants to merge 3 commits into main from fix/empty-abstract-prevents-file-input

@saimouu saimouu commented Feb 19, 2026

  • Change validation logic to allow empty abstracts
  • Empty abstracts default to NO_ABSTRACT
  • Empty abstract count is returned on file upload
  • Warning toast is displayed in the frontend if empty abstracts were detected

@saimouu saimouu linked an issue Feb 19, 2026 that may be closed by this pull request

@alehuo alehuo left a comment

Looks good to me. One minor comment

Comment on lines 1 to 66
@@ -51,4 +63,4 @@ def validate_csv(file_obj: BinaryIO, filename: str) -> List[FileError]:
file_obj.seek(0)
except Exception:
pass
-return errors
+return errors, empty_abstract_count

@alehuo alehuo Feb 24, 2026

Could this be done without the for loop? With large CSV files it gets slow quickly: at about 10,000 rows a for loop can take ~3-4 seconds, whereas vectorized operations take milliseconds.

I would do it like this:

mask = df["something"].isna()
count_of_something_na = mask.sum()
df.loc[mask, "something"] = None

Also - NaN vs None. I'm not sure which one is better in this case.
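
A minimal sketch of the vectorized approach suggested above, not the code in this PR. It assumes the column is called `abstract` and that whitespace-only cells should also count as empty. `fill_empty_abstracts` is a hypothetical helper name; `NO_ABSTRACT` is the placeholder mentioned in the PR description.

```python
import pandas as pd

NO_ABSTRACT = "NO_ABSTRACT"  # placeholder value named in the PR description


def fill_empty_abstracts(df: pd.DataFrame, column: str = "abstract") -> int:
    """Replace missing or blank values in `column` with NO_ABSTRACT.

    Hypothetical helper; the column name is an assumption.
    Returns the number of rows that were filled.
    """
    # True where the cell is NaN/None or contains only whitespace.
    mask = df[column].isna() | (df[column].astype(str).str.strip() == "")
    # Cast to object first so a string can be written even if the column
    # was parsed as all-NaN floats.
    df[column] = df[column].astype(object).mask(mask, NO_ABSTRACT)
    return int(mask.sum())


df = pd.DataFrame(
    {
        "title": ["a", "b", "c", "d"],
        "abstract": ["some text", None, "   ", float("nan")],
    }
)
empty_abstract_count = fill_empty_abstracts(df)
```

On the NaN vs None question: `isna()` treats both as missing, so the count comes out the same whichever one ends up in the column.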

Collaborator Author

Yeah makes sense. I'll look into it.

The same df.iterrows() loop is done in process_files().

We could also remove the duplicate file reading; modifying validate_csv() to also return the data might make sense.
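
A rough sketch of what reading the file once could look like, under several assumptions: the function name `validate_and_load_csv`, the `abstract` column name, and the `FileError` fields are all hypothetical stand-ins, since the real definitions are not shown in this thread.

```python
from dataclasses import dataclass
from io import BytesIO
from typing import BinaryIO, List, Optional, Tuple

import pandas as pd

NO_ABSTRACT = "NO_ABSTRACT"  # placeholder value named in the PR description


@dataclass
class FileError:
    # Stand-in for the project's FileError type; the real fields are unknown.
    filename: str
    message: str


def validate_and_load_csv(
    file_obj: BinaryIO, filename: str
) -> Tuple[List[FileError], int, Optional[pd.DataFrame]]:
    """Validate a CSV and hand back the parsed frame so callers need not re-read it."""
    try:
        df = pd.read_csv(file_obj)
    except Exception as exc:
        # Report parse failures as validation errors instead of raising.
        return [FileError(filename, f"Could not parse CSV: {exc}")], 0, None
    finally:
        # Rewind the stream, mirroring the seek(0) in the reviewed diff.
        try:
            file_obj.seek(0)
        except Exception:
            pass

    if "abstract" not in df.columns:
        return [FileError(filename, "Missing 'abstract' column")], 0, None

    # Vectorized: flag missing or whitespace-only abstracts, then fill them.
    mask = df["abstract"].isna() | (df["abstract"].astype(str).str.strip() == "")
    df["abstract"] = df["abstract"].astype(object).mask(mask, NO_ABSTRACT)
    return [], int(mask.sum()), df


buf = BytesIO(b"title,abstract\nA,hello\nB,\nC,  \n")
errors, empty_abstract_count, data = validate_and_load_csv(buf, "example.csv")
```

With something like this, process_files() could use the returned frame directly instead of parsing the upload a second time.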

Development

Successfully merging this pull request may close these issues.

Empty Abstract Prevent File input

2 participants