Hey, not sure if you're aware but there's really a lot of garbage there, as OpenSNP is probably not checking what users are uploading.
Here's a normalized list of file types I've found in your db:
- 7-zip
- Apple binary property list
- ASCII text
- bgzip
- bzip2
- Composite-documents
- CSV
- data?
- empty
- Excel
- EXE (???)
- gzip
- JPEG
- PNG
- RAR
- RSID sidtune (?!)
- Unicode Text
- VCF
- Word
- XML
- Zip
- zlib
I was curious about the EXEs; at least they don't seem to contain a virus. One of them is from a tool called "MyHeritage Family Builder Genealogy Software" and all the rest are called "23andme to FASTA".
It shouldn't be too hard to clean it up and to add some checks when people upload something. I did this analysis using the Linux `file` utility; I think it could probably be done on the server side as well? Watch out for command injection if you do. A neat improvement would be to have all the files in the same format.
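To illustrate the server-side idea, here is a minimal sketch in Python of calling `file` on an upload without going through a shell. This is not OpenSNP's actual code: the function names and the `ALLOWED_MIME_TYPES` allow-list are made up for the example, and the exact MIME strings `file` reports can differ between versions (for example for gzip), so treat the list as a placeholder.

```python
import subprocess

# Hypothetical allow-list for illustration only; the real set of accepted
# formats is a project decision, and MIME strings vary by `file` version.
ALLOWED_MIME_TYPES = {"text/plain", "text/csv", "application/gzip", "application/zip"}


def build_file_command(path):
    """Build the argument list for `file`.

    The command is a list, so no shell ever parses the filename and
    characters like ';' or '$(...)' in it are never interpreted.
    The '--' ends option parsing, so a name starting with '-' is
    treated as a file, not as a flag.
    """
    return ["file", "--brief", "--mime-type", "--", str(path)]


def detect_mime_type(path):
    """Return the MIME type that `file` reports for the given path."""
    result = subprocess.run(
        build_file_command(path),
        capture_output=True,
        text=True,
        check=True,   # raise if `file` exits with an error
        timeout=10,   # do not let a single upload hang the check
    )
    return result.stdout.strip()


def is_acceptable_upload(path):
    """Accept the upload only if its detected type is on the allow-list."""
    return detect_mime_type(path) in ALLOWED_MIME_TYPES
```

The main point is passing the arguments as a list rather than building a command string, which is what avoids the command injection risk mentioned above. A check like `is_acceptable_upload(saved_path)` could run right after the upload is written to disk, before the file is added to the database.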
I'm attaching a list of files with their format: file_type.csv
Also, the phenotype section doesn't seem very well monitored, as someone created a "naked body phenotype" to share a naked picture of himself. Not sure about the scientific value of that lol