Nabil-Islam/DNA-lib

DNA Assessment

Description: A program that takes a nucleic acid sequence from the user, verifies it as DNA or RNA, and returns the possible positions of the open reading frame. This can help determine whether the sequence belongs to a gene: for example, whether a submitted RNA molecule is the transcript of a protein-coding gene, a non-coding RNA, or an aberrant transcript that may indicate a malfunction or mutation in the cell's transcription complexes.

If the user sets the molecule type to DNA, the main function first verifies that every base is a valid DNA nucleotide, then transcribes the DNA sequence into its RNA transcript, and finally reports the positions of the start codon and the stop codon in the sequence. If the user enters an RNA sequence, the program verifies it in the same way as a DNA sequence, except that it expects uracil (U) instead of thymine (T), and then reports the start and stop codons of the transcript. This helps automate nucleic acid sequence analysis and supports the user in drawing inferences about whether the given sequence is coding.

The main function is simple and could be migrated to a web application. First, it prompts the user for the sequence to be analysed and removes all whitespace from it, so that spaces do not cause the verify_DNA function to wrongly return False. Next, it prompts the user for "NA_type" (nucleic acid type), which records whether the sequence should be analysed as a DNA or an RNA molecule. The main function then calls verify_DNA: if it returns False, the user is asked to recheck their input and ensure that it contains only valid DNA nucleotides; if it returns True, the main function prints "True" to the terminal to confirm that the DNA sequence is valid. The DNA sequence is then passed to the "transcribe" function, which converts it to an RNA sequence by replacing every thymine with uracil. This RNA sequence is passed to the "find_start" and "find_stop" functions, which return the positions of the first start codon and the first stop codon detected; the program reports these to the user and states that the open reading frame likely lies between them. If an RNA molecule is submitted, the main function verifies it in the same way (using verify_RNA) and then finds the start and stop codons directly, skipping the transcribe function because the sequence is already RNA.
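The flow of the main function can be sketched in Python (an assumption: the README does not state the repository's language). To keep the logic testable, the prompts are separated from a hypothetical `analyse` helper that returns the RNA sequence plus the 0-based start and stop positions (`-1` when a codon is absent), or `None` when verification fails. For brevity, it checks validity with a set comparison and locates codons with `str.find`, rather than with the counter and sliding-window functions described in this README; these are stand-ins, not the author's code.

```python
# Assumed valid bases per molecule type and standard stop codons.
VALID = {"DNA": set("ATGC"), "RNA": set("AUGC")}
STOP_CODONS = ("UAA", "UAG", "UGA")


def analyse(sequence, na_type):
    """Hypothetical helper: main()'s logic without the input() prompts."""
    # Remove all whitespace (spaces, tabs, newlines) so it is not flagged as invalid.
    sequence = "".join(sequence.split()).upper()
    na_type = na_type.strip().upper()
    if na_type not in VALID or not set(sequence) <= VALID[na_type]:
        return None  # main() asks the user to recheck the input in this case
    # DNA is transcribed first; RNA skips straight to the codon search.
    rna = sequence.replace("T", "U") if na_type == "DNA" else sequence
    start = rna.find("AUG")  # first start codon, -1 if absent
    stops = [i for i in (rna.find(c) for c in STOP_CODONS) if i != -1]
    stop = min(stops) if stops else -1  # first stop codon of any kind
    return rna, start, stop


def main():
    sequence = input("Enter the nucleic acid sequence: ")
    na_type = input("Is this DNA or RNA? ")
    result = analyse(sequence, na_type)
    if result is None:
        print("Invalid sequence: please check that it contains only valid nucleotides.")
        return
    rna, start, stop = result
    print(True)  # the sequence passed verification
    if na_type.strip().upper() == "DNA":
        print("RNA transcript:", rna)
    print(f"Start codon at index {start}, stop codon at index {stop}.")
    print("The open reading frame is likely between these positions.")
```

For example, `analyse("ATG CCC TAA", "DNA")` strips the spaces, transcribes the sequence and returns `("AUGCCCUAA", 0, 6)`. Call `main()` to run the same flow interactively.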

Under the class "NA" (short for nucleic acid), the first function is verify_DNA. It takes two arguments: the sequence string and the NA_type string. First, it checks whether NA_type is DNA and returns False if any other type is provided. The sequence is then reassigned using the .upper() method so that the check is case-insensitive. The function initialises a counter and iterates through every character in the sequence, incrementing the counter by 1 whenever a base is not a valid DNA nucleotide and leaving it unchanged otherwise. Finally, if the counter is greater than zero, the function returns False, informing the user that the input sequence is not valid. The verify_RNA function works the same way, except that the 'T' in the array of valid nucleotides is replaced with 'U', so it returns True for a valid RNA sequence and False for a DNA sequence.
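A minimal Python sketch of the two verification functions, assuming they are static methods of `NA` and that `NA_type` is compared case-insensitively (neither detail is stated in the README). The helper `_count_invalid` is hypothetical; it holds the counter loop that both functions share.

```python
class NA:
    """Sketch of the NA (nucleic acid) class; structure assumed, not the repo's code."""

    # Assumed arrays of valid bases; verify_RNA swaps 'T' for 'U'.
    DNA_BASES = ("A", "T", "G", "C")
    RNA_BASES = ("A", "U", "G", "C")

    @staticmethod
    def _count_invalid(sequence, valid_bases):
        # Hypothetical helper: count characters that are not valid nucleotides.
        counter = 0
        for base in sequence.upper():  # .upper() gives case insensitivity
            if base not in valid_bases:
                counter += 1
        return counter

    @staticmethod
    def verify_DNA(sequence, NA_type):
        # Reject immediately unless the molecule was declared as DNA
        # (case-insensitive comparison of NA_type is an assumption).
        if NA_type.upper() != "DNA":
            return False
        return NA._count_invalid(sequence, NA.DNA_BASES) == 0

    @staticmethod
    def verify_RNA(sequence, NA_type):
        # Same check, but uracil replaces thymine in the valid bases.
        if NA_type.upper() != "RNA":
            return False
        return NA._count_invalid(sequence, NA.RNA_BASES) == 0
```

Here `NA.verify_DNA("atgc", "DNA")` returns True, while `NA.verify_RNA("ATGC", "RNA")` returns False because 'T' is not a valid RNA base. A set comparison such as `set(sequence.upper()) <= set("ATGC")` would give the same result without an explicit counter; the loop above simply mirrors the description.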

The transcribe function converts the DNA string to an RNA string by replacing every 'T' character with 'U'. The start and stop codon functions (find_start and find_stop) work very similarly: each slides a window of width 3 along the sequence (3 because a codon is 3 nucleotides long), compares each window against the start codon or the stop codons, and returns the position of the first matching codon in the sequence.
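The transcription and sliding-window codon search might look like the following Python sketch. It assumes AUG as the start codon and UAA, UAG and UGA as the stop codons (the standard genetic code), and returns 0-based indices or `None`. The `begin` and `step` parameters of `find_stop` are hypothetical additions: the defaults scan every position from the start of the sequence, as described above, while `begin=<start index>, step=3` restricts the search to codons in frame with the start codon.

```python
START_CODON = "AUG"
STOP_CODONS = {"UAA", "UAG", "UGA"}


def transcribe(dna):
    """Convert a DNA string to RNA by replacing every T with U."""
    return dna.upper().replace("T", "U")


def find_start(rna):
    """Return the 0-based index of the first AUG, or None if there is none."""
    rna = rna.upper()
    # Slide a window of width 3 (one codon) along the sequence.
    for i in range(len(rna) - 2):
        if rna[i:i + 3] == START_CODON:
            return i
    return None


def find_stop(rna, begin=0, step=1):
    """Return the 0-based index of the first stop codon at or after `begin`.

    begin/step are hypothetical extensions: the defaults check every window,
    while begin=<start index>, step=3 keeps the search in frame.
    """
    rna = rna.upper()
    for i in range(begin, len(rna) - 2, step):
        if rna[i:i + 3] in STOP_CODONS:
            return i
    return None
```

On the example transcript `transcribe("CCATGAAATAGG")`, which is `CCAUGAAAUAGG`, `find_start` returns 2. With the defaults, `find_stop` returns 3: a UGA that overlaps the start codon and so cannot end a reading frame beginning at index 2. Searching in frame from the start codon (`begin=2, step=3`) returns the UAG at index 8, which is worth keeping in mind when treating the first stop codon detected as the end of the open reading frame.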
