Basic Linux Unix Commands
Vanderll edited this page Oct 16, 2012 · 6 revisions
- Made for people who don't have a clue when it comes to computers (e.g. Lauren V)
- View the first 15 lines of your file (if you leave out -15, head shows the first 10 lines by default)
head -15 filename.psl
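A quick way to see that default for yourself. This is a sketch only: the file name demo.txt is hypothetical, and `seq` is used just to make a small numbered test file.

```shell
#!/bin/sh
# Hypothetical 20-line test file, one number per line
seq 1 20 > demo.txt

head -15 demo.txt      # first 15 lines (1-15), the short form used above
head -n 15 demo.txt    # same result, written with the POSIX -n option
head demo.txt          # no count given: first 10 lines
```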
- Motivating example: the rat genome was updated, and BLAT was used to align the probes to the new genome. We need to extract each probe's identifier and location so that a mask can be applied to the probes
- Our data looks like:
- Use awk to create a BED file containing the chromosome, start position, stop position, probeset identifiers (probe ID, then the x and y coordinates on the array), and strand
awk 'NR>5{split($14,a,";"); split(a[1],b,":"); split(a[2],c,":"); print $10"\t"$12"\t"$13"\t"b[3]"\t"c[1]"\t"c[2]"\t"$9}' output4.psl > output4.bed
- Going through the command step by step:
- awk: run the awk program, a text-processing tool that reads a file one line (record) at a time
- NR>5: NR is the number of the record (line) currently being read, so NR>5 only processes lines after the 5th. The headers of the psl file take up the first 5 rows, so this reads only the data after them
- split($14, a, ";"): make an array called a by splitting column 14 on semicolons. This gives an array with 2 elements
- split(a[1], b, ":"): make an array called b by splitting the first element of a on colons. This gives 3 elements: the first 2 are the same on every line (they identify the array), and the last, b[3], is the probe ID
- split(a[2], c, ":"): make an array called c by splitting the second element of a on colons. This gives 2 elements: c[1] is the x and c[2] is the y coordinate of the probe on the array
- print $10"\t"...rest of code: print the columns and extracted pieces we want (chromosome $10, start $12, stop $13, probe ID b[3], x c[1], y c[2], strand $9) separated by tabs, and > output4.bed writes that output into a file named output4.bed
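To see what each piece produces, here is a sketch that runs the same awk command on a tiny, made-up PSL file. The file names demo.psl and demo.bed, the 5 header lines and the single alignment line (including the probe name ARRAY:Example:P0001;12:34) are hypothetical stand-ins, not real data. Note that in a standard PSL file column 9 is the strand, 10 the query name, 12 and 13 the query start and end, and 14 the target name, so the command appears to assume the chromosome was the BLAT query and the probe was the target.

```shell
#!/bin/sh
# Hypothetical PSL file: 5 stand-in header lines (line 2 blank, as awk still
# counts blank lines in NR), then one made-up alignment with 21 tab-separated columns
printf 'psLayout version 3\n\nheader line 3\nheader line 4\n----------\n' > demo.psl
printf '25\t0\t0\t0\t0\t0\t0\t0\t+\tchr1\t1000\t100\t125\tARRAY:Example:P0001;12:34\t25\t0\t25\t1\t25,\t100,\t0,\n' >> demo.psl

# Same awk command as in the notes, pointed at the hypothetical demo files
awk 'NR>5{split($14,a,";"); split(a[1],b,":"); split(a[2],c,":"); print $10"\t"$12"\t"$13"\t"b[3]"\t"c[1]"\t"c[2]"\t"$9}' demo.psl > demo.bed

cat demo.bed
```

For this input, demo.bed holds one tab-separated line: chr1, 100, 125, P0001, 12, 34, +.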
- If loading the data into a program takes a long time and you just want to check that you have the correct number of rows, use:
wc -l filename.bed
- wc stands for word count
- -l tells wc to count lines only
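A sketch of using wc -l as a row-count check, assuming (hypothetically) a file with the same 5-line header as the PSL example: since the awk command skips those 5 lines, the BED file should have exactly 5 fewer lines than the PSL file. The file names and contents are made up with `seq`.

```shell
#!/bin/sh
# Hypothetical 12-line stand-in for a PSL file: 5 "header" lines + 7 "data" lines
seq 1 12 > demo.psl
# Keep only the lines after the 5-line header, as NR>5 does in the awk command
awk 'NR>5' demo.psl > demo.bed

wc -l demo.bed     # prints the line count followed by the file name
wc -l < demo.bed   # reading from standard input prints only the count

# The BED file should have exactly 5 fewer lines than the PSL file
psl_lines=$(wc -l < demo.psl)
bed_lines=$(wc -l < demo.bed)
if [ $((psl_lines - 5)) -eq $((bed_lines)) ]; then
  echo "row count OK: $((bed_lines))"
else
  echo "row count mismatch: expected $((psl_lines - 5)), got $((bed_lines))"
fi
```

wc -l counts newline characters, so a final line with no trailing newline is not counted.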
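- Give all users permission to read, write and execute a file:
chmod a+rwx filename
- chmod stands for change mode
- a stands for all users (the owner, the group and everyone else), and + adds the permissions that follow
- r: read
- w: write
- x: execute (run the file as a program)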
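A small sketch of chmod on a hypothetical file demo.sh, using ls -l to see the permission string change. The second chmod (a-w) is an extra illustration, not part of the original notes: - removes permissions the same way + adds them.

```shell
#!/bin/sh
# Hypothetical file name, used only for this demo
touch demo.sh

chmod a+rwx demo.sh
ls -l demo.sh    # permission string starts with -rwxrwxrwx

# Symbolic modes can also take permissions away: remove write access for all users
chmod a-w demo.sh
ls -l demo.sh    # permission string now starts with -r-xr-xr-x
```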