tr 'a-z' 'A-Z' <allmysequences
macOS, Windows, and Unix use different newline characters, which are not compatible with one another.
- UNIX: LF (aka \n)
- Windows: CR+LF (aka \r\n)
- Mac OS X: LF (aka \n)
- Classic Mac OS (version 9 and earlier): CR (aka \r)
When viewing files, those CR characters may display as ^M or <cr> at the end of each line, or as a second line break.
This can cause problems for programs that are unable to interpret the newlines correctly.
You can use tr to convert the newlines and restore your file: replace CR with LF for a classic Mac OS file, or delete CR for a Windows file.
tr '\r' '\n' < crappy_file > correct_file
tr -d '\r' < crappy_file > correct_file
cat crappy_file | tr '\r' '\n' > correct_file
cat crappy_file | tr -d '\r' > correct_file
Alternatively, use the program dos2unix:
dos2unix crappy_file
dos2unix transforms the file in place.
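To check that a conversion worked, you can count the CR characters before and after. The following is a minimal sketch that builds its own small test file; the names crlf_demo.txt and lf_demo.txt are hypothetical and only serve the illustration.

```shell
# Build a small file with Windows (CR+LF) line endings; the name is hypothetical.
printf 'line one\r\nline two\r\n' > crlf_demo.txt

# Delete every CR so that only the LF line endings remain.
tr -d '\r' < crlf_demo.txt > lf_demo.txt

# tr -cd '\r' keeps only the CR characters, wc -c counts them.
cr_before=$(tr -cd '\r' < crlf_demo.txt | wc -c)
cr_after=$(tr -cd '\r' < lf_demo.txt | wc -c)
echo "CR before: $cr_before, CR after: $cr_after"
```

The count should drop from 2 to 0, while the number of lines stays the same.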
sed reads data (from a file or from a pipe) and writes the filtered data to stdout.
The sed command is very versatile, so we cover only a few of its features.
The three commands to remember are:
sed 's/chromosome/chr/' arrayAnnot.txt
replaces only the 1st occurrence of chromosome by 'chr' in each line of arrayAnnot.txt
sed 's/chromosome/chr/' arrayAnnot.txt
does the same.
sed 's/chromosome/chr/g' arrayAnnot.txt
replaces all occurrences of chromosome by 'chr' in arrayAnnot.txt
\w
matches a word character (alphanumeric and _)
sed -r 's/.*\t(\w*\|\w*\|\w*).*/\1/g' blast2.txt
AK1BA_HUMAN sp|O08782|ALD2_CRIGR 83.23 316 53 0 1 316 1 316 0.0 537
output:
sp:ALD2_CRIGR
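One possible way to get from the input line to the output shown above, given as a sketch and not necessarily the intended solution. It assumes the columns are tab-separated (as in blastall -m8 output) and writes the line to a hypothetical file blast_line.txt. cut isolates the second column first, so the sed expression only has to deal with the db|accession|name identifier.

```shell
# Recreate the example hit as one tab-separated line (hypothetical file name).
printf 'AK1BA_HUMAN\tsp|O08782|ALD2_CRIGR\t83.23\t316\t53\t0\t1\t316\t1\t316\t0.0\t537\n' > blast_line.txt

# Column 2 holds the identifier; keep its first and third fields,
# joined by a colon. [^|]* matches any run of characters other than '|'.
hit_id=$(cut -f2 blast_line.txt | sed -E 's/^([^|]*)\|[^|]*\|([^|]*)$/\1:\2/')
echo "$hit_id"   # prints sp:ALD2_CRIGR
```

Using [^|]* instead of \w keeps the expression independent of GNU-specific escapes.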
We want to create a file containing the 10 sequences most similar to il2_human and align them (the first step in modelling a sequence by homology).
blastall -p blastp -d uniprot_sprot -i the-input -m8
clustalw -align -infile=filename
The p (print) command is usually used in combination with the -n option.
With the "-n" option, sed prints nothing unless an explicit request to print is found.
sed -n -e '/pattern/p' prints only the lines containing pattern.
Sed commands can be given with no address, in which case the command is executed for every input line. If an address is given, the command is executed only for the input lines that match it.
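The behaviour of -n, p and addresses can be tried on a tiny file. This is a sketch only: sed_demo.txt and its contents are invented for the illustration.

```shell
# Three-line test file (hypothetical name and contents).
printf 'chr1 gene_a\nchr2 gene_b\nchr1 gene_c\n' > sed_demo.txt

# -n suppresses the default output; p prints only the lines matching /chr1/.
matching=$(sed -n '/chr1/p' sed_demo.txt)

# A line-number address: the substitution is applied to line 2 only.
sub_line2=$(sed '2s/gene/GENE/' sed_demo.txt)

# A pattern address: the substitution is applied only to lines containing chr1.
sub_chr1=$(sed '/chr1/s/gene/GENE/' sed_demo.txt)

printf '%s\n---\n%s\n---\n%s\n' "$matching" "$sub_line2" "$sub_chr1"
```

Without an address (sed 's/gene/GENE/' sed_demo.txt), all three lines would be changed.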
sed_play.txt
Transform the FASTQ file brca.example.illumina.0.1.fastq into FASTA (try your sed expression on test.fastaq before using it on the real file).
step by step
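One possible approach, as a sketch and not the only solution. It assumes every record spans exactly four lines (header, sequence, "+", qualities) and that headers contain no tab characters. It runs on a made-up file demo.fastq, not on the course files.

```shell
# Two-record FASTQ file for testing; the file name and reads are made up.
printf '@read1\nACGT\n+\nIIII\n@read2\nTTGA\n+\nIIII\n' > demo.fastq

# 1. paste - - - - joins each group of 4 lines into one tab-separated line
# 2. cut -f1,2     keeps the header and the sequence
# 3. sed           turns the leading @ of the header into >
# 4. tr            puts header and sequence back on separate lines
paste - - - - < demo.fastq | cut -f1,2 | sed 's/^@/>/' | tr '\t' '\n' > demo.fasta
cat demo.fasta
```

Joining the four lines first means that a quality line which happens to start with @ cannot be mistaken for a header.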
Filenames are strings that can be manipulated with the previous tools (tr and sed), but Unix also provides built-in tools to manipulate and transform filenames easily.
strip directory and suffix from filenames
syntax:
basename filename [suffix]
Removes any directory components from filename. If suffix is specified, also removes the trailing suffix.
examples:
basename file.txt => file.txt
basename /xxxxx/yyyyy/zzzzz/file.txt => file.txt
basename /xxxxx/yyyyy/zzzzz/file.txt .txt => file
strip non-directory suffix from file name
examples:
dirname /some/directory/path/to/file => /some/directory/path/to
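basename and dirname are often combined to derive the name of an output file from the name of an input file. A minimal sketch, reusing the example path from above; the .fasta extension is only an assumption for the illustration.

```shell
path=/xxxxx/yyyyy/zzzzz/file.txt

dir=$(dirname "$path")           # /xxxxx/yyyyy/zzzzz
stem=$(basename "$path" .txt)    # file

# Same directory, same base name, new extension.
out="$dir/$stem.fasta"
echo "$out"   # prints /xxxxx/yyyyy/zzzzz/file.fasta
```

Neither command checks that the file exists: both work purely on the string.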