print lines matching a pattern
The pattern must be either an exact string or a regular expression, i.e. a sequence of characters, written in a formal syntax, that defines a search pattern.
>gi|5524211|gb|AAD44166.1| cytochrome b [Elephas maximus maximus]
LCLYTHIGRNIYYGSYLYSETWNTGIMLLLITMATAFMGYVLPWGQMSFWGATVITNLFSAIPYIGTNLV
EWIWGGFSVDKATLNRFFAFHFILPFTMVALAGVHLTFLHETGSNNPLGLTSDSDKIPFHPYYTIKDFLG
LLILILLLLLLALLSPDMLGDPDNHMPADPLNTPLHIKPEWYFLFAYAILRSVPNKLGGVLALFLSIVIL
GLMPFLHTSKHRSMMLRPLSQALFWTLTMDLLTLTWIGSQPVEYPYTIIGQMASILYFSIILAFLPIAGX
IENY
@SEQ_ID
GATTTGGGGTTCAAAGCAGTATCGATCAAATAGTAAATCCATTTGTTCAACTCACAGTTT
+
!''*((((***+))%%%++)(%%%%).1***-+*''))**55CCF>>>>>>CCCCCCC65
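A minimal sketch of searching for an exact string in a FASTA file. The temporary file and the second record (seq2) are made up for the demo, and the sequences are shortened.

```shell
# Hypothetical demo file: two shortened FASTA records
fasta=$(mktemp)
cat > "$fasta" <<'EOF'
>gi|5524211|gb|AAD44166.1| cytochrome b [Elephas maximus maximus]
LCLYTHIGRNIYYGSYLYSETWNTGIMLLLITMATAFMGYVLPWGQMSFWGATVITNLFSAIPYIGTNLV
IENY
>seq2 made-up second record
MKVLAAGIENY
EOF

# Print every line that contains the exact text ">" (the header lines).
# The pattern is quoted so that the shell does not treat > as a redirection.
grep '>' "$fasta"

# -c counts the matching lines instead of printing them
n_headers=$(grep -c '>' "$fasta")
echo "$n_headers"

# An exact string can match anywhere in a line, not only at the start
n_motif=$(grep -c 'IENY' "$fasta")
echo "$n_motif"
```

Counting header lines this way is a quick check of how many sequences a FASTA file holds.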
grep -E or egrep
Regular Expressions: sequences of characters to express a pattern
useful metacharacters:
- . any character
- [] bracket expressions: define a set of characters
- [abc] : matches character "a" or "b" or "c"
- [a-z] : matches one character in the range from "a" to "z", i.e. "a" or "b" ... or "z"
- [[:alnum:]] : matches any letter or digit (class names such as [:alnum:] are only valid inside a bracket expression)
- [[:alpha:]] : matches any letter
- [[:digit:]] : matches any digit
- [[:lower:]] : matches any lowercase letter, e.g. [a-z]
- [[:upper:]] : matches any uppercase letter, e.g. [A-Z]
- [[:punct:]] : matches any punctuation character
- [[:space:]] : matches any whitespace character (space, tab, ...)
- Anchoring.
- ^ matches the start of the line.
- $ matches the end of the line.
- Repetition.
- ? matches the preceding element at most once, i.e. A? represents zero or one A => "" or "A"
- + matches the preceding element one or more times, i.e. A+ represents one or more successive A => "A" or "AA" or "AAA" etc.
- * matches the preceding element zero or more times, i.e. A* represents zero or more successive A => "" or "A" or "AA" or "AAA" etc.
- {n} matches the preceding element exactly n times, i.e. A{5} represents exactly 5 successive A => "AAAAA"
- {m,n} matches the preceding element m to n times, i.e. A{2,5} represents 2 to 5 successive A => "AA" or "AAA" or "AAAA" or "AAAAA"
- Logical OR.
- (one|two) matches the string "one" or the string "two"
- more information on regular expressions in man grep
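The metacharacters listed above can be tried out with grep -E on a small file. This is a sketch on made-up input lines, not real sequence data.

```shell
# Toy input, made up for this demo
seqs=$(mktemp)
printf '%s\n' 'ACGT' 'AAAAACGT' 'ACGTNNNN' 'acgt' 'GATTACA' 'seq_01' > "$seqs"

# ^ and $ anchor the match to the whole line;
# [ACGT]+ is one or more characters taken from the set A, C, G, T
only_acgt=$(grep -E '^[ACGT]+$' "$seqs")
echo "$only_acgt"        # ACGT, AAAAACGT, GATTACA

# . is any character, {4} repeats it exactly four times
four_chars=$(grep -E '^.{4}$' "$seqs")
echo "$four_chars"       # ACGT, acgt

# Five A in a row at the start of the line
five_a=$(grep -E '^A{5}' "$seqs")
echo "$five_a"           # AAAAACGT

# A character class is written inside a bracket expression
with_digit=$(grep -E '[[:digit:]]' "$seqs")
echo "$with_digit"       # seq_01

# Alternation: lines containing GATT or NNNN
either=$(grep -E '(GATT|NNNN)' "$seqs")
echo "$either"           # ACGTNNNN, GATTACA
```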
warning
- remember that some characters have a special meaning for the shell, e.g. >, >>, 2>, *, ?, |, ...
- remember that the shell evaluates the command line before the command is executed
- example: what do you expect from the following wrong commands?
- grep > file
- grep * file
Regular expressions and the shell share some metacharacters, but with different meanings. When you want to search for one of those characters, you have to protect it from the shell.
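hint: always protect your search pattern by enclosing it in quotes. Double quotes " stop the shell from expanding *, ?, > and |, but still let it expand $ and backquotes; single quotes ' pass the pattern to grep untouched.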
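The effect of shell evaluation can be made visible with echo, which prints the arguments a command would really receive. The sketch below works in a temporary directory; the file names data.txt and notes.txt are made up.

```shell
# Hypothetical working directory with two files
dir=$(mktemp -d)
printf '%s\n' 'a*b' 'aab' 'ab' > "$dir/data.txt"
touch "$dir/notes.txt"

# Unquoted, the shell replaces * by the file names before grep ever starts
expanded=$(cd "$dir" && echo grep * data.txt)
echo "$expanded"         # grep data.txt notes.txt data.txt

# In "grep > file" the shell would first empty (or create) file, and grep
# would then start without any pattern at all; it is not run here.

# Quoted, the pattern reaches grep unchanged.
# -F searches for the literal text a*b
literal=$(grep -F 'a*b' "$dir/data.txt")
echo "$literal"          # a*b

# As a regular expression, a*b means "zero or more a, then b"
as_regex=$(grep -E 'a*b' "$dir/data.txt")
echo "$as_regex"         # all three lines

# A backslash removes the special meaning of * inside the regular expression
escaped=$(grep -E 'a\*b' "$dir/data.txt")
echo "$escaped"          # a*b
```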
cut OPTION... [FILE]
who | cut -d ' ' -f 1
find . -atime 0 | cut -d '/' -f 2,3
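A small sketch of how -d (delimiter) and -f (field list) work together. The colon-separated records and the path are made up for the demo.

```shell
# Hypothetical colon-separated records
users=$(mktemp)
printf '%s\n' 'alice:x:1000:/home/alice' 'bob:x:1001:/home/bob' > "$users"

# Field 1 only
names=$(cut -d ':' -f 1 "$users")
echo "$names"            # alice, bob

# Fields 1 and 4; the delimiter is kept between the selected fields
name_home=$(cut -d ':' -f 1,4 "$users")
echo "$name_home"        # alice:/home/alice, bob:/home/bob

# With '/' as delimiter, a path printed by find such as ./data/run1/reads.fastq
# splits into "." (field 1), "data" (field 2), "run1" (field 3), ...
subdirs=$(printf '%s\n' './data/run1/reads.fastq' | cut -d '/' -f 2,3)
echo "$subdirs"          # data/run1
```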
sort [OPTION]... [FILE]
who | cut -d ' ' -f 1 | sort | uniq -c | sort -k 1n,2
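The pipeline above can be followed stage by stage on fixed input. In this sketch the variable logins stands in for the output of who; its content is made up so that the result is reproducible.

```shell
# Made-up stand-in for the output of who: login name, then terminal
logins=$(printf '%s\n' 'alice pts/0' 'bob pts/1' 'alice pts/2' 'carol pts/3' 'alice pts/4' 'bob pts/5')

# Stage 1: keep the login name only
names=$(printf '%s\n' "$logins" | cut -d ' ' -f 1)

# Stage 2: sort, so that identical names end up on adjacent lines
sorted=$(printf '%s\n' "$names" | sort)

# Stage 3: collapse adjacent duplicates and prefix each name with its count
counted=$(printf '%s\n' "$sorted" | uniq -c)

# Stage 4: sort on the count, numerically (n), smallest first
counts=$(printf '%s\n' "$counted" | sort -k 1n,2)
echo "$counts"           # carol (1 session), then bob (2), then alice (3)

# Plain sort compares text, sort -n compares numbers
text_order=$(printf '%s\n' 10 9 100 2 | sort | tr '\n' ' ')
num_order=$(printf '%s\n' 10 9 100 2 | sort -n | tr '\n' ' ')
echo "$text_order"       # 10 100 2 9
echo "$num_order"        # 2 9 10 100
```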
We ran a BLAST with -m8 output, so the following fields are displayed, separated by tabs.
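Tab is the default delimiter of cut, so tabular BLAST output can be sliced without -d. The sketch below assumes the usual 12-column layout of -m8 output (query id, subject id, percent identity, alignment length, mismatches, gap openings, query start, query end, subject start, subject end, e-value, bit score); this column order is an assumption, and the three hits are made up, not real results.

```shell
# Made-up hits in the assumed 12-column, tab-separated layout
blast=$(mktemp)
printf 'q1\ts1\t98.50\t200\t3\t0\t1\t200\t5\t204\t1e-100\t370\n'  > "$blast"
printf 'q1\ts2\t80.00\t150\t30\t2\t10\t159\t1\t150\t2e-30\t120\n' >> "$blast"
printf 'q2\ts3\t91.20\t100\t8\t1\t1\t100\t20\t119\t5e-40\t160\n'  >> "$blast"

# Query id, subject id and percent identity (assumed to be columns 1 to 3)
cut -f 1,2,3 "$blast"

# Subject of the hit with the highest percent identity:
# sort on column 3 only, numerically (n), in reverse order (r)
tab=$(printf '\t')
best=$(sort -t "$tab" -k 3,3nr "$blast" | head -n 1 | cut -f 2)
echo "$best"             # s1

# Number of hits per query
per_query=$(cut -f 1 "$blast" | sort | uniq -c)
echo "$per_query"        # 2 hits for q1, 1 hit for q2
```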
report or omit repeated lines (Filter adjacent matching lines)
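Because uniq only compares adjacent lines, unsorted input keeps its non-adjacent duplicates. A minimal sketch on made-up input:

```shell
# Made-up input with adjacent and non-adjacent repeats
input=$(printf '%s\n' A B A A B)

# Only the two adjacent A are merged
unsorted=$(printf '%s\n' "$input" | uniq | tr '\n' ' ')
echo "$unsorted"         # A B A B

# Sorting first brings all duplicates together
sorted=$(printf '%s\n' "$input" | sort | uniq | tr '\n' ' ')
echo "$sorted"           # A B

# -d prints only the lines that are repeated
repeated=$(printf '%s\n' "$input" | sort | uniq -d | tr '\n' ' ')
echo "$repeated"         # A B
```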