I'm trying to remove all instances of '_(one number)' from certain strings in a file. For example, tig00000003_1 should become tig00000003. This is what my test file looks like:
##sequence-region tig00000001_732 1 630
tig00000003_1 Name=tig00000003_1;
I've tried
sed -E 's/(tig[0-9]{8}\_[0-9]{1})/ \1(tig[0-9]{8}) /' my_test.txt
which gives:
##sequence-region tig00000001_7(tig[0-9]{8}) 32 1 630
tig00000003_1(tig[0-9]{8}) Name=tig00000003_1;
and this is what I want:
##sequence-region tig00000001_732 1 630
tig00000003 Name=tig00000003;
How can I remove the matched pattern in the capture group, or alternatively keep only the match within the capture group?
2 Answers
You could simply replace the '_(one number)' with nothing on any lines that are not comments like so:
sed '/^[^#]/ s/\_[0-9]//g' your_file
The way it works is as follows:
- Lines that are not comments are identified as those that start with (^) any non-# symbol ([^#])
- Then, on those lines, substitute any underscore + digit (_[0-9]) with nothing (//) every time that pattern is found on the line (g)
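As a quick check of the steps above, here is a minimal sketch that feeds the two sample lines from the question through the same command. The variable names input and result are illustrative only, and the escaped \_ is written as a plain _, which matches the same character.

```shell
# Sample lines from the question, held in a variable instead of a file.
input='##sequence-region tig00000001_732 1 630
tig00000003_1 Name=tig00000003_1;'

# On lines that do not start with '#', delete every "_<digit>" pair.
result=$(printf '%s\n' "$input" | sed '/^[^#]/ s/_[0-9]//g')

# Show the transformed lines.
printf '%s\n' "$result"
```

Note that the pattern is not tied to the tig prefix, so any other underscore followed by a digit on a non-comment line would be shortened as well.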
You're pretty close. Put the capturing parentheses around only the "tig" number, so that \1 keeps it and the trailing underscore and digit are dropped:
sed -E '/^#/n; s/(tig[0-9]{8})\_[0-9]/\1/g' my_test.txt
# ...............^^^^^^^^^^^^^........^^
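To see the capture group at work, the sketch below runs the command above on the question's sample lines (with \_ written as a plain _). It also shows a variant using /^#/! instead of /^#/n, which is an assumption about what may be wanted rather than part of the answer: n prints the comment line and then applies the substitution to whatever line comes next, so the second of two consecutive comment lines would still be edited, whereas ! limits the substitution to lines that do not start with #. The extra comment line in two_comments is hypothetical test data.

```shell
# Sample lines from the question.
input='##sequence-region tig00000001_732 1 630
tig00000003_1 Name=tig00000003_1;'

# Keep the captured "tig" + 8 digits (\1) and drop the "_<digit>" after it.
result=$(printf '%s\n' "$input" | sed -E '/^#/n; s/(tig[0-9]{8})_[0-9]/\1/g')
printf '%s\n' "$result"

# Hypothetical input with two comment lines in a row.
two_comments='##sequence-region tig00000001_732 1 630
##sequence-region tig00000002_5 1 100
tig00000003_1 Name=tig00000003_1;'

# Variant: "!" negates the address, so only non-comment lines are edited.
variant=$(printf '%s\n' "$two_comments" | sed -E '/^#/!s/(tig[0-9]{8})_[0-9]/\1/g')
printf '%s\n' "$variant"
```

Both commands leave the sample comment line untouched; they differ only when comment lines follow one another.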