Delete pattern matching regex from sed capture group

I'm trying to remove all instances of '_(one number)' from certain strings in a file, so tig00000003_1 should become tig00000003. This is what my test file looks like:

##sequence-region tig00000001_732 1 630
tig00000003_1 Name=tig00000003_1;

I've tried sed -E 's/(tig[0-9]{8}\_[0-9]{1})/ \1(tig[0-9]{8}) /' my_test.txt, which gives:

##sequence-region tig00000001_7(tig[0-9]{8}) 32 1 630
tig00000003_1(tig[0-9]{8}) Name=tig00000003_1;

and this is what I want:

##sequence-region tig00000001_732 1 630
tig00000003 Name=tig00000003;

How can I remove the matched pattern in the capture group, or alternatively keep only the match within the capture group?

2 Answers

You could simply replace the '_(one number)' with nothing on any lines that are not comments like so:

sed '/^[^#]/ s/_[0-9]//g' your_file

The way it works is as follows:

  • Non-comment lines are identified as those that start with (^) any character other than # ([^#])
  • On those lines, every underscore followed by a digit (_[0-9]) is replaced with nothing (//), each time that pattern is found on the line (g)
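As a quick check, here is a minimal sketch that feeds the two sample lines from the question through that command. The variable names `input` and `result` are only for illustration, and the data is piped in rather than read from a file.

```shell
# The two sample lines from the question.
input='##sequence-region tig00000001_732 1 630
tig00000003_1 Name=tig00000003_1;'

# On lines whose first character is not '#', delete every "_<digit>".
result=$(printf '%s\n' "$input" | sed '/^[^#]/ s/_[0-9]//g')

printf '%s\n' "$result"
```

Note that the pattern removes only the underscore and the first digit after it, so a longer suffix such as _732 on a non-comment line would lose just the _7.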

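You're pretty close. Put the capturing parentheses around the "tig" number only, and leave the underscore and digit outside the group. In the replacement, \1 expands to whatever the group matched; anything else there is literal text, which is why (tig[0-9]{8}) showed up verbatim in your output.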

sed -E '/^#/!s/(tig[0-9]{8})_[0-9]/\1/g' my_test.txt
# .............^^^^^^^^^^^^^.......^^
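A minimal sketch to check the capture-group approach on the question's sample. The second comment line is an assumption added here to show that every line starting with # is left alone; in this sketch comment lines are skipped with the `!` address negation, and the variable names are only for illustration.

```shell
# Sample from the question, plus one extra (hypothetical) comment line.
input='##sequence-region tig00000001_732 1 630
##sequence-region tig00000002_5 1 100
tig00000003_1 Name=tig00000003_1;'

# -E enables extended regex, so ( ) and { } need no backslashes.
# /^#/! applies the substitution only to lines that do not start with '#'.
# \1 is the text captured by (tig[0-9]{8}); the "_<digit>" outside the group is dropped.
result=$(printf '%s\n' "$input" | sed -E '/^#/!s/(tig[0-9]{8})_[0-9]/\1/g')

printf '%s\n' "$result"
```

Compared with deleting every _[0-9], anchoring the match to the tig prefix leaves other underscores on the line untouched.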
