How to split a string from a column using awk

I am a noob in Linux. I have a file like this:

 col1 col2 col3
 ID1234567-DNA_A01 chr1_10203040_T/C gene 0
 ID1234568-DNA_A02 chr1_10203050_T/A gene 0
 ID1234569-DNA_A03 chr1_10203060_A/G gene 0
 ID1234570-DNA_A04 chr1_10203070_C/T gene 0

I want to use only the first column and divide each line into 4 columns:

 #CHROM POS REF ALT
 1 10203040 T C
 1 10203050 T A
 1 10203060 A G
 1 10203070 C T

I tried to make:

 awk 'BEGIN{OFS="\t";FS="\t"; print"#CHROM","POS","REF","ALT"} | cut -d' ' -f2- {print substr($1,4,1),substr($1,6}' old_file > new_file

I know I did it wrong, but any suggestions would be helpful! Thanks

3 Answers

Maybe you can try something like this:

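cut -d " " -f 2 test.txt | awk -F '[_/]' 'BEGIN{print "#CHROM\tPOS\tREF\tALT"} NR>1 {sub(/^chr/, "", $1); printf "%s\t%s\t%s\t%s\n", $1, $2, $3, $4}'

Here test.txt is the name of your file. NR>1 skips the col1 col2 col3 header line, and sub(/^chr/, "", $1) drops the chr prefix so the first column reads 1, as in your desired output. To save the result, append > new_file.txt to the end of the command.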
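To see what the field separator does on its own, here is a small sketch that feeds one value from the second column through awk -F '[_/]'. The bracket expression matches a single _ or /, so the value breaks into four fields. The function name split_variant is a hypothetical helper used only for illustration.

```shell
# split_variant is a hypothetical helper name used only for this illustration.
# It splits one value such as chr1_10203040_T/C on either "_" or "/".
split_variant() {
  printf '%s\n' "$1" | awk -F '[_/]' '{ print NF; print $1, $2, $3, $4 }'
}

split_variant 'chr1_10203040_T/C'
# prints:
# 4
# chr1 10203040 T C
```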

I'd go with:

awk 'NR>1 {print $2}' file \
| awk -F'[_/]' 'BEGIN{OFS="\t"; print "#CHROM","POS","REF","ALT"}{$1=$1}1'
  • First awk: skip the header line (NR>1) and output only the second field.
  • Second awk: use [_/] as the field separator, then print the new header and the fields. $1=$1 forces awk to rebuild the record with the new output field separator (\t); without it, each line would be printed exactly as it was read.
  • You may append | column -t to align the columns.
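A quick way to see why $1=$1 is needed: in this sketch, the first print shows $0 untouched even though OFS is already a tab, and only after the field assignment does awk rejoin the fields with OFS. The name rebuild_demo is a hypothetical helper for illustration.

```shell
# rebuild_demo is a hypothetical helper name used only for this illustration.
rebuild_demo() {
  printf '%s\n' "$1" | awk -F '[_/]' 'BEGIN { OFS = "\t" }
  {
    print      # $0 as read: chr1_10203040_T/C (OFS not applied yet)
    $1 = $1    # assigning any field makes awk rebuild $0 using OFS
    print      # now: chr1<TAB>10203040<TAB>T<TAB>C
  }'
}

rebuild_demo 'chr1_10203040_T/C'
```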

We could do it in one go, but then you would need split(), which I think is more complicated.


Output of the two-awk pipeline:

#CHROM POS REF ALT
chr1 10203040 T C
chr1 10203050 T A
chr1 10203060 A G
chr1 10203070 C T
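The one-pass variant with split() mentioned in this answer could look roughly like this. It is a sketch assuming a whitespace-separated file with one header line; the function name to_vcf_columns is hypothetical, and unlike the two-step command it also strips the chr prefix to match the #CHROM values the question asks for.

```shell
# to_vcf_columns is a hypothetical helper name. It reads the given file (or
# stdin if none) and assumes a header line plus a 2nd column shaped like
# chr1_10203040_T/C.
to_vcf_columns() {
  awk 'BEGIN { OFS = "\t"; print "#CHROM", "POS", "REF", "ALT" }
       FNR > 1 {
         n = split($2, f, "[_/]")   # f[1]=chr1, f[2]=position, f[3]=REF, f[4]=ALT
         if (n != 4) next           # skip values that do not split into 4 parts
         sub(/^chr/, "", f[1])      # chr1 -> 1, as in the desired output
         print f[1], f[2], f[3], f[4]
       }' "$@"
}

printf '%s\n' 'col1 col2 col3' \
  'ID1234567-DNA_A01 chr1_10203040_T/C gene 0' \
  'ID1234568-DNA_A02 chr1_10203050_T/A gene 0' | to_vcf_columns
```

With a real file you would call it as to_vcf_columns old_file > new_file. Doing the split inside one awk avoids the second process, at the cost of handling the array indices yourself.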

If you have GNU awk (gawk), then (notwithstanding the advice here) you could consider capturing the parts you want with a regular expression instead of splitting the string:

$ gawk 'BEGIN{OFS="\t"; print "#CHROM","POS","REF","ALT"} match($2,/chr([^_]+)_([0-9]+)_([ACGT]+)[/]([ACGT]+)/,a) {print a[1],a[2],a[3],a[4]}' old_file
#CHROM POS REF ALT
1 10203040 T C
1 10203050 T A
1 10203060 A G
1 10203070 C T

(Other awk implementations also have match(), but GNU awk extends it with an optional third argument: an array that receives the capture groups.)
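If your awk lacks that array argument, a rough equivalent is to test the field against a regular expression and then rewrite a copy of it with sub() and gsub(). This is a sketch, not part of the original answer; to_vcf_portable is a hypothetical helper name, and the pattern assumes values shaped like chr1_10203040_T/C.

```shell
# to_vcf_portable is a hypothetical helper name. Only POSIX awk features are
# used: a dynamic regex match, sub() and gsub(); no capture groups.
to_vcf_portable() {
  awk 'BEGIN { OFS = "\t"; print "#CHROM", "POS", "REF", "ALT" }
       $2 ~ "^chr[^_]+_[0-9]+_[ACGT]+/[ACGT]+$" {
         s = $2
         sub("^chr", "", s)      # chr1_10203040_T/C -> 1_10203040_T/C
         gsub("[_/]", "\t", s)   # -> 1<TAB>10203040<TAB>T<TAB>C
         print s
       }' "$@"
}

printf '%s\n' 'col1 col2 col3' 'ID1234567-DNA_A01 chr1_10203040_T/C gene 0' |
  to_vcf_portable
```

As with the gawk version, the header line is skipped simply because col2 does not match the pattern.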
