Using sed or awk to remove near-duplicates

I currently use the following to get as close as I can to a deduplicated log file:

cut -d ' ' -f 3- /var/log/issues.log | sed -E 's/[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}//g' | sort -u

So far it gets rid of the timestamp at the start of each line and removes the IP address.
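For reference, here is what that pipeline does on a couple of made-up sample lines (the exact timestamp layout and field positions here are assumptions, since the real log isn't shown):

```shell
# Two sample lines in the assumed format: two timestamp fields, then the message.
printf '%s\n' \
  '2024-01-01 12:00:01 Failed login from 10.0.0.1 for alice' \
  '2024-01-01 12:00:02 Failed login from 10.0.0.2 for bob' \
  > sample.log

# cut drops the first two space-separated fields (the timestamp);
# sed deletes anything that looks like a dotted-quad IP address;
# sort -u collapses exact duplicates.
cut -d ' ' -f 3- sample.log |
  sed -E 's/[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}//g' |
  sort -u
```

Note that deleting the IP leaves a double space behind, and `sort -u` still keeps both lines because the trailing usernames differ.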

However, I'm still left with dozens of lines of the format(s):

Failed login from for A
Failed login from for B
Failed login from for C
Failed login from for D
Failed login from for E
Invalid heartbeat 'A' from
Invalid heartbeat 'B' from
Invalid heartbeat 'C' from
Invalid heartbeat 'D' from
Invalid heartbeat 'E' from

How would I further amend my command to remove these "near" duplicates, leaving only the lines below? (A, B, C, D and E could be any string.)

Failed login from for
Invalid heartbeat from 
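One possible way to amend the pipeline, sketched against the two message shapes shown above (the extra `sed` expressions are an assumption about the log, not a general solution): delete any single-quoted token, and drop everything after a trailing " for ".

```shell
# Simulate the output of the existing cut|sed stage, then strip the
# variable parts: quoted tokens and the username after " for ".
printf '%s\n' \
  'Failed login from  for alice' \
  "Invalid heartbeat 'xyz' from " |
  sed -E -e "s/'[^']*'//g" -e 's/ for .*/ for/' |
  sort -u
```

This would collapse all five variants of each message into a single line, at the cost of hard-coding which fields are "variable"; if the real log has more message shapes, each would need its own substitution.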

Thanks
