Using sed or awk to remove near-duplicates

I currently use the following to get as close as I can to a deduplicated log file:

cut -d ' ' -f 3- /var/log/issues.log | sed -E 's/[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}//g' | sort -u

So far it gets rid of the timestamp at the start of each line and removes the IP address.
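For reference, here is what that pipeline does on a couple of made-up sample lines (the exact timestamp layout and field positions here are assumptions, since the real log isn't shown):

```shell
# Two sample lines in the assumed format: two timestamp fields, then the message.
printf '%s\n' \
  '2024-01-01 12:00:01 Failed login from 10.0.0.1 for alice' \
  '2024-01-01 12:00:02 Failed login from 10.0.0.2 for bob' \
  > sample.log

# cut drops the first two space-separated fields (the timestamp);
# sed deletes anything that looks like a dotted-quad IP address;
# sort -u collapses exact duplicates.
cut -d ' ' -f 3- sample.log |
  sed -E 's/[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}//g' |
  sort -u
```

Note that deleting the IP leaves a double space behind, and `sort -u` still keeps both lines because the trailing usernames differ.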

However, I'm still left with dozens of lines of the format(s):

Failed login from for A
Failed login from for B
Failed login from for C
Failed login from for D
Failed login from for E
Invalid heartbeat 'A' from
Invalid heartbeat 'B' from
Invalid heartbeat 'C' from
Invalid heartbeat 'D' from
Invalid heartbeat 'E' from

How would I further amend my command to remove these "near" duplicates, leaving only the lines below? (A, B, C, D and E could be any string.)

Failed login from for
Invalid heartbeat from 
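One possible way to amend the pipeline, sketched against the two message shapes shown above (the extra `sed` expressions are an assumption about the log, not a general solution): delete any single-quoted token, and drop everything after a trailing " for ".

```shell
# Simulate the output of the existing cut|sed stage, then strip the
# variable parts: quoted tokens and the username after " for ".
printf '%s\n' \
  'Failed login from  for alice' \
  "Invalid heartbeat 'xyz' from " |
  sed -E -e "s/'[^']*'//g" -e 's/ for .*/ for/' |
  sort -u
```

This would collapse all five variants of each message into a single line, at the cost of hard-coding which fields are "variable"; if the real log has more message shapes, each would need its own substitution.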

Thanks
