You can add any characters that may be part of your words to the sed bracket expression [^a-zA-Z0-9.]; everything not listed there is replaced by a space.
cat filename | sed -e 's/[^a-zA-Z0-9.]/ /g' | xargs | tr ' ' '\n' | sort | uniq -c | sort -nr -k1,2
Voila, you should get the most frequently occurring word on top.
This is the output for the text above:
4 the
2 xargs
2 word
2 to
2 space
2 sort
2 sed
2 or
2 one
2 on
2 of
2 chars
1 zA
1 your
1 you
1 words
1 which
1 voila
1 uniq
1 trying
1 trims
1 tr
1 top.
1 they
1 than
1 split
1 spaces
1 should
1 separately
1 s
1 replaced
1 part
1 occured
1 nr
1 normal
1 non
1 n
1 most
1 more
1 may
1 include
1 in
1 ignore
1 get
1 g
1 filename
1 enterkey
1 e
1 dot
1 char.
1 cat
1 can
1 c
1 by
1 block.
1 be
1 basis
1 are
1 aplhanumeric
1 and
1 am
1 a
1 Z0
1 You
1 I
1 Here
1 9.
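The listing above counts "You" and "you" separately and keeps the trailing dot on "top." and "block.". If that is not what you want, here is a minimal sketch of a variation, not the original command: the function name word_freq is made up for illustration, it folds everything to lower case, strips dots at the end of a word, and swaps the xargs step for tr -s plus a sed that drops empty lines.

```shell
# word_freq: hypothetical helper, a variation on the pipeline above.
# Reads text on stdin and prints "count word" lines, most frequent first.
word_freq() {
  tr '[:upper:]' '[:lower:]' |         # fold case so "You" and "you" are one word
    sed -e 's/[^a-z0-9.]/ /g' |        # anything outside the kept set becomes a space
    tr -s ' ' '\n' |                   # one word per line, squeezing repeated breaks
    sed -e 's/\.*$//' -e '/^$/d' |     # strip trailing dots, then drop empty lines
    sort |                             # group identical words together
    uniq -c |                          # prefix each word with its count
    sort -nr                           # highest count first
}

# Usage: the top line has count 3 and word "the".
printf 'The cat. the dog\nTHE end.\n' | word_freq | head -n 1
```

To run it on a file, use word_freq < filename. Dots inside a word (as in 3.14) are kept because only dots at the end of a word are stripped.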