What about textonyms? What exactly is being counted? See https://rosettacode.org/wiki/Talk:Textonyms
As for performance, there are other considerations, such as whether this runs on some embedded Java, on a single core or many cores, and so on.
A search such as https://www.google.com/search?q=java+duplicate+words finds dozens of examples that you can test and then report the results.
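For a baseline to compare those examples against, here is a minimal sketch of one possible approach. It is not taken from any of the search results. The class name `DuplicateWords` and the method `countDuplicates` are made up for illustration, and it assumes that a "word" is a run of letters compared case-insensitively, which is exactly the kind of decision the "what is being counted?" question is about.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

public class DuplicateWords {

    // Assumption: a "word" is a maximal run of letters, compared case-insensitively.
    // Returns only the words that occur more than once, with their counts,
    // in order of first appearance.
    static Map<String, Integer> countDuplicates(String text) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String token : text.toLowerCase(Locale.ROOT).split("[^\\p{L}]+")) {
            if (!token.isEmpty()) {          // split can yield a leading empty string
                counts.merge(token, 1, Integer::sum);
            }
        }
        counts.values().removeIf(n -> n < 2); // drop words seen only once
        return counts;
    }

    public static void main(String[] args) {
        // prints {the=3, and=2}
        System.out.println(countDuplicates("The cat and the hat and the bat"));
    }
}
```

This is single-threaded and holds every distinct word in memory, so on an embedded target or a many-core machine you may well want something different.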