lib-text
A little text processing library for Scala.

Overview

lib-text is a small text processing library that supports language identification, tokenization, and stopword filtering, and provides some useful helper functions. The tokenizer is tuned for text conventions common in social media such as Twitter, and handles URLs, emoji, hashtags, email addresses, and @-mentions cleanly. Stopword filtering is currently supported for:
- German
- English
- Spanish
- French
- Indonesian
- Japanese
- Malay
- Dutch
- Portuguese
- Swedish
- Turkish
- Arabic
More to come.
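
To give a feel for what social-media-aware tokenization involves, here is a minimal sketch using only the Scala standard library. It is not lib-text's API or implementation: the object name `SocialTokenizerSketch`, the `tokenize` function, and the regular expressions are all assumptions made for illustration. The sketch is ASCII-oriented and has no special handling for emoji.

```scala
// Illustrative sketch only; NOT the lib-text API. All names here are hypothetical.
object SocialTokenizerSketch {
  // Alternatives are tried left to right, so the more specific patterns come first.
  private val Token = List(
    """https?://\S+""",                 // URLs
    """[\w.+-]+@[\w-]+(?:\.[\w-]+)+""", // email addresses
    """[@#]\w+""",                      // @-mentions and hashtags
    """\w+(?:'\w+)?""",                 // words, with an optional apostrophe part
    """[^\s\w]"""                       // any other single non-space character
  ).mkString("|").r

  // Returns the tokens in order of appearance; whitespace is dropped.
  def tokenize(text: String): List[String] = Token.findAllIn(text).toList

  def main(args: Array[String]): Unit =
    tokenize("Hi @bob, see https://example.com #scala").foreach(println)
}
```

With this sketch, `tokenize("Hi @bob, see https://example.com #scala")` yields `List("Hi", "@bob", ",", "see", "https://example.com", "#scala")`. A plain whitespace or word-boundary split would instead break the URL, the mention, and the hashtag apart, which is the problem a social-media tokenizer is meant to avoid.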

Usage

Add the following to your sbt build:

resolvers += "peoplepattern" at "https://dl.bintray.com/peoplepattern/maven/"

libraryDependencies += "com.peoplepattern" %% "lib-text" % "0.3"