When the text is presented in digital form, it is relatively easy to find words as we can split the stream on non-word characters. This becomes more complex in spoken language analysis. In this case, segmenters try to optimize a metric, for example, to minimize the number of distinct words in the lexicon and the length or complexity of the phrase (Natural Language Processing with Python by Steven Bird et al, O'Reilly Media Inc, 2009).
Annotation usually refers to parts-of-speech tagging. In English, these are nouns, pronouns, verbs, adjectives, adverbs, articles, prepositions, conjunctions, and interjections. For example, in the phrase we saw the yellow dog, we is a pronoun, saw is a verb, the is an article, yellow is an adjective, and dog is a noun.
In some languages, the chunking and annotation depends on context. For example, in Chinese, 爱江山人 literally translates to love country person and can mean either country-loving person or love country-person...