17. 11. 2020

Foundations of Statistical Natural Language Processing

by Christopher D. Manning

Sometimes it felt a bit outdated, but the explanations of various algorithms and principles were very good and understandable.

(Church and Mercer 1993: 1

Virginia Electronic Text Center (see the website)
(Friday, December 05, 2014, 03:37 PM, page 66)

learning algorithms can be found in (Dietterich 1998). A good case study, for the example of word sense disambiguation, is (Mooney 1996).

Bell et al. (1990) and Witten and Bell (1991) introduce a number of smoothing algorithms for the goal of improving text compression

Chen and Goodman (1996, 1998) present extensive evaluations of different smoothing algorithms. The conclusions of (Chen and Goodman 1998) are that a variant of Kneser-Ney back-off smoothing that they develop normally gives the best performance.

only consider coarse-grained distinctions, for example only those that manifest themselves across languages (Resnik and Yarowsky 1998)

giving more context contributes little to human disambiguation performance

expressed aptly by Mercer (1993): "one cannot learn a new language by reading a bilingual dictionary"

It has been estimated that the average educated person reads on the order of one million words in a year, but hears ten times as many words spoken (Sunday, December 14, 2014, 07:15 AM, page 337)

Markov processes/chains/models were first developed by Andrei A. Markov (a student of Chebyshev). Their first use was actually for a linguistic purpose - modeling the letter sequences in works of Russian literature (Markov 1913) - but Markov models were then developed as a general statistical tool (Sunday, December 14, 2014, 07:20 AM, page 341)