Machine Translation: Mindmap

Basic concepts

  • homonymy, ambiguity, polysemy, coreference, anaphora, word order, presupposition, hypothesis, garden path, FAHQMT

History of MT

  • Georgetown experiment
  • ALPAC report

Rule-based MT (RBMT)

  • Classification
    • Direct
      • METEO
    • Transfer
      • Transfer rules
      • PC translator, SYSTRAN
    • Interlingua (KBMT ‒ Knowledge-based MT)
      • Rosetta, KBMT-89
  • Vauquois’ triangle
  • Tokenization
    • Scriptio continua
  • Sentence segmentation
  • Morphology level
    • Morpheme, stem, root, lemma, suffix, infix, prefix, wordform, grammeme
    • Morphological analysis
      • Tagset
      • Universal PoS tags
      • Guesser
    • Morphological disambiguation
      • Tagger
    • Stemming
    • Morphological segmentation
  • Lexical level
    • Listeme, principle of compositionality, homonymy, polysemy
    • Multiword expression (MWE)
    • Named entity (NE)
    • Word sense disambiguation (WSD)
      • Lesk’s algorithm (see the sketch after this list)
    • Word sense representation
      • Discrete representation
        • Granularity
        • WordNet, FrameNet, VerbaLex
      • Continuous representation
        • Distributional thesaurus
        • Distributional semantics
    • Lexicons
    • Statistical dictionary
      • Bilingual extraction from parallel data
      • Co-occurrence statistics
      • LogDice
  • Syntax level
    • Grammar
    • Syntactic analysis / parsing
      • Top-down analysis
      • Bottom-up analysis
    • Syntax representation / formalism
      • Constituent (phrasal) structure
      • Dependency
      • Context-free grammar
      • Tree-adjoining grammar
    • Garden path
  • Semantic level
    • Pragmatics
      • Intention
    • Suprasentential features
      • Anaphora
    • Semantic role
    • Frame (FrameNet)
    • Prague Dependency Treebank (PDT)
      • Tectogrammatical level
      • TectoMT
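
The simplified Lesk’s algorithm listed under Word sense disambiguation above can be illustrated in a few lines: each candidate sense is scored by the overlap between its dictionary gloss and the words surrounding the ambiguous word, and the highest-scoring sense wins. The code is a minimal sketch with a hypothetical two-sense inventory; a real system would take glosses from a resource such as WordNet.

```python
# Simplified Lesk: pick the sense whose gloss shares the most words
# with the surrounding context. The sense inventory here is a toy example.

def simplified_lesk(word, context, sense_inventory):
    """Return the sense id with the largest gloss/context word overlap."""
    context_words = set(w.lower() for w in context)
    best_sense, best_overlap = None, -1
    for sense_id, gloss in sense_inventory[word].items():
        gloss_words = set(gloss.lower().split())
        overlap = len(gloss_words & context_words)
        if overlap > best_overlap:
            best_sense, best_overlap = sense_id, overlap
    return best_sense

# Hypothetical two-sense inventory for "bank".
inventory = {
    "bank": {
        "bank.financial": "an institution that accepts deposits and lends money",
        "bank.river": "the sloping land beside a body of water such as a river",
    }
}

context = "she sat on the bank of the river and watched the water".split()
print(simplified_lesk("bank", context, inventory))  # -> bank.river
```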

Statistical MT

  • Noisy channel principle
  • Zipf’s law, probability distribution, conditional probability, Bayes’s rule
  • Language model (bigram sketch after this list)
    • N-grams
    • Chain rule
    • Markov assumption
    • Maximum likelihood estimation
    • Entropy, cross entropy
    • Perplexity
    • Shannon’s game
    • Smoothing
      • Add-one (Laplace), Add-α, Deleted estimation, Good-Turing
    • Interpolation, back-off
    • Hapax legomenon, singleton
    • Out-of-vocabulary, zero-frequency, rare word
  • Translation model
    • Word alignment
    • Lexical translation
    • IBM models I-V (Model 1 sketch after this list)
      • Fertility model
      • Null token
      • Expectation-Maximization algorithm
    • Phrase-based translation model
      • Consistent phrase
  • Decoding
    • Beam search
    • Moses
  • Parallel corpora
    • Sentence alignment
      • Gale-Church, Hunalign, Bleualign
    • Word alignment (Giza++, Moses)
    • EUR-Lex, OPUS, Hansards, Europarl, Acquis communautaire, InterCorp, Tatoeba
    • Translation memories (TMX, XLIFF)
      • DGT, MyMemory
    • Comparable corpora
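
The Language model branch above can be made concrete with a tiny bigram model: probabilities are maximum likelihood estimates from counts, add-one (Laplace) smoothing handles unseen bigrams, and perplexity measures how well the model predicts a sentence. Under the noisy channel principle this language model P(e) is combined with a translation model P(f|e), and decoding searches for the e maximizing P(e)·P(f|e). The toy corpus below is illustrative only.

```python
import math
from collections import Counter

# Toy training corpus; <s> and </s> mark sentence boundaries.
corpus = [
    "<s> the cat sat on the mat </s>",
    "<s> the dog sat on the rug </s>",
]

tokens = [s.split() for s in corpus]
unigrams = Counter(w for sent in tokens for w in sent)
bigrams = Counter((a, b) for sent in tokens for a, b in zip(sent, sent[1:]))
V = len(unigrams)  # vocabulary size used by add-one smoothing

def p_addone(prev, word):
    """P(word | prev) with add-one (Laplace) smoothing."""
    return (bigrams[(prev, word)] + 1) / (unigrams[prev] + V)

def perplexity(sentence):
    """Perplexity of one sentence under the smoothed bigram model."""
    words = sentence.split()
    log_prob = sum(math.log2(p_addone(a, b)) for a, b in zip(words, words[1:]))
    return 2 ** (-log_prob / (len(words) - 1))

print(p_addone("the", "cat"))                       # MLE would give 1/4; smoothing lowers it
print(perplexity("<s> the cat sat on the rug </s>"))
```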
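The IBM models under the Translation model branch learn lexical translation probabilities t(f|e) from sentence-aligned data alone, using the Expectation-Maximization algorithm. The sketch below implements IBM Model 1, the simplest member of the family (no fertility or distortion, and the NULL token is omitted for brevity); the three toy sentence pairs are illustrative only.

```python
from collections import defaultdict
from itertools import product

# Toy sentence-aligned corpus (English -> "foreign"), illustrative only.
parallel = [
    ("the house".split(), "das haus".split()),
    ("the book".split(),  "das buch".split()),
    ("a book".split(),    "ein buch".split()),
]

f_vocab = {f for _, fs in parallel for f in fs}
e_vocab = {e for es, _ in parallel for e in es}

# Uniform initialization of t(f|e).
t = {(f, e): 1.0 / len(f_vocab) for f, e in product(f_vocab, e_vocab)}

for _ in range(10):                       # EM iterations
    count = defaultdict(float)            # expected counts c(f, e)
    total = defaultdict(float)            # expected counts c(e)
    for es, fs in parallel:
        for f in fs:                      # E-step: distribute each f over all e
            norm = sum(t[(f, e)] for e in es)
            for e in es:
                frac = t[(f, e)] / norm
                count[(f, e)] += frac
                total[e] += frac
    for (f, e) in t:                      # M-step: renormalize expected counts
        if total[e] > 0:
            t[(f, e)] = count[(f, e)] / total[e]

print(round(t[("haus", "house")], 3))     # rises towards 1.0 over the iterations
print(round(t[("das", "the")], 3))
```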

Neural MT

  • 1-of-V coding, one-hot representation
  • Word embeddings, distributed representation (sketch after this list)
    • skip-gram, CBOW, fastText
    • sentence embeddings
    • document embeddings
  • Feed-forward model
  • Recurrent NN, bi-RNN
  • Attention mechanism
  • Long short-term memory (LSTM)
  • Encoder-decoder model
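
The contrast between 1-of-V (one-hot) coding and distributed word embeddings in the list above can be shown directly: a one-hot vector multiplied by the embedding matrix simply selects one row, and skip-gram training examples are (centre, context) word pairs drawn from a window. This is a minimal numpy sketch with a random, untrained embedding matrix; word2vec or fastText would learn the matrix from such pairs.

```python
import numpy as np

sentence = "the cat sat on the mat".split()
vocab = sorted(set(sentence))
index = {w: i for i, w in enumerate(vocab)}
V, d = len(vocab), 4                     # vocabulary size, embedding dimension

def one_hot(word):
    """1-of-V coding: a V-dimensional vector with a single 1."""
    v = np.zeros(V)
    v[index[word]] = 1.0
    return v

# A random, untrained embedding matrix; word2vec/fastText would learn it.
E = np.random.default_rng(0).normal(size=(V, d))

# Multiplying a one-hot vector by E is the same as looking up one row.
assert np.allclose(one_hot("cat") @ E, E[index["cat"]])

def skipgram_pairs(words, window=2):
    """(centre, context) training pairs as used by the skip-gram model."""
    for i, centre in enumerate(words):
        for j in range(max(0, i - window), min(len(words), i + window + 1)):
            if j != i:
                yield centre, words[j]

print(list(skipgram_pairs(sentence))[:5])
```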

Hybrid MT

Computer-assisted translation

  • Translation memory
  • SDL Trados

Evaluation of MT

  • Fluency, adequacy, accuracy, intelligibility, correlation, metrics, post-editing, reference translation
  • Interannotator agreement (IAA)
  • Manual evaluation
  • Automatic evaluation
    • BLEU (sketch after this list)
    • NEVA
    • WAFT
    • TER / HTER
    • Meteor
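
BLEU, from the Automatic evaluation list above, combines clipped (modified) n-gram precisions for n = 1..4 with a brevity penalty. The sketch below computes a sentence-level score against a single reference; it omits the smoothing and corpus-level aggregation that real toolkits apply.

```python
import math
from collections import Counter

def ngrams(words, n):
    return Counter(tuple(words[i:i + n]) for i in range(len(words) - n + 1))

def bleu(candidate, reference, max_n=4):
    """Sentence-level BLEU: clipped n-gram precisions plus brevity penalty."""
    cand, ref = candidate.split(), reference.split()
    log_precisions = []
    for n in range(1, max_n + 1):
        cand_counts, ref_counts = ngrams(cand, n), ngrams(ref, n)
        clipped = sum(min(c, ref_counts[g]) for g, c in cand_counts.items())
        total = max(sum(cand_counts.values()), 1)
        if clipped == 0:
            return 0.0                 # no smoothing in this sketch
        log_precisions.append(math.log(clipped / total))
    # Brevity penalty punishes candidates shorter than the reference.
    bp = 1.0 if len(cand) > len(ref) else math.exp(1 - len(ref) / len(cand))
    return bp * math.exp(sum(log_precisions) / max_n)

print(round(bleu("a cat is on the mat", "the cat is on the mat"), 3))  # ~0.76
```
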
December 29, 2019