Machine Translation – Rule-based Systems

Rule-based Machine Translation

Knowledge-based Machine Translation

KBMT classification

The only types of MT until the 1990s.

Direct translation

MT with interlingua

Rosetta

It should be stressed that the isomorphy and not the interlinguality is the primary characteristic of our approach.

Two sentences are considered translations of each other if they have the same semantic derivation trees, i.e. corresponding syntactic derivation trees.

              Rosetta2        Rosetta3      Rosetta4
release       1985            1988          1991
speed         1-3 words/sec   ?             ?
dictionary    5,000           90,000        ?
SL            NL, EN          NL, EN        NL, EN
TL            NL, EN          NL, EN, ES    NL, EN, ES

KBMT-89

[Figure: KBMT-89 scheme]

Nirenburg, Sergei. Knowledge-based machine translation. Machine Translation 4.1 (1989): 5-24.

Transfer translation

PC translator (LangSoft)

Systran

Interlingua vs. transfer

Source language analysis

Tokenization

Obstacles of tokenization

Scriptio continua

Thai

What is a word?

Tokenization

Sentence segmentation

Obstacles of sentence segmentation

Morphological level

Morphology

Morphological level

Morphological analysis

Morphological tags, tagset

DEMO: tag list

BNC tags

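# Tag frequency list: take the first 10,000 lines of VERT, drop structure
# lines starting with "<", keep the 3rd column (the tag), count and rank.
head -n 10000 VERT |\
grep -v "^<" |\
cut -f3 |\
sort |\
uniq -c |\
sort -rn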
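
The same count can be sketched in Python. This is a minimal sketch, assuming (as the pipeline above implies) a tab-separated vertical file whose third column holds the tag and whose structural lines start with `<`. The function name `tag_frequencies`, the word/lemma/tag column layout and the sample lines are made up for illustration.

```python
from collections import Counter
from itertools import islice


def tag_frequencies(lines, limit=10000, tag_column=2):
    """Count tags in the first `limit` lines of a vertical corpus file."""
    counts = Counter()
    for line in islice(lines, limit):
        if line.startswith("<"):        # skip structural markup such as <s> or <doc>
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) > tag_column:    # ignore lines without a tag column
            counts[fields[tag_column]] += 1
    return counts.most_common()         # most frequent tags first, like `sort -rn`


# Made-up sample in the assumed word<TAB>lemma<TAB>tag layout.
sample = [
    "<s>\n",
    "The\tthe\tAT0\n",
    "dog\tdog\tNN1\n",
    "chased\tchase\tVVD\n",
    "the\tthe\tAT0\n",
    "cat\tcat\tNN1\n",
    "</s>\n",
]
print(tag_frequencies(sample))
```

For a real corpus, an open file object can be passed instead of the list, for example `tag_frequencies(open("VERT", encoding="utf-8"))`.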

Morphological polysemy

Morphological disambiguation

Statistical disambiguation

Rule-based disambiguation

Morphological segmentation

Guesser

Morphological disambiguation—example

Sentence: Pravidelné krmení je pro správný růst důležité. ("Regular feeding is important for proper growth.")

word: Pravidelné
  analyses: k2eAgMnPc4d1, k2eAgInPc1d1, k2eAgInPc4d1, k2eAgInPc5d1, k2eAgFnSc2d1, k2eAgFnSc3d1, k2eAgFnSc6d1, k2eAgFnPc1d1, k2eAgFnPc4d1, k2eAgFnPc5d1, k2eAgNnSc1d1, k2eAgNnSc4d1, k2eAgNnSc5d1, ... (+ 5)
  disambiguation: k2eAgNnSc1d1

word: krmení
  analyses: k2eAgMnPc1d1, k2eAgMnPc5d1, k1gNnSc1, k1gNnSc4, k1gNnSc5, k1gNnSc6, k1gNnSc3, k1gNnSc2, k1gNnPc2, k1gNnPc1, k1gNnPc4, k1gNnPc5
  disambiguation: k1gNnSc1

word: je
  analyses: k5eAaImIp3nS, k3p3gMnPc4, k3p3gInPc4, k3p3gNnSc4, k3p3gNnPc4, k3p3gFnPc4, k0
  disambiguation: k5eAaImIp3nS

word: pro
  analyses: k7c4
  disambiguation: k7c4

word: správný
  analyses: k2eAgMnSc1d1, k2eAgMnSc5d1, k2eAgInSc1d1, k2eAgInSc4d1, k2eAgInSc5d1, ... (+ 18)
  disambiguation: k2eAgInSc4d1

word: růst
  analyses: k5eAaImF, k1gInSc1, k1gInSc4
  disambiguation: k1gInSc4

word: důležité
  analyses: k2eAgMnPc4d1, k2eAgInPc1d1, k2eAgInPc4d1, k2eAgInPc5d1, k2eAgFnSc2d1, k2eAgFnSc3d1, k2eAgFnSc6d1, k2eAgFnPc1d1, k2eAgFnPc4d1, k2eAgFnPc5d1, k2eAgNnSc1d1, k2eAgNnSc4d1, k2eAgNnSc5d1, ... (+ 5)
  disambiguation: k2eAgNnSc1d1
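
The tags above are sequences of attribute-value pairs (for instance `c4` is case 4), so one rule-based disambiguation step can be sketched as a filter. This is only a toy illustration, not the method used to produce the example: it assumes single-character attributes and values, and it applies one hand-written rule, namely that words governed by the preposition `pro` (`k7c4`) must be in case 4. The function names are hypothetical.

```python
import re


def parse_tag(tag):
    """Split a tag such as 'k1gInSc4' into {'k': '1', 'g': 'I', 'n': 'S', 'c': '4'}."""
    return dict(re.findall(r"([a-z])([A-Z0-9])", tag))


def filter_by_case(analyses, case):
    """Keep only the analyses whose case attribute 'c' equals `case`."""
    return [tag for tag in analyses if parse_tag(tag).get("c") == case]


# Analyses copied from the example (only those listed explicitly there).
preposition = parse_tag("k7c4")   # 'pro' requires case 4
spravny = ["k2eAgMnSc1d1", "k2eAgMnSc5d1", "k2eAgInSc1d1",
           "k2eAgInSc4d1", "k2eAgInSc5d1"]
rust = ["k5eAaImF", "k1gInSc1", "k1gInSc4"]

print(filter_by_case(spravny, preposition["c"]))   # ['k2eAgInSc4d1']
print(filter_by_case(rust, preposition["c"]))      # ['k1gInSc4']
```

On the listed analyses the filter leaves exactly the tags chosen in the example for "správný" and "růst"; a real disambiguator needs many such rules, or a statistical model, to handle the remaining words.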

Universal POS tags

TAG     Meaning
VERB    verbs (all tenses and modes)
NOUN    nouns (common and proper)
PRON    pronouns
ADJ     adjectives
ADV     adverbs
ADP     adpositions (prepositions and postpositions)
CONJ    conjunctions
DET     determiners
NUM     cardinal numbers
PRT     particles or other functional words
X       other: foreign words, typos, abbreviations
.       punctuation

Mappings exist for about 25 languages (those with treebanks).
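
Such a mapping is essentially a lookup table from a language-specific tagset to the universal tags. The sketch below maps the word-class attribute `k` of the Czech tags used in the disambiguation example to universal tags. The reading of the `k` values (1 noun, 2 adjective, 3 pronoun, 4 numeral, 5 verb, 6 adverb, 7 preposition, 8 conjunction, 9 particle) is an assumption of this sketch, not an official mapping, and `to_universal` is a hypothetical name.

```python
# Word-class attribute 'k' -> universal POS tag (assumed reading of the k values;
# k4 covers all numerals, which is broader than the universal NUM).
K_TO_UNIVERSAL = {
    "1": "NOUN", "2": "ADJ", "3": "PRON", "4": "NUM", "5": "VERB",
    "6": "ADV", "7": "ADP", "8": "CONJ", "9": "PRT",
}


def to_universal(tag):
    """Map a tag like 'k1gNnSc1' to a universal POS tag; anything else becomes 'X'."""
    if len(tag) >= 2 and tag[0] == "k":
        return K_TO_UNIVERSAL.get(tag[1], "X")
    return "X"


# Disambiguated tags of "Pravidelné krmení je pro správný růst důležité".
tags = ["k2eAgNnSc1d1", "k1gNnSc1", "k5eAaImIp3nS", "k7c4",
        "k2eAgInSc4d1", "k1gInSc4", "k2eAgNnSc1d1"]
print([to_universal(t) for t in tags])
```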

Guessing POS from grammemes

EN        CZ                  meaning
-s                            3rd person, sing., present simple
-ed       -al, -l, -en.       past tense
-ing      -(ov)ání            present continuous
-en       -en(.)              past participle
-s        -y, -i, -ové, -a    plural
-’s       ov(o, a, y)         possession
-er       -ší                 comparative
-est      nej-, -ší           superlative
you-’s                        pronoun

A problem: myší, west, fotbal, … → myšám, wer, fotbala, božit
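
A suffix-based guesser of this kind, and the way it fails, can be sketched in a few lines. The rules below are taken from the EN column of the table; the POS labels attached to them, the rule order and the name `guess` are assumptions of this sketch (a plain `'` stands in for the typographic apostrophe).

```python
# (suffix, guessed POS, meaning); "'s" must be tried before the bare "s".
SUFFIX_RULES = [
    ("est", "ADJ", "superlative"),
    ("ing", "VERB", "present continuous"),
    ("'s", "NOUN", "possession"),
    ("ed", "VERB", "past tense"),
    ("en", "VERB", "past participle"),
    ("er", "ADJ", "comparative"),
    ("s", "NOUN/VERB", "plural or 3rd person sing."),
]


def guess(word):
    """Return (stem, pos, meaning) for the first matching suffix, or None."""
    for suffix, pos, meaning in SUFFIX_RULES:
        if word.endswith(suffix) and len(word) > len(suffix):
            return word[: -len(suffix)], pos, meaning
    return None


print(guess("fastest"))   # ('fast', 'ADJ', 'superlative')
print(guess("west"))      # ('w', 'ADJ', 'superlative')
```

The second call shows the problem: "west" is analysed as the superlative of a non-existent adjective with the stem "w", whose comparative would then be "wer".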

Brill’s tagger

Problems of morphological analysis (MA) and POS tagging

Morphology—summary

Lexical level

Dictionaries in MT

Polysemy in dictionaries

Smooth sense transitions


[Figures: log, log chair, chair]

Polysemy on several levels

Meaning representation

[Figure: semantic types]

Semantic network—WordNet

VerbaLex

Word sense disambiguation

WSD: deep methods

WSD: shallow methods

Granularity: cat

WordNet

Granularity: oko

Granularity: dát

VerbaLex lists 32 (!) senses (irreflexive variants).

Granularity: malý

Granularity for MT

The granularity of translation dictionaries may be sufficient: a word $w$ has exactly as many senses as it has equivalents in the dictionary.
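
Under this view, counting senses reduces to counting distinct equivalents. A minimal sketch, with a toy Czech-English dictionary whose entries are illustrative only and not taken from any real resource:

```python
# Toy dictionary: source word -> list of target-language equivalents (illustrative).
DICTIONARY = {
    "oko": ["eye", "loop", "mesh"],
    "malý": ["small", "little"],
}


def sense_count(word, dictionary=DICTIONARY):
    """Number of senses of `word` = number of its distinct equivalents."""
    return len(set(dictionary.get(word, [])))


print(sense_count("oko"))    # 3
print(sense_count("malý"))   # 2
```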

What is the most polysemous word in English?

Answers from Wordnik and PDEV.

Lexica: summary

Kilgarriff, Adam. I don’t believe in word senses. Computers and the Humanities 31.2 (1997): 91–113.

Syntactic level

Syntactic analysis

Context-free grammar

Grammars

Types of analyses

Why syntactic analysis?

Syntactic ambiguity

Partial syntactic polysemy—garden path

…cognitive plausibility of parsing.

Phrase structure

Example

S   -> NP VP
VP  -> ADV V | V ADV
NP  -> DET N
DET -> the | a | an
N   -> cat | dog
...

Analyse: the dog runs fast (bottom-up and top-down)
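
The top-down half of the exercise can be tried out with a small backtracking parser. This is a minimal sketch: the grammar is copied from the example, while the lexical rules `V -> runs` and `ADV -> fast` are assumptions added here, because the original grammar elides them ("..."). The function names are illustrative.

```python
GRAMMAR = {
    "S": [["NP", "VP"]],
    "VP": [["ADV", "V"], ["V", "ADV"]],
    "NP": [["DET", "N"]],
    "DET": [["the"], ["a"], ["an"]],
    "N": [["cat"], ["dog"]],
    # Assumed lexical rules, needed to cover "the dog runs fast".
    "V": [["runs"]],
    "ADV": [["fast"]],
}


def expand(symbol, tokens, pos):
    """Yield (tree, next_pos) for every way `symbol` derives a prefix of tokens[pos:]."""
    if symbol not in GRAMMAR:                       # terminal: must match the next token
        if pos < len(tokens) and tokens[pos] == symbol:
            yield symbol, pos + 1
        return
    for rhs in GRAMMAR[symbol]:                     # try each alternative in turn
        partial = [([], pos)]                       # (children so far, position)
        for child in rhs:
            partial = [
                (children + [tree], nxt)
                for children, cur in partial
                for tree, nxt in expand(child, tokens, cur)
            ]
        for children, end in partial:
            yield (symbol, *children), end


def parse(sentence):
    """Return all parse trees of `sentence` that cover every token."""
    tokens = sentence.split()
    return [tree for tree, end in expand("S", tokens, 0) if end == len(tokens)]


print(parse("the dog runs fast"))
```

The parser starts from `S` and expands rules until it reaches the words; the alternative `VP -> ADV V` is tried first and abandoned because "runs" is not an adverb. The grammar has no left recursion, so the naive expansion terminates.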

Phrasal tree

Constituency (phrasal structure)

Dependency structure

Dependency tree

Dependency

Hybrid trees

Hybrid tree I

Hybrid tree (SET)

Evaluation of parsing quality

Transfer translation

Transfer scheme

Example of transfer rules I

Example of transfer rules II

From Arturo Trujillo, Translation Engines: Techniques for Machine Translation.

Writing rules

You like her. vs. Ella te gusta.
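
The pair illustrates a transfer rule that swaps arguments: the English subject becomes the Spanish indirect object and the English object becomes the Spanish subject. A toy sketch of analysis, transfer and generation for exactly this sentence follows; the frame format, the mini-lexicons and all function names are hypothetical and cover only this example.

```python
# Hypothetical mini-lexicons, just large enough for the example sentence.
EN_PERSON = {"you": "2sg", "her": "3sg.f"}      # English pronoun -> features
ES_SUBJECT = {"3sg.f": "ella"}                  # features -> Spanish subject pronoun
ES_DATIVE = {"2sg": "te"}                       # features -> Spanish dative clitic
ES_VERB = {("gustar", "3sg.f"): "gusta"}        # (lemma, subject features) -> form


def analyse(sentence):
    """Naive analysis of 'SUBJ VERB OBJ.' into a predicate frame."""
    subj, verb, obj = sentence.rstrip(".").lower().split()
    return {"pred": verb, "subj": EN_PERSON[subj], "obj": EN_PERSON[obj]}


def transfer(frame):
    """like(subj=X, obj=Y) -> gustar(subj=Y, iobj=X): the arguments swap roles."""
    if frame["pred"] == "like":
        return {"pred": "gustar", "subj": frame["obj"], "iobj": frame["subj"]}
    raise ValueError("no transfer rule for " + frame["pred"])


def generate(frame):
    """Linearise the Spanish frame as subject, dative clitic, verb."""
    subj = ES_SUBJECT[frame["subj"]]
    iobj = ES_DATIVE[frame["iobj"]]
    verb = ES_VERB[(frame["pred"], frame["subj"])]   # verb agrees with the new subject
    return f"{subj.capitalize()} {iobj} {verb}."


print(generate(transfer(analyse("You like her."))))   # Ella te gusta.
```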

Transfer syntax

Classes of rules

Semantic level / analysis

Semantic roles

  Dítě     škádlí  lvíče.
  AG/SUBJ  V       PAT/OBJ

  A child (SUBJ)    teases (V) a lion cub (OBJ).
  A lion cub (SUBJ) teases (V) a child (OBJ).
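
The ambiguity arises because both nouns have identical nominative and accusative forms, so case alone cannot decide which one is the subject. The sketch below enumerates the readings that case allows; the case sets (1 = nominative, 4 = accusative, other cases omitted) and the contrasting pair "pes" / "kočku" are supplied here for illustration.

```python
from itertools import permutations

# Possible cases per word form (1 = nominative, 4 = accusative); other cases omitted.
CASES = {
    "dítě": {1, 4},     # neuter: nominative and accusative coincide
    "lvíče": {1, 4},    # neuter: nominative and accusative coincide
    "pes": {1},         # 'dog', nominative only
    "kočku": {4},       # 'cat', accusative only
}


def readings(nouns, cases=CASES):
    """Return every (subject, object) pair that case marking alone allows."""
    return [
        (subj, obj)
        for subj, obj in permutations(nouns, 2)
        if 1 in cases[subj] and 4 in cases[obj]
    ]


print(readings(["dítě", "lvíče"]))   # [('dítě', 'lvíče'), ('lvíče', 'dítě')]
print(readings(["kočku", "pes"]))    # [('pes', 'kočku')]
```

With unambiguous case marking the word order does not matter and a single reading survives; for "Dítě škádlí lvíče" both assignments remain and must be resolved by other means, such as word order preferences or semantics.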

Errors propagated from below

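zatímco trhal prsty svého pstruha
(intended: "while he tore his trout apart with his fingers"; misanalysed: "while he tore the fingers of his trout")

(George R. R. Martin, Hostina pro vrány, the Czech translation of A Feast for Crows)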

FrameNet

FrameNet: Closure

An Agent manipulates a Fastener to open or close a Containing_object (e.g. coat, jar). Sometimes an Enclosed_region or a Container_portal may be expressed. Since the Manipulator is syntactically omissible, many verbs in this frame incorporate the Fastener.

Mary closed her coat with a belt.

Prague Dependency TreeBank 2.0

PDT layers

TectoMT

TectoMT: a simple block

English negative particles → verb attributes

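# Mark English negative particles ("not", "n't") as auxiliaries of their verbal parent.
sub process_document {
  my ($self, $document) = @_;

  foreach my $bundle ($document->get_bundles()) {
    # Analytical tree of the English source sentence.
    my $a_root = $bundle->get_tree('SEnglishA');

    foreach my $a_node ($a_root->get_descendants) {
      my ($eff_parent) = $a_node->get_eff_parents;
      next unless defined $eff_parent;  # skip nodes without an effective parent
      # The node is a negative particle and its effective parent is a verb.
      if ($a_node->get_attr('m/lemma') =~ /^(not|n't)$/
          and $eff_parent->get_attr('m/tag') =~ /^V/) {
        $a_node->set_attr('is_aux_to_parent', 1);
      }
    }
  }
}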

Tecto-align

Analysis in RBMT

Synthesis in RBMT, issues

Rule-based systems: conclusion

published: 2017-03-01
last modified: 2023-11-20

https://vit.baisa.cz/notes/learn/mt-rules/