Differences

This shows you the differences between two versions of the page.

Link to this comparison view

shallow_parsing_on_word_lattices [2014/03/15 13:02] (current)
bziolko created
Line 1: Line 1:
 +Bartosz Zaborowski (Poland)
  
 +**Shallow parsing on word lattices**
 +
 +Institiutions: Institute of Computer Science, Polish Academy of Sciences
 +
 +The article presents a new algorithm for shallow parsing of a natural language. The main innovation is that it allows efficient processing of word lattices. This makes it possible to test a new rule-based approach to automatic speech recognition: with scoring sentence candidates for a given utterance with regard to their accordance with grammar of a particular natural language. The algorithm is being implemented in a shallow parsing and morphosyntactic disambiguation system Spejd. The Spejd formalism consists of a cascade of several handwritten rules. Each of the rules is a regular grammar, with a set of operations modifying matched text. The encoded text being searched under the hood contains not only orthographic forms of words, but also morphosyntactic information and syntactic structures. The available RHS operations include e.g. morphosyntactic disambiguation by enforcing agreement of particular morphosyntactic attributes and construction of syntactic structures.
 +
 +The Spejd formalism is designed to process a single (linear) sentence at once. The naive baseline approach to the task of scoring each of sentence candidates in a word lattice would be to process each of the candidates sequentially, after extracting from the lattice. Unfortunately, the time complexity of the naive approach is linear with regard to the total number of sentence candidates encoded in a lattice, which is exponential in the size of the lattice. Hence, the basic approach is not applicable to real-life data.
 +
 +The rule matching algorithm which is presented in the article uses Finite-State techniques to process the whole word lattice at once in polynomial time and space. It combines optimizations of processing of linear sentences used in the existing Spejd implementation with new ideas, specific to searching for regular patterns in word lattice. The improvements follow the idea of a Nondeterministic Finite-State Automaton simulation without backtracking presented by Ville Laurikari. This idea was extended to handle not only a bunch of NFA threads at once, but also to simultaneously process multiple paths passing through a particular word (a lattice node).
 +
 +The article also discusses an efficient application of RHS operations directly on a word lattice, which turns out not to be trivial due to the high expressive power of the Spejd formalism.
 +
 +The article is supplemented by a speed evaluation of the new implementation.
 +
 +The research was funded by the National Science Centre allocated on the basis of a decision DEC-2011/03/D/ST6/00914.   
Copyright © XXII PVC Organizing Committee 2013. All Rights Reserved.