Large tagset labeling using Feed Forward Neural Networks. Case study on Romanian Language
Boros, Tiberiu and Ion, Radu and Tufis, Dan

Article Structure

Abstract

Standard methods for part-of-speech tagging suffer from data sparseness when used on highly inflectional languages (which require large lexical tagset inventories).

Topics

hidden layer

Appears in 20 sentences as: hidden layer (20), hidden layers (4)
In Large tagset labeling using Feed Forward Neural Networks. Case study on Romanian Language
  1. In our experiments, we used a fully connected, feed forward neural network with 3 layers (1 input layer, 1 hidden layer and 1 output layer)
    Page 4, “Abstract”
  2. In order to fully characterize our system, we took into account the following parameters: accuracy, runtime speed, training speed, hidden layer configuration and the optimal number of training iterations.
    Page 4, “Abstract”
  3. For example, the accuracy, the optimal number of training iterations, the training and the runtime speed are all highly dependent on the hidden layer configuration.
    Page 4, “Abstract”
  4. Small hidden layers give high training and runtime speeds, but often under-fit the data.
    Page 4, “Abstract”
  5. If the hidden layer is too large, it can easily over-fit the data and also has a negative impact on the training and runtime speed.
    Page 4, “Abstract”
  6. The optimal number of training iterations changes with the size of the hidden layer (larger layers usually require more training iterations).
    Page 4, “Abstract”
  7. The first experiment was designed to determine the tradeoff between runtime speed and the size of the hidden layer.
    Page 5, “Abstract”
  8. Because, for a given number of neurons in the hidden layer, the tagging speed is independent of the tagging accuracy, we partially trained (using one iteration and only 1000 training sentences) several network configurations.
    Page 5, “Abstract”
  9. The first network had only 50 neurons in the hidden layer and, for the next networks, we incremented the hidden layer size by 20 neurons until we reached 310 neurons.
    Page 5, “Abstract”
  10. Determining the optimal size of the hidden layer is a very delicate subject and there are no perfect solutions, most of them being based on trial and error: small-sized hidden layers lead to under-fitting, while large hidden layers usually cause over-fitting.
    Page 5, “Abstract”
  11. Also, because of the tradeoff between runtime speed and the size of hidden layers, if runtime speed is an important factor in a particular NLP application, then hidden layers with a smaller number of neurons are preferable, as they do not over-fit the data and offer a noticeable speed boost.
    Page 5, “Abstract”
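
Items 1 and 9 above describe a fully connected feed-forward network with a single hidden layer whose size is swept from 50 to 310 neurons in steps of 20. A minimal sketch of such a network and of the sweep is given below; it is not the authors' implementation, and the input and tagset sizes are hypothetical placeholders.

```python
# Minimal sketch of a 3-layer feed-forward tagger (1 input, 1 hidden,
# 1 output layer) and of the hidden-layer sweep from 50 to 310 neurons.
# The input and output dimensions below are assumptions, not the paper's.
import numpy as np

def init_network(n_in, n_hidden, n_out, seed=0):
    """Fully connected network: input -> hidden -> output."""
    rng = np.random.default_rng(seed)
    return {
        "W1": rng.normal(0.0, 0.1, (n_in, n_hidden)), "b1": np.zeros(n_hidden),
        "W2": rng.normal(0.0, 0.1, (n_hidden, n_out)), "b2": np.zeros(n_out),
    }

def forward(net, x):
    """Sigmoid hidden layer, softmax distribution over the output tags."""
    h = 1.0 / (1.0 + np.exp(-(x @ net["W1"] + net["b1"])))
    z = h @ net["W2"] + net["b2"]
    e = np.exp(z - z.max())
    return e / e.sum()

N_INPUT, N_TAGS = 1200, 600          # hypothetical input and tagset sizes
for n_hidden in range(50, 311, 20):  # 50, 70, ..., 310 neurons (item 9)
    net = init_network(N_INPUT, n_hidden, N_TAGS)
    probs = forward(net, np.zeros(N_INPUT))  # dummy forward pass
    print(n_hidden, probs.shape)
```

Larger hidden layers make both the input-to-hidden and hidden-to-output products proportionally more expensive, which is the runtime/size tradeoff measured in items 7 through 9.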

neural network

Appears in 18 sentences as: Neural Network (3), neural network (7), Neural Networks (2), neural networks (7)
In Large tagset labeling using Feed Forward Neural Networks. Case study on Romanian Language
  1. In this paper we present an alternative method to Tiered Tagging, based on local optimizations with Neural Networks, and we show how, by properly encoding the input sequence in a general Neural Network architecture, we achieve results similar to the Tiered Tagging methodology, significantly faster and without requiring the extensive linguistic knowledge implied by the previously mentioned method.
    Page 1, “Abstract”
  2. In this article, we propose an alternative solution based on local optimizations with feed-forward neural networks.
    Page 2, “Abstract”
  3. 2 Large tagset part-of-speech tagging with feed-forward neural networks
    Page 2, “Abstract”
  4. Our proposal to deal with large tagsets without removing contextually useful information is based on Feed Forward Neural Networks (FFNN).
    Page 2, “Abstract”
  5. Several neural network architectures have been proposed for the task of part-of-speech tagging.
    Page 2, “Abstract”
  6. In his paper, he argues that neural networks are preferable to other methods when the training set is small.
    Page 2, “Abstract”
  7. The power of neural networks results mainly from their ability to attain activation functions over different patterns via their learning algorithm.
    Page 3, “Abstract”
  8. The tagger automatically assigns four dummy tokens (two at the beginning and two at the end) to the target utterance and the neural network is trained to automatically assign an MSD given the context (two previously assigned tags and the possible tags for the current and following two words) of the current word (see below for details).
    Page 3, “Abstract”
  9. In our experiments, we used a fully connected, feed forward neural network with 3 layers (1 input layer, 1 hidden layer and 1 output layer)
    Page 4, “Abstract”
  10. While other network architectures such as recurrent neural networks may prove to be more suitable for this task, they are extremely hard to train; thus, we traded the advantages of such architectures for the robustness and simplicity of the feed-forward networks.
    Page 4, “Abstract”
  11. So, the problem comes down to determining which is the most stable configuration of the neural network (i.e.
    Page 6, “Abstract”
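
Item 8 above describes the network input as a context built from two previously assigned tags plus the possible tags of the current word and the following two words. One way to picture this is as a concatenation of binary vectors over the MSD inventory, as in the sketch below; the toy tagset, helper names and vector layout are assumptions made for illustration, not details taken from the paper.

```python
# Illustrative encoding of the tagging context: 2 assigned tags plus the
# ambiguity classes (possible tags) of the current and next 2 words, each
# mapped to a binary vector over a toy MSD inventory.
import numpy as np

TAGSET = ["Ncms-n", "Ncfsrn", "Vmip3s", "Afpms-n", "Spsa"]  # toy MSD inventory
TAG2ID = {t: i for i, t in enumerate(TAGSET)}

def one_hot(tags):
    """Binary vector with a 1 for every MSD listed in `tags`."""
    v = np.zeros(len(TAGSET))
    for t in tags:
        v[TAG2ID[t]] = 1.0
    return v

def encode_context(prev_tags, candidate_tags):
    """prev_tags: the 2 tags already assigned; candidate_tags: the lists of
    possible tags for the current word and the following 2 words."""
    parts = [one_hot([t]) for t in prev_tags]      # 2 previously assigned tags
    parts += [one_hot(c) for c in candidate_tags]  # 3 ambiguity classes
    return np.concatenate(parts)                   # the network input vector

x = encode_context(["Spsa", "Ncms-n"],
                   [["Vmip3s"], ["Afpms-n", "Ncms-n"], ["Ncfsrn"]])
print(x.shape)  # 5 blocks of len(TAGSET) values each
```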

data sparseness

Appears in 3 sentences as: data sparseness (3)
In Large tagset labeling using Feed Forward Neural Networks. Case study on Romanian Language
  1. Standard methods for part-of-speech tagging suffer from data sparseness when used on highly inflectional languages (which require large lexical tagset inventories).
    Page 1, “Abstract”
  2. The standard tagging methods, using such large tagsets, face serious data sparseness problems due to a lack of statistical evidence, manifested by the non-robustness of the language models.
    Page 1, “Abstract”
  3. The previously proposed methods still suffer from the same issue of data sparseness when applied to MSD tagging.
    Page 3, “Abstract”

part-of-speech

Appears in 3 sentences as: part-of-speech (3)
In Large tagset labeling using Feed Forward Neural Networks. Case study on Romanian Language
  1. Standard methods for part-of-speech tagging suffer from data sparseness when used on highly inflectional languages (which require large lexical tagset inventories).
    Page 1, “Abstract”
  2. Several neural network architectures have been proposed for the task of part-of-speech tagging.
    Page 2, “Abstract”
  3. We presented a new approach for large tagset part-of-speech tagging using neural networks.
    Page 8, “Abstract”

part-of-speech tagging

Appears in 3 sentences as: part-of-speech tagging (3)
In Large tagset labeling using Feed Forward Neural Networks. Case study on Romanian Language
  1. Standard methods for part-of-speech tagging suffer from data sparseness when used on highly inflectional languages (which require large lexical tagset inventories).
    Page 1, “Abstract”
  2. Several neural network architectures have been proposed for the task of part-of-speech tagging.
    Page 2, “Abstract”
  3. We presented a new approach for large tagset part-of-speech tagging using neural networks.
    Page 8, “Abstract”

POS tagger

Appears in 3 sentences as: POS tagger (2), POS tagging (1)
In Large tagset labeling using Feed Forward Neural Networks. Case study on Romanian Language
  1. When tagging with CTAGS, one can use any statistical POS tagging method such as HMMs, Maximum Entropy Classifiers, Bayesian Networks, CRFs, etc., followed by CTAG-to-MSD recovery.
    Page 2, “Abstract”
  2. [Figure: Tiered Tagging workflow. Training: a POS tagger with manual and automatic rules for MSD recovery. Tagging: Input data -> Labeling with CTAGS -> MSD Recovery -> Output data]
    Page 2, “Abstract”
  3. Also, our POS tagger detected cases where the annotation in the Gold Standard was erroneous.
    Page 6, “Abstract”
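
Item 1 above summarizes the Tiered Tagging pipeline: the text is first labeled with a reduced CTAG tagset by any statistical tagger, and the full MSD is then recovered. The sketch below is only a schematic illustration of that two-stage idea; the CTAG-to-MSD mapping, the lexicon and the placeholder tagger are invented for illustration and do not reproduce the authors' recovery rules.

```python
# Schematic two-stage pipeline: CTAG labeling followed by MSD recovery.
# The tags, mapping and lexicon below are toy examples, not the real tagset.
CTAG_TO_MSDS = {"NSN": ["Ncms-n", "Ncfsrn"], "V3": ["Vmip3s"]}

def ctag_tagger(words):
    """Stand-in for any statistical tagger (HMM, MaxEnt, CRF, ...)."""
    return ["NSN" for _ in words]

def recover_msd(word, ctag, lexicon):
    """Pick an MSD compatible with both the CTAG and the word's lexicon entry."""
    candidates = set(CTAG_TO_MSDS[ctag]) & set(lexicon.get(word, []))
    return candidates.pop() if candidates else CTAG_TO_MSDS[ctag][0]

lexicon = {"casa": ["Ncfsrn"], "om": ["Ncms-n"]}  # toy word -> MSDs lexicon
words = ["casa", "om"]
msds = [recover_msd(w, c, lexicon) for w, c in zip(words, ctag_tagger(words))]
print(msds)  # ['Ncfsrn', 'Ncms-n']
```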
