Index of papers in Proc. ACL that mention
  • baseline system
Li, Zhifei and Yarowsky, David
Abstract
We integrate our method into a state-of-the-art baseline translation system and show that it consistently improves the performance of the baseline system on various NIST MT test sets.
Introduction
For example, if the baseline system knows that the translation for “香港总督” is “Hong Kong Governor”, and it also knows that “港督” is an abbreviation of “香港总督”, then it can translate “港督” to “Hong Kong Governor”.
Introduction
We also need to make sure that the baseline system has at least one valid translation for the full-form phrase.
Introduction
Moreover, our approach integrates the abbreviation translation component into the baseline system in a natural way, and thus is able to make use of minimum-error-rate training (Och, 2003) to automatically adjust the model parameters to reflect the change of the integrated system over the baseline system.
Unsupervised Translation Induction for Chinese Abbreviations
• Step-5: augment the baseline system with translation entries obtained in Step-4.
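As an illustration of Step-5, here is a minimal sketch of augmenting a phrase table with induced entries; the dict-based table, function name, and example entries are hypothetical, not the paper's implementation:

```python
# Hypothetical sketch of Step-5: merge induced abbreviation translations
# into an existing phrase table, modeled here as a dict of entry sets.
def augment_phrase_table(phrase_table, induced_entries):
    """Add induced (source -> translations) entries, keeping existing
    entries untouched; a real system would also carry feature scores."""
    for src, translations in induced_entries.items():
        phrase_table.setdefault(src, set()).update(translations)
    return phrase_table

baseline = {"香港总督": {"Hong Kong Governor"}}   # full form, already known
induced = {"港督": {"Hong Kong Governor"}}        # abbreviation from Step-4
print(augment_phrase_table(baseline, induced))
```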
Unsupervised Translation Induction for Chinese Abbreviations
Moreover, obtaining a list using a dedicated tagger does not guarantee that the baseline system knows how to translate the list.
Unsupervised Translation Induction for Chinese Abbreviations
On the contrary, in our approach, since the Chinese entities are translation outputs for the English entities, it is ensured that the baseline system has translations for these Chinese entities.
baseline system is mentioned in 21 sentences in this paper.
Topics mentioned in this paper:
Saluja, Avneesh and Hassan, Hany and Toutanova, Kristina and Quirk, Chris
Evaluation
Further examination of the differences between the two systems revealed that most of the improvements are due to better bigrams and trigrams, as indicated by the per-n-gram breakdown of BLEU precision, and primarily leverage higher-quality generated candidates from the baseline system.
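The per-n-gram precision breakdown mentioned in this excerpt can be computed as in the following minimal sketch (single sentence pair, clipped counts, no brevity penalty; illustrative only):

```python
from collections import Counter

def ngram_precision(hyp, ref, n):
    """Clipped n-gram precision of a hypothesis against one reference,
    the per-order quantity behind the BLEU breakdown."""
    hyp_ngrams = Counter(tuple(hyp[i:i + n]) for i in range(len(hyp) - n + 1))
    ref_ngrams = Counter(tuple(ref[i:i + n]) for i in range(len(ref) - n + 1))
    matched = sum(min(c, ref_ngrams[g]) for g, c in hyp_ngrams.items())
    return matched / max(sum(hyp_ngrams.values()), 1)

hyp = "the cat sat on the mat".split()
ref = "the cat is on the mat".split()
for n in (1, 2, 3):
    print(n, round(ngram_precision(hyp, ref, n), 3))  # 0.833, 0.6, 0.25
```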
Evaluation
We experimented with two extreme setups that differed in the data assumed parallel, from which we built our baseline system, and the data treated as monolingual, from which we built our source and target graphs.
Evaluation
In the second setup, we train a baseline system using the data in Table 2, augmented with the noisy parallel text:
Generation & Propagation
Instead, by intelligently expanding the target space using linguistic information such as morphology (Toutanova et al., 2008; Chahuneau et al., 2013), or relying on the baseline system to generate candidates similar to self-training (McClosky et al., 2006), we can tractably propose novel translation candidates (white nodes in Fig.
Generation & Propagation
To generate new translation candidates using the baseline system, we decode each unlabeled source bigram to generate its m-best translations.
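A minimal sketch of this candidate-generation step; the decoder object and its nbest method are assumptions standing in for the baseline SMT system:

```python
# Hypothetical interface: decode each unlabeled source bigram with the
# baseline system and keep its m-best translations as new candidates.
def generate_candidates(decoder, unlabeled_bigrams, m=10):
    candidates = {}
    for bigram in unlabeled_bigrams:
        candidates[bigram] = decoder.nbest(bigram, m)  # assumed API
    return candidates
```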
Generation & Propagation
The generated candidates for the unlabeled phrase — the ones from the baseline system’s
baseline system is mentioned in 13 sentences in this paper.
Topics mentioned in this paper:
Xiao, Tong and Zhu, Jingbo and Zhu, Muhua and Wang, Huizhen
Abstract
First, a sequence of weak translation systems is generated from a baseline system in an iterative manner.
Abstract
We evaluate our method on Chinese-to-English Machine Translation (MT) tasks in three baseline systems, including a phrase-based system, a hierarchical phrase-based system and a syntax-based system.
Abstract
The experimental results on three NIST evaluation test sets show that our method leads to significant improvements in translation accuracy over the baseline systems.
Background
5.1 Baseline Systems
Background
In this work, baseline system refers to the system produced by the boosting-based system combination when the number of iterations (i.e.
Background
To obtain satisfactory baseline performance, we train each SMT system five times using MERT with different initial values of feature weights to generate a group of baseline candidates, and then select the best-performing one from this group as the final baseline system (i.e.
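A minimal sketch of this best-of-five selection; train_mert and dev_bleu are assumed callables standing in for the actual tuning and evaluation pipeline:

```python
import random

def select_baseline(train_mert, dev_bleu, n_runs=5, n_features=14, seed=0):
    """Run MERT n_runs times from different random initial feature
    weights and keep the best-performing run as the final baseline."""
    rng = random.Random(seed)
    best_weights, best_score = None, float("-inf")
    for _ in range(n_runs):
        init = [rng.uniform(-1.0, 1.0) for _ in range(n_features)]
        weights = train_mert(init)      # tune from this initialization
        score = dev_bleu(weights)       # score on the dev set
        if score > best_score:
            best_weights, best_score = weights, score
    return best_weights, best_score
```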
Introduction
In this method, a sequence of weak translation systems is generated from a baseline system in an iterative manner.
Introduction
Experimental results show that our method leads to significant improvements in translation accuracy over the baseline systems.
baseline system is mentioned in 20 sentences in this paper.
Topics mentioned in this paper:
Wang, Aobo and Kan, Min-Yen
Abstract
Our joint inference method significantly outperforms baseline systems that conduct the tasks individually or sequentially.
Experiment
We discuss the dataset, baseline systems and experimental results in detail in the following.
Experiment
3.2 Baseline Systems
Experiment
We implemented several baseline systems to compare with the proposed FCRF joint inference method.
Introduction
In Section 3, we first describe the details of our dataset and baseline systems, followed by demonstrating two sets of experiments for CWS and IWR, respectively.
baseline system is mentioned in 11 sentences in this paper.
Topics mentioned in this paper:
Mi, Haitao and Liu, Qun
Experiments
Our baseline system is a state-of-the-art forest-based constituency-to-string model (Mi et al., 2008), or forest c2s for short, which translates a source forest into a target string by pattern-matching the
Experiments
The baseline system extracts 31.9M c2s rules and 77.9M c2d rules respectively, and achieves a BLEU score of 34.17 on the test set.
Experiments
First, we investigate the influence of different rule sets on the performance of the baseline system.
baseline system is mentioned in 9 sentences in this paper.
Topics mentioned in this paper:
Jia, Zhongye and Zhao, Hai
Experiments
4.3 Baseline System without Typo Correction
Experiments
First, we build a baseline system without typo correction, which is a pipeline of pinyin syllable segmentation and PTC conversion.
Experiments
The baseline system takes a pinyin input sequence, segments it into syllables, and then converts it into a Chinese character sequence.
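A minimal sketch of the two-stage pipeline described here; segment and convert are placeholders for the paper's actual segmentation and PTC components:

```python
# Baseline without typo correction: pinyin syllable segmentation
# followed by pinyin-to-character (PTC) conversion.
def baseline_ptc(pinyin_input, segment, convert):
    syllables = segment(pinyin_input)  # e.g. "nihao" -> ["ni", "hao"]
    return convert(syllables)          # e.g. ["ni", "hao"] -> "你好"
```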
baseline system is mentioned in 8 sentences in this paper.
Topics mentioned in this paper:
Darwish, Kareem
Baseline Arabic NER System
For the baseline system, we used the CRF++ implementation of CRF sequence labeling with default parameters.
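The paper used CRF++ with default parameters; purely as an illustration, an analogous sequence-labeling baseline with the sklearn-crfsuite package (a stand-in, with a much-simplified feature template) might look like:

```python
import sklearn_crfsuite  # stand-in for CRF++, which the paper used

def word_features(sent, i):
    # Simplified context-window features; the paper's feature set differs.
    return {"w": sent[i],
            "w-1": sent[i - 1] if i > 0 else "<s>",
            "w+1": sent[i + 1] if i < len(sent) - 1 else "</s>"}

def featurize(sentences):
    return [[word_features(s, i) for i in range(len(s))] for s in sentences]

X = featurize([["John", "lives", "in", "Cairo"]])
y = [["B-PER", "O", "O", "B-LOC"]]
crf = sklearn_crfsuite.CRF()  # default parameters, mirroring the setup
crf.fit(X, y)
print(crf.predict(X))
```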
Conclusion
For Arabic NER, the new features yielded an improvement of 5.5% over a strong baseline system on a standard dataset, with 10.7% gain in recall and negligible change in precision.
Cross-lingual Features
Table 4 reports on the results of the baseline system with the capitalization feature on the three datasets.
Cross-lingual Features
Table 5 reports on the results using the baseline system with the transliteration mining feature.
Cross-lingual Features
Table 6 reports on the results of using the baseline system with the two DBpedia features.
Introduction
The remainder of the paper is organized as follows: Section 2 provides related work; Section 3 describes the baseline system; Section 4 introduces the cross-lingual features and reports on their effectiveness; and Section 5 concludes the paper.
Related Work
We used their simplified features in our baseline system .
baseline system is mentioned in 7 sentences in this paper.
Topics mentioned in this paper:
Huang, Fei and Yates, Alexander
Abstract
Furthermore, our system improves significantly over a baseline system when applied to text from a different domain, and it reduces the sample complexity of sequence labeling.
Experiments
As expected, the drop-off in the baseline system’s performance from all words to rare words is impressive for both tasks.
Experiments
in F1 over the baseline system on all words, it in fact outperforms our baseline NP chunker on the WSJ data.
Experiments
This chunker achieves 0.91 F1 on OANC data, and 0.93 F1 on WSJ data, outperforming the baseline system in both cases.
baseline system is mentioned in 7 sentences in this paper.
Topics mentioned in this paper:
Kaufmann, Tobias and Pfister, Beat
Abstract
The language model is applied by means of an N-best rescoring step, which allows us to directly measure the performance gains relative to the baseline system without rescoring.
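A minimal sketch of such an N-best rescoring step; the hypothesis fields and the linear score combination are assumptions, not the paper's exact formulation:

```python
def rescore_nbest(nbest, lm_score, weight):
    """Return the hypothesis maximizing the baseline score plus a
    weighted grammar-based language model score."""
    return max(nbest, key=lambda h: h["baseline_score"]
               + weight * lm_score(h["text"]))
```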
Abstract
We report a significant reduction in word error rate compared to a state-of-the-art baseline system.
Experiments
For a given test set we could then compare the word error rate of the baseline system with that of the extended system employing the grammar-based language model.
Experiments
Our primary aim was to design a task which allows us to investigate the properties of our grammar-based approach and to compare its performance with that of a competitive baseline system.
Experiments
As shown in Table 1, the grammar-based language model reduced the word error rate by 9.2% relative over the baseline system.
Introduction
Besides proposing an improved language model, this paper presents experimental results for a much more difficult and realistic task and compares them to the performance of a state-of-the-art baseline system.
baseline system is mentioned in 6 sentences in this paper.
Topics mentioned in this paper:
Liao, Shasha and Grishman, Ralph
Cross-event Approach
5.1 Sentence-level Baseline System
Cross-event Approach
To use document-level information, we need to collect information based on the sentence-level baseline system .
Cross-event Approach
To this end, we set different thresholds from 0.1 to 1.0 in the baseline system output, and only evaluate triggers, arguments or roles whose confidence score is above the threshold.
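A minimal sketch of this confidence-threshold sweep; the prediction format and the evaluate callable are hypothetical:

```python
def sweep_thresholds(predictions, evaluate):
    """Evaluate only predictions whose confidence score meets each
    threshold in 0.1, 0.2, ..., 1.0 (the excerpt says 'above')."""
    results = {}
    for i in range(1, 11):
        t = i / 10.0
        kept = [p for p in predictions if p["confidence"] >= t]
        results[t] = evaluate(kept)
    return results
```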
Motivation
The sentence level baseline system finds event triggers like “founded” (trigger of Start-Org), “elected” (trigger of Elect), and “appointment” (trigger of Start-Position), which are easier to identify because these triggers have more specific meanings.
baseline system is mentioned in 6 sentences in this paper.
Topics mentioned in this paper:
Kummerfeld, Jonathan K. and Roesner, Jessika and Dawborn, Tim and Haggerty, James and Curran, James R. and Clark, Stephen
Conclusion
The fastest model parsed sentences 1.85 times as fast and was as accurate as the baseline system.
Data
Both sets of annotations were produced by manually correcting the output of the baseline system .
Introduction
By increasing the ambiguity level of the adaptive models to match the baseline system, we can also slightly increase supertagging accuracy, which can lead to higher parsing accuracy.
Introduction
Using an adapted supertagger with ambiguity levels tuned to match the baseline system, we were also able to increase F-score on labelled grammatical relations by 0.75%.
Results
As Table 8 shows, in all cases the use of supertagger-annotated data led to poorer performance than the baseline system, while the use of parser-annotated data led to an improvement in F-score.
Results
However, on the corpus of the extra data, the performance of the adapted models is comparable to the baseline model, which means the parser is probably still receiving the same categories that it used from the sets provided by the baseline system.
baseline system is mentioned in 6 sentences in this paper.
Topics mentioned in this paper:
Huang, Fei and Yates, Alexander
Introduction
We use an open source CRF software package to implement our CRF models. We use words, POS tags, chunk labels, and the predicate label at the preceding and following nodes as features for our Baseline system.
Introduction
For predicates that never or rarely appear in training, the HMM features increase F1 by 4.2, and they increase the overall F1 of the system by 3.5 to 93.5, which approaches the F1 of 94.7 that the Baseline system achieves on the in-domain WSJ test set.
Introduction
Table 2 shows the performance of our three baseline systems.
baseline system is mentioned in 6 sentences in this paper.
Topics mentioned in this paper:
Liu, Le and Hong, Yu and Liu, Hao and Wang, Xing and Yao, Jianmin
Experiments
4.4 Baseline Systems
Experiments
As described above, by using the NiuTrans toolkit, we have built two baseline systems to fulfill the “863” SLT task in our experiments.
Experiments
These two baseline systems are equipped with the same language model which is trained on large-scale monolingual target language corpus.
baseline system is mentioned in 6 sentences in this paper.
Topics mentioned in this paper:
Liu, Zhanyi and Wang, Haifeng and Wu, Hua and Li, Sheng
Abstract
As compared to baseline systems, we achieve absolute improvements of 2.40 BLEU score on a phrase-based SMT system and 1.76 BLEU score on a parsing-based SMT system.
Conclusion
When we also used phrase collocation probabilities as additional features, the phrase-based SMT performance is finally improved by 2.40 BLEU score as compared with the baseline system.
Experiments on Phrase-Based SMT
From the results in Table 4, it can be seen that the systems using the improved bidirectional alignments achieve higher translation quality than the baseline system.
Experiments on Phrase-Based SMT
Figure 3 shows an example: T1 is generated by the system where the phrase collocation probabilities are used and T2 is generated by the baseline system.
Experiments on Phrase-Based SMT
As compared with the baseline system, an absolute improvement of 2.40 BLEU score is achieved.
baseline system is mentioned in 5 sentences in this paper.
Topics mentioned in this paper:
Ma, Xuezhe and Xia, Fei
Experiments
Table 3 and Table 4 show the parsing results of our approach, together with the results of the baseline systems and the oracle, on version 1.0 and version 2.0 of the Google Universal Treebanks, respectively.
Experiments
Our approaches significantly outperform all the baseline systems across all the seven target languages.
Experiments
to those five baseline systems and the oracle (OR).
baseline system is mentioned in 5 sentences in this paper.
Topics mentioned in this paper:
Rudzicz, Frank
Abstract
Experiments compare this with two baseline systems, namely an acoustic hidden Markov model and a dynamic Bayes network augmented with discretized representations of the vocal tract.
Baseline systems
We examine two baseline systems.
Baseline systems
Figure 3: Baseline systems: (a) acoustic hidden Markov model and (b) articulatory dynamic Bayes network.
Experiments
For each of our baseline systems, we calculate the phoneme-error-rate (PER) and word-error-rate (WER) after training.
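Both metrics are token-level edit-distance rates; a minimal sketch using standard Levenshtein distance (not the paper's evaluation code):

```python
def edit_distance(hyp, ref):
    """Levenshtein distance over tokens; substitutions, insertions,
    and deletions all cost 1 (single-row dynamic program)."""
    d = list(range(len(ref) + 1))
    for i, h in enumerate(hyp, 1):
        prev, d[0] = d[0], i
        for j, r in enumerate(ref, 1):
            prev, d[j] = d[j], min(d[j] + 1, d[j - 1] + 1,
                                   prev + (h != r))
    return d[len(ref)]

def error_rate(hyp, ref):
    """WER when tokens are words; PER when tokens are phonemes."""
    return edit_distance(hyp, ref) / max(len(ref), 1)

print(error_rate("the cat sat".split(), "the cat sits".split()))  # 0.333...
```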
Experiments
Table 1: Phoneme- and Word-Error-Rate (PER and WER) for different parameterizations of the baseline systems.
baseline system is mentioned in 5 sentences in this paper.
Topics mentioned in this paper:
Feng, Minwei and Peter, Jan-Thorsten and Ney, Hermann
Abstract
Results on five Chinese-English NIST tasks show that our model improves the baseline system by 1.32 BLEU and 1.53 TER on average.
Conclusion
Experimental results show that our model is stable and improves the baseline system by 0.98 BLEU and 1.21 TER (trained by CRFs) and 1.32 BLEU and 1.53 TER (trained by RNN).
Conclusion
We also show that the proposed model is able to improve a very strong baseline system.
Experiments
The reordering model for the baseline system is the distance-based jump model which uses linear distance.
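A minimal sketch of that linear-distance jump cost (the standard distance-based distortion; names are illustrative):

```python
def distance_jump_cost(prev_end, cur_start):
    """Penalty equal to the jump width between the source phrase just
    translated (ending at prev_end) and the next one (starting at
    cur_start); zero for monotone translation."""
    return abs(cur_start - (prev_end + 1))

print(distance_jump_cost(prev_end=3, cur_start=4))  # monotone: 0
print(distance_jump_cost(prev_end=3, cur_start=7))  # forward jump: 3
```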
Experiments
The results show that our proposed idea improves the baseline system, and the RNN-trained model performs better than the CRF-trained model, in terms of both the automatic measure and the significance test.
baseline system is mentioned in 5 sentences in this paper.
Topics mentioned in this paper:
Sauper, Christina and Haghighi, Aria and Barzilay, Regina
Experiments
While MUC has a deficiency in that putting everything into a single cluster will artificially inflate the score, parameters on our model are set so that the model uses the same number of clusters as the baseline system.
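The single-cluster inflation follows directly from MUC recall (Vilain et al., 1995); a minimal sketch, assuming every key mention appears in some response chain:

```python
def muc_recall(key_chains, response_chains):
    """MUC recall: sum over key chains S of (|S| - |p(S)|) / (|S| - 1),
    where p(S) partitions S by its intersections with response chains."""
    num = den = 0
    for s in key_chains:
        partition = [r & s for r in response_chains if r & s]
        num += len(s) - len(partition)
        den += len(s) - 1
    return num / den if den else 0.0

key = [frozenset({1, 2, 3}), frozenset({4, 5})]
one_cluster = [frozenset({1, 2, 3, 4, 5})]  # everything merged together
print(muc_recall(key, one_cluster))          # 1.0 -- perfect, yet degenerate
```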
Experiments
While it would be possible to artificially inflate the score by putting everything into a single cluster, the parameters on our model and the likelihood objective are such that the model prefers to use all available clusters, the same number as the baseline system.
Experiments
While our system does suffer on precision in comparison to the baseline system, the recall gains far outweigh this loss, for a total error reduction of 20% on the MUC measure.
baseline system is mentioned in 5 sentences in this paper.
Topics mentioned in this paper:
Berg-Kirkpatrick, Taylor and Durrett, Greg and Klein, Dan
Conclusion
Our system achieves state-of-the-art results, significantly outperforming two state-of-the-art baseline systems.
Experiments
We evaluate the output of our system and the baseline systems using two metrics: character error rate (CER) and word error rate (WER).
Experiments
We compare with two baseline systems: Google’s open source OCR system, Tesseract, and a state-of-the-art commercial system, ABBYY FineReader.
Results and Analysis
This represents a substantial error reduction compared to both baseline systems.
Results and Analysis
The baseline systems do not have special provisions for the long s glyph.
baseline system is mentioned in 5 sentences in this paper.
Topics mentioned in this paper:
Razmara, Majid and Siahbani, Maryam and Haffari, Reza and Sarkar, Anoop
Collocational Lexicon Induction
2.1 Baseline System
Collocational Lexicon Induction
We reimplemented this collocational approach for finding translations for oovs and used it as a baseline system.
Experiments & Results 4.1 Experimental Setup
Table 6 reports the Bleu scores for different domains when the oov translations from the graph propagation are added to the phrase-table, and compares them with the baseline system (i.e.
Introduction
(2009) showed that this method improves over the baseline system where oovs are untranslated.
baseline system is mentioned in 4 sentences in this paper.
Topics mentioned in this paper:
Persing, Isaac and Ng, Vincent
Error Classification
Our Baseline system for error classification employs two types of features.
Evaluation
Our Baseline system, which only uses word n-gram and random indexing features, seems to perform uniformly poorly across both micro and macro F-scores (see row 1).
Evaluation
As we progressed, adding each new feature type to the baseline system, there was no definite and consistent pattern to how the precisions and recalls changed in order to produce the universal increases in the F-scores that we observed for each new system.
Evaluation
We see that the thesis clarity score predicting variation of the Baseline system, which employs as features only word n-grams and random indexing features, predicts the wrong score 65.8% of the time.
baseline system is mentioned in 4 sentences in this paper.
Topics mentioned in this paper:
Salloum, Wael and Elfardy, Heba and Alamir-Salloum, Linda and Habash, Nizar and Diab, Mona
MT System Selection
For baseline system selection, we use the classification decision of Elfardy and Diab (2013)’s sentence-level dialect identification system to decide on the target MT system.
MT System Selection
baseline systems.
MT System Selection
The first part of Table 2 repeats the best baseline system and the four-system oracle combination from Table 1 for convenience.
Machine Translation Experiments
In this section, we present our MT experimental setup and the four baseline systems we built, and we evaluate their performance and the potential of their combination.
baseline system is mentioned in 4 sentences in this paper.
Topics mentioned in this paper:
Xiao, Tong and Zhu, Jingbo and Zhang, Chunliang
A Skeleton-based Approach to MT 2.1 Skeleton Identification
For language modeling, lm is the standard n-gram language model adopted in the baseline system.
Evaluation
Row s-space of Table 1 shows the BLEU and TER results of restricting the baseline system to the space of skeleton-consistent derivations, i.e., we remove both the skeleton-based translation model and language model from the SBMT system.
Evaluation
We see that the limited search space is a little harmful to the baseline system.
Evaluation
Further, we regarded skeleton-consistent derivations as an indicator feature and introduced it into the baseline system.
baseline system is mentioned in 4 sentences in this paper.
Topics mentioned in this paper:
Cohn, Trevor and Haffari, Gholamreza
Abstract
Our experiments on Arabic, Urdu and Farsi to English demonstrate improvements over competitive baseline systems.
Analysis
Our experiments on Urdu-English, Arabic-English, and Farsi-English translation tasks all demonstrate improvements over competitive baseline systems.
Experiments
Our baseline system uses the latter.
Related Work
Our approach improves upon theirs in terms of the model and inference, and critically, this is borne out in our experiments, where we show uniform improvements in translation quality over a baseline system, as compared to their almost entirely negative results.
baseline system is mentioned in 4 sentences in this paper.
Topics mentioned in this paper:
He, Xiaodong and Deng, Li
Abstract
Build the baseline system, estimate {θ, λ}.
Abstract
the baseline system, compute BLEU(En, El).
Abstract
Other models used in the baseline system include lexicalized ordering model, word count and phrase count, and a 3-gram LM trained on the English side of the parallel training corpus.
baseline system is mentioned in 4 sentences in this paper.
Topics mentioned in this paper:
Toutanova, Kristina and Suzuki, Hisami and Ruopp, Achim
MT performance results
Propagating the uncertainty of the baseline system by using more input hypotheses consistently improves performance across the different methods, with an additional improvement of between .2 and .4 BLEU points.
MT performance results
In all scenarios, two human judges (native speakers of these languages) evaluated 100 sentences that had different translations by the baseline system and our model.
MT performance results
The judges were given the reference translations but not the source sentences, and were asked to classify each sentence pair into three categories: (1) the baseline system is better (score=-1), (2) the output of our model is better (score=1), or (3) they are of the same quality (score=0).
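A minimal sketch of aggregating these per-sentence judgments (purely illustrative):

```python
def mean_preference(scores):
    """Average of per-sentence judgments (-1: baseline better, 0: same
    quality, 1: our model better); positive means the model is preferred."""
    return sum(scores) / len(scores)

print(mean_preference([1, 0, -1, 1, 1]))  # 0.4
```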
baseline system is mentioned in 4 sentences in this paper.
Topics mentioned in this paper:
Wu, Xianchao and Matsuzaki, Takuya and Tsujii, Jun'ichi
Abstract
Extensive experiments involving large-scale English-to-Japanese translation revealed a significant improvement of 1.8 points in BLEU score, as compared with a strong forest-to-string baseline system .
Conclusion
Extensive experiments on large-scale English-to-Japanese translation resulted in a significant improvement in BLEU score of 1.8 points (p < 0.01), as compared with our implementation of a strong forest-to-string baseline system (Mi et al., 2008; Mi and Huang, 2008).
Experiments
We implemented the forest-to-string decoder described in (Mi et al., 2008) that makes use of forest-based translation rules (Mi and Huang, 2008) as the baseline system for translating English HPSG forests into Japanese sentences.
Experiments
Joshua V1.3 (Li et al., 2009), which is a freely available decoder for hierarchical phrase-based SMT (Chiang, 2005), is used as an external baseline system for comparison.
baseline system is mentioned in 4 sentences in this paper.
Topics mentioned in this paper:
Sun, Ang and Grishman, Ralph and Sekine, Satoshi
Abstract
When training on different sizes of data, our semi-supervised approach consistently outperformed a state-of-the-art supervised baseline system.
Experiments
Nonetheless, we believe our baseline system has achieved very competitive performance.
Feature Based Relation Extraction
We now describe a supervised baseline system with a very large set of features and its learning strategy.
Introduction
Section 4 describes in detail a state-of-the-art supervised baseline system.
baseline system is mentioned in 4 sentences in this paper.
Topics mentioned in this paper:
Yeniterzi, Reyyan and Oflazer, Kemal
Experimental Setup and Results
3.2.1 The Baseline Systems
Experimental Setup and Results
As a baseline system, we built a standard phrase-based system, using the surface forms of the words without any transformations, and with a 3-gram LM in the decoder.
Experimental Setup and Results
We also built a second baseline system with a factored model.
baseline system is mentioned in 4 sentences in this paper.
Topics mentioned in this paper:
Zhang, Hui and Zhang, Min and Li, Haizhou and Aw, Aiti and Tan, Chew Lim
Abstract
Experimental results on the NIST MT-2003 Chinese-English translation task show that our method statistically significantly outperforms the four baseline systems.
Conclusion
Experimental results show that our model greatly outperforms the four baseline systems.
Experiment
We use the first three syntax-based systems (TT2S, TTS2S, FT2S) and Moses (Koehn et al., 2007), the state-of-the-art phrase-based system, as our baseline systems.
Experiment
3) Our model statistically significantly outperforms all the baseline systems.
baseline system is mentioned in 4 sentences in this paper.
Topics mentioned in this paper:
Oh, Jong-Hoon and Torisawa, Kentaro and Hashimoto, Chikara and Sano, Motoki and De Saeger, Stijn and Ohtake, Kiyonori
Experiments
the result for our baseline system that recognizes a causal relation by simply taking the two phrases adjacent to a c-marker (i.e., before and after) as cause and effect parts of the causal relation.
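A minimal sketch of this adjacency baseline; the phrase segmentation and the example are hypothetical:

```python
def cmarker_baseline(phrases, marker_index):
    """Take the phrase immediately before the c-marker as the cause
    and the phrase immediately after it as the effect."""
    if 0 < marker_index < len(phrases) - 1:
        return phrases[marker_index - 1], phrases[marker_index + 1]
    return None

phrases = ["heavy rain fell", "because of that", "the river flooded"]
print(cmarker_baseline(phrases, 1))  # ('heavy rain fell', 'the river flooded')
```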
Experiments
From these results, we confirmed that our method recognized both intra- and inter-sentential causal relations with over 80% precision, and it significantly outperformed our baseline system in both precision and recall rates.
Experiments
In this experiment, we compared five systems: four baseline systems (MURATA, OURCF, OH and OH+PREVCF) and our proposed method (PROPOSED).
baseline system is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Ganchev, Kuzman and Graça, João V. and Taskar, Ben
Introduction
We propose a heuristic for tuning posterior decoding in the absence of annotated alignment data and show improvements over baseline systems for six different
Phrase-based machine translation
The baseline system uses GIZA model 4 alignments and the open source Moses phrase-based machine translation toolkit, and performed close to the best at the competition last year.
Phrase-based machine translation
We report BLEU scores using a script available with the baseline system.
baseline system is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Saha, Sujan Kumar and Mitra, Pabitra and Sarkar, Sudeshna
Conclusion
A significant enhancement in accuracy is observed over the baseline system, which uses word features.
Evaluation of NE Recognition
But in the baseline system, the addition of word features (wi-2 and wi+2) over the same feature decreases the f-value from 75.6 to 72.65.
Maximum Entropy Based Model for Hindi NER
The best accuracy (75.6 f-value) of the baseline system is obtained using the binary NomPSP feature along with the word feature (wi-1, wi+1), suffix and digit information.
baseline system is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Qian, Longhua and Hui, Haotian and Hu, Ya'nan and Zhou, Guodong and Zhu, Qiaoming
Abstract
Section 2 reviews the previous work on relation extraction, while Section 3 describes our baseline systems.
Abstract
3 Baseline Systems
Abstract
Particularly, SL-MO is used as the baseline system against which deficiency scores for other methods are computed.
baseline system is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Vickrey, David and Koller, Daphne
Experiments
For these arguments, we simply filled in using our baseline system (specifically, any non-core argument which did not overlap an argument predicted by our model was added to the labeling).
Experiments
achieving a statistically significant increase over the Baseline system (according to confidence intervals calculated for the CoNLL-2005 results).
Experiments
The Transforms model correctly labels the arguments of “buy”, while the Baseline system misses the ARG0.
baseline system is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Zhang, Min and Jiang, Hongfei and Aw, Aiti and Li, Haizhou and Tan, Chew Lim and Li, Sheng
Abstract
Experimental results on the NIST MT-2005 Chinese-English translation task show that our method statistically significantly outperforms the baseline systems.
Experiments
We set three baseline systems: Moses (Koehn et al., 2007), and SCFG-based and STSG-based tree-to-tree translation models (Zhang et al., 2007).
Experiments
In this subsection, we first report the rule distributions and compare our model with the three baseline systems .
baseline system is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Liu, Shujie and Yang, Nan and Li, Mu and Zhou, Ming
Experiments and Results
We compare our phrase pair embedding methods and our proposed R2NN with the baseline system in Table 2.
Experiments and Results
We can see that our R2NN models with WEPPE and TCBPPE are both better than the baseline system.
Introduction
We conduct experiments on a Chinese-to-English translation task to test our proposed methods, and we obtain an improvement of about 1.5 BLEU points compared with a state-of-the-art baseline system.
baseline system is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Matsubayashi, Yuichiroh and Okazaki, Naoaki and Tsujii, Jun'ichi
Conclusion
Since we used the latest release of FrameNet in order to use a greater number of hierarchical role-to-role relations, we could not make a direct comparison of performance with that of existing systems; however, we may say that the 89.00% F1 micro-average of our baseline system is roughly comparable to the 88.93% value of Bejan and Hathaway (2007) for SemEval-2007 (Baker et al., 2007).
Experiment and Discussion
The baseline system achieved 89.00% with respect to the micro-averaged F1.
Experiment and Discussion
Table 6 reports the precision, recall, and micro-averaged F1 scores of semantic roles with respect to each coreness type. In general, semantic roles of the core coreness were easily identified by all of the grouping criteria; even the baseline system obtained an F1 score of 91.93.
baseline system is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Stoyanov, Veselin and Gilbert, Nathan and Cardie, Claire and Riloff, Ellen
Coreference Subtask Analysis
3.2 Baseline System Results
Coreference Subtask Analysis
In all remaining experiments, we learn the threshold from the training set as in the BASELINE system.
Coreference Subtask Analysis
Comparison to the BASELINE system (box 2) shows that using gold standard NEs leads to improvements on all data sets with the exception of ACE2 and ACE05, on which performance is virtually unchanged.
baseline system is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Sun, Xu and Gao, Jianfeng and Micol, Daniel and Quirk, Chris
Abstract
Results show that the system using the phrase-based error model significantly outperforms its baseline systems.
Clickthrough Data and Spelling Correction
One possible reason is that our baseline system, which does not use any error model learned from the clickthrough data, is already able to correct these basic, obvious spelling mistakes.
Introduction
In particular, the speller system incorporating a phrase-based error model significantly outperforms its baseline systems.
baseline system is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Abu-Jbara, Amjad and Dasigi, Pradeep and Diab, Mona and Radev, Dragomir
Evaluation
First, we compare our system to baseline systems.
Evaluation
4.1 Comparison to Baseline Systems
Evaluation
Table 5: Comparison to baseline systems
baseline system is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Manshadi, Mehdi and Gildea, Daniel and Allen, James
Experiments
However, in order to have a fair comparison, we have used the output of the Stanford parser to automatically generate the same features that MA11 have hand-annotated. In order to run the baseline system on implicit universals, we take the feature vector of a plural NP and add a feature to indicate that this feature vector represents the implicit universal of the corresponding chunk.
Experiments
Once again, in order to have a fair comparison, we apply a similar modification to the baseline system.
Experiments
We also use the exact same classifier as used in MA11. Figure 5(a) compares the performance of our model, which we refer to as RPC-SVM-13, with the baseline system, but only on explicit NP chunks. The goal for running this experiment has been to compare the performance of our model to the baseline system, as described by Manshadi et al.
baseline system is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Li, Haibo and Zheng, Jing and Ji, Heng and Li, Qi and Wang, Wen
Experiments
For better comparison with NAMT, besides the original baseline, we develop another baseline system by adding the name translation table into the phrase table (NPhrase).
Experiments
We can see that except for the BOLT3 data set with BLEU metric, our NAMT approach consistently outperformed the baseline system for all data sets with all metrics, and provided up to 23.6% relative error reduction on name translation.
Experiments
In order to investigate the correlation between name-aware BLEU scores and human judgment results, we asked three bilingual speakers to judge our translation output from the baseline system and the NAMT system, on a Chinese subset of 250 sentences (each sentence has two corresponding translations from baseline and NAMT) extracted randomly from 7 test corpora.
baseline system is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Fader, Anthony and Zettlemoyer, Luke and Etzioni, Oren
Introduction
On known-answerable questions, the approach achieved 42% recall, with 77% precision, more than quadrupling the recall over a baseline system.
Introduction
• We evaluate PARALEX on the end-task of answering questions from WikiAnswers using a database of web extractions, and show that it outperforms baseline systems.
Results
PARALEX outperforms the baseline systems in terms of both F1 and MAP.
baseline system is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Clifton, Ann and Sarkar, Anoop
Experimental Results
For our word-based Baseline system, we trained a word-based model using the same Moses system with identical settings.
Experimental Results
For evaluation against segmented translation systems in segmented forms before word reconstruction, we also segmented the baseline system’s word-based output.
Experimental Results
So, we ran the word-based baseline system, the segmented model (Unsup L-match), and the prediction model (CRF-LM) outputs, along with the reference translation, through the supervised morphological analyzer Omorfi (Pirinen and Listenmaa, 2007).
baseline system is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Yang, Nan and Li, Mu and Zhang, Dongdong and Yu, Nenghai
Experiments
5.3.1 Baseline System
Experiments
We use a BTG phrase-based system with a MaxEnt-based lexicalized reordering model (Wu, 1997; Xiong et al., 2006) as our baseline system for
Experiments
From Table 2, we can see our ranking reordering model significantly improves the performance for both English-to-Japanese and Japanese-to-English experiments over the BTG baseline system.
baseline system is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Vaswani, Ashish and Huang, Liang and Chiang, David
Experiments
We found that adding word classes improved alignment quality a little, but more so for the baseline system (see Table 3).
Experiments
Table 3: Adding word classes improves the F-score in both directions for Arabic-English alignment by a little, for the baseline system more so than ours.
Experiments
In particular, the baseline system demonstrates typical “garbage collection” behavior (Moore, 2004) in all four examples.
baseline system is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Konstas, Ioannis and Lapata, Mirella
Experimental Design
Since the addition of these features essentially incurs reranking, it follows that the systems would exhibit the exact same performance as the baseline system with 1-best lists.
Introduction
The performance of this baseline system could be potentially further improved using discriminative reranking (Collins, 2000).
Introduction
baseline system.
baseline system is mentioned in 3 sentences in this paper.
Topics mentioned in this paper: