Data: MWEs in Dependency Trees | MWEs are first classified as regular or irregular, using regular expressions over the sequence of parts-of-speech within the MWE. |
Data: MWEs in Dependency Trees | To define the regular expressions, we grouped gold MWEs according to the pair [global POS of the MWE + sequence of POS of the MWE components], and designed regular expressions to match the most frequent patterns that looked regular according to our linguistic knowledge. |
Data: MWEs in Dependency Trees | The six regular expressions that we obtained cover nominal, prepositional, adverbial and verbal compounds. |
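The regular/irregular split described in these rows can be sketched as follows. This is a minimal illustration only: the tag names and the patterns in `REGULAR_PATTERNS` are hypothetical and are not the six expressions from the paper. Each MWE is assumed to be represented by its global POS plus the list of its components' POS tags.

```python
import re

# Hypothetical regexes over space-separated component tags, keyed by the
# global POS of the MWE. Illustrative only, NOT the paper's six patterns.
REGULAR_PATTERNS = {
    "N": re.compile(r"N( A)+|A N|N P N"),   # e.g. noun + adjective(s)
    "P": re.compile(r"P( D)? N"),           # e.g. prep (+ det) + noun
    "ADV": re.compile(r"ADV( ADV)*"),       # e.g. adverb sequence
}

def classify_mwe(global_pos, component_tags):
    """Return 'regular' if the whole tag sequence matches the pattern
    registered for the MWE's global POS, otherwise 'irregular'."""
    pattern = REGULAR_PATTERNS.get(global_pos)
    tag_string = " ".join(component_tags)
    if pattern is not None and pattern.fullmatch(tag_string):
        return "regular"
    return "irregular"
```

`fullmatch` is used so that a pattern has to account for every component of the MWE, not just a prefix of its tag sequence.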
Methodology | To obtain lexical patterns, we can define regular expressions over POS tags and apply them to POS-tagged texts. |
Methodology | Algorithm 1: New word detection algorithm. Input: D: a large set of POS-tagged posts; W_s: a set of seed words; k_p: the number of patterns chosen at each iteration; k_c: the number of patterns in the candidate pattern set; k_w: the number of words added at each iteration; K: the number of words returned. Output: a list of ranked new words W. 1 Obtain all lexical patterns using regular expressions on D; 2 Count the frequency of each lexical pattern and extract words matched by each pattern; 3 Obtain the top k_c frequent patterns as the candidate pattern set P_c and the top 5,000 frequent words as the candidate word set W_c; |
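The first three steps of the algorithm can be sketched roughly as below. Several things here are assumptions rather than details from the paper: a lexical pattern is represented as a (left word, wildcard, right word) triple, the tag regex `TAG_REGEX` (adverb, any tag, auxiliary) and the tag names `AD` and `AU` are hypothetical, and each post is a list of `(word, tag)` pairs.

```python
import re
from collections import Counter, defaultdict

# Hypothetical tag regex: adverb, any single tag, auxiliary.
TAG_REGEX = re.compile(r"AD \S+ AU")
WINDOW = 3  # number of tokens covered by TAG_REGEX

def extract_patterns(tagged_posts):
    """Steps 1-2: find lexical patterns, count them, and record the
    words each pattern matches in its wildcard slot."""
    pattern_counts = Counter()
    pattern_words = defaultdict(Counter)
    for post in tagged_posts:
        for i in range(len(post) - WINDOW + 1):
            window = post[i:i + WINDOW]
            tag_string = " ".join(tag for _, tag in window)
            if TAG_REGEX.fullmatch(tag_string):
                left, middle, right = (word for word, _ in window)
                pattern = (left, "*", right)
                pattern_counts[pattern] += 1
                pattern_words[pattern][middle] += 1
    return pattern_counts, pattern_words

def top_candidates(pattern_counts, kc):
    """Step 3 (patterns only): keep the kc most frequent patterns."""
    return [pattern for pattern, _ in pattern_counts.most_common(kc)]
```

Sliding a fixed-size window and matching its tag string keeps the mapping between a regex match and the underlying words trivial; a regex with variable-length matches would need explicit offset bookkeeping instead.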
Related Work | For example, Justeson and Katz (1995) extracted technical terminology from documents using a regular expression. |
Abstract | This paper introduces a new technique for phrase-structure parser analysis, categorizing possible treebank structures by integrating regular expressions into derivation trees. |
Framework for analyzing parsing performance | 2.1 Use of regular expressions |
Introduction | Second, we use a set of regular expressions (henceforth “regexes”) that categorize the possible structures in the treebank. |
Methods | We define a word using the following regular expression: |
Methods | Equation 1 may need to be modified for other languages or segmentation schemes, but our techniques generalize to any definition that can be written as a regular expression. |
Methods | A chain is valid if it emits the beginning of a word as defined by the regular expression in Equation 1. |