Introduction | The standard process for pattern-based relation extraction is to start with hand-selected patterns or word pairs expressing a particular relationship, and iteratively scan the corpus for co-appearances of word pairs in patterns and for patterns that contain known word pairs. |
Introduction | We also propose a way to label each cluster by word pairs that represent it best. |
Pattern Clustering Algorithm | (Alfonseca et al., 2006) for extracting general relations starting from given seed word pairs. |
Pattern Clustering Algorithm | Unlike most previous work, our hook words are not provided in advance but selected randomly; the goal in those papers is to discover relationships between given word pairs, while we use hook words in order to discover relationships that generally occur in the corpus. |
Pattern Clustering Algorithm | To label pattern clusters we define a HITS measure that reflects the affinity of a given word pair to a given cluster. |
Related Work | Several recent papers discovered relations on the web using seed patterns (Pantel et al., 2004), rules (Etzioni et al., 2004), and word pairs (Pasca et al., 2006; Alfonseca et al., 2006). |
SAT-based Evaluation | We addressed the evaluation questions above using a SAT-like analogy test automatically generated from word pairs captured by our clusters (see below in this section). |
SAT-based Evaluation | The header of the question is a word pair that is one of the label pairs of the cluster. |
SAT-based Evaluation | In our sample there were no word pairs assigned as labels to more than one cluster. |
Experimental Setup | To enrich the set of given word pairs and patterns as described in Section 4.1 and to perform clarifying queries, we utilize the Yahoo API for web queries. |
Experimental Setup | If only a few links were found for a given word pair, we perform local crawling to depth 3 in an attempt to discover more instances. |
Introduction | The standard classification process is to find in an auxiliary corpus a set of patterns in which a given training word pair co-appears, and use pattern-word pair co-appearance statistics as features for machine learning algorithms. |
Relationship Classification | Co-appearance of nominal pairs can be very rare (in fact, some word pairs in the Task 4 set co-appear only once in Yahoo web search). |
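The pattern-word pair co-appearance statistics mentioned in the Introduction can be illustrated with a minimal sketch. It assumes a very simple notion of pattern (the words between the two members of a pair, with the pair replaced by the slots X and Y) and a toy in-memory corpus; the function name `pattern_features` and the `max_gap` parameter are hypothetical and are not taken from the paper, whose actual pattern definition may differ.

```python
import re
from collections import Counter


def pattern_features(pair, sentences, max_gap=4):
    """Count the infix patterns in which the two words of `pair` co-appear.

    Assumption: a pattern is the token sequence between the first and the
    second word of the pair, with at most `max_gap` tokens in between.
    """
    w1, w2 = pair
    counts = Counter()
    for sentence in sentences:
        tokens = re.findall(r"\w+", sentence.lower())
        for i, tok in enumerate(tokens):
            if tok != w1:
                continue
            # Look ahead for the second word within the allowed gap.
            for j in range(i + 1, min(i + 2 + max_gap, len(tokens))):
                if tokens[j] == w2:
                    infix = " ".join(tokens[i + 1:j])
                    counts[f"X {infix} Y" if infix else "X Y"] += 1
    return counts


# Toy corpus, invented for illustration only.
corpus = [
    "A dog is a kind of animal.",
    "The dog is an animal that barks.",
    "Every dog is a kind of animal.",
]
features = pattern_features(("dog", "animal"), corpus)
```

The resulting counter maps each pattern to its co-appearance count for the pair and could serve as a sparse feature vector for a classifier. With rare pairs, such as those noted above that co-appear only once in web search, the vector would be nearly empty.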