Learning to Extract International Relations from Political Context
O'Connor, Brendan and Stewart, Brandon M. and Smith, Noah A.

Article Structure

Abstract

We describe a new probabilistic model for extracting events between major political actors from news corpora.

Introduction

The digitization of large news corpora has provided an unparalleled opportunity for the systematic study of international relations.

Data

The model we describe in §3 is learned from a corpus of 6.5 million newswire articles from the English Gigaword 4th edition (1994–2008; Parker et al., 2009).

Model

We design two models to learn linguistic event classes over predicate paths by conditioning on real-world contextual information about international politics, p(w_predpath | s, r, t), leveraging the fact that there tends to be dyadic and temporal coherence in international relations: the types of actions that are likely to occur between nations tend to be similar within the same dyad, and their distribution usually changes smoothly over time.
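This conditional structure can be sketched numerically. Everything below is illustrative: eta, theta, and phi are toy stand-ins for the paper's latent variables (in the actual models eta is inferred and temporally smoothed), and the dimensions K and V are arbitrary.

```python
import math
import random

rng = random.Random(0)

K, V = 5, 100   # number of frames, number of predicate-path types (arbitrary)

# Toy context weights eta for one (source, receiver, time) cell; in the
# paper these are latent and smoothed over time.
eta = [rng.gauss(0, 1) for _ in range(K)]

# theta: frame distribution for this dyad-time cell (softmax of eta).
m = max(eta)
exp_eta = [math.exp(e - m) for e in eta]
theta = [e / sum(exp_eta) for e in exp_eta]

# phi[k]: per-frame distribution over predicate-path types, sampled here
# from a symmetric Dirichlet via normalized Gamma draws.
def dirichlet(alpha, size, rng):
    draws = [rng.gammavariate(alpha, 1.0) for _ in range(size)]
    s = sum(draws)
    return [d / s for d in draws]

phi = [dirichlet(0.5, V, rng) for _ in range(K)]

# p(w | s, r, t) marginalizes the latent frame z:
#   p(w | s, r, t) = sum_k theta[k] * phi[k][w]
p_w = [sum(theta[k] * phi[k][w] for k in range(K)) for w in range(V)]
```

Because theta is tied to the (source, receiver, time) cell, dyads with similar contexts share similar frame mixtures, which is what yields the dyadic and temporal coherence described above.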

Inference

After randomly initializing all η_{k,s,r,t}, inference is performed by a blocked Gibbs sampler, alternating resamplings for three major groups of variables: the language model (z, φ), the context model (α, γ, β, p), and the η, θ variables, which bottleneck between the submodels.
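The alternation itself can be sketched as a loop. The three resampling functions below are hypothetical stand-ins: each would resample its block conditioned on the current values of the others, but here they only perturb a toy state so the control flow is runnable.

```python
import random

def resample_language_model(state, rng):
    # Resample each token's frame indicator z (phi collapsed out).
    state["z"] = [rng.randrange(state["K"]) for _ in state["z"]]

def resample_context_model(state, rng):
    # Resample the context-model parameters.
    state["context"] = [x + rng.gauss(0, 0.1) for x in state["context"]]

def resample_eta_theta(state, rng):
    # Resample the eta/theta variables bottlenecking the submodels.
    state["eta"] = [x + rng.gauss(0, 0.1) for x in state["eta"]]

def blocked_gibbs(state, iters, seed=0):
    rng = random.Random(seed)
    for _ in range(iters):
        resample_language_model(state, rng)
        resample_context_model(state, rng)
        resample_eta_theta(state, rng)
    return state

state = blocked_gibbs({"K": 5, "z": [0] * 10, "context": [0.0] * 3,
                       "eta": [0.0] * 5}, iters=100)
```

The point of the blocked structure is that each group is resampled jointly given the others, so the eta/theta block mediates all communication between the language and context submodels.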

Experiments

We fit the two models on the dataset described in §2, varying the number of frames K, with 8 or more separate runs for each setting.

Related Work

6.1 Events Data in Political Science

Conclusion

Large-scale information extraction can dramatically enhance the study of political behavior.

Topics

topic model

Appears in 7 sentences as: topic model (4) topic models (2) topic model’s (2)
In Learning to Extract International Relations from Political Context
  1. Our unsupervised model brings together familiar components in natural language processing (like parsers and topic models) with contextual political information—temporal and dyad dependence—to infer latent event classes.
    Page 1, “Abstract”
  2. We use syntactic preprocessing and a logistic normal topic model, including latent temporal smoothing on the political context prior.
    Page 1, “Introduction”
  3. Thus the language model is very similar to a topic model’s generation of token topics and wordtypes.
    Page 3, “Model”
  4. This simple logistic normal prior is, in terms of topic models, analogous to the asymmetric Dirichlet prior version of LDA in Wallach et al.
    Page 3, “Model”
  5. This is an example of individual lexical features outperforming a topic model for a predictive task, because the topic model’s dimension reduction obscures important indicators from individual words.
    Page 6, “Experiments”
  6. Similarly, Gerrish and Blei (2011) found that word-based regression outperformed a customized topic model when predicting Congressional bill passage, and Eisen-
    Page 6, “Experiments”
  7. In the latter, a problem-specific topic model did best.
    Page 7, “Experiments”

See all papers in Proc. ACL 2013 that mention topic model.

See all papers in Proc. ACL that mention topic model.

dependency path

Appears in 6 sentences as: dependency path (6) dependency paths (2)
In Learning to Extract International Relations from Political Context
  1. where s and r denote “source” and “receiver” arguments, which are political actor entities in a predefined set S, t is a timestep (i.e., a 7-day period) derived from the article’s published date, and w_predpath is a textual predicate expressed as a dependency path that typically includes a verb (we use the terms “predicate-path” and “verb-path” interchangeably).
    Page 2, “Data”
  2. Verb paths are identified by looking at the shortest dependency path between two mentions in a sentence.
    Page 2, “Data”
  3. For each frame k, draw a multinomial distribution of dependency paths, φ_k ~ Dir(b/V) (where V is the number of dependency-path types).
    Page 3, “Model”
  4. The vanilla model is capable of inducing frames through dependency path co-occurrences, when multiple events occur in a given context.
    Page 3, “Model”
  5. Many of our dependency paths, when traversed from the source to receiver direction, also follow surface order, due to English’s SVO word order. Therefore we convert each path to a word sequence and match against the TABARI lexicon—plus a few modifications for differences in infinitives and stemming—and find 528 dependency path matches.
    Page 5, “Experiments”
  6. We also create a baseline ℓ1-regularized logistic regression that uses normalized dependency path counts as the features (10,457 features).
    Page 6, “Experiments”
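The shortest-path extraction described in sentence 2 above can be sketched as a breadth-first search over the dependency tree treated as an undirected graph. This is an illustrative simplification: it returns only token indices, whereas a real predicate-path extractor would also record edge labels and traversal directions.

```python
from collections import deque

def shortest_dep_path(heads, src, dst):
    """heads[i] = index of token i's head (-1 for the root).
    Returns the token indices on the shortest path from src to dst."""
    adj = {i: set() for i in range(len(heads))}
    for i, h in enumerate(heads):
        if h >= 0:
            adj[i].add(h)
            adj[h].add(i)
    prev = {src: None}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            path = []
            while node is not None:
                path.append(node)
                node = prev[node]
            return path[::-1]
        for nxt in adj[node]:
            if nxt not in prev:
                prev[nxt] = node
                queue.append(nxt)
    return None

# Toy parse: "Israel attacked Gaza" -> both arguments attach to the verb.
heads = [1, -1, 1]
path = shortest_dep_path(heads, 0, 2)
```

On the toy parse the path between the two entity mentions passes through the verb at index 1, which is why such paths typically include a verb.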

Gibbs sampling

Appears in 5 sentences as: Gibbs sampler (1) Gibbs samples (1) Gibbs samples’ (1) Gibbs sampling (2)
In Learning to Extract International Relations from Political Context
  1. After randomly initializing all η_{k,s,r,t}, inference is performed by a blocked Gibbs sampler, alternating resamplings for three major groups of variables: the language model (z, φ), the context model (α, γ, β, p), and the η, θ variables, which bottleneck between the submodels.
    Page 4, “Inference”
  2. find that experimenting with different models is easier in the Gibbs sampling framework.
    Page 4, “Inference”
  3. While Gibbs sampling for logistic normal priors is possible using auxiliary variable methods (Mimno et al., 2008; Holmes and Held, 2006; Polson et al., 2012), it can be slow to converge.
    Page 4, “Inference”
  4. Posteriors are saved and averaged from 11 Gibbs samples (every 100 iterations from 9,000 to 10,000) for analysis.
    Page 4, “Experiments”
  5. where n refers to the averaged Gibbs samples’ counts of event tuples having frame k and a particular verb path, and N is the number of token comparisons (i.e.
    Page 5, “Experiments”

gold-standard

Appears in 4 sentences as: gold-standard (4)
In Learning to Extract International Relations from Political Context
  1. “Express intent to deescalate military engagement”), we elect to measure model quality as lexical scale parity: whether all the predicate paths within one automatically learned frame tend to have similar gold-standard scale scores.
    Page 5, “Experiments”
  2. (This measures cluster cohesiveness against a one-dimensional continuous scale, instead of measuring cluster cohesiveness against a gold-standard clustering as in VI, Rand index, or purity.)
    Page 5, “Experiments”
  3. We assign each path w a gold-standard scale g(w) by resolving through its matching pattern’s CAMEO code.
    Page 5, “Experiments”
  4. Figure 2: The USA→Iraq directed dyad, analyzed by smoothed (above) and vanilla (below) models, showing (1) gold-standard MID values (red intervals along top), (2) weeks with nonzero event counts (vertical lines along x-axis), (3) posterior E[θ_{k,USA,IRQ,t}] inferences for two frames chosen from two different K = 5 models, and (4) most common verb paths for each frame (right).
    Page 6, “Experiments”
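The lexical scale parity idea in sentence 1 above can be sketched as a count-weighted variance of gold-standard scale scores within one induced frame: low variance means the frame's predicate paths sit near the same point of the scale. This is an illustrative simplification, not the paper's exact measure, and the paths, counts, and scale values below are made up.

```python
def weighted_scale_variance(paths, counts, gold_scale):
    # Count-weighted variance of the gold scale scores g(w) for the
    # predicate paths assigned to one frame.
    total = sum(counts[p] for p in paths)
    mean = sum(counts[p] * gold_scale[p] for p in paths) / total
    return sum(counts[p] * (gold_scale[p] - mean) ** 2 for p in paths) / total

gold = {"attack": -10.0, "bomb": -9.5, "meet_with": 3.0}   # hypothetical g(w)
counts = {"attack": 5, "bomb": 3, "meet_with": 4}

cohesive = weighted_scale_variance(["attack", "bomb"], counts, gold)
mixed = weighted_scale_variance(["attack", "meet_with"], counts, gold)
```

A frame grouping two hostile paths scores far lower variance than one mixing hostile and cooperative paths, which is the cohesiveness property the measure rewards.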

language model

Appears in 4 sentences as: language model (3) Language model: (1)
In Learning to Extract International Relations from Political Context
  1. Language model:
    Page 3, “Model”
  2. Thus the language model is very similar to a topic model’s generation of token topics and wordtypes.
    Page 3, “Model”
  3. After randomly initializing all η_{k,s,r,t}, inference is performed by a blocked Gibbs sampler, alternating resamplings for three major groups of variables: the language model (z, φ), the context model (α, γ, β, p), and the η, θ variables, which bottleneck between the submodels.
    Page 4, “Inference”
  4. The language model sampler sequentially updates every z^(i) (and implicitly φ via collapsing) in the manner of Griffiths and Steyvers (2004): p(z^(i) | θ, w^(i), b) ∝ θ_{s,r,t,z} (n_{w,z} + b/V) / (n_z + b), where counts n are for all event tuples besides i.
    Page 4, “Inference”
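The collapsed update in sentence 4 above can be sketched directly from the proportionality: score each frame k by theta[k] * (n_wz[k][w] + b/V) / (n_z[k] + b) and sample proportionally. The theta and counts below are toy values; in the real sampler theta comes from the context model and the counts exclude the tuple currently being resampled.

```python
import random

def resample_z(theta, n_wz, n_z, w, b, V, rng):
    # Unnormalized per-frame weights from the collapsed conditional.
    weights = [theta[k] * (n_wz[k][w] + b / V) / (n_z[k] + b)
               for k in range(len(theta))]
    # Draw a frame index proportional to the weights.
    u = rng.random() * sum(weights)
    acc = 0.0
    for k, wk in enumerate(weights):
        acc += wk
        if u <= acc:
            return k
    return len(weights) - 1

rng = random.Random(0)
theta = [0.05, 0.90, 0.05]           # toy frame distribution for this dyad-time
n_wz = [[0, 0], [50, 0], [0, 0]]     # frame 1 dominates predicate path 0
n_z = [10, 60, 10]
draws = [resample_z(theta, n_wz, n_z, w=0, b=0.5, V=2, rng=rng)
         for _ in range(200)]
```

With both the context (theta) and the collapsed counts favoring frame 1, nearly all draws land there, showing how the update balances contextual and lexical evidence.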

logistic regression

Appears in 4 sentences as: logistic regression (4)
In Learning to Extract International Relations from Political Context
  1. Deduplication removes 8.5% of articles. For topic filtering, we apply a series of keyword filters to remove sports and finance news, and also apply a text classifier for diplomatic and military news, trained on several hundred manually labeled news articles (using ℓ1-regularized logistic regression with unigram and bigram features).
    Page 2, “Data”
  2. We also create a baseline ℓ1-regularized logistic regression that uses normalized dependency path counts as the features (10,457 features).
    Page 6, “Experiments”
  3. The verb-path logistic regression performs strongly at AUC 0.62; it outperforms all of the vanilla frame models.
    Page 6, “Experiments”
  4. Green line is the verb-path logistic regression baseline.
    Page 7, “Experiments”
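A minimal version of such an ℓ1-regularized baseline can be sketched in pure Python with subgradient descent. In practice a library such as scikit-learn would be used; the toy data and hyperparameters here are illustrative, and real feature values would be normalized dependency-path counts.

```python
import math

def train_l1_logreg(X, y, lam=0.01, lr=0.1, iters=500):
    # Batch subgradient descent on logistic loss + lam * ||w||_1.
    d = len(X[0])
    w = [0.0] * d
    for _ in range(iters):
        grad = [0.0] * d
        for xi, yi in zip(X, y):
            p = 1.0 / (1.0 + math.exp(-sum(wj * xj for wj, xj in zip(w, xi))))
            for j in range(d):
                grad[j] += (p - yi) * xi[j]
        for j in range(d):
            # Subgradient of the l1 penalty: sign(w_j), 0 at w_j = 0.
            sub = 1.0 if w[j] > 0 else -1.0 if w[j] < 0 else 0.0
            w[j] -= lr * (grad[j] / len(X) + lam * sub)
    return w

def predict(w, x):
    return 1.0 / (1.0 + math.exp(-sum(wj * xj for wj, xj in zip(w, x))))

# Toy data: feature 0 is predictive of the label, feature 1 is noise.
X = [[1, 0], [1, 1], [0, 1], [0, 0]]
y = [1, 1, 0, 0]
w = train_l1_logreg(X, y)
```

The ℓ1 penalty drives uninformative weights toward zero, which is why such baselines surface individual high-signal verb paths that a topic model's dimension reduction can obscure.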

natural language

Appears in 4 sentences as: Natural Language (1) natural language (3)
In Learning to Extract International Relations from Political Context
  1. Our unsupervised model brings together familiar components in natural language processing (like parsers and topic models) with contextual political information—temporal and dyad dependence—to infer latent event classes.
    Page 1, “Abstract”
  2. (This highlights how better natural language processing could help the model, and the dangers of false positives for this type of data analysis, especially in small-sample drilldowns.)
    Page 8, “Experiments”
  3. 6.2 Events in Natural Language Processing
    Page 9, “Related Work”
  4. Political event extraction from news has also received considerable attention within natural language processing in part due to government-funded challenges such as MUC-3 and MUC-4 (Lehnert, 1994), which focused on the extraction of terrorist events, as well as the more recent ACE program.
    Page 9, “Related Work”

rule-based

Appears in 3 sentences as: rule-based (3)
In Learning to Extract International Relations from Political Context
  1. First, we compare the automatically learned verb classes to a preexisting ontology and handcrafted verb patterns from TABARI, an open-source and widely used rule-based event extraction system for this domain.
    Page 1, “Introduction”
  2. Beginning in the mid-1980s, political scientists began experimenting with automated rule-based extraction systems (Schrodt and Gerner, 1994).
    Page 8, “Related Work”
  3. These efforts culminated in the open-source program, TABARI, which uses pattern matching from extensive hand-developed phrase dictionaries, combined with basic part-of-speech tagging (Schrodt, 2001); a rough analogue in the information extraction literature might be the rule-based, finite-state FASTUS system for MUC IE (Hobbs et al., 1997), though TABARI is restricted to single sentence analysis.
    Page 8, “Related Work”
