Is a 204 cm Man Tall or Small ? Acquisition of Numerical Common Sense from the Web
Narisawa, Katsuma and Watanabe, Yotaro and Mizuno, Junta and Okazaki, Naoaki and Inui, Kentaro

Article Structure

Abstract

This paper presents novel methods for modeling numerical common sense: the ability to infer whether a given number (e.g., three billion) is large, small, or normal for a given context (e.g., number of people facing a water shortage).

Introduction

Textual entailment recognition (RTE) involves a wide range of semantic inferences to determine whether the meaning of a hypothesis sentence (H) can be inferred from another text (T) (Dagan et al., 2006).

Related work

Surprisingly, NLP research has paid little attention to semantic processing of numerical expressions.

Topics

semantic representation

Appears in 9 sentences as: Semantic representation (1) semantic representation (6) semantic representations (2)
In Is a 204 cm Man Tall or Small ? Acquisition of Numerical Common Sense from the Web
  1. We describe a method of normalizing numerical expressions referring to the same amount in text into a unified semantic representation.
    Page 1, “Introduction”
  2. For instance, the context of 319 people in the sentence 319 people face a water shortage is “face” and “water shortage.” In order to extract and aggregate numerical expressions in various documents, we converted the numerical expressions into semantic representations (to be described in Section 4.1), and extracted their context (to be described in Section 4.2).
    Page 3, “Related work”
  3. Numerical Expression | Semantic representation: Value | Unit | Mod.
    Page 4, “Related work”
  4. The first step for collecting numerical expressions is to recognize when a numerical expression is mentioned and then to normalize it into a semantic representation.
    Page 4, “Related work”
  5. The semantic representation of a numerical expression consists of three fields: the value or range of the real number(s), the unit (a string), and the optional modifiers.
    Page 4, “Related work”
  6. Table 2 shows some examples of numerical expressions and their semantic representations.
    Page 4, “Related work”
  7. If the words that precede or follow an extracted number match an entry in the dictionary, we change the semantic representation as described in the operation.
    Page 5, “Related work”
  8. The component in Section 4.1 recognizes $300 as a numerical expression, then normalizes it into a semantic representation.
    Page 5, “Related work”
  9. Both approaches start with collecting the numbers (in semantic representation) and contexts of numerical expressions from a large number of sentences (Shinzato et al., 2012), and storing them
    Page 5, “Related work”
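
The three-field representation described in sentence 5 (value or range, unit, optional modifiers) can be sketched as a small data structure. This is only an illustration under stated assumptions: the class name SemanticRepresentation, the function normalize, the two toy patterns, and the unit string "dollar" are hypothetical and are not taken from the paper, whose normalizer handles Japanese text and a much richer dictionary.

```python
import re
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class SemanticRepresentation:
    """Hypothetical container for the three fields named in the text."""
    value: Tuple[float, float]        # (low, high); low == high for a single number
    unit: str                         # unit as a plain string
    modifiers: Tuple[str, ...] = ()   # optional modifiers, empty by default


# Toy patterns (assumptions, English only): "$300" and "<number> <word>".
_DOLLAR = re.compile(r"^\$(\d+(?:\.\d+)?)$")
_NUM_UNIT = re.compile(r"^(\d+(?:\.\d+)?)\s+(\w+)$")


def normalize(expr: str) -> Optional[SemanticRepresentation]:
    """Map a numerical expression to the three-field form, or None if unrecognized."""
    expr = expr.strip()
    m = _DOLLAR.match(expr)
    if m:
        v = float(m.group(1))
        return SemanticRepresentation((v, v), "dollar")
    m = _NUM_UNIT.match(expr)
    if m:
        v = float(m.group(1))
        return SemanticRepresentation((v, v), m.group(2))
    return None


if __name__ == "__main__":
    print(normalize("$300"))
    print(normalize("319 people"))
```

Storing the value as a (low, high) pair lets a single number and a range share one field, which matches the "value or range" wording; the paper may well encode ranges differently.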

gold-standard

Appears in 8 sentences as: gold-standard (8)
  1. In order to prepare a gold-standard data set, we obtained 1,041 sentences by randomly sampling about 1% of the sentences containing numbers (Arabic digits and/or Chinese numerical characters) in a Japanese Web corpus (100 million pages) (Shinzato et al., 2012).
    Page 6, “Related work”
  2. recall using the gold-standard data set.
    Page 7, “Related work”
  3. We built a gold-standard data set for numerical common sense.
    Page 7, “Related work”
  4. All fields (value, unit, modifier) of the extracted tuple must match the gold-standard data set.
    Page 7, “Related work”
  5. We measured the correctness of the proposed methods on the gold-standard data.
    Page 7, “Related work”
  6. With the strict criterion, the method must predict a label identical to that in the gold-standard.
    Page 7, “Related work”
  7. With the lenient criterion, the method was also allowed to predict either large/small or normal when the gold-standard label was relatively large/small.
    Page 7, “Related work”
  8. The clue-based approach tends to predict small even if the gold-standard label is normal.
    Page 8, “Related work”
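
The strict and lenient criteria in sentences 6 and 7 can be sketched as a small scoring function. This is a minimal illustration, assuming the label strings below and an accuracy defined as the fraction of correct predictions; the names LENIENT_ALTERNATIVES, is_correct, and accuracy are hypothetical, and the paper's own evaluation script is not shown in the source.

```python
from typing import Iterable, Tuple

# Under the lenient criterion, a "relatively large/small" gold label also
# accepts the plain large/small label or "normal" (per sentence 7).
LENIENT_ALTERNATIVES = {
    "relatively large": {"large", "normal"},
    "relatively small": {"small", "normal"},
}


def is_correct(predicted: str, gold: str, lenient: bool = False) -> bool:
    """Return True if the prediction counts as correct under the chosen criterion."""
    if predicted == gold:
        return True  # an identical label is correct under both criteria
    if lenient:
        return predicted in LENIENT_ALTERNATIVES.get(gold, set())
    return False


def accuracy(pairs: Iterable[Tuple[str, str]], lenient: bool = False) -> float:
    """Fraction of (predicted, gold) pairs judged correct; 0.0 for empty input."""
    pairs = list(pairs)
    if not pairs:
        return 0.0
    return sum(is_correct(p, g, lenient) for p, g in pairs) / len(pairs)


if __name__ == "__main__":
    sample = [
        ("large", "relatively large"),   # correct only under lenient
        ("small", "normal"),             # wrong under both
        ("normal", "normal"),            # correct under both
        ("small", "relatively large"),   # wrong under both
    ]
    print(accuracy(sample), accuracy(sample, lenient=True))
```

The second sample pair mirrors the error described in sentence 8, where small is predicted although the gold-standard label is normal; it is counted as wrong under either criterion.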

Human judges

Appears in 3 sentences as: Human judges (1) human judges (1) humans’ judgments (1)
  1. We utilize large and small modifiers (described in Section 4.1), which correspond to textual clues mo (as many as, as large as) and shika (only, as few as), respectively, for detecting humans’ judgments.
    Page 6, “Related work”
  2. We asked three human judges to annotate every numerical expression with one of six labels, small, relatively small, normal, relatively large, large, and unsure.
    Page 7, “Related work”
  3. The cause of this error is exemplified by the sentence, “there are two reasons.” Human judges label the numerical expression two reasons as normal, but the method predicts small.
    Page 8, “Related work”
