Sorani Kurdish versus Kurmanji Kurdish: An Empirical Comparison
Sheykh Esmaili, Kyumars and Salavati, Shahin

Article Structure

Abstract

Resource scarcity and diversity, in both dialect and script, are the two primary challenges in Kurdish language processing.

Introduction

Despite having 20 to 30 million native speakers (Haig and Matras, 2002; Hassanpour et al., 2012; Thackston, 2006b; Thackston, 2006a), Kurdish is among the less-resourced languages for which the only linguistic resource available on the Web is raw text (Walther and Sagot, 2010).

The Kurdish Language and Dialects

Kurdish belongs to the Indo-Iranian family of Indo-European languages.

The Pewan Corpus

Text corpora are essential to Computational Linguistics and Natural Language Processing.

Empirical Study

In the first part of this section, we look at the character and word frequencies to gain some insight into the phonological and lexical correlations and discrepancies between Sorani and Kurmanji.

Conclusions and Future Work

In this paper we took the first steps towards addressing the two main challenges in Kurdish language processing, namely, resource scarcity and diversity.

Topics

Linear Regression

Appears in 4 sentences as: Linear Regression (2) linear regression (2)
In Sorani Kurdish versus Kurmanji Kurdish: An Empirical Comparison
  1. Table 2: Heaps’ Linear Regression
    Page 4, “Empirical Study”
  2. As the curves in Figure 4 and the linear regression coefficients in Table 2 show, the growth rate of distinct words in both Sorani and Kurmanji Kurdish is higher than in Persian and English.
    Page 4, “Empirical Study”
  3. Table 3: Zipf’s Linear Regression
    Page 5, “Empirical Study”
  4. The results of our experiment (plotted curves in Figure 5 and linear regression coefficients in Table 3) show that: (i) the distribution of the most frequent words in Sorani is distinctive; it first shows a sharper drop in the top 10 words and then a slower drop for the words ranked between 10 and 100, and (ii) in the remaining parts of the curves, both Kurmanji and Sorani behave similarly; this is also reflected in their values of coefficient 2 (1.33 and 1.31).
    Page 5, “Empirical Study”
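
The sentences above refer to linear regression coefficients for Heaps' law and Zipf's law. As a rough illustration of how such coefficients are commonly obtained, the sketch below fits a straight line in log-log space: log V = log k + beta * log N for vocabulary growth (Heaps), and log f = log c - alpha * log r for rank-frequency (Zipf). The function names, the symbols k, beta, c, and alpha, and the `step` parameter are assumptions made for this example; the excerpt does not state the paper's exact parameterization or fitting procedure, so this is not necessarily the authors' method.

```python
import math
from collections import Counter


def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x.

    Needs at least two distinct x values.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def heaps_fit(tokens, step=1):
    """Fit log V = log k + beta * log N; returns (k, beta).

    N is the number of tokens read so far, V the number of distinct
    words among them. A sample point is taken every `step` tokens
    (hypothetical parameter, used here only to thin the samples).
    """
    seen = set()
    log_n, log_v = [], []
    for i, tok in enumerate(tokens, start=1):
        seen.add(tok)
        if i % step == 0:
            log_n.append(math.log(i))
            log_v.append(math.log(len(seen)))
    intercept, beta = fit_line(log_n, log_v)
    return math.exp(intercept), beta


def zipf_fit(tokens):
    """Fit log f = log c - alpha * log r; returns (c, alpha).

    r is the frequency rank of a word (1 = most frequent) and f is
    its count in the token list.
    """
    freqs = sorted(Counter(tokens).values(), reverse=True)
    log_r = [math.log(r) for r in range(1, len(freqs) + 1)]
    log_f = [math.log(f) for f in freqs]
    intercept, slope = fit_line(log_r, log_f)
    return math.exp(intercept), -slope
```

Under this parameterization, a larger beta corresponds to faster growth in the number of distinct words, which is the kind of comparison sentence 2 draws between the Kurdish dialects and Persian and English. Fitting only a sub-range of ranks (for example, ranks above 100) would require slicing the sorted frequency list before the regression.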

See all papers in Proc. ACL 2013 that mention Linear Regression.

