MAXSIM: A Maximum Similarity Metric for Machine Translation Evaluation
Chan, Yee Seng and Ng, Hwee Tou

Article Structure

Abstract

We propose an automatic machine translation (MT) evaluation metric that calculates a similarity score (based on precision and recall) of a pair of sentences.

Introduction

In recent years, machine translation (MT) research has made much progress, which includes the introduction of automatic metrics for MT evaluation.

Automatic Evaluation Metrics

In this section, we describe BLEU, and the three metrics which achieved higher correlation results than BLEU in the recent ACL-07 MT workshop.

Metric Design Considerations

We first review some aspects of existing metrics and highlight issues that should be considered when designing an MT evaluation metric.

Topics

unigram

Appears in 24 sentences as: unigram (23) unigrams (15)
In MAXSIM: A Maximum Similarity Metric for Machine Translation Evaluation
  1. Then, unigram matching is performed on the remaining words that are not matched using paraphrases.
    Page 3, “Automatic Evaluation Metrics”
  2. Based on the matches, ParaEval will then elect to use either unigram precision or unigram recall as its score for the sentence pair.
    Page 3, “Automatic Evaluation Metrics”
  3. Based on the number of word or unigram matches and the amount of string fragmentation represented by the alignment, METEOR calculates a score for the pair of strings.
    Page 3, “Automatic Evaluation Metrics”
  4. In aligning the unigrams, each unigram in one string is mapped, or linked, to at most one unigram in the other string.
    Page 3, “Automatic Evaluation Metrics”
  5. These word alignments are created incrementally through a series of stages, where each stage only adds alignments between unigrams which have not been matched in previous stages.
    Page 3, “Automatic Evaluation Metrics”
  6. If there is a tie, then the alignment with the least number of unigram mapping crosses is selected.
    Page 3, “Automatic Evaluation Metrics”
  7. The “exact” stage maps unigrams if they have the same surface form.
    Page 3, “Automatic Evaluation Metrics”
  8. The “porter stem” stage then considers the remaining unmapped unigrams and maps them if they are the same after applying the Porter stemmer.
    Page 3, “Automatic Evaluation Metrics”
  9. Finally, the “WN synonymy” stage considers all remaining unigrams and maps two unigrams if they are synonyms in the WordNet sense inventory (Miller, 1990).
    Page 3, “Automatic Evaluation Metrics”
  10. Once the final alignment has been produced, unigram precision P (number of unigram matches m divided by the total number of system unigrams) and unigram recall R (m divided by the total number of reference unigrams) are calculated and combined into a single parameterized harmonic mean (Rijsbergen, 1979):
    Page 3, “Automatic Evaluation Metrics”
  11. To account for longer matches and the amount of fragmentation represented by the alignment, METEOR groups the matched unigrams into as few chunks as possible and imposes a penalty based on the number of chunks.
    Page 3, “Automatic Evaluation Metrics”
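
To make the scoring in item 10 concrete, here is a minimal Python sketch of METEOR-style scoring from unigram counts. It assumes the van Rijsbergen parameterized harmonic mean F = P*R / (alpha*P + (1-alpha)*R) and a fragmentation penalty of the form gamma * (chunks/matches)**beta as in item 11; the default parameter values below are illustrative only and are not taken from these excerpts.

```python
# Minimal sketch of METEOR-style scoring from unigram matches (assumptions:
# parameterized harmonic mean F = P*R / (alpha*P + (1-alpha)*R) and
# fragmentation penalty gamma * (chunks / matches) ** beta; the defaults
# below are illustrative, not the paper's values).
def meteor_style_score(matches, sys_unigrams, ref_unigrams, chunks,
                       alpha=0.9, beta=3.0, gamma=0.5):
    if matches == 0:
        return 0.0
    p = matches / sys_unigrams            # unigram precision
    r = matches / ref_unigrams            # unigram recall
    fmean = p * r / (alpha * p + (1 - alpha) * r)
    penalty = gamma * (chunks / matches) ** beta
    return fmean * (1 - penalty)

# 7 matched unigrams grouped into 3 chunks, 10 system and 9 reference unigrams
print(meteor_style_score(matches=7, sys_unigrams=10, ref_unigrams=9, chunks=3))
```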

BLEU

Appears in 20 sentences as: BLEU (23)
  1. Among all the automatic MT evaluation metrics, BLEU (Papineni et al., 2002) is the most widely used.
    Page 1, “Introduction”
  2. Although BLEU has played a crucial role in the progress of MT research, it is becoming evident that BLEU does not correlate with human judgement
    Page 1, “Introduction”
  3. The results show that, as compared to BLEU, several recently proposed metrics such as Semantic-role overlap (Gimenez and Marquez, 2007), ParaEval-recall (Zhou et al., 2006), and METEOR (Banerjee and Lavie, 2005) achieve higher correlation.
    Page 1, “Introduction”
  4. Current metrics (such as BLEU, METEOR, Semantic-role overlap, ParaEval-recall, etc.)
    Page 2, “Introduction”
  5. In contrast, most other metrics (notably BLEU) limit themselves to matching based only on the surface form of words.
    Page 2, “Introduction”
  6. In this section, we describe BLEU, and the three metrics which achieved higher correlation results than BLEU in the recent ACL-07 MT workshop.
    Page 2, “Automatic Evaluation Metrics”
  7. 2.1 BLEU
    Page 2, “Automatic Evaluation Metrics”
  8. BLEU (Papineni et al., 2002) is essentially a precision-based metric and is currently the standard metric for automatic evaluation of MT performance.
    Page 2, “Automatic Evaluation Metrics”
  9. To score a system translation, BLEU tabulates the number of n-gram matches of the system translation against one or more reference translations.
    Page 2, “Automatic Evaluation Metrics”
  10. Generally, more n-gram matches result in a higher BLEU score.
    Page 2, “Automatic Evaluation Metrics”
  11. When determining the matches to calculate precision, BLEU uses a modified, or clipped n-gram precision.
    Page 2, “Automatic Evaluation Metrics”
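
As a concrete illustration of items 9 to 11, the sketch below computes BLEU's clipped (modified) n-gram precision for a single order n: each candidate n-gram is credited at most as many times as it appears in any single reference, so repeating a word cannot inflate the score. Full BLEU additionally combines several n-gram orders and applies a brevity penalty, which this sketch omits.

```python
from collections import Counter

def clipped_ngram_precision(candidate, references, n):
    """Clipped n-gram precision of a tokenized candidate against tokenized references."""
    def ngram_counts(tokens):
        return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

    cand_counts = ngram_counts(candidate)
    max_ref_counts = Counter()
    for ref in references:
        for gram, count in ngram_counts(ref).items():
            max_ref_counts[gram] = max(max_ref_counts[gram], count)

    # clip each candidate count at the maximum count seen in any single reference
    clipped = sum(min(count, max_ref_counts[gram]) for gram, count in cand_counts.items())
    total = sum(cand_counts.values())
    return clipped / total if total else 0.0

# "the" appears three times in the candidate but only once in the reference,
# so only one of the three occurrences is counted: precision = 2/4 = 0.5
print(clipped_ngram_precision("the the the cat".split(), ["the cat sat".split()], n=1))
```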

bigrams

Appears in 16 sentences as: bigram (8) bigrams (16)
  1. Similarly, we also match the bigrams and trigrams of the sentence pair and calculate their corresponding Fmean scores.
    Page 4, “Metric Design Considerations”
  2. where in our experiments, we set N = 3, representing calculation of unigram, bigram, and trigram scores.
    Page 4, “Metric Design Considerations”
  3. To determine the number $match_{bi}$ of bigram matches, a system bigram $(l_{s_i} p_{s_i}, l_{s_{i+1}} p_{s_{i+1}})$ is matched against a reference bigram $(l_{r_i} p_{r_i}, l_{r_{i+1}} p_{r_{i+1}})$.
    Page 4, “Metric Design Considerations”
  4. In the case of bigrams, the matching conditions are $l_{s_i} = l_{r_i}$, $p_{s_i} = p_{r_i}$, $l_{s_{i+1}} = l_{r_{i+1}}$, and $p_{s_{i+1}} = p_{r_{i+1}}$.
    Page 4, “Metric Design Considerations”
  5. We add the number of unigram, bigram, and trigram matches found during this phase to $match_{uni}$, $match_{bi}$, and $match_{tri}$ respectively.
    Page 4, “Metric Design Considerations”
  6. each for the remaining set of unigrams, bigrams, and trigrams.
    Page 5, “Metric Design Considerations”
  7. Using bigrams to illustrate, we construct a weighted complete bipartite graph, where each edge $e$ connecting a pair of system-reference bigrams has a weight $w(e)$, indicating the degree of similarity between the bigrams connected.
    Page 5, “Metric Design Considerations”
  8. Note that, without loss of generality, if the number of system nodes and reference nodes (bigrams) are not the same, we can simply add dummy nodes with connecting edges of weight 0 to obtain a complete bipartite graph with equal number of nodes on both sides.
    Page 5, “Metric Design Considerations”
  9. Further, if we are comparing bigrams or trigrams, we impose an additional condition: $s_i \neq 0$, for $1 \leq i \leq n$, else we will set $w(e) = 0$.
    Page 5, “Metric Design Considerations”
  10. In the top half of Figure 1, we show an example of a complete bipartite graph, constructed for a set of three system bigrams ($s_1$, $s_2$, $s_3$) and three reference bigrams ($r_1$, $r_2$, $r_3$), and the weight of the connecting edge between two bigrams represents their degree of similarity.
    Page 5, “Metric Design Considerations”
  11. Next, we aim to find a maximum weight matching (or alignment) between the bigrams such that each system (reference) bigram is connected to exactly one reference (system) bigram.
    Page 5, “Metric Design Considerations”
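
The maximum weight matching step in items 7 to 11 can be sketched with an off-the-shelf Hungarian (Kuhn-Munkres) solver. The similarity weights below are made-up values purely for illustration, and scipy.optimize.linear_sum_assignment is used here as a stand-in for a Kuhn-Munkres implementation; unequal sides would be padded with zero-weight dummy nodes as described in item 8.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

# Rows are system bigrams s1..s3, columns are reference bigrams r1..r3.
# The weights are illustrative similarity values, not numbers from the paper.
weights = np.array([
    [0.9, 0.1, 0.0],
    [0.2, 0.5, 0.0],
    [0.0, 0.3, 0.8],
])

# If the two sides had different sizes, we would first pad the matrix with
# zero-weight dummy rows or columns to make it square.
rows, cols = linear_sum_assignment(weights, maximize=True)
for r, c in zip(rows, cols):
    print(f"system bigram s{r + 1} <-> reference bigram r{c + 1}, weight {weights[r, c]}")
print("total matching weight:", weights[rows, cols].sum())
```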

n-gram

Appears in 14 sentences as: N-gram (1) n-gram (16)
  1. number of n-gram matches of the system translation against one or more reference translations.
    Page 2, “Automatic Evaluation Metrics”
  2. Generally, more n-gram matches result in a higher BLEU score.
    Page 2, “Automatic Evaluation Metrics”
  3. When determining the matches to calculate precision, BLEU uses a modified, or clipped n-gram precision.
    Page 2, “Automatic Evaluation Metrics”
  4. With this, an n-gram (from both the system and reference translation) is considered to be exhausted or used after participating in a match.
    Page 2, “Automatic Evaluation Metrics”
  5. 4.1 Using N-gram Information
    Page 4, “Metric Design Considerations”
  6. Lemma and POS match: Representing each n-gram by its sequence of lemma and POS-tag pairs, we first try to perform an exact match in both lemma and POS-tag.
    Page 4, “Metric Design Considerations”
  7. In all our n-gram matching, each n-gram in the system translation can only match at most one n-gram in the reference translation.
    Page 4, “Metric Design Considerations”
  8. In an n-gram bipartite graph, the similarity score, or weight $w(e)$, is assigned to the edge $e$ connecting a system n-gram $(l_{s_1} p_{s_1}, \ldots, l_{s_n} p_{s_n})$ and a reference n-gram $(l_{r_1} p_{r_1}, \ldots, l_{r_n} p_{r_n})$.
    Page 5, “Metric Design Considerations”
  9. This captures the intuition that in matching a system n-gram against a reference n-gram, where n > 1, we require each system token to have at least some degree of similarity with the corresponding reference token.
    Page 5, “Metric Design Considerations”
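
A hedged sketch of the edge weight just described: the weight averages per-token similarities and is forced to zero whenever some token pair has no similarity at all, which is the condition in item 9 of the bigrams section and the intuition in item 9 above. The particular per-token similarity used here (half credit for a lemma match, half for a POS-tag match) is an illustrative assumption, not the paper's exact definition.

```python
def edge_weight(sys_ngram, ref_ngram):
    """Weight w(e) between two n-grams given as equal-length sequences of (lemma, pos) pairs."""
    sims = []
    for (sys_lemma, sys_pos), (ref_lemma, ref_pos) in zip(sys_ngram, ref_ngram):
        # per-token similarity s_i (assumed form: half lemma credit, half POS credit)
        s_i = 0.5 * (sys_lemma == ref_lemma) + 0.5 * (sys_pos == ref_pos)
        if s_i == 0:           # one token pair has no similarity at all,
            return 0.0         # so the whole edge gets weight zero
        sims.append(s_i)
    return sum(sims) / len(sims)

# both lemma and POS agree on the first token, only the POS tag agrees on the
# second token: weight = (1.0 + 0.5) / 2
print(edge_weight([("dog", "NN"), ("run", "VB")], [("dog", "NN"), ("walk", "VB")]))
```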

sentence pair

Appears in 14 sentences as: sentence pair (14) sentence pairs: (1)
  1. To match each system item to at most one reference item, we model the items in the sentence pair as nodes in a bipartite graph and use the Kuhn-Munkres algorithm (Kuhn, 1955; Munkres, 1957) to find a maximum weight matching (or alignment) between the items in polynomial time.
    Page 1, “Introduction”
  2. Also, metrics such as METEOR determine an alignment between the items of a sentence pair by using heuristics such as the least number of matching crosses.
    Page 2, “Introduction”
  3. Also, this framework allows for defining arbitrary similarity functions between two matching items, and we could match arbitrary concepts (such as dependency relations) gathered from a sentence pair.
    Page 2, “Introduction”
  4. A uniform average of the counts is then taken as the score for the sentence pair.
    Page 2, “Automatic Evaluation Metrics”
  5. Based on the matches, ParaEval will then elect to use either unigram precision or unigram recall as its score for the sentence pair.
    Page 3, “Automatic Evaluation Metrics”
  6. Similarly, we also match the bigrams and trigrams of the sentence pair and calculate their corresponding Fmean scores.
    Page 4, “Metric Design Considerations”
  7. To obtain a single similarity score $score_s$ for this sentence pair $s$, we simply average the three Fmean scores.
    Page 4, “Metric Design Considerations”
  8. Then, to obtain a single similarity score Sim-score for the entire system corpus, we repeat this process of calculating a $score_s$ for each system-reference sentence pair $s$, and compute the average over all $|S|$ sentence pairs.
    Page 4, “Metric Design Considerations”
  9. In this subsection, we describe in detail how we match the n-grams of a system-reference sentence pair.
    Page 4, “Metric Design Considerations”
  10. In the previous subsection, we describe our method of using bipartite graphs for matching of n-grams found in a sentence pair.
    Page 5, “Metric Design Considerations”
  11. This use of bipartite graphs, however, is a very general framework to obtain an optimal alignment of the corresponding “information items” contained within a sentence pair.
    Page 5, “Metric Design Considerations”
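
Items 7 and 8 describe two levels of averaging: a sentence score is the uniform average of the unigram, bigram, and trigram Fmean scores (N = 3), and the corpus-level Sim-score is the average of the sentence scores over all |S| system-reference sentence pairs. A minimal sketch with made-up Fmean values:

```python
def sentence_score(fmean_scores):
    """Uniform average of the per-order Fmean scores, e.g. [fmean_uni, fmean_bi, fmean_tri]."""
    return sum(fmean_scores) / len(fmean_scores)

def corpus_sim_score(fmean_scores_per_sentence):
    """Average the sentence scores over all |S| system-reference sentence pairs."""
    sentence_scores = [sentence_score(f) for f in fmean_scores_per_sentence]
    return sum(sentence_scores) / len(sentence_scores)

# two sentence pairs with illustrative unigram/bigram/trigram Fmean scores
print(corpus_sim_score([[0.8, 0.6, 0.4], [0.7, 0.5, 0.3]]))
```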

evaluation metrics

Appears in 11 sentences as: evaluation metric (5) evaluation metrics (6)
  1. We propose an automatic machine translation (MT) evaluation metric that calculates a similarity score (based on precision and recall) of a pair of sentences.
    Page 1, “Abstract”
  2. When evaluated on data from the ACL-07 MT workshop, our proposed metric achieves higher correlation with human judgements than all 11 automatic MT evaluation metrics that were evaluated during the workshop.
    Page 1, “Abstract”
  3. Since human evaluation of MT output is time consuming and expensive, having a robust and accurate automatic MT evaluation metric that correlates well with human judgement is invaluable.
    Page 1, “Introduction”
  4. Among all the automatic MT evaluation metrics, BLEU (Papineni et al., 2002) is the most widely used.
    Page 1, “Introduction”
  5. During the recent ACL-07 workshop on statistical MT (Callison-Burch et al., 2007), a total of 11 automatic MT evaluation metrics were evaluated for correlation with human judgement.
    Page 1, “Introduction”
  6. In this paper, we propose a new automatic MT evaluation metric, MAXSIM, that compares a pair of system-reference sentences by extracting n-grams and dependency relations.
    Page 1, “Introduction”
  7. Finally, when evaluated on the datasets of the recent ACL-07 MT workshop (Callison-Burch et al., 2007), our proposed metric achieves higher correlation with human judgements than all of the 11 automatic MT evaluation metrics evaluated during the workshop.
    Page 2, “Introduction”
  8. We first review some aspects of existing metrics and highlight issues that should be considered when designing an MT evaluation metric.
    Page 3, “Metric Design Considerations”
  9. The ACL-07 MT workshop evaluated the translation quality of MT systems on various translation tasks, and also measured the correlation (with human judgement) of 11 automatic MT evaluation metrics.
    Page 6, “Metric Design Considerations”
  10. In this paper, we present MAXSIM, a new automatic MT evaluation metric that computes a similarity score between corresponding items across a sentence pair, and uses a bipartite graph to obtain an optimal matching between item pairs.
    Page 8, “Metric Design Considerations”
  11. When evaluated for correlation with human judgements, MAXSIM achieves superior results when compared to current automatic MT evaluation metrics.
    Page 8, “Metric Design Considerations”

dependency relations

Appears in 10 sentences as: Dependency Relations (1) dependency relations (10)
  1. This general framework allows us to use arbitrary similarity functions between items, and to incorporate different information in our comparison, such as n-grams, dependency relations, etc.
    Page 1, “Abstract”
  2. In this paper, we propose a new automatic MT evaluation metric, MAXSIM, that compares a pair of system-reference sentences by extracting n-grams and dependency relations.
    Page 1, “Introduction”
  3. Recognizing that different concepts can be expressed in a variety of ways, we allow matching across synonyms and also compute a score between two matching items (such as between two n-grams or between two dependency relations), which indicates their degree of similarity with each other.
    Page 1, “Introduction”
  4. Also, this framework allows for defining arbitrary similarity functions between two matching items, and we could match arbitrary concepts (such as dependency relations) gathered from a sentence pair.
    Page 2, “Introduction”
  5. Hence, using information such as synonyms or dependency relations could potentially address the issue better.
    Page 3, “Metric Design Considerations”
  6. 4.2 Dependency Relations
    Page 5, “Metric Design Considerations”
  7. Hence, besides matching based on n-gram strings, we can also match other “information items”, such as dependency relations.
    Page 5, “Metric Design Considerations”
  8. In our work, we train the MSTParser (McDonald et al., 2005) on the Penn Treebank Wall Street Journal (WSJ) corpus, and use it to extract dependency relations from a sentence.
    Page 6, “Metric Design Considerations”
  9. To compute the similarity score when incorporating dependency relations, we average the Fmean scores for unigrams, bigrams, trigrams, and dependency relations.
    Page 6, “Metric Design Considerations”
  10. Also, we have seen that dependency relations help to improve correlation on the NIST dataset, but not on the ACL-07 MT workshop datasets.
    Page 8, “Metric Design Considerations”
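
Item 9 says that, when dependency relations are included, the sentence score becomes the uniform average of four Fmean components. The sketch below assumes that each relation is represented as a (modifier lemma, relation label, head lemma) triple, that triples are matched exactly, and that Fmean is the balanced harmonic mean of precision and recall; none of these details are spelled out in the excerpts, so they are illustrative assumptions only.

```python
def dependency_fmean(sys_relations, ref_relations):
    """Fmean over exactly-matched dependency triples (illustrative simplification)."""
    matches = len(set(sys_relations) & set(ref_relations))
    if matches == 0:
        return 0.0
    precision = matches / len(sys_relations)
    recall = matches / len(ref_relations)
    return 2 * precision * recall / (precision + recall)

sys_rels = [("dog", "nsubj", "barked"), ("the", "det", "dog")]
ref_rels = [("dog", "nsubj", "barked"), ("a", "det", "dog")]
f_dep = dependency_fmean(sys_rels, ref_rels)

# uniform average of unigram, bigram, trigram (made-up values) and dependency Fmean
score = sum([0.8, 0.6, 0.4, f_dep]) / 4
print(f_dep, score)
```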

human judgements

Appears in 10 sentences as: human judgement (4) human judgements (5) human judges (1)
  1. When evaluated on data from the ACL-07 MT workshop, our proposed metric achieves higher correlation with human judgements than all 11 automatic MT evaluation metrics that were evaluated during the workshop.
    Page 1, “Abstract”
  2. Since human evaluation of MT output is time consuming and expensive, having a robust and accurate automatic MT evaluation metric that correlates well with human judgement is invaluable.
    Page 1, “Introduction”
  3. Although BLEU has played a crucial role in the progress of MT research, it is becoming evident that BLEU does not correlate with human judgement
    Page 1, “Introduction”
  4. During the recent ACL-07 workshop on statistical MT (Callison-Burch et al., 2007), a total of 11 automatic MT evaluation metrics were evaluated for correlation with human judgement.
    Page 1, “Introduction”
  5. Finally, when evaluated on the datasets of the recent ACL-07 MT workshop (Callison-Burch et al., 2007), our proposed metric achieves higher correlation with human judgements than all of the 11 automatic MT evaluation metrics evaluated during the workshop.
    Page 2, “Introduction”
  6. In the ACL-07 MT workshop, ParaEval based on recall (ParaEval-recall) achieves good correlation with human judgements.
    Page 3, “Automatic Evaluation Metrics”
  7. The ACL-07 MT workshop evaluated the translation quality of MT systems on various translation tasks, and also measured the correlation (with human judgement) of 11 automatic MT evaluation metrics.
    Page 6, “Metric Design Considerations”
  8. For human evaluation of the MT submissions, four different criteria were used in the workshop: Adequacy (how much of the original meaning is expressed in a system translation), Fluency (the translation’s fluency), Rank (different translations of a single source sentence are compared and ranked from best to worst), and Constituent (some constituents from the parse tree of the source sentence are translated, and human judges have to rank these translations).
    Page 7, “Metric Design Considerations”
  9. For this dataset, human judgements are available on adequacy and fluency for six system submissions, and there are four English reference translation texts.
    Page 7, “Metric Design Considerations”
  10. When evaluated for correlation with human judgements, MAXSIM achieves superior results when compared to current automatic MT evaluation metrics.
    Page 8, “Metric Design Considerations”
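
Measuring how well an automatic metric agrees with human judgements, as in items 4 and 7, amounts to correlating one score per system from the metric with the corresponding human scores. The sketch below uses Spearman rank correlation over invented numbers; the excerpts do not state which correlation coefficient the workshop used, so that choice is an assumption.

```python
from scipy.stats import spearmanr

# one metric score and one averaged human adequacy score per MT system (made-up numbers)
metric_scores  = [0.42, 0.35, 0.51, 0.47, 0.30, 0.44]
human_adequacy = [3.1, 2.8, 3.6, 3.4, 2.5, 3.2]

rho, p_value = spearmanr(metric_scores, human_adequacy)
print(f"Spearman correlation: {rho:.3f} (p = {p_value:.3f})")
```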

n-grams

Appears in 9 sentences as: n-grams (9)
  1. This general framework allows us to use arbitrary similarity functions between items, and to incorporate different information in our comparison, such as n-grams, dependency relations, etc.
    Page 1, “Abstract”
  2. In this paper, we propose a new automatic MT evaluation metric, MAXSIM, that compares a pair of system-reference sentences by extracting n-grams and dependency relations.
    Page 1, “Introduction”
  3. Recognizing that different concepts can be expressed in a variety of ways, we allow matching across synonyms and also compute a score between two matching items (such as between two n-grams or between two dependency relations), which indicates their degree of similarity with each other.
    Page 1, “Introduction”
  4. We note, however, that matches between items (such as words, n-grams, etc.)
    Page 3, “Metric Design Considerations”
  5. In this subsection, we describe in detail how we match the n-grams of a system-reference sentence pair.
    Page 4, “Metric Design Considerations”
  6. Lemma match: For the remaining set of n-grams that are not yet matched, we now relax our matching criteria by allowing a match if their corresponding lemmas match.
    Page 4, “Metric Design Considerations”
  7. Bipartite graph matching: For the remaining n-grams that are not matched so far, we try to match them by constructing bipartite graphs.
    Page 4, “Metric Design Considerations”
  8. In the previous subsection, we describe our method of using bipartite graphs for matching of n-grams found in a sentence pair.
    Page 5, “Metric Design Considerations”
  9. Note that so far, the parameters of MAXSIM are not optimized and we simply perform uniform averaging of the different n-grams and dependency scores.
    Page 8, “Metric Design Considerations”
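
Items 6 and 7 describe a staged procedure: exact lemma-and-POS matches are taken first, then lemma-only matches, and whatever is still unmatched goes to the bipartite-graph stage. A simplified sketch follows, in which each system n-gram matches at most one reference n-gram; the greedy left-to-right pairing within each stage is an illustrative simplification, not the paper's procedure.

```python
def staged_match(sys_ngrams, ref_ngrams):
    """Match n-grams (sequences of (lemma, pos) pairs) in two relaxation stages."""
    matched, unused_refs = [], list(ref_ngrams)

    def key_exact(ngram):                       # stage 1: lemma and POS must both agree
        return tuple(ngram)

    def key_lemma(ngram):                       # stage 2: only the lemmas must agree
        return tuple(lemma for lemma, _pos in ngram)

    remaining = list(sys_ngrams)
    for key in (key_exact, key_lemma):
        still_unmatched = []
        for s in remaining:
            hit = next((r for r in unused_refs if key(r) == key(s)), None)
            if hit is not None:
                matched.append((s, hit))
                unused_refs.remove(hit)         # each reference n-gram is used at most once
            else:
                still_unmatched.append(s)
        remaining = still_unmatched
    return matched, remaining                   # `remaining` goes on to the bipartite stage

pairs, leftover = staged_match(
    [[("dog", "NN")], [("run", "VBD")]],
    [[("dog", "NN")], [("run", "VB")]],
)
print(pairs, leftover)
```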

similarity score

Appears in 8 sentences as: similarity score (8)
  1. We propose an automatic machine translation (MT) evaluation metric that calculates a similarity score (based on precision and recall) of a pair of sentences.
    Page 1, “Abstract”
  2. Unlike most metrics, we compute a similarity score between items across the two sentences.
    Page 1, “Abstract”
  3. The weights (from the edges) of the resulting graph will then be added to determine the final similarity score between the pair of sentences.
    Page 1, “Introduction”
  4. To obtain a single similarity score $score_s$ for this sentence pair $s$, we simply average the three Fmean scores.
    Page 4, “Metric Design Considerations”
  5. Then, to obtain a single similarity score Sim-score for the entire system corpus, we repeat this process of calculating a $score_s$ for each system-reference sentence pair $s$, and compute the average over all $|S|$ sentence pairs.
    Page 4, “Metric Design Considerations”
  6. In an n-gram bipartite graph, the similarity score, or weight $w(e)$, is assigned to the edge $e$ connecting a system n-gram and a reference n-gram.
    Page 5, “Metric Design Considerations”
  7. To compute the similarity score when incorporating dependency relations, we average the Fmean scores for unigrams, bigrams, trigrams, and dependency relations.
    Page 6, “Metric Design Considerations”
  8. In this paper, we present MAXSIM, a new automatic MT evaluation metric that computes a similarity score between corresponding items across a sentence pair, and uses a bipartite graph to obtain an optimal matching between item pairs.
    Page 8, “Metric Design Considerations”

NIST

Appears in 7 sentences as: NIST (7)
  1. To evaluate our metric, we conduct experiments on datasets from the ACL-07 MT workshop and NIST
    Page 6, “Metric Design Considerations”
  2. Table 4: Correlations on the NIST MT 2003 dataset.
    Page 7, “Metric Design Considerations”
  3. 5.2 NIST MT 2003 Dataset
    Page 7, “Metric Design Considerations”
  4. We also conduct experiments on the test data (LDC2006T04) of NIST MT 2003 Chinese-English translation task.
    Page 7, “Metric Design Considerations”
  5. In the recent work of (Lavie and Agarwal, 2007), the values of these parameters were tuned to be (α=0.81, β=0.83, γ=0.28), based on experiments on the NIST 2003 and 2004 Arabic-English evaluation datasets.
    Page 7, “Metric Design Considerations”
  6. We found that by setting α=0.7, MAXSIM_{n+d} could achieve a correlation of 0.972 on the NIST MT 2003 dataset.
    Page 8, “Metric Design Considerations”
  7. Also, we have seen that dependency relations help to improve correlation on the NIST dataset, but not on the ACL-07 MT workshop datasets.
    Page 8, “Metric Design Considerations”

semantic roles

Appears in 6 sentences as: semantic role (2) Semantic Roles (1) semantic roles (4)
  1. 2.2 Semantic Roles
    Page 2, “Automatic Evaluation Metrics”
  2. This metric first counts the number of lexical overlaps SR-O_r for all the different semantic roles r that are found in the system and reference translation sentence.
    Page 2, “Automatic Evaluation Metrics”
  3. In their work, the different semantic roles r they considered include the various core and adjunct arguments as defined in the PropBank project (Palmer et al., 2005).
    Page 2, “Automatic Evaluation Metrics”
  4. To extract semantic roles from a sentence, several processes such as lemmatization, part-of-speech tagging, base phrase chunking, named entity tagging, and finally semantic role tagging need to be performed.
    Page 2, “Automatic Evaluation Metrics”
  5. Besides matching a pair of system-reference sentences based on the surface form of words, previous work such as (Gimenez and Marquez, 2007) and (Rajman and Hartley, 2002) had shown that deeper linguistic knowledge such as semantic roles and syntax can be usefully exploited.
    Page 5, “Metric Design Considerations”
  6. Possible future directions include adding semantic role information, using the distance between item pairs based on the token position within each sentence as additional weighting consideration, etc.
    Page 8, “Metric Design Considerations”
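
Item 2 counts a lexical overlap SR-O_r for every semantic role r, and the excerpt quoted under "sentence pair" adds that a uniform average of the counts is then taken as the score. The sketch below is one plausible reading of that idea: it groups words by role label and measures, for each role, the fraction of reference words covered by the system translation; the exact overlap definition of Gimenez and Marquez (2007) is not given in these excerpts, so this measure is an assumption.

```python
def semantic_role_overlap(sys_roles, ref_roles):
    """Uniform average over roles of the lexical overlap between role fillers.

    sys_roles / ref_roles map a role label (e.g. "ARG0") to the list of words filling it.
    """
    overlaps = []
    for role, ref_words in ref_roles.items():
        sys_words = sys_roles.get(role, [])
        matched = len(set(sys_words) & set(ref_words))
        overlaps.append(matched / len(ref_words) if ref_words else 0.0)
    return sum(overlaps) / len(overlaps) if overlaps else 0.0

sys_roles = {"ARG0": ["the", "boy"], "ARG1": ["a", "ball"]}
ref_roles = {"ARG0": ["the", "boy"], "ARG1": ["the", "ball"], "ARGM-LOC": ["outside"]}
print(semantic_role_overlap(sys_roles, ref_roles))   # (1.0 + 0.5 + 0.0) / 3 = 0.5
```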

word alignment

Appears in 3 sentences as: word alignment (2) word alignments (1)
  1. Although a maximum weight bipartite graph was also used in the recent work of (Taskar et al., 2005), their focus was on learning supervised models for single word alignment between sentences from a source and target language.
    Page 2, “Introduction”
  2. Given a pair of strings to compare (a system translation and a reference translation), METEOR (Banerjee and Lavie, 2005) first creates a word alignment between the two strings.
    Page 3, “Automatic Evaluation Metrics”
  3. These word alignments are created incrementally through a series of stages, where each stage only adds alignments between unigrams which have not been matched in previous stages.
    Page 3, “Automatic Evaluation Metrics”
