Quantification of Diabetes Comorbidity Risks across Life Using Nation-Wide Big Claims Data

To explore these issues, we develop a framework that allows us, for the first time, to quantify nationwide risks and their age and sex dependence for each diabetic comorbidity, and to assess whether the association may be consequential or causal, in a sample of almost two million patients. This study is equivalent to nearly 40,000 single clinical measurements. We confirm the highly controversial relation of increased risk for Parkinson’s disease in diabetics, using a cohort 10 times larger than those of previous studies on this relation. Detection of type 1 diabetes typically precedes detection of depression, whereas there is a strong comorbidity relation between type 2 diabetes and schizophrenia, suggesting similar pathogenic or medication-related mechanisms. We find significant sex differences in the progression of, for instance, sleep disorders and congestive heart failure in diabetic patients. Hypertension is a highly sex-sensitive comorbidity, with females being at lower risk during fertile age but at higher risk otherwise. These results may be useful to improve screening practices in the general population. Clinical management of diabetes must address the age and sex dependence of multiple comorbid conditions.

This study therefore contains almost 40,000 single clinical measurements, all with the maximum patient number available in an entire country. We confirm the relation between diabetes and Parkinson's disease, and find different progression routes of mental disorders in type 1 and type 2 diabetics. Among many other results, we also report significant gender differences in the progression of congestive heart failure, sleep disorders, hypertension, and hyperlipidemia. This work provides the first complete statistical description of all diabetic comorbidities and their dependence on patient age and sex. These results may be of immediate use to improve screening practices and therapy of diabetic patients through more accurate diagnosis and treatment of important comorbidities.

The worldwide number of adult diabetes patients doubled over the last three decades to approximately 350 million as of 2010, and is expected to double again by 2030 as a result of population ageing and a shift to western lifestyle patterns in developing countries [1]. Diabetes comprises a heterogeneous group of disorders, the most prominent types being type 1 (DM1) and type 2 diabetes (DM2). These disorders have different pathophysiology and phenotype; the exact underlying mechanisms, their interplay finally leading to manifestation, the progression of the diseases, and their complications are still unclear. Diabetes is related to a large number of comorbid diseases, including but not limited to vascular complications [2], renal failures [2], neuropathy [2], heart diseases [3, 4], cognitive disorders [5, 6], retinopathy [7], and hypertension [8]. Each of these comorbidities opens up a unique direction of research. Following the methodological approach developed in this work, thousands of such relations can be investigated in parallel. Besides studying the individual diabetic comorbidities and how they depend on patient age and gender, this allows us to compare the strength of these relations with each other and to rank them according to their significance.

To exploit the full potential of ‘big data’ for medical sciences, the development of novel, quantitative methods to extract clinically relevant features from large datasets of electronic health records (EHR) is necessary. First efforts in this direction have proven to be extremely fruitful by developing or improving data-driven comorbidity indices to predict mortality rates [10], or by studying healthcare utilization and outcome measures of specific patient cohorts [11]. Large-scale analyses of comorbidities using EHR data have demonstrated that human disease phenotypes can be related to each other in highly connected networks with strong pairwise correlations between diseases [12, 13, 14, 15]. In this work we develop a new quantitative framework to measure age and gender-dependent relative risks for all possible comorbidity relations for DM1 and DM2 using medical claims data from almost two million people. We introduce tests to assess the significance of the comorbidity relations, the influence of sex, and whether diabetes is more likely to be diagnosed before or after the other disease.

The data gives a comprehensive, nationwide picture of the medical condition of most of the approximately 8.3 million Austrians. The patient collective was formed by extracting all persons receiving inpatient care in 2006 or 2007. We identified patients diagnosed with DM1 or DM2 (ICD 10 codes E10 and E11). Patients who died in 2006 or 2007 were removed. In this way 16 667 DM1 patients (8 355 males and 8 312 females) and 105 904 DM2 patients (50 596 males and 55 308 females) were selected. The total sample of inpatients used in this study consists of 1 862 258 patients (1 064 952 females and 797 306 males). For these patients we know the year of birth, sex, ATC codes of all their prescriptions, and the ICD codes of all their diagnoses (main and side diagnoses).

For the occurrences of each diagnosis x (ICD 10, three-digit level) a patient-age-resolved cross tabulation with the occurrences of DM1 and DM2 is performed. Symptoms, injuries, pregnancies, and external causes and factors of morbidity were excluded. We therefore test 1 051 diagnoses (ICD 10 codes ranging from A01 to N99) for their co-occurrence with diabetes. The patients are grouped by their age in five-year intervals and by their gender. Patients older than 95 have been excluded. We test 1 051 possible comorbidities for 19 age groups for DM1 and DM2, giving 1 051 × 19 × 2 = 39 938 tests. For each diagnosis and age interval a contingency table is built. If each entry in the table is greater than 10, the relative risks RR1(x,t) for DM1 and RR2(x,t) for DM2 are computed, a chi-squared test is performed, and p-values are calculated for rejecting the null hypothesis that the diagnosis occurs independently of DM1 or DM2. This leads to a multiple hypothesis testing problem for each age group, where 1 051 hypotheses are tested in parallel. To correct for these multiple comparisons we apply the Benjamini-Hochberg procedure [17] to control the false discovery rate α. In this multiple comparison correction the value of α gives the expected fraction of rejected null hypotheses that are rejected incorrectly. For example, if 100 comorbidities are identified with a false discovery rate of α = 0.01, the expected number of false positives among these comorbidities is one. If there are fewer than ten co-occurrences or the results are not significant, the relative risk is set to one. For the co-occurrence analysis we use both the main and the side diagnoses of each patient.

Table 1. A list of major well-known diabetic comorbidities that is used to validate the results of the co-occurrence analysis.

ICD10  Diagnosis
G45    Transient cerebral ischemic attacks and related syndromes
I10    Essential (primary) hypertension
I20    Angina pectoris
I21    ST elevation (STEMI) and non-ST elevation (NSTEMI) myocardial infarction
I24    Other acute ischemic heart diseases
I47    Paroxysmal tachycardia
I50    Heart failure
I62    Other and unspecified nontraumatic intracranial hemorrhage
I65    Occlusion and stenosis of precerebral arteries, not resulting in cerebral infarction
I66    Occlusion and stenosis of cerebral arteries, not resulting in cerebral infarction
I67    Other cerebrovascular diseases
I70    Atherosclerosis
I71    Aortic aneurysm and dissection
I72    Other aneurysm
I73    Other peripheral vascular diseases
I74    Arterial embolism and thrombosis
N17    Acute kidney failure
N18    Chronic kidney disease
N19    Unspecified kidney failure
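The per-table step of the co-occurrence analysis described above can be sketched in Python. This is a minimal illustration, not the authors' implementation: the function name, the argument layout, and the use of an uncorrected Pearson chi-squared statistic (no continuity correction) are assumptions, and the p-value uses the closed form for one degree of freedom.

```python
import math

def relative_risk_test(n_both, n_dm_only, n_x_only, n_neither, min_count=10):
    """Relative risk and chi-squared p-value for one 2x2 contingency table.

    n_both    : patients with diabetes and diagnosis x
    n_dm_only : patients with diabetes but without x
    n_x_only  : patients with x but without diabetes
    n_neither : patients with neither
    Returns (1.0, None) when any cell is not greater than min_count,
    mirroring the rule that such relative risks are set to one.
    """
    cells = [n_both, n_dm_only, n_x_only, n_neither]
    if min(cells) <= min_count:
        return 1.0, None

    n_dm = n_both + n_dm_only        # all diabetes patients in the age group
    n_ctrl = n_x_only + n_neither    # all non-diabetic patients
    n_x = n_both + n_x_only          # all patients with diagnosis x
    n_not_x = n_dm_only + n_neither  # all patients without diagnosis x
    total = n_dm + n_ctrl

    # Risk of x among diabetics divided by risk of x among non-diabetics.
    rr = (n_both / n_dm) / (n_x_only / n_ctrl)

    # Pearson chi-squared statistic for a 2x2 table (assumed: no correction).
    chi2 = total * (n_both * n_neither - n_dm_only * n_x_only) ** 2 / (
        n_dm * n_ctrl * n_x * n_not_x
    )
    # Survival function of the chi-squared distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return rr, p_value

# Hypothetical counts for one diagnosis and one age group.
rr, p = relative_risk_test(200, 800, 1000, 9000)
print(rr, p)
```

The resulting p-values for all 1 051 diagnoses of one age group would then be passed to the multiple-comparison correction.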

To validate the results of the co-occurrence analysis we compile a list of major known diabetic complications from different literature sources [18, 19, 20]. These lists are based on hand-curated collections of diabetic comorbidities, some of them validated using EHR data [19, 20]. These studies disagree on the exact list of ICD codes for diabetic complications, but each list focuses on cardiovascular, renal, and ophthalmic comorbidities. The ICD codes that are listed as diabetic complications in each of these studies are therefore used to validate our co-occurrence analysis, see Table 1. Note that, for example, depression and pancreatic cancer, both well-known diabetic comorbidities [5, 6, 21], are not included in any of these studies. Nevertheless, a valid method to detect comorbidities is supposed to pick up a substantial number of the diagnoses listed in Table 1, among other comorbidities. We will therefore be interested in the recall R(α) as a function of the false discovery rate α. R(α) is the probability that a diabetic comorbidity listed in Table 1 is also identified by our co-occurrence analysis at a given level of α.
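The multiple-comparison correction and the recall R(α) can be illustrated with a short Python sketch. It assumes the standard step-up form of the Benjamini-Hochberg procedure applied to the p-values of one age group; the function names and the toy p-values are hypothetical and are not taken from the study data.

```python
def benjamini_hochberg(p_values, alpha):
    """Return the codes whose null hypothesis is rejected at FDR level alpha.

    p_values maps a diagnosis code to its p-value. Standard step-up rule:
    find the largest rank k with p_(k) <= alpha * k / m and reject the
    k hypotheses with the smallest p-values.
    """
    ranked = sorted(p_values.items(), key=lambda item: item[1])
    m = len(ranked)
    cutoff = 0
    for rank, (_, p) in enumerate(ranked, start=1):
        if p <= alpha * rank / m:
            cutoff = rank
    return {code for code, _ in ranked[:cutoff]}

def recall(detected, reference):
    """Fraction of reference comorbidities (e.g. Table 1) that were detected."""
    reference = set(reference)
    return len(detected & reference) / len(reference)

# Toy p-values for five diagnoses (hypothetical numbers).
p_vals = {"I10": 0.001, "I50": 0.008, "N18": 0.039, "A01": 0.041, "B02": 0.6}
found = benjamini_hochberg(p_vals, alpha=0.05)
print(sorted(found), recall(found, ["I10", "I50", "N18"]))
```

Because the rule is step-up, a hypothesis can be rejected even if its own p-value exceeds its rank threshold, as long as a hypothesis with a larger p-value meets its threshold.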

The sex ratio SR(x,t) compares the percentages of female and male diabetes patients in age group t that also have diagnosis x or are prescribed a medication x. Denote the number of male (female) DM1 and DM2 patients in age group t by D_m(t) (D_f(t)) and the number of male (female) diabetes patients who also have diagnosis or medication x by D_m(x,t) (D_f(x,t)). The sex ratio SR(x,t) is then the logarithmic quotient of the percentages of female and male diabetes patients who also have diagnosis x,

\[ SR(x,t) = \log \frac{D_f(x,t)/D_f(t)}{D_m(x,t)/D_m(t)} . \]
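As a rough illustration, the sex ratio together with the significance rule described in the next paragraph can be sketched in Python. The function and argument names, the natural logarithm, and the handling of empty cells are assumptions made for this sketch; the text does not specify them.

```python
import math

def sex_ratio(f_with_x, f_total, m_with_x, m_total, p_threshold=0.05):
    """Log quotient of the female and male shares of diabetics that also have x.

    Returns 0.0 when independence of sex and co-occurrence cannot be rejected
    in a chi-squared test, or when the ratio is undefined (assumed convention).
    """
    f_without = f_total - f_with_x
    m_without = m_total - m_with_x
    with_x = f_with_x + m_with_x
    without_x = f_without + m_without
    # Degenerate tables: the logarithm or the test statistic is undefined.
    if min(f_total, m_total, f_with_x, m_with_x) == 0 or without_x == 0:
        return 0.0

    n = f_total + m_total
    # Pearson chi-squared statistic for the 2x2 table (sex) x (has x or not).
    chi2 = n * (f_with_x * m_without - f_without * m_with_x) ** 2 / (
        f_total * m_total * with_x * without_x
    )
    p_value = math.erfc(math.sqrt(chi2 / 2.0))  # 1 degree of freedom
    if p_value >= p_threshold:
        return 0.0
    return math.log((f_with_x / f_total) / (m_with_x / m_total))

# Hypothetical age group: 30% of female and 15% of male diabetics have x.
print(sex_ratio(300, 1000, 150, 1000))
```

A positive return value indicates a female excess, a negative one a male excess.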

Positive (negative) values of SR(x,t) indicate that the co-occurrence is more likely for females (males). To assess the statistical significance of nonzero SR(x,t) values we build a contingency table for all diabetes patients of a given age group t. The table contains the two variables sex and co-occurrence with diagnosis/medication x. If the null hypothesis of statistical independence of these two variables cannot be rejected in a chi-squared test at a significance level of p = 0.05, the sex ratio is set to zero, SR(x,t) = 0.

Lead/lag indicator. The lead/lag indicators assess whether patients with a diabetes diagnosis d_i are more likely to be later diagnosed with another disease x, the lead indicator I_lead(d_i,x), or whether it is more likely that people having diagnosis x will be diagnosed with diabetes, the lag indicator I_lag(d_i,x). There exist several known biases in EHR data that need to be addressed in the definition of these indicators [22]. (i) The first occurrence of a coding of a diagnosis in the EHR data will typically not correspond to the true initial diagnosis of the disease. (ii) The data only spans two years, which may not be enough to observe the manifestation of diabetic complications directly. We use the following methodology to measure the lead/lag indicators and adjust for these known biases. Let us consider the lead indicator I_lead(d_i,x), which measures whether the diagnosis x is typically made after the diabetes diagnosis. Given the limitations of our data, we cannot observe the typical time between the manifestations of the two diseases. We can, however, measure whether there is a tendency that x will be diagnosed in a patient who already had a prior diabetes diagnosis. As opposed to the co-occurrence analysis, it is crucial for the lead/lag analysis to distinguish between main and side diagnoses.
To this end we consider the probability that a male (female) patient has a diabetes diagnosis (main or side diagnosis) in year t1, and a main diagnosis x in year t2, but no diagnosis of x in t1 (main or side diagnosis). Denote this probability by p_m(x,t2|d_i,¬x,t1) for males and p_f(x,t2|d_i,¬x,t1) for females. This number overestimates the true effect size, since some cases where a patient does not have diagnosis x in year t1 might be due to inaccuracies in the coding or incompleteness of the data, in particular with respect to unknown preexisting conditions. However, we assume that these errors are not systematic in the sense that they are equally likely to influence the data for year t1 and t2. If there is no true temporal ordering in the onsets of d_i and x, the value of p_m(f)(x,t2|d_i,¬x,t1) just measures noise due to incomplete or inaccurate data. But this is equally true for the probability that diagnosis x does not occur for a patient in year t2, given that she (he) has both diagnoses d_i and x in t1, the probability p_m(f)(x,t1|d_i,¬x,t2). If there is a substantial tendency that x is diagnosed after the onset of d_i, however, these two probabilities are likely to differ. The lead indicator I_lead(d_i,x) is therefore given by

We therefore exclude diagnoses x from the analysis if fewer than a threshold of z male or female patients with x also have d_i in t2. In the following we set t1 = 2006 and t2 = 2007. For the lag indicator for DM1 we exclude all patients older than 30.

Surrogate data is created by keeping the list of diagnoses for each patient fixed and by shuffling the information about the year when the diagnoses were made. Assume that patient p has n_p diagnoses {x_i} made in the years {T_i} with i ∈ {1, ..., n_p}. The surrogate data is constructed by replacing {T_i} by a random permutation of itself. This procedure is repeated 1 000 times and the lead and lag indicators are computed for each surrogate dataset. We test the null hypothesis that the values for the lead and lag indicators observed in the data are as large as one would expect for indicator values taken from the surrogate data, where the temporal information has been randomly shuffled. The p-value for each lead and lag indicator is the probability of obtaining the observed values for I_lead(d_i,x) and I_lag(d_i,x) from the surrogate data. The null hypothesis is rejected if p < 0.01, that is, if out of 1 000 surrogate datasets fewer than ten give indicator values that are larger than the observed values.
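The surrogate test can be sketched in Python as follows. The shuffling step follows the description above; the statistic `lead_excess` is only a hypothetical stand-in for the lead indicator and is not the authors' definition, and the record layout, function names, and the counting of ties as exceedances are assumptions of this sketch.

```python
import random

def surrogate_p_value(patients, statistic, n_surrogates=1000, seed=0):
    """Permutation p-value: share of surrogates with a statistic >= observed.

    patients is a list of records; each record is a list of (code, year)
    pairs. For every surrogate, the years of each patient are shuffled while
    the list of diagnoses is kept fixed. Ties are counted as exceedances,
    which is a conservative choice (the text counts strictly larger values).
    """
    rng = random.Random(seed)
    observed = statistic(patients)
    n_as_large = 0
    for _ in range(n_surrogates):
        shuffled = []
        for record in patients:
            codes = [code for code, _ in record]
            years = [year for _, year in record]
            rng.shuffle(years)  # random permutation of this patient's years
            shuffled.append(list(zip(codes, years)))
        if statistic(shuffled) >= observed:
            n_as_large += 1
    return n_as_large / n_surrogates

def lead_excess(patients, d="E11", x="I50", t1=2006, t2=2007):
    """Hypothetical stand-in statistic, NOT the lead indicator of the paper.

    Counts patients with d in t1 and x appearing only in t2, minus patients
    with x in t1 and d appearing only in t2.
    """
    lead = lag = 0
    for record in patients:
        years_d = {year for code, year in record if code == d}
        years_x = {year for code, year in record if code == x}
        if t1 in years_d and t1 not in years_x and t2 in years_x:
            lead += 1
        if t1 in years_x and t1 not in years_d and t2 in years_d:
            lag += 1
    return lead - lag

# Toy cohort in which diabetes (E11) always precedes heart failure (I50).
cohort = [[("E11", 2006), ("I50", 2007)] for _ in range(100)]
print(surrogate_p_value(cohort, lead_excess))
```

With the rejection rule from the text, a diagnosis would be reported as led by diabetes when the returned p-value is below 0.01.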

A significant value of the lag indicator I_lag(d_i,x), on the other hand, suggests that diabetes is typically incident in patients already diagnosed with x. A similar approach to study lead/lag behavior between diseases, but without a test for statistical significance of the results, was proposed for networks of comorbid diseases [12].

Fig. 1(a) shows the fraction of male and female inpatients of the entire population as a function of age. The inpatient fractions are around 20% for children under five, then drop to 10–15% for ages around ten, and from then on rise to more than 80% for 80-year-old patients, with an additional peak for females of age around 30, most likely due to childbirth. With increasing patient age the inpatient sample becomes increasingly representative of the entire population.

Fig. 1(b) shows the fraction of male and female DM1 inpatients as a function of age. The distributions have a first peak around the typical onset age of ten for both males and females, and a second peak at age 60 (70) for males (females). Fig. 1(c) shows the fractions of inpatients diagnosed with DM2 as a function of age, with comparably few patients below age thirty, and the bulk of male (female) patients concentrated around age 60 (70).

Fig. 1 shows the sex ratio SR(x,t) for DM1 patients as a function of age and of their number of diagnoses (d) and received drugs (e); (f) and (g) show the same for DM2 patients. Up to an age of 60 there is an excess of male patients; for older patients there is an excess of females. For drugs there is a male excess only for ages up to 60 and for fewer than 10–20 drugs. For older age and a larger number of drugs there is an excess of female patients. Females below age 60 have fewer diagnoses than males, but especially those with a large number of diagnoses have more prescriptions than males. After age 60, females outweigh males in both diagnoses and prescriptions.

Drugs are classified according to their 3-digit-level ATC codes. The sex ratios for drugs for pain relief, psycholeptics, and psychoanaleptics (N02, N05, N06), but also for diuretics (C03), are dominated by females at all ages. Beta blocking agents (C07), calcium channel blockers (C08), and ACE inhibitors (C09) show an excess of males at ages around 30, but a female excess at older ages. Lipid modifying agents (C10) show an excess of males, whereas the gender ratios for antineoplastic agents (L01) are almost balanced.

The results of the co-occurrence analysis are validated by considering the recall R(α) for the major diabetic comorbidities from Table 1. A false discovery rate of α = 0.001 gives a list of 75 significant comorbidities and a recall of R(α = 0.001) = 0.59.

In the following we choose a threshold of α = 0.01, which yields 123 significant comorbidities. The expected number of false positives among these comorbidities is 0.01 × 123 = 1.23. Note that for this threshold we pick up several diseases that are very closely related to those major diabetic complications that we do not retrieve. For example, we do not pick up the subarachnoid, intracerebral, and intracranial hemorrhages (I60–I62), but we retrieve cerebral infarctions (I63) and other strokes (I64). Similarly, at this threshold we do not retrieve aneurysms (I71–I72), but we do retrieve atherosclerosis and other peripheral vascular diseases (I70, I73). We identify occlusion and stenosis of cerebral arteries (I66) instead of precerebral arteries (I65).

The comorbidities are also listed in the supplement, S1 Table, along with relative risks, p-values, and patient ages for the age group with the smallest p-values for DM1 and DM2, respectively. In the following we refer to these values whenever we quote the relative risk of a diagnosis together with a 95% confidence interval (CI).
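The text does not state how the 95% confidence intervals of the relative risks are computed. One common choice, shown here purely as an assumption, is the log-normal (Katz) approximation; the function name and the counts in the example are hypothetical.

```python
import math

def relative_risk_ci(n_both, n_dm, n_x_only, n_ctrl, z=1.96):
    """Relative risk with an approximate 95% CI (Katz log method, assumed).

    n_both   : diabetics with diagnosis x
    n_dm     : all diabetics in the age group
    n_x_only : non-diabetics with diagnosis x
    n_ctrl   : all non-diabetics in the age group
    """
    rr = (n_both / n_dm) / (n_x_only / n_ctrl)
    # Standard error of log(RR) under the log-normal approximation.
    se = math.sqrt(1 / n_both - 1 / n_dm + 1 / n_x_only - 1 / n_ctrl)
    lower = math.exp(math.log(rr) - z * se)
    upper = math.exp(math.log(rr) + z * se)
    return rr, lower, upper

# Hypothetical counts: 200 of 1000 diabetics and 1000 of 10000 others have x.
print(relative_risk_ci(200, 1000, 1000, 10000))
```

The interval is symmetric on the logarithmic scale, so the product of its bounds equals the squared relative risk.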

The threshold z is set to z = 50 for DM1 and DM2. Table 2 shows diagnoses which have been identified as either leading or lagging for male or female DM1 or DM2 patients.

Emphasis is put on comorbidities that have been disputed in the literature, or where the lead/lag analysis advances our understanding of them. Another important group of results consists of comorbidities for which we find a previously unknown degree of sensitivity to sex. In particular we find for several comorbidities a certain patient age where the sex ratio switches from an excess of one sex to an excess of the other sex for older ages; we will refer to these patient ages as ‘age switch’.

Table 2. Diagnoses identified in the lead/lag analysis. For each diagnosis the order (whether diabetes leads or lags), gender (‘F’ for females, ‘M’ for males), and diabetes type (1 or 2) for which the relationship was detected are listed.

Diabetes leads (comes before other disease)
A41  Other sepsis
C25  Malignant neoplasm of pancreas
C34  Malignant neoplasm of bronchus and lung
E16  Other disorders of pancreatic internal secretion
F03  Unspecified dementia
F32  Depressive episode
G45  Transient cerebral ischemic attacks and related syndromes
I25  Chronic ischemic heart disease
I46  Cardiac arrest
I48  Atrial fibrillation and flutter
I50  Heart failure
I50  Heart failure
J18  Pneumonia, unspecified organism
N18  Chronic renal failure

Diabetes lags (comes after other disease)
C44  Other and unspecified malignant neoplasm of skin
D40  Neoplasm of uncertain behavior of male genital organs
K81  Cholecystitis

In the literature there is no consensus on whether diabetes patients have a higher risk for Parkinson’s disease (PD), or if there is actually a lower risk or no relation at all. There are two large prospective studies finding an increased risk for PD in diabetes patients, one study finding no relation, and one study reporting a lower risk in diabetes patients [23]. We find that PD is comorbid (relative risk 2.3, CI 1.9–2.7, for DM1 and 1.5, CI 1.4–1.6, for DM2) with an excess of male patients. It has been suggested that surveillance bias may lead to the reporting of spurious positive correlations between PD and diabetes [23]. Given our patient cohort we can exclude this kind of bias. Note that the size of our patient cohort (1.8 million patients) is at least 10 times larger than the largest cohorts in previous studies on the relation between PD and diabetes.

As a potential mechanism of this association, the involvement of insulin in the regulation of brain dopaminergic activity has been proposed [25, 26]. Animal and in vitro studies have shown that insulin and dopamine may exert reciprocal regulation [26].

Depression, schizophrenia, and schizo-affective disorders are also comorbid. While the relative risks for DM1 patients are highest in the age group 65–70, with values from 1.9 to 2.3 for these diseases, we find higher risks for DM2 patients at younger ages, e.g. a relative risk of 4.8, CI 3.3–7.0, for recurrent depressive disorders at age 35–40. We find that depression is usually incident in DM1 patients. From these results one may speculate that DM1 patients develop depression because of the burden of the disease and the psychological distress of maintaining a good level of glycemic control. Depression in diabetic patients in general, DM1 and

Indeed it is remarkable that depression and overweight as diabetic comorbidities show nearly the same age and sex dependence. A possible biological mechanism is that obesity increases the risk of increased insulin resistance, which may induce alterations in the brain which in turn increase the risk of depression [28]. Of importance are also psychological pathways, since the perception of being overweight increases psychological distress [29]. Diabetes has also been associated with the use of atypical neuroleptics in the treatment of schizophrenia [30]. The sex ratios for antipsychotics show a strong excess of female patients, see Fig. 2, which compares well with the female excess in the sex ratios for depression and schizophrenia. It is interesting to note that the comorbidity relations with schizophrenia and schizo-affective disorders stand out as much weaker for DM1 than for DM2 patients, when compared to all other results of the comorbidity analysis.

While patients with thyroiditis, hypothyroidism, thyrotoxicosis, and obesity are predominantly female, disorders of the lipoprotein, purine, and pyrimidine metabolism tend to be found in males. Diabetic patients, particularly those with autoimmune diabetes, feature a two to three times higher risk of disorders of the thyroid gland, a comorbidity relation that is strongly influenced by gender [31]. For volume depletion and disorders of fluid, electrolyte, and acid-base balance there appears to be an age switch, from an excess of male patients for ages 20–40 to an excess of females at older age. Primary hypertension is a comorbidity with relative risks of 5.3 (CI 4.8–5.9) for DM1 and 9.5 (CI 8.8–10) for DM2. These switches may indicate an important impact of sexual hormones and of potential pregnancies but may also point to social factors related to sex-specific phases of life. The prescriptions of beta blockers and calcium channel blockers, as well as ACE inhibitors, show a sex dependence very similar to hypertension, suggesting that these drugs are commonly used to treat hypertension, see Fig. 2. There is a strong excess of female patients in the prescriptions of diuretics, especially in elderly patients, whereas there is a strong excess of younger males being prescribed statins or other lipid modifying agents, the latter matching the sex ratio observed for hypercholesterolemia and hyperlipidemia. Note that our results make no statements about the combinations of antihypertensive drugs which are actually used in the treatment of individual patients.

Bacterial and viral infections (gastroenteritis, erysipelas, pneumonia, osteomyelitis, hepatitis, dermatophytosis, candidiasis) show an excess of male patients with the exception of gastroenteritis and candidiasis, which are dominated by female patients. We find an excess of sepsis comorbidity which is strongest in male DM1 patients at ages around 50, with higher relative risks for DM1 (12, CI 8.2–18) than DM2 (2.7, CI 2.4–2.9).

The increased risk for epilepsy (4.6, CI 3.1–6.9, for DM1 and 1.6, CI 1.4–1.7, for DM2) in young type 1 diabetics [32] may be linked to ketoacidosis, as a two times higher risk of epilepsy was found in children and adolescents with metabolic acidosis [33]. A four times greater risk of DM1 was also described in young adults with epilepsy [34]. Both metabolic extremes, hypoglycemia and diabetic ketoacidosis, relate to EEG abnormalities in diabetic children which may increase the risk of epilepsy.

The Framingham heart study reported that diabetic women are more vulnerable to congestive heart failure (CHF, RR of 5.2, CI 4.7–5.9, for DM1, 3.8, CI 3.6–3.9, for DM2) than men [35]. However, subsequent cohort studies found no such sex differences [35, 36]. We find an excess of male patients and that diabetes is typically detected before CHF in females with DM1 and in males with DM2.

Sleep disorders are comorbid in DM1 (1.9, CI 1.5–2.4) and DM2 patients (2.3, CI 2.1–2.6). We find support for sex-specific progression routes. It is known that DM2 and obstructive sleep apnea (OSA) present a vicious circle, with OSA exerting adverse effects on glucose metabolism and thereby increasing the risk for DM2 [37]. In patients with already existing DM2, on the other hand, there is a significant relationship between sleep-disordered breathing (SDB) and insulin resistance independent of obesity [38]. The fact that there is an excess of male patients in the comorbidity relation may be related to the higher prevalence of central adiposity and therefore OSA in men [37].

There are higher relative risks for DM1 patients (8.6, CI 5.6–13) than DM2 patients (2.5, CI 2.1–2.8). The risks peak in the age range 50–70 with a balanced sex ratio. A meta-analysis has shown that diabetic patients are at increased risk of pancreatic cancer, with a pooled RR of approximately two compared to non-diabetics [21], given at least one year of diabetes duration prior to the diagnosis of pancreatic cancer [39]. Diabetes also leads the diagnosis of pancreatic and lung cancer.

Nicotine dependence (3.3, CI 2.7–4.1, for DM1 and 2.8, CI 2.6–3.0, for DM2) and alcohol related disorders (2.3, CI 1.7–3.2 and 2.1, CI 1.9–2.4) are comorbidities with relative risks peaking at ages 30–45, dominated by male patients. Alcoholic liver disease (4.0, CI 2.7–5.7 and 2.6, CI 2.3–2.9) is also a male-dominated comorbidity. Toxic liver disease (2.8, CI 1.6–4.9 and 14, CI 8.5–23) and fibrosis and cirrhosis of liver (5.0, CI 3.7–6.6 and 2.4, CI 2.2–2.7) also show an excess of male patients. There tend to be higher risks for DM1 than DM2 patients, potentially indicating a greater impact of chronic hyperglycemia than of overweight-related parameters of the metabolic syndrome. The relationship between alcohol consumption and DM2 has been shown to be dosage dependent. While moderate alcohol consumption is protective, dosages of more than 60 g/day increase diabetes risk [40]. It is not possible to establish an alcohol-dosage dependent diabetes risk from our data.

Identified comorbid diseases of the circulatory system include ischemic and pulmonary heart disease, cardiomyopathy, valvular disorders, tachycardia, as well as cerebrovascular diseases and diseases of the arteries and veins [2, 4, 41]. Comorbid diseases of the circulatory system show a consistent excess of male patients, including ischemic, pulmonary, and other heart diseases (cardiomyopathy, valvular disorders, tachycardia), as well as cerebrovascular diseases and diseases of the arteries and veins. The highest relative risks among cardiovascular diseases are found for acute ischemic heart diseases for DM1 patients (6.6, CI 5.2–8.3, compared to 3.1, CI 2.8–3.4, for DM2 patients) at ages higher than 60.

Pneumonia and acute bronchitis show increased relative risks for older ages (e.g. for pneumonia 2.7, CI 2.4–3.0, for DM1 and 2.3, CI 2.1–2.4, for DM2). Chronic obstructive pulmonary disease (COPD) is led by diabetes (2.9, CI 2.5–3.5 and 2.2, CI 2.1–2.3). Diabetes is often identified as an independent risk factor for lower respiratory tract infections [42]. Individuals with COPD are substantially more likely to have preexisting DM [43]; on the other hand, lung function impairment in COPD is a risk factor for developing diabetes and insulin resistance [44]. Benign pleural effusion (3.4, CI 2.1–5.6 and 3.1, CI 2.5–3.9), representing a symptom of various underlying diseases, is dominated by males. In diabetic patients pleural effusion may be related to left ventricular dysfunction, as described previously [45].

Iron-deficiency and anemia in chronic diseases show higher relative risks for DM1 (3.7, CI 3.0–4.6, and 6.3, CI 4.9–8.1) than DM2 (2.7, CI 2.4–2.9 and 2.8, CI 2.5–3.2) patients.

Cataracts, retinal detachments, glaucoma, disorders of the vitreous body, and blindness are identified here with relative risks up to 200. The higher relative risks for DM1 compared to DM2 patients for retinopathies [7] at older age suggest a higher lifespan for type 1 diabetics.

Chronic and acute kidney diseases, the nephrotic syndrome, and glomerular disorders are identified as comorbidities with an excess of male patients; relative risks range up to 128 for DM1 patients and 8.6 for DM2. There is an excess of female patients in the age range 20–40.

Intestinal malabsorption (including celiac disease) shows elevated risks for ages 10–25 for DM1 (10, CI 6.3–17) with a weak female excess; there are no significant results for DM2. Cholelithiasis is a female-dominated comorbidity (1.7, CI 1.5–2.0 and 1.5, CI 1.4–1.6). Cholecystitis is typically followed by DM2 in males.

Pressure and non-pressure ulcers exhibit higher risks for DM1 (7.2, CI 5.2–9.9, and 7.4, CI 5.8–9.4) than DM2 patients (2.2, CI 2.0–2.4 and 4.2, CI 3.9–4.6).

For males there are increased risks for disorders of prepuce (6.0, CI 3.5–10 and 3.1, CI 2.5–3.8), while for females there is increased risk for disorders of the urinary system (2.5, CI 2.2–2.8 and 1.8, CI 1.7–1.9). Evidence from epidemiological studies suggests that asymptomatic bacteriuria and symptomatic urinary tract infections occur more commonly in women with DM compared to non-diabetic controls [41, 46]. An increased prevalence of urinary incontinence and urge incontinence among women with DM2 [47, 48] has been reported.

Only persons with inpatient stays were included in the study. To test if this pre-selection introduces a bias in our results, we repeated the study with a sample of all patients who were prescribed a drug used in diabetes (ATC code starting with ‘A10’) at least once in 2006 or 2007. We compare the frequencies of their diseases with those in the rest of the population, roughly 8.3 million patients. This assumes that DM patients with no hospital stay in the study period have no diagnosis and therefore no comorbidities. Although this is a highly incorrect assumption, it serves as a conservative test assumption, which allows us to test if the comorbidities are simply significant as a consequence of our limited sample that contains only inpatients. Results are shown in the supplement in S1 Fig. In the enlarged sample only one out of the 123 comorbidities identified using the inpatient sample has a p-value greater than 0.05 (M23); all others remain significant (p < 0.05). Significance of comorbidity in the inpatient sample is therefore highly representative of comorbidity in the entire population. However, our approach might miss diabetic comorbidities that are typically not related to hospitalizations and that are most prevalent in younger patients, where the inpatient sample contains a smaller fraction of the entire population, compare Fig. 1(a). Unknown preexisting conditions may also affect the observed temporal order of the diseases, which has been addressed by applying a series of corrections to the lead/lag indicators, equations (2) and (3). Other limitations relate to the coding quality of disorders in the medical claims data, which has been shown to lead to an underreporting of comorbidities [49] and may cause false negatives in our testing procedure.

For the first time we develop a standardized testing procedure to obtain a complete comorbidity profile for DM1 and DM2 using medical claims data. This analysis is equivalent to 39 938 individual tests, each with the maximum number of patients available in a country. We identified 123 highly significant disorders with increased or decreased risks, strongly depending on patient age and sex. The comorbidities are investigated by a lead/lag analysis to inquire whether the relation between the diseases is more likely causal or consequential.

Diabetic comorbidities are the rule rather than the exception, and their treatment must address their high degree of age and sex dependence. Besides being a risk factor for certain diabetic complications, sex may also influence and to a certain degree even determine the mechanisms underlying disease progression. Our results may be of immediate use to improve screening practices and therapy of diabetic patients, to increase their quality of life, and potentially to contribute to longer life expectancy through early detection and treatment of important comorbidities. In particular, we propose to screen and, where applicable, treat diabetes patients for comorbid depression, since this allows a more efficient treatment of diabetes itself. Depressive patients should be screened for diabetes so that it is detected at an early stage and lifestyle interventions that focus on weight control can be performed. It is also important to treat depressive patients with drugs that have minimal side effects on weight gain and on lipid and glucose metabolism. Our results emphasize that physicians must be aware of nontraditional diabetic comorbidities and risk factors during anamnesis and that, for example, screening for diabetes may be appropriate in patients with cardiovascular diseases, CHF, or fatty liver, whereas diabetes patients should be screened for pancreatic cancer.

S1 Fig. The results from the inpatient sample are largely reproduced; only disorder M23 exhibits nonsignificant p-values.

S1 Table. ICD code and disease name for the 123 comorbidities identified in the co-occurrence analysis. For the age groups with the smallest p-value the relative risks RR, patient ages, and the corresponding p-values are shown for DM1 and DM2, respectively. Where the patient sample was too small to apply the statistical tests, missing values are shown.

S1 Data. Comorbidity data for DM1 patients, the relative risks RR1, the confidence intervals for RR1, if applicable the p-value for the co-occurrence analysis, and the sex ratio for each diagnosis and age group. (CSV)

S2 Data. Comorbidity data for DM2 patients, the relative risks RR2, the confidence intervals for RR2, if applicable the p-value for the co-occurrence analysis, and the sex ratio for each diagnosis and age group. (CSV)

Performed the experiments: PK. Analyzed the data: PK. Contributed reagents/materials/analysis tools: PK AKW AC ISF ST. Wrote the paper: PK AKW ISF ST.

Appears in 10 sentences as: age group (8) age groups (3)

In *Quantification of Diabetes Comorbidity Risks across Life Using Nation-Wide Big Claims Data*

- We test 1 051 possible comorbidities for 19 age groups for DM1 and DM2, giving 39 938 tests. (Page 3, “Data”)
- This leads to a multiple hypothesis testing problem for each age group where 1 051 hypotheses are tested in parallel. (Page 3, “Data”)
- The sex ratio SR(x,t) is related to the quotient of the percentage of female and male diabetes patients in age group t that also have diagnoses x or are prescribed a medication x. Denote the number of male (female) DM1 and DM2 patients in age group t by Dmog(t) and the number of male (female) diabetes patients who also have diagnoses or medication x by me(x,t). (Page 4, “Data”)
- To assert the statistical significance of nonzero SR(x,t) values we build a contingency table for all diabetes patients of a given age group t. The table contains the two variables sex and co-occurrence with diagnosis/medication x. (Page 4, “Data”)
- Each diagnosis where the null hypothesis of statistical independence with either DM1 or DM2 can be rejected with a given value of the false discovery rate in at least one of the age groups is identified as a comorbidity. (Page 6, “Results/Discussion”)
- The comorbidities are also listed in the supplement, S1 Table, along with relative risks, p-values, and patient ages for the age group with the smallest p-values for DM1 and DM2, respectively. (Page 7, “Results/Discussion”)
- While the relative risks for DM1 patients are highest in the age group 65 to 70 with values from 1.9 to 2.3 for these diseases, we find higher risks for DM2 patients at younger ages, e.g. (Page 10, “Controversial comorbidity associations”)
- For the age groups with the smallest p-value the relative risks RR, patient ages, and the corresponding p-values are shown for DM1 and DM2, respectively. (Page 14, “Supporting Information”)
- Comorbidity data for DM1 patients, the relative risks RR1, the confidence intervals for RR1, if applicable the p-value for the co-occurrence analysis, and the sex ratio for each diagnosis and age group. (Page 14, “Supporting Information”)
- Comorbidity data for DM2 patients, the relative risks RR2, the confidence intervals for RR2, if applicable the p-value for the co-occurrence analysis, and the sex ratio for each diagnosis and age group. (Page 14, “Supporting Information”)
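
The contingency-table test that decides whether a sex ratio is kept or set to zero can be sketched as follows. This is a minimal illustration, not the authors' code: the function names, the table layout (rows are female/male, columns are with/without diagnosis or medication x), and the use of an uncorrected Pearson chi-squared statistic are assumptions, and the counts in the usage lines are invented.

```python
import math


def chi2_independence_2x2(a, b, c, d):
    """Pearson chi-squared test of independence for the 2x2 table [[a, b], [c, d]].

    Assumed layout (hypothetical): rows = female, male; columns = has x, lacks x.
    Returns (chi2, p_value) for 1 degree of freedom, no continuity correction.
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    if min(row1, row2, col1, col2) == 0:
        raise ValueError("degenerate table: a row or column sums to zero")
    chi2 = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # For 1 degree of freedom, P(X > chi2) = erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


def sex_ratio_is_significant(a, b, c, d, threshold=0.05):
    """True if independence of sex and co-occurrence with x is rejected."""
    _, p_value = chi2_independence_2x2(a, b, c, d)
    return p_value < threshold


# Invented counts for one age group: weak and strong association with sex.
weak = chi2_independence_2x2(10, 20, 30, 40)
strong = sex_ratio_is_significant(50, 10, 10, 50)
```

When the test does not reject independence, the sex ratio for that diagnosis and age group would be reported as zero, as described in the text.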

See all papers in *April 2015* that mention age group.

See all papers in *PLOS Comp. Biol.* that mention age group.

Appears in 7 sentences as: p-Value (2) p-value (5)

In *Quantification of Diabetes Comorbidity Risks across Life Using Nation-Wide Big Claims Data*

- If the null hypothesis of statistical independence of these two variables cannot be rejected in a chi-squared test using a p-value of p = 0.05, the sex ratio is set to zero, SR(x,t) = 0. (Page 4, “Data”)
- The p-value for each lead and lag indicator is the probability of obtaining the observed values for $I_{\mathrm{lead}}(d_i,x)$ and $I_{\mathrm{lag}}(d_i,x)$ from the surrogate data. (Page 5, “Data”)
- Lead/lag behavior is identified for male and female DM1 and DM2 patients if the null hypothesis that the observed indicator values for $I_{\mathrm{lead}}(d_i,x)$ and $I_{\mathrm{lag}}(d_i,x)$ can be obtained from randomized surrogate data can be rejected with a p-value of p < 0.01. (Page 7, “Results/Discussion”)
- In the enlarged sample only one of the 123 comorbidities identified in the inpatient sample has a p-value greater than 0.05 (M23); all others remain significant (p<0.05). (Page 13, “Further comorbidities”)
- For the age groups with the smallest p-value the relative risks RR, patient ages, and the corresponding p-values are shown for DM1 and DM2, respectively. (Page 14, “Supporting Information”)
- Comorbidity data for DM1 patients, the relative risks RR1, the confidence intervals for RR1, if applicable the p-value for the co-occurrence analysis, and the sex ratio for each diagnosis and age group. (Page 14, “Supporting Information”)
- Comorbidity data for DM2 patients, the relative risks RR2, the confidence intervals for RR2, if applicable the p-value for the co-occurrence analysis, and the sex ratio for each diagnosis and age group. (Page 14, “Supporting Information”)
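
One plausible reading of the surrogate-data p-value for a lead or lag indicator is an empirical tail probability. The sketch below assumes the indicator has already been evaluated on each randomized surrogate data set; how the surrogates are generated and how the indicator is defined are not specified here, so both are left outside the function. The one-sided comparison and the add-one correction are assumptions of this sketch, not details taken from the paper.

```python
def surrogate_p_value(observed, surrogate_values):
    """Empirical one-sided p-value of an observed indicator value.

    `surrogate_values` holds the same indicator computed on randomized
    surrogate data sets (hypothetical input). The add-one correction keeps
    the estimate strictly positive for a finite number of surrogates.
    """
    if not surrogate_values:
        raise ValueError("at least one surrogate value is required")
    at_least_as_extreme = sum(1 for s in surrogate_values if s >= observed)
    return (at_least_as_extreme + 1) / (len(surrogate_values) + 1)


def shows_lead_lag(observed, surrogate_values, threshold=0.01):
    """True if the surrogate null hypothesis is rejected at p < threshold."""
    return surrogate_p_value(observed, surrogate_values) < threshold
```

With this estimator, rejecting at p < 0.01 requires at least 100 surrogate values, since the smallest attainable p-value is 1/(N+1) for N surrogates.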

Appears in 5 sentences as: discovery rate (5)

In *Quantification of Diabetes Comorbidity Risks across Life Using Nation-Wide Big Claims Data*

- To correct for these multiple comparisons we apply the Benjamini-Hochberg procedure [17] to control for the false discovery rate α. (Page 3, “Data”)
- For example, if 100 comorbidities are identified with a false discovery rate of α = 0.01, the expected number of false positives among these comorbidities is one. (Page 3, “Data”)
- We will therefore be interested in the recall R(α) as a function of the false discovery rate α. R(α) is the probability that a diabetic comorbidity listed in Table 1 is also identified by our co-occurrence analysis at a given level of α. (Page 4, “Data”)
- Each diagnosis where the null hypothesis of statistical independence with either DM1 or DM2 can be rejected with a given value of the false discovery rate in at least one of the age groups is identified as a comorbidity. (Page 6, “Results/Discussion”)
- A false discovery rate of α = 0.001 gives a list of 75 significant comorbidities and a recall of R(α = 0.001) = 0.59. (Page 6, “Results/Discussion”)
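
The Benjamini-Hochberg step-up procedure and the recall measure described above can be sketched as follows. This is a generic textbook implementation under stated assumptions, not the authors' code: the function names are hypothetical, and the p-values and diagnosis codes in any usage are placeholders. For each age group, the list of p-values would hold one entry per tested diagnosis (1 051 in the study).

```python
def benjamini_hochberg(p_values, alpha):
    """Return a list of booleans marking hypotheses rejected at FDR level alpha.

    Step-up rule: sort the m p-values, find the largest rank k with
    p_(k) <= k * alpha / m, and reject the k smallest p-values.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank * alpha / m:
            k_max = rank
    rejected = [False] * m
    for idx in order[:k_max]:
        rejected[idx] = True
    return rejected


def recall(identified, reference):
    """Fraction of reference comorbidities that were also identified."""
    reference = set(reference)
    if not reference:
        raise ValueError("reference list must not be empty")
    return len(reference & set(identified)) / len(reference)
```

Because the rule is step-up, a hypothesis whose own p-value exceeds its rank threshold is still rejected when a larger p-value further down the sorted list meets its threshold. The expected number of false positives among the rejections is roughly the number of rejections times the chosen level, which matches the example of one false positive among 100 comorbidities at a level of 0.01.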

Appears in 3 sentences as: confidence interval (1) confidence intervals (2)

In *Quantification of Diabetes Comorbidity Risks across Life Using Nation-Wide Big Claims Data*

- In the following we refer to these values whenever referring to the relative risks of a diagnosis with a 95% confidence interval (CI). (Page 7, “Results/Discussion”)
- Comorbidity data for DM1 patients, the relative risks RR1, the confidence intervals for RR1, if applicable the p-value for the co-occurrence analysis, and the sex ratio for each diagnosis and age group. (Page 14, “Supporting Information”)
- Comorbidity data for DM2 patients, the relative risks RR2, the confidence intervals for RR2, if applicable the p-value for the co-occurrence analysis, and the sex ratio for each diagnosis and age group. (Page 14, “Supporting Information”)
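
A relative risk with a 95% confidence interval can be computed from four counts, as sketched below. The excerpts above do not state which interval construction was used, so this example assumes the common log-normal (Katz) approximation; the function name, argument names, and the counts in the test values are hypothetical.

```python
import math


def relative_risk_ci(cases_dm, n_dm, cases_ref, n_ref, z=1.96):
    """Relative risk of a diagnosis in diabetics versus a reference group.

    cases_dm / n_dm:   patients with the diagnosis among diabetics
    cases_ref / n_ref: patients with the diagnosis among the reference group
    Returns (rr, lower, upper) using the log-normal (Katz) approximation,
    which is an assumption of this sketch.
    """
    if min(cases_dm, cases_ref) <= 0 or cases_dm > n_dm or cases_ref > n_ref:
        raise ValueError("counts must be positive and not exceed group sizes")
    rr = (cases_dm / n_dm) / (cases_ref / n_ref)
    # Standard error of log(RR).
    se = math.sqrt(1 / cases_dm - 1 / n_dm + 1 / cases_ref - 1 / n_ref)
    lower = math.exp(math.log(rr) - z * se)
    upper = math.exp(math.log(rr) + z * se)
    return rr, lower, upper
```

An interval that excludes 1 indicates an increased (lower bound above 1) or decreased (upper bound below 1) risk at the chosen confidence level.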
