Genome-wide characterization of 54 urinary metabolites reveals molecular impact of kidney function

0
Genome-wide characterization of 54 urinary metabolites reveals molecular impact of kidney function

Genome-wide association study identified 54 associations with urinary metabolites

We performed GWAS of 54 urinary metabolites in 3 cohorts followed by a meta-analysis (Methods). The analysis included in total of 8011 individuals from the Finnish Diabetic Nephropathy Study (FinnDiane, n = 3244, including 1632 men and 1612 women)9,10, Generation Scotland (GS, n = 2743, including 1373 men and 1370 women)11, and the VIKING study (VIKING, n = 2024, including 811 men and 1213 women)12 (Fig. 1, Supplementary Table 1). We measured 54 urinary metabolites with the Nightingale Health urine NMR platform (Supplementary Data 1 and 2). The metabolites were quantified in absolute concentrations and normalized with urinary creatinine concentration prior to analysis.

Altogether 27 of the 54 metabolites showed significant evidence of heritability ranging from 6% to 36% (p < 0.05), with the highest estimates obtained for urinary citrate (36%, p = 2.3  ×  10−20), 3-aminoisobutyrate (33%, p = 1.7 × 10−11), and tyrosine (29%, p = 4.9 × 10−12) concentrations (Supplementary Data 3, Supplementary Fig. 1, Methods). Only six metabolites showed evidence of between study heterogeneity in the heritability estimates (p < 0.05/54 = 9 × 10−4), but none of the 27 metabolites with significant heritability estimates, supporting that there are no major differences in the genetic architecture of these metabolites between the studies.

The GWAS meta-analysis results were filtered to variants found in at least 2 out of 3 cohorts and with a minor allele frequency (MAF) ≥ 1%. We identified 26 chromosomal regions harbouring associations with at least one metabolite amongst the 54 metabolites meta-analysed across the three cohorts (p < 9.3×10−10; Fig. 2, Methods). In total, the regions contained 34 associations with 19 unique urinary metabolites and three of the 26 regions showed evidence of pleiotropy: the loci on chromosomes 5p15.33 and 17q12 associated with 5 and 4 amino acids respectively, and a locus on chromosome 7p21.1 associated with quinic acid and trigonelline. The GWAS results for glucose were spurious and hard to interpret, and are therefore not reported.

Fig. 2: Manhattan plot of signals with p < 5.0×10−5 for the metabolites.
figure 2

Signals from different metabolites are clumped together if they are within 50 kb from another signal. Pruned genome-wide significant signals with p < 5 × 10−8/54 = 9.3 × 10−10 and variants 1 Mb around them are highlighted with two alternating colours. Note: y-axis clipped at 60. P-values calculated with METAL applying inverse variance weighted method and genomic control correction for the individual study level results. All p-values are two-sided.

We performed conditional and joint (COJO) multiple-SNV analysis13 to pinpoint independent signals within these loci and found 54 significant associations (p < 9.3 × 10−10) (Supplementary Fig. 2 and Supplementary Data 4, Methods). In total, 6 metabolites had multiple signals in the same locus, notably, a region on chromosome 5p13.2 in or near the AGXT2 gene had 15 associations with 3-aminoisobutyrate. Nine of the 54 lead COJO signals showed evidence of between-study heterogeneity. These included 5 secondary signals for the 3-aminoisobutyrate locus on chromosome 5p13.2, with the strongest associations obtained from the FinnDiane study with individuals with T1D. Of note, for four of these secondary signals, the COJO estimate – conditional on other signals within the same locus – was in the opposite direction to the simple single-variant meta-analysis estimate, emphasizing the complexity of this locus with 15 detected independent signals (Supplementary Fig. 3). Furthermore, the glycine association in the GM2A gene showed a trend in the opposite direction in the FinnDiane study compared to the two general population studies.

Of the 54 COJO associations, 33 signals were not reported in the GWAS Catalogue for urinary or blood metabolite traits (“novel signals”, Table 1), whereas 21 associations were previously reported for relevant urinary and/or blood metabolite traits (Supplementary Table 2). These novel signals included 11 associations with 3-aminoisobutyrate, 4 associations with glycine, 2 associations with 3-hydroxyisovalerate, 4-deoxyeryhronic acid, threonine, and xylose, and finally, single associations with 3-hydroxyisobutyrate, 4-deoxythreonate, citrate, ethanolamine, formate, propylene glycol, quinic acid, trigonelline, tryptophan, and tyrosine (Table 1).

Table 1 Variants associated with urinary metabolites (two-sided p < 9.3×10−10) with no previously reported associations in the GWAS catalogue with the same metabolite in blood or urine (window size = -/+500 kb, r2 > 0.8, and p < 5×10−8)

We tested replication of the 33 novel signals for 16 metabolites based on two previous urinary metabolite studies4,7. We found evidence of replication (p < 0.05/24 = 0.0021 and concordant effect direction) for 16 of the 24 signals that were present in the replication data (Supplementary Data 5). Of note, 9 of the associations had a genome-wide significant p-value (p < 5 × 10−8) for replication, even though they were not reported in the GWAS catalogue; these were either due to i) a more stringent significance threshold required in the original study (e.g., rs2472479 3-hydroxyisobutyrate signal from Schlosser et al. 2023 study) ii) some of the older findings not being reported in the GWAS catalogue (e.g., rs62313082 associated with ethanolamine in Raffler et al. 2015); or iii) many of the secondary independent signals having been filtered out from the GWAS catalogue results (e.g. multiple of the 3-aminoisobutyrate signals on chromosome 5 and replicated in Schlosser et al. 2023).

As we analysed urinary metabolite to creatinine ratios instead of pure metabolite concentrations in order to account for the variation in the urinary concentration and volume, we additionally tested whether the lead metabolite signals would be driven by the denominator (urinary creatinine) instead of the actual metabolite level (nominator). However, none of the COJO lead variants were significantly associated with urinary creatinine (p > 0.05/54 for all; Supplementary Fig. 4), supporting that the observed signals are mostly driven by the urinary metabolite levels.

Five of the 54 lead variants were missense variants, two representing previously unknown metabolite associations. rs11567842 (SLC13A2 p.Ile599Leu) was associated with urinary citrate concentration and has previously been associated with blood urea nitrogen concentration14. SLC13A2 encodes Solute Carrier Family 13 Member 2, which is a kidney sodium-coupled citrate transporter15. We have previously shown that urinary citrate concentration is associated with progression of DKD2. The citrate-associated rs11567842 was nominally associated with multiple DKD phenotypes (p = 0.03-0.001)16, although these genetic associations did not remain after correction for multiple testing. At a locus on chromosome 5p13.2, including 15 independent signals associated with 3-aminoisobutyrate, three of the lead variants were missense variants but in three different genes: rs37369 (AGXT2 p.Val140Ile), rs2308957 (RAD1 p.Gly114Asp), and rs138373837 (NADK2 p.Arg211His). The rs37369 (AGXT2 p.Val140Ile) variant has previously been associated with urinary and plasma 3-aminoisobutyrate levels17,18 whereas rs2308957 (RAD1 p.Gly114Asp) and rs138373837 (NADK2 p.Arg211His) had not. Other lead variants in the region were eQTLs for AGXT2 or both AGXT2 and RAD1 in the kidney. AGXT2 encodes a mitochondrial alanine–glyoxylate aminotransferase 2, expressed particularly in kidney and liver in the human protein atlas, and is the biologically most plausible gene underlying the association signal, as it converts (R)-3-aminoisobutyric acid and pyruvate to 2-methyl-3-oxopropanoate and alanine (Reactome ID: R-HSA-909780.2). In single-cell RNAseq of human kidneys, the gene is expressed specifically in the proximal convoluted tubules (Supplementary Fig. 5)19. Finally, rs1047891 (CPS1 p.Thr1406Asn) has previously been associated with 266 traits including the currently observed glycine association in plasma20, as well as with estimated glomerular filtration rate (eGFR)21.

Of note, all the identified missense variants were predicted to be tolerated or benign by SIFT and PolyPhen.2 algorithms, but they may be sufficient to cause subtle changes in the protein function seen as altered urinary metabolite concentrations.

eQTL data in kidney and whole blood highlights membrane transport proteins

Considering that most of the identified variants were non-coding, we next used available gene expression quantitative trait loci (eQTLs) data to help assess whether the identified variants influence the expression of nearby genes (Methods). As the kidneys play an important role in urine production and filtration, we first investigated the regulatory effect of the identified variants on gene expression in kidneys. We did this by querying our lead variants in Human Kidney eQTL Atlas22, containing kidney eQTL data from 686 micro-dissected human kidney samples (whole kidney, and dissected into tubular and glomerular tissues). Half of the identified variants for urinary metabolites (26 of 54) were cis-eQTLs, i.e., associated with gene expression of a nearby gene (1 Mb), i.e., an eGene, in either kidney tubules, glomeruli, and/or whole kidney (p < 5.3 × 10−4, Table 2, Supplementary Data 6) influencing the expression of 30 genes. The kidney eQTL data provided a likely candidate, for e.g., for rs62313082 (upstream of the RCC2P8 gene) associated with urinary ethanolamine. This variant was a kidney eQTL for ETNPPL (p = 1.2 × 10−34, Table 2), which catalyses breakdown of phosphoethanolamine23, and thus, represents a plausible gene underlying the metabolite association.

Table 2 Expression quantitative trait loci (eQTL, two-sided p < 5.3 × 10−4) target genes in kidney (K), glomeruli (G) and tubule (T) for the COJO lead signals, and target genes of whole blood eQTL signals colocalizing (PP > 0.5) with COJO lead signals

We further extended the eQTL look-ups to eQTLGen whole blood data24, allowing us to also investigate the colocalization of the eQTL association with the metabolite association (Methods). Although the eQTLGen data may miss eQTLs for genes expressed only in specific tissues, such as kidneys, the larger sample size of eQTLgen (n = 31,684) also enables the detection of weaker signals for general eQTL associations. In total, 25 of the 54 lead variants that associated with the urinary metabolites were cis-eQTLs influencing the expression of 106 nearby genes in whole blood (p < 5.5 ×10−5; Supplementary Data 6). Eleven of these blood eQTLs were also kidney eQTLs for the same genes; e.g. the 3-hydroxyisobutyrate- associated variant (rs2472479) located upstream of NIPSNAP3B was both a kidney eQTL and a blood eQTL for NIPSNAP3A, encoding NipSnap homolog proteins 3B and 3A with putative roles in vesicular transport25. However, we found that only one third of the eQTLs associations colocalized with our urinary metabolite-associations (34 of 106, posterior probability (PP) > 0.5); 24 of these resulted from eQTL signals for 6 genes colocalizing with 4 amino acid signals in a single region on chromosome 17q12 (Table 2).

Altogether nine eQTLs that colocalized with the metabolite-association were not detected in the kidney eQTL datasets. These included the variant rs62565993, a strong eQTL for GLDC in whole blood (p = 2.4×10−79), colocalizing with the glycine-association (PP = 0.98). Given that GLDC encodes glycine decarboxylase, a component of the glycine cleavage system catalysing the degradation of glycine26, it is a plausible underlying gene for the urinary glycine association. However, for the 1Methylnicotinamide-associated variant rs17322446, the closest gene (ACMSD) may be more plausible due to shared pathways with 1Methylnicotinamide27, although kidney eQTL and blood eQTL colocalizations indicated other genes (Table 2).

Many of the eQTL-associated genes, particularly in kidneys, were members of the SLC (Solute Carrier) family, which encode molecules facilitating the transport of metabolites across cell membranes (Table 2). For example, rs10788884, which represents a new association for xylose, was a strong kidney eQTL for SLC5A9 (p = 10-52), encoding a sodium-dependent glucose transporter (SGLT4) expressed in the intestine and the kidneys28. Although it primarily functions in the transport of mannose, 1,5-anhydro-D-glucitol, and fructose28 it is plausible that it could also be involved in the transport of other sugars like xylose. In addition, two previously unreported urinary tyrosine- and tryptophan-associated variants (rs7704882 and rs7704058, in full LD in the European population: r2 = 1) were strong kidney eQTLs (Table 2) for SLC6A18. However, the association signal for urinary tyrosine around rs7704882 showed evidence of colocalization with the SLC6A19 kidney eQTL signal (Fig. 3 and Supplementary Fig. 6), which indeed encodes a known neutral amino acid transporter for e.g., tyrosine, whereas the nearby paralogue SLC6A18 seems to be more specific for glycine29. Of note, tyrosine was one of the amino acids associated with progression of DKD to kidney failure in our previous observational study2. SLC19A19 was also indicated as the target eGene for the already reported metabolite4,6,30 and eGFR-associated31 variant, rs11133665 (Table 3), located 37 kb away. The other solute carrier genes revealed by the eQTL data were the SLC16A10 gene (expression affected by urinary tyrosine-associated rs241768) encoding a known transporter of tyrosine and other metabolites32. The urinary 3-aminoisobutyrate-associated rs2080403 was a kidney and blood eQTL for SLC6A13, which it also colocalized with (PP = 0.82; Table 2; Supplementary Fig. 7) suggesting 3-aminoisobutyrate as an unknown substrate of SLC6A13 as previously hypothesized4,33. In addition, the three variants that associated with glycine on chromosome 5q33.1 included two intronic variants, rs61067578 and rs147000073, in the SLC36A2 gene encoding a proton-coupled amino acid transporter involved in the reabsorption of small amino acids such as glycine, proline, and alanine in the proximal tubules of the kidneys34.

Fig. 3: Regional association with tyrosine for lead variants rs11133665 and rs7704882 on chromosome 5.
figure 3

Upper panel shows LocusZoom plot centred around the previously known rs11133665 variant, and the previously unreported signal at rs7704882 independently associated with tyrosine. P-values were calculated with METAL applying inverse variance weighted method and genomic control correction for the individual study level results. The middle panel shows kidney eQTL associations for SLC6A18 and SLC6A19 overlaid on top of the tyrosine association signals, highlighting lead variants rs11133665 (eQTL for SLC6A19 in kidney) and rs7704882 (eQTL for SLC6A18 in the kidney)22. The lower panel contains the protein-coding genes in the region with exons highlighted. The r2 values with rs11133665 are calculated based on 1000 Genomes phase 3 European population. Variants with no r2 information are not shown. P-values from two-sided tests.

Table 3 Metabolite lead variant associations with eGFR and CKD in the CKDGen meta-analysis31, and DKD phenotypes in DNCRI-SUMMIT16 meta-analysis

Gene set, pathway and tissue enrichment analyses

To gain insight into the relevant tissues and molecular pathways underlying the urinary metabolite concentrations, we performed two different types of gene set enrichment analyses (Methods). As a first approach, we used MAGMA gene set analysis that first annotates all variants, without any p-value threshold, to the underlying or flanking genes and evaluates the gene-level significance. MAGMA tissue expression analysis identified a positive relationship between the highly expressed genes in adipose tissue and cis-Aconitate genetic associations (p = 6.9 × 10−5); as well as between kidney and glycine (p = 1.5 × 10−4) and pituitary gland and pyroglutamate (Supplementary Table 3); confirmatory with some prior studies.

MAGMA gene set enrichment analysis identified nine significantly enriched gene sets (p < 3 × 10−6, Supplementary Data 7). After the strongest enrichment between tyrosine and the positional chr5p15 breast cancer locus, the second strongest enrichment was obtained between threonine and “tachykinin receptors bind tachykinins” pathway (p = 1.1 × 10−7). Of note, the five tachykinin and their receptor genes are all located in different chromosomes, thus representing a true genome-wide enrichment. Tachykinins are neuropeptides derived from alternate processing of the three tachykinin genes. They are expressed throughout the nervous and immunological system, participate in a variety of physiological processes, and contribute to multiple disease processes, including acute and chronic inflammation and pain, fibrosis, affective and addictive disorders, functional disorders of the intestine and urinary bladder, infection, and cancer35. Other significant gene sets included enrichment between 4-deoxyerythronic acid and pyruvate family amino acid metabolic process genes, 3-hydroxyisovalerate and uronic acid metabolic process genes, glycolic acid and eukaryotic translation initiation factor 3 complex proteins, tryptophan and genes involved in APC/C:Cdc20 mediated degradation of Cyclin B, and 4-deoxythreonate and genes involved in neuron intrinsic apoptotic signaling pathway in response to oxidative stress.

While MAGMA uses a straightforward approach to annotate the associated variants to the underlying or flanking genes, the gene affected by the GWAS signal is not always the closest one, and we therefore also utilized the FUMA gene set enrichment analysis as a complementary approach. We included only variants reaching a p < 1 × 10−5, but utilised eQTL associations in addition to the positional mapping of variants to genes. Altogether 26 of the 54 metabolites were found to have significant (p < 0.05) associations with gene set pathways following FUMA analysis (Supplementary Data 8). The metabolites 3-aminoisobutyrate, histidine, threonine, tryptophan, tyrosine and valine were associated with eGFR, and 4-deoxyerythronic acid was associated with urate concentrations, suggesting a role in kidney health.

Breast cancer was found to be most abundant with related pathways significantly associated with 12 of the metabolites, followed by asthma-related pathways which were significantly associated with 11 metabolites. Of note, the cancer gene sets typically represent single chromosomal loci with a gene cluster, rather than genome-wide enrichment. The amino acid biomarkers histidine, threonine, tryptophan, tyrosine, and valine showed very similar results being significantly associated with the same pathways. These include inflammatory bowel disease, atrial fibrillation, rheumatoid arthritis, menopause, systemic lupus erythematosus and polycystic ovary syndrome. This is due to a GWAS hit on chromosome 17q12, which is present in all 5 amino acids and thus driving most of the associations with the gene set pathways.

Kidney health causally affects urinary metabolites

As urinary metabolites may reflect kidney health, we investigated whether the identified variants are also associated with kidney disease traits in the general population (CKDGen meta-analysis)31 and in individuals with diabetes (DNCRI-SUMMIT meta-analysis)16 (Methods). Indeed, seven of the variants were genome-wide significantly (p < 5 × 10−8) associated with eGFR, a main measure to monitor kidney health, and 4 variants with chronic kidney disease (CKD) (p < 3.6 × 10−4) in the general population. Four variants were also nominally (p < 0.05) associated with DKD or kidney failure in diabetes (Table 3).

As multiple urinary metabolite-associated variants were also associated with kidney disease traits we tested whether kidney health causally affects urinary metabolite concentrations. We performed two-sample Mendelian randomization analysis using two kidney function markers, eGFR and urinary albumin-creatinine ratio (UACR), as the exposures, and metabolite concentrations as the outcomes (Supplementary Table 4, Methods). MR assumes that the genetic variants (instrumental variables) are associated with the exposure, are not associated with any confounders of the exposure-outcome relationship and impact the outcome only through the exposure and not via independent pleiotropic pathways. To avoid violation of the relevance assumption underlying MR, we used a genetic instrument for eGFR, composed of 150 independent (r2 = 0.001) genome-wide significant SNVs identified in the CKDGen GWAS meta-analysis31. Genetically instrumented eGFR was associated (p < 4.7 × 10−4) with the urinary metabolite concentrations of 4 amino acids (alanine, glutamine, leucine, and valine), as well as 9 other metabolites: 2-hydroxyisobutyrate, 3-hydroxyisovalerate, ethanolamine, formate, glycine, glycolic acid, pseudouridine, pyroglutamate, and uracil (Supplementary Data 9, Supplementary Fig. 8). This suggests a causal association of glomerular filtration rate on these urinary metabolites. For all the metabolites, higher eGFR was associated with higher metabolite concentration in the urine. Next, we used complementary MR methods (median- and mode-based methods as well as MR Egger), with differing underlying assumptions, to assess the robustness of our findings with respect to pleiotropy. The causal estimates were directionally consistent across different MR analysis methods, suggesting no significant pleiotropy, for most of the metabolites. Moderate heterogeneity between the variant-specific causal estimates (30-32%, Supplementary Data 9), were detected for valine and alanine. However, the Egger intercept did not significantly deviate from zero (p > 0.05, Supplementary Data 9). Furthermore, eGFR remained associated (p < 0.05) with glycolic acid, 3-hydroxyisovalerate, and pseudouridine even with the MR Egger method which is more robust against pleiotropic effects. Finally, as some genetic loci associated with eGFR may reflect creatinine metabolism rather than kidney function, we performed sensitivity analyses by filtering the eGFR loci to those additionally associated (p < 0.05) with two alternative measurements of kidney function, namely cystatine C based eGFR21, and blood urea nitrogen31; all detected eGFR-to-metabolite associations remained significant in one or both of these filtered analyses, supporting that the observed causal pathways belong to the kidney function rather than relating to the creatinine metabolism (Supplementary Data 10).

As eGFR causally affected multiple metabolites we tested whether some of the metabolite GWAS associations might be driven by association with eGFR. However, adjusting the analysis with eGFR in the FinnDiane study did not markedly affect the effect size estimates for the 53 COJO lead variants included in the FinnDiane GWAS data set (Supplementary Fig. 9A).

Urinary metabolites potentially causally linked to kidney function and body mass index

We also performed two-sample Mendelian Randomization to test whether urinary metabolites are causal risk factors or reflect causal biological processes leading to CKD and other chronic diseases (Supplementary Table 4, Methods). The analysis suggested that higher urinary 3-hydroxyhippurate, quinic acid and trigonelline concentrations are causally associated with higher body mass index (BMI), higher eGFR (i.e., reflecting better kidney health), and contradictorily, also with higher UACR (i.e., reflecting worse kidney health; Table 4). All these three metabolites are found in coffee; their instrumental variables (IVs) rs2106727 and rs6968554 are located in the AHR locus, and are in strong LD with rs4410790 that has been previously associated with caffeine intake (p = 2.0 × 10−249)36, suggesting that the metabolite associations observed with Mendelian Randomization may reflect the underlying association with coffee consumption. Indeed, in line with our observation of the three metabolites associated with higher eGFR, a previous Mendelian Randomization study suggested that coffee consumption has a beneficial effect on kidney function and albuminuria37. Why the three urinary metabolites were simultaneously associated with higher UACR in our data remains unclear. For BMI, previous MR studies have found contradictory evidence regarding the causality between coffee consumption and BMI or obesity38,39. One challenge may be the reverse causality; in our study, BMI was not causally affecting the three metabolites, or any other urinary metabolite (Supplementary Data 11). In general, the urinary metabolites may provide a more exact estimate of the coffee intake than self-reported data on coffee consumption, and thus, our findings add to the previous evidence of a BMI-increasing effect on coffee consumption. However, we note that the AHR variants rs2106727 and rs6968554 are also associated with other traits such as blood lipid concentrations in the GWAS catalog (p < 5 × 10−8), indicating potential pleiotropic effects.

Table 4 Two sample Mendelian randomization analysis results with p < 0.05 / 468 = 1.1 × 10−4 using urinary metabolites as the exposures for outcomes from IEU GWAS database, CKDGen meta-GWAS, DNCRI meta-GWAS, and DIAMANTE meta-GWAS

In addition, the genetic instrument for urinary ethanolamine, was associated with higher eGFR (p = 6.1 × 10−8); of note, this association was not evident when using cystatin C based eGFR (p = 0.016) and BUN (p = 0.18) as alternative outcome measurements of kidney function (Supplementary Data 12). The genetic instrument was based on the rs62313082 variant which is also a kidney eQTL for the ETNPPL gene (Tables 2 and 4) but has not been associated with other traits in the GWAS catalogue (Supplementary Table 2). Moreover, the MR analysis suggested that the genetic instruments for urinary 1-methylnicotinamide (p = 2.3 × 10−5) and 4-deoxythreonate (p = 7.6 × 10−5) were associated with higher body mass index (BMI; Table 4). These were genetically instrumented by rs78470967 and rs181558 (4-deoxythreonate), and rs17322446 (1-methylnicotinamide), for which we found a few reported associations in the GWAS catalogue (rs17322446; Quinolinate levels, walking pace and FEV1, rs181558; type 2 diabetes and height). Of note, as these MR findings were based on only one or two genetic variants, complementary MR methods more robust to pleiotropy could not be implemented.

link

Leave a Reply

Your email address will not be published. Required fields are marked *