Associations between dietary patterns and gene expression pattern in peripheral blood mononuclear cells: a cross-sectional study

Background: Diet may alter gene expression in immune cells involved in cardio-metabolic disease susceptibility. However, we still lack a robust understanding of the association between diet and immune cell-related gene expression in humans. Objective: Our objective was to examine the associations between dietary patterns (DPs) and gene expression profiles in peripheral blood mononuclear cells (PBMCs) in a population of healthy, Norwegian adults. Methods: We used factor analysis to define a posteriori DPs from food frequency questionnaire-based dietary assessment data. In addition, we derived interpretable features from microarray-based gene expression data (13 967 transcripts) using two algorithms: CIBERSORT for estimation of cell subtype proportions, and weighted gene co-expression network analysis (WGCNA) for cluster discovery. Finally, we associated DPs with either CIBERSORT-predicted PBMC leukocyte distribution or WGCNA gene clusters using linear regression models. All analyses were gender-stratified (n = 130 women and 105 men). Results: We detected three DPs that broadly reflected Western, Vegetarian, and Low carb diets. The CIBERSORT-predicted percentage of monocytes associated strongly and negatively with the Vegetarian DP in both women and men. For women, the Vegetarian DP associated most strongly with a large gene cluster consisting of 600 genes mainly involved in regulation of DNA transcription. For men, the Western DP associated most strongly, and inversely, with a smaller cluster of 36 genes mainly involved in regulation of metabolic and inflammatory processes. In subsequent protein-protein interaction network analysis, the most important driver genes within these WGCNA gene clusters seemed to physically interact in biological networks. Conclusions: DPs may affect the percentage of monocytes and the regulation of key biological processes within the PBMC pool. Although the present findings are exploratory, our analysis pipeline serves as a useful framework for studying the association between diet and gene expression.


Introduction
Cardio-metabolic diseases are the main causes of death worldwide (1). They are mainly caused by life-long exposure to classical risk factors such as obesity, hypertension, dyslipidemia and dysglycemia (2). Diet affects these risk factors and thereby contributes to the rate of disease progression (3). Diet can also influence gene expression in immune cells directly, and so potentially affect cardio-metabolic disease susceptibility (4-6). However, we still lack a thorough understanding of the association between diet and immune cell-related gene expression.
Free-living humans consume a variety of foods in combination. To capture this variation meaningfully, we often define so-called dietary patterns (DPs). A posteriori DPs are data-driven; they are defined based on the co-consumption of foods in the population under study (7). [...] influence the transcriptome of the PBMCs (9).
Many previous studies in humans that have associated diet with PBMC gene expression have used a classical gene expression-wide association (gxWA) strategy (10,11). The underlying correlation structure of the transcriptome, however, provides an opportunity to improve upon gxWA methods. Biologically relevant dimensionality reduction algorithms, such as CIBERSORT and weighted gene co-expression network analysis (WGCNA), simplify whole-genome gene expression matrices into interpretable features (12,13). These methods also increase the signal-to-noise ratio and thereby the robustness of the features, while they simultaneously reduce the multiple testing burden (14).
The objective of the present study was to examine the associations between a posteriori-defined DPs and derived gene expression features in PBMCs in a population of healthy, Norwegian adults. We hypothesized that DPs would associate with PBMC gene expression, and that the associations would point to specific biological mechanisms that potentially mediate the effects of diet on cardio-metabolic diseases.
The copyright holder for this preprint (which was not peer-reviewed) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. All rights reserved. No reuse allowed without permission. medRxiv preprint doi: https://doi.org/10.1101/2020.01.25.20018465

Study design and participants
The present study is based on cross-sectional data from the screening visit of a randomized controlled dietary intervention, presented in detail elsewhere (15). We included all participants from whom we had both dietary assessment data and PBMC gene expression data, in addition to standard clinical and biochemical measurements. After excluding four participants with self-reported energy intake above 25 MJ/d, we included 235 participants in the analyses (n = 130 women, n = 105 men).
The subject characteristics are presented in Table 1. Briefly, the men were younger than the women, but had a less healthy body composition and associated clinical sequelae. Both genders had moderate hypercholesterolemia.

Data types
We used a food-frequency questionnaire (FFQ) to assess habitual food intake from the preceding year (16). From the original 323 food items, we removed 41 items due to unclear interpretation, and grouped the remaining 282 into 33 food groups, based on food category and nutrient content (Table 2). Self-reported intakes of foods and nutrients are presented in Supplementary Table 1, Supplementary Table 2 and Supplementary Table 3. Furthermore, we collected PBMCs and extracted RNA according to standardized protocols, as previously described (6).
See Supplemental Methods for an extended description of the data types.

Statistical and bioinformatics analyses
Here we describe the statistical and bioinformatic analyses related to DPs, gene expression clusters, and statistical modeling. All analyses were performed in R version 3.6.2 (17). We refer to R packages and functions where appropriate, using the following notation: package::function. Important deviations from default function settings are written in parentheses.
The flow of the analysis pipeline is outlined in Supplementary Figure 1. Women and men were analyzed separately, as preliminary analyses suggested a strong gender-related signal in both the DPs and the gene expression dataset.

Dietary patterns
We used a combination of principal component analysis (PCA) and factor analysis to determine DPs. Factor analysis is a dimensionality reduction method similar to PCA, but it results in more interpretable features. However, because factor analysis is informed by the same covariance matrix as PCA, we used PCA-derived component variances (stats::prcomp) to determine a meaningful number of factors to retain; the results are presented in Supplementary Figure 2.
For both genders, the eigenvalue-one criterion suggested around 12 principal components (PCs), but there was little change between components from component 3-5 and outwards; the scree test suggested around 3-5 components; the per-component variance explained was about 7-17 % for the first three components, and then stabilized at 4-5 % at 4-6 components, with little change thereafter. We decided to extract three [...]
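The retention criteria described above can be illustrated with a short sketch. It is written in Python rather than the R used in the study, and it is not the authors' code. It assumes the component variances (eigenvalues of the correlation matrix) have already been computed; the eigenvalues below are hypothetical values chosen for illustration and are not the study's results.

```python
def kaiser_count(eigenvalues):
    """Number of components with eigenvalue above 1 (eigenvalue-one criterion)."""
    return sum(1 for ev in eigenvalues if ev > 1.0)


def variance_explained(eigenvalues):
    """Proportion of the total variance explained by each component."""
    total = sum(eigenvalues)
    return [ev / total for ev in eigenvalues]


def scree_drops(eigenvalues):
    """Drop in eigenvalue between successive components.

    A scree "elbow" lies where these drops become small and roughly constant.
    """
    return [a - b for a, b in zip(eigenvalues, eigenvalues[1:])]


# Hypothetical eigenvalues for 8 standardized variables (they sum to 8).
example_eigenvalues = [2.4, 1.6, 1.2, 0.9, 0.7, 0.5, 0.4, 0.3]

n_kaiser = kaiser_count(example_eigenvalues)
proportions = variance_explained(example_eigenvalues)
drops = scree_drops(example_eigenvalues)
```

The three criteria need not agree, as the text reports; the final number of factors is then a judgment that weighs them against interpretability.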

Gene expression features
Two main mechanisms are central in studies of diet-related associations with cardio-metabolic disease mechanisms in PBMCs: dietary effects on leukocyte subset distributions, and biological modulation independent of leukocyte subset distribution. We therefore performed analyses to examine each of these aspects, as outlined in the upper right corner of Supplementary Figure 1.

Leukocyte subsets
We used CIBERSORT to perform in silico flow cytometry (13). This method uses support vector regression to conduct robust deconvolution of a heterogeneous cell population, and returns predicted relative levels of various cell subsets. We used the raw, untransformed, whole-genome gene expression data matrix as input. Although the algorithm provides 22 leukocyte subsets, we retained the cell types most relevant for the PBMC population, mainly monocyte and lymphocyte subsets, which left 12 cell subsets (Supplementary Figure 3). Note that although we had standard blood cell differential counts available, CIBERSORT resulted in a richer set of PBMC cell subsets unique to the gene expression profile of each sample.

Gene expression clusters
We used WGCNA to identify highly correlated ("co-expressed") clusters of genes (18). The WGCNA package (CRAN, Bioconductor) provides a well-established and popular framework to perform the WGCNA analysis (12). The details of the implementation can be found in (12); in Supplemental Methods we give a brief outline of key steps in the WGCNA-based gene expression cluster analysis pipeline.
To avoid confounding by sex chromosomes and to aid compatibility with the dietary data, we performed the analysis separately for women and men. We examined the stability and validity of the resulting gene expression clusters between genders with module preservation statistics (19). Also, to avoid confounding by cell types, we adjusted for measured percentage monocytes and lymphocytes (standard differential counts), and extracted the residuals. Therefore, the input for this analysis was the residuals for the complete gene expression matrix (p = 13967 variables), after removing the main effect of monocytes and lymphocytes.
In order to highlight a few of the more important genes within interesting clusters, we performed a driver gene analysis. First, we calculated cluster membership, which is defined as the absolute correlation between gene expression and cluster eigengene, and can be interpreted as the degree to which each gene contributes to that cluster's overall behavior and variation. Secondly, we calculated DP significance, which is the absolute correlation between gene expression and DP score. A positive correlation between cluster membership and DP significance indicates that the genes that drive the variation in the cluster eigengene are the same genes that drive the association with the specific DP (driver genes). Finally, to rank driver genes, driver gene estimates were calculated as the sum of the cluster membership and DP significance.
In order to describe relevant gene expression clusters biologically, we performed gene ontology (GO) enrichment using the GO Consortium database (20,21). Also, to link statistical findings with [...]
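The driver gene ranking defined above (cluster membership plus DP significance, both absolute Pearson correlations) can be sketched in a few lines. This is a Python illustration under stated assumptions, not the authors' R implementation: the function names are hypothetical, the eigengene is taken as given, and the toy values are invented for illustration only.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def driver_scores(expression, eigengene, dp_score):
    """Driver gene estimate per gene: cluster membership + DP significance.

    expression maps a gene name to its per-sample expression values.
    """
    scores = {}
    for gene, values in expression.items():
        membership = abs(pearson(values, eigengene))    # cluster membership
        significance = abs(pearson(values, dp_score))   # DP significance
        scores[gene] = membership + significance
    return scores


def rank_drivers(scores):
    """Gene names ordered from highest to lowest driver gene estimate."""
    return sorted(scores, key=scores.get, reverse=True)


# Toy data for five samples (hypothetical values, not study data).
eigengene = [1.0, 2.0, 3.0, 4.0, 5.0]
dp_score = [1.0, 3.0, 2.0, 5.0, 4.0]
expression = {
    "gene_a": [2.0, 4.0, 6.0, 8.0, 10.0],
    "gene_b": [3.0, 1.0, 3.0, 1.0, 2.0],
}

scores = driver_scores(expression, eigengene, dp_score)
ranking = rank_drivers(scores)
```

Because both terms are absolute correlations, each driver gene estimate lies between 0 and 2, and the sign of the association with the DP has to be read from the signed correlation separately.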

Linear models
We associated DPs with two types of outcomes using linear models: CIBERSORT-predicted cell counts, and the eigengenes from the gene expression clusters (Supplementary Figure 1). Supplementary Figure 4 shows the directed acyclic graphs (DAGs) used in model development. We used the open-access dagitty.net/dags web resource to evaluate these relationships. The minimal sufficient adjustment set for estimating the total effect of dietary pattern on gene expression was age and education (three levels: lower, middle, higher). For predicted cell counts, we additionally adjusted for adiposity (total fat mass, measured by bioelectrical impedance analysis) in sensitivity analyses. Also, in sensitivity analyses for the gene expression clusters, we estimated the direct effect (see Supplemental Methods).
Note that for all models, technical covariates were considered in upstream batch correction (Supplementary Figure 1). Monocytes and lymphocytes as percentages of the total leukocyte count (which make up the pool of PBMC subsets) were adjusted for in the gene expression pre-processing pipeline (as shown in Supplementary Figure 1), prior to WGCNA only. Finally, to aid interpretation of the results, we standardized (base::scale) both DP scores and cluster eigengenes to mean = 0 and sd = 1 before modeling.
Miscellaneous
The dietary intervention study was powered to detect a significant change in LDL-C (15); however, this does not apply to the present exploratory study. Because this study is exploratory, we did not evaluate associations by standard significance level cut-offs. Instead, we evaluated the strength and direction of associations, and their interrelations.
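As an illustration of the linear models described under Linear models, the following Python sketch standardizes a DP score and a cluster eigengene and regresses one on the other with adjustment for age and education (two dummy variables for three levels). It is a minimal sketch under stated assumptions, not the authors' R code: all data are simulated toy values, the variable names are hypothetical, and the least-squares fit uses the normal equations for brevity.

```python
import statistics


def standardize(values):
    """Center to mean 0 and scale to sample SD 1 (analogous to base::scale in R)."""
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)
    return [(v - mean) / sd for v in values]


def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def ols(y, predictors):
    """Least-squares coefficients [intercept, b1, b2, ...] via the normal equations."""
    rows = [[1.0] + list(p) for p in zip(*predictors)]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)


# Simulated toy data for eight participants (hypothetical, not study data).
dp_raw = [0.5, 1.2, -0.3, 2.0, -1.1, 0.8, -0.6, 1.5]
age = [34.0, 51.0, 45.0, 62.0, 29.0, 57.0, 40.0, 48.0]
edu_middle = [1.0, 0.0, 0.0, 1.0, 0.0, 0.0, 1.0, 0.0]  # lower education is the reference
edu_higher = [0.0, 1.0, 0.0, 0.0, 1.0, 0.0, 0.0, 1.0]

dp_z = standardize(dp_raw)
# The toy eigengene is constructed to depend on the DP score and age only.
toy_eigengene_raw = [2.0 + 0.4 * d + 0.05 * a for d, a in zip(dp_z, age)]
eigengene_z = standardize(toy_eigengene_raw)

# coefs[1] is the change in eigengene (in SD) per 1 SD higher DP score,
# adjusted for age and education.
coefs = ols(eigengene_z, [dp_z, age, edu_middle, edu_higher])
```

With both the DP score and the eigengene on an SD scale, coefficients are directly comparable across DPs and clusters, which is the interpretive benefit the text refers to.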

Dietary patterns
First, we constructed gender-specific DPs from self-reported FFQ data, yielding three DPs [...] there was some overlap, the Vegetarian and Low carbohydrate DPs were more unique to each gender compared to the Western DP. This was also supported by the DP loadings for various foods (Supplementary Figure 5). For both genders, the Western DP associated with intake of meat and eggs, fast food, snacks, dairy, and fiber-poor carbohydrate sources. The Vegetarian DP associated positively with a number of foods perceived as healthy, including plant foods, whole grains, nuts and seeds, and tea. Additionally, the association with animal products, fast food, dairy, and fiber-poor carbohydrate foods was low or negative. For women, the association with high-fat dairy and snacks was slightly positive. The Low carbohydrate DP was generally a mixture of the two former, reflected in positive associations for both plants and animal products. The association with fast food, snacks and carbohydrate-rich foods, however, was negative. Wine associated positively, whereas sweet beverages associated negatively with the Low carbohydrate DP for women and men, respectively.
In addition to the direct link with food intake, the DP scores correlated with both macronutrient intake (Supplementary Figure 6) and clinical variables (Supplementary Figure 7). The Western DP correlated positively with energy intake and negatively with fiber intake in both genders. The Vegetarian DP correlated positively with fiber and negatively with saturated fat intake in men. In women, the Vegetarian DP correlated weakly, but positively, with energy, healthy fats, fiber and sugar. The Low carbohydrate DP was negatively associated with carbohydrate and sugar intake in both genders, and positively associated with protein and fat intake in men.
For the clinical variables, the negative association between Western DP and age was most notable, which indicates that the younger part of the study sample adhered to a less healthy diet. Additionally, the Vegetarian DP associated negatively with multiple obesity-related markers, including immune cells and CRP. Again, the Low carbohydrate DP was a mixture of the two, with positive correlations for age and lipids.

Leukocyte subsets
We used the CIBERSORT algorithm to computationally estimate the distribution of 12 leukocyte subsets (13). As expected, predicted leukocyte cell proportions associated with multiple clinical variables, most notably the differential count measures and obesity-related measures (Supplementary Figure 8).

Gene expression clusters
Using the WGCNA algorithm, we detected 45 and 37 unique gene expression clusters for women and men, respectively, which by default were named after colors (12). Although there were large differences in cluster size (range = 67-307 and 85-438 genes for women and men, respectively), most clusters explained a large proportion of the variance of the genes they comprised (range = 32-39 and 33-40 % for women and men, respectively) (Supplementary Figure 9A and B, and Supplementary Table 5). For men, explained variance inversely associated with cluster size (Supplementary Figure 9C). In addition, genes in all clusters were generally distributed over all chromosomes, with certain exceptions, such as chromosomes 1 and 19 (Supplementary Figure 9D). The gene expression clusters displayed some correlation within each gender, but they could largely be considered unique features (Supplementary Figure 10). Between genders, the module preservation was acceptable for most medium- and large-sized [...]

Dietary patterns and gene expression clusters
In general, relatively few associations were evident between DP scores and gene expression cluster eigengenes (Figure 3). For women, the positive association between the Vegetarian DP and the yellow cluster was strongest. The yellow cluster contained 600 genes involved in regulation of transcription (Supplementary Figure 16). For men, the Western DP associated with multiple clusters, of which the association with the darkmagenta cluster was strongest. This cluster contained 36 genes related to metabolic and inflammatory processes, including sterol/cholesterol transport (Supplementary Figure 16). Similarly, both the pink and greenyellow clusters associated negatively with the Western DP, although not as strongly as darkmagenta. [...] Table 6.

Identification of driver genes
Next, we examined the most relevant gene expression clusters in more detail, using a driver gene analysis to identify genes with both high DP significance and high cluster membership. Interestingly, DP significance and cluster membership associated strongly (Figure 4A and B, and Supplementary Table 7), which suggests that genes that associated with DPs were also among the most important parts of the clusters that associated with that DP.
The five top driver genes for the association between the Vegetarian DP and the yellow cluster in women were GIMAP7 (GTPase, IMAP family member 7), ZNF200 (zinc finger protein 200), LCMT2 (leucine carboxyl methyltransferase 2), GPR18 (G protein-coupled receptor 18), and ASTE1 (asteroid homolog 1) (Figure 4A). Proteins from these genes regulate aspects of biosynthetic processes, including cell signaling, DNA transcription and repair, and protein synthesis (20,21). For these genes, the correlation coefficients with DP2 score were in the range 0.19 to 0.26 (P = 0.03 to 0.003), and with the cluster eigengene in the range 0.83 to 0.90 (P < 0.001) (Supplementary Table 7). This means that women who consumed a Vegetarian DP tended to have higher expression of these genes in PBMCs.
The five top driver genes for the association between the Western DP and darkmagenta cluster [...] Table 7). This means that men who consumed a Western DP tended to have lower expression of these genes in PBMCs.

Identification of hub proteins
Finally, to examine if these driver genes were part of physically interacting biological networks, we filtered them through the PINA database (22). For the strongest associations for each gender, we then created protein-protein interaction (PPI) networks (Figure 4C and D, and Supplementary Table 8). Highly connected proteins in these networks can be considered hub proteins; they likely exert a higher degree of control over the protein network, as more proteins physically interact with a hub in order to influence signaling pathways. For women and men, key hub proteins included PPARGC1B (PPARG coactivator 1 beta) and UBC (ubiquitin C), respectively.
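One common way to make the notion of a hub concrete is to rank proteins by their number of distinct interaction partners (node degree). The Python sketch below illustrates this under stated assumptions: the text does not specify how hubs were scored, so degree is an assumption here, and the protein labels and interactions are hypothetical placeholders rather than entries from the PINA database.

```python
def degree_ranking(edges):
    """Rank nodes of an undirected interaction network by number of distinct partners.

    Returns (node, degree) pairs sorted by decreasing degree, then by name.
    """
    neighbors = {}
    for a, b in edges:
        if a == b:
            continue  # ignore self-interactions
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)
    degrees = {node: len(partners) for node, partners in neighbors.items()}
    return sorted(degrees.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical interactions between placeholder proteins P1..P5.
edges = [
    ("P1", "P2"),
    ("P1", "P3"),
    ("P1", "P4"),
    ("P2", "P3"),
    ("P4", "P5"),
    ("P3", "P1"),  # duplicate of P1-P3 in reverse order; counted once
]

ranking = degree_ranking(edges)
top_hub, top_degree = ranking[0]
```

In this toy network P1 is the hub, with three distinct partners; using sets of neighbors keeps duplicated or reversed edges from inflating the degree.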
Discussion
In the present study of 235 Norwegian adults, we detected novel associations between DPs and gene expression features in PBMCs. Our results suggest that diet affects a number of specific cell types and pathways, of which the most pronounced are: predicted proportion of monocytes, regulation of transcription, and regulation of metabolic and inflammatory processes.

We detected three DPs commonly consumed in Norway
Using data-driven analyses, we detected three DPs commonly consumed in Norway: Western-type, Vegetarian-type, and Low carbohydrate-type DP (Figure 1). These DPs were not unexpected: Norwegian adults follow trends, and this includes the vegetarian and low carbohydrate trends. In previous studies, similar names have been used to characterize the detected DPs. In a cohort of Norwegian postmenopausal women, Markussen and co-workers found four DPs, including the Western and Vegetarian DPs (23). In addition, they found a High-protein pattern that resembled our Low carbohydrate pattern. Their DPs, similar to ours, share characteristics, and therefore also names, with DPs throughout Europe and the US. This emphasizes an important point: although DPs retained in factor analyses, unlike those defined by a priori methods, are never exactly equal across studies, our three DPs share characteristics with many other DPs both in Norway and elsewhere (7, 23-25).
The three DPs associated with food items, nutrient intake and clinical parameters to give a consistent picture of the DPs: in general, the Low carbohydrate DP appeared neutral compared to the Western-type and Vegetarian-type DPs, which associated with a number of unhealthy or healthy behaviors, respectively.

Vegetarian DP associated with monocytes
The Vegetarian DP associated with CIBERSORT-predicted levels of monocytes (Figure 2), suggesting that gene expression related to monocyte differentiation and activity may be affected by diet. These results are corroborated by previous reports by others and us (26-29). Craddock and co-workers recently reviewed the evidence that vegetarian diets affect inflammatory and immune biomarkers, concluding that vegetarian diets associate with lower CRP, fibrinogen, and total leukocyte concentrations (26). Similarly, Eichelmann and co-workers showed that plant-based diets cause reductions in obesity-related inflammatory biomarkers such as CRP, IL6 and sICAM (27). Indeed, our observed association between diet and monocyte level might be related to the degree of obesity in the population; however, we found only a slight attenuation of the association for women when adjusting for adiposity. In previous work, we have shown that both diet and risk factors may affect PBMC leukocyte distribution (28,29). We found that plasma omega 6 fatty acid level, as a marker of dietary intake of omega 6 fatty acids, associated with predicted leukocyte distribution (28). Vegetarian diets tend to have a high content of vegetable oils, which may also have affected our present results. Similarly, we recently showed that children with familial hypercholesterolemia displayed an altered leukocyte distribution (29).
Most studies that examine the association between diet and immune cells use a modest number of established biomarkers, such as standard differential count or protein biomarkers. In the current analysis, however, we used approximately 14 000 mRNA transcripts from PBMCs, potentially making it a more sensitive test of associations with immune cell type distribution specifically, and inflammation in general (13). Additionally, our finding is important since it adds to the evidence that cell type distribution in cell mixtures can influence the association between diet and gene expression. This must be taken into account when interpreting PBMC gene expression results.

DPs associated with few gene expression clusters
Few WGCNA-based gene expression clusters were associated with DPs after correcting for variation in monocyte and lymphocyte numbers (Figure 3). This indicates that most of the co-variation between diet and gene expression in PBMCs relates to leukocyte cell type distribution. Nevertheless, in women, the Vegetarian DP associated most strongly with a cluster of genes involved in regulation of transcription, and in men, the Western DP associated most strongly with a cluster of genes related to metabolic and inflammatory processes, including sterol/cholesterol transport.
In previous reports, dietary intake of a healthy Nordic diet or omega-3 associated with expression of genes related to mitochondrial function, cell cycle, endoplasmic reticulum stress, apoptosis, and inflammatory processes (30-33). Regulation of transcription is a similarly nonspecific term. Although highly nonspecific, regulation of transcription may be a process related to age-related global or pathway-specific DNA methylation and gene expression (34-36). Sterol/cholesterol transport, on the other hand, is a highly specific biological process that is dramatically affected by diet and that affects disease risk (36,37). Plasma LDL-C is mainly determined by cellular sterol status and the functionality and activity of the LDL receptor; this in turn is a key determinant of disease risk (37). However, although cholesterol metabolism in liver and monocytes is tightly coupled and similarly regulated, our observed association likely results from molecular events occurring within the pool of PBMCs as they deal with cholesterol-related metabolic challenges. Nevertheless, PBMC expression of genes related to sterol/cholesterol transport could prove a robust marker of dietary variation (10).
Interestingly, the second and third most significant clusters that associated with the Western DP in men contained genes related to other metabolic processes, such as UDP-GlcNAc and acyl carnitine metabolism. While UDP-GlcNAc is involved in cellular glucose sensing, acyl carnitines are involved in fatty acid transport into the mitochondria (38,39). Indeed, in previous work, we found that plasma levels of acyl carnitines of specific chain lengths may be directly altered by changes in fatty acid quality of the diet (6). Taken together, these may be processes particularly sensitive to variation in dietary intake.

We identified top driver genes and hub proteins
Finally, the WGCNA cluster analysis detected top driver genes that have been shown to physically interact in protein-protein interaction networks (Figure 4) (22). This is an important finding, as it provides further biological meaning to the statistical associations, and strengthens our belief that the top driver genes may be more than just spurious associations (40). The network analysis highlighted a few hub proteins that may act as central communicators within each cluster, such as UBC and PPARGC1B. The UBC protein is a key cell signaling molecule, especially related to ubiquitination, cytokine signaling, toll-like receptors, and nuclear factor kappa B (NFkB); in mouse models, knock-down of the ubiquitin system protects against diet-induced obesity (41). Furthermore, PPARGC1B is a transcriptional co-regulator involved in a number of biological processes, including thermogenesis, bone turnover and regulation of energy expenditure by fat and glucose oxidation. For example, Yin and co-workers recently showed that PPARGC1B affects PPAR alpha to protect against cardiomyopathy (42).

Strengths and limitations
To the best of our knowledge, no previous study has used CIBERSORT and WGCNA to study molecular associations with DPs. We believe these dimension reduction algorithms may be well suited to examine diet-related effects on PBMC cell type distribution using sensitive gene expression data. Although we have taken steps to minimize the probability of chance findings, we cannot rule this out completely. Our study sample is also relatively small compared with, for example, that of Lin and co-workers (40). In addition to a higher risk of false positive findings, low sample size also increases the risk of false negative findings, for example for the associations between DPs and WGCNA gene clusters. Furthermore, our study sample represents a highly selected part of the Norwegian population, limiting the generalizability of our results. In line with this limitation, our results should not be overinterpreted. Pathways related to metabolic regulation could potentially be biomarkers of dietary intake, and also potentially predict future risk (40,43). However, more prediction research is needed before this can be realized.

Conclusions
In conclusion, we detected novel associations between DPs and gene expression features in PBMCs. Our results suggest that DPs may affect monocyte proportions and regulation of biological processes, such as regulation of transcription and metabolic and inflammatory processes. Although the present findings are exploratory, our analysis pipeline serves as a useful framework for future studies on the association between diet and gene expression. More research is needed before our results can be translated into clinically meaningful biomarkers.
... between cluster membership and DP significance for the strongest DP and gene expression cluster associations, for each gender. Cluster membership is defined as the absolute correlation between gene expression and cluster eigengene, and can be interpreted as the degree to which each gene belongs to that cluster and contributes to its variation. DP significance is the absolute correlation between gene expression and DP score. A positive correlation between cluster membership and DP significance indicates that the genes that drive the variation in the cluster eigengene are the same genes that associate with the specific DP (driver genes). Finally, to rank driver genes, driver gene estimates were calculated as the sum of the cluster membership and DP significance. The darker the color, the higher the driver gene estimate; the top five genes driving this association are annotated. Note the strong positive correlations for both comparisons, as is also evident from the linear regression line. C) and D) show networks of protein-protein interactions (PPI) for the same DP and gene expression cluster associations as above. Each network was created from the top 20 driver genes identified by the driver gene plot. The figures display hub proteins that are of particular interest to the gene regulatory network.
Abbreviations: DP, dietary pattern (see Supplementary Table 7 and Supplementary Table 8 for all abbreviations).
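The driver gene ranking described in the caption can be illustrated with a minimal Python sketch. It is not the analysis code used in this study: the function name `driver_gene_estimates`, the toy data, and the choice to compute the cluster eigengene as the first principal component of the standardized expression matrix (the usual WGCNA definition) are assumptions made for illustration only.

```python
import numpy as np


def driver_gene_estimates(expr, dp_score):
    """Rank genes in one cluster by cluster membership plus DP significance.

    expr: array of shape (n_samples, n_genes), expression of the cluster's genes.
    dp_score: array of shape (n_samples,), dietary pattern score per participant.
    Hypothetical helper for illustration; not taken from the study's pipeline.
    """
    # Standardize each gene across samples.
    z = (expr - expr.mean(axis=0)) / expr.std(axis=0, ddof=1)

    # Assumption: cluster eigengene = first principal component of the
    # standardized expression matrix (first left singular vector).
    u, _, _ = np.linalg.svd(z, full_matrices=False)
    eigengene = u[:, 0]

    def abs_corr(x, y):
        # Absolute Pearson correlation between two vectors.
        return abs(np.corrcoef(x, y)[0, 1])

    n_genes = z.shape[1]
    # Cluster membership: |correlation(gene expression, cluster eigengene)|.
    membership = np.array([abs_corr(z[:, j], eigengene) for j in range(n_genes)])
    # DP significance: |correlation(gene expression, DP score)|.
    significance = np.array([abs_corr(z[:, j], dp_score) for j in range(n_genes)])
    # Driver gene estimate: sum of the two quantities.
    estimate = membership + significance
    return membership, significance, estimate


# Toy data (simulated, not study data): genes load on one latent factor
# that is also related to the DP score.
rng = np.random.default_rng(0)
n_samples, n_genes = 60, 8
latent = rng.normal(size=n_samples)
loadings = np.linspace(1.0, 0.1, n_genes)
expr = latent[:, None] * loadings + rng.normal(scale=0.5, size=(n_samples, n_genes))
dp_score = latent + rng.normal(scale=0.5, size=n_samples)

membership, significance, estimate = driver_gene_estimates(expr, dp_score)
top_five = np.argsort(estimate)[::-1][:5]  # indices of the five highest estimates
print("Top driver gene indices:", top_five)
print("Correlation of membership with DP significance:",
      np.corrcoef(membership, significance)[0, 1])
```

Because both quantities are absolute correlations, each lies between 0 and 1 and the driver gene estimate lies between 0 and 2; the sign of the eigengene, which is arbitrary in a principal component decomposition, does not affect the ranking.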
The copyright holder for this preprint (which was not peer-reviewed) is the . https://doi.org/10.1101/2020.01. 25