1. Background
The coronavirus disease 2019 (COVID-19) pandemic has emerged as a major global health crisis, resulting in substantial morbidity and mortality worldwide. Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) infection causes not only viral pneumonia and acute respiratory distress syndrome but also severe systemic complications, including multiorgan failure, thrombotic events, acute kidney injury, myocarditis, and endothelial dysfunction, collectively contributing to poor clinical outcomes (1, 2).
Genome-wide association studies (GWAS) have identified the chromosome 3p21.31 locus as a critical genetic determinant of COVID-19 severity, harboring a Neanderthal-derived risk haplotype associated with approximately twofold increased respiratory failure risk and heightened mortality in younger individuals (3). This genetic variant shows marked population heterogeneity, present in over 60% of South Asian ancestry individuals compared to approximately 15% of European populations (4). Within this locus, the leucine zipper transcription factor-like 1 (LZTFL1) gene has emerged as a significant contributor to COVID-19 pathogenesis, influencing epithelial and immune responses and contributing to disease severity variability (5).
Oxidative stress, representing an imbalance between reactive oxygen species (ROS) and antioxidant defenses, is a fundamental driver of COVID-19 pathophysiology (6). Excessive ROS generation during severe infection promotes tissue damage and endothelial dysfunction and amplifies systemic inflammation, creating a self-perpetuating cycle in which thrombotic processes and inflammatory mediators further stimulate ROS production (7). Strong associations exist between oxidative stress biomarkers and systemic inflammatory markers, with prolonged oxidative stress triggering inflammasome activation implicated in disease severity (8).
Despite accumulating evidence, significant gaps remain in our understanding. Most genetic studies have focused on limited sentinel variants, particularly rs11385942, predominantly in European and Latin American populations, yielding inconsistent findings across ethnic groups (9, 10). The relationship between LZTFL1 polymorphisms and biochemical manifestations — including oxidative stress markers, inflammatory parameters, coagulation abnormalities, and radiologic findings — remains incompletely characterized. Underrepresented populations from the Middle East and North Africa possess distinct genetic architectures and disease phenotypes, yet have been largely excluded from susceptibility studies (11).
The Iraqi population is significantly underrepresented in worldwide COVID-19 genetic databases, such as the COVID-19 Host Genetics Initiative (COVID-19 HGI), which primarily compiles data from European, East Asian, and North American cohorts. Middle Eastern communities display unique genetic structures influenced by their specific ancestral histories and may possess population-specific allele frequencies that significantly diverge from those of extensively researched groups.
2. Objectives
The main aim of this work is to present the allele frequencies of rs530118201 and rs72876529 in an Iraqi COVID-19 cohort to address this significant data deficiency. Despite the lack of statistically significant disease associations, which is attributable to a restricted sample size, recording population-specific allele frequencies yields crucial data for future large-scale meta-analyses and mitigates publication bias by incorporating negative findings into the literature. This approach adheres to the principles of transparent research, wherein null results are as important as positive ones in improving the global understanding of genetic susceptibility among varied populations.
3. Methods
3.1. Study Population and Ethics
This case-control study involved 50 COVID-19 patients from Al-Marjan Hospital in Iraq during the pandemic's second wave, classified by disease severity. Although patients were assessed for severity, the small sample size limited subgroup analyses. Participants provided informed consent, and the study was approved by relevant Iraqi authorities, adhering to ethical standards for research involving vulnerable populations. Retrospective serological testing for past infections was not conducted, but current SARS-CoV-2 infections were excluded (12).
A convenience sample of 30 healthy volunteers without a recorded history of COVID-19 was selected as a control group. This design carries a significant methodological limitation: the controls were considerably younger (mean age 27.33 ± 8.26 years) than the patients (51.66 ± 18.49 years; P = 0.001), a 24-year age disparity that undermines biochemical comparisons because inflammatory markers, coagulation factors, and immune parameters vary with age. Controls were retained for the genetic analysis because allele frequencies for common single-nucleotide polymorphisms (SNPs), those with a minor allele frequency exceeding 1%, remain fairly stable across adult age groups, provided there is no significant selection pressure or survival bias. Nonetheless, we cannot dismiss cohort effects or population stratification. In retrospect, this pilot investigation should have reported patient allele frequencies only, without a control group, offering population-specific data for future meta-analyses instead of performing case-control association tests with insufficient power (n = 80, power < 20% for moderate effects).
3.2. Sample Collection and Laboratory Analysis
Blood samples and clinical data were collected from participants under sterile conditions for laboratory processing. Whole blood samples were collected for DNA extraction. Demographic and clinical data [age, Body Mass Index (BMI), computed tomography (CT) scan scores] were recorded for the patient cohort (13).
3.3. DNA Extraction and Genotyping
Genomic DNA was extracted from peripheral blood leukocytes using a commercial kit, with quality assessed via NanoDrop spectrophotometry. Only samples with 260/280 ratios of 1.8 - 2.0 and concentrations ≥50 ng/μL were amplified. The LZTFL1 gene region on chromosome 3p21.31 was amplified using specific primers, producing a 468 base pair product that included polymorphic sites rs530118201 and rs72876529. Polymerase chain reaction products were confirmed through agarose gel electrophoresis before being sent to Macrogen for bidirectional Sanger sequencing. Genotype calling was performed manually by two investigators using Chromas software, accepting only high-quality sequences. All 80 samples (50 patients and 30 controls) produced clear genotype calls, achieving a 100% success rate due to stringent quality control and Sanger sequencing methods (14).
3.4. Statistical Analysis
Descriptive data were expressed as mean ± standard deviation for continuous variables and as percentages for categorical variables. Comparative analysis between COVID-19 patients and control groups was performed using an independent samples t-test, with the significance threshold set at P < 0.05. Genetic association analyses, including single-locus genotype and allele frequency comparisons, Hardy-Weinberg equilibrium testing, haplotype analysis, linkage disequilibrium characterization, and gene-gene interaction analysis, were conducted using SHEsis software.
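As an illustration of the independent samples t-test described above, the following minimal sketch computes the test statistic from summary statistics alone, using the age figures given in Section 3.1 (patients 51.66 ± 18.49 years, n = 50; controls 27.33 ± 8.26 years, n = 30). The text does not state whether the pooled-variance (Student) or unequal-variance (Welch) form was used, so both are shown; the function names are illustrative and not part of any package used in the study.

```python
import math


def student_t_from_summary(mean1, sd1, n1, mean2, sd2, n2):
    """Pooled-variance (Student) t statistic and degrees of freedom."""
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df
    se = math.sqrt(pooled_var * (1.0 / n1 + 1.0 / n2))
    return (mean1 - mean2) / se, df


def welch_t_from_summary(mean1, sd1, n1, mean2, sd2, n2):
    """Unequal-variance (Welch) t statistic and Welch-Satterthwaite df."""
    v1 = sd1 ** 2 / n1
    v2 = sd2 ** 2 / n2
    t = (mean1 - mean2) / math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df


if __name__ == "__main__":
    # Age summary statistics taken from Section 3.1 of the text.
    patients = (51.66, 18.49, 50)
    controls = (27.33, 8.26, 30)
    print("Student:", student_t_from_summary(*patients, *controls))
    print("Welch:  ", welch_t_from_summary(*patients, *controls))
```

Because the two groups differ markedly in both size and standard deviation, the Welch form is often the more cautious choice; either way, the P-value would be read from a t distribution with the returned degrees of freedom.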
4. Results
4.1. Demographic and Biochemical Characteristics
This study explored the effects of the COVID-19 pandemic in Babylon Province, Iraq, focusing on oxidative stress markers, inflammatory biomarkers, and LZTFL1 gene polymorphisms. Two specific SNPs were analyzed for their association with COVID-19 outcomes. The small sample size limited the ability to analyze disease severity. Table 1 presents the descriptive demographics of the COVID-19 patient cohort. The patients had a mean age of 51.66 ± 18.49 years and a mean body mass index of 26.82 ± 5.76 kg/m². Control participants (n = 30) were significantly younger (mean age 27.33 ± 8.26 years). Due to this 24-year age gap, no comparisons of biochemical or inflammatory markers are presented, as age is a major confounder (15).
| Characteristic | COVID-19 Patients |
|---|---|
| Age (y) | 51.66 ± 18.49 |
| BMI (kg/m²) | 26.82 ± 5.76 |
| CT scan severity score | 41.50 ± 30.47 |
Abbreviations: COVID-19, coronavirus disease 2019; BMI, Body Mass Index; CT, computed tomography.
a Values are expressed as mean ± SD.
4.2. Genetic Association Analysis
4.2.1. Single Locus Association
The SNPs rs530118201 and rs72876529 were assessed for association with COVID-19 susceptibility using chi-square and Fisher's exact tests (Figure 1). For rs530118201, no significant difference was observed between genotype frequencies in patients and controls (χ² = 0.407, P = 0.641). The odds ratio (OR) was 2.111 (95% CI: 0.203 - 21.873); however, the wide confidence interval crossing 1.0 and the non-significant P-value indicate no evidence of association in this sample. These null findings are attributable to the limited sample size and low statistical power (Table 2). Similarly, rs72876529 did not demonstrate a significant difference (χ² = 0.892, P = 0.45), with an OR of 2.25 (95% CI: 0.405 - 12.478), which does not provide statistical evidence of association because the confidence interval includes the null value of 1.0. These results indicate that, individually, neither SNP demonstrates a statistically significant association with COVID-19 status in this population (16).
Figure 1. Molecular characterization of leucine zipper transcription factor-like 1 (LZTFL1) gene polymorphisms rs530118201 and rs72876529; A, location of single-nucleotide polymorphisms (SNPs) in the NCBI database; B, whole blood genome electrophoresis; C, polymerase chain reaction amplification products (470 base pairs); and D, DNA sequencing chromatograms of study subjects
| SNP | Counts (Patients/Controls) | OR [95% CI] | MAF (95% CI), Patients | MAF (95% CI), Controls | Fisher's P-Value |
|---|---|---|---|---|---|
| rs530118201 | | 2.111 [0.203 - 21.873] | 0.030 (0.006 - 0.085) | 0.017 (0.001 - 0.091) | 0.641 |
| Genotype counts | | | | | |
| TT | 47/29 | | | | |
| TC | 3/1 | | | | |
| CC | 0/0 | | | | |
| Allele counts | | | | | |
| T | 97/59 | | | | |
| C | 3/1 | | | | |
| rs72876529 | | 2.25 [0.405 - 12.478] | 0.060 (0.022 - 0.126) | 0.033 (0.004 - 0.115) | 0.45 |
| Genotype counts | | | | | |
| CC | 44/28 | | | | |
| CG | 6/2 | | | | |
| GG | 0/0 | | | | |
| Allele counts | | | | | |
| C | 94/58 | | | | |
| G | 6/2 | | | | |
Abbreviation: SNP, single-nucleotide polymorphism.
Wide confidence intervals reflect a limited sample size (n = 80 total) and low minor allele frequencies (MAF). No statistically significant associations were detected (P > 0.05 for both SNPs).
4.2.2. Genotyping Quality Assurance
All genotyping was performed using Sanger sequencing, which is considered the gold standard for accuracy and reliability in SNP detection. The 100% genotyping call rate (80/80 samples successfully genotyped for both SNPs) reflects the technical advantages of Sanger sequencing over array-based methods, particularly when working with high-quality DNA from fresh blood samples and a limited number of target SNPs. Manual verification of all chromatograms by two independent reviewers ensured data integrity. Representative chromatograms for each genotype (homozygous wild-type, heterozygous, and homozygous variant) are shown in Figure 1D.
4.2.3. Hardy-Weinberg Equilibrium
Hardy-Weinberg equilibrium was evaluated for both SNPs in patients and controls independently (Table 3). Fisher's exact tests produced P-values over 0.70 for both SNPs across all groups, signifying no statistically significant deviation from Hardy-Weinberg equilibrium. Nonetheless, these tests are severely underpowered due to the limited sample sizes (n = 50 cases, n = 30 controls) and very low MAF (0.017 - 0.060). Only 1 - 6 heterozygotes were seen per group, and no homozygous variant individuals were present, so the data are compatible with anything from complete equilibrium to substantial disequilibrium. The P = 1.000 values for rs530118201 result from χ² statistics approaching zero (χ² < 0.04), indicating genotype distributions that closely match theoretical Hardy-Weinberg expectations by chance; nonetheless, this should not be construed as conclusive proof of equilibrium. With samples this small, it is not feasible to distinguish true Hardy-Weinberg equilibrium from deviations attributable to genotyping errors, population substructure, or selection pressures. Subsequent research, including at least 500 individuals in each group, would yield sufficient power to rigorously evaluate Hardy-Weinberg equilibrium.
| SNP / Group | Genotype Counts (Observed) | χ² | Fisher's P-Value | Interpretation |
|---|---|---|---|---|
| rs530118201 | | | | |
| Patients | | 0.035 | 1.000 | No deviation detected |
| TT | 47 | | | |
| TC | 3 | | | |
| CC | 0 | | | |
| Controls | | 0.002 | 1.000 | No deviation detected |
| TT | 29 | | | |
| TC | 1 | | | |
| CC | 0 | | | |
| rs72876529 | | | | |
| Patients | | 0.337 | 0.705 | No deviation detected |
| CC | 44 | | | |
| CG | 6 | | | |
| GG | 0 | | | |
| Controls | | 0.023 | 0.999 | No deviation detected |
| CC | 28 | | | |
| CG | 2 | | | |
| GG | 0 | | | |
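To make concrete why Hardy-Weinberg testing is so uninformative at these allele frequencies, the sketch below computes the genotype counts expected under equilibrium from observed counts. It is a minimal illustration, not the procedure implemented in the analysis software; the function name and the tuple layout (homozygous reference, heterozygous, homozygous variant) are assumptions made for this example.

```python
def hwe_expected_counts(n_hom_ref, n_het, n_hom_var):
    """Expected genotype counts under Hardy-Weinberg equilibrium.

    Returns (hom_ref, het, hom_var) expected counts, using the minor
    allele frequency estimated from the observed genotypes.
    """
    n = n_hom_ref + n_het + n_hom_var
    q = (n_het + 2 * n_hom_var) / (2 * n)  # variant allele frequency
    p = 1.0 - q
    return n * p * p, 2 * n * p * q, n * q * q


if __name__ == "__main__":
    # Patient genotype counts as reported in the genotype tables.
    observed = {
        "rs530118201 (TT, TC, CC)": (47, 3, 0),
        "rs72876529 (CC, CG, GG)": (44, 6, 0),
    }
    for label, counts in observed.items():
        expected = hwe_expected_counts(*counts)
        print(label, "observed:", counts,
              "expected:", tuple(round(e, 3) for e in expected))
```

In both patient groups the expected number of homozygous variant individuals is well below one, so observing none is exactly what equilibrium predicts and also what many departures from it would produce.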
4.3. Primary Contribution: Population-Specific Allele Frequency Data
This study presents the first allele frequency estimates for LZTFL1 polymorphisms rs530118201 and rs72876529 within an Iraqi population. The documented MAFs were as follows: rs530118201 C allele, 0.030 in patients (95% CI: 0.006 - 0.085) and 0.017 in controls (95% CI: 0.001 - 0.091); rs72876529 G allele, 0.060 in patients (95% CI: 0.022 - 0.126) and 0.033 in controls (95% CI: 0.004 - 0.115). These values constitute the first published genetic data for these variants within a Middle Eastern Arab population.
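A minor allele frequency and its confidence interval can be derived directly from allele counts. The sketch below is one way to do this, assuming an exact (Clopper-Pearson) binomial interval; the text does not state which interval method was used, so that choice, along with the function names, is an assumption of this example. It relies only on the Python standard library and finds the interval bounds by bisection on the binomial distribution function.

```python
import math


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, i) * p ** i * (1 - p) ** (n - i)
               for i in range(k + 1))


def _bisect_decreasing(f, target, iterations=100):
    """Solve f(p) = target on [0, 1] for a function decreasing in p."""
    lo, hi = 0.0, 1.0
    for _ in range(iterations):
        mid = (lo + hi) / 2
        if f(mid) > target:
            lo = mid  # f is still too large, so the root lies at larger p
        else:
            hi = mid
    return (lo + hi) / 2


def clopper_pearson(k, n, alpha=0.05):
    """Exact two-sided (1 - alpha) interval for a proportion k / n."""
    lower = 0.0 if k == 0 else _bisect_decreasing(
        lambda p: binom_cdf(k - 1, n, p), 1 - alpha / 2)
    upper = 1.0 if k == n else _bisect_decreasing(
        lambda p: binom_cdf(k, n, p), alpha / 2)
    return lower, upper


def maf_with_ci(minor_allele_count, n_individuals, alpha=0.05):
    """MAF and exact CI; each diploid individual contributes two alleles."""
    n_alleles = 2 * n_individuals
    maf = minor_allele_count / n_alleles
    return maf, clopper_pearson(minor_allele_count, n_alleles, alpha)


if __name__ == "__main__":
    # Patient allele counts: 3 C alleles and 6 G alleles among 50 patients.
    for snp, count in (("rs530118201", 3), ("rs72876529", 6)):
        maf, (lo, hi) = maf_with_ci(count, 50)
        print(f"{snp}: MAF = {maf:.3f}, 95% CI = ({lo:.3f}, {hi:.3f})")
```

The exact interval is conservative, which suits counts this small; approximate methods such as the Wald interval can give lower bounds below zero when only a handful of minor alleles are observed.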
4.4. Comparison with International Databases
There is a paucity of published data regarding these exact SNPs in global databases. The 1000 Genomes Project and gnomAD do not offer Middle Eastern-specific frequencies for rs530118201 and rs72876529. The minor allele frequency values we obtained align with the range documented for other intronic LZTFL1 variations in admixed populations (minor allele frequency 0.01 - 0.10); however, direct comparisons are unfeasible due to the absence of ancestry-matched reference data. This data deficiency highlights the urgent necessity for genetic diversity projects aimed at underrepresented communities in the Middle East and North Africa. The COVID-19 Host Genetics Initiative (COVID-19 HGI), which has compiled data from over 100,000 cases worldwide, features less than 1% representation from Middle East and North Africa populations. Our Iraqi allele frequency data, although derived from a limited sample size, serve as a crucial reference for future multi-ethnic meta-analyses intended to identify population-specific genetic effects or validate null relationships across various ancestries.
4.5. Study Limitations
The study has considerable limitations owing to a small sample size (n = 80), consisting of 50 patients and 30 controls, yielding less than 20% statistical power to identify moderate genetic effects. A post-hoc analysis indicates that a minimum of 2,000 cases is required for sufficient statistical power. The results are therefore inconclusive and cannot distinguish between genuine null effects and Type II errors. Moreover, a significant age disparity (24 years) between the groups introduces confounding, complicating the attribution of variations in biochemical markers to the disease rather than to age. The study lacks thorough demographic and clinical data, including sex distribution and treatment protocols, and does not account for the timing of sample collection in relation to symptom onset.
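The scale of the power problem can be illustrated with a rough calculation. The sketch below approximates the power of a two-sided allelic test using a normal approximation for the difference between two allele frequencies. The odds ratio of 1.5 used to represent a moderate effect, the control MAF of 0.03, and the assumption of equal numbers of cases and controls in the larger scenario are all illustrative choices, not values taken from the study's own post-hoc analysis.

```python
import math


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def allelic_test_power(maf_controls, odds_ratio, n_cases, n_controls,
                       z_alpha=1.959964):
    """Approximate power of a two-sided allelic test at alpha = 0.05.

    Uses a normal approximation, which is crude when expected minor
    allele counts are small, so treat the result as an order of magnitude.
    """
    # Convert the control MAF and the odds ratio into a case MAF.
    odds_cases = maf_controls / (1.0 - maf_controls) * odds_ratio
    maf_cases = odds_cases / (1.0 + odds_cases)
    # Each diploid individual contributes two alleles.
    se = math.sqrt(maf_cases * (1.0 - maf_cases) / (2 * n_cases)
                   + maf_controls * (1.0 - maf_controls) / (2 * n_controls))
    z = abs(maf_cases - maf_controls) / se
    return normal_cdf(z - z_alpha) + normal_cdf(-z - z_alpha)


if __name__ == "__main__":
    # Hypothetical scenario: control MAF 0.03, odds ratio 1.5.
    print("50 cases / 30 controls:    ",
          round(allelic_test_power(0.03, 1.5, 50, 30), 3))
    print("2000 cases / 2000 controls:",
          round(allelic_test_power(0.03, 1.5, 2000, 2000), 3))
```

Under these assumptions the present sample falls far short of the conventional 80% target, while a cohort in the thousands would exceed it, which is consistent in direction with the figures quoted above.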
The biochemical results are of uncertain reliability because the assays were not validated, and the lack of multiple testing adjustments raises doubts regarding the statistical validity of any reported associations. Moreover, the control group lacks serological verification of previous COVID-19 infection, which may distort allele frequencies. Although negative findings are crucial for addressing publication bias, the study's limited scale and broad confidence intervals mean that its contribution to future meta-analyses should be weighted below that of larger investigations.
5. Discussion
This pilot case-control study examines the allele frequencies of LZTFL1 polymorphisms rs530118201 and rs72876529 within an Iraqi COVID-19 cohort, representing the first genetic data for both variants in a Middle Eastern Arab population. The study identified no significant disease associations, which can be ascribed to its limited sample size (n = 80) and substantial statistical underpowering. The principal contribution is the compilation of allele frequency data from Iraqi Arabs for subsequent meta-analyses. Nevertheless, significant drawbacks include insufficient sample size for association analysis, age discrepancies between cases and controls, missing clinical metadata, and unvalidated biochemical assays. Subsequent studies should include larger cohorts (n ≥ 2,000), age-matched controls, genome-wide genotyping, and validated biomarker assays to yield more conclusive insights into the impact of LZTFL1 polymorphisms on COVID-19 susceptibility. Until then, the findings serve as initial benchmarks for population genetics.
