1. Background
2. Objectives
3. Methods
3.1. Drug and Disease Target Dataset Acquisition
3.1.1. Screening of XYS Bioactive Compounds
3.1.2. Prediction and Standardization of XYS-Related Targets
3.1.3. Collection and Standardization of DILI-Related Targets
3.2. Network Construction and Topological Analysis
3.2.1. Identification of Overlapping Targets and Protein-Protein Interaction Network Construction
3.2.2. Visualization and Topological Analysis of the Protein-Protein Interaction Network
3.2.3. Construction of the Herb-Compound-Target Network
3.3. Gene Expression Omnibus Dataset Acquisition and Transcriptome Analysis
3.3.1. GEO Dataset Acquisition
3.3.2. Differential Expression Analysis and Visualization
3.3.3. Integration of Network Pharmacology and Transcriptomic Evidence
3.3.4. Gene Ontology and Kyoto Encyclopedia of Genes and Genomes Enrichment Analysis
3.4. Hub Gene Identification Using Machine Learning
3.5. Molecular Docking
3.5.1. Receptor and Ligand Preparation
3.5.2. Docking Procedure
3.5.3. Binding Mode Visualization
3.6. Cell-Based Experimental Validation
3.6.1. Cell Culture
3.6.2. Preliminary CCK-8 Concentration Screening
3.6.3. APAP-Induced Hepatotoxicity Model
3.6.4. Compound Treatment
3.6.5. Cytotoxicity and Hepatocellular Injury Assays
3.6.6. Reverse Transcription Quantitative PCR Assay
3.6.7. ELISA Assays
3.7. Statistical Analysis
4. Results
4.1. Target Identification and Network Construction
Integrated network pharmacology and transcriptomic analysis for identifying Xiao-Yao-San intervention targets in drug-induced liver injury. (A) Venn diagram showing 121 overlapping genes between XYS target proteins (n = 351) and DILI-associated genes (n = 1134). (B) PPI network of the 121 overlapping target proteins. Node size and color intensity indicate degree centrality, with larger and redder nodes indicating higher degree values and more pivotal roles in the network. (C) Herb-compound-target tripartite network showing relationships between XYS herbs (orange squares), compounds (circles/diamonds), and target proteins (purple diamonds) (Abbreviations: AM, Atractylodes macrocephala Koidz; AS, Angelica sinensis (Oliv.) Diels.; BC, Bupleurum chinense DC.; GU, Glycyrrhiza uralensis Fisch.; MH, Mentha haplocalyx Briq.; PC, Poria cocos (Schw.) Wolf.; PL, Paeonia lactiflora Pall.; and ZO, Zingiber officinale Rosc). (D) Top 10 compounds ranked by degree centrality, with quercetin showing the highest connectivity. (E) Volcano plot of DEGs in the GSE93840 dataset comparing control and DILI groups. Red dots indicate up-regulated genes (log2FC > 2, FDR < 0.05), blue dots indicate down-regulated genes (log2FC < -2, FDR < 0.05), and gray dots represent non-significant genes. (F) Heatmap of the top 30 DEGs between control and DILI groups, with hierarchical clustering. (G) Venn diagram showing 16 overlapping genes between XYS potential targets (121 genes) and DILI-related DEGs (781 genes) from the GEO database. (H) Spearman correlation matrix of the 16 core target genes. Red indicates positive correlations, while blue indicates negative correlations. (I) Top 6 GO terms enriched in the 16 core target genes, showing biological processes, cellular components, and molecular functions. (J) Sixteen KEGG pathways. Dot size represents gene count; color intensity indicates FDR-value significance.
4.2. Differential Expression Analysis and Core Target Gene Identification
4.3. Machine Learning-Based Hub Gene Identification
Machine learning-based hub gene identification. (A) Least absolute shrinkage and selection operator cross-validation error curve with optimal lambda (λ) parameter selection. Numbers above the plot represent the number of non-zero coefficients at each λ value. (B) LASSO coefficient shrinkage paths showing the variable selection process across different λ values. Each colored line represents the trajectory of 1 gene coefficient. Numbers above the plot indicate the number of non-zero coefficients remaining at each λ value. (C) Random Forest out-of-bag error curve across different numbers of decision trees. (D) Random Forest variable importance plot ranking genes by their importance scores. (E) Venn diagram showing 5 overlapping hub genes (CYP1A2, CYP3A4, PDE5A, F3, and ALOX5) identified by both LASSO and Random Forest algorithms.
4.4. Molecular Docking
Molecular docking analysis between top bioactive compounds identified from Xiao-Yao-San and hub proteins associated with drug-induced liver injury. Representative docking conformations with the lowest predicted binding energies are shown for 8 protein-ligand pairs: (A) CYP3A4-beta-sitosterol (-8.66 kcal/mol), (B) CYP1A2-beta-sitosterol (-7.92 kcal/mol), (C) CYP1A2-quercetin (-7.24 kcal/mol), (D) ALOX5-beta-sitosterol (-7.18 kcal/mol), (E) PDE5A-luteolin (-7.11 kcal/mol), (F) PDE5A-quercetin (-7.02 kcal/mol), (G) PDE5A-beta-sitosterol (-6.93 kcal/mol), and (H) CYP3A4-kaempferol (-6.65 kcal/mol). For each panel, the left image shows the overall protein-ligand binding mode with protein structures rendered as light purple cartoons and ligands as sticks. The right image displays an enlarged view of the binding pocket, highlighting key interacting residues (green sticks), ligands (colored sticks), and hydrogen bonds (yellow dashed lines).
4.5. Cell-Based Experimental Validation
Cytoprotective effects of Xiao-Yao-San bioactive compounds in an in vitro drug-induced liver injury model. (A) Cell viability determined by the Cell Counting Kit-8 assay. (B) Relative lactate dehydrogenase release. (C) Relative alanine aminotransferase levels. (D) Relative aspartate aminotransferase levels. Data are presented as mean ± SD from 3 independent biological replicates (n = 3). Bars with different letters indicate significant differences among groups (one-way ANOVA followed by Tukey HSD test, P < 0.05); bars sharing at least 1 letter are not significantly different.
Effects of Xiao-Yao-San bioactive compounds on hub gene expression and related functional markers in the APAP-induced hepatotoxicity model. (A-E) Relative mRNA expression levels of ALOX5, CYP1A2, CYP3A4, F3, and PDE5A. (F-J) Levels of LTB4, CYP1A2, CYP3A4, F3, and cGMP. Data are presented as mean ± SD from 3 independent biological replicates (n = 3). Bars with different letters indicate significant differences among groups (one-way ANOVA followed by Tukey HSD test, P < 0.05); bars sharing at least 1 letter are not significantly different.




