1. Background
Hydatid disease is globally distributed, and the western region of China (Xinjiang, Qinghai, etc.) has the highest incidence, especially in pastoral areas (1). Two species of echinococcosis are common in humans: Echinococcus granulosus, which causes cystic echinococcosis, and multilocularis, which causes alveolar echinococcosis (AE). Among them, AE is extremely harmful and deadly, and is a common parasitic disease that seriously endangers people's health. The primary organ invaded by AE is mainly the liver, and the parasitic tissue continues to proliferate and infiltrate, leading to a gradually enlarged liver lesion. Interactions between parasites and host connective tissue and immune cells can cause fibrosis of the liver (1). Infection by the parasite triggers both innate and acquired immune responses in the host’s liver (2). Alveolar echinococcosis progresses slowly and is similar in appearance to hepatocellular carcinoma (HCC), also known as "worm cancer" and "parasitic liver cancer" (3). The disease is only occasionally diagnosed when symptoms are caused by imaging tests or complications. Misdiagnosis or late diagnosis of AE may lead to serious delay in drug therapy, and there may be risks of apoptosis, autophagy, liver damage (4), liver fibrosis (5), and liver failure (6) caused by inflammation, resulting in early death of patients.
RNA-binding proteins (RBPs) are a class of proteins that bind to RNA during the regulatory and metabolic processes of RNA. RNA-binding proteins can recognize specific RNA binding domains, thereby interacting with RNA and participating in various post-transcriptional regulatory processes such as RNA splicing, transport, polyadenylation, intracellular localization, translation, and degradation (7). In the post-transcriptional regulation of RBPs, alternative splicing (AS) is a very important process, which involves organizing the exons of original gene transcripts (pre-mRNA) in different arrangements to produce mRNA and protein variants with different structure and function (8). Alternative splicing plays a significant role in diseases by making gene expression patterns more complex and transcriptional efficiency higher and promoting protein diversity. Posttranscriptional regulation of gene expression is increasingly recognized as a model for inherited and acquired diseases. Therapies targeting splicing and degradation of RBPs have potential applications in metabolic and immune-mediated liver diseases (9).
In order to search for molecular biological indicators of hepatic alveolar echinococcosis (HAE) in the early detection of HAE, as well as better alternative therapeutic drugs, and provide important clues to the treatment of HAE and prevention of disease progression, this study will analyze the transcriptome sequencing results. Differential regulated alternative splicing gene (RASG), differential alternative splice event (RASE), and RBP-AS regulatory axes have been preliminarily explored in HAE in humans.
2. Methods
2.1. Samples and Collection
Ten HAE patients who met the requirements for surgical resection in Qinghai Provincial People's Hospital were collected. After the surgical specimens were removed from the abdominal cavity, the required tissues were quickly cut and put into RNA-free specimen bags and stored in -80° liquid nitrogen for later experimental use. This study was approved by the Ethics Committee of Qinghai Provincial People's Hospital (2021 - 32), and all subjects signed informed consent.
2.2. RNA Extraction and Sequencing
Total RNA was treated with RQ1 DNase (Promega) to remove DNA. The quality and quantity of the purified RNA were determined by measuring the absorbance at 260nm/280nm (A260/A280) using smartspec plus (BioRad). RNA integrity was further verified by 1.5% agarose gel electrophoresis. For each sample, 1 μg of total RNA was used for RNA-seq library preparation by KAPA stranded mRNA-Seq Kit for Illumina® Platforms. Polyadenylated mRNAs were purified by VAHTS mRNA capture Beads (N401-01) and fragmented. Fragmented mRNAs were converted into double-stranded complementary DNA (cDNA). Following end repair and a tailing, the DNAs were ligated to Diluted Roche Adaptor. After purification of ligation product and size fractioning to 300 - 500 bps, the ligated products were amplified, purified, quantified, and stored at -80℃ before sequencing. The strand marked with dUTP (the 2nd cDNA strand) was not amplified, allowing strand-specific sequencing. For high-throughput sequencing, the libraries were prepared following the manufacturer's instructions and applied to Illumina Novaseq 6000 system for 150 nt paired-end sequencing.
2.3. RNA-Seq Raw Data Clean and Alignment
Raw reads containing more than 2-N bases were first discarded. Then adaptors and low-quality bases were trimmed from raw sequencing reads using FASTX-Toolkit (Version 0.0.13). The short reads less than 16 nt were also dropped. After that, clean reads were aligned to the human genome by HISAT2 allowing 4 mismatches. Uniquely mapped reads were used for gene reads number counting and FPKM calculation.
2.4. Differentially Expressed Genes Analysis
The R Bioconductor package DESeq2 was utilized to screen out the differentially expressed genes (DEGs). The P value for correction < 0.05 and fold change > 2 or < 0.5 were set as the cut-off criteria for identifying DEGs.
2.5. Alternative Splicing Analysis
The alternative splicing events (ASEs) and regulated alternative splicing events (RASEs) between the samples were defined and quantified by using the ABLas pipeline as described previously. In brief, ABLas detection of ten types of ASEs was based on the splice junction reads, including exon skipping (ES), alternative 5' splice site (A5SS), alternative 3' splice site (A3SS), mutually exclusive exons (MXE), mutually exclusive 5' UTRs (5pMXE), mutually exclusive 3' UTRs (3pMXE), cassette exon, A3SS&ES, and A5SS&ES.
2.6. Functional Enrichment Analysis
To sort out functional categories of DEGs, gene ontology (GO) terms and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways were identified using KOBAS 2.0 server. Hypergeometric test and Benjamini-Hochberg FDR controlling procedure were used to define the enrichment of each term.
2.7. Co-expression Analysis
Co-expression analysis was performed for all differentially expressed RBP and RASE. Meanwhile, Pearson correlation coefficient between differentially expressed RBP and RAS was calculated, and DERBP-RAS relationship pairs satisfying absolute value of correlation coefficient ≥ 0.95 and P-value ≤ 0.01 were screened.
2.8. Protein-Protein Interaction Network Construction Analysis
The Search Tool for the Retrieval of Interacting Genes was used to analyze the association between RBPs that are DEGs. The minimum required interaction score was set to 0.4, and the protein nodes that have no interaction with others were removed.
2.9. RNA Isolation, Complementary DNA Synthesis, and qRT-PCR
Total RNA from homogenized tissues was isolated using TRIzol reagent (Ambion, USA). The quality of the RNA was assessed using a nucleic acid protein quantifier. RNA was reverse-transcribed into cDNA using the Reverse Transcription Kit 5X All-In-One RT MasterMix (Abm, Canada) under the following reaction conditions: Thirty-seven degree Celsius for 15 min and 85°C for 5 s. qRT-PCR analysis was performed using a real-time PCR instrument (ABI, USA) with the EvaGreen Express 2qPCR MasterMix-Low Rox Kit (ABM, Canada) and the cDNA as a template. GAPDH served as the internal reference. The relative expression levels were calculated using the 2-ΔΔCt method and expressed in terms of fold change. The primers were designed using Primer5 (Table 1).
| Chemical Substance Index Names | Sequence (5' to 3') |
|---|---|
| CCL22-F | GGATTACAGGCGTGAGTC |
| CCL22-R | CTCCAATTCCACAGGTTCA |
| RAC2-F | CCACCTAGATGGGTCTGA |
| RAC2-R | ACCATCAACGAAGCTCTG |
| hum-GAPDH-F | GGTCGGAGTCAACGGATTTG |
| hum-GAPDH-R | GGAAGATGGTGATGGGATTTC |
3. Results
3.1. Differentially Expressed mRNA in Hepatic Echinococcosis Patients
After preprocessing and quality control of the sequencing data in this project, effective data were obtained. The reads were located in the reference genome and relevant data analysis was conducted. The version of the reference genome was GRCh38, among which intron and intergenic accounted for a relatively large proportion. Thus, the rationality of the reference genome could be proved (Figure 1). In this project, 855 DEGs were identified through RNA-seq technique in liver tissue samples of HAE lesions and adjacent liver tissues, among which 801 genes were up-regulated and 54 genes were down-regulated (Figure 2A). The top 10 biological functions of up-regulated gene GO include phagocytosis, opsonism, and positive regulation of B cell activation, recognition phagocytosis, immune system processes, leukocyte migration, and defense response to bacteria, complement activation, classical pathway, natural immune response, and collagen fiber tissue. Down-regulated gene GO enrichment into biological functional pathways related to urea cycle, drug response, and transcriptional regulation (Figure 2B and 2C). Kyoto Encyclopedia of Genes and Genomes the effects of up-regulated differential gene enrichment to hematopoietic cell lineage, protein digestion and absorption, intestinal immune network on IgA production, cytokine receptor interaction, phagosomes, tuberculosis, ECM receptor interaction, leishmaniasis, cell adhesion molecules, Staphylococcus aureus infection were analyzed. Down-regulated KEGG enrichment showed amino acid biosynthesis, arginine biosynthesis, alanine, aspartate and glutamate metabolism, metabolic pathway, 2-oxy-carboxylic acid metabolism, tryptophan metabolism, ovarian steroid production, etc. (Figure 2D and 2E). We found that hydatid development is closely related to immunity, so we observed 2 significantly upregulated immune-related DEGs CCL18 and TIMP1, and their FPKM values were shown in Figure 2F and 2G.
Results of RNA-seq and the enrichment of gene ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG); A, volcanic maps of all differentially expressed genes (DEGs) between the experimental group and the control group; the red part is significantly up-regulated genes, and the blue part is significantly down-regulated genes; B and C, the amount of DEGs in GO enrichment analysis was significantly up-regulated and down-regulated in the experimental group and the control group; D and E, the amount of DEGs in KEGG enrichment analysis was significantly up-regulated and down-regulated in the experimental group and the control group; F and G, histogram of DEGs CCL18 and TIMP1 in the experimental group and the control group
3.2. The Alternative Splicing Analysis of Hepatic Tissue in Hepatic Echinococcosis Patients
In this project, 569 differential RASEs were identified in RNA-seq (Figure 3A and 4C), and 310 were up-regulated and 258 down-regulated (Figure 3). Gene ontology and KEGG enrichment analysis was performed for RASGs with significant changes in variable splicing level, in which GO enrichment showed positive regulation of protein dephosphorylation. Biological processes such as mRNA splicing, mitochondrial translation termination, cell division, regulation of mitotic cell cycle transition, viral processes, RNA splicing, T-cell receptor signaling pathways, mitochondrial translation extension, and translation (Figure 3D). Kyoto Encyclopedia of Genes and Genomes is enriched in ribosomes, human T-cell leukemia virus 1 infection, proteasome, thermogenesis, Chagas disease (American trypanosomiasis), spliceosome, PD-L1 expression and PD-1 checkpoint pathway in cancer, T-cell receptor signaling pathway, protein processing in the endoplasmic reticulum, and pertussis pathways (Figure 3E). We found that the gene MRPL24 GO was enriched in mitochondrial translation, and KEGG was enriched in the ribosome-related pathway, and showed the probability ratio of its occurrence of ASEs (Figure 3F).
The differentially expressed RAS identified in RNA-seq and their enrichment in gene ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG); A, PCA based on splicing ratio of all NIR RAS differential alternative splice event (RASE); the ellipse for each group is the confidence ellipse; B, the bar plot showing the number of all significant regulated alternative splicing events (RASEs) between test and ctrl samples. X-axis: All ASE number; Y-axis: The different types of alternative splicing events (ASEs); C, hierarchical clustering heatmap of NIR RAS based on splicing ratio; D, the scatter plot exhibiting the most enriched GO biological process results of the regulated alternative splicing genes (RASGs); E, the scatter plot exhibiting the most enriched KEGG pathway results of the RASGs; F, bar plot showing the expression pattern and statistical difference of RASEs for MRPL24 gene; error bars represent mean ± SEM; * P-value < 0.05.
RNA-binding proteins (RBPs) and differentially regulated alternative splicing genes (RASGs); A, Venn diagram showing the overlapped genes between RBPs and differentially expressed genes (DEGs); B, hierarchical clustering heat map showing expression levels of DE RBPs; C, histogram of number of RBP, RASG, and differential alternative splice event (RASE); D, the bar plot exhibiting the most enriched GO biological process results and the most enriched Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway results of the DE RBPs; E, bar plot showing the expression pattern and statistical difference of DEGs for some important RBPs; error bars represent mean ± SEM; *** P-value < 0.001
3.3. Co-expression Network Between Differentially Expressed RNA-Binding Proteins and Differentially Alternative Splicing Genes Associated with Hepatic Echinococcosis Patients
We found 26 overlapping genes between the DEGs screened from the RNA-seq data and the currently known RBPs (Figure 4A and 4B). We analyzed the interaction association between the 26 overlapping genes by using the interaction gene retrieval search tool (Figure 5A). According to the 26 selected RBPs, the association between them and differential RASE (RBP-AS) was established through co-expression. It was found that 26 RBPs were co-expressed with 187 differential RASGs, and a total of 894 RBP-AS co-expression relationship pairs were found (Figure 5B). We then screened the other significantly different ASEs (NIR) except intron retention according to the ascending P-value order, and finally selected 3 trusted ASEs and 4 RBPs (all with expression levels above "1"): The 3 trusted ASEs were LTBP3, AP1B1, and ECHDC2 (Figure 5D). The four RBP genes are S100A4, vimentin (VIM), CD44, and LGALS3 (Figure 4E). We identified 5 RBP-AS relation pairs: S100A4-ECHDC2; CD44-AP1B1; VIM-AP1B1; CD44-LTBP3; LGALS3-LTBP3, and show the co-expression relationship between them (Figure 5C). Gene ontology functional enrichment analysis of RASG gene co-expressed with RBP was mainly concentrated in translation, mitochondrial translation termination, cell division, mitochondrial translation elongation, cell cycle, autophagy, mRNA splicing, rRNA processing via spliceosome, vesicle-mediated transport, and RNA splicing. Kyoto Encyclopedia of Genes and Genomes enrichment analysis mainly concentrated in ribosomes and production (Figure 4D).
Co-expression network between DE RNA-binding proteins (RBPs) and regulated alternative splicing gene (RASG) associated hepatic echinococcosis patients; verification of differentially expressed genes (DEGs); A, protein-protein interaction (PPI) mapping of the 26 DERBPs; B, the network plot showing all RBPs co-expressing RASGs; C, the network plot showing three RASGs co-expressed by the four DERBPs; D, bar plot showing the expression pattern and statistical difference of regulated alternative splicing events (RASEs) for some important genes; error bars represent mean ± SEM; ** P-value < 0.01, * P-value < 0.05; E, qRT-PCR validation results 0.05, ** P-value < 0.01, *** P-value < 0.001
3.4. Verification of Differentially Expressed Genes in Alveolar Echinococcosis
We collected surgical specimens from 10 patients and conducted qRT-PCR experiments. Compared with normal tissues more than 1 cm from the lesion, GNB3, CCL22, and RAC2 were significantly overexpressed in tissues within 1 cm of the lesion and in the central tissue of the lesion, with the highest expression levels of the three genes in the central area of the lesion, P-value < 0.05. It was speculated that GNB3, CCL22, and RAC2 might play an important role in the occurrence and development of AE (Figure 5, Table 2).
| Experimental Grouping | GNB3 | CCL22 | RAC2 |
|---|---|---|---|
| The lesion is larger than 1cm of normal tissue | 1.110 ± 0.615 | 1.675 ± 1.679 | 1.343 ± 0.865 |
| Lesion tissue | 11.347 ± 8.760 | 10.033 ± 7.235 | 10.587 ± 6.795 |
| The lesion is 1cm of tissue | 3.363 ± 3.108 | 9.921 ± 6.294 | 2.517 ± 2.974 |
5. Discussion
Hepatic alveolar echinococcosis has a high incidence in remote pastoral areas due to underdeveloped economy and low medical level, and the survival time of most patients is significantly reduced due to late detection and delayed treatment (6). At present, the expression profile of circular RNA (circRNA) in the cyst wall of cystic echinococcosis has been studied by microarray sequencing (10). However, there are few studies on RNA-seq for AE. In this study, RNA-seq will be used to further study the tissue expression profile of HAE, which is expected to provide certain ideas for the disease progression, treatment, and prognosis of HAE.
We analyzed RNA-seq data and found that many DEGs were mainly up-regulated. Gene ontology analysis mainly concentrated on immune-inflammatory related biological functional pathways, such as the up-regulated gene HLA-DRA enrichment process in the immune system, and TIMP1 enriched the signaling pathway mediated by inflammatory factors. Chemokines such as CCL18 are enriched into the biological functional pathways of immune response, indicating that hydatid infection causes differential changes in the expression of a large number of immune genes, which is consistent with literature reports (11). However, some studies have shown that immune evasion may promote the progression of echinococcosis, so abnormal expression of immune factors such as TIMP1 and CCL18 enriched in the immune system may promote the progression of echinococcosis, but more experiments are needed to verify this result.
DEGs screened in this study overlapped with the currently known RBPs (12), and 26 RBPs were found to be differentially expressed. Some studies have found that some RBPs are expressed in the liver and are related to liver metabolism or diseases. For example, RBP gene ELAVL1 can trigger autophagy activation, promote the degradation of autophagy ferritin, and eventually lead to iron-dependent iron death, which is expected to become a potential target for liver fibrosis treatment (13). RNA-binding protein gene zinc finger protein 36 (ZFP36) may be a potential target for the treatment of liver fibrosis (14). RNA-binding protein gene heteroribonucleoprotein L (HNRNPL) affects AKT signaling and DNA damage sensing in HCC (15). Expression of RNA-binding protein RBMS3 is up-regulated in hepatic stellate cells (HSCs) and activation of fibrotic liver. By binding to Prx1 mRNA, RBMS3 controls the expression of Prx1 and indirectly synthesizes collagen α1, thus playing a role in the activation of hepatic stellate cell fibrosis (16). In the progression of HAE, different degrees of liver fibrosis will appear in the liver tissues around the lesion. These fibrotic "shells" will affect the entry of drugs into the lesion tissues, thus reducing the efficacy of drugs, which is also an important reason for the infiltration and growth of the lesion tissues. Some of the 26 RBPs genes may play a role in promoting or inhibiting liver fibrosis in HAE, and the two genes participate in the progression of hepatic echinococcosis in a relatively dynamic equilibrium state. Therefore, the in-depth study of these genes will provide important clues for the treatment of HAE and the prevention of disease progression.
We also paid attention to the genes with significant changes in the level of AS, and enriched the positive regulation of protein dephosphorylation and other biological processes in GO analysis. Some studies have found that there are currently many studies targeting protein kinases as drug targets in the treatment of echinococcosis (17), which is consistent with the research on the positive regulation of biological functional pathways of protein dephosphorylation in this study. We found that the AS gene MRPL24 was enriched in mitochondrial translation and ribosomal-related pathways in GO analysis and KEGG enrichment analysis. It has been reported that mitochondrial ribosomal drugs can be used as targets for the treatment of Echinococcus multilocularis infection (18). Therefore, we speculate that further research on the AS gene MRPL24 may provide more extensive ideas and methods for the treatment of HAE.
In this study, we screened out three credible ASEs: LTBP3, AP1B1, and ECHDC2. Studies have found that LTBP3, as a heparin-binding cytokine, plays an important role in carcinogenesis and tumor progression, and LTBP3 overexpression is a marker of poor prognosis in patients with primary HCC (19). AP1B1 is related to the prognosis of HCC (20), and patients with HBV-related HCC with high ECHDC2 expression can maintain relatively good hepatocyte metabolism and function, which may have a protective effect (21). The above three genes have not been studied in HAE, and further research on these three genes may provide a favorable basis for the treatment and disease prognosis of HAE. Gene coexpression analysis is widely used to infer gene modules associated with diseases and other clinical traits (22).
We also screened out four co-expressed RBP genes: S100A4, CD44, VIM, and LGALS3. Studies have found that S100A4 mediates the recruitment of macrophages at inflammatory sites in vivo (23); S100A4 promotes liver fibrosis by activating HSCs (24); the combination of CCR2 and CD44 promotes the recruitment of inflammatory cells in the process of fatty liver formation (25); blocking the CD44-hyaluronic acid interaction reduces neutrophil adhesion in endotoxemic mice (26). Vimentin is the most important gene for predicting advanced liver fibrosis (27). The expression levels of Wnt3a, β-catenin, N-cadherin, Col1a1, α-SMA, VIM, CTGF, and TGF-β were observed to increase in the tissues near human AE lesions and in the Wnt3A-treated mouse group. On the contrary, the expression of E-cadherin is relatively low (28). Galectin-3 encoded by LGALS3 belongs to the glycans binding protein family and can regulate basic cellular functions such as proliferation, migration, differentiation, apoptosis, and inflammation (29). We identified five RBP-AS relationship pairs by co-expression: S100A4-ECHDC2; CD44-AP1B1; VIM-AP1B1; CD44-LTBP3; LGALS3-LTBP3. Combined with the above studies, we speculate that S100A4, CD44, VIM, and LGALS3 may participate in the regulation of AS of ECHDC2, AP1B1, and LTBP3 by binding RNA function, thus promoting the progression of hepatic hydatid disease. However, we have not conducted any research on whether there is regulation or not, and it is only speculation based on theory. There are certain limitations.
Data quality control and preprocessing are often the first step in processing next-generation sequencing (NGS) data of tumors. Not only can it help us evaluate the quality of sequencing data, but it can also help us obtain high-quality data for downstream data analysis (30).
Immune chemokines have been studied in many diseases, like the cytokine IL-15, which could predict the prognosis of ESLD patients (31). The mutual microbiota-host interaction through the purinergic P2X7 receptor (P2X7R) and secretory immunoglobulin A (32). Therefore, we collected lesion tissue and liver tissue from 10 patients and conducted further studies on immune chemokines GNB3, CCL22, and RAC2.
GNB3 polymorphisms are linked to fibrotic responses in chronic liver diseases. In hepatic echinococcosis, cyst growth triggers fibrotic encapsulation, suggesting GNB3 may influence disease progression. Studies in other parasitic infections (e.g., schistosomiasis) highlight GNB3 variants as modifiers of host-pathogen interactions. No direct studies in echinococcosis yet.
CCL22 overexpression attracts Tregs to infection sites, suppressing anti-parasitic immunity. In HCV, CCL22 facilitates viral persistence — a parallel mechanism likely in Echinococcus infections. Preclinical studies suggest blocking CCL22/CCR4 axis enhances parasite clearance. No clinical trials for echinococcosis, but trials in cancer immunotherapy (e.g., anti-CCL22 antibodies) provide proof-of-concept.
RAC2 mutations impair macrophage chemotaxis and parasite engulfment. In hepatic echinococcosis, defective RAC2 signaling may explain poor cyst containment. RAC2 mediates oxidative burst in neutrophils — critical for killing Echinococcus larvae. Low ROS levels correlate with chronic infection. The above conclusion can provide guidance for the surgical resection range of HAE and be studied as potential targets.
Of course, we acknowledge that co-expression analysis and prediction of ASEs may have limitations or false positive results. However, in this study, we have taken active measures to reduce their impact on the results, such as quality control and statistical correction. Alternative splicing some articles have mentioned, the distribution of actual low-quality bases in the sequence may lead to many short sequences and may reduce the accuracy of downstream alignment, increasing the sequencing depth of some reference sites (30). Therefore, data preprocessing is becoming increasingly important in data analysis.
5.1. Conclusions
In summary, the differentially expressed mRNA, differentially expressed variable splicing genes, and RBPs may participate in the occurrence and development of liver alveolar hydatid disease, and play an important role in the invasion and metastasis of alveolar hydatid lesions. GNB3, CCL22, and RAC2 genes are highly expressed within the lesion and significantly lowly expressed in the liver tissue 1 cm away from the lesion. The above conclusion can provide guidance for the surgical resection range of HAE and be studied as potential targets. In the later stage, we will increase the sample size and further detect the expression levels of GNB3, CCL22, RAC2, RBPs, AS, etc., genes in peripheral blood to provide more powerful theoretical support for potential targets.




