Alternative splicing is a post-transcriptional process in eukaryotic organisms by which multiple distinct mRNA transcripts are produced from a single gene (13). It has been estimated that up to 95% of human multi-exon protein-coding genes undergo alternative splicing, often in a tissue- or developmental stage-specific manner (13-15), and in response to external stimuli as part of several signal transduction networks (13, 15). The generation of multiple distinct functional mRNA transcripts from a single gene, by one of the four major mechanisms of alternative splicing, is an important source of proteomic diversity, with critical roles in the control of developmental processes and in the dynamic regulation of the transcriptome (13, 15, 16). Alternative splicing may have contributed to functional innovation during the evolution of the eukaryotic genome (14). The splicing process is performed by the spliceosome, a ribonucleoprotein megaparticle that assembles around splice sites at each intron (13, 17). The mechanism of RNA splicing is highly complex, requiring multiple interactions between pre-mRNA, small nuclear ribonucleoproteins and a multitude of splicing factor proteins. Trans-acting factors, mainly RNA-binding proteins, modulate the activity of the spliceosome and of cis-acting RNA sequences, which include exonic and intronic splicing enhancers and silencers (13-15, 17). In addition to their putative role in constitutive splicing, these cis-regulatory elements are also involved in the regulation of alternative splicing. Up to one-third of alternatively spliced transcripts have been found to contain a premature termination codon (PTC) (14, 16). Since these aberrant transcripts are apparent targets of nonsense-mediated mRNA decay (NMD) (13, 16-18), the coupling of alternative splicing and NMD may be a pervasive means of regulating protein expression, a process that has been called regulated unproductive splicing and translation (RUST) (16). Mutations in regulatory sequences that affect alternative splicing are a widespread cause of human hereditary disease and cancer (13, 15). About 10% of all human pathogenic mutations identified in diagnostic molecular genetics laboratories are in canonical splice sites (19), but this estimate does not include mutations affecting splicing enhancers, silencers or trans-acting factors, which are much more difficult to recognise and may have been historically overlooked (20).
With our cDNA GLA genotyping protocol, using RNA extracted from peripheral blood leukocytes, we have identified two novel alternatively spliced transcripts affecting exon 3, at concentrations high enough to be detectable on agarose gels: one lacking the last 62 nucleotides of the exon (c.del486-547), the other missing the entire exon (c.del370-547). While the latter was found only rarely, the c.del486-547 transcript was present in more than 20% of the GLA cDNA samples analysed in the PORTYSTROKE study. These alternatively spliced GLA transcripts occurred in the absence of any sequence variants that might affect the splicing mechanism.
GLA intron 3 is a phase 1 intron, intervening between the first and the last two nucleotides of the glycine-encoding codon 182. The two non-canonical transcripts lead to translational frameshifting, generating PTCs 2 and 9 codons downstream from the c.del486-547 and c.del370-547 deleted segments, respectively. Therefore, these two alternatively spliced transcripts are most probably targeted for NMD (13, 16, 17) and, when overexpressed, might be the cause of reduced αGal enzyme activity. Although the distributions of the DBS αGal activity levels did not differ significantly between stroke patients with the c.del486-547 GLA transcript and those with the canonical transcript only, it should be borne in mind that the DBS αGal assay correlates better with the plasma than with the leukocyte αGal assay (21) and, therefore, may not be the most reliable gauge of GLA mRNA expression in leukocytes.
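The frameshift argument above rests on simple arithmetic, which can be sketched as follows. This is only an illustrative check, not part of the study protocol: the function names are hypothetical, and the sketch assumes that the coordinates are coding-DNA positions numbered from the first nucleotide of the start codon. It does not locate the PTCs themselves, which would require the GLA coding sequence.

```python
def deletion_effect(first_nt: int, last_nt: int) -> dict:
    """Summarise the reading-frame effect of deleting coding nucleotides
    first_nt..last_nt (1-based, inclusive; assumed cDNA coordinates)."""
    length = last_nt - first_nt + 1
    remainder = length % 3
    return {
        "length": length,
        "remainder": remainder,
        # A deletion shifts the frame unless its length is a multiple of 3.
        "frameshift": remainder != 0,
    }


def intron_phase(last_exon_nt: int) -> int:
    """Phase of the intron that follows the exon ending at last_exon_nt:
    the number of nucleotides of the interrupted codon lying upstream
    of the intron (0 means the intron falls between two codons)."""
    return last_exon_nt % 3


# The two exon 3 deletions described in the text.
partial_exon3 = deletion_effect(486, 547)   # c.del486-547
whole_exon3 = deletion_effect(370, 547)     # c.del370-547

print(partial_exon3)
print(whole_exon3)
print("intron 3 phase:", intron_phase(547))
```

Under these assumptions, the partial deletion removes 62 nucleotides and the whole-exon deletion removes 178; neither length is a multiple of three, so both shift the reading frame, and an exon ending at position 547 is followed by a phase 1 intron, in agreement with the description above.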
Studies in healthy subjects were carried out to determine whether the non-canonical GLA transcripts were physiologically expressed and whether the holding time of the blood samples before RNA extraction might affect their expression. They indicated that, at least in leukocytes, the c.del486-547 transcript is constitutively expressed at trace levels, detectable only with specific RT-PCR amplification, but becomes visible on agarose gels when the RNA is extracted from leukocytes that were left standing in the whole blood sample, at room temperature, for 48 hours. These findings suggest that, under those circumstances, the genetic metabolic environment in leukocytes favours the accumulation of the non-canonical c.del486-547 transcript, either by increased production or decreased degradation.
According to the in silico predictions, the donor splice sites of exons 2 and 3 are weaker than the acceptor splice sites of GLA exons 3 and 4. Although the first two nucleotides that are excluded in the alternatively spliced sequence of exon 3 are guanines, and therefore do not correspond to a canonical donor splicing consensus motif, the terminal sequence of exon 3 remaining in the c.del486-547 transcript contains a putative high-score ESE binding site for Serine/Arginine-rich protein 40 (SRp40), which is one of the key proteins involved in the recruitment of spliceosomal components (12). These conditions would favour the alternative recognition of a non-canonical donor splice site within exon 3, which might explain the generation of the GLA c.del486-547 transcript.
The overexpression of a cryptic exon in intron 4, leading to extremely unbalanced ratios of the c.639ins+57 over the wildtype GLA transcript, has already been identified as the underlying cause of deficient αGal activity in (i) patients with the cardiac variant of FD carrying the mid-intronic g.9331G>A/c.640-801G>A SNP (rs199473684) [http://www.ncbi.nlm.nih.gov/projects/SNP/snp_ref.cgi?rs=199473684], originally reported as c.639+919G>A (4) and mapping 4 nucleotides before the 3’ end of the cryptic exon, who exhibited REA at ≈10% of the normal level (4); and (ii) a patient with the classic phenotype of FD, who exhibited REA at ≈1% of the normal level (8), associated with the novel mid-intronic variant g.9273C>T/c.640-859C>T, originally reported as c.861-5C>T (8) and mapping 5 nucleotides upstream from the beginning of the cryptic exon.
The hypothesis that the non-canonical c.639ins+57 GLA transcript is overexpressed in patients carrying the g.9331G>A SNP due to increased recognition of the alternative splicing by an A/C-rich enhancer-type ESE (4), generated by the G>A transition, has been disputed (22), since that SNP occurs at a non-consensus site of the cryptic donor splice site and there are no functional ESE motifs (23) within the surrounding intronic region. Bioinformatic analyses of the g.9273C>T variant indicated that this transition does not significantly change the predicted acceptor site score compared with the wildtype sequence, and that the highly predominant expression of the c.639ins+57 transcript might be explained by the creation of a novel ESE (8).
The alternatively spliced c.639ins+57 transcript was not identified in any of our study patients, suggesting that the c.640-859C>T and the c.640-801G>A (rs199473684) intronic variants are uncommon in the Portuguese population.
In conclusion, we hypothesise that the production of alternatively spliced GLA transcripts containing PTCs might be physiologically involved in the post-transcriptional regulation of GLA gene expression, and that their dysregulated overexpression, especially if limited to specific cells or tissues, might be the cause of Gb3 storage in affected tissues with no pathogenic mutation identified in the GLA gene, as in the recently described Gb3-associated cardiomyopathy (24). Elucidation of the mechanism underlying the production of these abnormal GLA transcripts, and of their biological consequences, warrants further investigation, as it may contribute important new data to the understanding of the molecular pathology of FD and Gb3-related disorders.