1. Background
2. Objectives
3. Methods
3.1. Identification of Consistent Mutations at HepG2 Transcription Factors Binding Sites
3.2. Significance of Reported Non-coding Mutations
3.3. Association of Genes with Non-coding Mutations
3.4. Mapping Non-coding Mutations to CTCF Binding Sites
3.5. Graphical Analysis of Significant Non-coding Mutations and Clusters
3.6. Analysis of Ras/MAPK Signaling Pathway
4. Results
4.1. Consistent Mutations at HepG2 Transcription Factor Binding Sites
| Genomic Location | Consistency | No. of TF Binding with Actual Size | No. of TF Binding with Precise Size | Names of TF Binding with Precise Size | Bracketing Gene in Vista Enhancer Browser |
|---|---|---|---|---|---|
| 5:1295113-1295113 | 410 | 4 | 1 | GABP | - |
| 22:40856967-40856967 | 15 | 13 | 3 | CJUN, ELF1, MAX | - |
| 5:1295046-1295046 | 11 | 6 | 5 | GABP, MAX, MXI1, POL2, SIN3AK20 | - |
| 4:24232389-24232389 | 10 | 19 | 4 | CEBPD, HDAC2, MAZ, SRF | - |
| 21:34544112-34544112 | 4 | 6 | 5 | MXI1, NFIC, P300, RAD21, SMC3 | RCAN1 |
| 16:58495226-58495226 | 2 | 11 | 7 | COREST, CTCF, HDAC2, MAFF, MAFK, RAD21, RFX5 | NDRG4 |
| 15:70099538-70099538 | 2 | 11 | 6 | ELF1, POL2, SIN3AK20, TAF1, TBP, YY1 | MIR629-UACA |
| 10:120851335-120851335 | 1 | 30 | 17 | BHLHE40, BRCA1, ELF1, FOSL2, FOXA1, FOXA2, GABP, HDAC2, MXI1, NFIC, RAD21, RFX5, RXRA, SIN3AK20, TAF1, TRF4, YY1 | - |
Abbreviation: TF, transcription factor.
a The column ‘Bracketing Gene in Vista Enhancer Browser’ provides the names of genes showing enhancer activity at the locations where the identified non-coding mutations were present.
4.2. Significance of Non-coding Mutations
| Genomic Locations | Consistency | No. of TF Binding with Precise Size | Scores (Out of 10) | P-Value (< 0.05) |
|---|---|---|---|---|
| 22:40856967-40856967 | 15 | 3 | 5.385 | 0.00175 |
| 20:17859269-17859269 | 1 | 39 | 5.333 | 0.5 |
| 6:157323527-157323527 | 3 | 32 | 5.102 | 0.00275 |
| 12:20815732-20815732 | 14 | 1 | 4.795 | 0.00695 |
| 20:49768490-49768490 | 1 | 34 | 4.692 | 0.5 |
| 18:58452573-58452573 | 13 | 1 | 4.461 | 0.00695 |
| 17:4278699-4278699 | 8 | 13 | 4.333 | 0.0001 |
| 2:233013316-233013316 | 3 | 26 | 4.333 | 0.00275 |
| 14:24232389-24232389 | 10 | 4 | 3.846 | 0.001 |
| 14:39145540-39145540 | 7 | 8 | 3.358 | 0.0003 |
| 14:52873949-52873949 | 9 | 1 | 3.128 | 0.00695 |
| 17:75393912-75393912 | 7 | 4 | 2.846 | 0.00185 |
| 14:24425986-24425986 | 8 | 1 | 2.795 | 0.00695 |
| 5:72320307-72320307 | 2 | 10 | 1.948 | 0.0342 |
| 5:82351990-82351990 | 2 | 4 | 1.179 | 0.03505 |
| 8:84648494-84648494 | 1 | 1 | 0.461 | 0.5069 |
Abbreviation: TF, transcription factor.
a The scoring formula and the P-value calculations were based on the consistency of a given mutation and the number of transcription factors (TFs) binding at that position with precise size (100 base pairs).
4.3. Association of Genes with Non-coding Mutations
4.4. Mapping Non-coding Mutations to CTCF Binding Sites
4.5. Graphical Analysis of Significant Non-coding Mutations and Clusters
Graphical profiles of significant mutations. (A) Mutation 20:17859269-17859269; (B) mutation 12:20815732-20815732. The red bars and blue bars in clinical variants represent copy number and gain. The green bar in ‘ClinVar Short Variant’ represents a benign clinical variant.
Graphical profiles of significant clusters. (A) Cluster 11:62841559-62841872; (B) cluster 1:152018685-152018775. The Gencode v29 track displays basic genes located close to the given cluster. The conservation tracks ‘Cons 100 Verts’ and ‘Multiz Alignment of 100 vertebrates’ display, in condensed form, regions that are conserved across multiple species.
4.6. Analysis of Ras/MAPK Signaling Pathway
5. Discussion
| Genes | Number of Non-coding Mutations Closer to Genes | Non-coding Mutations Present in the Upstream Region of Genes | Non-coding Mutations Present Within the Coding Region of Genes | Non-coding Mutations Present in Downstream Regions of Genes | The Closest Distance from Transcription Start Site (TSS) |
|---|---|---|---|---|---|
| ALB | 75 | 12 | 63 | 0 | 428 (up) |
| EYS | 43 | 0 | 43 | 0 | 0 |
| MLLT10P1 | 42 | 26 | 3 | 13 | 60 (down) |
| ZFHX3 | 38 | 0 | 38 | 0 | 0 |
| CNTNAP2 | 37 | 1 | 35 | 1 | 1220 (up) |
| LINC00511 | 36 | 0 | 32 | 4 | 5075 (down) |
| NPAS3 | 36 | 2 | 33 | 1 | 10540 (down) |
| WWOX | 35 | 0 | 33 | 2 | 276 (down) |
| PTPRN2 | 34 | 0 | 34 | 0 | 0 |
| LINC01410 | 32 | 0 | 24 | 8 | 673 (down) |
| LSAMP | 32 | 4 | 26 | 2 | 4801 (up) |
| PLCB1 | 32 | 0 | 31 | 1 | 3472 (down) |
| SYN3 | 27 | 4 | 23 | 0 | 363 (up) |
a The terms ‘up’ and ‘down’ represent upstream and downstream regions of genes. The closest distance from transcription start site (TSS) was written as 0 for mutations present within coding regions of genes.
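The ‘up’/‘down’ convention in the footnote above can be illustrated with a minimal Python sketch. The `Gene` record, the `classify_mutation` helper, and all coordinates are hypothetical; the sketch also assumes that distance is measured from the TSS and that upstream/downstream depends on strand, which the table does not state explicitly.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Gene:
    """Hypothetical gene record; coordinates are illustrative only."""
    name: str
    start: int   # leftmost coordinate of the gene body
    end: int     # rightmost coordinate of the gene body
    strand: str  # "+" or "-"


def classify_mutation(pos: int, gene: Gene) -> tuple[str, int]:
    """Return (region, distance_from_tss) for a mutation at `pos`.

    Mutations inside the gene body are reported with distance 0,
    mirroring the convention described in the table footnote.
    """
    if gene.start <= pos <= gene.end:
        return ("within", 0)
    # Assumption: the TSS is the 5' end of the gene body on its strand.
    tss = gene.start if gene.strand == "+" else gene.end
    distance = abs(pos - tss)
    left_of_gene = pos < gene.start
    # On the + strand, lower coordinates are upstream; on the - strand
    # the direction is reversed.
    upstream = left_of_gene if gene.strand == "+" else not left_of_gene
    return ("up" if upstream else "down", distance)


if __name__ == "__main__":
    demo = Gene("DEMO", 1000, 2000, "+")
    print(classify_mutation(572, demo))   # upstream of the TSS
    print(classify_mutation(1500, demo))  # inside the gene body
```

Under these assumptions, a downstream mutation's distance includes the gene length; a pipeline that measures downstream distance from the gene end instead would need a different `distance` calculation.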
| Cluster No. | Cluster Size | No. of Mutations in a Cluster | Highest Score in Cluster | Closer Genes | Closest Distance from TSS | CTCF Binding Between Gene TSS and Mutation |
|---|---|---|---|---|---|---|
| 48111 | 9:62802442-62802699 | 17 | 1.987 | LINC01410 | 0 (within) | 12 No |
| 17609 | 17:8173337-8173599 | 15 | 3.282 | TMEM107, SNORD118 | 0 (within), 11 (upstream) | 15 No, 15 No |
| 7433 | 11:62841559-62841872 | 11 | 5.333 | WDR74, RNU2-2P | 50 (upstream), 27 (downstream) | 9 No, 9 No |
| 2565 | 1:152018685-152018775 | 7 | 1.589 | AL450992.1, NBPF18P | 0 (within), 0 (within) | 7 Yes, 7 Yes |
| 29170 | 20:53941417-53941434 | 6 | 2.589 | BCAS1, AC005220.1 | 0 (within), 0 (within) | 6 No, 6 No |
| 32451 | 3:113051365-113051399 | 6 | 1.307 | AC078785.1, AC078785.2 | 0 (within), 0 (within) | 6 Yes, 6 Yes |
Abbreviation: TSS, transcription start site.
a ‘Yes’ indicates that CTCF binds between the mutation and the TSS of the gene; ‘No’ indicates that it does not.
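The ‘Yes’/‘No’ call in the last column can be sketched as a simple interval check. This is a minimal illustration, not the authors' pipeline: the function name `ctcf_between`, the input format, and the coordinates are hypothetical, and the sketch assumes that a CTCF site counts only when it lies entirely inside the span between the mutation and the TSS on the same chromosome.

```python
from typing import Iterable, Tuple


def ctcf_between(
    mutation_pos: int,
    tss_pos: int,
    ctcf_sites: Iterable[Tuple[int, int]],
) -> str:
    """Return 'Yes' if any CTCF site lies strictly between the mutation
    and the TSS, otherwise 'No'.

    `ctcf_sites` is an iterable of (start, end) intervals on the same
    chromosome as the mutation (hypothetical input format).
    """
    lo, hi = sorted((mutation_pos, tss_pos))
    for start, end in ctcf_sites:
        # Assumption: the whole site must fall inside the open span.
        if start > lo and end < hi:
            return "Yes"
    return "No"


if __name__ == "__main__":
    sites = [(200, 250), (900, 950)]
    print(ctcf_between(100, 500, sites))
    print(ctcf_between(300, 800, sites))
```

The order of the mutation and TSS does not matter, so the same check covers upstream and downstream mutations; a mutation at distance 0 from the TSS leaves no room for an intervening site and yields ‘No’.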



