The human genome contains over 4.5 million sequences of DNA called “transposable elements,” virus-like entities that “jump” around and help regulate gene expression.
They do this by binding transcription factors, which are proteins that regulate the rate of transcription of DNA to RNA, influencing gene expression in a broad range of biological events.
Now, an international team of scientists led by Didier Trono at EPFL has discovered that transposable elements play a significant role in influencing the development of the human brain. The study is published in Science Advances.
The scientists found that transposable elements regulate the brain’s development by partnering with two specialized proteins from the family known as “Krüppel-associated box-containing zinc finger proteins,” or KZFPs.
In 2019, another study led by Trono showed that KZFPs tamed the regulatory activity of transposable elements in the first few days of the fetus’s life.
However, they suspected that these regulatory sequences were subsequently re-ignited to orchestrate the development and function of adult organs.
The researchers identified two KZFPs that are specific to primates and found that they are expressed in specific regions of the developing and adult human brain.
They further observed that these proteins kept controlling the activity of transposable elements – at least in neurons and brain organoids cultured in the lab.
As a result, these two KZFPs influenced the differentiation and neurotransmission profile of neurons, and guarded these cells against inflammatory responses that were otherwise triggered if their target transposable elements were left to be expressed.
“These results reveal how two proteins that appeared only recently in evolution have contributed to shape the human brain by facilitating the co-option of transposable elements, these virus-like entities that have been remodeling our ancestral genome since the dawn of times,” says Didier Trono.
“Our findings also suggest possible pathogenic mechanisms for diseases such as amyotrophic lateral sclerosis or other neurodegenerative or neurodevelopmental disorders, providing leads for the prevention or treatment of these problems.”
Transposable Elements (TEs) Account for Genome Evolution and Inter-Individual Genetic Variability
Two thirds (66%) of the human genome is composed of repetitive elements, among which transposable elements (TEs) alone account for 40–45% of the genome [1,2].
One fascinating question for genome biologists is how to untangle the functions of this “dark side” of the genome, which still represents “alive matter” that evolution can act on to generate novel functions. It is now clear that the capability of TEs to regulate the genome resides mainly in generating a sophisticated plethora of RNA regulatory networks, which in turn influence the transcriptional output of the cell [3,4,5].
TEs are organized into four different classes and, with the exception of DNA transposons, are mainly retrotransposons, which use RNA as an intermediate to move via a ‘copy and paste’ mechanism. Retrotransposons include long interspersed elements (LINEs), short interspersed elements (SINEs), and long terminal repeat (LTR) retrotransposons.
They are further classified as autonomous or non-autonomous depending on whether they have open reading frames (ORFs) that encode the machinery required for retrotransposition.
LINEs are a very ancient and evolutionarily successful class of transposons. Three LINE superfamilies are found in the human genome, namely LINE1, LINE2 and LINE3, of which only LINE1 is still active. Full-length LINE1 (L1) elements are approximately 6 kb long and constitute an autonomous component of the genome.
A LINE1 element has an internal polymerase II promoter and contains two open reading frames, ORF1 and ORF2 (Figure 1). Once the L1 RNA is transcribed, it is exported to the cytoplasm for translation and subsequently assembled with the RNA-binding chaperone protein ORF1 and the endonuclease and reverse transcriptase ORF2.
These ribonucleoprotein particles are then reimported into the nucleus, where ORF2 makes a single-stranded nick and primes reverse transcription from the 3′ end of the L1 RNA. Reverse transcription frequently yields truncated, nonfunctional insertions; for this reason, most LINE-derived repeats are short, with an average size of around 900–1000 bp.
L1s are estimated to be present in more than 500,000 copies in the human genome.
The L1 machinery is also responsible for the retrotransposition of SINEs, which can be classified into three superfamilies: Alu, MIR, and MIR3. SINEs are non-autonomous retroelements without any coding potential, short in length (around 300 bp), and transcribed from a polymerase III promoter (Figure 1). The most represented SINE superfamily in humans, Alu, is present in 1,090,000 copies in the human genome.
LTR retrotransposons begin and end with long terminal direct repeats that contain transcriptional regulatory elements. Autonomous LTR retrotransposons contain gag and pol genes, which encode a reverse transcriptase, integrase, protease, and RNase H (Figure 1). Four superfamilies of LTR elements exist: ERV class I, ERV(K) class II, ERV(L) class III, and MalR. MalR is the most represented LTR superfamily, present in 240,000 copies.
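The copy numbers and average lengths quoted above can be turned into a rough estimate of how much of the genome each group occupies. The following is a minimal sketch, not a method from the source: the genome size of about 3.1 Gb is an assumption that the text does not state, the 950 bp figure is simply the midpoint of the quoted 900–1000 bp range, and the names `TEGroup` and `genome_fraction` are hypothetical. MalR is omitted because no average length is given for it.

```python
from dataclasses import dataclass

# ASSUMPTION: approximate haploid human genome size in base pairs.
# The text does not state it; ~3.1 Gb is a commonly used round figure.
GENOME_SIZE_BP = 3_100_000_000


@dataclass(frozen=True)
class TEGroup:
    """One TE group with the figures quoted in the text (hypothetical container)."""
    name: str
    te_class: str        # "LINE" or "SINE"
    autonomous: bool     # does it encode its own retrotransposition machinery?
    copies: int          # copy number quoted in the text
    mean_length_bp: int  # average insertion length quoted in the text


# L1: "more than 500,000 copies", average 900-1000 bp (midpoint used here).
# Alu: 1,090,000 copies; "around 300 bp" is quoted for SINEs in general.
TE_GROUPS = [
    TEGroup("L1", "LINE", autonomous=True, copies=500_000, mean_length_bp=950),
    TEGroup("Alu", "SINE", autonomous=False, copies=1_090_000, mean_length_bp=300),
]


def genome_fraction(group: TEGroup, genome_size_bp: int = GENOME_SIZE_BP) -> float:
    """Back-of-the-envelope share of the genome: copies x mean length / genome size."""
    return group.copies * group.mean_length_bp / genome_size_bp


if __name__ == "__main__":
    for group in TE_GROUPS:
        kind = "autonomous" if group.autonomous else "non-autonomous"
        print(f"{group.name} ({group.te_class}, {kind}): "
              f"{genome_fraction(group):.1%} of the genome")
```

Under these assumptions, L1 and Alu together come to roughly a quarter of the genome, which fits within the 40–45% attributed to TEs overall. Because the L1 copy number is quoted as a lower bound, its share is best read as a floor.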
Evolutionary biologists hypothesize that self-replicating RNA genomes were the basis of early life on earth, and that the advent of reverse transcription had a pivotal function in the evolution of the first DNA genomes, the more stable deoxyribose-based polymers [6,10]. From this perspective, multiple rounds of reverse transcription could have helped to expand both the size and complexity of the human genome.
It is particularly evident in both mammals and plants that retrotransposons have accumulated massively, driving genome evolution. L1 and Alu are reported to be the most prominent catalysts of human genome evolution, and homologous recombination between TEs may have driven, and may still drive, mutations, chromosome rearrangements, deletions, inversions, and translocations.
TEs are a major source of somatic genomic diversity and interindividual variability, and TE insertions have been documented as physiological occurrences [14,15,16]. In particular, L1 retrotransposition has been extensively described as taking place in neurons, from fly to man [17,18,19], a mechanism that is fine-tuned and epigenetically regulated in neural progenitor development and differentiation, contributing to the somatic diversification of neurons in the brain [13,20].
The deregulation of TE activity is now emerging as an important contributor to many different diseases, including neurological and inflammatory diseases and cancers [21,22,23].
Hosts have developed many systems to control TE expression and expansion (for example, epigenetic modifications and noncoding RNAs (ncRNAs) such as Piwi-interacting RNAs) and so contain the possible detrimental effects of retrotransposition. This expansion has reached a balance between detrimental and beneficial effects, possibly becoming a novel regulatory mechanism to promote genomic functions acquired through evolution.
It is now accepted, in both mouse and human, that TEs have been co-opted into multiple regulatory functions that serve the metabolism and transcription of host genomes, mediated both by their DNA elements and by their transcribed RNA counterparts.
Not Just Transposition: TEs RNAs Are a Prolific Source for Novel Regulatory Functions
TEs were first discovered in maize by Barbara McClintock almost 80 years ago. She proposed that these “controlling elements” were able to regulate gene activity [25,26].
Her theories, although dismissed for a long time, were pioneering, and with the advent of next-generation sequencing (NGS) technologies they have been thoroughly revisited. The concept now emerging is that TEs interact with the transcriptional regulatory functions of host genomes [3,4,27,28].
Although a large portion of the literature has centered on retrotransposition and the effects of de novo insertions, it is worth noting that TEs can have RNA regulatory functions decoupled from their retrotransposition.
International decade-long projects such as ENCODE (Encyclopedia of DNA Elements) and FANTOM (Functional Annotation of the Mammalian Genome) have produced and bioinformatically analyzed a vast number of datasets, opening the way for studying TEs.
These results revealed that TEs have precise functions in establishing and influencing cell type-specific transcriptional programs, creating regulatory networks that are fostered both by their genomic elements and by the derived transcripts [3,28]. They also revealed that the RNAs transcribed from these elements could have a myriad of functions, definitively changing the way in which many genomic concepts were written in textbooks.
These studies clarified that TEs can create novel or alternative promoters, promote the assembly of transcription factors and epigenetic modifiers, and favor their spreading and the regulation of gene expression. Further, TEs, in particular SINEs and HERVs, have been shown to function in 3D genome folding as binding sites for chromatin organizers [33,34,35].
In 2009, Faulkner et al. demonstrated for the first time that TEs are widely expressed in human and mouse cell types with tissue-specific patterns of expression, suggesting a specific spatiotemporal activation of retrotransposons. Faulkner et al. further demonstrated that up to 30% of transcripts initiate within repetitive elements.
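A statement such as "a given share of transcripts initiate within repetitive elements" comes down to asking, for each transcription start site (TSS), whether it falls inside an annotated repeat. The sketch below illustrates that lookup on a single chromosome. It is not the pipeline used by Faulkner et al.: the function name `fraction_in_repeats` is hypothetical, the coordinates are invented toy values, and the repeat intervals are assumed to be half-open and non-overlapping.

```python
from bisect import bisect_right
from typing import List, Sequence, Tuple


def fraction_in_repeats(
    tss_positions: Sequence[int],
    repeat_intervals: Sequence[Tuple[int, int]],
) -> float:
    """Return the fraction of TSS positions that lie inside a repeat interval.

    Intervals are half-open [start, end) on one chromosome and are assumed
    not to overlap each other (ASSUMPTION made for this sketch).
    """
    if not tss_positions:
        return 0.0
    intervals: List[Tuple[int, int]] = sorted(repeat_intervals)
    starts = [start for start, _ in intervals]
    hits = 0
    for pos in tss_positions:
        # Index of the last interval whose start is <= pos, or -1 if none.
        i = bisect_right(starts, pos) - 1
        if i >= 0 and pos < intervals[i][1]:
            hits += 1
    return hits / len(tss_positions)


if __name__ == "__main__":
    # Toy data, invented for illustration only.
    repeats = [(100, 400), (1000, 1300), (5000, 11000)]
    tss = [50, 150, 400, 1200, 3000, 4500, 7000, 12000, 20000, 25000]
    print(f"{fraction_in_repeats(tss, repeats):.0%} of toy TSSs start in a repeat")
```

Real annotations span many chromosomes and may contain nested or overlapping repeats, so a production analysis would normally rely on a dedicated interval library rather than this single-chromosome lookup.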
It is interesting to note that tissues of embryonic origin contain the highest proportion of transposable element-derived sequences in their transcriptomes, with specific expression of LTR elements in placenta and oocytes. Accordingly, it was recently found that different classes of repeats are specifically enriched in genes with a definite spatiotemporal expression, further dictating their timing and magnitude of expression in development.
Within this scenario, TEs magnify transcriptome complexity in different ways: generating antisense transcripts, usually in proximity to gene promoters; acting on the maturation of mRNAs by providing alternative splicing sites for tissue-specific exonization [39,40]; and providing alternative polyadenylation signals [41,42] and sites for RNA-mediated decoy.
Furthermore, TEs contribute RNA regulatory sequences within introns and untranslated regions (UTRs). It is important to note that TEs are major contributors to long noncoding RNAs (lncRNAs) [44,45]. In this scenario, an enhancer RNA function was proposed for LTR-derived transcripts, as required for pluripotency maintenance in mouse and human embryonic stem (ES) cells [46,47].
Further, it has been demonstrated that LINEs and SINEs are expressed as RNAs tightly associated with the chromatin compartment, where they localize at euchromatin, suggesting a possible function of these RNAs in 3D genome folding. L1s have also been described as chromatin-associated RNAs both in embryogenesis, regulating open chromatin accessibility [49,50], and in mouse ES cells, where they are involved in the regulation of genes required for cell identity maintenance and two-cell stage differentiation.
Although these seminal papers have increased awareness and knowledge of the functions of TEs, highlighting important epigenetic roles for transposons in embryogenesis and development, the contribution of TEs to adult cell plasticity and to disease occurrence and progression is still poorly investigated.
This is a result of the intrinsic difficulties in studying TEs: their repetitive nature, high degree of homology, sequence divergence, and degeneration make it almost unfeasible to apply the technologies established for biallelic genes, particularly in bioinformatics.
Here, we will review the multifaceted functions of TEs in promoting the establishment of a sophisticated plethora of RNA regulatory networks, which in turn influence the transcriptional plasticity of cells. We will show how the transcriptional deregulation of TEs in pathological contexts is instead instrumental in fueling disease.
In particular, we will review how TE RNAs can become key players in the regulation of the immune response, using cell-intrinsic pathways to directly control interferon production and the activation of immune cells; the alteration of these phenomena occurs in autoimmune and inflammatory diseases.
Similarly, transcriptional deregulation of TEs is a hallmark of cells that have lost their identity, such as cancer cells, where TE onco-co-optation represents an important way to evolve cancer-specific functions that promote tumor fitness and survival. Many of these findings have been achieved using NGS technologies and bioinformatic pipelines that are in continuous evolution.
Within this frame, the unambiguous identification of TEs and the quantification of their expression at the level of individual genomic instances would allow the precise and systematic definition of their contribution to RNA regulatory networks. We will review advances in the field and the challenges that should be addressed in this direction.
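Why instance-level quantification is hard can be seen in a toy example: a sequencing read derived from a repetitive element often aligns equally well to several genomic copies. The sketch below contrasts two simple counting strategies, keeping only uniquely aligned reads versus splitting each multi-mapped read evenly among its candidate loci. The text does not prescribe either strategy. They are shown only as common simple options, and the function name `count_te_instances`, the read names, and the locus identifiers are all hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def count_te_instances(
    read_alignments: Dict[str, List[str]],
) -> Tuple[Dict[str, float], Dict[str, float]]:
    """Count reads per TE instance in two ways.

    read_alignments maps a read id to the TE instance ids it aligns to.
    Returns (unique_counts, fractional_counts):
      - unique_counts: only reads aligning to exactly one instance are counted;
      - fractional_counts: every read contributes 1/n to each of its n instances.
    """
    unique: Dict[str, float] = defaultdict(float)
    fractional: Dict[str, float] = defaultdict(float)
    for loci in read_alignments.values():
        distinct = sorted(set(loci))  # ignore duplicate hits to the same instance
        if not distinct:
            continue
        if len(distinct) == 1:
            unique[distinct[0]] += 1.0
        weight = 1.0 / len(distinct)
        for locus in distinct:
            fractional[locus] += weight
    return dict(unique), dict(fractional)


if __name__ == "__main__":
    # Toy alignments, invented for illustration only.
    alignments = {
        "read1": ["L1_chr1_a"],
        "read2": ["L1_chr1_a", "L1_chr7_b"],
        "read3": ["L1_chr1_a", "L1_chr7_b"],
        "read4": ["Alu_chr2_c"],
    }
    unique_counts, fractional_counts = count_te_instances(alignments)
    print("unique-only:", unique_counts)
    print("fractional: ", fractional_counts)
```

In this toy case the unique-only strategy reports no expression at all for `L1_chr7_b`, while even splitting credits it with one read that may in fact have come from `L1_chr1_a`. Neither answer is unambiguous, which is the difficulty the paragraph above describes.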
Reference link: https://www.mdpi.com/1422-0067/21/9/3201/htm
More information: Priscilla Turelli et al. Primate-restricted KRAB zinc finger proteins and target retrotransposons control gene expression in human neurons, Science Advances (2020). DOI: 10.1126/sciadv.aba3200