An Extensive Collection of Quotes Indicating Functionality in Non-Protein-Coding DNA
Compiled and introduced by Richard Peachey
This statement from a well-known evolutionist and textbook author was published in early 1994:
“In fact, the human genome is littered with pseudogenes, gene fragments, ‘orphaned’ genes, ‘junk’ DNA, and so many repeated copies of pointless DNA sequences that it cannot be attributed to anything that resembles intelligent design.” (Kenneth R. Miller, “Life’s Grand Design.” Technology Review 97(2):24-32, Feb/Mar 1994). <http://www.millerandlevine.com/km/evol/lgd/index.html> (scroll to second last section, “The Story in DNA”)
The excerpts given below are from articles ranging from 1994 to the present. They are displayed here in support of the contention that so-called “junk DNA,” now known to have many functions, should no longer be used as a dysteleological argument against creationists. That is to say, so-called “junk DNA” (like “vestigial organs,” which formerly were claimed to be functionless) is not evidence that the Designer of life was incompetent.
(Also evident in several of these quotes is the reality that evolutionary thinking has retarded the progress of science.)
NOTE: The use of bold print within quotations indicates emphasis added.
DISCLAIMER: CSABC does not endorse the concept of macroevolution or the timeframe of millions of years found in some of the excerpts below.
“Mining Treasures From ‘Junk DNA’ ” (Rachel Nowak, Science, Vol. 263 [Feb 4, 1994], pp. 608-610)
“The protein-coding portions of the genes account for only about 3% of the DNA in the human genome; the other 97% encodes no proteins. Most of this enormous, silent genetic majority has long been thought to have no real function—hence its name: ‘junk DNA.’ But one researcher’s trash is another researcher’s treasure, and a growing number of scientists believe that hidden in the junk DNA are intellectual riches that will lead to a better understanding of diseases (possibly including cancer), normal genome repair and regulation, and perhaps even the evolution of multicellular organisms.
“Rather than genes, junk DNA ‘is actually the challenge right now,’ says Eric Lander of the Massachusetts Institute of Technology, who is himself a prominent Human Genome Project researcher. And in rising to meet that challenge, geneticists are beginning to formulate a new view of the genome. Rather than being considered a catalogue of useful genes interspersed with useless junk, each chromosome is beginning to be viewed as a complex ‘information organelle,’ replete with sophisticated maintenance and control systems—some embedded in what was thought to be mere waste.” (p. 608)
“Some of the earliest indications that junk DNA might have important functions came from studies on gene control. Those studies found that genes have regulatory sequences, short segments of DNA that serve as targets for the ‘transcription factors’ that activate genes. Many of the regulatory sequences lie outside the protein-coding sequences—in the genetic garbage can. ‘There’s at least five regulatory elements for each [human] gene, probably many more,’ says gene control expert Robert Tjian of the University of California, Berkeley. ‘For a long time it wasn’t appreciated how widespread those elements can be, but now it seems that patches of really important regulatory elements can be buried among the junk DNA.’
“These key regulatory elements can even occur in what many geneticists have considered the ultimate in genetic detritus: the repetitive sequences scattered throughout the genomes of higher organisms. These genetic stutters have come to epitomize junk because their structures are simple to the point of absurdity, sometimes including only two or three nucleotides repeated thousands of times. In addition, the lengths and compositions of these repetitions often vary wildly between species, between organisms of the same species, even between cells of the same organism. To the average geneticist, such caprice stands in sharp contrast to the structure of crucial genes, which are known to be highly conserved in the course of evolution—precisely because their functions are so crucial. Stretches of DNA that vary so wildly, it was thought, surely cannot have an important function.
“Now, however, it appears some repetitive sequences may contain stretches of DNA needed for gene regulation. What is more, the function of these stretches must be significant, because if their sequences go astray they may result in cancer. Last August, a team led by Theodore Krontiris of Tufts University School of Medicine in Boston confirmed hints that a mutation in a highly repetitive sequence called a minisatellite may contribute to as many as 10% of all cases of breast, colorectal, and bladder cancer, and acute leukemia.” (p. 608)
“Some repetitive sequences also seem to have a crucial function in maintaining the structure of the genome. Clustered at the centers and tips (or telomeres) of each chromosome is satellite DNA, similar to minisatellite DNA, but generally occurring in longer stretches. . . . The telomeric repetitive DNA apparently protects chromosomes by binding to proteins that stop the ends of the chromosome from ‘fraying’ and also by helping to repair damaged tips.” (pp. 608f.)
“Introns: Most genes of higher organisms are interrupted by sequences called introns that don’t code for proteins. Introns may play other vital roles, however. For instance, a slew of so-called small nucleolar RNAs (snoRNAs) are encoded by introns. Because snoRNAs accumulate in the nucleolus, where the protein-making ribosomes are formed, researchers speculate that they play a role in ribosome assembly.” (p. 609)
“Satellites: These short DNA sequences are repeated hundreds or thousands of times at a stretch, mainly at the ends and centers of chromosomes. Although they look like the quintessence of waste matter, in fact, the survival of the chromosome may depend on satellite DNA.” (p. 609)
“3′ Untranslated regions (3′UTRs): The final protein coding portions of genes are followed by DNA that is transcribed into RNA but is not translated into protein. The 3′UTRs won new respect when researchers discovered that they contain sequences that regulate gene activity.” (p. 609)
“Enough gems have already been uncovered in the genetic midden to show that what was once thought to be waste is definitely being transmuted into scientific gold.” (p. 610)
“Internet resources for the functional analysis of the 5′ and 3′ untranslated regions of eukaryotic mRNAs” (Graziano Pesole and Sabino Liuni, Trends in Genetics, Vol. 15 No. 9 [Sep 1999], p. 378)
“Despite their long having been considered mostly ‘junk’ DNA, non-coding regions of eukaryotic genomes play crucial roles in the regulation of gene expression. In particular, 5′ and 3′ untranslated regions (UTR) of eukaryotic mRNAs are involved in many posttranscriptional regulatory pathways that control mRNA localization, stability and translation efficiency. In order to approach a systematic study of the structural and functional features of UTR sequences, we developed UTRdb [a specialized database of 5′ and 3′ untranslated regions of eukaryotic mRNAs]. . . . The various functional elements annotated in UTRdb entries are, in turn, collected in the UTRsite database, which is continually enriched with new functional patterns as they become described in the literature. For each functional pattern, corresponding to a UTRsite entry, a short description of its biological activity is reported with the relevant bibliography. . . .” (p. 378)
“Silence of the Xs — Does junk DNA help women muffle one X chromosome?” (John Travis, Science News, Vol. 158 No. 6 [Aug 5, 2000], pp. 92-94)
It is a “remarkable fact that all female mammals are actually mosaics of cells with two different pedigrees. Early in embryonic life, when they’re merely balls of cells, female mammals silence one of their two X chromosomes within each cell. Each of the cells randomly decides which X—the one inherited from the mother or the one from the father—it will inactivate. As these embryonic cells replicate, their descendants in the adult animal retain the chromosomal choice that the original cells made.” The silenced X chromosomes appear as dark compacted “Barr bodies” in some cells. (p. 92)
But not all genes on the suppressed chromosome get silenced: “In one recent study, for example, biologists found that many more X chromosome genes than expected escape inactivation.” Most of the “escapees” are on the chromosome’s short arm. (p. 92)
“Supporting a theory proposed 2 years ago [by Mary F. Lyon, who in 1961 originated the idea of X inactivation, with the shutoff chromosome becoming the Barr body], a research team has uncovered evidence that DNA sequences usually dismissed as junk DNA without any function actually help determine what genes on the X chromosome become suppressed.” (p. 92)
The Xist gene [X chromosome-inactivated specific transcript] was identified in the early 1990s as a gene whose RNA remained untranslated. “These bits of RNA have a thing for X chromosomes. At the appropriate time in development, they appear to coat the entire length of one of the X chromosomes, which then condenses into a Barr body.” (p. 93)
The X chromosome is enriched in DNA sequences called LINE-1 elements or L1s. Such sequences have been considered as just self-propagating, non-functional “junk DNA,” but now it appears that, as suggested by Mary Lyon in 1998, they may facilitate gene suppression. (p. 93)
“More than 26 percent of a person’s X chromosome consists of L1s, about twice the proportion for other chromosomes,” according to a recent study by Evan Eichler of Case Western Reserve University Medical Center et al. Eichler’s team found that regions subject to inactivation are especially enriched in L1 elements. (p. 94)
“All the scientists studying X inactivation stress that there must be more to how Xist RNA shuts down a chromosome than the mere presence or absence of L1s. They continue to look, for example, for proteins that bind to that RNA.
“Still, Eichler’s latest work has given him a greater appreciation for the possibility that DNA sequences such as L1 elements have some role in human biology. ‘We as scientists have this preconceived notion, based on very little data, that this selfish DNA is nothing more than junk. I think junk is a very unfortunate term. It’s more a reflection of our ignorance,’ he says.” (p. 94)
“Objection #2: Why Sequence The Junk?” (Gretchen Vogel, Science, Vol. 291 [Feb 16, 2001], p. 1184)
“Genes and their corresponding proteins get most of the attention, but they make up only a tiny fraction—1.5% or less—of the human genome. The other 98% of DNA sequence that does not code directly for proteins was once dismissed as ‘junk DNA,’ and numerous researchers argued that it would be a waste of time and money to include the repetitive, hard-to-sequence regions in the [Human] genome project. But scientists have discovered many riches hidden in the junk, and as the project nears completion, several researchers predict that some of the most intriguing discoveries may come from areas once written off as genetic wastelands.
“Included among the noncoding DNA, for example, are the crucial promoter sequences, which control when a gene is turned on or off. The repetitive sequences at the ends of chromosomes, called telomeres, prevent the ends of the chromosome from fraying during cell division and help determine a cell’s life-span. And several teams have begun to make a strong case that repetitive, noncoding sequences play a crucial role in X inactivation, the process by which one of the two X chromosomes in a female is turned off early in development. Other genes are turning up in areas previously dismissed as barren. Scientists had assumed, for example, that the regions next to telomeres were buffer zones with few important sequences. But in this week’s issue of Nature, H. C. Riethman of the Wistar Institute in Philadelphia and his colleagues report that these regions contain hundreds of genes. ‘The term “junk DNA” is a reflection of our ignorance,’ says Evan Eichler of Case Western Reserve University in Cleveland.” (p. 1184) <http://www.sciencemag.org/content/291/5507/1184.full>
“Biological Dark Matter: Newfound RNA suggests a hidden complexity inside cells” (John Travis, Science News, Vol. 161 No. 2 [Jan 12, 2002], pp. 24-25)
In the early 1990s, Victor Ambros and colleagues, working with the nematode [roundworm] Caenorhabditis elegans, discovered a small RNA molecule that doesn’t encode a protein but turns off developmental genes. Some years later, Gary Ruvkun’s team found a gene encoding another regulatory RNA in C. elegans — a gene which (his team discovered) is also shared by many other animals. (p. 24)
“Inspired by such research, biologists have now begun to systematically look for so-called RNA genes: DNA whose final product is RNA instead of protein. Several groups, including one led by [Sean] Eddy, recently surveyed the DNA of the bacterium Escherichia coli and uncovered dozens of such genes. Just a few months ago, Ambros’ team and two other research groups reported that worms, flies, and people contain dozens of previously undetected genes that spawn RNA instead of protein.” (p. 24)
“In the Oct. 26, 2001 Science, Ruvkun speculates that ‘the number of genes in the tiny RNA world may turn out to be very large, numbering in the hundreds or even thousands in each genome. Tiny RNA genes may be the biological equivalent of dark matter—all around us but almost escaping detection.’” (p. 24)
“To unearth much smaller RNAs, such as the 22-nucleotide C. elegans strand that Ambros initially identified, biologists have had to develop new search methods. To pick out traditional genes, scientists had developed computer programs that scan DNA sequences for distinctive protein-coding sequences. Those programs, however, are ineffective at finding genes for RNAs.
“‘Everything is biased towards proteins,’ says Stephen R. Holbrook of Lawrence Berkeley National Laboratory in California.” (p. 24)
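The excerpt above mentions programs that scan DNA for distinctive protein-coding sequences. As a rough illustration only, here is a minimal Python sketch of one such signal, the open reading frame (a start codon followed in the same reading frame by a stop codon). The function name `find_orfs`, the `min_codons` threshold, and the two test sequences are hypothetical and are not taken from the article; real gene finders use many more signals than this toy does.

```python
# Toy open-reading-frame (ORF) scanner: an illustration, not a real gene finder.
START = "ATG"                    # standard start codon
STOPS = {"TAA", "TAG", "TGA"}    # standard stop codons


def find_orfs(dna, min_codons=3):
    """Return sorted (start, end) index pairs of ATG...stop stretches.

    Only the given strand is scanned, in each of its three reading frames.
    `min_codons` (an assumed, arbitrary cutoff) is the minimum number of
    codons from the start codon up to, but not including, the stop codon.
    """
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == START:
                start = i                      # open a candidate ORF
            elif start is not None and codon in STOPS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3))
                start = None                   # close it and keep scanning
    return sorted(orfs)


# Made-up protein-like stretch: ATG AAA CCC GGG TAG, flanked by filler bases.
coding_like = "CCATGAAACCCGGGTAGCC"
# Made-up stretch with no start codon, standing in for an RNA-only gene.
rna_like = "GGCCGCGTTGGCCGC"

print(find_orfs(coding_like))
print(find_orfs(rna_like))
```

The second sequence returns no hits because it has no start or stop codon pattern to key on, which is the kind of bias toward proteins that the quoted researchers describe.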
“‘There’s a level of RNA regulation that we didn’t realize was there,’ [says Susan Gottesman of the National Institutes of Health in Bethesda, MD]. ‘It was just invisible.’” (p. 25)
According to calculations by geneticist John Mattick (University of Queensland, Brisbane, Australia), about 98% of eukaryotic RNA doesn’t encode proteins. “‘This will be the big story in genomics over the next few years,’ Mattick predicts. ‘You would have to be blind not to see that noncoding RNAs are a vastly underexplored world.’” (p. 25)
“Not Junk After All” (Wojciech Makalowski, Science, Vol. 300 [May 23, 2003], pp. 1246-1247)
“Early DNA association studies showed that the human genome is full of repeated segments, such as Alu elements, that are repeated hundreds of thousands of times. The vast majority of a mammalian genome does not code for proteins. So, the question is, ‘Why do we need so much DNA?’ Most researchers have assumed that repetitive DNA elements do not have any function: They are simply useless, selfish DNA sequences that proliferate in our genome, making as many copies as possible. The late Susumu Ohno coined the term ‘junk DNA’ to describe these repetitive elements.” (p. 1246)
“Although catchy, the term ‘junk DNA’ for many years repelled mainstream researchers from studying noncoding DNA. . . . [But because a few did venture into this unpopular territory,] the view of junk DNA, especially repetitive elements, began to change in the early 1990s. Now, more and more biologists regard repetitive elements as a genomic treasure.” (p. 1246)
“It appears that transposable elements are not useless DNA. They interact with the surrounding genomic environment and increase the ability of the organism to evolve. They do this by serving as recombination hotspots, and providing a mechanism for genomic shuffling and a source of ‘ready-to-use’ motifs for new transcriptional regulatory elements, polyadenylation signals, and protein-coding sequences.” (p. 1246)
“The Unseen Genome: Gems among the Junk” (W. Wayt Gibbs, Scientific American, Vol. 289 No. 5 [Nov 2003], pp. 47-53)
[OVERVIEW:] “Geneticists have long focused on just the small part of DNA that contains blueprints for proteins. The remainder—in humans, 98 percent of the DNA—was often dismissed as junk. But the discovery of many hidden genes that work through RNA, rather than protein, has overturned that assumption.
“These RNA-only genes tend to be short and difficult to identify. But some of them play major roles in the health and development of plants and animals.
“Active forms of RNA also help to regulate a separate ‘epigenetic’ layer of heritable information that resides in the chromosomes but outside the DNA sequence.” (p. 48)
There are “vast ‘noncoding’ sequences of DNA that interrupt and separate genes. Though long ago written off as irrelevant because they yield no proteins, many of these sections have been preserved mostly intact through millions of years of evolution. That suggests they do something indispensable. And indeed a large number are transcribed into varieties of RNA that perform a much wider range of functions than biologists had imagined possible. Some scientists now suspect that much of what makes one person, and one species, different from the next are variations in the gems hidden within our ‘junk’ DNA.” (p. 48)
“In higher organisms (such as humans), genes ‘are fragmented into chunks of protein-coding sequences separated by often extensive tracts of nonprotein-coding sequences,’ [explains John S. Mattick, director of the Institute for Molecular Bioscience at the University of Queensland in Brisbane, Australia]. In fact, protein-coding chunks account for less than 2 percent of the DNA in human chromosomes. Three billion or so pairs of bases that we all carry in nearly every cell are there for some other reason. Yet the introns within genes and the long stretches of intergenic DNA between genes, Mattick says, ‘were immediately assumed to be evolutionary junk.’
“That assumption was too hasty. ‘Increasingly we are realizing that there is a large collection of “genes” that are clearly functional even though they do not code for any protein’ but produce only RNA, [remarks Michel Georges, a geneticist at the University of Liège in Belgium].” (p. 49)
“A team of scientists at the National Human Genome Research Institute (NHGRI) recently compared excerpts from the genomes of humans, cows, dogs, pigs, rats and seven other species. Their computer analysis turned up 1,194 segments that appear with only minor changes in several species, a strong indication that the sequences contribute to the species’ evolutionary fitness. To the researchers’ surprise, only 244 of the segments sit inside a protein-coding stretch of DNA. About two thirds of the conserved sequences lie in introns, and the rest are scattered among the intergenic ‘junk’ DNA.
“‘I think this will come to be a classic story of orthodoxy derailing objective analysis of the facts, in this case for a quarter of a century,’ Mattick says. ‘The failure to recognize the full implications of this—particularly the possibility that the intervening noncoding sequences may be transmitting parallel information in the form of RNA molecules—may well go down as one of the biggest mistakes in the history of molecular biology.’” (pp. 49f.)
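The proportions implied by the NHGRI comparison quoted above can be checked with a few lines of arithmetic. This small Python sketch uses only the two counts given in the excerpt (1,194 conserved segments, 244 of them inside protein-coding DNA). The variable names are illustrative, and the split of the noncoding remainder between introns and intergenic DNA is not computed because the excerpt gives it only approximately.

```python
# Counts as quoted in the excerpt; everything else is derived from them.
total_segments = 1194      # conserved segments found across species
in_coding = 244            # segments inside protein-coding DNA

outside_coding = total_segments - in_coding
coding_share = in_coding / total_segments
noncoding_share = outside_coding / total_segments

print(f"{outside_coding} of {total_segments} segments lie outside coding DNA")
print(f"coding: {coding_share:.1%}, noncoding: {noncoding_share:.1%}")
```

On these figures, roughly one conserved segment in five falls within protein-coding DNA and about four in five fall outside it.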
“For decades, pseudogenes have been written off as molecular fossils, the remains of genes that were broken by mutation and abandoned by evolution. But this past May a group of Japanese geneticists led by Shinji Hirotsune of the Saitama Medical School reported their discovery of the first functional pseudogene.
“Hirotsune was genetically engineering mice to carry a fruit fly gene called sex-lethal. Most mice did fine with this foreign gene, but in one strain sex-lethal lived up to its name; all the mice died in infancy. Looking closer, the scientists discovered that in those mice sex-lethal happened to get inserted right into the middle of a pseudogene, clobbering it. This pseudogene (named makorin1-p1) is a greatly shortened copy of makorin1, an ancient gene that mice share with fruit flies, worms and many other species. Although researchers don’t know what makorin1 does, they do know that mice have lots of makorin1 pseudogenes and that none of them can make proteins. But if pseudogenes do nothing, why were these mice dying when they lost one?
“For some reason, makorin1—and apparently only makorin1—all but shuts down when its pseudogene p1 is knocked out. RNA made from the pseudogene, in other words, controls the expression of the ‘real’ gene whose sequence it mimics, even though the two lie on different chromosomes. There is nothing pseudo about that.” (p. 50)
“As biologists sift more and more novel kinds of active RNA genes out of the long-neglected introns and intergenic stretches of DNA, they are realizing that science is still far from having a complete parts list for humans or any other higher species. Unlike protein-making genes, which have standard ‘start’ and ‘stop’ codes, RNA-only genes vary so much that computer programs cannot reliably pick them out of DNA sequences. To spur the technology on, the NHGRI is launching this autumn an ambitious $36-million project to produce an ‘Encyclopedia of DNA Elements.’ The goal is to catalogue every kind of RNA and protein made from a select 1 percent of the human genome—in three years.
“No one knows yet just what the big picture of genetics will look like once this hidden layer of information is made visible. ‘Indeed, what was damned as junk because it was not understood may, in fact, turn out to be the very basis of human complexity,’ Mattick suggests.” (p. 53)
“Heirlooms in the Attic” (Mark Johnston and Gary D. Stormo, Science, Vol. 302 [Nov 7, 2003], pp. 997, 999)
“Early in the Human Genome Project, people argued about what to sequence. Some advocated determining just the sequence of the protein-coding regions, because the vast majority of the genome is ‘junk’ DNA. This would, they argued, be cost effective because most of the important information is in protein-coding DNA. Given what we’ve learned about the jewels in the genome’s attic, aren’t we glad they sequenced it all?” (p. 999)
“Evolutionary Discrimination of Mammalian Conserved Non-Genic Sequences (CNGs)” (Emmanouil T. Dermitzakis, Alexandre Reymond, Nathalie Scamuffa, Catherine Ucla, Ewen Kirkness, Colette Rossier, Stylianos E. Antonarakis, Science, Vol. 302 [Nov 7, 2003], pp. 1033-1035)
[ABSTRACT:] “Analysis of the human and mouse genomes identified an abundance of conserved non-genic sequences (CNGs). . . . We have quantified levels and patterns of conservation of 191 CNGs of human chromosome 21 in 14 mammalian species [including green monkey, ring-tailed lemur, brush-tailed porcupine, rabbit, pig, cat, greater mouse-eared bat, white-toothed shrew, nine-banded armadillo, African elephant, tammar wallaby, and platypus]. We found that CNGs are significantly more conserved than protein-coding genes and noncoding RNAs (ncRNAs) within the mammalian class from primates to monotremes to marsupials. The pattern of substitutions in CNGs differed from that seen in protein-coding and ncRNA genes and resembled that of protein-binding regions. About 0.3% to 1% of the human genome corresponds to a previously unknown class of extremely constrained CNGs shared among mammals.” (p. 1033)
“We conclude that a large fraction of these CNGs, originally found conserved between human and mouse, are highly conserved in multiple mammals, strongly supporting functional importance.” (p. 1034)
“The Hidden Genetic Program of Complex Organisms” (John S. Mattick, Scientific American, Vol. 291 No. 4 [Oct 2004], pp. 60-67)
[OVERVIEW:] “A perplexingly large portion of the DNA of complex organisms (eukaryotes) seems irrelevant to the production of proteins. For years, molecular biologists have assumed this extra material was evolutionary ‘junk.’ New evidence suggests, however, that this junk DNA may encode RNA molecules that perform a variety of regulatory functions. The genetic mechanisms of eukaryotes may therefore be radically different from those of simple cells (prokaryotes).” [Prokaryote DNA “consists almost entirely of genes encoding proteins, separated by flanking sequences that regulate the expression of the adjacent genes.”] (pp. 61f.)
“Although introns constitute 95 percent or more of the average protein-coding gene in humans, most molecular biologists have considered them to be evolutionary leftovers, or junk. Introns were rationalized as ancient remnants of a time before cellular life evolved, when fragments of protein-coding information crudely assembled into the first genes. Perhaps introns had survived in complex organisms because they had an incidental usefulness—for example, making it easier to reshuffle segments of proteins into useful new combinations during evolution. Similarly, biologists have assumed that the absence of introns from prokaryotes was a consequence of intense competitive pressures in the microbial environment: evolution had pruned away the introns as deadweight.
“One observation that made it easier to dismiss introns—and other seemingly useless ‘intergenic’ DNA that sat between genes—as junk was that the amount of DNA in a genome does not correlate well with the organism’s complexity. Some amphibians, for example, have more than five times as much DNA as mammals do, and astonishingly, some amoebae have 1,000 times more. For decades, researchers assumed that the underlying number of protein-coding genes in these organisms correlated much better with complexity but that the relationship was lost against the variable background clutter of introns and other junk sequences.
“But investigators have since sequenced the genomes of diverse species, and it has become abundantly clear that the correlation between numbers of conventional genes and complexity truly is poor. . . . Conversely, the relation between the amount of nonprotein-coding DNA sequences and organism complexity is more consistent.
“Put simply, the conundrum is this: less than 1.5 percent of the human genome encodes proteins, but most of it is transcribed into RNA. Either the human genome (and that of other complex organisms) is replete with useless transcription, or these nonprotein-coding RNAs fulfill some unexpected function.
“This line of argument and considerable other experimental evidence suggest that many genes in complex organisms—perhaps even the majority of genes in mammals—do not encode protein but instead give rise to RNAs with direct regulatory functions. . . . These RNAs may be transmitting a level of information that is crucial, particularly to development, and that plays a pivotal role in evolution.” (pp. 62f.)
“Hundreds of ‘microRNAs’ derived from introns and larger nonprotein-coding RNA transcripts have in fact already been identified in plants, animals and fungi. Many of them control the timing of processes that occur during development, such as stem cell maintenance, cell proliferation, and apoptosis (the so-called programmed cell death that remodels tissues).” (p. 64)
“We may have totally misunderstood the nature of the genomic programming and the basis of variations in traits among individuals and species. The rule [that organized complexity is a function of regulatory information] implies that the greater portion of the genomes in complex organisms is not junk at all—rather it is functional and subject to evolutionary selection.
“The most recent surprise is that vertebrate genomes contain thousands of noncoding sequences that have persisted virtually unaltered for many millions of years. These sequences are much more highly conserved than those coding for proteins, which was totally unexpected. The mechanism that has frozen these sequences is unknown, but their extreme constancy suggests that they are involved in complex networks essential to our biology. Thus, rather than the genomes of humans and other complex organisms being viewed as oases of protein-coding sequences in a desert of junk, they might better be seen as islands of protein-component information in a sea of regulatory information, most of which is conveyed by RNA.” (pp. 66f.)
“What was dismissed as junk because it was not understood may well turn out to hold the secrets to human complexity and a guide to the programming of complex systems in general.” (p. 67)
“Fewer Genes, More Noncoding RNA” (Jean-Michel Claverie, Science, Vol. 309 [Sep 2, 2005], pp. 1529-1530)
[ABSTRACT:] “Recent studies showing that most ‘messenger’ RNAs do not encode proteins finally explain the long-standing discrepancy between the small number of protein-coding genes found in vertebrate genomes and the much larger and ever-increasing number of polyadenylated transcripts identified by tag-sampling or microarray-based methods. Exploring the role and diversity of these numerous noncoding RNAs now constitutes a main challenge in transcription research.” (p. 1529)
“A few months before the publication of the first drafts of the human genome sequence, online bids predicting the number of human protein-coding genes ranged from 30,000 to 150,000. To the surprise of many, initial bioinformatic analyses revealed no more than 35,000 human genes, an estimate that has steadily declined to the present 25,000 genes. On the other hand, the largest estimates based on the number of distinct polyadenylated transcript 3′-ends identified through the single-pass sequencing of cDNA libraries . . . have not followed a diminishing trend. On the contrary, more transcripts keep being discovered, many of which do not correspond to annotated genes. . . . ” (p. 1529)
“A large fraction of the human (vertebrate) genome appears to give rise to polyadenylated transcripts that do not code for proteins. The notion of noncoding RNAs is not new—for example, the 17-kb X chromosome-inactivated specific transcript (Xist) was discovered in 1991. However, it is only recently that the sheer scale of the phenomenon has begun to be realized.” (p.1529)
“Noncoding short-lived ‘cryptic’ mRNAs have also recently been seen in yeast, the transcription of which may maintain chromatin in an open state. The consequences of certain RNA polymerase II mutations for the status of pericentromeric heterochromatin also suggest a direct coupling between the transcription of noncoding RNAs and chromatin structure.” (p. 1530)
“The intergenic, intronic, and antisense transcribed sequences that were once deemed artifactual are now a testimony to our collective refusal to depart from an oversimplified gene model.” (p. 1530)
“The sound of silent DNA” (Anonymous, Nature, Vol. 437 [Oct 20, 2005], p. xi)
“Time to junk the term ‘junk DNA,’ or to reserve it for DNA of proven uselessness. Geneticists favour the less judgmental term ‘non-coding DNA’ for those parts of the genome not translated into protein, and there is growing evidence that it is important in disease, development and evolution.” (p. xi)
“Bloated and Not-So-Bloated Genomes” (Stella Hurtley and Phil Szuromi [editors], Science, Vol. 311 [Mar 24, 2006], p. 1669)
“Eukaryotic genomes are bloated with so-called ‘junk’ DNA including introns, mobile elements, and large intergenic regions. Curiously, animal mitochondrial genomes are tiny, essentially junk-free, and conserved in gene structure, whereas plant mitochondrial genomes are relatively large, full of junk, and do not show a rigid conservation of gene structure.” (p. 1669)
“Mutation Pressure and the Evolution of Organelle Genomic Architecture” (Michael Lynch, Britt Koskella, Sarah Schaack, Science, Vol. 311 [Mar 24, 2006], pp. 1727-1730)
“Animal mitochondrial genomes are highly streamlined, whereas plant mitochondrial genomes contain large amounts of noncoding DNA. . . . Here we argue that when differences in mutation rates are accounted for, patterns of variation in organelle genome architecture support the theory that multiple aspects of genomic complexity owe their origins to non-adaptive processes.” (p. 1727)
Animal mitochondria are mostly intron-free; plant mitochondria are intron-rich. “. . . the only animal mitochondria known to harbor introns are those of cnidarians. . . . all observed green-algal mitochondria have 0 to 8 mitochondrial introns. . . .” (p. 1729)
“. . . mRNA editing appears to be absent from animal mitochondria. In contrast, plant mitochondria use mRNA editing extensively.” (p. 1729)
“With the exception of euglenoids . . . the chloroplast genomes of the main algal groups . . . are completely lacking in introns or nearly so and also tend to have much lower levels of intergenic DNA (green algae being exceptions).” (p. 1729)
“The Nature and Dynamics of Bacterial Genomics” (Howard Ochman and Liliana M. Davalos, Science, Vol. 311 [Mar 24, 2006], pp. 1730-1733)
[ABSTRACT:] “Though generally small and gene rich, bacterial genomes are constantly subjected to both mutational and population-level processes that operate to increase amounts of functionless DNA. As a result, the coding potential of bacterial genomes can be substantially lower than originally predicted. Whereas only a single pseudogene was included in the original annotation of the bacterium Escherichia coli, we estimate that this genome harbors hundreds of inactivated and otherwise functionless genes. Such regions will never yield a detectable phenotype, but their identification is vital to efforts to elucidate the biological role of all the proteins within the cell.” (p. 1730)
“To understand the functional status of a genome, we need to recognize the pseudogenes within it. Unfortunately, there are inconsistencies in the methods by which pseudogenes are defined. Hence, the assignment of pseudogenes must be based on some a priori assumptions about the spectrum of alterations in a gene that will abolish the function of its encoded protein. . . . truncations cannot automatically be taken to reflect a global inactivation of genes.” (p. 1731)
“Functions have been specified for 15 of the E. coli K-12 pseudogenes we recognized . . . , including two found to be essential for growth in nutrient-rich conditions. Sequencing errors resulted in incorrect assignment as pseudogenes, which have since been corrected; the other cases, which represent less than 3% of the total sample, are likely to be false positives.” (p. 1732)
“There were 62 E. coli K-12 genes for which no transcripts were detected under any of the test conditions, but for nearly half of these genes a function had been assigned. . . . it is likely that we overlooked the particular contexts under which many genes are expressed.” (p. 1733)
“TUF Love for ‘Junk’ DNA” (Aarron T. Willingham and Thomas R. Gingeras, Cell, Vol. 125 [Jun 30, 2006], pp. 1215-1220)
[SUBHEADLINE:] “The widespread occurrence of noncoding (nc) RNAs — unannotated eukaryotic transcripts with reduced protein coding potential — suggests that they are functionally important.” (p. 1215)
“Over the past five years, researchers working with various organisms and using multiple technologies to explore genomewide gene expression have converged on the same surprising conclusion: transcription is widespread throughout the genome and many-fold higher than existing genome annotations would predict. The burgeoning number of these transcripts of unknown function (TUFs) . . . highlights a remarkably complex transcriptional architecture that includes alternative splice isoforms for almost all protein-coding genes, widespread transcription of antisense RNAs, and abundant noncoding RNAs (ncRNAs) with important biological functions. By some estimates, TUFs could rival protein-coding transcripts in number. . . . Such transcriptional diversity may explain how the relatively similar numbers of protein-coding genes estimated for fruit fly (13,985 . . .), nematode worm (21,009 . . .), and human (23,341 . . .) result in the remarkable phenotypic differences observed among these species.” (p. 1215)
Examples of “widespread transcription”: “In 2002, in a systematic analysis of transcription across human chromosomes 21 and 22, our group observed about an order of magnitude more transcriptional activity than could be accounted for by predicted protein-coding genes, suggesting that a significant portion of transcribed cytoplasmic poly A RNA may indeed be noncoding.” (p. 1215)
Study of Arabidopsis found that “~30% of annotated genes had associated antisense transcription, some of which was tissue-specific. Furthermore, about 20% of annotated pseudogenes were expressed, suggesting that examples of pseudogene-mediated regulation of gene activity may be common. . . .” (p. 1215)
“In Drosophila, ~40% of probes in intronic and intergenic areas detected RNA expression, much of which changed in a developmentally coordinated manner. Furthermore, alternative splicing was observed in ~40% of known genes, yielding over 5000 new splice forms.” (p. 1215)
“Recently, even the small, well-characterized yeast genome has yielded a more complex transcriptome than expected, with overlapping transcription and differential expression levels even within the same gene. . . .” (p. 1215)
“Together, these studies provide several observations about transcriptomes: (1) the widespread incidence of unannotated transcripts with limited protein-coding capacity, often expressed at low levels; (2) a large degree of overlapping transcription, evidenced in part by the presence of abundant antisense transcription; and (3) most coding genes have alternative splice forms.” (pp. 1215f.)
“Analysis of human gene expression by MPSS [massively parallel signature sequencing] found that >65% of signature sequences do not overlap with annotated transcripts; rather, 38% map to introns, 21% are antisense to known exons, and 5% map to intergenic areas.” (p. 1216)
“Genomewide surveys of expressed mouse and human sequences and cDNA sequencing experiments identified a surprisingly large number of transcripts antisense to protein-coding genes. This suggests that the majority of sense transcripts (e.g., 72% of mouse genes) may have an antisense partner. . . .” (p. 1216)
“Many other functions for noncoding transcripts have been identified, including transcriptional activation, gene silencing, imprinting, dosage compensation, translational silencing, modulation of protein function, and binding as riboswitches to regulatory metabolites. . . .” (p. 1217)
“One of the best characterized emerging classes of ncRNAs are the microRNAs (miRNAs) cloned over a decade ago in C. elegans and now recognized as a large conserved family of ~22-nucleotide regulatory RNAs essential for a variety of cellular processes. . . . The differential expression patterns of miRNAs determine cell fate and correct differentiation during development. . . . MicroRNAs can act as tumor suppressors and oncogenes . . . and they can regulate cellular proliferation and apoptosis. Surprisingly, miRNA expression profiles appear to reflect more accurately the developmental lineage and differentiation state of tumors than do mRNA profiles.” (p. 1217)
“Hundreds of miRNAs are estimated to be present in the human genome, and computational analysis suggests that more than 20%–30% of human genes are regulated by miRNAs. Microarray experiments support this view, revealing miRNA-mediated downregulation of large numbers of target mRNAs. In addition, miRNAs suppress initiation of protein translation, promote mRNA degradation and turnover, and initiate transcriptional silencing. However, the function of the vast majority of miRNAs is as yet unknown.” (p. 1217)
“Small nucleolar RNAs (snoRNAs) . . . reveal an ever-increasing retinue of cellular functions for ncRNAs. . . . Given that most snoRNAs are processed from the introns of other genes, their expression is inextricably linked to transcription of their host gene.” (p. 1217)
“. . . the rising ratio of noncoding to protein-coding DNA correlates with increasing organismal complexity. . . . ncRNAs probably function in more cellular pathways than their protein-coding brethren.” (p. 1218)
“How much of the nonredundant genome is transcribed? Based on published data, estimates range from 10% to 60%. However, this may be an underestimate given the limited number of cells and differentiation states surveyed so far.” (p. 1219)
[After considering several pieces of data:] “Weighing these factors together, we suggest that all of the non-repeat portions of the human genome are transcribed. This may seem an excessive estimate, yet recent data in yeast imply that more than 85% of its genome is transcribed. . . . Furthermore, large-scale cDNA sequencing and annotation in the mouse has shown that 62% of the mouse genome is transcribed . . . and there are estimates that 90% of the human genome is transcribed. . . . The overall architecture of transcribed portions of the genome is highly complex. Indeed, the landscape of most transcriptomes is a lattice-like network of overlapping transcription in which the same genomic sequences often serve as portions of separately regulated transcripts, making the boundaries and indeed the concept of the term gene less useful than it once was.” (p. 1219)
“The Real Life of Pseudogenes” (Mark Gerstein and Deyou Zheng, Scientific American, Vol. 295 No. 2 [Aug 2006], pp. 49-55)
Pseudogenes “are the molecular remains of broken genes, which are unable to function because of lethal injury to their structures.” Most pseudogenes “are damaged copies of working genes and serve as genetic fossils that offer insight into gene evolution and genome dynamics.” “Recent evidence of activity among pseudogenes, and their potential resurrection, suggests some are not entirely dead after all.” (p. 50)
“With ongoing annotation of the human genome sequence, our research group, along with others in Europe and Japan, have identified more than 19,000 pseudogenes, and more are likely to be discovered. Humans have only an estimated 21,000 protein-coding genes, so pseudogenes could one day be found to outnumber their functional counterparts.” (p. 50)
Mammals may have >1000 different genes coding olfactory receptors, cell-surface proteins that confer the sense of smell; humans have fewer than 500 working olfactory receptor genes. But “versions of about 300 human olfactory receptor pseudogenes are still functional genes in the genomes of rats and mice. . . . humans have considerably more olfactory receptor pseudogenes than chimpanzees do. . . .” (p. 53)
“Analysis of the mouse genome . . . has shown that 99 percent of human genes have a corresponding version in the mouse. . . . Yet despite this similarity in functional genes and overall genome structure, just a small fraction of the known human pseudogenes have an obvious counterpart in the mouse.” (p. 53)
“Recognition of pseudogenes . . . relies primarily on their similarity to genes and their lack of function. Computers can detect similarity by exhaustively aligning chunks of intergenic DNA against all possible parent genes. Establishing a suspected pseudogene’s inability to function is more challenging.” (p. 54)
“Comparison of pseudogenes among genomes has revealed a puzzling phenomenon, however: a few pseudogenes appear to be better preserved than one would expect if their sequences were drifting neutrally. Such pseudogenes may therefore be under evolutionary constraint, which implies that they might have some function after all. One way to try to ascertain whether pseudogenes are functioning is to see whether they are transcribed into RNA. Recent experiments by Thomas Gingeras of Affymetrix and by Michael Snyder of Yale University have found that a significant fraction of the intergenic regions in the human genome are actively transcribed. In their studies, in fact, more than half the heavily transcribed sequences map to regions outside of known genes. What is more, a number of those transcriptionally active intergenic areas overlap with pseudogenes, suggesting that some pseudogenes may have life left in them.” Preliminary data indicate that “at least one tenth of the pseudogenes in the human genome are transcriptionally active. Knowing that so many pseudogenes are transcribed does not tell us their function, but together with evidence that certain pseudogenes are better preserved than background intergenic sequences, it certainly challenges the classical view of pseudogenes as dead.” (p. 54)
“One possibility is that pseudogenes play some ongoing part in regulating the activity of functional genes.” (Two examples are given.) (pp. 54f.)
“Perhaps two dozen examples of specific pseudogenes that appear to be active in some way—often only in certain cells of an organism—have been identified, although the findings are still preliminary.” (p. 55)
“Junk DNA as an evolutionary force” (Christian Biémont and Cristina Vieira, Nature, Vol. 443 [Oct 5, 2006], pp. 521-524)
Transposable elements (TEs, “jumping genes”) can act “as genes or gene regulatory elements, and as a result constitute a source of genetic innovation for the organism.” (p. 521)
“. . . DNA transposon-like elements called helitron rolling-circle elements were recently found to be responsible for copying various gene segments into new locations in the maize genome, generating a huge diversity among individual maize plants.” (p. 521)
Today TEs “are acknowledged as a main component of most genomes” — around 45% of human DNA consists of TEs. (p. 522)
“. . . the influence of TEs on genomes has long been underestimated.” (p. 522)
“Even though they do not encode proteins, these TE remnants seem to be under selective constraints, suggesting that they have some function that is being conserved during evolution.” (p. 522)
“The expression of TEs — that is, the production of their encoded RNA — is tissue-specific; some elements are highly expressed during particular stages of the host organism’s life, and some are even expressed differently in male and female reproductive cells (the germ lines). Such high levels and specific patterns of expression seem unexpected for apparently non-functional ‘junk’ DNA. This paradox has been explained by invoking either complex interactions between the TE regulatory sequences and the activity of numerous host developmental genes, or the influence of particularly highly expressed host genes on the TEs adjacent to them (the ‘read through phenomenon’). But both of these explanations implied a puzzling waste of energy for the cells and the organisms involved, so the idea that these RNAs are deliberately expressed to fulfill a particular cellular function gained ground.
“This theory received a recent boost with the observation that some retrotransposons can influence the regulation of certain host genes and affect developmental processes in mouse oocytes (egg cells) and preimplantation embryos. . . . this finding reveals that TEs could have a role in the reorganization of genome structure and the gene silencing that occur during early embryonic development.” (p. 522)
“What was once dismissed as junk DNA must now be regarded as a major player in many of the processes that shape the genome and control the activity of its genes.” (p. 524)
“A new paradigm for developmental biology” (John S. Mattick, The Journal of Experimental Biology, Vol. 210 No. 9 [May 2007], pp. 1526-1547)
Introns were “undoubtedly the biggest surprise” and their “misinterpretation possibly the biggest mistake, in the history of molecular biology. Although introns are transcribed, since they did not encode proteins and it was inconceivable that so much non-coding RNA could be functional, especially in an unexpected way, it was immediately and almost universally assumed that introns are non-functional and that the intronic RNA is degraded (rather than further processed) after splicing. The presence of introns in eukaryotic genomes was then rationalized as the residue of the early assembly of genes that had not yet been removed and that had utility in the evolution of proteins by facilitating domain shuffling and alternative splicing. . . .” (p. 1529)
“. . . it may well be that most of the human genome is functional . . ., including many sequences such as introns and other mobile element-derived sequences that have been long considered as parasitic evolutionary debris rather than the historic raw material for genetic innovation and the current embodiment of higher levels of regulatory sophistication.” (p. 1540) <http://jeb.biologists.org/content/210/9/1526.full>
“Genome project turns up evolutionary surprises” (Erika Check, Nature, Vol. 447 [Jun 14, 2007], pp. 760-761)
The ENCODE project (Encyclopedia of DNA elements) “attempts to discover how our cells make sense of the DNA sequence in the human genome. Already ENCODE is up-ending one piece of conventional scientific wisdom: the idea that biologically relevant DNA resists change over evolutionary time.” (p. 760)
“ENCODE aims to catalogue all the ‘functional elements’ in the genome — the DNA sequences that control how and when our cells use our genes. Most of these controls seem to be written into so-called non-coding DNA, which does not make a detectable protein product. Because organisms depend on functional elements working correctly, scientists have long thought that such elements should not change much over evolutionary time. So researchers have mostly looked for key functional elements in non-coding DNA that is the same across species, known as conserved or constrained DNA. . . . But when the different groups compared their results, they found that their predictions about key portions of the genome didn’t always agree: the biologists’ list of functional sequences didn’t match the computational group’s list of constrained sequences. . . . Overall, biologists found no evidence of function for about 40% of the constrained regions. On the flipside, about half of the functional elements found in non-coding DNA were totally unconstrained.
“The finding that many constrained regions weren’t considered to be functional is not too surprising, because it is unlikely ENCODE included enough tests on enough different types of cells to capture every major aspect of biology. But the idea that important DNA might also be unstable is newer, and intriguing, because it undermines the assumption that biological function requires evolutionary constraint.” (p. 760)
“‘Junk’ DNA makes compulsive reading” (Andy Coghlan, New Scientist, Vol. 194 No. 2608 [Jun 16, 2007], p. 20)
“It turns out that DNA generates far more RNA than the standard dogma predicts it should – even some ‘junk’ DNA gets transcribed. The Encyclopedia of DNA Elements (ENCODE) project has quantified RNA transcription patterns and found that while the ‘standard’ RNA copy of a gene gets translated into a protein as expected, for each copy of a gene cells also make RNA copies of many other sections of DNA. None of the extra RNA fragments gets translated into proteins, so the race is on to discover just what their function is.” (p. 20)
“‘It’s no longer the neat and tidy genome we thought we had,’ says John Greally of the Albert Einstein College of Medicine in New York City. . . . ‘It would now take a very brave person to call non-coding DNA junk,’ says Greally.” (p. 20)
“ENCODE labs analysed 30 million bases or ‘letters’ of human DNA – about 1 per cent of the total – covering 44 different and randomly chosen sites in our genome, and measured the associated RNA transcription in living cells. The whole sample was analysed independently by a range of methods in 38 labs, then cross-checked.
“With around 400 known genes in the chosen sample, researchers expected an equal number of different RNA transcripts according to the central dogma of one RNA copy per gene. Instead, they found about twice the predicted quantity of RNA transcripts. Moreover, they also found almost 10 times the expected number of gene switches – the points in DNA where transcription can be activated. . . .
“Many of the RNA transcripts were copies of sections lying across genes and their adjacent stretches of ‘junk’ DNA. . . . Even more surprising, many transcripts were copies of junk DNA situated further from genes. . . .
“Tom Gingeras of genomics firm Affymetrix in Santa Clara, California, and a co-leader of ENCODE . . . is convinced that the extra RNAs have a function, perhaps to transport molecules around the cell or fine-tune and modulate the activity of genes themselves. ‘We don’t think they’re produced by accident,’ he says.” (p. 20)
“Functional Demarcation of Active and Silent Chromatin Domains in Human HOX Loci by Noncoding RNAs” (John L. Rinn, Michael Kertesz, Jordon K. Wang, Sharon L. Squazzo, Xiao Xu, Samantha A. Brugmann, L. Henry Goodenough, Jill A. Helms, Peggy J. Farnham, Eran Segal, and Howard Y. Chang, Cell, Vol. 129 No. 7 [Jun 29, 2007], pp. 1311-1323)
[SUMMARY:] “Here we characterize the transcriptional landscape of the four human HOX loci at five base pair resolution in 11 anatomic sites and identify 231 HOX ncRNAs [non-protein-coding RNAs] that extend known transcribed regions by more than 30 kilobases. HOX ncRNAs are spatially expressed along developmental axes and possess unique sequence motifs, and their expression demarcates broad chromosomal domains of differential histone methylation and RNA polymerase accessibility. We identified a 2.2 kilobase ncRNA residing in the HOXC locus, termed HOTAIR [HOX Antisense Intergenic RNA], which represses transcription in trans across 40 kilobases of the HOXD locus. HOTAIR interacts with Polycomb Repressive Complex 2 (PRC2) and is required for PRC2 occupancy and histone H3 lysine-27 trimethylation of HOXD locus. Thus, transcription of ncRNA may demarcate chromosomal domains of gene silencing at a distance; these results have broad implications for gene regulation in development and disease states.” (p. 1311)
“Genome 2.0” (Patrick Barry, Science News, Vol. 172 No. 10 [Sep 8, 2007], pp. 154-156)
“Researchers slowly realized . . . that genes occupy only about 1.5 percent of the genome. The other 98.5 percent, dubbed ‘junk DNA,’ was regarded as useless scraps left over from billions of years of random genetic mutations. As geneticists’ knowledge progressed, this basic picture remained largely unquestioned. ‘At one time, people said, “Why even bother to sequence the whole genome? Why not just sequence the [protein-coding part]?”’ says Anindya Dutta, a geneticist at the University of Virginia in Charlottesville.” (p. 154)
“The Human Genome: RNA Machine” (John S. Mattick, The Scientist, Vol. 21 No. 10 [Oct. 2007], pp. 61ff.)
[SUBTITLE:] “Contrary to current dogma, most of the genome may be functional.”
“When introns were discovered 30 years ago it was immediately and universally assumed that these vast tracts of nonprotein-coding sequences within genes are nonfunctional, despite the fact they are transcribed. Their presence was rationalized as the leftovers of the early evolution of genes. At the same time, the finding that much of the mammalian genome (45% in humans) is derived from transposons, which are thought to be mainly parasitic hitchhikers, led to the related concept of ‘selfish DNA.’ This reinforced the increasingly conventional view that the genomes of complex eukaryotes largely comprise accumulated evolutionary debris.”
“The [ENCODE] project’s findings, released last spring, concluded that only 5% of the [ncRNA] sequences are evolutionarily constrained, suggesting that the remainder has evolved randomly over time and is unimportant, in keeping with orthodox thinking of genes and genetic function. That same project also confirmed that the majority of the genome is transcribed – as much as 93% in different cells – and that ‘surprisingly, many functional elements are seemingly unconstrained across mammalian evolution.’”
“There are good reasons to think that these [noncoding] RNAs represent a hitherto hidden layer of regulation that encodes the developmental program of eukaryotes and hence has vastly expanded in complex organisms. Noncoding RNAs, including those derived from introns, are increasingly recognized to be involved in all aspects of regulation of cellular processes, including chromatin remodeling and epigenetic memory, transcription factor nuclear trafficking, and transcriptional activation or repression. miRNAs regulate gene expression by inhibition of translation and degradation of mRNAs.
“The correct functioning of ncRNAs is also important for human health: Changes in ncRNAs have been implicated in heart attacks and diseases such as cancer. Many ncRNAs are expressed in the brain, and at least one is involved in behavioral responses. The untranslated RNA BC1 is normally expressed in mouse brain; mice strains created without this RNA showed no physical abnormality, but were found to have reduced exploratory behavior and consequentially a higher mortality in field experiments.
“The evidence for large numbers of ncRNAs and for the central importance of ncRNAs as regulators of important developmental, physiological, and neural processes is compelling. If all these ncRNAs are functional, as the evidence increasingly suggests they may be, then much and perhaps most of the human genome is functional. If so, the genetic programming of the higher organisms has been fundamentally misunderstood for the past 50 years, because of the presumption – largely true in prokaryotes, but not in complex eukaryotes – that most genetic information is expressed as, and transacted by, proteins.”
“The Production Line” (Anna Petherick, Nature, Vol. 454 [Aug 28, 2008], pp. 1043-1045)
“HOTAIR is a molecule with a future. Created from a DNA sequence on human chromosome 12, it affects genes on chromosome 2, apparently working as part of a system that enables skin cells to tell where on the body’s surface they are, and thus what they should be doing.
“Beyond these specifics, HOTAIR may also serve as a model for understanding a whole slew of similar molecules, the existence of which was not even dreamed of ten years ago and the function of which — if any — is still hotly debated. HOTAIR stands out because it is a long piece of RNA that doesn’t encode a protein but still does something biologically important. ‘HOTAIR was a gem in a sea [of long RNAs],’ says John Rinn, a genome biologist who discovered the RNA while working at Stanford University in California. ‘It told us little about what the bulk of these things are doing. For that, we can’t even see a common trend.’
“It is hard to comprehend the upheaval that RNA has been causing in molecular biology over the past few years. Once viewed as a passive intermediary, it was thought to faithfully carry genetic messages from the DNA sequence to the protein-making machinery, where things were made that actually got things done. Biologists were comfortable in the knowledge that only 1–2% of the human genome made protein-coding RNA in this way, and most of the rest was filler. So when, in 2005, geneticist Thomas Gingeras announced that some cells churn out RNA molecules from about 80% of their DNA, he astonished scientists attending the Biology of Genomes meeting at Cold Spring Harbor Laboratory in New York. Why should cells bother with so much manufacturing if, as it seemed, such a tiny fraction was involved in the important business of protein-making?
“Over the past three years or so the case for this ‘pervasive transcription’ has strengthened. The phenomenon has now been ascribed to mice, fruitflies, nematode worms and yeast. These studies, and Gingeras’s original reports, came from microarrays — a technique that relies on the tendency of nucleic acids to find their complementary cousins in a solution. Gingeras works for the microarray manufacturer Affymetrix in Santa Clara, California. But not everyone has been persuaded of the extent of pervasive transcription, in part because microarrays are subject to background ‘noise’. Even using no RNA, control chips will give off some signals, and results can be a matter of interpretation.
“For anyone who still doubts that the genomes of nucleated organisms are first and foremost RNA machines rather than protein-coding ones, sequence data are starting to provide ‘ultimate information’, Gingeras says. There is something about the nitty gritty of nucleotide sequences that is enticingly reassuring to molecular biologists. New sequencing machines that can stream out data many times faster than their predecessors have made the mass sequencing of cellular transcripts possible.
“In 2008, this process was completed for two species of yeast using machines made by Illumina, based in San Diego, California. The results broadly agree with the microarray findings, showing transcription from 74% of the genome of brewer’s yeast (Saccharomyces cerevisiae) and 90% from that of fission yeast (Schizosaccharomyces pombe). Gingeras and other researchers are now working to sequence all the RNA produced by 44 kinds of human cell as part of the Encyclopedia of DNA Elements (ENCODE) project, which aims to identify all the functional parts of the human genome. At that point, any remaining sceptics will be able to overlay the many thousands of different human RNAs onto DNA regions from whence they came. At the end of this process, the covered regions will be those that give rise to RNA — and the uncovered ones, probably just a few naked holes.
“All this transcriptional accounting has hastened an already heady RNA rush. Even before the pervasive nature of transcription became clearer, molecular biologists had begun to trot out new classes of RNA molecules that are responsible for important happenings in cells. Thrust farthest into the limelight are the microRNAs (miRNAs), which stop the production of certain proteins, but they have been joined by a growing number of other RNA families, such as small nucleolar RNAs (snoRNAs) and Piwi-interacting RNAs (piRNAs), with vital roles in cellular and developmental processes — vital enough to earn the DNA that encodes them the label ‘RNA genes’.” (p. 1043)
“Those who doubt the importance of RNA bemoan their logical problem: it is impossible to prove lack of function. Even when an important cellular job does get pinned on a long RNA, as it did for HOTAIR, the doubters worry that it is too tempting to extrapolate across the board.
“Ewan Birney, a bioinformatician at the European Bioinformatics Institute and one of the leading scientists in ENCODE, says that the debate is now about what proportion of long RNAs serve a purpose. ‘I used to be a much stronger sceptic three to four years ago,’ he says. ‘Now I’m accepting that transcription is pretty complicated and that many transcripts are made that we don’t understand. Where I still have some scepticism — what we still don’t know — is what those transcripts do, if anything.’
“John Mattick, the director of the Centre for Molecular Biology and Biotechnology at the University of Queensland in Brisbane, Australia, has no such qualms. He is a long-time advocate of non-coding RNA’s importance. The doubters, he says, ‘keep regressing to the most orthodox explanation [that the long RNAs are junk]. But they can’t just sit on their intellectual backsides and tell us to prove it.’ But prove it is just what researchers are starting to do, with a growing number of examples that showcase these molecules’ capabilities.
“The idea of long non-coding RNAs is not new. Xist, the most famous example, was discovered in 1991. Its 17,000 nucleotides can be found in almost every cell of mice and humans, where it obviates gene expression along an entire X chromosome. Because females have two Xs to their male (XY) counterparts’ one, they use Xist to switch off the extra X and compensate for the disparity. . . .
“Xist RNA is transcribed from the chromosome it mutes, and coats it along its length. No one really knows exactly how it attaches and what makes it so effective at gene silencing. What is clear, however, is that part of the molecule attracts chromatin remodelling complexes — enzymes that turn genes on and off by tinkering with DNA’s packaging. Get enough of these complexes together, and it seems you can turn off a whole chromosome.
“Over the past few years, the RNA field has compiled a brief list of other long non-coding RNAs. Many of those that have been studied control the activity of protein-coding genes. As the pace of these discoveries has picked up, they have revealed that long RNAs can control genes in a surprising variety of ways, from both near and far, and that their function is not necessarily dependent on the exact sequence of the RNA, as it is when RNA is coding for proteins. This suggests that scientists have only begun to appreciate what RNA is capable of.
“In one example published last year, molecular biologist Igor Martianov and his colleagues at the University of Oxford, UK, studied the human gene for dihydrofolate reductase, an enzyme involved in biochemical syntheses that has two ‘on’ switches for protein production. They discovered that the first of these switches actually triggers the manufacture of a 583-nucleotide-long RNA molecule, and that this RNA directly interferes with the second switch. When this happens, the enzyme is no longer made.
“Working in a very different way, a long RNA called NRON seems to travel to the cytoplasm in order to influence the expression of protein-coding genes. Several thousand nucleotides long, NRON polices the trafficking of a transcription factor from the cytoplasm into the nucleus of the cells where it is active. By doing so, it seems to control the transcription factor’s activities, which include regulating T cells’ immune response.
“When Rinn discovered HOTAIR, it reinforced the idea that RNAs could be shuttling around the genome doing important jobs. Rinn was studying skin-cell lines cultured from the finger, foot, foreskin and eight other sites on the human body, trying to find out how these cells know their position.
“HOTAIR, which stretches for nearly 2,200 nucleotides, is produced from within a cluster of the HOX genes that specify an early embryo’s head end and foot end, as well as the order of the body segments in between. When Rinn found that this RNA [transcribed from DNA on human chromosome 12] affects the output of genes on chromosome 2, it was the first time such a cross-chromosome influence had been found. When he lowered levels of the RNA molecule, the activity of HOX genes on chromosome 2 jumped, and foreskin cells started behaving in an unusual way.” (pp. 1043f.)
“As Rinn has said, there is a vast sea of long RNAs out there. The ones with functions already ascribed to them comprise just a minuscule fraction, and those seem to be regulating genes by very diverse means. To many, this lack of common function infers [sic] that science has only scratched the surface of the diversity of long RNAs. The massive scale on which transcription is taking place could be the least of biologists’ problems compared with its mind-boggling functional complexity. What is needed, researchers say, is more data to show that RNAs do something useful on the genomic scale — but those data are proving remarkably difficult to collect.
“One problem, when it comes to surveying RNA’s usefulness, is that sequence does not provide any simple indicator of function. The sequence of non-coding RNA is not conserved between species in the same way that it is for protein-coding genes. If a sequence is doing something important for an organism because of the protein it codes for, then evolution is likely to have kept that region more constant across related species compared with any average stretch. But the same isn’t true of RNA, which does not necessarily pair up with a complementary nucleotide sequence at all. Xist is not conserved in this way, nor are any of the other non-coding RNA stars along their full lengths.
“Another way to seek evidence of function en masse is to get rid of long non-coding RNAs and watch how animals cope. But such an experiment may produce only subtle changes in an organism as a whole, and could still miss the importance of a transcript. ‘I think the cell will use these transcripts at very different times and in very different cell types and conditions,’ Gingeras says. ‘You may need to see them in a very specific context to see the function.’
“That is what Jürgen Brosius of the University of Münster, Germany, and his colleagues found when they removed a 150-nucleotide RNA from mouse neurons, where it is normally transported down the cellular fingers that communicate with other cells. The engineered animals looked and acted more or less the same as the control animals — but Brosius says that on close inspection they weren’t as inquisitive and had unusual exploratory behaviours. Such activity might be lethal in the wild, Mattick says, ‘but it was affecting their behaviour in ways that were far too subtle to be assessed in a cage.’ ” (pp. 1044f.)
“There are already known examples in which RNA production seems more important than the actual product. In 2004, Fred Winston and his colleagues at Harvard Medical School in Boston, Massachusetts, studied a 551 nucleotide RNA called SRG1 that is made by brewer’s yeast. It switches on and off the adjacent gene SER3, which helps make serine (an amino acid that the yeast needs to be healthy). But in this case it is the process of making the non-coding RNA that regulates SER3, rather than the RNA itself. The trick here is that the DNA sequence from which SRG1 is transcribed runs through the on switch for SER3. So when a yeast cell is manufacturing a lot of RNA for SRG1, it blocks access to the SER3 switch. This is what happens when the yeast sits happily in a flask of rich medium and has no need to generate its own serine.” (p. 1045) <http://www.nature.com/news/2008/080827/full/4541042a.html>
“Evolution and Functions of Long Noncoding RNAs” (Chris P. Ponting, Peter L. Oliver, and Wolf Reik, Cell, Vol. 136 No. 4 [Feb 20, 2009], pp. 629-641)
[ABSTRACT:] “RNA is not only a messenger operating between DNA and protein. Transcription of essentially the entire eukaryotic genome generates a myriad of non-protein-coding RNA species that show complex overlapping patterns of expression and regulation. Although long noncoding RNAs (lncRNAs) are among the least well-understood of these transcript species, they cannot all be dismissed as merely transcriptional ‘noise.’ Here, we review the evolution of lncRNAs and their roles in transcriptional regulation, epigenetic gene regulation, and disease.” (p. 629)
“The large proportion of a eukaryotic genome that is transcribed . . . produces a huge array of RNA molecules differing in size, abundance and protein-coding capacity.
“In stark contrast to this diversity of RNA species, only a small number of non-protein-coding transcripts currently have experimentally-derived functions. Moreover, only rarely have disease-associated mutations been identified outside of protein-coding genes. Might, therefore, this colorful pageant of genomic transcription be a mirage? Might such a genome’s repertoire of non-protein-coding transcripts be inconsequential transcriptional ‘noise’? Here, we review evidence for whether pervasive transcription is consequential, drawing first upon evolutionary signatures of functionality in genome sequences, and then upon experimental findings about the functions of noncoding transcripts, particularly with respect to transcriptional regulation. We will focus on long noncoding RNAs (lncRNAs, >200 nucleotides) that are, perhaps, the least well-understood products of transcription from genomes.” (p. 629)
“The near ubiquity of transcription across genomes has been demonstrated by diverse methods, including whole genome tiling arrays and transcriptome sequencing. . . . It has also been shown for diverse eukaryotes ranging from plants to animals and, most recently, to fungi such as the fission yeast Schizosaccharomyces pombe . . . and the budding yeast Saccharomyces cerevisiae. . . .” (pp. 629f.)
“Classes of noncoding transcripts can be divided between housekeeping noncoding RNAs and regulatory noncoding RNAs. Housekeeping noncoding RNAs include ribosomal, transfer, small nuclear and small nucleolar RNAs and are usually expressed constitutively. Among short regulatory noncoding RNAs are microRNAs, small interfering RNAs and Piwi-associated RNAs. . . . Most transcribed, yet not protein-coding, sequence, however, is associated with lncRNAs. . . .” (p. 630)
“An lncRNA can be placed into one or more of five broad categories: (1) sense, or (2) antisense, when overlapping one or more exons of another transcript on the same, or opposite, strand, respectively; (3) bidirectional, when the expression of it and a neighboring coding transcript is initiated in close genomic proximity, (4) intronic, when it is derived wholly from within an intron of a second transcript . . ., or (5) intergenic when it lies within the genomic interval between two genes. . . .” (p. 631)
“Transcription of lncRNAs is now known to regulate the expression of genes in close genomic proximity (cis-acting regulation) and to target distant transcriptional activators or repressors (trans-acting) via a variety of mechanisms. . . .
“Transcription of an lncRNA may promote the accessibility of protein-coding genes to RNA polymerases. . . .
“In other contexts, lncRNA sequences themselves convey functions through binding to DNA or protein. . . .
“Noncoding RNAs also regulate transcription in cis indirectly, without binding to DNA.” (pp. 634-635)
“Mechanisms of lncRNA Function in Transcriptional Regulation
“(A) Transcriptional interference. Transcription of the lncRNA SRG1 through the promoter of the adjacent SER3 gene.
“(B) Initiation of chromatin remodeling. RNA pol II processivity upstream of fbp1 is normally repressed by Tup proteins, however, rare lncRNAs are transcribed. Upon glucose starvation, the Atf1 activator binds to the UAS1 element, facilitating chromatin remodeling by RNA pol II and the subsequent binding of Rst2 to a second UAS2 element. As further lncRNAs are transcribed, the chromatin structure around the fbp1 initiation site is then accessible to the transcriptional machinery allowing induction of the gene to occur.
“(C) Promoter inactivation by binding to basal transcription factors. Formation of a complex between an lncRNA and both the DHFR promoter and TFIIB prevents normal preinitiation of transcription.
“(D) Activation of an accessory protein. In response to stress, lncRNAs upstream of CCND1 form a complex with an RNA-binding protein TLS (trans-located in liposarcoma) in which the inactive conformation of the protein is altered, facilitating repression of CCND1 via chromatin-binding protein (CBP).
“(E) Activation of transcription factors. The lncRNA Evf2 cooperates with the Dlx2 homeodomain protein to activate the Dlx5/6 enhancer.
“(F) Oligomerization of an activator protein. In response to heat shock, an lncRNA assists the trimerization of the HSF1 protein, which in turn forms a complex with the translation factor EIF to facilitate HSP expression.
“(G) Transport of transcription factors. Dephosphorylated NFAT is prevented from translocating to the nucleus and activating its targets due to interactions between the lncRNA NRON and importin proteins.
“(H) Epigenetic silencing of gene clusters by lncRNAs. The Xist, Kcnq1ot1, and Air RNAs establish a nuclear domain (or “coating”) for gene silencing of genes in cis. The lncRNAs may directly or indirectly attract epigenetic modifiers such as histone methyltransferases (G9a or Ezh2) to bring about repressive epigenetic marks in the cluster.
“(I) Epigenetic repression of genes by an intergenic lncRNA in trans. HOTAIR RNA, transcribed within the HOXC cluster, interacts with the Polycomb repressor complex 2 (PRC2) resulting in the methylation and silencing of several genes in the HOXD locus.”
“MicroRNAs and Cancer: Short RNAs Go a Long Way” (Andrea Ventura and Tyler Jacks, Cell, Vol. 136 No. 4 [Feb 20, 2009], pp. 586-591)
[ABSTRACT:] “MicroRNAs (miRNAs) may be important regulators of gene expression. By modulating oncogenic and tumor suppressor pathways they could, in principle, contribute to tumorigenesis. Consistent with this hypothesis, recurrent genetic and epigenetic alterations of individual miRNAs are found in some tumors. Functional studies are now elucidating the mechanism of action of putative oncogenic and tumor suppressor miRNAs.” (p. 586)
“More recently, biochemical and genetic studies have begun to reveal the physiological functions of individual miRNAs. We now know that miRNAs act by modulating the expression of target genes through sequence complementarity between the so-called ‘seed’ sequence of the miRNA and the ‘seed-match’ present in the target messenger RNA (mRNA). Such binding inhibits the translation and reduces the stability of the mRNA, leading to decreased expression of the target protein. MicroRNAs control a wide array of biological processes, including differentiation, proliferation, and apoptosis. As the deregulation of these very same processes is a hallmark of cancer, there has been speculation that mutations affecting miRNAs or their functional interactions with oncogenes and tumor suppressor genes might also contribute to tumorigenesis. Here, we summarize recent findings that now strongly support an important role for these tiny RNAs in controlling cell transformation and tumor progression.” (p. 586)
“Several miRNAs have been implicated as tumor suppressors based on their physical deletion or reduced expression in human cancer. Beyond these associations, functional studies of a subset of these miRNAs indicate that their overexpression can limit cancer cell growth or induce apoptosis in cell culture or upon transplantation in suitable host animals. This increasingly long list includes at least a dozen miRNAs and miRNA clusters. . . .
“The miR-15a~16-1 cluster of miRNAs has recently emerged as an excellent candidate for the long sought-after tumor suppressor gene on 13q14. This chromosomal region is deleted in the majority of CLLs [chronic lymphocytic leukemias] and in a subset of mantle cell lymphomas and prostate cancers. . . .
“The tumor suppressor activity of miR-15a~16-1 is not limited to B cells. More than 50% of human prostate cancers carry a deletion of 13q14. Accordingly, a recent study has shown that inhibition of miR-15a and miR-16 activity leads to hyperplasia of the prostate in mice and promotes survival, proliferation, and invasion of primary prostate cells in vitro. . . .” (p. 588)
“Transcriptional Scaffolds for Heterochromatin Assembly” (Hugh P. Cam, Ee Sin Chen, and Shiv I. S. Grewal, Cell, Vol. 136 No. 4 [Feb 20, 2009], pp. 610-614)
[ABSTRACT:] “Heterochromatin is dynamically regulated during the cell cycle and in response to developmental signals. Recent findings from diverse systems suggest an extensive role for transcription in the assembly of heterochromatin, highlighting the emerging theme that transcription and noncoding RNAs can provide the initial scaffold for the formation of heterochromatin, which serves as a versatile recruiting platform for diverse factors involved in many cellular processes.” (p. 610)
“Heterochromatin is a unique type of chromatin that is characterized by its transcriptionally repressed state and highly condensed structure. . . .
“The assembly of heterochromatin is believed to be a multistep process. Heterochromatin structures are nucleated at specific regulatory sequences and can spread into neighboring sequences, thereby influencing gene expression in a region-specific manner. Importantly, the ability of heterochromatin to propagate far from its original nucleation site provides a molecular platform for the recruitment of effector complexes involved in various chromosomal processes. . . .
“In addition to DNA binding factors that are important for the targeting of heterochromatin, recent evidence suggests that transcription, in particular noncoding products of transcription, play critical roles in heterochromatin assembly.” (p. 610)
“Perhaps no better example illustrates the versatility and power of the RNAi-mediated heterochromatin assembly platform in other cellular processes than the process of selective DNA elimination in the ciliate Tetrahymena thermophila. The single-celled Tetrahymena contains two nuclei called the micronucleus and the macronucleus. During mating (conjugation), the differentiation of the new macronucleus from a micronucleus is accompanied by elimination of DNA elements known as internal eliminated segment sequences. Remarkably, this DNA elimination process requires the RNAi machinery and small RNAs that specifically target heterochromatin marks such as H3K9me and chromodomain proteins to the internal eliminated segment sequences. . . .” (p. 614)
“Origins and Mechanisms of miRNAs and siRNAs” (Richard W. Carthew and Erik J. Sontheimer, Cell, Vol. 136 No. 4 [Feb 20, 2009], pp. 642-655)
[ABSTRACT:] “Over the last decade, ~20–30 nucleotide RNA molecules have emerged as critical regulators in the expression and function of eukaryotic genomes. Two primary categories of these small RNAs—small interfering RNAs (siRNAs) and microRNAs (miRNAs)—act in both somatic and germline lineages in a broad range of eukaryotic species to regulate endogenous genes and to defend the genome from invasive nucleic acids. Recent advances have revealed unexpected diversity in their biogenesis pathways and the regulatory mechanisms that they access. Our understanding of siRNA- and miRNA-based regulation has direct implications for fundamental biology as well as disease etiology and treatment.” (p. 642)
“In the last decade, few areas of biology have been transformed as thoroughly as RNA molecular biology. This transformation has occurred along many fronts, as detailed in this issue, but one of the most significant advances has been the discovery of small (~20–30 nucleotide [nt]) noncoding RNAs that regulate genes and genomes. This regulation can occur at some of the most important levels of genome function, including chromatin structure, chromosome segregation, transcription, RNA processing, RNA stability, and translation. The effects of small RNAs on gene expression and control are generally inhibitory, and the corresponding regulatory mechanisms are therefore collectively subsumed under the heading of RNA silencing.” (p. 642)
“Single-stranded forms of both miRNAs and siRNAs were found to associate with effector assemblies . . . known as RNA-induced silencing complexes (RISCs). . . .
“In all cases, the identities of the genes to be silenced are specified by the small RNA component, which recognizes each target by Watson-Crick base pairing. Accordingly, miRNA and siRNA silencing is readily reprogrammable. When changing circumstances require different expression patterns of endogenous genes, the silencing machinery can be redirected through the expression of new miRNAs and the dilution or removal of old ones. Similarly, when the genome faces new threats from novel invaders, it can exploit the foreign sequences themselves by co-opting them into the siRNA mechanism, thereby suppressing expression from invasive genes and responding adaptively to the threat.” (p. 643)
“Small RNAs as Guardians of the Genome” (Colin D. Malone and Gregory J. Hannon, Cell, Vol. 136 No. 4 [Feb 20, 2009], pp. 656-668)
[ABSTRACT:] “Transposons populate the landscape of all eukaryotic genomes. Often considered purely genomic parasites, transposons can also benefit their hosts, playing roles in gene regulation and in genome organization and evolution. Peaceful existence with mobile elements depends upon adaptive control mechanisms, since unchecked transposon activity can impact long-term fitness and acutely reduce the fertility of progeny. Here, we review the conserved roles played by small RNAs in the adaptation of eukaryotes to coexist with their genomic colonists. An understanding of transposon-defense pathways has uncovered recurring themes in the mechanisms by which genomes distinguish ‘self’ from ‘non-self’ and selectively silence the latter.” (p. 656)
“For decades, researchers have sought to understand our relationship to and coexistence with the mobile elements that colonize our genomes. Genetic studies have sought to probe mechanisms of transposon control by understanding circumstances in which it is lost. These studies tended to underscore the deleterious effects of unregulated activity. However, it has been apparent from the moment of their discovery, inherent in their being dubbed ‘control elements’ by McClintock (1951), that the relationship between host genomes and transposons might be more mutualistic. Proposed positive roles for transposons have taken many forms, and a few selected case studies serve as examples.” (p. 657)
“Despite some clear benefits of colonization, any symbiotic relationship between a transposon and its host depends heavily on the ability of the host to tame an element’s more aggressive tendencies. The heterogeneous nature of transposon families requires flexible recognition and control mechanisms. That niche has been filled in many eukaryotic organisms by pathways that use small RNAs to guide silencing, which we discuss below. . . .
“The discovery of RNA interference (RNAi) has transformed our understanding of gene regulation, mechanisms of heterochromatin formation, and transposon control. . . . The term RNAi has come to encompass an increasingly broad family of related pathways, in which small RNAs from ~20–30 nucleotides in length serve as guides to target recognition and regulation. In the canonical RNAi pathway, small RNAs are generated from double-stranded precursors by a ribonuclease enzyme termed, Dicer. . . . Small RNAs act in complex with a second defining component of RNAi-related pathways, the Argonaute (AGO) proteins, together forming the RNA-induced silencing complex (RISC). . . . AGO proteins are characterized by the presence of a PAZ and PIWI domain, which fold to form a channel in which a single-stranded small RNA guide is held at each end by one of its constituent domains. . . . The PIWI domain also harbors nuclease activity. This is formed from a ribonuclease H-like motif and is capable of cleaving RNA transcripts as directed by the small RNA. In addition to target cleavage, RISC can also inhibit protein synthesis and direct chromatin modifications that ultimately lead to transcriptional repression. . . .” (pp. 657f.)
“The slow, painful death of junk DNA” (Robert W. Carter, Jun 9, 2009)
This noteworthy creationist article explains why so many evolutionists continue to cling to the evolutionary concept of “junk DNA.” <http://creation.com/junk-dna-slow-death>
“Long Noncoding RNA as Modular Scaffold of Histone Modification Complexes” (Miao-Chih Tsai, Ohad Manor, Yue Wan, Nima Mosammaparast, Jordon K. Wang, Fei Lan, Yang Shi, Eran Segal, Howard Y. Chang, Science, Vol. 329 [Aug 6, 2010], pp. 689-693)
[ABSTRACT:] “Long intergenic noncoding RNAs (lincRNAs) regulate chromatin states and epigenetic inheritance. Here, we show that the lincRNA HOTAIR serves as a scaffold for at least two distinct histone modification complexes. A 5′ domain of HOTAIR binds polycomb repressive complex 2 (PRC2), whereas a 3′ domain of HOTAIR binds the LSD1/CoREST/REST complex. The ability to tether two distinct complexes enables RNA-mediated assembly of PRC2 and LSD1 and coordinates targeting of PRC2 and LSD1 to chromatin for coupled histone H3 lysine 27 methylation and lysine 4 demethylation. Our results suggest that lincRNAs may serve as scaffolds by providing binding surfaces to assemble select histone modification enzymes, thereby specifying the pattern of histone modifications on target genes.” (p. 689) <http://pubmedcentralcanada.ca/pmcc/articles/PMC2967777/>
“A Large Intergenic Noncoding RNA Induced by p53 Mediates Global Gene Repression in the p53 Response” (Maite Huarte, Mitchell Guttman, David Feldser, Manuel Garber, Magdalena J. Koziol, Daniela Kenzelmann-Broz, Ahmad M. Khalil, Or Zuk, Ido Amit, Michal Rabani, Laura D. Attardi, Aviv Regev, Eric S. Lander, Tyler Jacks, and John L. Rinn, Cell, Vol. 142 No. 3 [Aug 6, 2010], pp. 409-419)
[ABSTRACT:] “Recently, more than 1000 large intergenic noncoding RNAs (lincRNAs) have been reported. . . . Here, we report the identification of lincRNAs that are regulated by p53. One of these lincRNAs (lincRNA-p21) serves as a repressor in p53-dependent transcriptional responses. Inhibition of lincRNA-p21 affects the expression of hundreds of gene targets enriched for genes normally repressed by p53. The observed transcriptional repression by lincRNA-p21 is mediated through the physical association with hnRNP-K. This interaction is required for proper genomic localization of hnRNP-K at repressed genes and regulation of p53 mediates apoptosis. We propose a model whereby transcription factors activate lincRNAs that serve as key repressors by physically associating with repressive complexes and modulate their localization to sets of previously active genes.” (p. 409)
“In an attempt to understand the potential biological roles of lincRNAs, a method to infer putative function based on correlation in expression between lincRNAs and protein-coding genes was developed. These studies led to preliminary hypotheses about the involvement of lincRNAs in diverse biological processes, from stem cell pluripotency to cell-cycle regulation. . . . In particular, we observed a group of lincRNAs that are strongly associated with the p53 transcriptional pathway. p53 is an important tumor suppressor gene involved in maintaining genomic integrity. . . . In response to DNA damage, p53 becomes stabilized and triggers a transcriptional response that causes either cell arrest or apoptosis. . . .
“The p53 transcriptional response involves both activation and repression of numerous genes. While p53 is known to transcriptionally activate numerous genes, the mechanisms by which p53 leads to gene repression have remained elusive. We recently reported evidence that many lincRNAs are physically associated with repressive chromatin modifying complexes and suggested that they may serve as repressors in transcriptional regulatory networks. . . . We therefore hypothesized that p53 may repress genes in part by directly activating lincRNAs, which in turn regulate downstream transcriptional repression.
“Here, we show that lincRNAs play a key regulatory role in the p53 transcriptional response. By exploiting multiple independent cell-based systems, we identify lincRNAs that are transcriptional targets of p53. Moreover, we find that one of these p53-activated lincRNAs—termed lincRNA-p21—serves as a transcriptional repressor in the p53 pathway and plays a role in triggering apoptosis. We further demonstrate that lincRNA-p21 binds to hnRNP-K. This interaction is required for proper localization of hnRNP-K and transcriptional repression of p53-regulated genes. Together, these results reveal insights into the p53 transcriptional response and lead us to propose that lincRNAs may serve as key regulatory hubs in transcriptional pathways.” (pp. 409f.)
“Thus, the apoptosis response is both p53 dependent and lincRNA-p21 dependent, with this dependence confirmed in multiple cell types and conditions. . . . Collectively, these observations demonstrate that lincRNA-p21 plays an important role in the p53-dependent induction of cell death.” (p. 415)
“Collectively, our results indicate that lincRNA-p21 is a direct p53 transcriptional target in response to DNA damage, acts to repress genes that are downregulated as part of the canonical p53 transcriptional response, is necessary for p53 dependent apoptotic responses to DNA damage in our cell-based systems, and functions at least in part through interaction with hnRNP-K by modulating hnRNP-K localization. . . .
“It is clear that mammalian genomes encode numerous large noncoding RNAs. . . . Here, we demonstrate that numerous lincRNAs are key constituents in the p53-dependent transcriptional pathway. Moreover, we observed that some of these lincRNAs are bound by p53 in their promoter regions and sufficient to drive p53-dependent reporter activity that requires the consensus p53-binding motif, suggesting that these lincRNAs are bona fide p53 transcriptional targets.
“Having discovered multiple lincRNAs in the p53 pathway, we decided to focus on one such lincRNA in particular: lincRNA-p21. Intrigued by its properties (genomic location upstream of p21, p53-dependent activation requiring the consensus p53 motif, which is bound by p53 and conserved p53-dependent activation of this gene in both human and mouse cell-based systems), we explored the functional roles of lincRNA-p21. Our studies revealed a role for lincRNA-p21 in a p53-dependent apoptotic response after DNA damage.
“We further observed that siRNA-mediated inhibition of lincRNA-p21 affects the expression of hundreds of gene targets that are enriched for genes normally repressed by p53 in both the MEF and RAS cell-based systems. Strikingly, the vast majority of these common target genes are derepressed upon inhibition of either p53 or lincRNA-p21—suggesting that lincRNA-p21 functions as a downstream repressor in the p53 transcriptional response.” (pp. 416f.)
“Long Noncoding RNAs with Enhancer-like Function in Human Cells” (Ulf Andersson Ørom, Thomas Derrien, Malte Beringer, Kiranmai Gumireddy, Alessandro Gardini, Giovanni Bussotti, Fan Lai, Matthias Zytnicki, Cedric Notredame, Qihong Huang, Roderic Guigo, and Ramin Shiekhattar, Cell, Vol. 143 No. 1 [Oct 1, 2010], pp. 46-58)
[SUMMARY:] “While the long noncoding RNAs (ncRNAs) constitute a large portion of the mammalian transcriptome, their biological functions has [sic] remained elusive. A few long ncRNAs that have been studied in any detail silence gene expression in processes such as X-inactivation and imprinting. We used a GENCODE annotation of the human genome to characterize over a thousand long ncRNAs that are expressed in multiple cell lines. Unexpectedly, we found an enhancer-like function for a set of these long ncRNAs in human cell lines. Depletion of a number of ncRNAs led to decreased expression of their neighboring protein-coding genes, including the master regulator of hematopoiesis, SCL (also called TAL1), Snai1 and Snai2. Using heterologous transcription assays we demonstrated a requirement for the ncRNAs in activation of gene expression. These results reveal an unanticipated role for a class of long ncRNAs in activation of critical regulators of development and differentiation.” (p. 46)
[INTRODUCTION:] “Recent technological advances have allowed the analysis of the human and mouse transcriptomes with an unprecedented resolution. These experiments indicate that a major portion of the genome is being transcribed and that protein-coding sequences only account for a minority of cellular transcriptional output. . . . Discovery of RNA interference (RNAi) . . . in C. elegans and the identification of a new class of small RNAs known as microRNAs . . . led to a greater appreciation of RNA’s role in regulation of gene expression. MicroRNAs are endogenously expressed noncoding transcripts that silence gene expression by targeting specific mRNAs on the basis of sequence recognition. . . . Over 1000 microRNA loci are estimated to be functional in humans, modulating roughly 30% of protein-coding genes. . . .” (p. 46)
“We identified 3019 putative long ncRNAs that display differential patterns of expression. Functional knockdown of multiple ncRNAs revealed their positive influence on the neighboring protein-coding genes. Furthermore, detailed functional analysis of a long ncRNA adjacent to the Snai1 locus using reporter assays demonstrated a role for this ncRNA in an RNA-dependent potentiation of gene expression. Our studies suggest a role for a class of long ncRNAs in positive regulation of protein-coding genes.” (pp. 46f.)
“Taken together, the novelty of our work lies in the following. First we show that at multiple loci of the human genome depletion of a long ncRNA leads to a specific decrease in the expression of neighboring protein-coding genes. Previous studies analyzing the function of long ncRNAs in X-inactivation or the imprinting phenomenon point to their role in silencing of gene expression. . . . Second, we show that the enhancement of gene expression by ncRNAs is not cell specific as we observe the effect in five different cell lines. Third, this enhancement of gene expression is mediated through RNA, as depletion of such activating ncRNAs abrogate increased transcription of the neighboring genes. Fourth, through the use of heterologous reporter assays, we suggest that activating ncRNAs mediate this RNA-dependent transcriptional responsiveness in cis. Fifth, we show that similar to classically defined distal activating sequences, ncRNA-mediated activation of gene expression is orientation independent. Sixth, we present evidence that similar to defined activating sequences, ncRNAs cannot drive transcription in the absence of a proximal promoter. Finally, we demonstrate that the activation of gene expression in the heterologous reporter system is mediated through RNA as multiple approaches depleting the RNA levels lead to abrogation of the stimulatory response. Therefore, we have uncovered a new biological function in positive regulation of gene expression for a class of ncRNAs in human cells.” (p. 55)
“The Long Noncoding RNA, Jpx, Is a Molecular Switch for X Chromosome Inactivation” (Di Tian, Sha Sun, and Jeannie T. Lee, Cell, Vol. 143 No. 3 [Oct 29, 2010], pp. 390-403)
“. . . the X-inactivation center (Xic) is . . . dominated by large noncoding RNAs (ncRNA). X chromosome inactivation (XCI) equalizes gene expression between mammalian males and females by inactivating one X in female cells. XCI requires Xist, an ncRNA that coats the X and recruits Polycomb proteins. How Xist is controlled remains unclear but likely involves negative and positive regulators. For the active X, the antisense Tsix [Xist spelled backwards] RNA is an established Xist repressor. For the inactive X, here, we identify Xic-encoded Jpx as an Xist activator. Jpx is developmentally regulated and accumulates during XCI. Deleting Jpx blocks XCI and is female lethal.” (p. 390)
“Our work demonstrates that Xist is controlled by two parallel switches—Tsix for Xa [the active X chromosome] and Jpx for Xi [the inactivated X chromosome]. Whereas Tsix represses Xist on Xa, Jpx activates Xist on Xi. How Jpx RNA transactivates Xist is yet to be determined, but it is intriguing that expression of one long ncRNA would be controlled by another.” (p. 397)
“Once induced, Jpx RNA remains at high levels in somatic cells . . ., implying that continued presence of the activator may be necessary for lifelong Xist expression in the female. Jpx may also play other roles during development, given that the Tsix-Jpx double mutant rescues Xist expression but does not fully rescue cell death.
“In conclusion, our study identifies Jpx as an RNA-based activator of Xist and supports a dynamic balance of activators and repressors for XCI control. The fate of Xist appears to be determined by a series of Xic-encoded RNA switches, reinforcing the idea that long ncRNAs may be ideally suited to epigenetic regulation involving allelic and locus-specific control. . . .” (p. 399)
“Genome may be full of junk after all” (Tina Hesman Saey, Science News, Vol. 178 No. 12 [Dec 4, 2010], p. 17)
“Most of the human genome may actually be junk.
“In recent years scientists have stopped dismissing as nonfunctional the part of the genome that doesn’t produce proteins. But a new study comparing the human genetic blueprint with those of other mammals concludes that very little of the human genome is really necessary.
“About 7 percent of the human genome is similar to the DNA of other mammals, said Arend Sidow of Stanford University. Because it is similar, or ‘conserved,’ geneticists assume this DNA is the most integral. In all, Sidow concludes, these important parts of the genome comprise only 225 million of the 3 billion chemical letters of DNA found in the complete human genetic instruction book. . . .
“Sidow’s studies rely on the principle that if certain pieces of DNA are retained throughout evolution, they must be important. Things that aren’t conserved by evolution are less likely to be required for basic functions. . . .
“But some of Sidow’s colleagues think his analysis may be missing some crucial elements. Recent studies of RNA molecules that don’t code for proteins show that those molecules have definite functions, even though they aren’t conserved in the DNA codes of other mammals, said Job Dekker of the University of Massachusetts Medical School in Worcester. ‘Lots of things that are important are not conserved,’ he said. And current computer programs may not be very good at picking out small DNA regions shared among many species, he added.” (p. 17)
“No-Nonsense Functions for Long Noncoding RNAs” (Minireview by Takashi Nagano and Peter Fraser, Cell, Vol. 145 [Apr 15, 2011], pp. 178-181)
[SUBHEADLINE:] “The mysterious secrets of long noncoding RNAs, often referred to as the Dark Matter of the genome, are gradually coming to light. Several recent papers dig deep to reveal surprisingly complex and diverse functions of these enigmatic molecules.” (p. 178)
“Noncoding RNAs (ncRNAs) differ from their better known counterpart messenger RNAs (mRNAs), by virtue of the fact that the sequence of bases contained within them do not encode proteins. They are generally divided into two classes based on an arbitrary length cutoff. Those under 200 nucleotides are usually referred to as short/small ncRNAs, including the microRNAs (miRNAs), and those greater than 200 bases are known as long noncoding RNAs (lncRNAs). Though several lncRNAs have been known for decades, the looming giant of lncRNAs was not fully exposed until genome-wide transcriptome studies revealed that approximately 10- to 20-fold more genomic sequence is transcribed to lncRNA than to protein-coding RNA. This potential treasure trove of thousands of lncRNAs has attracted intense scientific interest with the alluring possibility of finding new molecules and mechanisms that could shed light on organismal complexity. However, as lncRNA sequences are by definition noncoding, their potential functions are opaque to classical methods of making sense of genomic sequence. A rash of recent papers reveals that lncRNAs are important and powerful cis- and trans-regulators of gene activity that can function as scaffolds for chromatin-modifying complexes and nuclear bodies, as enhancers and as mediators of long-range chromatin interactions. . . .
“The most well-known lncRNA is Xist, which plays an essential role in X inactivation. During female development, Xist RNA is expressed from the inactive X and ‘coats’ the X chromosome from which it is transcribed, leading to recruitment of Polycomb repressive complex 2 (PRC2), which trimethylates histone H3 at lysine 27 to silence transcription. Through its interaction with the X chromosome, Xist appears to create a nuclear compartment that excludes RNA polymerase II (RNAPII). . . . Other lncRNAs such as Air and Kcnq1ot1 also create repressive environments that may recruit and silence specific cis-linked gene loci by interacting with chromatin and targeting repressive histone modifiers. . . . Though regulation of Xist transcription is not fully understood, it is clear that an overlapping antisense lncRNA, called Tsix, represses Xist expression in cis. Other lncRNAs such as Xcite and RepA also contribute to ensure that only one X chromosome is inactivated, by enhancing Tsix expression on the active X and upregulating Xist on the inactive X, respectively. Recent evidence suggests that both Tsix and RepA are able to bind PRC2 directly. . . . Thus the major effector of X chromosome silencing, Xist, is itself controlled by a complex interplay of other cis-acting lncRNAs, some which have been shown to function through recruitment of chromatin modification complexes. . . .
“Unlike the cis-acting lncRNAs described above, a recent screen for lincRNAs (long intergenic noncoding RNAs) regulated by the tumor suppressor transcription factor p53 has revealed a lincRNA that targets silencing activity to multiple genes located throughout the genome. . . . In response to DNA damage, p53 triggers the activation or repression of numerous genes resulting in either cell-cycle arrest or apoptosis. Using inducible p53 cell systems, Huarte et al. showed that p53 regulates several lincRNAs, and one of them, lincRNA-p21, acts as a transcriptional repressor turning off multiple genes during the p53 response. Knockdown of either p53 or lincRNA-p21 resulted in changes of expression of over 1000 genes, most of which were common to both knockdowns, and most of these resulted in gene derepression. The promoter of lincRNA-p21 is directly activated by p53 binding in response to DNA damage. lincRNA-p21 activity appears to trigger apoptosis rather than cell-cycle arrest. A search for factors that interact with lincRNA-p21 identified heterogeneous nuclear ribonucleoprotein K (hnRNP-K), a component of a repressor complex that acts in the p53 pathway. hnRNP-K interacted with a 5′ domain of lincRNA-p21 that was necessary but not sufficient to induce apoptosis, suggesting that other regions of the RNA are required to recruit other factors or target the complex to chromatin or both. Thus, lincRNA-p21 is a trans-acting downstream repressor of multiple genes in the p53 pathway, potentially explaining how p53 can activate many genes while simultaneously repressing many others.” (p. 178)
“Another potentially large lncRNA group is enhancer-related RNAs. Kim et al. (2010) found that many of the ~12,000 neuronal activity-regulated enhancers in the mouse genome are transcribed bidirectionally by RNAPII to yield noncoding enhancer RNAs (eRNAs). The expression level of eRNAs generally correlates with that of nearby protein-coding (target) genes, and in at least one example, eRNA expression required an intact target gene promoter, suggesting a reciprocal interaction between enhancers and promoters during promoter activation.” (p. 179)
“. . . further examination of the Xist regulation paradigm has revealed a new, potentially trans-acting activator lncRNA. The Jpx lncRNA is located upstream of the Xist transcription unit and positively regulates Xist expression. . . . Deletion or knockdown of Jpx led to failure of Xist upregulation and Xist coating of the X chromosome during differentiation of female [mouse] ES [embryonic stem] cells whereas it had no effect in male cells. Surprisingly, deletion of a single copy of Jpx in female ES cells did not result in preferential inactivation of the wild-type chromosome. Such skewing of the normally random X inactivation process usually occurs when Xist expression is disrupted on one of the X chromosomes. Instead, Jpx deletion heterozygotes had less than the expected 50% of residual Jpx RNA and showed a dramatic failure in Xist coating and X inactivation. Xist expression and X inactivation could be rescued by a Jpx transgene located on another chromosome, indicating that Jpx can exert its effects in trans. Exactly how Jpx augments Xist expression or indeed how the two Jpx alleles cooperate to control their expression in female cells are [sic] not known. The fact that Jpx is also upregulated during male ES cell differentiation without consequent upregulation of Xist suggests that it does not work alone. Chureau and colleagues (2010) report that Ftx, another conserved lncRNA located just downstream of Jpx, also positively affects Xist expression. Like Jpx, Ftx partially escapes X inactivation, meaning that it is transcribed from both the active and inactive X chromosomes. However, unlike Jpx, Ftx is upregulated specifically in female cells at the time of Xist upregulation and X inactivation. . . . Importantly, neither Jpx nor Ftx appear [sic] to function merely as negative regulators of Tsix. Together with Tsix, RepA, and Xcite, they begin to flesh out a complex and elaborate regulatory network of multiple lncRNAs that affect Xist expression and X inactivation [through] cis and trans silencing and activation mechanisms.
“With all the varied and powerful functions of lncRNAs, it is perhaps not surprising that they have been implicated in global remodeling of the epigenome and gene expression during reprogramming of somatic cells to induced pluripotent stem cells (iPSCs).” (p. 180)
“In a slightly different twist on the emerging theme of lncRNAs acting as scaffolds for factors that target chromatin and gene expression, recent live-cell results show that lncRNAs can also act as platforms for the assembly of dynamic nuclear structures. . . .” (p. 181)
“These exciting new functions and potential mechanisms of lncRNAs, combined with the unexplored enormity of noncoding transcripts in higher organisms, suggest that many new roles in gene control and genome and nuclear organization are likely to be uncovered. How many of the remaining thousands of lncRNAs will be functional is difficult to say, but it is now clear that it is not all junk, derived from promiscuous transcription. A strong emerging theme is the apparent ability to function as scaffolds for regulatory factors that then target those factors to gene loci, which might be accomplished in several ways. Some lncRNAs may recruit chromatin-modifying complexes to the site of their transcription, whereas others target chromatin modifiers to distant loci. Formation of a nuclear compartment enriched with chromatin modifiers or other regulatory factors may enable efficient control of multiple loci simultaneously; however, it is also possible that lncRNAs act as mobile scaffolds that target individual genes in a manner analogous to a transcription factor. In addition, lncRNAs are involved in forming higher-order chromatin loops and can act as scaffolds for the assembly of proteins involved in formation of nuclear structures and functional nuclear subcompartments. It appears that dynamic protein assembly onto nascent lncRNA seeds is a common theme, suggesting that synthesis of new lncRNAs could rapidly form regulatory complexes with the potential to target ubiquitous regulatory factors to implement diverse gene expression patterns during differentiation, development, and reprogramming.” (p. 181)
“A Long Noncoding RNA Controls Muscle Differentiation by Functioning as a Competing Endogenous RNA” (Marcella Cesana, Davide Cacchiarelli, Ivano Legnini, Tiziana Santini, Olga Sthandler, Mauro Chinappi, Anna Tramontano, and Irene Bozzoni, Cell, Vol. 147 [Oct 14, 2011], pp. 358-369)
[SUMMARY:] “Recently, a new regulatory circuitry has been identified in which RNAs can crosstalk with each other by competing for shared microRNAs. Such competing endogenous RNAs (ceRNAs) regulate the distribution of miRNA molecules on their targets and thereby impose an additional level of post-transcriptional regulation. Here we identify a muscle-specific long noncoding RNA, linc-MD1, which governs the time of muscle differentiation by acting as a ceRNA in mouse and human myoblasts. Downregulation or overexpression of linc-MD1 correlate with retardation or anticipation of the muscle differentiation program, respectively. We show that linc-MD1 ‘sponges’ miR-133 and miR-135 to regulate the expression of MAML1 and MEF2C, transcription factors that activate muscle-specific gene expression. Finally, we demonstrate that linc-MD1 exerts the same control over differentiation timing in human myoblasts, and that its levels are strongly reduced in Duchenne muscle cells. We conclude that the ceRNA network plays an important role in muscle differentiation.” (p. 358)
“So far, a large range of functions has been attributed to lncRNAs [long noncoding RNAs] . . ., such as modulation of apoptosis and invasion . . ., reprogramming of induced pluripotent stem cells . . ., marker of cell fate . . . and parental imprinting . . ., indicating that they may represent a major regulatory component of the eukaryotic genome.
“A specific mode of action in mediating epigenetic changes through recruitment of the Polycomb Repressive Complex (PRC) was described for the Xist and HOTAIR transcripts. . . . lncRNAs were also found to act in the nucleus as antisense transcripts or as decoy for splicing factors leading to splicing malfunctioning. . . . In the cytoplasm, lncRNAs were described to transactivate STAU1-mediated mRNA decay by duplexing with 3′ UTRs via Alu elements . . . or, in the case of pseudogenes, to compete for miRNA binding, thereby modulating the derepression of miRNA targets. . . .
“These findings have prompted studies directed toward the identification of the circuitries that are regulated by these molecules.” (p. 358)
“Among miRNAs specifically expressed in muscle tissue, the most widely studied are members of the miR-1/206 and miR-133a/133b families, which originate from three separate chromosomes. . . .
“In this study, through a detailed analysis of the genomic region of miR-206/133b, we discovered the existence of a muscle specific lncRNA and defined its expression profile and function. We demonstrated that this lncRNA is involved in the timing of muscle differentiation and acts as a natural decoy for miRNAs, playing a crucial role in the control of factors involved in the myogenic program.” (p. 359)
“MicroRNAs in Stress Signaling and Human Disease” (Joshua T. Mendell and Eric N. Olson, Cell, Vol. 148 [Mar 16, 2012], pp. 1172-1187)
[ABSTRACT:] “Disease is often the result of an aberrant or inadequate response to physiologic and pathophysiologic stress. Studies over the last 10 years have uncovered a recurring paradigm in which microRNAs (miRNAs) regulate cellular behavior under these conditions, suggesting an especially significant role for these small RNAs in pathologic settings. Here, we review emerging principles of miRNA regulation of stress signaling pathways and apply these concepts to our understanding of the roles of miRNAs in disease. These discussions further highlight the unique challenges and opportunities associated with the mechanistic dissection of miRNA functions and the development of miRNA-based therapeutics.” (p. 1172)
“MicroRNAs (miRNAs) were first discovered in the early 1990s through the analysis of developmental timing mutants in C. elegans. . . . It was not until after 2001, however, that a dedicated field focused on the study of these regulatory RNAs coalesced, following the identification of numerous endogenously expressed small RNAs in worms, flies, and mammals. . . . In the ensuing decade, the study of miRNA biology has attracted remarkable attention, resulting in rapid advances. We have since learned that mammalian genomes encode ~300 conserved miRNA genes, and high-throughput sequencing studies have identified ~1,000 or more additional loci that produce small RNAs structurally resembling miRNAs. However, since these additional miRNAs tend to be poorly conserved and expressed at low levels, their functional significance is unclear. . . . Small RNA cloning and analysis have also revealed the presence of other types of silencing RNAs in mammals, including endogenous short-interfering RNAs (endo-siRNAs) and germ-line-restricted piwi-interacting RNAs (piRNAs). . . . Nevertheless, among the varied classes of small RNAs in mammals, miRNAs appear to have a uniquely important role in disease phenotypes and we therefore focus our attention on their functions herein.
“In this review, we synthesize our current understanding of the physiologic roles of miRNAs in mammalian biology and the manner in which miRNA activities, both normal and aberrant, contribute to disease. Rather than attempting to present a comprehensive survey of the numerous studies that have linked miRNAs to individual disease phenotypes, we emphasize what appear to be the emerging themes of miRNA function gained from their evaluation in vivo. Within this context of normal miRNA biology, we can begin to understand the consequences when miRNA activities go awry. From recent studies, it has become apparent that miRNAs rarely contribute significantly to the establishment of the mammalian body plan and the specification of diverse cell lineages. While at first blush this appears to indicate a lesser role for miRNAs in mammalian biology compared to gene products such as developmentally regulated transcription factors, their extreme evolutionary conservation would appear to argue otherwise. Accordingly, further study has revealed that miRNAs often profoundly influence the responses of fully developed tissues to physiologic and pathophysiologic stress. . . . This functional niche suggests a central role for miRNA-regulated networks in disease states, which often represent an insufficient or aberrant response under conditions of stress or injury. Moreover, this role dictates that, in contrast to classic developmental regulators identified through forward genetic screens, miRNA loss of function may rarely result in highly penetrant phenotypes in controlled laboratory environments. Although somewhat ironic given that the founding miRNA lin-4 was discovered by virtue of its strong developmental phenotype in C. elegans . . ., an appreciation of this prominent role for mammalian miRNAs in dictating cellular responses in fully developed tissues is essential for appropriate hypothesis generation, since phenotypes resulting from miRNA deletions may only be revealed in the setting of the appropriate perturbagen.” (p. 1172)
“Approximately one-third of known miRNAs are embedded within introns of protein-coding genes and are co-transcribed with the host gene, allowing for coordinate regulation of miRNA and protein expression. In some cases, intronic miRNAs have been shown to modulate the same biological process as the protein encoded by that gene. This is exemplified by miR-33 family members, which cooperate with the sterol regulatory element binding protein (SREBP) genes in which they are embedded to reduce cholesterol efflux and increase cholesterol biosynthesis. . . .” (p. 1173)
“Many different algorithms exist for the bioinformatic prediction of miRNA targets and all generally predict hundreds of targets for each miRNA. . . . These highly complex target networks pose a significant challenge to the mechanistic dissection of miRNA-mediated phenotypes. The prevailing model posits that miRNAs function by fine-tuning the expression of numerous targets. While each target is regulated subtly (typically less than a 2-fold change in individual target protein abundance results from gain or loss of miRNA function), the additive effect of coordinated regulation of a large suite of transcripts is believed to result in strong phenotypic outputs. Unfortunately, this hypothesis is nearly impossible to test directly since it is not logistically feasible to simultaneously restore the levels of many targets to their natural levels in the setting of miRNA gain or loss of function in vivo. Therefore, any conclusions drawn from these types of target analyses will be correlative.” (p. 1173)
“As described in detail throughout this review, a predominant paradigm that has emerged from miRNA gain- and loss-of-function studies is that miRNA dysregulation is well-tolerated in normal tissues yet can profoundly influence the behavior of cells and tissues experiencing pathologic stress. It follows from this concept that miRNA inhibition or delivery may provide a highly potent means to modulate a disease process while avoiding unwanted toxic effects in normal tissues. This potential for a wide therapeutic window has stimulated significant effort to develop miRNA-targeted therapeutics. . . .
“Not surprisingly, many miRNAs appear to play beneficial rather than pathologic roles in settings of disease. Thus, the development of miRNA mimics represents an important therapeutic goal.” (p. 1182)
“A Role for Small RNAs in DNA Double-Strand Break Repair” (Wei Wei, Zhaoqing Ba, Min Gao, Yang Wu, Yanting Ma, Simon Amiard, Charles I. White, Jannie Michaela Rendtlew Danielsen, Yun-Gui Yang, and Yijun Qi, Cell, Vol. 149 [Mar 30, 2012], pp. 101-112)
[SUMMARY:] “Here we show that ~21-nucleotide small RNAs are produced from the sequences in the vicinity of DSB [DNA double-strand break] sites in Arabidopsis and in human cells. We refer to these as diRNAs for DSB-induced small RNAs. . . . In Arabidopsis, diRNAs are recruited by Argonaute 2 (AGO2) to mediate DSB repair. Knock down of Dicer or Ago2 in human cells reduces DSB repair. Our findings reveal a conserved function for small RNAs in the DSB repair pathway. We propose that diRNAs may function as guide molecules directing chromatin modifications or the recruitment of protein complexes to DSB sites to facilitate repair.” (p. 101)
“DNA double-strand breaks (DSBs) are deleterious forms of DNA damage that cause mutations, genome instability, and cell death. Efficient repair of DSBs is thus critical for the maintenance of genome integrity and cell survival.” (p. 101)
“In light of the expanding universe of small RNAs and their increasingly diverse biological roles, we explored whether small RNAs could play a role in DSB repair. Using well-established reporter assays for DSB repair in Arabidopsis thaliana and human cells, we found that DSBs trigger the production of small RNAs from the sequences in the vicinity of DSB sites and these small RNAs are required for efficient DSB repair. Our results reveal an unsuspected and conserved role for small RNAs in the DSB repair pathway.” (p. 102)
“In this study, we established an important role for small RNAs in DSB repair, adding an unsuspected RNA component to the DSB repair signaling pathway. Importantly, we demonstrated that this layer of DSB repair regulation is conserved: diRNAs are produced in both plant and human cells and interfering with their production has severe effects on DSB repair.
…“In summary, we have demonstrated that small RNAs generated from the sequences flanking a DSB are important for efficient DSB repair.” (p. 109)
“A piRNA to Remember” (Danesh Moazed, Cell, Vol. 149 [Apr 27, 2012], pp. 512-514)
[ABSTRACT:] “In this issue of Cell, Rajasethupathy et al. report a surprising role for piRNAs, previously thought to act mainly in the animal germline to silence transposons, in transcriptional regulation of plasticity-related genes in the central nervous system of the sea slug Aplysia californica. The findings expand the functions of small RNAs and have important implications for our understanding of how transient signals can give rise to long-term memories.” (p. 512)
“In a series of beautifully executed and compelling experiments, Kandel, Tuschl, and co-workers now provide evidence that piRNAs are expressed in the central nervous system (CNS) and other somatic tissues in Aplysia and mediate CpG methylation and transcriptional silencing of a key plasticity-related gene, CREB2. . . .
…“In the course of using high-throughput sequencing to screen for miRNAs in the Aplysia CNS that might regulate long-term memory, the authors noticed the presence of a second class of small RNAs 27-30 nt in length, the characteristic size of piRNAs. . . .
…“In all, the authors define 372 distinct piRNA clusters in Aplysia. . . . a convincing set of results establishes the presence of CNS piRNAs and Piwi in Aplysia.” (p. 513)
“Roles for MicroRNAs in Conferring Robustness to Biological Processes” (Margaret S. Ebert and Phillip A. Sharp, Cell, Vol. 149 [Apr 27, 2012], pp. 515-524)
[ABSTRACT:] “Increasing evidence suggests that, among their roles as posttranscriptional repressors of gene expression, microRNAs (miRNAs) help to confer robustness to biological processes by reinforcing transcriptional programs and attenuating aberrant transcripts, and they may in some network contexts help suppress random fluctuations in transcript copy number. These activities have important consequences for normal development and physiology, disease, and evolution.” (p. 515)
“MicroRNAs (miRNAs) are hairpin-derived RNAs ~20–24 nucleotides (nt) long, which posttranscriptionally repress the expression of target genes usually by binding to the 3′ UTR of messenger RNA (mRNA). As a class, miRNAs constitute about 1%–2% of genes in worms, flies, and mammals. . . . Their regulatory potential is vast: more than 60% of protein-coding genes are computationally predicted as targets based on conserved base-pairing between the 3′ UTR and the 5′ region of the miRNA, which is called the seed. . . . Although many miRNAs and their target binding sites are deeply conserved, which suggests important function, a typical miRNA-target interaction produces only subtle reduction (<2-fold) in protein level, and many miRNAs can be deleted without creating any obvious phenotype. Early observations of miRNA expression profiles revealed that miRNAs tend to be anticorrelated with target gene expression in contiguous developmental stages or tissues (Stark et al., 2005; Farh et al., 2005). Correspondingly, a view emerged that miRNA evolved primarily to play the role of a reinforcer, in that its activities cohere with transcriptional patterns to sharpen developmental transitions and entrench cellular identities. It is also possible that miRNAs buffer fluctuations in gene expression and more faithfully signal outcomes in the context of certain regulatory networks.
…“Robustness refers to a system’s ability to maintain its function in spite of internal or external perturbations. . . . The involvement of miRNAs in regulatory networks that provide developmental robustness is indicated by recent experiments in a variety of model organisms. It is also suggested by three general observations: (1) genes with tissue-specific expression have longer 3′ UTRs with more miRNA-binding sites . . .; (2) miRNA expression increases and diversifies over the course of embryonic development . . ., as 3′ UTRs are lengthened via alternative polyadenylation site choice . . .; and (3) the diversity of the miRNA repertoire in animal genomes has increased with increasing organismal complexity. . . . In this Review, we examine the current evidence for how miRNAs contribute to the robustness of biological processes.” (p. 515)
“One of the earliest functions attributed to miRNAs was sharpening developmental transitions by suppressing residual transcripts that were specific to the previous stage. Global gene expression analyses in fly, fish, and mouse have shown that miRNAs and their targets often have mutually exclusive RNA expression across tissues, especially in neighboring tissues derived from common progenitors. . . . This suggests that miRNAs can act to reinforce the transcriptional gene expression program by repressing leaky transcripts.
“More recently, sensitive gene expression profiling of cell types in the zebrafish embryo revealed not so much a stark mutual exclusion pattern but, rather, a tendency for anticorrelated but still overlapping expression of miRNAs and targets. . . . This suggests that miRNAs play a more prominent role than only reinforcing the patterns dictated by transcriptional regulation. In fact, a strongly transcribed, ubiquitously expressed actin transcript has its levels spatially sculpted by muscle-specific miRNAs in zebrafish. . . .” (pp. 515f.)
“The effect of an individual miRNA on a target’s protein level tends to be subtle, usually less than 2-fold. . . . Most loss-of-function mutations are recessive; thus, organisms are commonly able to compensate for a 2-fold loss of gene expression. Such differences may even be within the range of random variation in mRNA or protein level between different cells in a genetically identical population or in a given cell at different times. So how do miRNAs and target sites experience selective pressure, and how do miRNAs accomplish any significant regulation? For starters, there are miRNA-target interactions that involve multiple sites for a given target and confer much stronger repression, such as the interaction between the micro-RNA let-7 and the oncogene HMGA2. . . . More often, different miRNAs work together to cotarget a given mRNA, so their combined repressive effect greatly exceeds the individual contributions. On average, there are more than four highly conserved seed match sites per UTR considering all miRNAs and many more sites when more weakly conserved sequences are considered. . . .
“. . . a small change in the level of protein can sometimes have a large physiological effect, such as when a positive feedback loop amplifies the change. . . .
“Another mechanism by which a miRNA can increase its impact is by targeting a set of genes that are in a shared pathway or protein complex.” (p. 517)
“In spite of the large numbers of target genes predicted to be affected by miRNA loss of function, gene knockout experiments for individual miRNAs have yielded many disappointing results. In worms, most individual miRNA mutants show no gross phenotype . . .; the same is true for several of the mouse knockouts generated to date, including miR-21, miR-210, miR-214, miR-206, and miR-143. . . . A partial explanation for these results resides in the functional redundancy of many miRNAs that share their seed sequence with others. For example, the let-7 family members miR-48, miR-84, and miR-241 operate redundantly to control the L2-to-L3 larval transition in C. elegans. . . . Additionally, many miRNAs of different seed families work together to cotarget a given gene or set of genes, providing overlapping functions. To generate an observable impairment in the animal, it might be necessary to delete all members of a seed family and also nonseed family members that have a high degree of cotargeting.
“It is also possible that a mutant phenotype would only arise upon acute miRNA deletion if, during development, miRNA loss can be compensated at the level of gene expression or by one cell type populating a niche to assist an impaired or underpopulated cell type within an organ or system such as the immune system. . . . Even once an organ has developed, miRNAs may be required for maintenance: Dicer loss in the mouse thymic epithelium or the highly structured retina leads to progressive degeneration of tissue architecture. . . . However, there are several contrary examples in which deletion of Dicer and loss of all miRNAs in mature tissue do not appear to generate a phenotype. Deletion of Dicer in the mouse olfactory system had no apparent phenotype over periods of several months . . ., whereas the same deletion in developing olfactory tissue led to severe neurogenesis defects.
“Finally, a miRNA phenotype may appear only upon the application of certain internal or external stresses. The most well-characterized example of this mechanism is in the Drosophila eye, in which miR-7 plays a role in the determination of sensory organs. . . . Loss of miR-7 had little observable impact on the development of the sensory organs under normal, uniform conditions, and expression of the proneural transcription factor Atonal was also detected at wild-type level. . . . But when an environmental perturbation was added during larval development (i.e., fluctuating the temperature between 31°C and 18°C roughly every 90 min), the miR-7 mutant eyes showed abnormally low Atonal expression and abnormally high, irregular expression of the antineural transcription factor Yan. Sensory organ precursor (SOP) defects also appeared: some groups of antennal SOPs failed to develop or developed with abnormal patterning; their cells showed low Atonal levels. The ability of miR-7 to confer developmental robustness against temperature perturbations likely depends on its placement in a network of feedback and feedforward loops with Atonal and Yan. . . .
“In mice, deletion of the heart muscle-specific miRNA miR-208 has little phenotype under normal conditions but results in a failure to induce cardiac remodeling upon stress. . . . When the mice were treated to induce pressure overload or hypothyroidism, miR-208 activity was required in the cardiomyocytes to upregulate βMHC by targeting the thyroid receptor signaling pathway. The embryonic stem cell-specific miR-290-295 cluster is not required for cell viability until DNA damage stress, upon which it promotes cell survival. . . . In worms sensitized by mutations in a variety of regulatory pathways, 25 of 31 deleted miRNAs revealed a mutant phenotype . . .; these same deletions in a wild-type background did not produce a phenotype. These examples show the utility of assessing animal systems not only under standard laboratory conditions, but also with treatments that mimic the natural hardships and flaws that they might experience in the wild.” (pp. 517f.)
“miRNAs are surely not the only regulatory factors that contribute to system robustness. Whole-genome bioinformatic analysis of worm and fly reveal transcription factors enriched in feedforward loops as well. . . . Compared to transcriptional regulators, however, miRNAs do have some distinguishing features that may make them well suited in this role. As posttranscriptional regulators acting in the cytoplasmic compartment, miRNAs can intervene late in the pipeline of gene expression to counteract variation from the upstream processes of transcription, splicing, and nuclear export. They are able to regulate transcripts in special compartments, such as maternally deposited transcripts in the early embryo . . . and locally translated transcripts of dendrites far from the cell body of neurons. They can also be present at high concentrations (10,000s of molecules per cell) by virtue of being very stable (e.g., the heart muscle-specific miR-208 has an in vivo half-life of > 1 week). . . . This is consistent with theoretical constraints indicating the need for many more molecules of a regulator to achieve a small reduction in the noise of a target gene. . . . miRNA expression profiling from progressive stages of T-lymphocyte development found that the total number of miRNAs expressed per cell changed in parallel with changes in total cellular RNA content, suggesting that global miRNA levels are tuned to the translational capacity of the cell. . . .” (pp. 519f.)
[CONCLUDING REMARKS:] “Multicellular organisms must manage the tasks of development and physiology in unpredictable, changing environments and with imperfect genetic and biochemical components. Random noise in gene expression must be dampened or, as in the case of some cell fate decisions, harnessed in a system control network to designate one fate or another among neighboring cells. Robustness goes beyond the job of keeping one state the same in the face of perturbations. In development, it can mean not sending a signal until the right time and then sending it strongly and irreversibly. Although miRNAs act to confer accuracy and uniformity to developmental transitions, the loss of a miRNA may result not in catastrophic defects but, rather, in imprecise, variable phenotypes. If other feedback or back-up mechanisms are in place, then the loss of robustness may only be detected by applying additional perturbations. The addition of miRNAs to metazoan genomes over time and the diversity of miRNA repertoires among different tissues of developing animals suggest that miRNAs are involved in reinforcing developmental decisions to make organismal complexity reliable and heritable from one generation to the next.” (p. 521)
Editorial (Magdalena Skipper, Ritu Dhand, and Philip Campbell, Nature, Vol. 489 [Sep 6, 2012], p. 45)
“2001 WILL ALWAYS BE REMEMBERED AS THE YEAR OF THE HUMAN GENOME. The availability of its sequence transformed biology, and the exemplary way in which hundreds of researchers came together to form a public consortium paved the way for ‘big science’ in biology. It was an incredible achievement but it was always clear that knowing the ‘code’ was only the beginning. To understand how cells interpret the information locked within the genome much more needed to be learnt. This became the task of ENCODE, the Encyclopedia Of DNA Elements, the aim of which was to describe all functional elements encoded in the human genome. Nine years after launch, its main efforts culminate in the publication of 30 coordinated papers, 6 of which are in this issue of Nature.
“Collectively, the papers describe 1,640 data sets generated across 147 different cell types. Among the many important results there is one that stands out above them all: more than 80% of the human genome’s components have now been assigned at least one biochemical function.” (p. 45)
“The Human Encyclopaedia” (Brendan Maher, Nature, Vol. 489 [Sep 6, 2012], pp. 46-48)
“The consortium has assigned some sort of function to roughly 80% of the genome, including more than 70,000 ‘promoter’ regions — the sites, just upstream of genes, where proteins bind to control gene expression — and nearly 400,000 ‘enhancer’ regions that regulate expression of distant genes (see page 57). But the job is far from done, says [Ewan] Birney, a computational biologist at the European Molecular Biology Laboratory’s European Bioinformatics Institute in Hinxton, UK, who coordinated the data analysis for ENCODE. He says that some of the mapping efforts are about halfway to completion, and that deeper characterization of everything the genome is doing is probably only 10% finished. A third phase, now getting under way, will fill out the human instruction manual and provide much more detail.” (p. 46)
“ENCODE explained: Serving up a genome feast” (Joseph R. Ecker, Nature, Vol. 489 [Sep 6, 2012], pp. 52-53)
“One of the more remarkable findings described in the consortium’s ‘entrée’ paper (page 57) is that 80% of the genome contains elements linked to biochemical functions, dispatching the widely held view that the human genome is mostly ‘junk DNA’. The authors report that the space between genes is filled with enhancers (regulatory DNA elements), promoters (the sites at which DNA’s transcription into RNA is initiated) and numerous previously overlooked regions that encode RNA transcripts that are not translated into proteins but might have regulatory roles. Of note, these results show that many DNA variants previously correlated with certain diseases lie within or very near non-coding functional DNA elements, providing new leads for linking genetic variation and disease.” (p. 52)
“ENCODE explained: Non-coding but functional” (Inês Barroso, Nature, Vol. 489 [Sep 6, 2012], p. 54)
“The vast majority of the human genome does not code for proteins and, until now, did not seem to contain defined gene-regulatory elements. Why evolution would maintain large amounts of ‘useless’ DNA had remained a mystery, and seemed wasteful. It turns out, however, that there are good reasons to keep this DNA. Results from the ENCODE project show that most of these stretches of DNA harbour regions that bind proteins and RNA molecules, bringing these into positions from which they cooperate with each other to regulate the function and level of expression of protein-coding genes. In addition, it seems that widespread transcription from non-coding DNA potentially acts as a reservoir for the creation of new functional molecules, such as regulatory RNAs. . . .
“The ENCODE project provides a detailed map of additional functional non-coding units in the human genome, including some that have cell-type-specific activity. In fact, the catalogue contains many more functional non-coding regions than genes.” (p. 54)
“An integrated encyclopedia of DNA elements in the human genome” (The ENCODE Project Consortium, Nature, Vol. 489 [Sep 6, 2012], pp. 57-74)
[ABSTRACT:] “The Encyclopedia of DNA Elements (ENCODE) project has systematically mapped regions of transcription, transcription factor association, chromatin structure and histone modification. These data enabled us to assign biochemical functions for 80% of the genome, in particular outside of the well-studied protein-coding regions.” (p. 57)
“The Encyclopedia of DNA Elements (ENCODE) project aims to delineate all functional elements encoded in the human genome. Operationally, we define a functional element as a discrete genome segment that encodes a defined product (for example, protein or non-coding RNA) or displays a reproducible biochemical signature (for example, protein binding, or a specific chromatin structure). Comparative genomic studies suggest that 3–8% of bases are under purifying (negative) selection and therefore may be functional, although other analyses have suggested much higher estimates. In a pilot phase covering 1% of the genome, the ENCODE project annotated 60% of mammalian evolutionarily constrained bases, but also identified many additional putative functional elements without evidence of constraint. The advent of more powerful DNA sequencing technologies now enables whole-genome and more precise analyses with a broad repertoire of functional assays.” (p. 57)
“The vast majority (80.4%) of the human genome participates in at least one biochemical RNA- and/or chromatin-associated event in at least one cell type. Much of the genome lies close to a regulatory event: 95% of the genome lies within 8 kilobases (kb) of a DNA–protein interaction (as assayed by bound ChIP-seq motifs or DNase I footprints), and 99% is within 1.7 kb of at least one of the biochemical events measured by ENCODE.” (p. 57)
“Given that the ENCODE project did not assay all cell types, or all transcription factors, and in particular has sampled few specialized or developmentally restricted cell lineages, these proportions must be underestimates of the total amount of functional bases.” (p. 60)
“So far, ENCODE has sampled 119 of 1,800 known transcription factors and general components of the transcriptional machinery on a limited number of cell types, and 13 of more than 60 currently known histone or DNA modifications across 147 cell types. . . . An important future goal will be to enlarge this data set to additional factors, modifications and cell types. . . .” (p. 71) <http://www.nature.com/nature/journal/v489/n7414/full/nature11247.html>
“Architecture of the human regulatory network derived from ENCODE data” (Mark B. Gerstein et al., Nature, Vol. 489 [Sep 6, 2012], pp. 91-100)
“This study provides the first detailed analysis of how human regulatory information is organized. A number of clear design principles emerge from it. Many of these are shared with model organisms . . ., demonstrating that they are general features of transcription factor regulation. First, we found that the connectivity and hierarchical organization of regulatory factors is reflected in many genomic properties. For instance, top-level transcription factors have their binding more strongly correlated with the expression of their targets, perhaps indicating that they are more influential, as reported for model organisms. Next, the middle-level contains information-flow bottlenecks and much connectivity with miRNA and distal regulation. Targeting these bottlenecks (for example, by drugs) is likely to most strongly affect the flow of information through regulatory circuits. To some degree, the cell mitigates the effect of bottlenecks by having pairs of middle-level transcription factors collaborate in regulation. (Co-regulation mitigates bottlenecks.) Third, the regulatory network seems to be built from repeated reuse of small, modular motifs. In particular, regulation between levels involves many feed-forward loops, which could be used to filter fluctuations in input stimuli. Again, these properties are shared with model organisms; the network motifs and cooperating middle-level have been observed in yeast.
“By contrast, the differences in proximal and distal regulation seem to be a unique feature of human regulation. This finding is evident in the analysis of both transcription factor co-association and network structure. The proximal–distal differences reflect the much larger intergenic space in humans than model organisms and the commensurately larger amount of distal binding. Finally, analysis of conservation indicates that more highly connected parts of the network are under stronger selection, consistent with results from model organisms. However, one unique finding for humans is ‘allelic’ effects. More highly connected transcription factors are more likely to exhibit allele-specific binding. Interestingly, we found that the actual allele-specific binding sites tend to be under less selection. Unravelling this interaction between selection and regulatory networks will be crucial to interpreting variants in the many personal genome sequences expected in the future.” (pp. 98f.)
“Landscape of transcription in human cells” (Sarah Djebali et al., Nature, Vol. 489 [Sep 6, 2012], pp. 101-108)
[ABSTRACT:] “Eukaryotic cells make many types of primary and processed RNAs that are found either in specific subcellular compartments or throughout the cells. A complete catalogue of these RNAs is not yet available and their characteristic subcellular localizations are also poorly understood. Because RNA represents the direct output of the genetic information encoded by genomes and a significant proportion of a cell’s regulatory capabilities are focused on its synthesis, processing, transport, modification and translation, the generation of such a catalogue is crucial for understanding genome function. Here we report evidence that three-quarters of the human genome is capable of being transcribed, as well as observations about the range and levels of expression, localization, processing fates, regulatory regions and modifications of almost all currently annotated and thousands of previously unannotated RNAs.” (p. 101)
“As the technologies for RNA profiling and for cell-type isolation and culture continue to improve, the catalogue of RNA types has grown and led to an increased appreciation for the numerous biological functions carried out by RNA, arguably putting them on par with the functional importance of proteins. The Encyclopedia of DNA Elements (ENCODE) project has sought to catalogue the repertoire of RNAs produced by human cells as part of the intended goal of identifying and characterizing the functional elements present in the human genome sequence. The five-year pilot phase of the ENCODE project examined approximately 1% of the human genome and observed that the gene-rich and gene-poor regions were pervasively transcribed, confirming results of previous studies. During the second phase of the ENCODE project, lasting 5 years, the scope of examination was broadened to interrogate the complete human genome. Thus, we have sought to both provide a genome-wide catalogue of human transcripts and to identify the subcellular localization for the RNAs produced. Here we report identification and characterization of annotated and novel RNAs that are enriched in either of the two major cellular subcompartments (nucleus and cytosol) for all 15 cell lines studied, and in three additional subnuclear compartments in one cell line.” (p. 101)
“The general lower level of gene expression measured in lncRNAs may not necessarily be the result of consistent low RNA copy number in all cells within the population interrogated, but may also result from restricted expression in only a subpopulation of cells. In some cell lines, individual lncRNAs can exhibit steady-state expression levels as high as those of protein-coding genes.” (p. 103)
“As a class, only protein-coding genes seem to be enriched in the cytosol, making the nucleus a centre for the accumulation of ncRNAs. . . . Other gene classes, such as pseudogenes and small annotated ncRNAs, also show subcellular compartmental enrichment . . . .
“Higher variability and lower pairwise correlation of expression across all cell lines is consistent with lncRNAs contributing more to cell-line specificity than protein-coding genes. Indeed, a considerable fraction (29%) of all expressed lncRNAs are detected in only one of the cell lines studied when considering the whole cell polyadenylated RNAs, whereas only 10% were expressed in all cell lines. Conversely, whereas a large fraction (53%) of expressed protein-coding genes were constitutive (expressed in all cell lines), only ~7% were cell-line specific. . . .” (p. 103)
“Currently, a total of 7,053 small RNAs are annotated by GENCODE, 85% of which correspond to four major classes: small nuclear (sn)RNAs, small nucleolar (sno)RNAs, micro (mi)RNAs and transfer (t)RNAs. . . . Overall we find 28% of all annotated small RNAs to be expressed in at least one cell line. . . . The distribution of annotated small RNAs differs markedly between cytosolic and nuclear compartments. . . . We found that the small RNA classes were enriched in those compartments where they are known to perform their functions: miRNAs and tRNAs in the cytosol, and snoRNAs in the nucleus. Interestingly, snRNAs were equally abundant in both the nucleus and the cytosol. When specifically interrogating the subnuclear compartments of the K562 cell line, however, snRNAs seem to be present in very high abundance in the chromatin-associated RNA fraction. . . . This striking enrichment is consistent with splicing being predominantly co-transcriptional.” (p. 105)
“Overall, about 6% of all annotated long transcripts overlap with small RNAs and are probably precursors to these small RNAs. Although most of these small RNAs reside in introns, when controlling for relative exon/intron length, we found that exons from lncRNAs are comparatively enriched as hosts for snoRNAs. . . . Additionally, 8.4% of GENCODE annotated small RNAs map within novel intergenic transcripts, with most overlapping annotated tRNAs. The enrichment for tRNAs was mostly in novel intergenic transcripts derived from non-polyadenylated RNAs. . . . Many long RNAs, both novel and annotated, thus seem to have dual roles, as functional (protein coding) RNAs, and as precursors for many important classes of small RNAs.” (p. 105)
“ENCODE Project Writes Eulogy For Junk DNA” (Elizabeth Pennisi, Science, Vol. 337 [Sep 7, 2012], pp. 1159, 1161)
“When researchers first sequenced the human genome, they were astonished by how few traditional genes encoding proteins were scattered along those 3 billion DNA bases. Instead of the expected 100,000 or more genes, the initial analyses found about 35,000 and that number has since been whittled down to about 21,000. In between were megabases of ‘junk,’ or so it seemed.
“This week, 30 research papers, including six in Nature and additional papers published by Science, sound the death knell for the idea that our DNA is mostly littered with useless bases. A decadelong project, the Encyclopedia of DNA Elements (ENCODE), has found that 80% of the human genome serves some purpose, biochemically speaking. ‘I don’t think anyone would have anticipated even close to the amount of sequence that ENCODE has uncovered that looks like it has functional importance,’ says John A. Stamatoyannopoulos, an ENCODE researcher at the University of Washington, Seattle.” (p. 1159)
“The ENCODE effort has revealed that a gene’s regulation is far more complex than previously thought, being influenced by multiple stretches of regulatory DNA located both near and far from the gene itself and by strands of RNA not translated into proteins, so-called noncoding RNA. ‘What we found is how beautifully complex the biology really is,’ says Jason Lieb, an ENCODE researcher at the University of North Carolina, Chapel Hill.
“Throughout the 1990s, various researchers called the idea of junk DNA into question. With the human genome in hand, the National Human Genome Research Institute (NHGRI) in Bethesda, Maryland, decided it wanted to find out once and for all how much of the genome was a wasteland with no functional purpose. In 2003, it funded a pilot ENCODE, in which 35 research teams analyzed 44 regions of the genome—30 million bases in all, about 1% of the total genome. In 2007, the pilot project’s results revealed that much of this DNA sequence was active in some way. The work called into serious question our gene-centric view of the genome, finding extensive RNA-generating activity beyond traditional gene boundaries (Science, 15 June 2007, p. 1556). But the question remained whether the rest of the genome was like this 1%. ‘We want to know what all the bases are doing,’ says Yale University bioinformatician Mark Gerstein.” (p. 1159)
Controversy over ENCODE!
Creationists and “Intelligent Design” advocates have celebrated the ENCODE news:
Meanwhile, some evolutionists have complained that the ENCODE definition of “functional” is too broad, and that the “80%” figure is therefore just so much hype.
A blog post of one of the anti-ENCODE complainers, John Timmer:
Some “Intelligent Design” and creationist responses to the complaints:
<http://www.create.ab.ca/encode-project-discarding-junk-dna-for-good/> (by Margaret Helder, CSAA)
ENCODE head (“lead analysis coordinator”) Ewan Birney’s own blog:
The Discover magazine piece by Ed Yong, containing a wealth of information:
John Mattick and Marcel Dinger critique arguments of evolutionists who complain against ENCODE:
A creationist response to an evolutionist challenge based in part on ENCODE:
Nathaniel T. Jeanson, “Does ‘Junk’ DNA Exist?” Acts & Facts 42(4):20, April 2013 <http://www.icr.org/article/7316/>
“Long non-coding antisense RNA controls Uchl1 translation through an embedded SINEB2 repeat” (Claudia Carrieri, Laura Cimatti, Marta Biagioli, Anne Beugnet, Silvia Zucchelli, Stefania Fedele, Elisa Pesce, Isidre Ferrer, Licio Collavin, Claudio Santoro, Alistair R. R. Forrest, Piero Carninci, Stefano Biffo, Elia Stupka & Stefano Gustincich, Nature, Vol. 491 [Nov 15, 2012], pp. 454-459)
[ABSTRACT:] “Antisense lncRNAs may form sense–antisense pairs by pairing with a protein-coding gene on the opposite strand to regulate epigenetic silencing, transcription and mRNA stability. Here we identify a nuclear-enriched lncRNA antisense to mouse ubiquitin carboxy-terminal hydrolase L1 (Uchl1), a gene involved in brain function and neurodegenerative diseases. Antisense Uchl1 increases UCHL1 protein synthesis at a post-transcriptional level, hereby identifying a new functional class of lncRNAs. Antisense Uchl1 activity depends on the presence of a 5′ overlapping sequence and an embedded inverted SINEB2 element [an ‘embedded repetitive sequence’]. . . . Antisense Uchl1 RNA is then required for the association of the overlapping sense protein-coding mRNA to active polysomes for translation. These data reveal another layer of gene expression control at the post-transcriptional level.” (p. 454)
“Targeted deletion of the region containing the embedded SINEB2 and Alu repetitive sequences (ΔAS) was also able to prevent UCHL1 protein induction. Deletion of each repetitive element separately revealed that SINEB2 is the functional unit required by antisense Uchl1 for increasing UCHL1 protein synthesis.” (p. 455)
“This new function for SINEB2 sequences in the cytoplasm adds to their well-established role in the nucleus as inhibitors of RNA polymerase II.” (pp. 456f.)
“Long Noncoding RNAs: Cellular Address Codes in Development and Disease” (Pedro J. Batista and Howard Y. Chang, Cell, Vol. 152 [Mar 14, 2013], pp. 1298-1307)
[ABSTRACT:] “Long noncoding RNAs (lncRNAs) have emerged as key components of the address code, allowing protein complexes, genes, and chromosomes to be trafficked to appropriate locations and subject to proper activation and deactivation. lncRNA-based mechanisms control cell fates during development, and their dysregulation underlies some human disorders caused by chromosomal deletions and translocations.” (p. 1298)
“Here, we review the evidence that lncRNAs are a rich source of molecular addresses in the eukaryotic nucleus. . . .
“In this Review, we focus on a particular class of noncoding transcripts known as long noncoding RNAs (lncRNAs) and the roles that they play in nuclear organization.” (p. 1298)
“The repertoire of roles performed by lncRNAs is growing, as there is now evidence that lncRNAs participate in multiple networks regulating gene expression and function. Several characteristics of lncRNAs make them the ideal system to provide the nucleus with a system of molecular addresses. lncRNAs, unlike proteins, can function both in cis, at the site of transcription, or in trans. An RNA-based address code may be deployed more rapidly and economically than a system that relies only on proteins. lncRNAs do not need to be translated and do not require transport between the cytoplasm and the nucleus. lncRNAs can also interact with multiple proteins, enabling scaffolding functions and combinatorial control. . . . As such, the act of transcription can rapidly create an anchor that will lead to the formation, or remodeling, of nuclear domains through the recruitment or sequestration of proteins already present in the nuclear compartment. Using lncRNAs allows cells to create addresses that are regional-, locus- or even allele-specific. . . . At the regional level, lncRNAs can influence the formation of nuclear domains and the transcriptional status of an entire chromosome, and they can participate in the interaction of two different chromosomal regions. At a more fine-grained level, lncRNAs can control the chromatin state and activity of a chromosomal locus or specific gene. We explore each of these concepts below with recently published examples.” (p. 1299)
“Cells can use noncoding RNAs to modulate gene expression by changing the accessibility of gene promoters. These mechanisms can be used to fine-tune gene expression in response to environmental conditions or to silence a gene as part of a developmental program.
“First, the act of noncoding RNA (ncRNA) transcription itself can be purposed for regulatory function. For example, transcription through a regulatory sequence, such as a promoter, can block its function, a mechanism termed transcriptional interference . . . first identified in yeast. . . . In such instances, the lncRNA promoter is finely tuned to receive appropriate inputs to exert regulatory function; the lncRNA product is typically a faithful biomarker of transcriptional interference in action but is not required for its success.” (p. 1299)
“Second, lncRNAs can silence or activate gene expression in cis, acting on neighboring genes of the lncRNA locus. Some of the first studied examples of lncRNA function involve dosage compensation and genomic imprinting, whereby lncRNAs provide allele-specific gene regulation to differentially control two copies of the same gene within one cell. . . . one lncRNA gene can employ multiple mechanisms to regulate nearby and distantly located genes. In genome-wide studies, numerous lncRNAs have now been found to interact with chromatin modification complexes. . . .” (p. 1300)
“DNA methylation can occur as a long-term silencing mechanism downstream of repressive histone modifications, and lncRNAs may also guide DNA methylation in addition to histone modification. . . .
“A distinct family of lncRNAs serves to activate gene expression. Many active enhancer elements transcribe lncRNAs, termed ‘eRNAs’ . . ., and several lncRNAs are required to activate gene expression, which are termed ‘enhancer-like RNAs’ . . . .” (p. 1301)
“Third, lncRNAs can control chromatin states at distantly located genes (i.e., in trans) for both gene silencing and activation. . . . These lncRNAs bind to some of the same effector chromatin modification complexes but target them to genomic loci genome-wide.” (p. 1301)
“The concept of lncRNA recruitment of factors to genes may be more properly considered a two-way street, with genes being moved into specific cytotopic locations by lncRNAs. One type of molecular address can be found in the formation of nuclear domains. These are regions of the nucleus where specific functions are performed. Unlike cellular organelles, these domains are not membrane delimited. They are instead characterized by the components that form them. These domains are believed to form through molecular interactions between its components. Once a stable interaction is found, the components remain associated. These domains are often formed around the sites of transcription of RNA components, which function as molecular anchors. . . .” (p. 1301)
“The ultimate function of mRNAs is to be translated, and like other steps of gene expression, multiple layers of posttranscriptional regulation exist in the cytoplasm. . . . lncRNAs can also ‘identify’ mRNAs in the cytoplasm and modulate their life cycle. Recent works demonstrated that lncRNAs impact both the mRNA half-life and translation of mRNAs. . . . These emerging examples illustrate that lncRNAs can provide a rich palette of regulatory capacities in the cytoplasm.” (p. 1303)
“lncRNAs are well poised to be molecular address codes, particularly in the nucleus. On the one hand, transcription of lncRNAs is often exquisitely regulated, reflecting the particular developmental stage and external environment that the cell has experienced. On the other, the capacity of lncRNAs to function as guides, scaffolds, and decoys endows them with enormous regulatory potential in gene expression and for spatial control within the cell.” (p. 1304)
“The Noncoding RNA Revolution—Trashing Old Rules to Forge New Ones” (Thomas R. Cech and Joan A. Steitz, Cell, Vol. 157 [Mar 27, 2014], pp. 77-94)
[ABSTRACT:] “Noncoding RNAs (ncRNAs) accomplish a remarkable variety of biological functions. They regulate gene expression at the levels of transcription, RNA processing, and translation. They protect genomes from foreign nucleic acids. They can guide DNA synthesis or genome rearrangement. For ribozymes and riboswitches, the RNA structure itself provides the biological function, but most ncRNAs operate as RNA-protein complexes, including ribosomes, snRNPs, snoRNPs, telomerase, microRNAs, and long ncRNAs. Many, though not all, ncRNAs exploit the power of base pairing to selectively bind and act on other nucleic acids. Here, we describe the pathway of ncRNA research, where every established ‘rule’ seems destined to be overturned.” (p. 77)
“Today, the ncRNA revolution has engulfed all living organisms, as deep sequencing has uncovered the existence of thousands of long (l)ncRNAs with a breathtaking variety of roles in both gene expression and remodeling of the eukaryotic genome.” (p. 77)
“The discovery of introns sparked a lively debate about the evolutionary nature of noncoding (then considered ‘junk’) DNA. . . .
“Relegating introns to the junk pile turned out to be premature. A clear-cut ‘use’ of intronic sequences that redefines them as not-junk is in alternative splicing . . ., whereby sequences that are sometimes eliminated from the mRNA appear instead as exonic (coding) regions. This occurs through the selection of alternative 5′- or 3′-splice sites or by cassette exons being included (or not) during the splicing process. Alternative splicing is pervasive with the latest estimates from deep-sequencing data assigning detectable alternatively-spliced transcripts to 95% of human genes. . . .
“Most small nucleolar (sno)RNAs are pieces of intron (~70 nt) that lead a second life after their release from excised introns through exonucleolytic processing. . . . SnoRNPs use intermolecular base pairing to direct the modification of ribose 2′-hydroxyl groups or the isomerization of uridines to pseudouridines within pre-rRNAs. . . .
“A recent revelation concerning intronic ‘junk’ is the discovery that entire introns or portions thereof, called stable intronic sequence (sis)RNAs, can sometimes accumulate to significant levels, rather than undergo rapid turnover. In the Xenopus oocyte, such sequences dominate the nuclear transcriptome. . . . Some sisRNAs are selectively nuclear and others cytoplasmic, hinting at special functions in early development.” (p. 83)
“Early work on transcription in mammalian cells identified hnRNA, a heterogeneous population of huge nuclear RNAs that were short-lived. . . . the pendulum of scientific opinion has now swung away from the idea that much of this RNA could be ‘transcriptional noise’ or junk RNA transcribed from junk DNA. . . .
“Reviewing the biological functions and mechanisms of lncRNAs is a daunting task for several reasons. New lncRNA papers are published daily, and entire new categories and paradigms are proposed annually. And although our human penchant for categorization drives a desire to assign individual functions to individual lncRNAs, a single 1 kb lncRNA is long enough to carry out a large number of functions with perhaps different subsets of these functions being active in different tissues and at different stages of development.” (p. 87)
“Notwithstanding the fact that there are definable classes of ncRNAs that work by similar principles (e.g., tRNAs, riboswitches, miRNAs), it could be argued that every ncRNA studied has a different function. Certainly no two mammalian lncRNAs appear to have the same function. Thus, with perhaps 10,000 lncRNAs yet to be studied in the human genome alone, it seems safe to predict that many new functions of ncRNAs will be identified—perhaps thousands of functions.” (p. 89)
See also my brief polemical article complementary to the above quote collection as a whole:
“Evolutionary Thinking leads to Retarded Science”