Thursday, April 17, 2008

genome

A genome is the complete collection of hereditary information for an individual organism. In cellular life forms, the hereditary information exists as DNA. There are two fundamentally distinct types of cells in the living world, prokaryotic and eukaryotic, and the organization of genomes differs in these two types of cells.

Prokaryotes comprise the bacteria and archaea. The latter were originally designated "extremophiles" because they favor such extreme environments as high acidity, salinity, or temperature. Prokaryotic cells tend to be very small, have few or no cytoplasmic organelles, and have the cellular DNA arranged in a "nucleoid region" that is not separated from the remainder of the cell by any membrane. Eukaryotes exist as unicellular or multicellular organisms. Among the unicellular eukaryotes are the protozoa, some types of algae, and a few forms of fungi, while the multicellular organisms include animals, plants, and most fungi.

Eukaryotic cells are larger than prokaryotic cells, have a complex array of cytoplasmic structures, and have a prominent nucleus that communicates with components in the cytoplasm through an elaborate nuclear envelope. The hereditary information occurs principally in the nucleus of eukaryotic cells; in addition, minuscule (but essential) amounts of hereditary information occur in some cytoplasmic organelles (specifically, in chloroplasts for plants and algae, and in mitochondria for all eukaryotic groups).

Eukaryotic cells pass through a "cycle," progressing from a newly formed cell to a cell that is dividing to produce the next generation of progeny cells. Prior to division, the cell is in an "interphase"; during division, the cell is in a "division phase." During interphase, the nuclear DNA is organized in a dispersed network of chromatin, which is a complex consisting of nucleic acid and basic proteins. Immediately prior to and during division, the chromatin condenses to a series of discrete, compact structures called chromosomes. Thus, the physical organization of the genome varies from interphase to division phase. Finally, viruses (which are noncellular, parasitic "life forms") have genomes of double-stranded DNA, single-stranded DNA, double-stranded RNA, or single-stranded RNA.

Eukaryotes

In sexually reproducing eukaryotes, progeny organisms receive half of their genetic information from each parent. These parental contributions are designated haploid complements. The haploid complement can be represented as a "C value," which expresses the haploid complement as an amount of DNA measured in base pairs. Alternatively, the haploid complement can be expressed as the number of chromosomes contributed by each parent; this number is characteristic of each species. Finally, the haploid complement can be expressed as the number of genes on the haploid set of chromosomes.

Chromosome Number

Each species has a characteristic number of chromosomes. For species with genetically determined sexes, the haploid set is composed of autosomes plus a sex chromosome. Homo sapiens, for example, have 22 autosomes plus an X chromosome or Y chromosome. The haploid DNA content of chimpanzees is nearly identical, but is organized into 23 autosomes plus a sex chromosome.

The record for minimum number of chromosomes belongs to a sub-species of the ant, Myrmecia pilosula. The females have a single pair of chromosomes, while males have only a single chromosome. Like some other members of the insect class, these ants reproduce by a process called haplodiploidy, in which diploid fertilized eggs develop into females, while haploid unfertilized eggs develop into males.

The record for maximum number of chromosomes is found in the plant kingdom, due to a condition known as polyploidy. In polyploidy, many extra sets of chromosomes beyond the normal diploid number may accumulate over time. Cultivars of wheat exist with diploid numbers of chromosomes equaling 14, 28, or 42 (multiples of the haploid number, which is 7). Polyploids exist for many cultivated plants, including potatoes, strawberries, and cotton, as well as in wild plants such as dandelions. Polyploidy has led to striking numbers, and the known record is held by the fern Ophioglossum reticulatum, which has approximately 630 pairs.
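The wheat figures above can be checked by dividing each chromosome count by the haploid number. The following is a minimal sketch; the function name `ploidy_level` is an illustrative assumption, not an established term from the text.

```python
def ploidy_level(chromosome_count: int, base_number: int) -> int:
    """Return how many complete chromosome sets a count represents."""
    sets, remainder = divmod(chromosome_count, base_number)
    if remainder:
        raise ValueError("count is not a whole multiple of the base number")
    return sets


WHEAT_BASE_NUMBER = 7  # haploid number quoted in the text

# Chromosome counts quoted in the text for wheat cultivars.
for count in (14, 28, 42):
    sets = ploidy_level(count, WHEAT_BASE_NUMBER)
    print(f"{count} chromosomes -> {sets} sets of {WHEAT_BASE_NUMBER}")
```

Under this reading, the three cultivars carry two, four, and six sets of seven chromosomes.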

Genome Size or C Value

The C value is the amount of DNA in a haploid complement. Currently, the amount is reported as the total number of base pairs. Generally, more complex organisms have more DNA. For example, the haploid complement of Homo sapiens DNA contains between 3.12 and 3.2 gigabases (the prefix "giga" denotes billions), while the haploid complement of yeast (Saccharomyces cerevisiae) DNA contains 12,057,500 base pairs.

Unexpected genome sizes occur, however, in a phenomenon called the C value paradox. Two closely related species can have widely divergent amounts of DNA. For example, Paramecium caudatum has a C value of 8,600,000 kilobases (where the prefix "kilo" denotes thousands) while its near relative P. aurelia has a C value of just 190,000 kilobases. Another paradoxical circumstance occurs when a simpler organism has a C value higher than a more complex organism. For example, Amphiuma means (a salamander) and Amoeba dubia (an amoeba) have, respectively, C values that are 26 and 209 times the C value of humans.
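The scale of these differences is easier to see as plain arithmetic. The sketch below uses only the figures quoted in this section; the helper names are illustrative, and applying the 26-fold and 209-fold multiples to the quoted human range of 3.12 to 3.2 gigabases is an assumption made here to get approximate absolute sizes.

```python
# C values quoted in the text, in kilobases.
PARAMECIUM_C_VALUES_KB = {"P. caudatum": 8_600_000, "P. aurelia": 190_000}

# Human haploid genome range quoted earlier, in gigabases.
HUMAN_C_VALUE_GB = (3.12, 3.2)

# Multiples of the human C value quoted in the text.
MULTIPLES_OF_HUMAN = {"Amphiuma means": 26, "Amoeba dubia": 209}


def fold_difference(larger: float, smaller: float) -> float:
    """How many times larger one C value is than another."""
    return larger / smaller


def implied_c_value_range_gb(multiple: float) -> tuple:
    """Scale the human range by a quoted multiple (an approximation)."""
    low, high = HUMAN_C_VALUE_GB
    return multiple * low, multiple * high


paramecium_fold = fold_difference(
    PARAMECIUM_C_VALUES_KB["P. caudatum"], PARAMECIUM_C_VALUES_KB["P. aurelia"]
)
print(f"P. caudatum / P. aurelia: about {paramecium_fold:.0f}-fold")

for species, multiple in MULTIPLES_OF_HUMAN.items():
    low, high = implied_c_value_range_gb(multiple)
    print(f"{species}: roughly {low:.0f} to {high:.0f} gigabases")
```

The two Paramecium species differ by a factor of about 45 despite being near relatives.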

Number of Nuclear Genes, "Gene Density," and Intergenic Sequences

An important trend in genome evolution has been the accumulation, both within the genes (intragenic) and between genes (intergenic), of DNA that does not code for any gene products. Homo sapiens have between 31,000 and 70,000 genes; mice have 24,780; Caenorhabditis elegans (a roundworm) has more than 19,099; fruit flies have 13,601; and yeast approximately 6,000. A ratio of gene number to C value indicates that lower organisms have both smaller genes and lower numbers of nongene base pairs between adjacent genes. Higher eukaryotes have a larger number of intragenic inserts (introns), greater intergenic distances, and more abundant repeated sequences.
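The ratio of gene number to C value can be worked out for the two organisms whose genome sizes are given above, humans and yeast. This is a rough sketch: the function name is illustrative, and the human figure assumes the upper end (3.2 gigabases) of the quoted size range.

```python
def genes_per_megabase(gene_count: int, genome_size_bp: int) -> float:
    """Gene density expressed as genes per million base pairs."""
    return gene_count / (genome_size_bp / 1_000_000)


HUMAN_GENOME_BP = 3_200_000_000  # assumption: upper end of the 3.12-3.2 gigabase range
YEAST_GENOME_BP = 12_057_500     # figure quoted for Saccharomyces cerevisiae

# The text gives a range of 31,000 to 70,000 human genes.
human_low = genes_per_megabase(31_000, HUMAN_GENOME_BP)
human_high = genes_per_megabase(70_000, HUMAN_GENOME_BP)
yeast = genes_per_megabase(6_000, YEAST_GENOME_BP)

print(f"Human: {human_low:.1f} to {human_high:.1f} genes per megabase")
print(f"Yeast: {yeast:.0f} genes per megabase")
```

Even with the highest human gene estimate, yeast packs more than twenty times as many genes into each megabase, which is what the trend described above predicts.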

In higher eukaryotes, only a small portion of the genome is organized into genes. For example, in humans less than 2 percent of the genome specifies protein products. Another portion (about 20 percent in humans) is present as gene fragments, pseudogenes (sequences that resemble genes but are not expressed as proteins), and surrounding stretches of nucleotides. The vast majority of nucleotides (approximately 75 percent in humans) constitute extragenic sequences. Two forms of extragenic sequences are prominent: unique sequences and repetitive sequences.

For repetitive sequences, two types of organization occur: short tandem repeats (called satellite sequences) and widely distributed, interspersed repeats. Satellites are recurrent short sequences present in essential chromosomal structures such as centromeres and telomeres. Interspersed repeats are generated from transposons, which are nucleotide sequences that can replicate themselves and become distributed throughout the genome. An example of interspersed repeats that occurs in humans is a sequence of a few hundred nucleotides called Alu, which occurs approximately a million times. In higher plants, satellites and interspersed sequences constitute the bulk of the genome.

Ploidy

Ploidy reflects the reproductive mechanisms of an organism. Animals commonly have both a maternal and a paternal parent. Through meiosis, the former forms a haploid gamete called an ovum (or egg); the latter forms a haploid gamete called a sperm. During fertilization, the egg and sperm unite to form a diploid zygote that matures to an adult organism. Thus, the genome of adult animals is diploid, while the genome of their gametes is haploid.

Plants exhibit an alternation of generations: sporophytes (the mature, visible plant) are diploid and, through meiosis, produce spores that germinate into gametophytes; the gametophytes are haploid and produce gametes that fuse to reestablish the diploid state. Fungi also exhibit an alternation of generations. They commonly exist as multinucleate tubes of cytoplasm called hyphae. The individual nuclei are most often haploid (though they may be diploid in the lower fungi).

Hyphae of different members of a fungal species sometimes fuse; in this circumstance (called heterokaryosis) the genome becomes the sum of the two (dikaryotic) haploid complements. Unicellular protistan organisms, a group that includes protozoans and most algae, exhibit many variations. For example, the ciliates (such as paramecia) have diploid micronuclei and polyploid macronuclei; the former are the basis of inheritance; the latter establish the genetic character of an existing organism.

Mitochondrial and Chloroplast Genomes

Two cytoplasmic organelles responsible for the production of energy are the mitochondria (present in nearly all eukaryotic cells) and chloroplasts (present only in photosynthetic organisms). Both contain small, circular DNA molecules that constitute the nonnuclear portion of a eukaryotic genome. These organelles are descended from formerly free-living bacteria that took up residence in the first eukaryotes.

The human mitochondrial genome contains 16,569 base pairs specifying 13 protein products and 24 RNA products. In both lower eukaryotes and especially plants, larger mitochondrial genomes are present. In extreme cases, mitochondrial genomes may be several hundred thousand or millions of base pairs. Chloroplast genomes contain between 100 and 200 kilobases. It is thought that each was once larger, but over time their genes have been moved to the nucleus.

Prokaryotes

Prokaryotic genomes are composed of a chromosome plus various accessory elements. The former is most commonly a circular double-stranded DNA molecule but may be a linear molecule in some major groups, such as Streptomyces and Borrelia (the causative agent of Lyme disease). Accessory elements most prominently include plasmids (commonly circular but linear in Actinomycetes and some Proteobacteria) as well as insertion sequence (IS) elements, transposons, and prophages (derived from viruses). Other variations in chromosomal geometry exist: multiple circular chromosomes are found in some organisms; combinations of circular and linear chromosomes occur in others; and, in the extreme (observed in Streptomyces), circular and linear chromosomes can convert between those two topologies.

The smallest known bacterial chromosome, with only 580 kilobase pairs (kbp), occurs in Mycoplasma genitalium, and the largest, with 9,200 kbp, occurs in Myxococcus xanthus. Representative sizes cluster between 2,000 and 5,000 kbp (e.g., Escherichia coli MG1655 has 4,639,221 bp). A typical bacterial gene contains approximately a thousand base pairs. M. genitalium has approximately 470 genes, while M. xanthus has more than 10,000, and E. coli MG1655 has 4,288.

By 2002 the nucleotide sequences of more than seventy-five prokaryotic chromosomes had been determined. One goal of these sequencing projects is gene annotation: establishing the location, function, and allelic variation for each gene. In E. coli MG1655, for example, the positions of the 4,288 protein-coding genes have been identified; the average distance between genes is 118 base pairs; and the noncoding sequences (some of which may function as regulatory sites) constitute less than 11 percent of the genome. The function of approximately 40 percent of the genes, however, remains unknown. Notably, the chromosomal size and gene content of another isolate of E. coli, the pathogenic O157:H7 strain, are quite different. The O157:H7 chromosome is roughly 20 percent larger: the two strains share 4.1 million base pairs (Mbp), O157:H7 has 1.34 Mbp that are not found in MG1655, and MG1655 has 0.53 Mbp that are not found in O157:H7.
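The figures in this paragraph can be cross-checked against one another. The sketch below adds the shared and strain-specific sequence to estimate each chromosome's size, then multiplies gene count by mean intergenic distance to estimate the intergenic share of MG1655. The variable names are illustrative, and treating every gene as followed by one average-sized gap is a simplifying assumption.

```python
# Sequence amounts quoted in the text, in millions of base pairs (Mbp).
SHARED_MBP = 4.1
UNIQUE_TO_PATHOGEN_MBP = 1.34
UNIQUE_TO_MG1655_MBP = 0.53

pathogen_total_mbp = SHARED_MBP + UNIQUE_TO_PATHOGEN_MBP
mg1655_total_mbp = SHARED_MBP + UNIQUE_TO_MG1655_MBP
percent_larger = (pathogen_total_mbp / mg1655_total_mbp - 1) * 100

# Annotation figures quoted for MG1655.
GENE_COUNT = 4_288
MEAN_INTERGENIC_BP = 118

# Assumption: one average-sized gap per gene.
intergenic_bp = GENE_COUNT * MEAN_INTERGENIC_BP
intergenic_fraction = intergenic_bp / (mg1655_total_mbp * 1_000_000)

print(f"Pathogenic strain: {pathogen_total_mbp:.2f} Mbp")
print(f"MG1655: {mg1655_total_mbp:.2f} Mbp")
print(f"Pathogenic strain is {percent_larger:.1f} percent larger")
print(f"Intergenic share of MG1655: {intergenic_fraction:.1%}")
```

The totals come out near 5.44 and 4.63 Mbp, a difference somewhat under the rounded 20 percent, and the estimated intergenic share falls just below 11 percent, consistent with the figure quoted for noncoding sequence.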

The genomes of closely related prokaryotes often have different organizations. These differences arise from rearrangements (such as inversions) between repeated elements, IS elements, and transposons and from the "horizontal transfer" of nucleotide sequences between cells. The latter phenomenon is mediated most commonly by conjugative plasmids, which are nonessential, autonomous accessory genetic elements that can acquire genes (such as antibiotic resistance genes) and then move them from a donor organism to a recipient. The dynamic character of genomic organization in prokaryotes is often designated as "genomic plasticity."

A series of repeated elements exists in the chromosomes of prokaryotes. In some instances the repeats are redundant copies of essential, long nucleotide sequences, as is seen in ribosomal RNA loci. Other repeats are small and have known functions (as in the Chi sequences in E. coli that facilitate genetic crossing over) or unknown functions (as in the REP [repeated extragenic palindromic] sequences in E. coli).

Viruses

Viral genomes are composed of single-stranded or double-stranded DNA or RNA. Single-stranded RNAs are either positive (capable of being immediately translated into protein) or negative. Double-stranded RNA genomes are most often segmented, with each segment being a single gene, while the other genomes are single circular or linear molecules. The Retroviridae have single-stranded RNA genomes that are converted by an enzyme (reverse transcriptase) into double-stranded DNA that becomes incorporated into the genome of the host.

The smallest known virus, designated φX174, is a member of the Microviridae, which infect bacteria; its genome contains 5,386 bases. The largest viral genomes occur in Poxviridae, which can possess as many as 309 kbp.

Viruses are extraordinarily efficient in using the coding capacity of their genomes. The virus known as φX174 contains ten genes, and the end of one gene commonly overlaps with the beginning of the following gene. In addition, two smaller genes are nested within larger genes (this compaction being achieved by having the two genes expressed in alternate "reading frames"). As a consequence of this efficiency, only 36 bases are not translated into an amino acid sequence. At the opposite extreme, the various pox viruses share more than 100 similar genes and may have an equal number of unique genes.
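The idea of alternate reading frames can be illustrated with a short translation routine: shifting the starting position by one or two bases groups the same nucleotides into different codons, so one stretch of DNA can specify more than one protein. This is a minimal sketch using the standard genetic code; the nine-base sequence is made up for illustration and is not taken from any viral genome.

```python
from itertools import product

# Standard genetic code, listed in the conventional T, C, A, G order
# (first base varies slowest, third base fastest); "*" marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna: str, frame: int = 0) -> str:
    """Translate complete codons of a DNA string starting at offset 0, 1, or 2."""
    codons = (dna[i:i + 3] for i in range(frame, len(dna) - 2, 3))
    return "".join(CODON_TABLE[codon] for codon in codons)


HYPOTHETICAL_SEQUENCE = "ATGGCCTAA"  # made-up example, not a real gene

for frame in range(3):
    print(f"frame {frame}: {translate(HYPOTHETICAL_SEQUENCE, frame)}")
```

Each frame yields a different amino acid string from the same bases, which is how a nested gene can share nucleotides with the larger gene that contains it.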

—Steven Krawiec
