8: Comparative Genomics Studies How Genomes Evolve

Molecular Genetic Analysis, Biotechnology, and Genomics

Experiment
Question: Can variation in gene expression, detected by microarrays, be used to predict the
recurrence of breast cancer?

Methods
1 Cancer and noncancer cells are removed from 78 women with breast cancer.
2 Messenger RNA from the cells…
3 …is converted into cDNA and labeled with red (cancer cells) or green (noncancer cells)
fluorescent nucleotides.
4 The cDNAs are mixed…
5 …and hybridized to DNA probes on a microarray chip.
6 The chip is scanned spot by spot. Yellow fluorescence (red + green) indicates equal
expression of the gene in both types of cells; red indicates more expression in cancer
cells; and green indicates more expression in noncancer cells.

Results
Each row represents the primary tumor from a patient, and each column represents one of the
70 genes in the initial tumors. Tumors above the solid yellow line came primarily from
patients who remained cancer free for at least 5 years. Tumors below the solid yellow line
came primarily from patients in whom the cancer spread within 5 years of diagnosis.

Conclusion: Seventy genes were identified whose expression patterns accurately predicted the
recurrence of breast cancer within 5 years of treatment.

14.28 Microarrays can be used to examine gene expression associated with disease
progression. [After L. J. van’t Veer, Nature 405:532, 2002.]

the bacterium that causes cholera, which has two circular
chromosomes, and Borrelia burgdorferi, which has one large
linear chromosome and 21 smaller chromosomes.

Genome size and number of genes The total amount of
DNA in prokaryotic genomes ranges from 490,885 bp in
Nanoarchaeum equitans, an archaeon that lives entirely within
another archaeon, to 9,105,828 bp in Bradyrhizobium japonicum, a soil bacterium (Table 14.3). Although this range in
genome size might seem extensive, it is much less than the
enormous range of genome sizes seen in eukaryotes, which
can vary from a few million base pairs to hundreds of billions
of base pairs. Escherichia coli, the most widely used bacterium
for genetic studies, has a fairly typical genome size at 4.6 million base pairs. Archaea and bacteria are similar in their ranges
of genome size. Surprisingly, genome size shows extensive
variation within some species; for example, different strains of
E. coli vary in genome size by more than 1 million base pairs.
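
Table 14.3 lists genome sizes alongside predicted gene counts, so the relation between the two can be checked with a short calculation. The sketch below is illustrative only: it copies six rows from Table 14.3, and the helper name `genes_per_kb` is an assumption introduced here, not something from the text.

```python
# Genome size (millions of bp) and predicted gene count, copied from Table 14.3.
GENOMES = {
    "Nanoarchaeum equitans": (0.490, 536),
    "Mycoplasma genitalium": (0.58, 480),
    "Haemophilus influenzae": (1.83, 1709),
    "Escherichia coli": (4.64, 4289),
    "Mesorhizobium loti": (7.04, 6752),
    "Bradyrhizobium japonicum": (9.11, 8317),
}


def genes_per_kb(size_mbp, gene_count):
    """Return the number of predicted genes per 1000 bp of genome."""
    return gene_count / (size_mbp * 1000)


# Hypothetical helper output: one density value per species.
densities = {
    name: genes_per_kb(size, genes) for name, (size, genes) in GENOMES.items()
}

for name, density in sorted(densities.items(), key=lambda item: item[1]):
    print(f"{name:26s} {density:.2f} genes per kb")
```

For these rows the density stays close to one gene per 1000 bp even though genome size differs almost 20-fold, which is the pattern described for prokaryotes.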

Chapter 14

Among prokaryotes, the number of genes typically varies from 1000 to 2000, but some species
have as many as 6700 and others as few as 480. Interestingly, the density of genes is rather
constant across all species, with an average of about 1 gene per 1000 bp. Thus, prokaryotes
with larger genomes will have more genes, in contrast with eukaryotes, for which there is
little association between genome size and number of genes (see the section on eukaryotic
genomes). The evolutionary factors that determine the size of genomes in prokaryotes (as
well as in eukaryotes) are still largely unknown. Only about half of the genes identified in
prokaryotic genomes can be assigned a function (Figure 14.29). Almost a quarter of the genes
have no significant sequence similarity to any known genes in other bacteria.

Table 14.3 Characteristics of some completely sequenced representative prokaryotic genomes

Species                                 Size (millions of base pairs)   Number of Predicted Genes
Archaea
  Archaeoglobus fulgidus                2.18                            2407
  Methanobacterium thermoautotrophicum  1.75                            1869
  Methanococcus jannaschii              1.66                            1715
  Nanoarchaeum equitans                 0.490                           536
Eubacteria
  Bacillus subtilis                     4.21                            4100
  Bradyrhizobium japonicum              9.11                            8317
  Buchnera species                      0.64                            564
  Escherichia coli                      4.64                            4289
  Haemophilus influenzae                1.83                            1709
  Mesorhizobium loti                    7.04                            6752
  Mycobacterium tuberculosis            4.41                            3918
  Mycoplasma genitalium                 0.58                            480
  Staphylococcus aureus                 2.88                            2697
  Vibrio cholerae                       4.03                            3828

Source: Data from the Genome Atlas of the Center for Biological Sequence
Analysis, http://www.cbs.dtu.dk/services/GenomeAtlas/.

14.29 The functions of many genes in prokaryotes cannot be determined by comparison
with genes in other prokaryotes. Percentages of genes affecting various known and
unknown functions in E. coli.
Pie-chart sectors: 1. Metabolism; 2. Unknown; 3. Ionic homeostasis; 4. Protein synthesis;
5. Energy; 6. Transport facilitation; 7. Cellular biogenesis; 8. Intracellular transport;
9. Protein destination; 10. Cellular communication and signal transduction; 11. Cell rescue,
defense, cell death, and aging; 12. Cell growth, cell division, and DNA synthesis;
13. Transcription.

Concepts
Comparative genomics compares the content and organization of
whole genomic sequences from different organisms. Prokaryotic
genomes are small, usually ranging from 1 million to 3 million base
pairs of DNA, with several thousand genes.

✔ Concept Check 9
What is the relation between genome size and gene number in
prokaryotes?

Eukaryotic Genomes
The genomes of more than 100 eukaryotic organisms have
been completely sequenced, including a number of fungi and
protists, several insects, several plants, and a number of
vertebrates such as the mouse, rat, dog, chimpanzee, and
human. Hundreds of additional eukaryotic genomes are in
the process of being sequenced. It is important to note that,
even though the genomes of these organisms have been
“completely sequenced,” many of the final assembled
sequences contain gaps, and regions of heterochromatin may
not have been sequenced at all. Thus, the sizes of eukaryotic
genomes are often estimates, and the number of base pairs
given for the genome size of a particular species may vary.
Predicting the number of genes that are present in a genome
also is difficult and may vary, depending on the assumptions
made and the particular gene-finding software used.
Table 14.4 Characteristics of some eukaryotic genomes that have been completely sequenced

Species                                  Genome Size (millions of base pairs)   Number of Predicted Genes
Saccharomyces cerevisiae (yeast)         12                                     6,144
Arabidopsis thaliana (plant)             125                                    25,706
Caenorhabditis elegans (nematode worm)   103                                    20,598
Drosophila melanogaster (fruit fly)      170                                    13,525
Anopheles gambiae (mosquito)             278                                    14,707
Danio rerio (zebrafish)                  1,465                                  22,409
Takifugu rubripes (tiger pufferfish)     329                                    22,089
Mus musculus (mouse)                     2,627                                  26,762
Rattus norvegicus (Norway rat)           2,571                                  23,761
Pan troglodytes (chimpanzee)             2,733                                  22,524
Homo sapiens (human)                     3,223                                  ~24,000

Source: Ensembl Web site: http://www.ensembl.org.

Genome size and number of genes The genomes of
eukaryotic organisms (Table 14.4) are larger than those of
prokaryotes, and, in general, multicellular eukaryotes have
more DNA than do simple, single-celled eukaryotes such as
yeast (see p. 212 in Chapter 8). However, there is no close
relation between genome size and complexity among the
multicellular eukaryotes. For example, the nematode worm
Caenorhabditis elegans is structurally more complex than the
plant Arabidopsis thaliana but has considerably less DNA.
In general, eukaryotic genomes also contain more genes
than do prokaryotes (but there are some large bacteria that
have more genes than single-celled yeasts do), and the
genomes of multicellular eukaryotes have more genes than
do the genomes of single-celled eukaryotes. In contrast with
bacteria, there is no correlation between genome size and
number of genes in eukaryotes. The number of genes among
multicellular eukaryotes also is not obviously related to phenotypic complexity: humans have more genes than do invertebrates but only twice as many as fruit flies and only slightly
more than the plant A. thaliana. The nematode C. elegans has
more genes than does D. melanogaster but is less complex.
Additionally, the pufferfish has only about one-tenth the
amount of DNA present in humans and mice but has almost
as many genes. Eukaryotic genomes contain multiple copies
of many genes, indicating that gene duplication has been an
important process in genome evolution.
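
The contrast with prokaryotes can be made concrete by repeating the density calculation on Table 14.4. This is a rough sketch rather than an analysis from the text: the values are copied from the table, the human gene count is the table's approximate figure, the pufferfish row is assumed to read 329 million bp and 22,089 genes, and the names `genes_per_mb`, `size_ratio`, and `gene_ratio` are introduced here for illustration.

```python
# Genome size (millions of bp) and predicted genes, copied from Table 14.4.
# The human gene count is approximate in the table (~24,000).
GENOMES = {
    "S. cerevisiae": (12, 6144),
    "A. thaliana": (125, 25706),
    "C. elegans": (103, 20598),
    "D. melanogaster": (170, 13525),
    "T. rubripes": (329, 22089),   # assumed row assignment for the pufferfish
    "M. musculus": (2627, 26762),
    "H. sapiens": (3223, 24000),
}


def genes_per_mb(size_mb, gene_count):
    """Return predicted genes per million base pairs."""
    return gene_count / size_mb


density = {name: genes_per_mb(*row) for name, row in GENOMES.items()}

# Ratio of the densest genome to the sparsest one in this sample.
spread = max(density.values()) / min(density.values())

# Pufferfish versus human: far less DNA, nearly as many genes.
size_ratio = GENOMES["T. rubripes"][0] / GENOMES["H. sapiens"][0]
gene_ratio = GENOMES["T. rubripes"][1] / GENOMES["H. sapiens"][1]

for name, value in density.items():
    print(f"{name:16s} {value:7.1f} genes per Mb")
print(f"density spread: {spread:.0f}-fold")
print(f"pufferfish/human: {size_ratio:.2f} of the DNA, {gene_ratio:.2f} of the genes")
```

Unlike the prokaryotic values, which cluster near one gene per kilobase, the eukaryotic densities in this sample differ by well over an order of magnitude.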

Gene deserts The density of genes in a typical eukaryotic
genome varies greatly, with some chromosomes having a
high density of genes and others being relatively gene poor.
In some areas of the genome, long stretches of DNA, often
consisting of hundreds of thousands to millions of base pairs,
are completely devoid of any known genes or other functional sequences; these regions are known as gene deserts.
Gene deserts are surprisingly common in eukaryotes. The
human genome contains about 500 gene deserts or more,
making up approximately 25% of the total euchromatin in
the human genome. Gene deserts are particularly common
on human chromosomes 4, 5, and 13, where they cover as
much as 40% of the entire chromosome. What is the purpose
of a gene desert? Why does it exist? A possible answer is that
it contains DNA sequences that have a functional role—perhaps in regulating genes or in the overall architecture of the
chromosome—but we are unable to recognize the function
of the sequences that it contains.

Transposable elements A substantial part of the
genomes of most multicellular organisms consists of moderately and highly repetitive sequences (see Chapter 8), and the
percentage of repetitive sequences is usually higher in those
species with larger genomes (Table 14.5). Most of these
repetitive sequences appear to have arisen through transposition and are particularly evident in the human genome:
45% of the DNA in the human genome is derived from
transposable elements, many of which are defective and no
longer able to move. Most of the DNA in multicellular
organisms is noncoding, and many genes are interrupted by
introns. In the more complex eukaryotes, both the number
and the length of the introns are greater.

Table 14.5 Percentage of genome consisting of interspersed repeats derived from transposable elements

Organism                                 Percentage of Genome
Plant (Arabidopsis thaliana)             10.5
Nematode worm (Caenorhabditis elegans)   6.5
Fly (Drosophila melanogaster)            3.1
Tiger pufferfish (Takifugu rubripes)     2.7
Human (Homo sapiens)                     44.4


Protein diversity In spite of only a modest increase in
gene number, vertebrates have considerably more protein
diversity than do invertebrates. One way to measure the
amount of protein diversity is by counting the number of
protein domains, which are characteristic parts of proteins
that are often associated with a function. Vertebrate genomes
do not encode substantially more protein domains than do invertebrate
genomes; for example, there are 1262 domains in humans
compared with 1035 in fruit flies. However, the existing
domains in humans are assembled into more combinations,
leading to many more types of proteins. For example, the
human genome contains almost twice as many arrangements of protein domains as do worms or flies and almost
six times as many as does yeast.

Concepts
Genome size varies greatly among eukaryotic species. For multicellular eukaryotic organisms, there is no clear relation between
organismal complexity and amount of DNA or gene number. A
substantial part of the genome in eukaryotic organisms consists of
repetitive DNA, much of which is derived from transposable elements. Vast regions of DNA may contain no genes or other functional sequences.

The Human Genome
The human genome, which is fairly typical of mammalian
genomes, has been extensively studied and analyzed because
of its importance to human health and evolution. It is
3.2 billion base pairs in length, but only about 25% of the
DNA is transcribed into RNA, and less than 2% encodes proteins. Active genes are often separated by vast regions of noncoding DNA, much of which consists of repeated sequences
derived from transposable elements.
The average gene in the human genome is approximately 27,000 bp in length, with about 9 exons. (One exceptional gene has 234 exons.) The introns of human genes are
much longer, and there are more of them than in other
genomes (Figure 14.30). The human genome does not
encode substantially more protein domains, but the domains
are combined in more ways to produce a relatively diverse
proteome. Gene functions encoded by the human genome
are presented in Figure 14.31. Similarly to the situation in
bacteria, the function of many genes in the human genome
is still unknown. A single gene often encodes multiple proteins through alternative splicing; each gene encodes, on the
average, two or three different mRNAs, meaning that the
human genome, with approximately 24,000 genes, might
encode 72,000 mRNAs or more.

14.30 The introns of genes in humans are generally longer
than the introns of genes in worms and flies.
[Bar chart: percentage of introns (y-axis, 0 to 60) by intron length in kb
(x-axis: <1, 1–2, 2–5, 5–30, >30) for human, worm, and fly.]

14.31 Functions for many human genes have yet to be determined. Percentages of genes
affecting various known and unknown functions.
Pie-chart sectors: 1. Miscellaneous; 2. Viral protein; 3. Transfer or carrier protein;
4. Transcription factor; 5. Nucleic acid enzyme; 6. Signaling molecule; 7. Receptor;
8. Kinase; 9. Select regulatory molecule; 10. Transferase; 11. Synthase and synthetase;
12. Oxidoreductase; 13. Lyase; 14. Ligase; 15. Isomerase; 16. Hydrolase; 17. Molecular
function unknown; 18. Transporter; 19. Intracellular transporter; 20. Select
calcium-binding protein; 21. Proto-oncogene; 22. Structural protein of muscle; 23. Motor;
24. Ion channel; 25. Immunoglobulin; 26. Extracellular matrix; 27. Cytoskeletal structural
protein; 28. Chaperone; 29. Cell adhesion.

Proteomics
DNA sequence data are tremendous sources of insight into
the biology of an organism, but they are not the whole story.
In recent years, molecular biologists have turned their attention to analysis of the protein content of cells. The ultimate
goal is to determine the proteome, the complete set of proteins found in a given cell, and the study of the proteome is
termed proteomics.
The traditional method for identifying a protein is to
remove its amino acids one at a time and determine the identity of each amino acid removed. This method is far too slow
and labor intensive for analyzing the thousands of proteins
present in a typical cell. Today, researchers use mass spectrometry, which is a method for precisely determining the
molecular mass of a molecule. In mass spectrometry, a molecule is ionized and its migration rate in an electrical field is
determined. Because small molecules migrate more rapidly
than larger molecules, the migration rate can accurately
determine the mass of the molecule.
To analyze proteins with mass spectrometry, a protein is
broken up into small peptide fragments and mass spectrometry is then used to separate the peptides on the basis of their
mass-to-charge (m/z) ratio (Figure 14.32). A computer program then searches through a database of proteins to find a
match between the profile generated and the profile expected
with a known protein. Using bioinformatics, the computer
creates “virtual digests” and predicts the profiles of all proteins found in a genome, given the DNA sequences of the
protein-encoding genes.
Protein–protein interactions can be analyzed with protein microarrays, which are similar to the microarrays used
for examining gene expression. With this technique, a large
number of different proteins are applied to a glass slide as a
series of spots, with each spot containing a different protein.
In one application, each spot is an antibody for a different
protein and is labeled with a tag that fluoresces when bound.
An extract of tissue is applied to the protein microarray. A
spot of fluorescence appears when a protein in the extract
binds to antibody, indicating the presence of that particular
protein in the tissue.

Concepts
The proteome is the complete set of proteins found in a cell.
Techniques of protein separation and mass spectrometry are used
to identify the proteins present within a cell. Microarrays are used
to determine sets of interacting proteins. Structural proteomics
attempts to determine the structure of all proteins.

14.32 Mass spectrometry is used to identify proteins.
(a) A protein is treated with the enzyme trypsin, which breaks it into short peptides.
(b) The peptides are analyzed with a mass spectrometer, which determines their
mass-to-charge ratio.
(c) A profile of peaks is produced (counts plotted against mass, m/z).
(d) A computer program compares the profile with those of known and predicted proteins.
A match identifies the protein (in the figure, GIVPPTKVWYRAITNDEKTSLAFR).
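
The "virtual digest" idea in Figure 14.32 can be sketched in a few lines. The example below is a simplified illustration, not the software the text refers to. It assumes the common rule that trypsin cleaves after lysine (K) or arginine (R) unless the next residue is proline (P), uses commonly tabulated monoisotopic residue masses, and ignores ion charge, so it reports neutral peptide masses rather than true m/z values. The function names are hypothetical.

```python
# Monoisotopic residue masses in daltons (commonly tabulated values;
# treated here as assumptions for the sketch).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056  # one water is added per free peptide


def trypsin_digest(protein):
    """Cut after K or R unless the following residue is P (simplified rule)."""
    peptides, start = [], 0
    for i, residue in enumerate(protein):
        next_residue = protein[i + 1] if i + 1 < len(protein) else ""
        if residue in "KR" and next_residue != "P":
            peptides.append(protein[start:i + 1])
            start = i + 1
    if start < len(protein):
        peptides.append(protein[start:])  # C-terminal peptide without K/R
    return peptides


def peptide_mass(peptide):
    """Neutral monoisotopic mass of a peptide (charge is ignored)."""
    return sum(RESIDUE_MASS[residue] for residue in peptide) + WATER


PROTEIN = "GIVPPTKVWYRAITNDEKTSLAFR"  # sequence shown in Figure 14.32
peptides = trypsin_digest(PROTEIN)

# The predicted "profile": peptide masses in ascending order.
profile = sorted((peptide_mass(p), p) for p in peptides)
for mass, peptide in profile:
    print(f"{peptide:8s} {mass:9.3f} Da")
```

A database search would repeat this digest for every protein predicted from the genome and look for the protein whose predicted masses best match the observed peaks.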

Concepts Summary
• Restriction endonucleases are enzymes that make double-stranded cuts in DNA at specific
base sequences.
• DNA fragments can be separated with the use of gel electrophoresis and visualized by
staining the gel with a dye that is specific for nucleic acids or by labeling the fragments
with a radioactive or chemical tag.
• In gene cloning, a gene or a DNA fragment is placed into a bacterial cell, where it will
be multiplied as the cell divides.
• Plasmids, small circular pieces of DNA, are often used as vectors to ensure that a cloned
gene is stable and replicated within the recipient cells. Expression vectors contain
sequences necessary for foreign DNA to be transcribed and translated.
• The polymerase chain reaction is a method for amplifying DNA enzymatically without cloning.
• Genes can be isolated by creating a DNA library, a set of bacterial colonies or viral
plaques that each contain a different cloned fragment of DNA. A genomic library contains the
entire genome of an organism; a cDNA library contains DNA fragments complementary to all the
different mRNAs in a cell.
• Positional cloning uses linkage relations to determine the location of genes without any
knowledge of their products.
• The Sanger (dideoxy) method of DNA sequencing uses special substrates for DNA synthesis
(dideoxynucleoside triphosphates, ddNTPs) that terminate synthesis after they are
incorporated into the newly made DNA.
• Short tandem repeats (STRs) and microsatellites are used to identify people by their DNA
sequences (DNA fingerprinting).
• Forward genetics begins with a phenotype and conducts analyses to locate the responsible
genes. Reverse genetics starts with a DNA sequence and conducts analyses to determine its
phenotypic effect.
• Transgenic animals, produced by injecting DNA into fertilized eggs, contain foreign DNA
that is integrated into a chromosome. Knockout mice are transgenic mice in which a normal
gene is disabled. Knock-in mice are transgenic mice in which a particular DNA sequence is
inserted into a known location.
• The mouse Mus musculus is an excellent model genetic organism because of its similarity to
humans, small size, and short generation time.
• RNA interference is used to silence the expression of specific genes.
• Techniques of molecular genetics are being used to create products of commercial
importance, to develop diagnostic tests, and to treat diseases.
• Genomics is the field of genetics that attempts to understand the content, organization,
and function of genetic information contained in whole genomes.
• Genetic maps position genes relative to other genes by determining rates of recombination
and are measured in percent recombination. Physical maps are based on the physical distances
between genes and are measured in base pairs.
• The Human Genome Project is an effort to determine the entire sequence of the human genome.
• Sequencing a whole genome requires breaking the genome into small overlapping fragments
whose DNA sequences can be determined in sequencing reactions. The individual sequences can
be ordered into a whole-genome sequence with the use of a map-based approach, in which
fragments are assembled in order by using previously created genetic and physical maps, or
with the use of a whole-genome shotgun approach, in which overlap between fragments is used
to assemble them into a whole-genome sequence.
• Single-nucleotide polymorphisms are single-base differences in DNA between individual
organisms and are valuable as markers in linkage studies.
• Bioinformatics is a synthesis of molecular biology and computer science that develops
tools to store, retrieve, and analyze DNA-, cDNA-, and protein-sequence data.
• Homologous genes are evolutionarily related. Gene function may be determined by looking
for homologous sequences whose function has been previously determined.
• A microarray consists of DNA fragments fixed in an orderly pattern to a solid support,
such as a nylon filter or glass slide. Microarrays can be used to monitor the expression of
thousands of genes simultaneously.
• Most prokaryotic species have between 1 million and 3 million base pairs of DNA and from
1000 to 2000 genes. Compared with that of eukaryotic genomes, the density of genes in
prokaryotic genomes is relatively uniform, with about one gene per 1000 bp.
• Eukaryotic genomes are larger and more variable in size than prokaryotic genomes. There is
no clear relation between organismal complexity and the amount of DNA or number of genes
among multicellular organisms.
• Proteomics determines the protein content of a cell and the functions of those proteins.
Proteins within a cell can be separated and identified with the use of mass spectrometry.

Important Terms
recombinant DNA technology (p. 348)
genetic engineering (p. 348)
biotechnology (p. 348)
restriction enzyme (p. 349)
restriction endonuclease (p. 349)
cohesive end (p. 349)
gel electrophoresis (p. 351)
autoradiography (p. 351)
probe (p. 352)
gene cloning (p. 352)
cloning vector (p. 352)
cosmid (p. 354)
bacterial artificial chromosome (BAC)
(p. 354)
expression vector (p. 354)
polymerase chain reaction (PCR)
(p. 355)
Taq polymerase (p. 356)
DNA library (p. 356)
genomic library (p. 356)

cDNA library (p. 356)
positional cloning (p. 358)
restriction fragment length
polymorphism (RFLP) (p. 358)
DNA sequencing (p. 359)
dideoxyribonucleoside triphosphate
(ddNTP) (p. 359)
DNA fingerprinting (p. 361)
microsatellite (p. 362)
short tandem repeat (STR) (p. 362)
forward genetics (p. 364)
reverse genetics (p. 364)
transgene (p. 364)
knockout mice (p. 365)
knock-in mice (p. 365)
gene therapy (p. 368)
genomics (p. 369)
structural genomics (p. 369)
genetic map (p. 369)

physical map (p. 369)
map-based sequencing (p. 371)
contig (p. 371)
whole-genome shotgun sequencing
(p. 373)
single-nucleotide polymorphism (SNP)
(p. 374)
haplotype (p. 374)
bioinformatics (p. 374)
functional genomics (p. 375)
transcriptome (p. 375)
proteome (p. 375)
homologous genes (p. 375)
microarray (p. 375)
comparative genomics (p. 376)
gene desert (p. 379)
proteomics (p. 381)
mass spectrometry (p. 381)
protein microarray (p. 381)

Answers to Concept Checks
1. First, the gene must be located and isolated from the rest of the
genomic DNA. Then, the gene must be inserted into bacteria in a
form that is stable and will be replicated. The gene must be placed
in the bacteria in a way that ensures that it will be transcribed and
translated. Finally, those bacteria that have taken up an active
form of the gene must be separated from other bacteria.
2. Restriction enzymes exist naturally in bacteria, which use
them to prevent the entry of viral DNA.
3. c
4. The gene and plasmid are cut with the same restriction
enzyme and mixed together. DNA ligase is used to seal nicks in
the sugar–phosphate bonds.

5. d
6. b
7. b
8. b
9. Species with larger genomes generally have more genes than
do species with smaller genomes, and so gene density is relatively
constant.

Worked Problems
1. A molecule of double-stranded DNA that is 5 million base
pairs long has a base composition that is 62% G + C. How many
times, on average, are the following restriction sites likely to be
present in this DNA molecule?
a. BamHI (recognition sequence is GGATCC)
b. HindIII (recognition sequence is AAGCTT)
c. HpaII (recognition sequence is CCGG)

• Solution
The percentages of G and C are equal in double-stranded DNA; so,
if G + C = 62%, then %G = %C = 62%/2 = 31%. The
percentage of A + T = 100% − (G + C) = 38%, and %A = %T =
38%/2 = 19%. To determine the probability of finding a particular
base sequence, we use the multiplication rule, multiplying together
the probability of finding each base at a particular site.
a. The probability of finding the sequence GGATCC = 0.31 ×
0.31 × 0.19 × 0.19 × 0.31 × 0.31 ≈ 0.0003334. To determine
the average number of recognition sequences in a 5-million-base-pair piece of DNA, we multiply 5,000,000 bp × 0.0003334 ≈ 1667
recognition sequences.
b. The number of AAGCTT recognition sequences is 0.19 ×
0.19 × 0.31 × 0.31 × 0.19 × 0.19 × 5,000,000 = 626
recognition sequences.
c. The number of CCGG recognition sequences is 0.31 ×
0.31 × 0.31 × 0.31 × 5,000,000 = 46,176 recognition
sequences.
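
The multiplication rule used in this solution is easy to check by computer. The following is a minimal sketch under the same assumptions as the worked problem (bases occur independently, and G = C and A = T in double-stranded DNA); the function name `expected_sites` is introduced here for illustration.

```python
def expected_sites(site, gc_fraction, length_bp):
    """Expected number of occurrences of a recognition sequence.

    Assumes bases are independent, with %G = %C and %A = %T.
    """
    g_or_c = gc_fraction / 2
    a_or_t = (1 - gc_fraction) / 2
    frequency = {"G": g_or_c, "C": g_or_c, "A": a_or_t, "T": a_or_t}

    probability = 1.0
    for base in site:
        probability *= frequency[base]  # multiplication rule
    return probability * length_bp


GENOME_LENGTH = 5_000_000
GC = 0.62

bamhi = expected_sites("GGATCC", GC, GENOME_LENGTH)
hindiii = expected_sites("AAGCTT", GC, GENOME_LENGTH)
hpaii = expected_sites("CCGG", GC, GENOME_LENGTH)

for name, count in [("BamHI", bamhi), ("HindIII", hindiii), ("HpaII", hpaii)]:
    print(f"{name:8s} {count:10.1f} expected sites")
```

Because the DNA is G + C rich, the G + C rich BamHI site is expected more often than the A + T rich HindIII site, even though both are 6 bp long.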

2. You are given the following DNA fragment to sequence:
5′-GCTTAGCATC-3′. You first clone the fragment in bacterial
cells to produce sufficient DNA for sequencing. You isolate the
DNA from the bacterial cells and carry out the dideoxy-sequencing
method. You then separate the products of the polymerization
reactions by gel electrophoresis. Draw the bands that should
appear on the gel from the four sequencing reactions.

• Solution
In the dideoxy-sequencing reaction, the original fragment is used
as a template for the synthesis of a new DNA strand; the sequence
of the new strand is the sequence that is actually determined. The
first task, therefore, is to write out the sequence of the newly
synthesized fragment, which will be complementary and
antiparallel to the original fragment. The sequence of the newly
synthesized strand, written 5′ → 3′, is 5′-GATGCTAAGC-3′.
Bands representing this sequence will appear on the gel, with the
bands representing nucleotides near the 5′ end of the molecule at
the bottom of the gel.
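
The reasoning in this solution can be expressed as a short sketch. It is an idealized illustration: fragment lengths are counted from the first newly added nucleotide, so the primer that a real reaction would include is ignored, and the function names are assumptions made here.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(sequence):
    """Return the complementary, antiparallel strand written 5' to 3'."""
    return "".join(COMPLEMENT[base] for base in reversed(sequence))


def sequencing_bands(template):
    """Return the new strand and, for each ddNTP lane, the fragment lengths.

    A fragment of length n ends in the ddNTP matching base n of the new
    strand. Shorter fragments run farther, so small numbers sit at the
    bottom of the gel.
    """
    new_strand = reverse_complement(template)
    lanes = {"ddATP": [], "ddTTP": [], "ddCTP": [], "ddGTP": []}
    for length, base in enumerate(new_strand, start=1):
        lanes["dd" + base + "TP"].append(length)
    return new_strand, lanes


new_strand, lanes = sequencing_bands("GCTTAGCATC")
print("new strand 5'-" + new_strand + "-3'")
for lane, lengths in lanes.items():
    print(lane, lengths)
```

Reading the bands from the shortest fragment to the longest across the four lanes spells out the new strand from its 5′ end.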

[Gel diagram for Problem 2: lanes for the reactions containing ddATP, ddTTP, ddCTP,
and ddGTP, with the origin marked.]

3. A linear piece of DNA that is 30 kb long is first cut with
BamHI, then with HpaII, and, finally, with both BamHI and
HpaII together. Fragments of the following sizes were obtained
from this reaction:
BamHI: 20-kb, 6-kb, and 4-kb fragments
HpaII: 21-kb and 9-kb fragments
BamHI and HpaII: 20-kb, 5-kb, 4-kb, and 1-kb fragments
Draw a restriction map of the 30-kb piece of DNA, indicating the
locations of the BamHI and HpaII restriction sites.

• Solution
This problem can be solved correctly through a variety of
approaches; this solution applies one possible approach.
When cut by BamHI alone, the linear piece of DNA is
cleaved into three fragments; so there must be two BamHI
restriction sites. When cut with HpaII alone, a clone of the same
piece of DNA is cleaved into only two fragments; so there is a
single HpaII site.
Let’s begin to determine the location of these sites by
examining the HpaII fragments. Notice that the 21-kb fragment
produced when the DNA is cut by HpaII is not present in the
fragments produced when the DNA is cut by BamHI and HpaII
together (the double digest); this result indicates that the 21-kb
HpaII fragment has within it a BamHI site. If we examine the
fragments produced by the double digest, we see that the 20-kb
and 1-kb fragments sum to 21 kb; so a BamHI site must be 20 kb
from one end of the fragment and 1 kb from the other end.

  20 kb | BamHI site | 1 kb

Similarly, we see that the 9-kb HpaII fragment does not appear
in the double digest and that the 5-kb and 4-kb fragments in the
double digest add up to 9 kb; so another BamHI site must be 5 kb
from one end of this fragment and 4 kb from the other end.

  5 kb | BamHI site | 4 kb

Now, let’s examine the fragments produced when the DNA is
cut by BamHI alone. The 20-kb and 4-kb fragments are also
present in the double digest; so neither of these fragments contains
an HpaII site. The 6-kb fragment, however, is not present in the
double digest, and the 5-kb and 1-kb fragments in the double
digest sum to 6 kb; so this fragment contains an HpaII site that is
5 kb from one end and 1 kb from the other end.

  5 kb | HpaII site | 1 kb

We have accounted for all the restriction sites, but we must
still determine the order of the sites on the original 30-kb
fragment. Notice that the 5-kb fragment must be adjacent to both
the 1-kb and the 4-kb fragments; so it must be in between these
two fragments.

  1 kb | HpaII site | 5 kb | BamHI site | 4 kb

We have also established that the 1-kb and 20-kb fragments
are adjacent; because the 5-kb fragment is on one side, the 20-kb
fragment must be on the other, completing the restriction map:

  20 kb | BamHI site | 1 kb | HpaII site | 5 kb | BamHI site | 4 kb
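
A proposed restriction map can be verified by digesting it "in silico" and comparing the predicted fragments with the observed ones. The sketch below assumes the completed map from this solution, with site positions measured in kb from the 20-kb end (BamHI at 20 and 26, HpaII at 21); the function name `digest` is hypothetical.

```python
def digest(length_kb, cut_sites):
    """Return fragment sizes (largest first) after cutting a linear molecule."""
    positions = [0] + sorted(cut_sites) + [length_kb]
    fragments = [right - left for left, right in zip(positions, positions[1:])]
    return sorted(fragments, reverse=True)


LENGTH_KB = 30
BAMHI_SITES = [20, 26]  # positions implied by the completed map
HPAII_SITES = [21]

bamhi_only = digest(LENGTH_KB, BAMHI_SITES)
hpaii_only = digest(LENGTH_KB, HPAII_SITES)
double_digest = digest(LENGTH_KB, BAMHI_SITES + HPAII_SITES)

print("BamHI:        ", bamhi_only)
print("HpaII:        ", hpaii_only)
print("BamHI + HpaII:", double_digest)
```

All three predicted fragment sets agree with the sizes given in the problem, which confirms that the map is consistent with the data.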

Comprehension Questions
Section 14.1
1. What role do restriction enzymes play in bacteria? How do
bacteria protect their own DNA from the action of
restriction enzymes?
*2. Explain how gel electrophoresis is used to separate DNA
fragments of different lengths.
*3. Give three important characteristics of cloning vectors.
*4. Briefly explain how an antibiotic-resistance gene and the
lacZ gene can be used as markers to determine which cells
contain a particular plasmid.
*5. Briefly explain how the polymerase chain reaction is used to
amplify a specific DNA sequence.

Section 14.2
*6. How does a genomic library differ from a cDNA library?
How is a genomic library created?
7. Briefly explain how a gene can be isolated through positional cloning.

Section 14.3
8. What is the purpose of the dideoxynucleoside triphosphate
in the dideoxy sequencing reaction?
*9. What is DNA fingerprinting? What types of sequences are
examined in DNA fingerprinting?

Section 14.4
10. How does a reverse genetics approach differ from a forward
genetics approach?
*11. What are knockout mice and for what are they used?
12. How is RNA interference used in the analysis of gene
function?

Section 14.5
13. What is gene therapy?

Section 14.6
*14. What is the difference between a genetic map and a physical
map? Which generally has higher resolution and accuracy
and why?
*15. What is the difference between a map-based approach to
sequencing a whole genome and a whole-genome shotgun
approach?
16. What is a single-nucleotide polymorphism (SNP)? How are
SNPs used in genomic studies?
17. What is a haplotype?

Section 14.7
18. What is a microarray? How can it be used to obtain
information about gene function?

Section 14.8
19. What is the relation between genome size and gene number
in prokaryotes?
20. DNA content varies considerably among different multicellular organisms. Is this variation closely related to the number of genes and the complexity of the organism? If not,
what accounts for the differences?
21. What is a gene desert?
22. How does proteomics differ from genomics?
23. How is mass spectrometry used to identify proteins in a
cell?

Application Questions and Problems
Section 14.1
24. How often, on average, would you expect a restriction
endonuclease to cut a DNA molecule if the recognition
sequence for the enzyme had 5 bp? (Assume that the four
types of bases are equally likely to be found in the DNA and
that the bases in a recognition sequence are independent.)
How often would the endonuclease cut the DNA if the
recognition sequence had 8 bp?
*25. A microbiologist discovers a new restriction endonuclease.
When DNA is digested by this enzyme, fragments that
average 1,048,500 bp in length are produced. What is the
most likely number of base pairs in the recognition
sequence of this enzyme?
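
The reasoning behind Problems 24 and 25 can be sketched in a few lines of Python. This is a minimal illustration under the assumptions stated in Problem 24 (four equally likely, independent bases); the function names are hypothetical and the example values are deliberately not the ones asked about in the problems.

```python
import math

def expected_spacing(site_length):
    """Average distance (bp) between cuts for a recognition site of the given length.

    Assumes the four bases are equally frequent and independent, so a given
    site of n bp occurs at any position with probability (1/4)**n.
    """
    return 4 ** site_length

def site_length_for_spacing(average_fragment_bp):
    """Estimate the recognition-site length from an observed average fragment size."""
    return round(math.log(average_fragment_bp, 4))

# Illustrative values only
print(expected_spacing(4))            # 4-bp site
print(expected_spacing(6))            # 6-bp site
print(site_length_for_spacing(4096))  # inverse calculation
```

Real genomes depart from these assumptions (base composition is rarely uniform), so observed fragment sizes only approximate the prediction.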

*26. Restriction mapping of a linear piece of DNA reveals the
following EcoRI restriction sites.

[Restriction map: a linear DNA molecule with EcoRI site 1 and EcoRI site 2; the segments are labeled 2 kb, 4 kb, and 5 kb.]

a. This piece of DNA is cut by EcoRI, the resulting
fragments are separated by gel electrophoresis, and the
gel is stained with ethidium bromide. Draw a picture of
the bands that will appear on the gel.
b. If a mutation that alters EcoRI site 1 occurs in this piece
of DNA, how will the banding pattern on the gel differ
from the one that you drew in part a?

c. If mutations that alter EcoRI sites 1 and 2 occur in this
piece of DNA, how will the banding pattern on the gel
differ from the one that you drew in part a?
d. If 1000 bp of DNA were inserted between the two
restriction sites, how would the banding pattern
on the gel differ from the one that you drew in
part a?
e. If 500 bp of DNA between the two restriction sites
were deleted, how would the banding pattern
on the gel differ from the one that you drew in
part a?
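
The kind of reasoning that Problem 26 calls for can be sketched as follows. This is a minimal Python illustration using a hypothetical 10-kb linear molecule (not the map in the problem); the function names are assumptions made for the example.

```python
def digest_linear(total_length_kb, cut_sites_kb):
    """Return the fragment sizes produced by cutting a linear molecule.

    cut_sites_kb holds the positions of the restriction sites, measured
    from one end of the molecule.
    """
    positions = [0] + sorted(cut_sites_kb) + [total_length_kb]
    return sorted(end - start for start, end in zip(positions, positions[1:]))

def gel_bands(fragments_kb):
    """Fragments of equal size migrate together, so they appear as one band."""
    return sorted(set(fragments_kb))

# Hypothetical molecule: 10 kb long, with sites 3 kb and 7 kb from the left end
both_sites = digest_linear(10, [3, 7])
print(both_sites, gel_bands(both_sites))

# The same molecule after a mutation that destroys the site at 3 kb
one_site = digest_linear(10, [7])
print(one_site, gel_bands(one_site))
```

An insertion or deletion between two sites can be modeled the same way, by changing the total length and shifting every site that lies beyond the altered region.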
*27. Which vectors (plasmid, phage λ, cosmid, bacterial artificial
chromosome) can be used to clone a continuous fragment
of DNA with the following lengths?
a. 4 kb
b. 20 kb
c. 35 kb
d. 100 kb
28. A geneticist uses a plasmid for cloning that has the lacZ
gene and a gene that confers resistance to penicillin. The
geneticist inserts a piece of foreign DNA into a restriction
site that is located within the lacZ gene and uses the
plasmid to transform bacteria. Explain how the geneticist
can identify bacteria that contain a copy of a plasmid with
the foreign DNA.

Section 14.2
*29. Suppose that you have just graduated from college and have
started working at a biotechnology firm. Your first job
assignment is to clone the pig gene for the hormone
prolactin. Assume that the pig gene for prolactin has not yet
been isolated, sequenced, or mapped; however, the mouse
gene for prolactin has been cloned and the amino acid
sequence of mouse prolactin is known. Briefly explain two
different strategies that you might use to find and clone the
pig gene for prolactin.

Section 14.3
30. Suppose that you want to sequence the following DNA
fragment:
5′–TCCCGGGAAA–primer site–3′
You first clone the fragment in bacterial cells to produce
sufficient DNA for sequencing. You isolate the DNA
from the bacterial cells and apply the dideoxy sequencing
method. You then separate the products of the
polymerization reactions by gel electrophoresis. Draw
the bands that should appear on the gel from the four
sequencing reactions.

[Gel template: four lanes, one for each reaction containing ddATP, ddTTP, ddCTP, or ddGTP, with the origin marked.]

*31. Suppose that you are given a short fragment of DNA to
sequence. You clone the fragment, isolate the cloned DNA
fragment, and set up a series of four dideoxy reactions. You
then separate the products of the reactions by gel
electrophoresis and obtain the following banding pattern:

[Gel: four lanes, one for each reaction containing ddATP, ddTTP, ddCTP, or ddGTP, with the origin marked; the bands are not reproduced here.]

Write out the base sequence of the original fragment that
you were given.
Original sequence: 5′– __________________ –3′
32. [Data analysis] The adjoining autoradiograph is from the original study
that first sequenced the cystic fibrosis gene (J. R. Riordan et
al. 1989. Science 245:1066–1073). From the autoradiograph,
determine the sequence of the normal copy of the gene and
the sequence of the mutated copy of the gene. Identify the
location of the mutation that causes cystic fibrosis (CF).

[Autoradiograph: two sets of lanes labeled C, T, A, G, one set for DNA from a healthy person and one for DNA from a person with CF; the bands are not reproduced here.]
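
The relation between a template and the bands in the four dideoxy lanes (Problems 30 and 31) can be sketched in Python. This is a minimal illustration that assumes the primer anneals at the 3′ end of the template, as in Problem 30, and counts fragment length as the number of nucleotides added to the primer. The template "GATTACA" and the function names are hypothetical.

```python
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}

def synthesized_strand(template_5to3):
    """Return the newly made strand, written 5'->3' (the reverse complement).

    Synthesis starts at the primer on the template's 3' end and proceeds
    toward the template's 5' end.
    """
    return "".join(COMPLEMENT[base] for base in reversed(template_5to3))

def ddntp_lanes(template_5to3):
    """For each ddNTP reaction, list the fragment lengths that end in that base."""
    lanes = {"A": [], "T": [], "C": [], "G": []}
    for length, base in enumerate(synthesized_strand(template_5to3), start=1):
        lanes[base].append(length)
    return lanes

def read_gel(lanes):
    """Read the new strand 5'->3' from the shortest fragment to the longest.

    The shortest fragments migrate farthest from the origin.
    """
    bands = sorted((length, base) for base, lengths in lanes.items() for length in lengths)
    return "".join(base for _, base in bands)

lanes = ddntp_lanes("GATTACA")
print(lanes)
print(read_gel(lanes))                      # sequence of the newly made strand
print(synthesized_strand(read_gel(lanes)))  # back to the original template
```

The sequence read from the gel is that of the newly synthesized strand, so the original template is recovered by taking its reverse complement.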

Section 14.4
*33. You have discovered a gene in mice that is similar to a gene
in yeast. How might you determine whether this gene is
essential for development in mice?

Section 14.6
*34. A piece of DNA that is 14 kb long is cut first by EcoRI alone,
then by SmaI alone, and, finally, by both EcoRI and SmaI
together. The following results are obtained:

Digestion by     Digestion by     Digestion by both
EcoRI alone      SmaI alone       EcoRI and SmaI
3-kb fragment    7-kb fragment    2-kb fragment
5-kb fragment    7-kb fragment    3-kb fragment
6-kb fragment                     4-kb fragment
                                  5-kb fragment

Draw a map of the EcoRI and SmaI restriction sites on this
14-kb piece of DNA, indicating the relative positions of the
restriction sites and the distances between them.

Section 14.7
35. Microarrays can be used to determine the levels of gene
expression. In one type of microarray, hybridization of
the red (experimental) and green (control) cDNAs is
proportional to the relative amounts of mRNA in the
samples. Red indicates the overexpression of a gene and
green indicates the underexpression of a gene in the
experimental cells relative to the control cells, yellow
indicates equal expression in experimental and control
cells, and no color indicates no expression in either
experimental or control cells.
In one experiment, mRNA from a strain of antibiotic-resistant
bacteria (experimental cells) is converted into
cDNA and labeled with red fluorescent nucleotides; mRNA
from a nonresistant strain of the same bacteria (control
cells) is converted into cDNA and labeled with green
fluorescent nucleotides. The cDNAs from the resistant and
nonresistant cells are mixed and hybridized to a chip
containing spots of DNA from genes 1
through 25. The results are shown in
the adjoining illustration. What
conclusions can you make about which
genes might be implicated in antibiotic
resistance in these bacteria? How might
this information be used to design new
antibiotics that are less vulnerable to
resistance?

[Illustration: microarray with 25 spots numbered 1 through 25, arranged in five rows of five; the spot colors are not reproduced here.]

Section 14.8
36. [Data analysis] Dictyostelium discoideum is a soil-dwelling, social amoeba:
much of the time, the organism consists of single, solitary
cells, but, during times of starvation, individual amoebae
come together to form aggregates that have many
characteristics of multicellular organisms. Biologists have
long debated whether D. discoideum is a unicellular or
multicellular organism. In 2005, the genome of D.
discoideum was completely sequenced. The table
below lists some genomic characteristics of
D. discoideum and other eukaryotes (L. Eichinger et al.
2005. Nature 435:43–57).
a. On the basis of the organisms listed in the table other
than D. discoideum, what are some differences in
genome characteristics between unicellular and
multicellular organisms?
b. On the basis of these data, do you think the genome of
D. discoideum is more like those of other unicellular
eukaryotes or more like those of multicellular eukaryotes? Explain your answer.
37. What are some of the major differences in the ways in
which genetic information is organized in the genomes of
prokaryotes versus eukaryotes?

Table for Problem 36

Feature                    Dictyostelium  Plasmodium        Saccharomyces  Arabidopsis  Drosophila    Caenorhabditis  Homo
                           discoideum     falciparum        cerevisiae     thaliana     melanogaster  elegans         sapiens
Organism                   amoeba         malaria parasite  yeast          plant        fruit fly     worm            human
Cellularity                ?              uni               uni            multi        multi         multi           multi
Genome size (millions bp)  34             23                13             125          180           103             2,851
Number of genes            12,500         5,268             5,538          25,498       13,676        19,893          22,287
Average gene length (bp)   1,756          2,534             1,428          2,036        1,997         2,991           27,000
Genes with introns (%)     69             54                5              79           38            5               85
Mean number of introns*    1.9            2.6               1.0            5.4          4.0           5.0             8.1
Mean intron size (bp)      146            179               nd*            170          nd*           270             3,365

*nd = not determined
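
One way to begin comparing the genomes in the table is to compute gene density, the number of genes per million base pairs. The Python sketch below is only an illustration: gene density is not a row in the table, the variable names are hypothetical, and the input values are copied from the "Genome size" and "Number of genes" rows.

```python
# (genome size in millions of bp, number of genes), copied from the table
genomes = {
    "Dictyostelium discoideum": (34, 12_500),
    "Plasmodium falciparum": (23, 5_268),
    "Saccharomyces cerevisiae": (13, 5_538),
    "Arabidopsis thaliana": (125, 25_498),
    "Drosophila melanogaster": (180, 13_676),
    "Caenorhabditis elegans": (103, 19_893),
    "Homo sapiens": (2_851, 22_287),
}

def gene_density(size_mbp, gene_count):
    """Genes per million base pairs of genome."""
    return gene_count / size_mbp

density = {name: gene_density(size, genes) for name, (size, genes) in genomes.items()}

# List the organisms from most to least gene-dense
for name, value in sorted(density.items(), key=lambda item: item[1], reverse=True):
    print(f"{name:26s} {value:7.1f} genes per million bp")
```

The same pattern can be applied to the other rows, for example by comparing mean intron number or mean intron size between the unicellular and multicellular organisms.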