Evolution of regulatory interactions in bacteria

Comparative genomics and
evolution of regulatory
interactions in bacteria
Mikhail Gelfand
Research and Training Center of Bioinformatics
Institute for Information Transmission Problems
Russian Academy of Sciences
September 2006
Это – ряд наблюдений. В углу – тепло.
Взгляд оставляет на вещи след.
Вода представляет собой стекло.
Человек страшней, чем его скелет.
Иосиф Бродский
A list of some observations. In a corner, it’s warm.
A glance leaves an imprint on anything it’s dwelt on.
Water is glass’s most public form.
Man is more frightening than his skeleton.
Joseph Brodsky
Basic assumptions and techniques
• Phylogenetic footprinting (Ross Hardison, eukaryotes, 1988):
regulatory (transcription factor-binding) sites are more
conserved than surrounding non-coding regions
=> TF-binding sites are seen as conserved islands in
multiple alignments of gene upstream regions.
• Works for close genomes (e.g. E.coli – Salmonella, sometimes
Yersinia), where upstream regions are alignable.
• Ignores site turnover
• Consistency filtering (Gelfand and Mironov, 1999, bacteria):
regulatory systems are biologically reasonable
=> regulons are conserved (more or less)
=> true sites occur upstream of orthologous genes
(false sites are scattered at random)
• need to take care of the operon structure
• assumes conservation of TF-binding motif in DNA
• ignores evolution of regulatory systems
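The consistency filter described above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the function name, the `min_genomes` threshold, and the toy gene and genome names are all assumptions, and operon structure (which the slide says must be handled) is ignored here.

```python
from collections import defaultdict

def consistency_filter(candidate_sites, orthologs, min_genomes=2):
    """Keep candidate sites found upstream of orthologous genes in several genomes.

    candidate_sites: list of (genome, gene) pairs with a predicted upstream site
    orthologs: dict mapping gene -> ortholog group identifier
    min_genomes: hypothetical threshold; the slide does not give one
    """
    genomes_per_group = defaultdict(set)
    for genome, gene in candidate_sites:
        group = orthologs.get(gene)
        if group is not None:
            genomes_per_group[group].add(genome)
    # A site is kept only if its ortholog group has sites in enough genomes.
    return [
        (genome, gene)
        for genome, gene in candidate_sites
        if len(genomes_per_group.get(orthologs.get(gene), ())) >= min_genomes
    ]

# Toy data (hypothetical names): one conserved site pair and one isolated hit.
sites = [("Eco", "nrdA_eco"), ("Sen", "nrdA_sen"), ("Eco", "yzzX_eco")]
groups = {"nrdA_eco": "nrdA", "nrdA_sen": "nrdA", "yzzX_eco": "yzzX"}
kept = consistency_filter(sites, groups)
```

The isolated hit upstream of `yzzX_eco` is dropped because no ortholog in another genome carries a site, which is the "false sites are scattered at random" argument in miniature.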
Conserved motif upstream of nrd genes
Identification of the candidate regulator
by the analysis of phyletic patterns
• COG1327: the only COG with exactly the
same phylogenetic pattern as the motif
– “large scale” on the level of major taxa
– “small scale” within major taxa:
• absent in small parasites among alpha- and gammaproteobacteria
• absent in Desulfovibrio spp. among delta-proteobacteria
• absent in Nostoc sp. among cyanobacteria
• absent in Oenococcus and Leuconostoc among Firmicutes
• present only in Treponema denticola among four spirochetes
COG1327 “Predicted transcriptional regulator,
consists of a Zn-ribbon and ATP-cone domains”:
regulator of the riboflavin pathway?
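The phyletic-pattern search above amounts to comparing presence/absence sets. The sketch below assumes each COG and the motif are described by the set of genomes in which they occur; the genome codes and the two decoy COG names are invented for illustration.

```python
def matching_cogs(motif_genomes, cog_genomes):
    """Return the COGs whose set of genomes is identical to the motif's."""
    target = set(motif_genomes)
    return sorted(cog for cog, genomes in cog_genomes.items() if set(genomes) == target)

# Toy data: genome codes and decoy COG names are hypothetical.
motif_genomes = {"Eco", "Bsu", "Tde"}
cog_genomes = {
    "COG1327": {"Eco", "Bsu", "Tde"},
    "COG_decoy1": {"Eco", "Bsu", "Tde", "Bbu"},  # present in one extra genome
    "COG_decoy2": {"Eco"},                        # missing from two genomes
}
candidates = matching_cogs(motif_genomes, cog_genomes)
```

Requiring exact set equality mirrors the slide's "exactly the same phylogenetic pattern"; with incomplete genomes a tolerance for a few mismatches might be preferable.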
Additional evidence – 1
• nrdR is sometimes clustered with nrd genes or with replication genes dnaB, dnaI, polA
Additional evidence – 2
• In some genomes, candidate NrdR-binding sites are found upstream of other replication-related genes
– dNTP salvage
– topoisomerase I, replication initiator dnaA, chromosome partitioning, DNA helicase II
Multiple sites (nrd genes): FNR, DnaA, NrdR
Mode of regulation
• Repressor (overlaps with promoters)
• Co-operative binding:
– most sites occur in tandem (> 90% cases)
– the distance between the copies (centers of
palindromes) equals an integer number of DNA turns:
• mainly (94%) 30-33 bp, in 84% 31-32 bp – 3 turns
• 21 bp (2 turns) in Vibrio spp.
• 41-42 bp (4 turns) in some Firmicutes
• experimental confirmation in Streptomyces
(Borovok et al. 2004, Grinberg et al. 2006) and
in E. coli (Grinberg et al. 2006)
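The claim that the spacing equals an integer number of DNA turns can be checked with simple arithmetic. The sketch below assumes a helical repeat of 10.5 bp per turn for B-DNA; that value is not given on the slide.

```python
BP_PER_TURN = 10.5  # assumed helical repeat of B-DNA, not stated on the slide

def helical_turns(distance_bp, bp_per_turn=BP_PER_TURN):
    """Return (nearest whole number of turns, deviation from it in turns)."""
    turns = distance_bp / bp_per_turn
    nearest = round(turns)
    return nearest, abs(turns - nearest)

# Distances between palindrome centers quoted on the slide.
for distance in (21, 31, 32, 41, 42):
    nearest, deviation = helical_turns(distance)
    print(f"{distance} bp: about {nearest} turns (off by {deviation:.2f} turn)")
```

Under this assumption, 21, 31-32, and 41-42 bp all fall within a tenth of a turn of 2, 3, and 4 turns, so both sites face the same side of the helix.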
Evolutionary processes that
shape regulatory systems
• Expansion and contraction of regulons
(birth or death of sites)
• Duplications of regulators
(with or without regulated loci)
• Loss of regulators
(with or without regulated loci)
• Re-assortment of regulators and structural genes
• … especially in complex systems
• Change of regulator specificity
• Horizontal transfer
Birth and death of sites is a very dynamic
process (even in bacteria)
NadR-binding sites upstream of pncB seem absent in
Klebsiella pneumoniae and Serratia marcescens
… but there are candidate sites further upstream …
… and they are clearly different (not simply misaligned).
Loss of regulators and cryptic sites
Loss of RbsR in Y. pestis
(ABC-transporter also is lost)
RbsR binding site
Start codon of rbsD
Regulon expansion:
how FruR has become CRA
[Figure: the same map of sugar catabolism genes shown at three stages of regulon expansion]
Substrates: mannose, glucose, mannitol, fructose
Genes: manXYZ, ptsHI-crr, edd, epd, eda, adhE, aceEF, mtlA, gapA, fbp, pykF, mtlD, fruBA, fruK, pfkA, pgk, gpmA, icdA, ppsA, pckA, aceA, tpiA, aceB
Panel 1: Gamma-proteobacteria; common ancestor of Enterobacteriales
Panel 2: Gamma-proteobacteria; Enterobacteriales; common ancestor of Escherichia and Salmonella
Panel 3: Gamma-proteobacteria; Enterobacteriales; E. coli and Salmonella spp.
Trehalose/maltose catabolism, alpha-proteobacteria
Duplicated LacI-family
regulators: lineage-specific
post-duplication loss
The binding motifs are very similar (the blue branch is
somewhat different: to avoid cross-recognition?)
Utilization of maltose/maltodextrin, Firmicutes
Displacement: invasion of a regulator from a
different subfamily (horizontal transfer from a
related species?) – “blue” sites
Orthologous TFs with
completely different regulons
(alpha-proteobacteria and
Xanthomonadales)
Utilization of an unknown galactoside in
gamma-proteobacteria
Yersinia and Klebsiella: two regulons, GalR (not shown; includes genes galK and galT) and LacI-X
Erwinia: one regulon, GalR
Loss of regulator and merger of regulons:
It seems that lacI-X was present in the common ancestor (Klebsiella is an outgroup)
Catabolism of gluconate, proteobacteria
extreme variability of regulation of “marginal” regulon members
β
Pseudomonas spp.
γ
Combined regulatory network for iron homeostasis genes in α-proteobacteria
[Figure: network diagram. Regulators: RirA, Irr, Fur, IscR. Conditions and signals: [-Fe], [+Fe], FeS, heme, "degraded", FeS status of cell. Regulated modules: siderophore uptake, Fe2+/Fe3+ uptake (iron uptake systems), iron storage ferritins, FeS synthesis, heme synthesis, iron-requiring enzymes [iron cofactor], transcription factors.]
The connecting lines denote regulatory interactions, with line thickness reflecting the frequency of the interaction in the analyzed genomes. The suggested negative or positive mode of operation is shown by a dead end or an arrow end of the line, respectively.
Fe and Mn regulons
Distribution of Irr, Fur/Mur, MntR, RirA, and IscR regulons in α-proteobacteria
[Table: one row per genome with the organism, its abbreviation, and the presence (+) or absence (-) of each regulon in the columns Irr, Mur/Fur, MntR, RirA, IscR. Genomes are arranged in groups A, B, C, and D.]
Taxonomic labels: Rhizobiales, Rhizobiaceae, Bartonella, Bradyrhizobiaceae, Rhodobacterales, Rhodobacteraceae, Hyphomonadaceae, Caulobacterales, Parvularculales, Sphingomonadales, Rhodospirillales, SAR11 cluster, Rickettsiales
Genomes (abbreviation):
Sinorhizobium meliloti (SM)
Rhizobium leguminosarum (RL)
Rhizobium etli (RHE)
Agrobacterium tumefaciens (AGR)
Mesorhizobium loti (ML)
Mesorhizobium sp. BNC1 (MBNC)
Brucella melitensis (BME)
Bartonella quintana and spp. (BQ)
Bradyrhizobium japonicum (BJ)
Rhodopseudomonas palustris (RPA)
Nitrobacter hamburgensis (Nham)
Nitrobacter winogradskyi (Nwi)
Rhodobacter capsulatus (RC)
Rhodobacter sphaeroides (Rsph)
Silicibacter sp. TM1040 (STM)
Silicibacter pomeroyi (SPO)
Jannaschia sp. CC51 (Jann)
Rhodobacterales bacterium HTCC2654 (RB2654)
Roseobacter sp. MED193 (MED193)
Roseovarius nubinhibens ISM (ISM)
Roseovarius sp. 217 (ROS217)
Loktanella vestfoldensis SKA53 (SKA53)
Sulfitobacter sp. EE-36 (EE36)
Oceanicola batsensis HTCC2597 (OB2597)
Oceanicaulis alexandrii HTCC2633 (OA2633)
Caulobacter crescentus (CC)
Parvularcula bermudensis HTCC2503 (PB2503)
Erythrobacter litoralis (ELI)
Novosphingobium aromaticivorans (Saro)
Sphingopyxis alaskensis RB2256 (Sala)
Zymomonas mobilis (ZM)
Gluconobacter oxydans (GOX)
Rhodospirillum rubrum (Rrub)
Magnetospirillum magneticum (Amb)
Pelagibacter ubique HTCC1002 (PU1002)
Rickettsia and Ehrlichia species
'#?' in the RirA column denotes the absence of the rirA gene in an unfinished genomic sequence and the presence of candidate RirA-binding sites upstream of the iron uptake genes.
Fe and Mn regulons (annotated)
Distribution of Irr, Fur/Mur, MntR, RirA, and IscR regulons in α-proteobacteria
[Table repeated from the preceding slides with added annotations.]
Annotation: Not RirA. IscR?
'#?' in the RirA column denotes the absence of the rirA gene in an unfinished genomic sequence and the presence of candidate RirA-binding sites upstream of the iron uptake genes.
UPDATE: the genomes are finished; still no rirA gene.
Distribution of the conserved members of the Fe- and Mn-responsive regulons and the predicted RirA, Fur/Mur, Irr, and DtxR binding sites in α-proteobacteria
Gene functions: iron uptake, iron storage, FeS synthesis, iron usage, heme biosynthesis, regulatory genes, manganese uptake
Phylogenetic tree of the Fur family of transcription factors in α-proteobacteria - I
[Figure: phylogenetic tree with sequence logos.]
Outgroups:
Fur in γ- and β-proteobacteria: Escherichia coli (ECOLI, sp|P0A9A9), Pseudomonas aeruginosa (PSEAE, sp|Q03456), Neisseria meningitidis (NEIMA, sp|P0A0S7)
Fur in ε-proteobacteria: Helicobacter pylori (HELPY, sp|O25671)
Fur in Firmicutes: Bacillus subtilis (BACSU, sp|P54574)
Clade labels:
Mur in α-proteobacteria: regulator of manganese uptake genes (sit, mntH)
Fur in α-proteobacteria: regulator of iron uptake and metabolism genes
Irr in α-proteobacteria
Leaves (locus tag: organism):
SM mur: Sinorhizobium meliloti
MBNC03003179: Mesorhizobium sp. BNC1 (I)
BQ fur2: Bartonella quintana
BMEI0375: Brucella melitensis
EE36 12413: Sulfitobacter sp. EE-36
MBNC03003593: Mesorhizobium sp. BNC1 (II)
RB2654 19538: Rhodobacterales bacterium HTCC2654
AGR C 620: Agrobacterium tumefaciens
RHE_CH00378: Rhizobium etli
RL mur: Rhizobium leguminosarum
Nham 0990: Nitrobacter hamburgensis X14
Nwi 0013: Nitrobacter winogradskyi
RPA0450: Rhodopseudomonas palustris
BJ fur: Bradyrhizobium japonicum
ROS217 18337: Roseovarius sp. 217
Jann 1799: Jannaschia sp. CC51
SPO2477: Silicibacter pomeroyi
STM1w01000993: Silicibacter sp. TM1040
MED193 22541: Roseobacter sp. MED193
OB2597 02997: Oceanicola batsensis HTCC2597
SKA53 03101: Loktanella vestfoldensis SKA53
Rsph03000505: Rhodobacter sphaeroides
ISM 15430: Roseovarius nubinhibens ISM
PU1002 04436: Pelagibacter ubique HTCC1002
GOX0771: Gluconobacter oxydans
ZM01411: Zymomonas mobilis
Saro02001148: Novosphingobium aromaticivorans
Sala 1452: Sphingopyxis alaskensis RB2256
ELI1325: Erythrobacter litoralis
OA2633 10204: Oceanicaulis alexandrii HTCC2633
PB2503 04877: Parvularcula bermudensis HTCC2503
CC0057: Caulobacter crescentus
Rrub02001143: Rhodospirillum rubrum
Amb1009: Magnetospirillum magneticum (I)
Amb4460: Magnetospirillum magneticum (II)
Species listed beside the tree: Erythrobacter litoralis, Caulobacter crescentus, Zymomonas mobilis, Novosphingobium aromaticivorans, Oceanicaulis alexandrii, Sphingopyxis alaskensis, Gluconobacter oxydans, Rhodospirillum rubrum, Parvularcula bermudensis, Magnetospirillum magneticum
Sequence logos:
Identified Mur-binding sites: the A, B, and C groups of α-proteobacteria
Identified Fur-binding sites in the "other" group of α-proteobacteria
Known Fur-binding sites in Escherichia coli and Bacillus subtilis
Phylogenetic tree of the Fur family of transcription factors in α-proteobacteria - II
[Figure: phylogenetic tree.]
Outgroups:
Fur in γ- and β-proteobacteria: Escherichia coli (ECOLI, sp|P0A9A9), Pseudomonas aeruginosa (PSEAE, sp|Q03456), Neisseria meningitidis (NEIMA, sp|P0A0S7)
Fur in ε-proteobacteria: Helicobacter pylori (HELPY, sp|O25671)
Fur in Firmicutes: Bacillus subtilis (BACSU, sp|P54574)
Clade labels:
Mur / Fur in α-proteobacteria
Irr in α-proteobacteria: regulator of iron homeostasis
Leaves (locus tag: organism):
AGR C 249: Agrobacterium tumefaciens
SM irr: Sinorhizobium meliloti
RHE CH00106: Rhizobium etli
RL irr1: Rhizobium leguminosarum (I)
RL irr2: Rhizobium leguminosarum (II)
MLr5570: Mesorhizobium loti
MBNC03003186: Mesorhizobium sp. BNC1
BQ fur1: Bartonella quintana
BMEI1955: Brucella melitensis (I)
BMEI1563: Brucella melitensis (II)
BJ blr1216: Bradyrhizobium japonicum (II)
RB2654 182: Rhodobacterales bacterium HTCC2654
SKA53 01126: Loktanella vestfoldensis SKA53
ROS217 15500: Roseovarius sp. 217
ISM 00785: Roseovarius nubinhibens ISM
OB2597 14726: Oceanicola batsensis HTCC2597
Jann 1652: Jannaschia sp. CC51
Rsph03001693: Rhodobacter sphaeroides
EE36 03493: Sulfitobacter sp. EE-36
STM1w01001534: Silicibacter sp. TM1040
MED193 17849: Roseobacter sp. MED193
SPOA0445: Silicibacter pomeroyi
RC irr: Rhodobacter capsulatus
RPA2339: Rhodopseudomonas palustris (I)
RPA0424*: Rhodopseudomonas palustris (II)
BJ irr*: Bradyrhizobium japonicum (I)
Nwi 0035*: Nitrobacter winogradskyi
Nham 1013*: Nitrobacter hamburgensis X14
PU1002 04361: Pelagibacter ubique HTCC1002
Sequence logos for the identified Irr-binding sites in α-proteobacteria.
The A group (8 species) - Irr
The B group (4 species) - Irr
The C group (12 species) - Irr
Phylogenetic tree of the Rrf2 family of transcription factors in α-proteobacteria
[Figure: phylogenetic tree.]
Clades and their characterized members:
NsrR: nitrite/NO-sensing regulator (Nitrosomonas europaea, Escherichia coli)
RirA: iron repressor (Rhizobium leguminosarum)
CymR: cysteine metabolism repressor (Bacillus subtilis)
Rrf2: cytochrome complex regulator (Desulfovibrio vulgaris)
IscR: iron-sulfur cluster synthesis repressor (Escherichia coli)
IscR-II
Positional clustering of rrf2-like genes with:
iron uptake and storage genes;
Fe-S cluster synthesis operons;
genes involved in nitrosative stress protection;
sulfate uptake/assimilation genes;
thioredoxin reductase;
carboxymuconolactone decarboxylase-family genes;
hmc cytochrome operon
Legend: proteins with the conserved C-X(6-9)-C(4-6)-C motif within the effector-responsive domain; proteins without a cysteine triad motif
Leaf labels: ROS217_15206, Rsph03001477, RC NsrR, GOX0860, Amb1318, Nwi_0743, SPOA0186, Ricket., Sala_1049, Saro02000305, NE NsrR, OB2597_05195, ROS217_02155, ROS217_14291, SMc00785, RHE CH00735, AGR_C_344, AGR_L_1131, SPO3722, RHE_CH02777, RL_3336, SPO1393, MBNC02000669, MLl1642, SMc02238, AGR_C_872, RHE_CH00547, OA2633_11510, RL RirA, BMEII0707, MLr1147, MBNC02002196, BQ04990, RC 0780, RB2654_19993, Rsph023178, SPO0432, MED193_09800, STM_634, CC0132, SMc01160, BJ blr7974, RL_5159, AGR_L_2343, AGR_C_402, RL_619, ZMO0116, ROS217_16231, GOX0099, BS CymR, Rrub02000219, ZMO0422, Sala_1236, ELI0458, Saro3534, DV Rrf2, OA2633_03246, CC1866, EC IscR, Jann_2366, STM_3629, EE36_14302, SPO2025, Rsph023725, RC_0477, Rrub_1115, Amb0200, GOX1196, RPA0663, Ricket., PB2503_09884
Sequence logos for the identified RirA-binding sites in α-proteobacteria
The A group - RirA (8 species)
The C group - quasi-RirA (12 species)
An attempt to reconstruct the history
Regulators and their binding motifs
• Subtle changes at close evolutionary
distances
• Cases of motif conservation at surprisingly
large distances
• Surprisingly similar motifs of unrelated
regulators: “site usurpation” (???)
• Correlation between contacting
nucleotides and amino acid residues
DNA motifs and protein-DNA interactions
Entropy at aligned sites and the number of contacts
(heavy atoms in a base pair at a distance <cutoff from a protein atom)
CRP
PurR
IHF
TrpR
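The per-position entropy plotted on this slide can be computed directly from a set of aligned sites. The following is a generic Shannon-entropy sketch; the four toy sites are invented, and how the talk normalized or corrected the entropy is not stated.

```python
import math
from collections import Counter

def column_entropy(sites):
    """Shannon entropy (bits) of each column in a list of equal-length sites."""
    length = len(sites[0])
    if any(len(site) != length for site in sites):
        raise ValueError("all sites must have the same length")
    entropies = []
    for i in range(length):
        counts = Counter(site[i] for site in sites)
        total = sum(counts.values())
        entropies.append(-sum((c / total) * math.log2(c / total) for c in counts.values()))
    return entropies

# Toy aligned sites (hypothetical): only the third column varies.
toy_sites = ["TGTGA", "TGTGA", "TGAGA", "TGCGA"]
entropy = column_entropy(toy_sites)
```

Low entropy marks positions under strong constraint; the slide compares this quantity with the number of protein atoms contacting each base pair.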
Specificity-determining positions
in the LacI family
• Training set: 459 sequences, average length: 338 amino acids, 85 specificity groups
– 44 SDPs:
10 residues contact NPF (analog of the effector)
7 residues in the effector contact zone (5 Å < dmin < 10 Å)
6 residues in the intersubunit contacts
5 residues in the intersubunit contact zone (5 Å < dmin < 10 Å)
7 residues contact the operator sequence
6 residues in the operator contact zone (5 Å < dmin < 10 Å)
LacI from E.coli
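The slide does not say how the specificity-determining positions (SDPs) were scored. One common choice, assumed here purely for illustration, is the mutual information between an alignment column and the specificity-group labels: a column scores high when its residues are conserved within groups but differ between them.

```python
import math
from collections import Counter

def mutual_information(column, groups):
    """Mutual information (bits) between residues in a column and group labels."""
    n = len(column)
    residue_counts = Counter(column)
    group_counts = Counter(groups)
    joint_counts = Counter(zip(column, groups))
    mi = 0.0
    for (residue, group), count in joint_counts.items():
        p_joint = count / n
        p_indep = (residue_counts[residue] / n) * (group_counts[group] / n)
        mi += p_joint * math.log2(p_joint / p_indep)
    return mi

# Toy alignment (hypothetical): four sequences in two specificity groups.
toy_alignment = ["RGS", "RGT", "KGS", "KGT"]
toy_groups = ["A", "A", "B", "B"]
scores = [
    mutual_information([seq[i] for seq in toy_alignment], toy_groups)
    for i in range(3)
]
```

Only the first toy column separates the groups, so it alone gets a non-zero score. A real analysis would also need a significance estimate, since small groups inflate mutual information.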
CRP/FNR family of regulators
CooA (Desulfovibrio): TGTCGGCnnGCCGACA
CRP (Gamma): TTGTGAnnnnnnTCACAA
FNR (Gamma): TTGATnnnnATCAA
HcpR (Desulfovibrio): TTGTgAnnnnnnTcACAA
Correlation between contacting nucleotides and amino acid residues
CooA in Desulfovibrio spp.: TGTCGGCnnGCCGACA
CRP in Gamma-proteobacteria: TTGTGAnnnnnnTCACAA
HcpR in Desulfovibrio spp.: TTGTgAnnnnnnTcACAA
FNR in Gamma-proteobacteria: TTGATnnnnATCAA
Alignment (genome, regulator, sequence):
DD  COOA  ALTTEQLSLHMGATRQTVSTLLNNLVR
DV  COOA  ELTMEQLAGLVGTTRQTASTLLNDMIR
EC  CRP   KITRQEIGQIVGCSRETVGRILKMLED
YP  CRP   KXTRQEIGQIVGCSRETVGRILKMLED
VC  CRP   KITRQEIGQIVGCSRETVGRILKMLEE
DD  HCPR  DVSKSLLAGVLGTARETLSRALAKLVE
DV  HCPR  DVTKGLLAGLLGTARETLSRCLSRMVE
EC  FNR   TMTRGDIGNYLGLTVETISRLLGRFQK
YP  FNR   TMTRGDIGNYLGLTVETISRLLGRFQK
VC  FNR   TMTRGDIGNYLGLTVETISRLLGRFQK
Contacting residues: REnnnR
TG: 1st arginine
GA: glutamate and 2nd arginine
The correlation holds for other factors in the family
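To see the correlation concretely, one can read off the residues that occupy the REnnnR columns in each aligned segment. The sketch below copies the ten segments from the slide; pairing them with the genome codes by row order is an assumption, and the columns are located by searching the E. coli CRP segment for the pattern rather than taken from the talk.

```python
import re

# Segments from the slide, keyed by (regulator, genome code); pairing by row order is assumed.
SEGMENTS = {
    ("CooA", "DD"): "ALTTEQLSLHMGATRQTVSTLLNNLVR",
    ("CooA", "DV"): "ELTMEQLAGLVGTTRQTASTLLNDMIR",
    ("CRP", "EC"): "KITRQEIGQIVGCSRETVGRILKMLED",
    ("CRP", "YP"): "KXTRQEIGQIVGCSRETVGRILKMLED",
    ("CRP", "VC"): "KITRQEIGQIVGCSRETVGRILKMLEE",
    ("HcpR", "DD"): "DVSKSLLAGVLGTARETLSRALAKLVE",
    ("HcpR", "DV"): "DVTKGLLAGLLGTARETLSRCLSRMVE",
    ("FNR", "EC"): "TMTRGDIGNYLGLTVETISRLLGRFQK",
    ("FNR", "YP"): "TMTRGDIGNYLGLTVETISRLLGRFQK",
    ("FNR", "VC"): "TMTRGDIGNYLGLTVETISRLLGRFQK",
}

def contact_columns(reference):
    """Locate the R, E, and second R of the REnnnR pattern in a reference segment."""
    match = re.search(r"RE...R", reference)
    if match is None:
        raise ValueError("reference segment has no REnnnR pattern")
    start = match.start()
    return (start, start + 1, start + 5)

COLUMNS = contact_columns(SEGMENTS[("CRP", "EC")])

def contact_triad(segment, columns=COLUMNS):
    """Residues found in the three contacting columns of an aligned segment."""
    return "".join(segment[i] for i in columns)

triads = {key: contact_triad(seq) for key, seq in SEGMENTS.items()}
```

In this reading, CRP and HcpR, whose motifs share the TGTGA-like half-site, both carry R, E, R in the three columns, while FNR and CooA differ at one or two of them, in line with their different motifs.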
Open problems
• Model the evolution of regulatory systems (a catalog of
elementary events, estimates of probabilities)
– Birth of a binding site; what are the mechanisms?
– Loss of a binding site
– Duplication of a regulated gene and/or a regulator
– Horizontal transfer of a regulated gene and/or a regulator
– Loss of a regulated gene and/or a regulator
– Change of specificity
– General properties?
• Distribution of TF family and regulon sizes
• Stable cores and flexible margins of functional systems (in terms of gene
presence and regulation)
• Co-evolution of TFs and DNA sites:
– “Neutral” model for the evolution of binding sites (with invariant functional
pressure from the bound protein)
– How do the motifs evolve? What is the driving force – changes in TFs?
– TF-family, position-specific protein-DNA recognition code?
All of this needs to take into account the incompleteness and noise in the data
RNA regulatory systems
• Riboswitches: regulation by formation of
alternative structures dependent on binding of
small molecules
• T-boxes: regulation by formation of alternative
structures dependent on binding of uncharged
tRNA
• Highly conserved
(sequence, secondary structure)
=> easy to recognize
• Large
=> phylogenetic trees, duplications etc.
Systematic analysis of T-boxes (very preliminary results)
T-boxes: the mechanism (Grundy & Henkin)
Partial alignment of predicted T-boxes
Row groups: aminoacyl-tRNA synthetases; amino acid biosynthetic genes; amino acid transporters
Marked elements: TGG; T-box; antiterminator; terminator (underlined)
Aminoacyl-tRNA synthetases (genome, gene, and the three values printed beside each row; rows are in the same order as the sequences that follow):
SA   serS  ->  26  CGTTA  51
DHA  tyrZ  ->  47  CGTTA  65
ST   trpS  ->  37  CCTTA  61
CA   aspS  ->  39  CGTTA  34
DF   valS  ->  41  CGTTA  77
PN   thrS  ->  30  CGTTA  38
MN   ileS  ->  89  CGTTA  68
DF   leuS  ->  28  AGCTA  29
HD   argS  ->  41  CGTTA  27
DF   proS  ->  33  CGTTA  30
ZC   lysS  ->  46  CGTTA  63
BQ   metS  ->  55  CGTTA  66
MN   pheS  ->  14  AATTA  20
MN   glyQ  ->  14  AGCTA  23
ST   alaS  ->  20  AATTA  18
A A A T A G G G T G G C A A C G C G T A G A C - - - - - - - - - - - - C A C G T C C C T T G T A G G G A T G T G G T C TT T T T T T A
A G G T A A G G T G G T A A C A C G G G A G C A - - - - - - - T A C T C T C G T C C T T C T G G C A A T G A A G G A C G G G A G TT T T T T G T T T T
AATTGAGGTGGTACCGCGTATTACTT----GTAATAACGCCCTCACGTTTTAATAGCGTGGGGACTTTTTGCTAT
A T A A A G G A T G G C A C C G T G A A A A - - - - - - - - - - G C C T T C A C T C C T T A C T G G A G T G G A G G C T T T T TT T A T T T T A A A T A A A
A A T T A A G G T G G T A A C G C G A G C - - - - - - - - - - - -T T T T C G T C C T T T T T A A A G A G G A T G A A G A G C T CT T T T T T A T T T C T
A A T G A A G G T G G A A C C A C G T T G - - - - - - - - - - - - -C G A C G T C C T T T C G A G G A T G T C G C A T T T T T T T A T T A G
A A T T A A G G T G G T A C C A C G A G C - - - - - - - - - - - - -T T T C G T C C T T T G A T G A A A G T T C T T T T T T A T T G A T
AATTAGGGTGGTACCGCGAAGATT-------TATCCTCGTCCCTAAACGTAAGTTTAGTGACGAGGATTTTTTATTTTCA
A A C G A G A G T G G T A C C G C G G G T A A - - - - - - - - - A A G C T C G C C T C T T T T T A G A A G A G G C G G G T T T T TT A T T T T
A A C T A G A G T G G T A C C G C G G A A A T - - - - - T A A A C C T T T C G T C T C T A T A C T T G T A T A G A G A T G A G A G G T T TT T T A T A T T T T C A G G
A A C T G A G G T G G T A C C G C G A A G C T A A - - - - - C A A C T C T C G T C C T C A A G A T G A A T A A T C T T G G G G G T G G G A G T TT T T T T G T T G C A
A A A T A A G G T G G T A C C G C G A C T G T T T A - - - T A C A G C C C C G C C C T T A T C T T T T T T A G A T A A G G G C G G G G C TT T T T A T A T T T A A
A A A A C G G A T G G T A C C G C G T G T C - - - - - - - - - - - - -A A C G C T C C G C T T A A G G A G T T T T G G C A C T T T T T T T G T T T T
A A T T A G G G T G G A A C C G C G T T T - - - - - - - - - - - -C A A A C G C C C C T A T G T C A G T T G G C A T G G G A G T G A T T G A G C G T G G C T C T T T T
A A T A G A G G T G G T A C C G C G G T T - - - - - - - - - - - - - -T T C G C C C T C T G T G A G A T G G A C T T G T T T T G T A T G G A G G A C T A T T T G A A A
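Amino acid biosynthetic genes (genome, gene, and the three values printed beside each row; rows are in the same order as the sequences that follow):
SA   trpE  ->  32  AATTA  4
BS   ilvB  ->  50  CGTTA  47
CA   ilvC  ->  40  CGTTA  14
BQ   asnA  ->  51  CGTTA  62
BS   proB  ->  33  CGTTA  30
SA   cysE  ->  33  CATTA  62
MN   hisC  ->  46  CGTTA  50
DHA  pheA  ->  41  CGTTA  50
HD   serA  ->  42  cgtta  57
BQ   phhA  ->  51  CGTTA  34
EF   yxjH  ->  40  CGTTA  51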
A A C T A A G G T G G C A C C A C G G T A - - - - - - - - - - - - -A C G C G T C C T T A C A G G T A T A T G C G T T A T G T G G T G T C T T T T T
AACAAGGGTGGTACCGCGGAAAGAAA---AGCCTTTTCGCCCCTTTTAGCTATCGCAGTTACTGCGCGGCTGATTGT
AATTTGGGTGGTACCGCGCGACCAAA-----AATTCTCGCCCCAAGCAGGGAATTTTGGCCGTTTTTTTATATAAATAAAT
AATTTGGGTGGTACCGCGGAACC-----AAAGCCTTTCGTCCCAGTTTTTTGGGAAAGAAGGGCTTTTTTTGTTGGCTT
AATCAAGGTGGTACCACGGAAAC--------CCATTTCGTCCTTATGAATCAGGATGAAATGGGTTTTTTTATTGTAGA
A T T C A G A G T G G A A C C G T G C G G - - - - - - - - - - - - -A A G C G C C T C T A A C A A T A C A A T T T G T A T G T T A G T G G T G C T T T T T T G
A A T G A A G G T G G A A C C A C G T G T G T - - - - - - - - - G T C A G C G T C C T T G C A A G T T T T T T G C A A G G G C G C TT T T T T G A A T A G T
AAAAAGGGTGGTACCGCGTGAC---------TTAACTCGTCCCTTATTTGGGGGTGAGGTAAGTCTTTTTTTATTTA
AATGAGGGTGGCACCGCGGTATG-------AACCTTCCGCCCCTCACGACAGTCGTCGTGTGGGCAGAAGGTTTTTTTACTAT
A A A T A G G G T G G T A C C G C G A T T C - - - - - - - - - - - -T T T C G C C C C T A T C G G A T T T T C C G A T A G G G G C T T T T T C T A T T T C
AAAAAAGGTGGTACCGCGATAA-----------TAATCGCCCTTTTACTAGTTACGGCTAGTAAAAGGGCGTTTTTTTATAAA
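Amino acid transporters (genome, gene, and the three values printed beside each row; rows are in the same order as the sequences that follow):
CA   yckK      ->  38  CGTTA  57
DF   yqiX      ->  41  CCTTA  30
HD   BH0807    ->  74  TGTTA  56
EF   yheL      ->  8   AATTA  33
BQ   ykbA      ->  46  CGTTA  45
BQ   sdt2      ->  40  CGTTA  56
EF   yusC      ->  42  CGTTA  60
CA   yhaG      ->  48  CGTTA  51
BQ   brnQ      ->  44  CGTTA  66
     REF01723  ->  44  CGTTA  55
BS   yvbW      ->  56  CGTTA  32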
AATTAGAGTGGTACCGTGGAATT-------CAACTTCTGCCTCTAACTATGAGGATAGAAGTTTTTTGTTTTTAT
AAAAAGAGTGGTAACGCGGATAT----------AATTCGTCTCTTAGCTGTAAAGCTAAGGGACTTTTTTGATTTA
A A C T G G G G T G G C A C C A C G A C A A G - - - - - - - - - - T G A T C G T C C C C A A G A C T T T T A T C A G T C T T G G G G A C GT T T T T T T G T T C A T
A A T T A A G G T G G T A C C G C G G A G A - - - - - - - - - - - G A T T C G T C C T T A T T C T T T A A G G A T G A A T C T C T CT T T T T A T G T A G C
AACAAGGGTGGAACCACGAATAT--------AACACTCGTCCCTTTTTTAGGGAGGAGTGTTTTTTTATT
A A T T G A G G T G G T A C C A C G G T A T T A A C A T T A C A T A T A T C G T C C T C T A C A T G C A T A T T T G C G T G T A G G G G A CT T T T T T A T T T T C
A A T T A A G G T G G T A T C A C G A A A T G A - - - - - C A A A C T T T C G T C C T T T T T G C T G T A A T A G C A A A A G G A T G G A A G T T TT T T T G T T T
AATTTAGGTGGTACCGCGGAAGT---------ATCTCCGTCCTAATTAATAAGATTAGGGCGGAGTTTTTTATTTGC
AATTAGGGTGGTATCGCGGGTAAA------TATAACTCGTCCCTTTCTTTAGGGACGAGTTTTTTGTGTTCTT
AATTGAGGTGGCACCACGAATGC----------GATTCGTCCTCTTGGCTCACAGCCAAGAGGCTTTTTTGTTTTTTTAATA
A A C A A G A G T G G T A C C G C G G T C A G C - - C G A A G G C T C G T C G T C T C T T T A T C T A T T A G A T T A G G T A G G A G A C G G C G G G C T T TT T T
… continued (in the 5’ direction)
Row groups: aminoacyl-tRNA synthetases; amino acid biosynthetic genes; amino acid transporters
Marked elements: specifier hairpin; anti-anti; (specifier) codon; SC
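Aminoacyl-tRNA synthetases (genome, gene, amino acid; rows are in the same order as the sequences that follow):
SA   SERS  SER
DHA  tyrZ  Tyr
ST   trpS  Trp
CA   ASPS  ASP
DF   VALS  VAL
PN   THRS  THR
MN   ileS  Ile
DF   leuS  Leu
HD   ARGS  ARG
DF   proS  Pro
ZC   lysS  Lys
BQ   metS  Met
MN   pheS  Phe
MN   glyQ  Gly
ST   alaS  Ala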
---GTAGGACAAGTA
----AAGAACAAGTA
---ATTAGAAGAGTA
-----GAGAAAAGTA
-GAAGAAGAGGAGTA
----AGAGACAAGTC
----CAAAAACACAA
----CTAGAGCAGTA
-----TGGGAGAGTA
---AAAGAAATAGTA
---AAGAGAAGAGTA
---AAAGGAAAAGTA
----TGAGATTAGTA
---AGAAAGAGAGTT
-AGTTAAGAATTGTT
19
18
16
18
16
18
17
19
20
18
19
19
18
15
17
AGAGAGCTTGTGGTT---AGTGTGAACAAG--AGAAAGTTGCCGGCT---GATGAGAGGCGCTT
AGAGAGTTAGTGGTT---GGTGCAAGCTAACAGCGAATTGGGAAAT---GGTGTGAGCCCAAAGAGAGGAAAATTCACTGGCTGTAAGATTTTC
AGAGAGTGCGTGGTT---GCTGGAAACGCATAGCGAATAGGTGAT----GGTGTAAGACCTATT
AGAGGAAGTGGAA-----GGTGAGAACTAATATT
AGCGAGTCGGGAT-----GGTGGGAGCCGATAGAGAGAAAACGGT----GGTGAGAGTTTTC-AGAGAGCTCTGGTA----GCTGAGAAAGAGC-AGAGAGCTTCGGTA----GCTGAGAAGAAGC-AGGGAATGCGGGGCGTG-ACTGGAAACCCGCAGCGAACCTGAGAG----AGTGTAAGTCAGGT
AGAAAAGTGACGGTT---GCTGCGAGTCATT-
15
18
12
15
17
14
18
10
14
14
15
14
16
14
17
GAA--TCTACCTACTT
GAA--TACCTCTTTGA
GAAA-TGGACTAATGA
GAAA-GACATCTCGGA
GAAT-GTAGCTTTGGA
GAT--ACTACTCTTGA
-----ATCATTTTGTT
GAA--CTTACTAGATT
GAAA-CGCACCCATGA
GAA--CCTGTCTTTTA
GAAAAAAGACTTGGAG
GAACAATGGCCTTTGA
GAA--TTCACTCAGAA
GACT-GGCACTTTCTC
-----GCTACTTAACT
->
->
->
->
->
->
->
->
->
->
->
->
->
->
->
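Amino acid biosynthetic genes (genome, gene, amino acid; rows are in the same order as the sequences that follow):
SA   trpE  Trp
BS   ilvB  Leu
CA   ilvC  Val
BQ   asnA  Asn
BS   proB  Pro
SA   cysE  Cys
MN   hisC  His
DHA  pheA  Phe
HD   serA  Ser
BQ   phhA  Tyr
EF   yxjH  Met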
TCTAAAGAAATAGTA
---TGAGGATAAGTA
-----AGGAAGAGTA
--AGGACGAGTAGTA
-----AGGATTAGTA
--CGAAGGATTAGTA
-----AGAGAAAAAA
-----AAAGAGAGCA
----GAAGATGAGGA
AGAATCGCAGTAGTA
-----TAGGAAAGTA
22
20
17
15
18
18
16
19
17
17
17
AGAAAGCTAATGGGT---GATGGGAATTAGC-AGAGAACCGGGTTA----GCTGAGAACCGG--AGAGAGTGAGATACT---GGTGGGAACTCAT-AGCGAGTCAGGGGT----GGTGTGAGCCTGA-AGAGAGCAAAATGAACC-GCTGAAACATTTTGC
AGAGAGTGTACGGTT---GCTGTGAGTACA--AGAGAGTATGGGAA----GCTGAAAACATAC-AGGGAACTAAAGTCGGAGACTGAAAGCTTTAGT
AGAGAGCTGGTGGTT---GCTGTGAACCAGCTAGAGAGCTAATGGTC---GGTGGAAATTGGC-AGAGAGACTTTGGTT---GGTGAAAAAAGTT--
14
16
13
15
15
14
15
14
18
14
13
GAAT-TGGACTTTGGA
GAA--CTCGCCTCAGA
GAAG-GTAGCCTTTGA
GAAG-AACCTCCTGGA
GAA--CCTGCCTTGGA
GAA--TGCACCTTCGT
-----CACATTCTTGA
GAGA-TTCACTCTGGA
-----AGCCCTTCTGA
GAAT-TACAATTCTGG
GAAAAATGGCCTAGGA
->
->
->
->
->
->
->
->
->
->
->
Cys
Arg
Lys
Tyr
Thr
Trp
Met
Trp
Ile
His
Leu
----AAGAACCAGTA
-----AGAGAAAGTA
----AGAGAAGAGTA
-TTATTAGCCCAGTA
--GAGGACACGATCA
---GCAAGAAGAGTA
----AAAGAAGAGTA
----AAGGAAGAGTA
----GAGAACGAGTA
--TTAGGACATAGTA
-----GGGAGCAGTA
17
16
19
19
16
18
18
18
19
18
18
AGAGAAAAATCTCCAAG-GCTGAAAGGGATTTT
AGCGAGTTAGGGGTT---GGTGTAAGCCTAGCAGAAAGCCTGTAGTT---GCTGAGAACGGGT-AGAAAGTCGATGGTT---GCTGCGAATCGAT-AGAGAGGGAAGCCTTTG-GCTGTGAGCTTCCTAGAGAGCTGGGGGAA---GGTGTGAGCCCGGTAGAGAGCCCTGTTT----GCTGAGAATGGG--AGAGAGCTGAGGGT----GGTGTGATCTCAGTAGAGAGTTGGCGATTT--GCTGAAAGCCAAC-AGAGACTTTTTCATTG--GCTGAAAGAAAAAGAGAGAGCTGCGGGGT---GGTGCGACGCAGC--
15
14
14
13
14
15
16
15
15
17
13
GAA--TGCATCTTTGA
GAAG-AGAGCTCTGGA
GAAGCAAGACTCTGAG
GAAT-TACACTAATAA
GATT-ACCACCTCTGA
GAA--TGGGCTTGCGA
GAAG-ATGGTCTTTGA
GAA--TGGACCTTTTA
GAAA-ATCATCTCCGA
-----CACACCTAAAA
GAA--CTCGCCCGGGA
->
->
->
->
->
->
->
->
->
->
->
CA
yckK
DF
yqiX
HD
BH0807
EF
yheL
BQ
ykbA
BQ
sdt2
EF
yusC
CA
yhaG
BQ
brnQ
REF01723
BS
yvbW
~800 T-boxes in ~90 bacteria
• Firmicutes
– aa-tRNA synthetases
– enzymes
– transporters
– all amino acids excluding glutamine, glutamate, lysine
• Actinobacteria (regulation of translation – predicted)
– branched chain (ileS)
– aromatic (Atopobium minutum)
• Delta-proteobacteria
– branched chain (leu – enzymes)
• Thermus/Deinococcus group (aa-tRNA synthetases)
– branched chain (ileS, valS)
– glycine
• Chloroflexi, Dictyoglomi
– aromatic (trp – enzymes)
– branched chain (ileS)
– threonine
Recent duplications and bursts:
ARG-T-box in Clostridium difficile
C AC _AR G S
LR_A RG S
C PE_A RG S
C B_A RG S
C BE_AR G S
L a c to b a c illa le s
C TC _AR G S
LP_A R G S
LM E_AR G S
C lo s trid ia le s
a rg S
a rg S
LJ_AR G S
C D F _ Y Q IX Y Z
LG A _AR G S
R DF02391
PPE_A RG S
LSA _A RG S
С DF_AR G C
B C_A RG S2
EF_A R G S
B H_A RG S
C DF_AR G H
B a c illa le s
y q iX Y Z
a rg S
NEW
: A R G -s p e c ific T-b o x re g u la to ry s ite
R D F02391
NEW
a m in o a c y l-tR N A s y n th e ta s e
b io s y n th e tic g e n e s
a m in o a c id tra n s p o rte rs
p re d ic te d
a m in o a c id
tra n s p o rte rs
C lo s trid iu m
d iffic ile
a rg C J B D F
a rg H
o th e rs
a rg G
a m in o a c id
b io s y n th e tic
genes
Gram+ bacteria:
• AhrC regulatory protein (negative regulation of arginine metabolism, positive regulation of arginine catabolism)
• Binding to 5' UTR gene region; regulation of gene expression
Clostridium difficile:
• AhrC is lost
• Expansion of T-box regulon: regulation of expression of arginine biosynthetic and transport genes by T-box antitermination
[Figure: upstream regions of yqiXYZ, argC, argH and argG in Clostridium difficile and in other clostridia spp. (CA, CTC, CTH, CPE, CB, CPE). Legend: AhrC binding site; ARG-specific T-box regulatory site.]
ASN/ASP/HIS T-boxes: duplications and changes in specificity
[Figure: phylogenetic tree of ASN-, ASP- and HIS-specific T-boxes. Clade labels: Bacillales, Clostridiales, Lactobacillales, other Gram+. Regulated genes: hisS, aspS, asnS, asnA, the his operon (hisXYZ) and glnQHMP. Branches are labeled by specificity (HIS, ASN, ASP, ASP\ASN); some are marked NEW. Species singled out: P. pentosaceus, L. johnsonii (LJ_glnQHMP).]
Blow-up
[Figure: enlarged Lactobacillales part of the ASN/ASP/HIS T-box tree (P. pentosaceus, L. reuteri, L. johnsonii). Regulated genes: hisS, aspS, asnS, asnA, hisXYZ, glnQHMP. Regulatory codons: AAC (ASN), CAC (HIS), GAC (ASP).]
• disruption of hisS-aspS operon
• mutation of regulatory codon
Branched-chain amino acids:
duplications and changes in specificity
[Figure: phylogenetic tree of LEU-, ILE- and VAL-specific T-boxes. Taxon labels: Firmicutes (Bacillales, Lactobacillales, Lactobacillaceae, Clostridiaceae), δ-proteobacteria. Species labeled: B. subtilis, B. licheniformis, B. cereus, Oceanobacillus iheyensis, C. acetobutylicum, C. difficile, C. thermocellum, Desulfitobacterium hafniense, Desulfotomaculum reducens, Syntrophomonas wolfei, Carboxydothermus hydrogenoformans, Heliobacillus mobilis, Lactobacillus casei, L. plantarum, L. johnsonii, L. reuteri. Regulated genes: leuS, ileS, valS, ilv operon (ilvB, ilvBN, ilvC, ilvCB, ilvD), leu operon (leuA), brnQ, yvbW, ybgE, yocR3, lp3666, opp, panE. Regulatory codons: ATC (ILE), CTC (LEU), GTC (VAL).]
• T-box duplication and mutation of regulatory codon
• Recent T-box duplication and mutation of regulatory codon
Blow-up
• transporter: ATC, GTC
• dual regulation of common enzymes: ATC, CTC
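The "mutation of regulatory codon" events on these slides can be made concrete with a small sketch. It takes the codon labels exactly as they appear on the slides (AAC/ASN, CAC/HIS, GAC/ASP and ATC/ILE, CTC/LEU, GTC/VAL) and lists the pairs that are one substitution apart. The names `ASX_HIS`, `BCAA`, `differing_positions` and `single_step_switches` are illustrative only and do not come from the talk.

```python
from itertools import combinations

# Regulatory (specifier) codons as labeled on the slides, DNA alphabet.
ASX_HIS = {"AAC": "ASN", "GAC": "ASP", "CAC": "HIS"}
BCAA = {"ATC": "ILE", "CTC": "LEU", "GTC": "VAL"}


def differing_positions(a, b):
    """Return the 1-based codon positions at which two codons differ."""
    if len(a) != len(b):
        raise ValueError("codons must have equal length")
    return [i + 1 for i, (x, y) in enumerate(zip(a, b)) if x != y]


def single_step_switches(codons):
    """List codon pairs that differ by exactly one substitution.

    Each entry is (codon_a, label_a, codon_b, label_b, position).
    """
    switches = []
    for a, b in combinations(sorted(codons), 2):
        diff = differing_positions(a, b)
        if len(diff) == 1:
            switches.append((a, codons[a], b, codons[b], diff[0]))
    return switches


if __name__ == "__main__":
    for name, group in (("ASN/ASP/HIS", ASX_HIS), ("ILE/LEU/VAL", BCAA)):
        for a, aa_a, b, aa_b, pos in single_step_switches(group):
            print(f"{name}: {a} ({aa_a}) <-> {b} ({aa_b}), position {pos}")
```

Within each group every pair of codons differs only at the first position, so a single point mutation in a duplicated T-box is enough to move it between the specificities shown in the trees.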
Same enzymes, different regulators (common part of the aromatic amino acids biosynthesis pathway)
[Figure: pathway diagram. PEP + E4P → DAHP → SHIKIMATE → CHORISMATE; from CHORISMATE: → ANTHRANILATE → TRP; → TYR; → PHE; → ADC → FOLATE. Gene labels in the order shown: aroA, aroB, aroC, aroD, aroI, aroE, aroF, pabA, pabB, aroA, pheB, aroH, trpE, trpG, tyrA, hisC, aspB, trpDCFBA, phhA, yhaG; kynurenine pathway.]
aro:
• Regulated by TYR (BC)
• Regulated by PHE (SWO, DRE, HMO, CH, MTH, CTH)
• Regulated by TRP (DE, DEH)
Other regulated genes: TRP: trpXYZ; TRP\PHE: yocR family; TYR: yheL
cf. E. coli: AroF,G,H: feedback inhibition by TRP, TYR, PHE; transcriptional regulation by TrpR, TyrR
S-box (SAM riboswitch)
[Figure: consensus secondary structure of the S-box, showing helices P1 to P5 and the base stem between the 5' and 3' ends. Grundy and Henkin, 1998]
S-box riboswitch: regulator of methionine biosynthesis
Firmicutes
• Bacillales: S-box
• Clostridiales: S-box
• Loss of S-boxes:
– Lactobacillales: Met-T-box
– Streptococcales: MtaR (transcription factor); SAM-III riboswitch (metK) (the Henkin group)
Proteobacteria
• Xanthomonas: S-box
• Geobacter: S-box
• E. coli: TFs
• alphas: SAM-II
Other genomes with S-boxes: the Zoo
• Petrotoga
• actinobacteria (Streptomyces, Thermobifida)
• Chlorobium, Chloroflexus, Cytophaga
• Fusobacterium
• Deinococcus
Need more genomes
Acknowledgements
• Andrei A. Mironov (algorithms and software)
• Alexandra B. Rakhmaninova (SDPs)
• Olga Kalinina (SDPs/LacI)
• Olga Laikova (LacI, sugars)
• Dmitry Ravcheev (FruR)
• Dmitry Rodionov (now at Burnham Institute) (NrdR, iron)
• Alexei Vitreschak (RNA)
• Leonid Mirny, MIT (protein/DNA contacts, SDPs)
• Andy Johnston, University of East Anglia (iron)
• Howard Hughes Medical Institute
• Russian Fund of Basic Research
• Russian Academy of Sciences, program “Molecular and Cellular Biology”
• INTAS