Coffee Shop
F91921025 黃仁暐
F92921029 戴志華
F92921041 施逸優
R93921142 吳於芳
R94921035 林與絜
Menu
Coffee Shop Opening
Why coffee shop?
Three Flavors
COFFEE
T-Coffee
3DCoffee
Remarks
Recipes
2005/12/14
2
Multiple Sequence Alignment
Multiple sequence alignment is one of the
most important tools for analyzing biological
sequences. Applications include:
structure prediction
phylogenetic analysis
function prediction
polymerase chain reaction (PCR) primer design.
Multiple Sequence Alignment
However, the accuracy is not good enough:
It is difficult to evaluate the quality of a multiple
alignment
It is algorithmically very hard to produce the optimal
alignment
In order to increase the accuracy of multiple
sequence alignment, we opened a coffee
shop to share three kinds of coffee.
Before (drinking) COFFEE
For comparative genomics, and why?
Understanding the process of evolution at gross
level and local level
Translate DNA sequence data into proteins of
known function
Meaning of conserved regions
E. coli, C. elegans, Drosophila, Human…
What’s their relationship?
[Figure: classification of genes by function in Synechocystis (blue-green algae), E. coli, nematode, fruit fly, human, yeast, and Arabidopsis]
Adapted from “Principles of genome analysis and genomics” Fig. 7.5 (p.129),
by S. B. Primrose and R. M. Twyman, 3rd edition
Comparative genomics vs.
multiple sequence alignment
Alignment → conserved region
Conserved region → gene location
Evolution evidence
http://www.public.iastate.edu/~semrich/compgen/
A: human chromosome I
B: human chromosome II
C: human chromosome III
Chromosome III region 125-128 Mb was magnified 120X
The alignment between the
chromosomes
http://gchelpdesk.ualberta.ca/news/02jun05/cbhd_news_02jun05.php
Our Flavors
COFFEE: A New Objective Function For Multiple
Sequence Alignment.
C. Notredame, L. Holm and D.G. Higgins, Bioinformatics, Vol 14 (5),
pp407-422, 1998
T-Coffee: A novel method for multiple sequence
alignments.
C. Notredame, D. Higgins, J. Heringa, Journal of Molecular
Biology, Vol 302, pp205-217, 2000
3DCoffee: Combining Protein Sequences and
Structures within Multiple Sequence Alignments.
O. O'Sullivan, K. Suhre, C. Abergel, D.G. Higgins,
C. Notredame, Journal of Molecular Biology, Vol 340, pp385-395, 2004
COFFEE
COFFEE
An objective function for multiple sequence
alignments
Cédric Notredame, Liisa Holm and Desmond G.
Higgins
SAGA with COFFEE score
Introduction
COFFEE - Consistency based Objective Function
For alignmEnt Evaluation
An objective function, COFFEE score, is proposed
to measure the quality of multiple sequence
alignments
Optimize the COFFEE score of a multiple sequence
alignment with the genetic algorithm package
SAGA (Sequence Alignment Genetic Algorithm)
Overview of their method
Given
a set of sequences to be aligned
a library containing all pairwise alignments between
them,
the COFFEE score reflects the level of consistency
between a multiple sequence alignment and the
library.
COFFEE score
\[
\text{COFFEE score} = \frac{\sum_{i=1}^{N-1} \sum_{j=i+1}^{N} W_{i,j} \times \mathrm{SCORE}(A_{i,j})}{\sum_{i=1}^{N-1} \sum_{j=i+1}^{N} W_{i,j} \times \mathrm{LEN}(A_{i,j})}
\]
with:
SCORE(A_{i,j}) = number of aligned pairs of residues
that are shared between A_{i,j} and the library
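The COFFEE score formula above can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the alignment is a list of equal-length gapped strings, the library is a hypothetical dictionary mapping a sequence pair (i, j) to a set of residue-index pairs, the weights W are supplied by the caller, and LEN is assumed to be the number of columns of the pairwise projection in which at least one of the two sequences has a residue.

```python
from itertools import combinations

def residue_pairs(row_i, row_j):
    """Return (aligned residue pairs, projection length) for two gapped rows."""
    pairs = set()
    pos_i = pos_j = 0          # residue counters, gaps are skipped
    length = 0
    for a, b in zip(row_i, row_j):
        if a != "-" and b != "-":
            pairs.add((pos_i, pos_j))
        if a != "-" or b != "-":
            length += 1        # column is kept in the pairwise projection (assumption)
        pos_i += (a != "-")
        pos_j += (b != "-")
    return pairs, length

def coffee_score(alignment, library, weights):
    """Weighted fraction of aligned residue pairs that also occur in the library."""
    num = den = 0.0
    for i, j in combinations(range(len(alignment)), 2):
        pairs, length = residue_pairs(alignment[i], alignment[j])
        shared = len(pairs & library[(i, j)])   # SCORE(A_ij)
        num += weights[(i, j)] * shared
        den += weights[(i, j)] * length         # LEN(A_ij)
    return num / den if den else 0.0

# Hypothetical two-sequence example.
aln = ["ACGT", "A-GT"]
lib = {(0, 1): {(0, 0), (2, 1), (3, 2)}}
w = {(0, 1): 1.0}
score = coffee_score(aln, lib, w)
```

In this toy case every aligned pair is in the library, but the score stays below 1 because the gapped column still counts toward the length.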
Using COFFEE in SAGA
Iteratively, a multiple sequence alignment with higher
COFFEE score is generated by SAGA until the COFFEE
score cannot be improved
SAGA follows the general principles of a genetic algorithm:
the notion of survival of the fittest.
SAGA iteratively does the following:
Evaluate the score of the alignments
The fitter an alignment, the more likely it is to survive and produce
offspring
Surviving alignments may be kept unchanged, randomly modified
(mutation), or combined with another alignment (cross-over)
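The loop described above can be illustrated with a toy genetic algorithm in Python. This is a generic sketch, not SAGA itself: the names `evolve`, `flip` and `splice` are hypothetical, selection is simplified to "the fitter half survives", the operator probabilities are arbitrary, and the loop runs for a fixed number of generations instead of stopping when the score no longer improves. Bit tuples stand in for alignments, and `sum` stands in for the COFFEE score.

```python
import random

def evolve(population, score, mutate, crossover, generations=50, seed=0):
    """Toy generational loop: select by fitness, then keep, mutate, or cross over."""
    rng = random.Random(seed)
    for _ in range(generations):
        ranked = sorted(population, key=score, reverse=True)
        survivors = ranked[: max(2, len(ranked) // 2)]   # fitter half survives
        children = [survivors[0]]                        # best individual kept unchanged
        while len(children) < len(population):
            parent = rng.choice(survivors)
            op = rng.random()
            if op < 0.2:
                child = parent                           # kept unchanged
            elif op < 0.6:
                child = mutate(parent, rng)              # mutation
            else:
                child = crossover(parent, rng.choice(survivors), rng)  # cross-over
            children.append(child)
        population = children
    return max(population, key=score)

def flip(bits, rng):
    """Mutation: flip one randomly chosen bit."""
    k = rng.randrange(len(bits))
    return bits[:k] + (1 - bits[k],) + bits[k + 1:]

def splice(a, b, rng):
    """Cross-over: prefix of one parent joined to the suffix of the other."""
    k = rng.randrange(1, len(a))
    return a[:k] + b[k:]

start = [tuple([0] * 12) for _ in range(8)]
best = evolve(start, sum, flip, splice)
```

Because the best individual is always carried over unchanged, the best score in the population never decreases from one generation to the next.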
Results
COFFEE function
COFFEE score &
alignment accuracy
SAGA
(A pile of tables is coming up. They are rather dry, so please bear with us…)
Optimization of
COFFEE function
Effect of optimization
Comparison:
COFFEE and others
Others: PRRP, Clustal W,
PILEUP, SAGA MSA, SAM
Optimization
COFFEE function was optimized by SAGA
Using SAGA alignments
Using ClustalW alignments
Comparison
Multiple alignments of SAGA COFFEE and
5 other methods
PRRP, ClustalW, PILEUP, SAGA MSA, SAM
Performance of SAGA and ClustalW
Comparison of other 5 methods
Even when SAGA-COFFEE does not give the best result, it is not far from the best
Lower identity level → better SAGA-COFFEE results
Correctly aligned ratio
Better than PRRP
Worse than PRRP
Ratio of (E+H) residue correctly aligned
Better or worse alignment? SAGA-COFFEE vs.
others
NO such thing as an ideal method
COFFEE score and alignment accuracy
[Figure: two scatter plots of E+H accuracy (%), one against COFFEE sequence score and one against average identity (%); annotation: r=0.65]
The COFFEE score predicts alignment accuracy for more than 85% of the sequences (error ~ ±10%)
Average identity cannot predict alignment accuracy
Correlation between score and accuracy
Higher score → higher accuracy
SAGA produces more high-scoring sequences than
ClustalW
Coffee Break ?
T-Coffee
T-Coffee
A novel method for multiple sequence
alignments
C.Notredame, D. Higgins, J. Heringa
ClustalW with extended library
ClustalW
ClustalW is the core alignment strategy of T-Coffee.
It follows the procedure below:
Pairwise Alignment: calculate distance matrix
Guide Tree
Unrooted Neighbor-Joining Tree
Rooted Neighbor-Joining Tree: guide tree with sequence
weights
Progressive Alignment: align following the guide
tree
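The first step of the procedure above, turning pairwise alignments into a distance matrix, can be sketched as follows. This is a hedged illustration, not ClustalW's code: it assumes the pairwise alignments are already available as gapped strings, and it assumes the distance is one minus the fraction of identical residues over columns where both sequences have a residue. The function names and the input format are hypothetical.

```python
def identity_distance(row_a, row_b):
    """Distance = 1 - fraction of identical residues, ignoring gapped columns."""
    compared = [(a, b) for a, b in zip(row_a, row_b) if a != "-" and b != "-"]
    if not compared:
        return 1.0                      # nothing to compare: treat as maximally distant
    identical = sum(a == b for a, b in compared)
    return 1.0 - identical / len(compared)

def distance_matrix(n, pairwise):
    """pairwise maps (i, j) with i < j to a pair of aligned, gapped rows."""
    matrix = [[0.0] * n for _ in range(n)]
    for (i, j), (row_a, row_b) in pairwise.items():
        matrix[i][j] = matrix[j][i] = identity_distance(row_a, row_b)
    return matrix

# Hypothetical pairwise alignments for three short sequences.
pairwise = {
    (0, 1): ("ACGT", "ACGA"),
    (0, 2): ("ACGT", "A-GT"),
    (1, 2): ("ACGA", "A-GA"),
}
dist = distance_matrix(3, pairwise)
```

The resulting symmetric matrix is the input a neighbor-joining routine would use to build the guide tree.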
Calculate distance matrix
Guide tree
Use the Neighbor-Joining method to build the guide
tree from the distance matrix.
First construct an unrooted Neighbor-Joining
tree, then convert it to a rooted Neighbor-Joining tree, the guide tree.
Unrooted Neighbor-Joining Tree
Rooted Neighbor-Joining Tree
Progressive Alignment: align
following the guide tree
[Figure: Seq1 to Seq5 are aligned progressively along the guide tree; intermediate alignments (Alignment 1, Alignment 2, Alignment 3) are merged into the final alignment]
Progressive-alignment strategy
Pros
Faster and uses less memory (compared with
computing all possible multiple alignments)
Cons
May not find the optimal solution.
Errors made in the first alignments cannot be
rectified later as the rest of the sequences are
added in. “Once a gap, always a gap!”
T-Coffee is an attempt to minimize that effect!
T-Coffee Algorithm
Generating a primary library of alignments
Derivation of the primary library weights
Combination of the libraries
Extending the library
Progressive alignment strategy
[Diagram: ClustalW primary library (global) + Lalign primary library (local) → weighting → primary library]
Primary Library
[Diagram: ClustalW primary library (global) + Lalign primary library (local) → weighting → primary library → extension → extended library]
Extended Library
Weight(A-C-B)
= min( Weight(A-C), Weight(B-C) )
= min( 77, 100 ) = 77
Weight(A-D-B)
= min( Weight(A-D), Weight(B-D) )
= min( 100, 100 ) = 100
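The triplet weights above can be reproduced with a short Python sketch. Two points are assumptions rather than content of the slide: keys are simplified to sequence names (a real library is keyed by residue pairs), and `extended_weight` sums the direct weight and all triplet weights, which follows the library extension described in the T-Coffee paper. The function names are hypothetical.

```python
def triplet_weight(primary, a, b, via):
    """Weight of aligning a with b through the intermediate sequence `via`."""
    return min(primary.get(frozenset((a, via)), 0),
               primary.get(frozenset((b, via)), 0))

def extended_weight(primary, a, b, intermediates):
    """Direct weight plus the weights contributed by each intermediate (assumed sum)."""
    direct = primary.get(frozenset((a, b)), 0)
    return direct + sum(triplet_weight(primary, a, b, v) for v in intermediates)

# Weights taken from the slide; no direct A-B weight is given, so it defaults to 0 here.
primary = {
    frozenset(("A", "C")): 77,
    frozenset(("B", "C")): 100,
    frozenset(("A", "D")): 100,
    frozenset(("B", "D")): 100,
}
via_c = triplet_weight(primary, "A", "B", "C")
via_d = triplet_weight(primary, "A", "B", "D")
total = extended_weight(primary, "A", "B", ["C", "D"])
```

Taking the minimum means a path through an intermediate sequence is only as trustworthy as its weaker link.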
Extended Library
SeqA: GARFIELD THE LAST FAT CAT
SeqB: GARFIELD THE FAST CAT
[Figure: alignments of SeqA and SeqB used to illustrate the extended library]
[Diagram: ClustalW primary library (global) + Lalign primary library (local) → weighting → primary library → extension → extended library → progressive alignment → multiple alignment information]
Progressive Alignment
Complexity Analysis
Complexity of the whole procedure:
O(N^2 L^2) + O(N^3 L) + O(N^3) + O(N L^2)
O(N^2 L^2): computation of the pairwise library
O(N^3 L): computation of the extended pairwise library
O(N^3): computation of the NJ tree
O(N L^2): computation of the progressive alignment
where N is the number of sequences and L is the length of the multiple alignment
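To see which term dominates for a given input size, the four terms can be evaluated directly. The sketch below only plugs numbers into the asymptotic expressions (constant factors are ignored, so the values are relative, not running times); the function name and the example sizes are hypothetical.

```python
def tcoffee_cost_terms(n, length):
    """Evaluate each asymptotic term for n sequences and alignment length `length`."""
    return {
        "pairwise library": n**2 * length**2,
        "extended library": n**3 * length,
        "NJ tree": n**3,
        "progressive alignment": n * length**2,
    }

terms = tcoffee_cost_terms(10, 100)
dominant = max(terms, key=terms.get)
```

For few, long sequences the pairwise library term is the largest; the extended library term only overtakes it once N exceeds L.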
Experiment
Implementation environment
Result 1: Effect of combining local and
global alignments without extension; effect
of the library extension
Result 2: compared with other multiple
sequence alignment methods
Implementation environment
Programming language: ANSI C
Hardware: LINUX platform with Pentium II
processors (330 MHz).
Test case: BaliBase database of multiple
sequence alignment
Result 1
Table 1: The effect of combining local and global alignments
Name  global/local/extend        Cat1(81)  Cat2(23)  Cat3(4)  Cat4(12)  Cat5(11)  Total(141)  Significance
C     ClustalW pw/.../...        70.6      26.7      43.0     56.0      60.0      58.9        7.8
CE    ClustalW pw/.../ex         77.1      33.6      47.6     64.8      75.9      66.3        17.7
L     .../Lalign pw/...          65.4      12.1      22.8     53.9      66.0      52.0        7.8
LE    .../Lalign pw/ex           72.6      25.6      47.2     77.5      85.5      64.2        16.3
CL    ClustalW pw/Lalign pw/...  76.2      32.0      48.3     76.2      74.6      66.5        12.1g
CLE   ClustalW pw/Lalign pw/ex   80.6      37.1      52.9     83.2      88.6      72.0
Result 2
Table 2: T-coffee compared with other multiple sequence alignment methods
Method    Cat1(81)  Cat2(23)  Cat3(4)  Cat4(12)  Cat5(11)  Total1(141)  Total2(141)  Significance
Dialign   71.0      25.2      35.1     74.7      80.4      61.5         57.3         11.3
ClustalW  78.5      32.2      42.5     65.7      74.3      66.4         58.6         26.2
Prrp      78.6      32.5      50.2     51.1      82.7      66.4         59.0         36.9
T-Coffee  80.6      37.1      52.9     83.2      88.6      72.0         68.6
3DCoffee
3DCoffee
Combining protein sequences and structures
within multiple sequence alignments
O. O'Sullivan, K Suhre, C. Abergel, D.G.
Higgins, C. Notredame
T-Coffee with structure information
3DCoffee
Structural information can help to improve
the quality of multiple sequence alignments
3DCoffee
Combines protein sequences and structures
Is based on T-Coffee version 2.00
Uses a mixture of pairwise sequence alignments
and pairwise structure comparison methods.
3DCoffee
Use T-Coffee to compile
A primary library: a list of weighted pairs of
residues.
An extended library: uses the column
consistency relationships between all sequences
Structure information is added using
Fugue, SAP, and LSQman
3DCoffee
Fugue – a threading method that aligns a
protein sequence with a 3D-structure
SAP – uses DP to compute a pairwise
alignment based on a non-rigid structure
superposition
LSQman – a rigid body structure
superposition package
3DCoffee
Set the weight of each new alignment to 100,
the maximum score in the primary library
Add the weighted alignments to the library
Carry out progressive alignment in the same way as
T-Coffee
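The library update described above can be sketched in Python. This is an illustration under assumptions, not the 3DCoffee source: the library is a hypothetical dictionary from residue pairs to weights, the structure-derived pairs are given as a plain list, and a pair already present in the library has the new weight added to its existing weight, which mirrors how T-Coffee combines libraries.

```python
MAX_WEIGHT = 100  # the maximum score in the primary library, as stated on the slide

def add_structure_pairs(library, structure_pairs, weight=MAX_WEIGHT):
    """Return a new library with structure-based residue pairs merged in."""
    merged = dict(library)                      # leave the input library untouched
    for pair in structure_pairs:
        merged[pair] = merged.get(pair, 0) + weight   # assumed: duplicate entries are summed
    return merged

# Hypothetical entries: ((sequence, residue index), (sequence, residue index)) -> weight
sequence_library = {(("A", 3), ("B", 3)): 88, (("A", 4), ("B", 5)): 60}
structure_pairs = [(("A", 3), ("B", 3)), (("A", 7), ("B", 8))]
combined = add_structure_pairs(sequence_library, structure_pairs)
```

The combined library would then go through extension and progressive alignment as in T-Coffee.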
Remarks
COFFEE : An objective function for multiple
sequence alignments
SAGA with COFFEE score
T-Coffee : A novel method for multiple
sequence alignments
ClustalW with extended library
3DCoffee : Combining protein sequences and
structures within multiple sequence alignments
T-Coffee with structure information
Recipes
CLUSTAL W: improving the sensitivity of progressive
multiple sequence alignment through sequence weighting,
position-specific gap penalties and weight matrix choice.
J.D. Thompson, D.G. Higgins and T.J. Gibson, 1994
COFFEE: A New Objective Function For Multiple
Sequence Alignment.
C. Notredame, L. Holm and D.G. Higgins, Bioinformatics, Vol 14 (5), pp407-422, 1998
T-Coffee: A novel method for multiple sequence alignments.
C. Notredame, D. Higgins, J. Heringa, Journal of Molecular Biology, Vol 302,
pp205-217, 2000
3DCoffee: Combining Protein Sequences and Structures
within Multiple Sequence Alignments.
O. O'Sullivan, K. Suhre, C. Abergel, D.G. Higgins, C. Notredame, Journal of
Molecular Biology, Vol 340, pp385-395, 2004
Q&A
Thank You
Residue score
Sequence score measurement
Global measurement
A residue is scored 9 when
>90% of the pairs it is involved in are also present in
the reference library
Residue score evaluated → substitution
defined
Class 5 substitution → residue score ≥ 5
vsdvprdlevvaatptslliswdap
5566677788888888899999877
-----gslevvaatptslliswdap
-----66666666788888888887
• Correct substitution:
SAGA > ClustalW
• Lower accuracy:
more false positives in
SAGA alignment
High-scoring residues
with high accuracy
Higher substitution
category → smaller
number of predictions