Direct Protein Identification from Nonspecific Peptide Pools by High-Accuracy MS Data Filtering.

Protein Identification
DOI: 10.1002/ange.200503787
Direct Protein Identification from Nonspecific
Peptide Pools by High-Accuracy MS Data Filtering
Vinh An Thieu, Dieter Kirsch, Thomas Flad,
Claudia Müller, and Bernhard Spengler*
A new method of protein identification and biomarker
discovery based on the analysis of cleavage peptides formed
by endogenous proteolysis or other undetermined processes is
described herein. Combining database analysis with highly
accurate MS and composition-based de novo sequencing
(CBS) now allows rapid and reliable analysis of undefined
peptide pools and their originating proteins. With established
methods, this can be a highly challenging task if cleavage
enzymes, protein size, and the number of originating proteins
are unknown.
Peptide pools of undefined formation history are readily
found, for example, in blood plasma, pathological samples, or
if the enzymatic degradation of biological samples is not
actively suppressed. Such peptides are usually hard to
characterize with respect to sequence and originating proteins
because database (DB)-assisted approaches such as peptide
mass fingerprinting require information on the cleavage
enzyme to work well. To understand the specific problems in
this kind of analysis, one has to note that standard procedures
for protein identification and proteome characterization use
specific enzymatic cleavage of individual proteins after 2D-gel
separation. Subsequent protein DB search procedures are
based on a list of cleavage-peptide masses resulting from the
protein in question. An alternative approach, called
shotgun analysis, uses specific enzymatic cleavage of a
complex (unseparated) mixture of proteins followed by 2D-LC
fractionation, MS, or MS/MS (fragment ion) analysis of
the peptides and DB-assisted protein identification.[1, 2] Both
methods are commonly used in proteomics studies and are
known to work well only if the cleavage step is well defined.
Both methods were recently found to nevertheless exhibit a
frustratingly poor analytical quality with respect to reproducibility, reliability, completeness of information, and comparability between different instruments or protocols in the field
of proteome characterization.[3, 4] Furthermore, they are
mostly unsuccessful with nonspecifically formed complex
peptide samples.
[*] V. An Thieu, Dr. D. Kirsch, Prof. Dr. B. Spengler
Institut für Anorganische und Analytische Chemie
Justus Liebig Universität
35392 Gießen (Germany)
Fax: (+49) 641-993-4809
Dr. T. Flad, Prof. Dr. C. Müller
Sektion für Transplantationsimmunologie und Immunhämatologie
Medizinische Klinik und Poliklinik, Abt. II
Universität Tübingen
72072 Tübingen (Germany)
Angew. Chem. 2006, 118, 3395–3397
© 2006 Wiley-VCH Verlag GmbH & Co. KGaA, Weinheim
The key to the limitations of these methods lies in the way
in which an individual peptide sequence is determined. If a
short sequence tag, a fragmentation pattern, or a set of
peptide masses point, in a protein sequence DB, to either a
misidentified protein sequence, to an erroneous protein
sequence, or to no protein at all, then an incorrect peptide
sequence or no sequence at all would be determined. In this
case, there is typically no attempt or possibility to quality-control the individual result. A more reliable approach would
be to first determine the peptide sequence without DB help
and then compare the result with DB protein sequences, or to
first list DB hits from standard protein identification
approaches and then validate these hits by DB-independent
peptide sequencing. This kind of independence in peptide
sequence determination and protein identification would
result in a much more reliable and quality-controllable
method of proteome characterization.
DB-independent, so-called de novo sequencing of peptides is not yet employed in routine proteome analyses owing
to limited reliability and long execution times. The CBS
approach of de novo sequencing[5] that we recently introduced
is able to reliably determine, within milliseconds, amino acid
sequences of individual peptides based on high-accuracy
Fourier Transform Ion Cyclotron Resonance (FTICR) MS.
CBS determines the amino acid compositions of unknown
peptides by combinatorial calculations based on accurately
measured mass values of precursor ions and respective
fragment ions. Instead of probability-based scoring, the
algorithm filters sets of mathematically possible compositions
of ions and attempts to reduce the intersection to a minimal
number of possible amino acid compositions. The correct
sequence of the determined composition is subsequently
verified by sequence permutation and matching of observed
and calculated fragment ion signals. Combining this new
approach of de novo sequencing with protein DB information
now allows us to analyze undefined peptide pools rapidly and
accurately and to identify originating proteins. These peptide
pools might even stem from uncontrolled autoproteolysis
reactions, which lets us avoid the well-defined gel-based or
other laborious protein separation and digestion steps that are
usually applied to a native and highly conserved protein sample
in proteomics studies. This method appears to be
highly successful in the detection of biomarkers in differential
studies of easily accessible biofluids or extracts.
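The composition-filtering idea described above can be illustrated with a minimal Python sketch. It is not the published CBS implementation: the function names (compositions, explains_fragment, filter_by_fragments), the toy tripeptide G-A-S, and the fragment tolerance of 0.0005 u are assumptions made purely for illustration. Only standard monoisotopic residue masses are used, and each fragment is represented by the summed residue masses it contains (that is, the fragment ion mass with terminal groups and the charge carrier already subtracted).

```python
from itertools import product

# Standard monoisotopic residue masses in u. Leu and Ile are isobaric,
# so they share the single symbol "L" in this sketch.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "N": 114.04293, "D": 115.02694, "Q": 128.05858, "K": 128.09496,
    "E": 129.04259, "M": 131.04049, "H": 137.05891, "F": 147.06841,
    "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565


def compositions(neutral_mass, ppm=2.0):
    """List every residue composition matching a neutral peptide mass."""
    tol = neutral_mass * ppm * 1e-6
    items = sorted(RESIDUE_MASS.items(), key=lambda kv: -kv[1])  # heavy first
    lightest = items[-1][1]
    found = []

    def search(idx, remaining, counts):
        if abs(remaining) <= tol:          # mass fully explained
            found.append(dict(counts))
            return
        if idx == len(items) or remaining < lightest - tol:
            return                         # nothing left that could fit
        name, mass = items[idx]
        for n in range(int((remaining + tol) // mass), -1, -1):
            if n:
                counts[name] = n
            search(idx + 1, remaining - n * mass, counts)
            counts.pop(name, None)

    search(0, neutral_mass - WATER, {})
    return found


def explains_fragment(comp, residue_sum, tol):
    """True if some sub-composition of comp has the given residue mass sum."""
    names = list(comp)
    for counts in product(*(range(comp[n] + 1) for n in names)):
        mass = sum(c * RESIDUE_MASS[n] for c, n in zip(counts, names))
        if abs(mass - residue_sum) <= tol:
            return True
    return False


def filter_by_fragments(candidates, residue_sums, tol=0.0005):
    """Keep only precursor compositions that can explain every fragment."""
    return [c for c in candidates
            if all(explains_fragment(c, s, tol) for s in residue_sums)]


# Toy example: neutral mass of G-A-S and two fragments (G+A and A+S).
candidates = compositions(233.101165)
survivors = filter_by_fragments(candidates, [128.05857, 158.06914])
print(len(candidates), survivors)
```

In this toy case the precursor mass alone is compatible with four compositions (G/A/S, Q/S, N/T, and G2/T), because Q is nearly isobaric with G+A and N with G+G. Requiring that both fragments be explained leaves only G/A/S, which is the sense in which intersecting precursor and fragment compositions narrows the candidate set.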
The peptide pool was analyzed by nano-LC FTICR MS with
a nanoelectrospray ionization (nano-ESI)[6, 7] interface.
High-accuracy MS/MS data were used to search protein DBs
resulting in score-based lists of candidate peptides/proteins
that are, at this stage, still long and of limited reliability. To
reduce and verify the raw list of candidates, CBS analysis was
used in parallel to find the correct peptide sequences from the
list. The combination of candidate lists of amino acid
compositions of peptide precursor ions and of their fragment
ions resulted in unequivocal sequence identifications typically
within milliseconds. After the peptides were identified with
CBS, the corresponding DB-listed proteins were then used to
filter and annotate the list of cleavage-peptide signals. The
remaining unidentified peptide signals were then investigated
in more detail in the following cycle (Figure 1). In general,
about two thirds of the top 20 protein raw-list identifications
turned out to be false positives before CBS verification. This
again demonstrates the need for dedicated validation procedures in addition to automated proteomics analyses.
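The verification of a DB candidate against accurately measured fragment ions can be sketched as follows. This is a simplified stand-in for the validation step, not the CBS algorithm itself: the helper names (b_y_ions, matched_fraction), the 0.005 u tolerance, the restriction to singly charged b and y ions, and the synthetic "observed" spectrum are all assumptions for illustration.

```python
# Standard monoisotopic residue masses in u.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565
PROTON = 1.007276


def b_y_ions(seq):
    """Return singly charged b- and y-ion m/z values for a peptide."""
    masses = [RESIDUE_MASS[a] for a in seq]
    b, y = [], []
    running = 0.0
    for m in masses[:-1]:              # b1 .. b(n-1), N-terminal fragments
        running += m
        b.append(running + PROTON)
    running = 0.0
    for m in reversed(masses[1:]):     # y1 .. y(n-1), C-terminal fragments
        running += m
        y.append(running + WATER + PROTON)
    return b, y


def matched_fraction(seq, observed, tol=0.005):
    """Fraction of theoretical b/y ions found in the observed peak list."""
    b, y = b_y_ions(seq)
    theoretical = b + y
    hits = sum(any(abs(t - o) <= tol for o in observed) for t in theoretical)
    return hits / len(theoretical)


# Synthetic spectrum generated from the "true" peptide GASP.
true_b, true_y = b_y_ions("GASP")
observed = true_b + true_y

for candidate in ("GASP", "GSAP"):
    print(candidate, round(matched_fraction(candidate, observed), 3))
```

The permuted candidate GSAP has the same precursor mass as GASP but explains only four of the six fragment ions, so a candidate that merely matches the precursor mass can still be rejected at the fragment level.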
Figure 1. Schematic of the accurate-mass data filtering method for the
rapid protein identification from unspecified peptide pools. Nonspecifically cleaved proteins were harvested as complex peptide pools and
analyzed by nano-HPLC/nano-ESI-FTICR-MS/MS. Raw candidate lists
of proteins were created by a nonspecific DB search. Amino acid
composition analysis and sequencing were used to delete false
positives from the list while confirmed proteins were digested in silico
and used to annotate the original peptide signal lists (see Figure 2).
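The in silico digestion and annotation step in Figure 1 can be sketched in Python under a few assumptions. Because no cleavage enzyme is specified, every contiguous subsequence of a confirmed protein is treated as a possible cleavage peptide. The 550-3500 u window mirrors the precursor mass range given in the Experimental Section; the function names, the 2 ppm matching tolerance, and the demo protein sequence are hypothetical and serve only as illustration.

```python
# Standard monoisotopic residue masses in u.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565


def neutral_mass(peptide):
    """Monoisotopic neutral mass of an unmodified peptide."""
    return sum(RESIDUE_MASS[a] for a in peptide) + WATER


def nonspecific_peptides(protein, min_mass=550.0, max_mass=3500.0):
    """Yield (peptide, mass) for every subsequence inside the mass window."""
    for start in range(len(protein)):
        mass = WATER
        for end in range(start, len(protein)):
            mass += RESIDUE_MASS[protein[end]]
            if mass > max_mass:
                break                  # longer peptides only get heavier
            if mass >= min_mass:
                yield protein[start:end + 1], mass


def annotate(signals, proteins, ppm=2.0):
    """Map each measured neutral mass to matching (protein, peptide) pairs."""
    annotations = {m: [] for m in signals}
    for name, sequence in proteins.items():
        for peptide, mass in nonspecific_peptides(sequence):
            for m in signals:
                if abs(m - mass) <= m * ppm * 1e-6:
                    annotations[m].append((name, peptide))
    return annotations


# Hypothetical protein and signal list, for illustration only.
proteins = {"demo_protein": "MKTAYIAKQRQISFVKSHFSRQ"}
signals = [neutral_mass("AYIAKQR"), 1000.0]
result = annotate(signals, proteins)
for mass, hits in result.items():
    print(round(mass, 4), hits or "unassigned")
```

Signals that stay unassigned after all confirmed proteins have been digested in this way are the ones that would be carried into the next analysis cycle.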
The above method was applied to directly identify cancer-related cellular proteins. It allowed us to rapidly obtain
verified identification and characterization of originating
proteins whose cleavage products were acid eluted from
cancer cells, without the need for gel separations, enzymatic
digestions, or immunoaffinity chromatography.
Identified peptides were found to be mostly representative (proteolytic) fragments that arose from the natural
degradation of expressed proteins, including cyclophilin A,
galectin-1, S-100 Ca-binding protein, macrophage-migration
inhibitory factor (MIF), glyceraldehyde phosphate dehydrogenase (GAPDH), acyl-CoA-binding protein (ACBP), peroxiredoxin, aldolase A, enolase 1, and others. Although most
of these proteins belong to well-known ubiquitous cellular
house-keeping molecules, some of these proteins (enolase 1,
GAPDH, peroxiredoxin) were found in previous proteomics
or transcriptional-expression analyses to be upregulated in
renal cells that are resistant to oxidative or osmotic stress.[8, 9]
Alternatively, they were identified as renal-cell carcinoma-associated antigens (galectin-1 and S-100 proteins) with
potential relevance for biomarkers in tumor development
and novel immunotherapeutic approaches.[8, 10, 11] A section of
the annotated peptide map is shown in Figure 2. By using the
described method, sophisticated procedures employing well-tuned, empirically determined score cutoff values[12] for each
DB search could be avoided.
Figure 2. A section of the final annotated peptide map showing that
the method allows detection of biomarker proteins that originate from
unspecified natural degradation processes. This section is an FT-ICR
MS spectrum that was averaged over a time range of 4 min out of a
60 min nano-LC separation. The annotation process is described in
Figure 1.
The described method appears to have a high potential for
rapid proteome studies based on cleavage products or
substance pools (peptides or other biomolecules) originating
from natural or other systematic degradation processes.
Typically, a one-dimensional, conventional HPLC separation,
when performed after a simple acid elution and size-exclusion
filtration, was found to be sufficient as sample preparation.
The method should be amenable to both qualitative screening
studies and more demanding quantitative, differential, or
dynamic analyses of biological systems, for example, in
biomarker search studies. Furthermore, the described
approach can be assumed to be especially useful for
posttranslational modification studies, analysis of small proteins, and proteins that are not satisfactorily detected in gel-electrophoretic separations.
Experimental Section
Mild acidic elution of autoproteolytic peptides and other biomolecules[13] from human renal carcinoma cells (cell line A498) was
performed by treating the cells with an ice-cold citric-acid-phosphate
buffer at pH 3 for 1 min. After centrifugation, the supernatant was
prefractionated by reversed-phase HPLC with a C8 column and UV
detection. This step was performed to study the extract in more detail
and was found to be, in general, unnecessary for the following
procedure. To reduce the complexity of these fractions, ultrafiltration
membranes with a cutoff at 3 kDa were used to obtain only the low-molecular-weight fraction. Nano-LC separation (Ultimate Plus,
Dionex Corporation, Idstein, Germany) coupled to FTICR MS/MS
(Finnigan LTQ FT, Thermo Electron, Bremen, Germany) was
employed. A mass resolution of 100 000 at m/z = 400 u and a mass
accuracy of better than 2 ppm were achieved.
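As a worked illustration of what these figures mean in absolute terms, the short Python sketch below converts them into mass units. It assumes the usual definition of resolution as m/Δm with Δm taken as the full peak width at half maximum; the function names are illustrative only.

```python
def ppm_error(measured, theoretical):
    """Relative mass error in parts per million."""
    return (measured - theoretical) / theoretical * 1e6


def mass_window(mz, ppm):
    """Absolute half-width of a +/- ppm tolerance window at a given m/z."""
    return mz * ppm * 1e-6


def peak_width(mz, resolution):
    """Peak width (delta m) implied by a resolution R = m / delta m."""
    return mz / resolution


window = mass_window(400.0, 2.0)       # 2 ppm at m/z 400
width = peak_width(400.0, 100_000)     # R = 100 000 at m/z 400
print(f"2 ppm at m/z 400 = +/-{window:.4f}")
print(f"peak width at R = 100 000: {width:.4f}")
```

At m/z 400, a 2 ppm accuracy corresponds to about 0.0008 mass units, and a resolution of 100 000 corresponds to a peak width of about 0.004 mass units.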
MS/MS data were analyzed by using a common DB search
program (Sequest, version 27, revision 12).[14] Precursor ion mass
tolerances of 0.01 u and fragment ion tolerances of 0.5 u were selected
as program parameters. In contrast to typical proteomics studies
based on tryptic digestion of gel-separated proteins, the cleavage
enzyme and the mass range of the originating proteins were not
specified in the DB search. The filter parameters for the DB
search result list were not set too stringently, because this would
otherwise have extensively reduced the number of true-positive
sequences and caused protein identifications to fail. The peptide precursor
mass range was set to 550–3500 u, and only differing (unique) peptide
sequences were displayed in the protein identification lists. These lists
of possible proteins contained correct assignments and additionally a
large number of false-positive hits. The lists were used as raw lists for
further data evaluation. By using CBS analysis, the raw lists were
verified, annotated, and filtered by verifying the presumed peptide
sequences consecutively, starting from low mass values. Verified
proteins were then digested in silico and the resulting theoretical
cleavage peptides were marked in the signal list after individual
confirmation by CBS. Peptide sequences that remained without
protein assignment were, in the end, marked as being of unknown origin.
Received: October 26, 2005
Revised: March 10, 2006
Published online: April 18, 2006
Keywords: high mass accuracy · mass spectrometry · peptide
sequencing · protein structures · sequence determination
[1] A. J. Link, J. Eng, D. M. Schieltz, E. Carmack, G. J. Mize, D. R.
Morris, B. M. Garvik, J. R. Yates, Nat. Biotechnol. 1999, 17, 676 –
[2] M. P. Washburn, D. Wolters, J. R. Yates, Nat. Biotechnol. 2001,
19, 242 – 247.
[3] D. Chamrad, H. E. Meyer, Nat. Methods 2005, 2, 647 – 648.
[4] J. E. Elias, W. Haas, B. K. Faherty, S. P. Gygi, Nat. Methods 2005,
2, 667 – 675.
[5] B. Spengler, J. Am. Soc. Mass Spectrom. 2004, 15, 703 – 714.
[6] J. B. Fenn, M. Mann, C. K. Meng, S. F. Wong, C. M. Whitehouse,
Science 1989, 246, 64 – 71.
[7] M. S. Wilm, M. Mann, Int. J. Mass Spectrom. Ion Processes 1994,
136, 167 – 180.
[8] H. Dihazi, A. R. Asif, N. K. Agarwal, Y. Doncheva, G. A.
Müller, Mol. Cell. Proteomics 2005, 4, 1445 – 1458.
[9] J. Morrison, K. Knoll, M. J. Hessner, M. Y. Liang, Physiol.
Genomics 2004, 17, 271 – 282.
[10] A. N. Young, M. B. Amin, C. S. Moreno, S. D. Lim, C. Cohen,
J. A. Petros, F. F. Marshall, A. S. Neish, Am. J. Pathol. 2001, 158,
1639 – 1651.
[11] S. Amatschek, U. Koenig, H. Auer, P. Steinlein, M. Pacher, A.
Gruenfelder, G. Dekan, S. Vogl, E. Kubista, K. H. Heider, C.
Stratowa, M. Schreiber, W. Sommergruber, Cancer Res. 2004, 64,
844 – 856.
[12] J. E. Elias, W. Haas, B. K. Faherty, S. P. Gygi, Nat. Methods 2005,
2, 667 – 675.
[13] S. Sugawara, T. Abo, K. Kumagai, J. Immunol. Methods 1987,
100, 83 – 90.
[14] J. K. Eng, A. L. McCormack, J. R. Yates, J. Am. Soc. Mass
Spectrom. 1994, 5, 976 – 989.