Application Note
www.molinf.com
DOI: 10.1002/minf.201700094
R-based Tool for a Pairwise Structure-Activity Relationship Analysis
Kyrylo Klimenko*[a]
Abstract: Structure-Activity Relationship (SAR) analysis is a complex process that can be enhanced by computational techniques. This article describes a simple tool for SAR analysis that has a graphical user interface and a flexible approach to the input of molecular data. The application calculates molecular similarity, represented by the Tanimoto index and the Euclidean distance, and identifies activity cliffs by means of the Structure-Activity Landscape Index. The calculation is performed in a pairwise manner, either between the reference compound and the other compounds or for all possible pairs in the data set. The results of SAR analysis are visualized using two types of plots. The application's capabilities are demonstrated by the analysis of a set of COX2 inhibitors with respect to Isoxicam. The tool is available online together with a manual and example input files.
Keywords: Structure-activity relationships · molecular similarity · structure-activity landscape · gWidgets · drug discovery
Structure-Activity Relationship (SAR) analysis has been a useful technique in drug discovery, particularly at the hit-to-lead stage.[1] According to refs. [2,3], SARs can be either continuous (“activity hills”) or discontinuous (“activity cliffs”), depending on whether small changes in compound structure lead to small or dramatic changes in activity. In the presence of gently rolling hills, or continuous SARs, small changes in molecular structure cause small effects on activity, and the ‘biological activity radius’ is populated by a spectrum of increasingly diverse structures of similar activity. This is in contrast to discontinuous SARs, where small changes in structure have dramatic effects.[3] Both types of relationship give insights into beneficial and detrimental structural modifications of hit compounds. Although software for similarity search[4,5] and quantification of activity cliffs[6,7] exists, it either has commercial restrictions[4,6] or requires advanced computing knowledge,[5,7] which keeps some medicinal chemists from using it. For instance, R packages such as “ChemmineR”[8] and “rcdk”[9] provide various molecular descriptors and data mining techniques for SAR analysis; however, they require users to perform calculations via the R command line, which may not be convenient for inexperienced users. Thus, a free, user-friendly tool was created for pairwise Structure-Activity Relationship analysis.
Structure-Activity Relationship Analyser (SARA) is an R-based application built for R version 3.1.3[10] that runs on both Windows and Linux. It consists of several scripts and a graphical user interface (Figure 1A) created using the “gWidgets” and “gWidgetstcltk” R packages.[11,12] The tool calculates molecular similarity and the Structure-Activity Landscape Index (SALI) and visualizes the results of SAR analysis.
SARA is flexible with respect to how molecular structures are represented. It can either calculate descriptors from a chemical structure file in SMILES format or use a TXT file with descriptors pre-calculated by the user. The tool computes 22 molecular descriptors (Table S1) that provide a basic representation of molecular structure in case the user has no a priori knowledge of the optimal descriptors for the particular task. The SMILES format is useful because online databases (e.g. ChEMBL,[13] PubChem[14]) store compound structures as SMILES strings, which allows SAR analysis to start immediately after data extraction and curation. If the user has a clear hypothesis about a possible SAR, more elaborate descriptors (e.g. PSA, cLogP) can be calculated beforehand and supplied as a text file input.
Descriptors are used to compute molecular similarity by means of the Tanimoto coefficient or the Euclidean distance. Both measures are computed for every pair of compounds in the data set and stored as a matrix.
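As an illustration of the two measures and the pairwise matrix, here is a minimal Python sketch. It assumes the continuous (real-valued) form of the Tanimoto coefficient, T(a, b) = a·b / (|a|² + |b|² − a·b), which reduces to |A ∩ B| / |A ∪ B| for 0/1 vectors; the exact formula SARA uses internally, and whether it rescales descriptors before computing distances, are not stated here, and the function names are hypothetical.

```python
import math
from typing import Callable, List, Sequence

Vector = Sequence[float]


def tanimoto(a: Vector, b: Vector) -> float:
    """Continuous Tanimoto coefficient: a.b / (|a|^2 + |b|^2 - a.b)."""
    if len(a) != len(b):
        raise ValueError("descriptor vectors must have equal length")
    dot = sum(x * y for x, y in zip(a, b))
    denom = sum(x * x for x in a) + sum(y * y for y in b) - dot
    # Two all-zero vectors: treated as identical (a convention of this sketch).
    return 1.0 if denom == 0 else dot / denom


def euclidean(a: Vector, b: Vector) -> float:
    """Euclidean distance between two descriptor vectors."""
    if len(a) != len(b):
        raise ValueError("descriptor vectors must have equal length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def pairwise_matrix(rows: List[Vector],
                    metric: Callable[[Vector, Vector], float]) -> List[List[float]]:
    """Square matrix M[i][j] = metric(rows[i], rows[j]) for every compound pair."""
    return [[metric(a, b) for b in rows] for a in rows]


if __name__ == "__main__":
    # Three hypothetical compounds described by three count descriptors.
    X = [[2, 1, 0], [2, 1, 1], [0, 3, 1]]
    print(pairwise_matrix(X, tanimoto)[0][1])   # 5/6
    print(pairwise_matrix(X, euclidean)[0][1])  # 1.0
```

Note that the Euclidean distance is sensitive to descriptor scale (e.g. molecular weight versus an atom count), whereas the Tanimoto value is not invariant to it either; both behave best on comparably scaled descriptors.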
Since this tool is designed to support SAR investigations around the core structure of a compound, the structure-activity relationship analysis is carried out in a pairwise manner. There is an option to select a reference compound for optimization and compare it with every other compound in the set in terms of both structure and activity. After uploading the data, the user may select any molecule from the data set as the reference.
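The two comparison modes (reference versus the rest, or all possible pairs) can be sketched as follows, assuming compounds are addressed by their row index in the input file. This indexing convention and the function names are hypothetical; SARA's internal data structures are not described here.

```python
from itertools import combinations
from typing import List, Optional, Tuple


def make_pairs(n_compounds: int, reference: Optional[int] = None) -> List[Tuple[int, int]]:
    """Index pairs to compare.

    reference=None -> all unordered pairs (i, j) with i < j;
    reference=k    -> (k, j) for every other compound j.
    """
    if reference is None:
        return list(combinations(range(n_compounds), 2))
    if not 0 <= reference < n_compounds:
        raise ValueError("reference index out of range")
    return [(reference, j) for j in range(n_compounds) if j != reference]


def n_pairs(n_compounds: int, reference_mode: bool) -> int:
    """Number of comparisons, without materialising the list of pairs."""
    if reference_mode:
        return n_compounds - 1
    return n_compounds * (n_compounds - 1) // 2


if __name__ == "__main__":
    print(make_pairs(4, reference=1))  # [(1, 0), (1, 2), (1, 3)]
    print(len(make_pairs(4)))          # 6
    print(n_pairs(170_000, True), n_pairs(170_000, False))
```

The count grows linearly in reference mode but quadratically for all pairs: for roughly 170 000 compounds that is 169 999 versus 14 449 915 000 comparisons, which is why the reference mode is the natural choice for very large data sets.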
Once the molecular similarity between the reference compound and the rest of the data set is known, the Structure-Activity Landscape Index can be calculated.
[a] K. Klimenko
Department of molecular structure and chemoinformatics, A.V. Bogatsky Physico-Chemical Institute of NAS of Ukraine
Lyustdorfskaya doroga, 86, Odessa 65080, Ukraine
E-mail: alhimikir@gmail.com
Supporting information for this article is available on the WWW
under https://doi.org/10.1002/minf.201700094
© 2017 Wiley-VCH Verlag GmbH & Co. KGaA, Weinheim
Mol. Inf. 2017, 36, 1700094
Figure 1. Structure-Activity Relationship Analyser: interface and output examples. A – graphic user interface, B – example of calculated
descriptors, C – SALI distribution plot, D – Structure-Activity chart.
The original SALI[15] is based on the Tanimoto coefficient as the measure of similarity and is calculated as follows:
$$\mathrm{SALI}_{i,j} = \frac{|A_i - A_j|}{1 - \mathrm{sim}(i,j)} \qquad (1)$$
where $A_i$ is the activity of the reference compound, $A_j$ is the activity of another compound in the set, and $\mathrm{sim}(i,j)$ is the Tanimoto coefficient for the selected pair of compounds.
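Equation (1) translates directly into code. The sketch below adds a helper that ranks compounds against a reference, which is how a SALI distribution plot is typically read to pick out activity cliffs. The compound identifiers, activity values and the treatment of sim = 1 (where eq. (1) is undefined) are hypothetical choices of this sketch, not SARA's own code.

```python
import math
from typing import Dict, List, Tuple


def sali(a_i: float, a_j: float, sim_ij: float) -> float:
    """Eq. (1): SALI_ij = |A_i - A_j| / (1 - sim(i, j))."""
    if not 0.0 <= sim_ij <= 1.0:
        raise ValueError("similarity must lie in [0, 1]")
    if sim_ij == 1.0:
        # Identical representations: eq. (1) is undefined here.
        # Returning infinity is an arbitrary convention of this sketch.
        return math.inf
    return abs(a_i - a_j) / (1.0 - sim_ij)


def rank_by_sali(ref_activity: float,
                 activities: Dict[str, float],
                 similarities: Dict[str, float]) -> List[Tuple[str, float]]:
    """SALI of every compound versus the reference, highest first."""
    scores = [(cid, sali(ref_activity, act, similarities[cid]))
              for cid, act in activities.items()]
    return sorted(scores, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical % inhibition values and Tanimoto similarities to a reference.
    acts = {"c1": 70.0, "c2": 50.0, "c3": 38.0}
    sims = {"c1": 0.9, "c2": 0.5, "c3": 0.95}
    for cid, score in rank_by_sali(40.0, acts, sims):
        print(cid, round(score, 2))
```

A large SALI arises either from a large activity change or from a denominator close to zero, so near-duplicate structures with modest activity differences can still rank highly.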
The Tanimoto coefficient is a convenient way to assess molecular similarity; however, it has certain limitations when molecules are compared using non-fragment molecular descriptors. Descriptors such as molecular weight or pKa are integral characteristics of the molecule, so comparing them in terms of “intersection” and “union” may stretch the Tanimoto coefficient beyond its natural scope. Therefore, a modification of SALI based on Euclidean distance was introduced:
$$\mathrm{SALI}^{d}_{i,j} = \frac{|A_i - A_j|}{\ln d(i,j)} \qquad (2)$$
where $d(i,j)$ is the Euclidean distance between the reference compound and a compound from the set. The Structure-Activity Relationship Analyser computes both the original and the modified version of the index.
Another feature of SARA is the visualization of results by means of a SALI distribution plot (Figure 1C) and a Structure-Activity (SA) chart (Figure 1D). The SALI plot shows the distribution of SALI values within the data set. The SA chart shows how compounds scatter in the 2D space of similarity and activity change. Similar to the SAS map,[16] the chart area is divided into 4 segments that reflect compound similarity and activity. The Y-axis shows the activity change and the X-axis shows the Tanimoto similarity coefficient of each data set compound with respect to the reference one. A summary of the Structure-Activity analysis results is generated as a CSV file. It includes the results of the Tanimoto similarity/Euclidean distance calculations, SALI/$\mathrm{SALI}^{d}$ and the least common descriptor for each data set compound in a pairwise comparison to the reference one. For a pair of compounds $(i,j)$ represented by a set of descriptors $d_1, d_2, \ldots, d_N$, the least common descriptor is $d_{n^*}$ with $n^* = \arg\max_{n=1,\ldots,N} |d_{i,n} - d_{j,n}|$, i.e. the descriptor with the highest absolute difference between the two compounds, reflecting the biggest structural difference between them.
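The least common descriptor definition above amounts to an arg-max over absolute descriptor differences. A minimal sketch follows; the descriptor names are hypothetical atom counts, and ties are resolved by taking the first descriptor in input order, which is an assumption rather than documented SARA behaviour.

```python
from typing import Mapping, Tuple


def least_common_descriptor(desc_i: Mapping[str, float],
                            desc_j: Mapping[str, float]) -> Tuple[str, float]:
    """Name of the descriptor with the largest |d_i,n - d_j,n|, and that difference."""
    if set(desc_i) != set(desc_j):
        raise ValueError("both compounds must share the same descriptor set")
    # max() keeps the first name encountered when several differences tie.
    name = max(desc_i, key=lambda n: abs(desc_i[n] - desc_j[n]))
    return name, abs(desc_i[name] - desc_j[name])


if __name__ == "__main__":
    reference = {"nC": 12, "nN": 2, "nO": 4, "nS": 1}
    other = {"nC": 11, "nN": 2, "nO": 1, "nS": 1}
    print(least_common_descriptor(reference, other))  # ('nO', 3)
```

Because raw differences are compared, descriptors with a larger numeric range dominate the arg-max unless the descriptors are put on a common scale first.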
A data set of 139 COX2 inhibitors[17] was chosen to demonstrate the tool's capabilities. The studied compounds belong to the enol-carboxamide class. Activity was measured as the inhibition (%) of edema formation in the rat hind paw in response to a subplantar injection of carrageenin at a 0.1 mg/mL dose of inhibitor.[18] Compound structures were standardized using the ChemAxon toolkit.[19]
Two types of descriptor space were used: i) 14 atom-based descriptors from SARA and ii) Simplex descriptors (SiRMS) differentiated by electronegativity.[20] For SiRMS, only 2D descriptors were calculated and their cross-correlation was assessed. Among highly correlated descriptors (Pearson r > 0.95), only one was kept: the one with the higher linear correlation to activity. More information about the descriptors used in this study is given in Table S1 of the Supplementary Material.
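One way to implement the correlation filter described above is a greedy pass. In this sketch descriptors are visited in order of decreasing |r| with activity, and each is kept only if its |r| with every already-kept descriptor is at most 0.95. The greedy ordering, the use of absolute values and the function names are assumptions; the text does not specify how groups of mutually correlated descriptors were resolved.

```python
import math
from typing import Dict, List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient; 0.0 if either series is constant."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two series of equal length >= 2")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def drop_correlated(descriptors: Dict[str, Sequence[float]],
                    activity: Sequence[float],
                    threshold: float = 0.95) -> List[str]:
    """Greedy filter: visit descriptors by |r| with activity (best first) and
    keep one only if |r| with every already-kept descriptor is <= threshold."""
    ranked = sorted(descriptors,
                    key=lambda name: abs(pearson(descriptors[name], activity)),
                    reverse=True)
    kept: List[str] = []
    for name in ranked:
        if all(abs(pearson(descriptors[name], descriptors[k])) <= threshold
               for k in kept):
            kept.append(name)
    return kept


if __name__ == "__main__":
    activity = [1, 2, 3, 4, 5]
    table = {
        "d1": [1, 2, 3, 4, 5],      # tracks activity exactly
        "d2": [2, 4, 6, 8, 10.5],   # nearly collinear with d1
        "d3": [5, 1, 4, 2, 3],      # weakly related to both
    }
    print(drop_correlated(table, activity))  # ['d1', 'd3']
```

The greedy order guarantees that, within any cluster of near-duplicates, the member most correlated with activity is the one that survives.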
Isoxicam (CHEMBLID53292) was chosen as the reference compound. This nonsteroidal anti-inflammatory drug (NSAID) was used in several countries but had to be withdrawn from the market due to adverse effects.[21] It displays only mild COX2 inhibition, so its core structure has potential for modifications that could increase activity.
Isoxicam is rather similar to the other compounds in the data set, with an average Tanimoto similarity coefficient of 0.8 and 0.6 for the built-in and SiRMS descriptors, respectively. Thus, only compounds with a Tanimoto similarity of ≥ 0.9 were subjected to the analysis of continuous SAR. The structure-activity relationship analysis of the 29 selected inhibitors (Figure 2) revealed that replacing the isoxazole substituent at the amide group with a methylated thiazole increases activity, although changing the methyl to an ethyl group diminishes this effect. Replacing the benzene ring with a thiophene in the fused ring fragment increases activity for unsubstituted thiophenes and decreases it for methylated thiophenes. Introducing a fluorine or chlorine substituent into the aromatic fragment generally increases inhibition.
Figure 2. Analysis of SAR based on molecular similarity with respect to Isoxicam. Red: activity increase; blue: activity decrease. Compounds are ordered from most to least similar according to their mean Tanimoto similarity. The correspondence of the compound numbers in this figure to their ChEMBL IDs and data source article numbers[17] is given in Table S2 of the Supplementary Material.
The SALI distribution plot was used to select the compounds with the highest SALI in each descriptor space used in this work. In total, 13 inhibitors were selected and analyzed; 7 of them have a thiazole substituent, once again indicating the importance of this fragment.
Since similarity depends largely on the descriptor space used, the rank correlation between the Tanimoto values produced with SARA and SiRMS descriptors was assessed, as was the rank correlation between SALI and modified SALI for these descriptors. The Tanimoto values correlate with a Kendall's tau coefficient (τ) of 0.56. For SALI, τ equals 0.73 and 0.77 for Simplex and SARA descriptors, respectively. Each of the calculations needed for the SAR analysis of this COX2 inhibitor data set took no more than a few seconds.
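The rank correlations above are reported as Kendall's τ, but the variant is not stated. The sketch below assumes tau-b, which corrects for tied values (e.g. identical SALI scores) and is also the default of scipy.stats.kendalltau. It is a plain O(n²) pair count, which is adequate for a data set of about 139 compounds.

```python
import math
from itertools import combinations
from typing import Sequence


def kendall_tau_b(x: Sequence[float], y: Sequence[float]) -> float:
    """Kendall's tau-b: (C - D) / sqrt((C + D + Tx) * (C + D + Ty)).

    C, D: concordant / discordant pairs; Tx (Ty): pairs tied only in x (only in y).
    Pairs tied in both series are ignored.
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two series of equal length >= 2")
    c = d = tx = ty = 0
    for i, j in combinations(range(len(x)), 2):
        dx, dy = x[i] - x[j], y[i] - y[j]
        if dx == 0 and dy == 0:
            continue
        if dx == 0:
            tx += 1
        elif dy == 0:
            ty += 1
        elif (dx > 0) == (dy > 0):
            c += 1
        else:
            d += 1
    denom = math.sqrt((c + d + tx) * (c + d + ty))
    return (c - d) / denom if denom else float("nan")


if __name__ == "__main__":
    # Hypothetical SALI values for the same five pairs in two descriptor spaces.
    sali_sara = [12.0, 3.5, 40.2, 7.7, 7.7]
    sali_sirms = [10.1, 4.0, 35.0, 9.3, 6.2]
    print(round(kendall_tau_b(sali_sara, sali_sirms), 3))
```

Only the ranks matter, so τ is unaffected by the very different numeric scales of SALI and modified SALI, which makes it a reasonable choice for comparing the two indices.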
The application's performance on a large data set was assessed using PubChem antimalarial assay data for approximately 170 000 compounds.[22] The tests were run on a Windows 7 PC (Intel i5-6500 CPU, 3.20 GHz, 8 GB RAM), and no computation took more than 8 minutes. A detailed description of SARA's performance on the large data set is given in Table S3 of the Supplementary Material.
Structure-Activity Relationship Analyser (SARA) facilitates molecular similarity calculation and activity cliff quantification. The tool can represent chemical structures with built-in, commercially computed, or even manually calculated descriptors. The output can easily be saved and visualized in practical file formats. The application can be downloaded as the SARA_v1_0.rar archive from https://github.com/klimenko-od91/SARA.
Conflict of Interest
None declared.
Acknowledgements
The author would like to thank Adlen Muats and Victor Kuz’min for advice on creating the application and demonstrating its capabilities.
References
[1] J. Hughes, S. Rees, S. Kalindjian, K. Philpott, Br. J. Pharmacol. 2011, 162, 1239–1249.
[2] G. M. Maggiora, J. Chem. Inf. Model. 2006, 46, 1535.
[3] H. Eckert, J. Bajorath, Drug Discovery Today 2007, 12, 225–233.
[4] JChem Base v17.15.0, 2017, ChemAxon (http://www.chemaxon.com).
[5] N. M. O’Boyle, M. Banck, C. A. James, C. Morley, T. Vandermeersch, G. R. Hutchison, J. Cheminf. 2011, 3, 33.
[6] Activity Miner; in: Forge, v10.4.2, Cresset, Litlington, Cambridgeshire, UK, http://www.cresset-group.com/forge/
[7] http://sali.rguha.net/
[8] Y. Cao, A. Charisi, L. Cheng, T. Jiang, T. Girke, Bioinformatics 2008, 24, 1733–1734.
[9] R. Guha, J. Stat. Softw. 2007, 18, 1–16.
[10] R Core Team, R: A Language and Environment for Statistical Computing, R Foundation for Statistical Computing, Vienna, Austria, 2015, http://www.R-project.org/
[11] J. Verzani, based on the iwidgets code of S. Urbanek, with suggestions by S. Urbanek, P. Grosjean and M. Lawrence, gWidgets: gWidgets API for building toolkit-independent, interactive GUIs, R package version 0.0-54, 2014, http://CRAN.R-project.org/package=gWidgets
[12] J. Verzani, gWidgetstcltk: Toolkit implementation of gWidgets for tcltk package, R package version 0.0-55, 2014, http://CRAN.R-project.org/package=gWidgetstcltk
[13] A. P. Bento, A. Gaulton, A. Hersey, L. J. Bellis, J. Chambers, M. Davies, F. A. Krüger, Y. Light, L. Mak, S. McGlinchey, M. Nowotka, G. Papadatos, R. Santos, J. P. Overington, Nucleic Acids Res. 2014, 42, 1083–1090, doi:10.6019/CHEMBL.database.23.
[14] S. Kim, P. A. Thiessen, E. E. Bolton, J. Chen, G. Fu, A. Gindulyte, L. Han, J. He, S. He, B. A. Shoemaker, J. Wang, B. Yu, J. Zhang, S. H. Bryant, Nucleic Acids Res. 2016, 44(D1), D1202–D1213.
[15] R. Guha, J. H. Van Drie, J. Chem. Inf. Model. 2008, 48, 646–658.
[16] O. Méndez-Lucio, J. Pérez-Villanueva, R. Castillo, J. L. Medina-Franco, Mol. Inf. 2012, 31, 837–846.
[17] E. S. Lazer, C. K. Miao, C. L. Cywin, R. Sorcek, H.-C. Wong, Z. Meng, I. Potocki, M.-A. Hoermann, R. J. Snow, M. A. Tschantz, T. A. Kelly, D. W. McNeil, S. J. Coutts, L. Churchill, A. G. Graham, E. David, P. M. Grob, W. Engel, H. Meier, G. Trummlitz, J. Med. Chem. 1997, 40, 980–989.
[18] J. G. Lombardino, E. H. Wiseman, W. M. McLamore, J. Med. Chem. 1971, 14, 1171–1175.
[19] ChemAxon Standardizer 6.1.4, http://www.chemaxon.com/jchem/doc/user/standardizer.html (accessed Feb. 2009).
[20] V. E. Kuz’min, A. G. Artemenko, E. N. Muratov, J. Comput.-Aided Mol. Des. 2008, 22, 403–421.
[21] M. Fung, A. Thornton, K. Mybeck, J. H. Wu, K. Hornbuckle, E. Muniz, Drug Inf. J. 2001, 35, 293–317.
[22] National Center for Biotechnology Information, PubChem BioAssay Database, AID 504834, https://pubchem.ncbi.nlm.nih.gov/bioassay/504834 (accessed Sept. 5, 2017).
Received: July 19, 2017
Accepted: October 16, 2017