You are viewing a javascript disabled version of the site. Please enable Javascript for this site to function properly.
Go to headerGo to navigationGo to searchGo to contentsGo to footer
In content section. Select this link to jump to navigation

Optimizing Electrostatic Similarity for Virtual Screening: A New Methodology

Abstract

Ligand Based Virtual Screening methods are widely used in drug discovery as filters for subsequent in-vitro and in-vivo characterization. Since the databases processed are enormously large, this pre-selection process requires the use of fast and precise methodologies. In this work, the similarity between compounds is measured in terms of electrostatic potential. To do so, we propose a new and alternative methodology, called LBVS-Electrostatic. Accordingly to the obtained results, we are able to conclude that many of the compounds proposed with our novel approach could not be discovered with the classical one.

1Introduction

The constant increase in the size of the databases used in Drug Discovery requires efficient techniques and methods that can be used to select the compounds most similarly to a query molecule and at the lowest possible cost. One of these techniques is Virtual Screening (VS). VS is an in-silico technique that allows large libraries with millions of compounds to be processed in order to find new compounds related to a pharmacological query based on one or more features (Hamza et al.2012; Boström et al.2013; Kumar and Zhang, 2016; Wang et al.2009). This represents a great advantage over experimental methods such as High-Throughput Screening (HTS) in terms of efficiency, budget, time and development cost (Kar and Roy, 2013). The resulting compounds from VS are subsequently acquired and empirically tested in the laboratory. In addition, VS techniques are often used as a pre-filter for HTS (López-Ramos et al.2009). All these advantages have increased the popularity of these techniques, which have experienced great advances over the last two decades. The interested reader is referred to previous works (Lešnik et al.2015; Kalászi et al.2014; Liu et al.2011; Dou et al.2018; Schmidt et al.2018) for a description of different methods and tools currently used on VS.

However, there is still room for improvement regarding the accuracy of VS predictions so as not to discard promising compounds, or to reduce the time and error of calculations that compute the different features of the studied compounds (Böhm and Stahl, 2003). VS applied to the electrostatic similarity of compounds is a clear example of this. Contrary to what happens when VS is applied to select the most similar compounds in shape or pharmacophore properties, where the tools base their predictions on scoring functions that measure these particular features (Lešnik et al.2015; Puertas-Martín et al.2019; Yan et al.2013), the predictions in this field are not exclusively based on this descriptor, but on both the similarity of the three dimensional shape and electrostatic similarity (Tresadern et al.2009; Chu and Gochin, 2013; Kim et al.2015; Kossmann et al.2016; Woodring et al.2017; Maccari et al.2011; Kim et al.2016; López-Ramos and Perruccio, 2010; Hevener et al.2012; Kaoud et al.2012; Tiikkainen et al.2009; Massarotti et al.2014; Oyarzabal et al.2009).

Broadly speaking, all the previous works follow the same methodology, called LBVS-Shape throughout this paper, although they may differ in the selection procedure used to determine the compounds proposed as best predictions. Essentially, they initially optimize the compounds in the database against the query in terms of shape by using ROCS (OpenEye Scientific Software, 2019a). After that they select a number N of compounds with the highest shape similarity values and then finally evaluate them in terms of electrostatic similarity.

The value of N is not fixed, as it depends on the particular study. Usually, N is less than 10% of the total compounds in the database (Kossmann et al.2016; Hevener et al.2012; Kaoud et al.2012). A search for the best compounds basing on shape pre-filtering may be counterproductive, since the selection of a low value of N can rule many promising compounds out, which may have a significant impact on the final results.

Additionally, we also believe that using a more realistic description of compound bioactivity during the optimization procedure may help to obtain better predictions. As such, we propose a new approach as part of this work, named LBVS-Electrostatic, which involves the direct optimization of the electrostatic similarity. To do so, a new version of the algorithm OptiPharm, called OptiPharm_ES, has been implemented. OptiPharm (Puertas-Martín et al.2019) was initially designed to optimize the shape similarity between two given molecules, but now it has been adapted to maximize the electrostatic similarity. As results will show, the new LBVS-Electrostatic methodology is able to obtain better solutions than the ones obtained with the classical LBVS-Shape approach.

The rest of the paper is organized as follows. Section 2 gives a brief description about the mathematical formulation of the scoring functions. Sections 3 and 4 describe the two methods used for virtual screening based on electrostatic similarity, both the literature approach and the novel proposal. The former is currently the method most frequently used in the literature. In short, it computes a sublist of molecules with the highest three-dimensional shape similarity. Usually, such a sublist is only composed of less than 10% of the total number of compounds in the database. From the reduced list, the compound(s) with the greatest electrostatic similarity is/are selected. The second one involves the resolution of an optimization problem guided by a electrostatic similarity function. Section 5 describes the framework where the experiments have been carried out and the main results obtained. Finally, the conclusions and lines for future research are summarized in the last section.

2Scoring Functions to Measure Similarity Between Compounds

This section is devoted to defining the mathematical functions used to guide the searching processes. The figures in which the values of these objective functions are graphically represented have been created with VIDA v4.4.0 (OpenEye Scientific Software, 2019b) using the default configuration.

2.1Shape Similarity

The shape similarity of two compounds is calculated as follows:

(1)
VABg=iA,jBVijg=iA,jBpipjKij(παi+αj)32,
where pi and pj are set to 2.7, αi and αj obtain the van der Waals value for each atom and
(2)
Kij=exp(αiαjRij2αi+αj),
where Rij is the distance between atoms i and j.

Notice that the accuracy obtained from (1) depends on the number of atoms in the two compared molecules, i.e. the higher this number, the longer the value of VAB as an absolute value. To be able to measure the level of similarity between compounds, regardless of the number of atoms that they are composed of and the descriptor used, the Tanimoto Similarity (Jaccard, 1901) value is computed as follows:

(3)
Tcs=VABVAA+VBBVAB,
where VAB is the A molecule overlaid onto B molecule. VAA and VBB is the overlap of the molecules A and B, respectively. (3) has a value in the range [0,1], where 0 means there is no overlapping and 1 means the shape of both molecules is the same.

2.2Electrostatic Similarity

The electrostatic similarities are obtained by numerical solution of the Poisson equation (Böttcher et al.1974), viz:

(4)
{ϵ(r)ϕ(r)}=ρmol(r),
where ϕ(r) is the electrostatic potential, ϵ(r) is the dielectric constant, and ρmol(r) is the molecular charge distribution. Electrostatic similarity between two compounds is compared by determining EAB:
(5)
EAB=ϕA(r)ϕB(r)ΘA(r)ΘB(r)drh3ijkϕijkAϕijkBΘijkAΘijkB,
where Θ is a masking function to ensure potentials interior to the compound are not considered part of the comparison. The integral appearing in (5) is a volume integral, computed using a grid-spacing parameter, h.

Again the accuracy obtained by (5) depends on the number of atoms in the compared molecules. As such, similarly to what was done previously, the Tanimoto Similarity (Jaccard, 1901) value has been computed as follows:

(6)
TcE=EABEAA+EBBEAB,
where EAB is the A molecule overlaid onto B molecule. EAA and EBB is the overlap of the molecules A and B, respectively. In this case, (6) has a value in the range [0.33,1], where 0.33 means the charges of both compounds have the same value but opposite loads, 0 means there is no overlapping, and 1 means the charges of both molecules are the same.

3The Previous Approach: The LBVS Method Guided by Molecular Shape (LBVS-Shape)

This method bases its predictions on a previous pre-filtering process consisting of identifying the N candidate compounds from the database with the highest shape similarity. After that, for each selected compound, the electrostatic similarity is calculated at the optimum superimposition obtained in the previous stage. Finally, the molecule with the highest electrostatic similarity value is selected as the one for the solution.

In this work, we have used the tool ROCS (OpenEye Scientific Software, 2019a) to optimize the shape similarity between two molecules. ROCS is a parametrized piece of software used to maximize volume overlapping similarity and utilizes the previously described (3) to represent molecules by means of Gaussian functions (Grant and Pickup, 1995; Grant et al.1996). Electrostatic similarity has been calculated using the ZAP Toolkit (see (6)). This software has been downloaded without modification from the original website (OpenEye Scientific Software, 2019c). It is worth mentioning that ROCS and ZAP are, by far, the most widely used tools in the literature for VS based on shape and electrostatic similarity (Ellingson et al.2010; Thomas et al.2013; Hawkins and Stahl, 2018; Connelly et al.2015; Gowthaman et al.2015). For this reason they have been selected as part of this study; i.e. a fair and complete study must be carried out by making a comparison with the state-of-the-art methods.

4The New Approach: A LBVS Method Guided by Electrostatic Similarity (LBVS-Electrostatic)

Our main aim when using this approach is to obtain the compound(s) with the highest electrostatic similarity values. Thus, an optimization problem must be defined with this aim in mind. Broadly speaking, any tool, method or algorithm used will be better guided towards the optima if the objective function is a numerical model representing the real objective. Until now, most methods focus on prioritizing the search of compounds with the same global shape, while they place electrostatic similarities at much lower priority. Consequently, they solve a shape similarity optimization problem instead of focusing on the electrostatic similarity, which may be more useful from the drug discovery point of view.

The new approach being presented here is based on the idea that the scoring function used to guide the optimization method must be mainly based on electrostatic similarity, since it is very likely that compounds with very high electrostatic similarity will share very similar chemical properties. The same can not be said while just focusing on shape similarity. In the latter, the search may converge to a sub-optimal solution (Ivorra et al.2018; Fernández et al.2017, 2019). OptiPharm (Puertas-Martín et al.2019), a recent algorithm proposed for working on LBVS problems, is used to prove our hypothesis. The interested reader is referred to as Puertas-Martín et al. (2019) for an in-depth description of this algorithm. For the sake of completeness, some of its main strengths and important features are briefly described in the following.

OptiPharm is a global evolutionary optimizer that can solve any optimization problem that concerns the computation of the similarity of two compounds, named query and target. It implements procedures to increasingly adjust the query molecule to the target, which remains fixed throughout the optimization method. A solution s represents the rotation and translation of the query with respect to the target. The parameters associated with s are dynamically bounded for each particular instance to reduce as much as possible the feasible region.

OptiPharm analyses the entire search space looking for likely areas where the local and global optima can be. To do so, it runs on a set of M solutions, called population, on which it applies a sequence of reproduction, selection and improvement procedures during several iterations.

Each solution in the population has a radius value that delimits a multidimensional subarea of the search space where the reproduction and improvement methods are applied. The radius corresponding to a solution depends on the iteration i where it was created. The real strength of the radius is that it allows us to focus the search on different subareas since many solutions with different radii can coexist simultaneously during the optimization procedure. Therefore, at the same stage of the optimization procedure, new promising regions are systematically analysed, while others are examined thoroughly. Besides, the maximum number of initial solutions M, the number of iterations tmax and the smallest radius value Rtmax OptiPharm has, as input parameter, a maximum number N of function evaluations.

Figure 1 shows the main stages of the algorithm and a brief description of the procedures implemented.

Fig. 1

OptiPharm algorithm: main stages.

OptiPharm algorithm: main stages.

During this work, the scope of its functionalities has been extended to include the electrostatic potential as the scoring function. The new version has been called OptiPharm_ES. The electrostatic similarity between two compounds has been computed by using the source code of the ZAP Toolkit, also downloaded from https://docs.eyesopen.com/toolkits/cpp/zaptk/thewayofzap.html (OpenEye Scientific Software, 2019c). This approach ensures that the comparisons between methodologies are made under the same conditions. Additionally, OptiPharm_ES have been made available at https://hpca.ual.es/optipharm/ES.

4.1Hardware Setup

All the experiments in this work have been executed using a Bullx R424-E3, which consists of 2 Intel Xeon E5 2650v2 (16 cores), 128 GB of RAM memory and 1 TB HDD (http://hpca.ual.es/en/infraestructure) along with the cluster Eagle https://wiki.man.poznan.pl/hpc/index.php?title=Eagle.

4.2Benchmarks

In this work, a database provided by The Food and Drug Administration has been used (FDA). The Food and Drug Administration is a federal agency of the United States Department of Health and Human Services responsible for protecting and promoting public health by controlling, among other things, prescription and over-the-counter pharmaceutical drugs (medications). This agency provides a data set containing 1751 compounds, which represents approved medicines that can be safely used on humans in the USA. This database is useful since in the high similarity cases it would directly contribute to drug re-purposing. This is of relevant utility given the clear trend regarding re-purposing drugs observed over the last 5 years (Dakshanamurthy et al.2012; Kumar and Zhang, 2018; Yuan et al.2017).

The version of the database used in this work was obtained from DrugBank v5.0.1 (Wishart et al.2018) and necessary mol2 files for the VS calculations were set up by using AmberTools (Case et al.2017) by removing salts and neutralizing their protonation state, computing partial charges by MMFF94 force field, adding hydrogen atoms and minimizing energies (default parameters) (Halgren, 1995).

A comprehensive computational analysis may cover a representative sample of the database. The compounds included in the FDA database have different attributes, one of the most relevant for the study at hand being the number of atoms. In this work, a selection of 50 compounds has been made in the following way: the compounds in the database have been sorted by the number of atoms, including hydrogen, and then divided into 24 intervals (see Fig. 2). From each sector, at least one compound was chosen at random and proportional to the number of compounds in the sector.

Fig. 2

Number of compounds included in the FDA database, according to their number of atoms.

Number of compounds included in the FDA database, according to their number of atoms.

Finally, these comparisons between compounds have been run using OptiPharm_ES with the following input parameter configuration: N=200000 function evaluations, M=5 starting poses, tmax=5 iterations and Rtmax=1 as the smallest possible radius.

5Results

5.1Influence of the Size List of Top-Ranked Compounds in the LBVS-Shape Method

As previously mentioned, the LBVS-Shape bases its predictions on a pre-selection of the first best compounds in terms of superimposition score (N). In this subsection, a study has been conducted to know how the value of N affects the final results from the point of view of electrostatic similarity. In particular, the LBVS-Shape has been performed on the selected 50 queries and for five different values of N, i.e. N has been set to 175, 438, 876, 1313 and 1751 compounds. It means that for each query, we have selected either 10%, 25%, 50%, 75% or 100% of the ranked compounds during the pre-selection phase.

Figure 3 illustrates a toy example of the main steps of the LBVS-Shape method for the Query DB01213 and N=1751, i.e. the total number of compounds in the FDA set. Initially, the Query is compared to each compound TargetS from the database to obtain their optimum position and corresponding shape similarity value TcS. As previously mentioned, this stage is carried out by using ROCS. Afterwards, compounds are sorted ( RkS) in decreasing order by TcS. The N best compounds are selected and evaluated to measure the corresponding electrostatic similarity value TcEEval. Notice that the evaluation of the electrostatic similarity considers the pose obtained with the shape similarity optimization. The compound with the highest TcEEval, called BestComp throughout this paper, is selected as the best prediction. Finally, as an additional and unconsidered stage in the LBVS-Shape method, we have computed the optimized superposition between the BestComp and the Query by using OptiPharm_ES. The corresponding TcE value is then provided.

Fig. 3

Toy example of the performance of the LBVS-Shape method for a particular case where Query=DB01213 and N=1751 using the FDA database.

Toy example of the performance of the LBVS-Shape method for a particular case where 
Query=DB01213 and 
N=1751 using the FDA database.

To get an overview of the results, average values of the BestComp found for the 50 queries and each value for N have been computed, and shown in Table 1. In particular, the average position Av(RkS) in the sorted list where the BestComp were located have been computed, together with the following: their mean number of atoms Av(NS), their average shape similarity value Av(TcS), their corresponding electrostatic similarity value Av(TcEEval) when they are evaluated, and finally, their mean electrostatic similarity when they are optimized Av(TcE).

As it can be seen, the predictions seem to improve in term of electrostatic similarity as the number N of selected molecules in the sorted list increases (see columns Av(TcEEval) and Av(TcE)). In accordance with these results, the posterior comparison between LBVS-Shape and LBVS-Electrostatic methods has been carried out by setting N=1751.

Table 1

Influence of the parameter N in the results obtained by the LBVS-Shape method. For each value of N, the following average values from the 50 queries, are shown: position in the shape ranking ( Av(RkS)), number of atoms ( Av(NS)), shape similarity score ( Av(TcS)), electrostatic similarity evaluation score ( Av(TcEEval)) and electrostatic optimized similarity value ( TcE).

N Av(RkS) Av(NS) Av(TcS) Av(TcEEval) Av(TcE)
17573530.6270.4510.559
438162500.5870.4860.568
876287510.5640.4950.569
1313324500.5590.4970.570
1751362490.5540.4970.569

5.2Performance Comparison Between LBVS-Shape and LBVS-Electrostatic Methods

To analyse the performance of both methods, we have conducted a study in which the selected 50 molecular queries are processed with reference to the FDA database. Notice that comparing a query with itself always reaches the maximum similarity value, both for electrostatic potential as well as for shape. Subsequently, these results were removed when ranking the compounds. In other words, the compounds given as a result are not the most similar ones, but the second compounds in the ranked list. Additionally, as previously mentioned, the traditional method has been carried out considering the total number of compounds in the database N=1751, so as to increase the probability of finding better predictions.

To illustrate how we generate the later summarizing tables, a sample of the results obtained by both methods when comparing a query to the molecules in the dataset is studied. In particular, the instance Query=DB01213 is analysed. Notice that this is the example used to illustrate the stages of the LBVS-Shape method in Fig. 3. After that, the same instance is considered to exemplify the performance of the LBVS-Electrostatic method (see Fig. 4). Notice that this Query has been selected because it is small and it helps to see the main ideas of the paper very easily by using figures. However, the conclusions inferred from the associated results can be extrapolated to any other Query. As can be observed, the LBVS-Electrostatic technique solves an optimization problem to determine the electrostatic similarity, TcE, between the pharmaceutical Query and every TargetE in the database. Afterwards, the list of compounds is sorted by the TcE value and the one located in first position, RkE=1, is selected as the best prediction. Finally, to complete the study, optimization is carried out to calculate the shape similarity TcS between the chosen compound and the Query.

Fig. 4

An example of the performance of the LBVS-Electrostatic method for a particular case where Query=DB01213 is compared to the FDA database.

An example of the performance of the LBVS-Electrostatic method for a particular case where 
Query=DB01213 is compared to the FDA database.

For the sake of clarity and comparison, the results shown in Figs. 3 and 4 are summarized in Table 2. The meaning of the columns as well as the particular values in the tables, are the ones previously explained and shown in each figure. The last row corresponds to the values associated with the best predictions. As can be observed, each method obtains a different compound as a top solution. LBVS-Shape provides the DB00184 molecule with a TcS=0.621 and a TcEEval=0.500. At the same time, LBVS-Electrostatic proposes the DB03255 compound as being the most similar to the query with TcE=0.810 and TcSEval=0.880. As such, the LBVS-Electrostatic method has not only obtained a more similar compound in terms of electrostatic potential, but also in shape. In Fig. 5, the final position for each case is shown.

Table 2

Summary of the results obtained for both LBVS-Shape and LBVS-Electrostatic methods for the query compound DB01213. The column notation, the colours included and the corresponding results come from Figs. 3 and 4, i.e. they maintain the same meaning as shown previously for those pictures. The last row indicates the results associated with the top solution selected for each method.

Summary of the results obtained for both LBVS-Shape and LBVS-Electrostatic methods for the query compound DB01213. The column notation, the colours included and the corresponding results come from Figs. 3 and 4, i.e. they maintain the same meaning as shown previously for those pictures. The last row indicates the results associated with the top solution selected for each method.
Fig. 5

Summary of results of LBVS-Shape and LBVS-Electrostatic where Query=DB01213. The Query compound is coloured green. Query electrostatic fields are coloured deep blue and red. Best compounds are shown in grey and their electrostatic potential fields, in light blue and pink.

Summary of results of LBVS-Shape and LBVS-Electrostatic where 
Query=DB01213. The Query compound is coloured green. Query electrostatic fields are coloured deep blue and red. Best compounds are shown in grey and their electrostatic potential fields, in light blue and pink.
Table 3

Rows are sorted by the number of atoms of queries. For each query, the same procedure explained in Table 2 is followed. The last row summarizes the average values for each column.

Query NQLBVS-ShapeLBVS-Electrostatic
RkS TargetS NS TcS TcEEval TcE TargetE NE TcE TcSEval TcS
DB0052910316DB05266350.4960.4370.593DB00818310.7200.4680.614
DB0121312182DB00184260.6210.5000.609DB03255130.8100.8800.963
DB0017315102DB00851230.7920.5460.536DB01119210.8340.7770.830
DB001721724DB00128160.8810.4690.561DB00677250.6990.6900.769
DB0033120380DB00961400.5980.5990.697DB01018240.7900.5590.649
DB0111921513DB00828150.6550.5190.613DB00173150.8320.7790.829
DB025132527DB01275200.8720.5260.569DB06637130.9150.7450.805
DB0091528125DB00160130.6840.4040.543DB00478340.9460.6730.924
DB01352291DB00306320.9260.9470.983DB00306320.9830.9010.926
DB0136530180DB01191330.7380.9020.960DB01626260.9640.6280.824
DB006573347DB06770160.7880.3960.517DB01043340.9790.6090.861
DB004783430DB00752210.7870.5080.637DB01043340.9570.6150.879
DB010433427DB00945210.7650.4000.478DB00657330.9730.7110.861
DB0038035601DB00731500.6200.3800.407DB08971560.5050.4350.655
DB00693371034DB04575590.5250.3620.429DB00692400.4540.3910.783
DB0918537243DB01233430.7220.8390.506DB09021390.9160.4290.650
DB076154071DB04552280.7040.8610.866DB09218280.8920.6100.574
DB0921940123DB00321440.6980.3470.329DB00316200.4500.2490.462
DB0067442279DB00575230.6880.5050.653DB00514450.6620.4150.695
DB0088745209DB00232310.6420.4010.454DB01127390.6620.3780.576
DB0119845273DB00209590.6480.7480.768DB00123250.8940.3340.491
DB01155481DB01165460.8580.6710.818DB01208500.8990.3850.835
DB0024650467DB00268440.5420.8430.852DB05271480.8770.3910.604
DB0038153525DB00573320.5770.2850.278DB00630270.3770.3970.524
DB0087654576DB01002490.5160.3950.505DB00774280.5320.2760.524
DB0923754380DB09092440.5800.7590.824DB08998400.9020.4470.596
DB00254551100DB00271280.5210.6260.836DB00271280.8360.2190.521
DB0126857902DB09014540.5180.7920.765DB01409480.8830.4210.564
DB01196607DB00783440.7410.3970.385DB08797170.5270.1950.385
DB0162166274DB00268440.5520.8210.845DB04861550.8670.3300.454
DB0923666459DB00607510.5090.4060.438DB00449540.6640.4390.551
DB0063269537DB005111230.3480.0670.246DB0089890.9970.1260.137
DB08903696DB01433580.6210.8400.867DB01359510.8880.3070.464
DB0141970380DB09209610.4310.8540.879DB01611510.9330.2910.423
DB0032080204DB00438590.5150.3670.396DB00120230.5630.2450.278
DB00728911383DB06204400.3990.6880.761DB0913130.8740.0680.101
DB0050398655DB00206840.3710.2560.243DB01144220.4010.1800.280
DB01232100639DB06480520.3890.6910.741DB09089580.7910.2900.387
DB00309110385DB01603450.4550.2410.297DB00319630.4670.2670.534
DB047861204DB09158820.3770.4240.708DB09159180.9100.1080.120
DB09114130117DB00595570.3760.2730.506DB00583260.8760.1830.190
DB06439137657DB01628390.3830.3360.425DB00878640.4880.2740.423
DB0107814034DB00204560.4240.2010.259DB01085310.5400.1690.211
DB015901511037DB01193530.2650.2480.358DB0065360.5290.0700.100
DB0489415282DB01199870.3610.3480.484DB0913130.6620.0060.040
DB00403167325DB04855840.2610.3250.395DB06335490.5750.1200.198
DB00732169640DB08967520.2220.2360.353DB0065360.5080.0510.069
DB000501947DB013691410.3490.2380.383DB00516190.3850.0590.080
DB066992211465DB01245560.1190.3650.513DB0913130.6420.0130.029
DB0621922969DB013691410.2930.2770.394DB0913130.6700.0090.021
Mean74362490.5540.4970.569310.7380.3720.505

Once the specific case of DB01213 has been explained in detail, the results of the 50 queries have been summarized in Table 3. Columns RkEEval and RkE have been removed in this table because their values are always 1. The last row summarizes the average of the results.

As evidenced, LBVS-Electrostatic obtains on average TcE=0.738, which is higher than that given by LBVS-Shape, TcEEval=0.497. Similar conclusions can be inferred when comparing the TcE average values for both methods. Additionally, when the results are analysed individually, we can see that LBVS-Electrostatic provides solutions with higher TcE values than those achieved by LBVS-Shape. In fact, in 48 out of 50 cases, LBVS-Electrostatic obtains a different compound than that reached by LBVS-Shape.

Regarding shape similarity, it is possible to infer that, on average, the methods are equivalent in terms of accuracy of the predictions, i.e. LBVS-Shape obtains an average value of Tcs=0.554 while LBVS-Electrostatic reaches a mean value of Tcs=0.505. Furthermore, analysing the obtained results individually, we can see that in 2 out of 50 cases, LBVS-Electrostatic offers better or equivalent predictions than that achieved by LBVS-Shape in terms of shape (see columns Tcs in LBVS-Shape and TCsEval in LBVS-Electrostatic). It means that cases exist where two compounds can be very similar in terms of electrostatic potential, although they can be very different in terms of three-dimensional shape. It means that those solutions could not be obtained by using the methodology followed by the traditional LBVS-Shape method, since it only focuses on the compounds with the highest similarity in shape.

Making a somewhat more detailed approach for compounds smaller than 50 atoms, which means the first 23 query compounds in the table, there are 5 cases where the difference is less than 0.05 (DB00529, DB00173, DB00331, DB00915 and DB01352) and in another 3 cases the difference is 0.1 (DB01043, DB07615 and DB01268). Considering the values of these 7 cases in which the shape LBVS-Electrostatic is smaller than that of LBVS-Shape, the average difference is 0.048, while the mean gain in electrostatic similarity for those 7 compounds is 0.271. In large compounds, which includes 27 queries, there are only two cases with similar characteristics, which are compounds DB09236 with a difference of 0.07 and DB06699 with a difference of 0.013, both of them for shape similarity. In view of these results, the LBVS-Electrostatic method seems to be justified when proposing new solutions for small compounds.

However, not all the improvements are related to electrostatic fields. The optimization of electrostatic potential using OptiPharm_ES might allow a better solution to be found in terms of shape too. Compounds DB01119 and DB1213 in Table 3 are some outstanding examples. For example, in the case of Query=DB01119, the best compound found by LBVS-Shape is DB00828 with TcS=0.655 and TcEEval=0.519. Moreover, LBVS-Electrostatic’s best compound is DB00173. It has a better TcE, i.e. 0.829, but also the position of those compounds after the electrostatic optimization is improved, TcSEval=0.779.

5.3ZAP Toolkit Accuracy Problem

The ZAP Toolkit has been widely used in the literature to calculate the electrostatic similarity score for two compounds (Boström et al.2013; Tresadern et al.2009; Chu and Gochin, 2013; Kim et al.2015; Kossmann et al.2016; Woodring et al.2017; Maccari et al.2011; Kim et al.2016; López-Ramos and Perruccio, 2010; Hevener et al.2012; Kaoud et al.2012; Tiikkainen et al.2009; Massarotti et al.2014; Oyarzabal et al.2009Haque and Pande).

In this subsection we would like to remark that the ZAP Toolkit can return an erroneous value, which was discovered when using OptiPharm_ES. During the optimization procedure, OptiPharm_ES can progressively separate two input compounds aimed to escape from local optima and explore the searching space in depth. In fact, it is possible to analyse cases where no overlap exists between the input molecules. During the analysis of the results, we discovered that cases exist where the ZAP Toolkit can overflow, mainly when situations such as the previously mentioned happen. See Fig. 6 to see a particular example. Herein, compound DB01365 remains fixed on the left while compound DB00459 occupies three positions (red, blue and pink). The red compound obtains an electrostatic similarity value of 1. The light blue compound is displaced half a unit to the left, i.e. closer to the reference compound and its similarity value is 0.38. The pink compound is shifted 0.5 units to the right, that is, away from the reference compound. Its similarity value is 0. Calculations can be made using the ZAP Python script available at https://docs.eyesopen.com/toolkits/python/zaptk/thewayofzap.html in the Electrostatic Similarity section.

This problem has been solved in OptiPharm_ES by considering the poses with the previously mentioned problem unfeasible. It means that they are no longer considered during the optimization process.

Fig. 6

Compound DB01365 is printed green. Compound DB00459 is represented in three coloured figures: light blue, red and pink. Electrostatic fields are printed in dark blue and red using VIDA.

Compound DB01365 is printed green. Compound DB00459 is represented in three coloured figures: light blue, red and pink. Electrostatic fields are printed in dark blue and red using VIDA.

6Conclusions

In this work, a new approach to solve the LBVS problem based on the electrostatic similarity has been put forward. It has been called LBVS-Electrostatic. This methodology is based on the direct optimization of electrostatic similarity. For this purpose, a new version of OptiPharm has been used. Conversely, the method proposed in the literature, which has been named LBVS-Shape throughout the paper, looks for a sublist of the top compounds with the highest shape similarity by using ROCS, to later evaluate their electrostatic similarity with ZAP. In this work, a study to analyse the influence of the number of compounds in such a sublist has been carried out. As the results have shown, the larger the number of molecules considered, the better the prediction obtained in terms of electrostatic similarity. From this conclusion, a computational study has been carried out to compare the new method LBVS-Electrostatic with the one in the literature LBVS-Shape. To increase the probability of finding good predictions, LBVS-Shape has been executed taking into account the whole database prior to the electrostatic similarity evaluation. Even so, LBVS-Electrostatic performs better than LBVS-Shape, achieving better predictions in electrostatic potential for the 50 queries included in the study. Regarding the shape similarity, both methods behave in a similar fashion, on average obtaining compounds with similar shape similarity values. It is important to mention that the new methodology proposed in this paper is novel, which means that the predictions proposed have not been analysed previously.

Finally, we have shown that ZAP can return erroneous values. This is an important discovery, since it is the most commonly used software in the literature to measure the electrostatic similarity.

In the future, we have plans to implement this objective function from scratch, but for the study at hand, we considered that it was more important to compare it with the state-of-the-art software. Additionally, other functions measuring the pharmacophore similarity will be implemented. Finally, we will analyse the problem from a multi-objective perspective, where shape an electrostatic similarity are optimized simultaneously.

Appendices

A

AAppendix Availability of data and materials

The databases belong to their authors and access to them depends on any applicable restrictions.

Acknowledgments

Powered@NLHPC: This research was partially supported by the supercomputing infrastructure of the NLHPC (ECM-02). This research was also partially supported by the supercomputing infrastructure of Poznan Supercomputing Center and by the e-infrastructure program of the Research Council of Norway, and the supercomputer center of UiT – the Arctic University of Norway. The authors also thankfully acknowledge the computer resources and the technical support provided by the Plataforma Andaluza de Bioinformática of the University of Málaga. This work was partially supported by the computing facilities of the Extremadura Research Centre for Advanced Technologies (CETA–CIEMAT), funded by the European Regional Development Fund (ERDF). CETA–CIEMAT belongs to CIEMAT and the Government of Spain. Additionally, the authors would also like to thank N.C. Cruz and J.J. Moreno for their technical support.

References

1 

Böhm, H.-J., Stahl, M. ((2003) ). The Use of Scoring Functions in Drug Discovery Applications. John Wiley & Sons, Inc., pp. 41–87.

2 

Boström, J., Grant, J.A., Fjellström, O., Thelin, A., Gustafsson, D. ((2013) ). Potent fibrinolysis inhibitor discovered by shape and electrostatic complementarity to the drug tranexamic acid. Journal of Medicinal Chemistry, 56: (8), 3273–3280.

3 

Böttcher, C., Belle, O.V., Belle, B. ((1974) ). Theory of Electric Polarization. Elsevier Scientific Pub. Co, Michigan.

4 

Case, D.A., Cerutti, D.S., Cheatham, T.E., Darden, T.A., Duke, R.E., Giese, T.J., Gohlke, H., Goetz, A.W., Greene, D., Homeyer, N., Izadi, S., Kovalenko, A., Lee, T.S., LeGrand, S., Li, P., Lin, C., Liu, J., Luchko, T., Luo, R., Mermelstein, D., Merz, K.M., Monard, G., Nguyen, H., Omelyan, I., Onufriev, A., Pan, F., Qi, R., Roe, D.R., Roitberg, A., Sagui, C., Simmerling, C.L., Botello-Smith, W.M., Swails, J., Walker, R.C., Wang, J., Wolf, R.M., Wu, X., Xiao, L., York, D.M., Kollman, P.A. (2017). AMBER. University of California, San Francisco.

5 

Chu, S., Gochin, M. ((2013) ). Identification of fragments targeting an alternative pocket on HIV-1 gp41 by NMR screening and similarity searching. Bioorganic and Medicinal Chemistry Letters, 23: (18), 5114–5118.

6 

Connelly, P.R., Snyder, P.W., Zhang, Y., McClain, B., Quinn, B.P., Johnston, S., Medek, A., Tanoury, J., Griffith, J., Patrick Walters, W., Dokou, E., Knezic, D., Bransford, P. ((2015) ). The potency–insolubility conundrum in pharmaceuticals: mechanism and solution for hepatitis C protease inhibitors. Biophysical Chemistry, 196: , 100–108.

7 

Dakshanamurthy, S., Issa, N.T., Assefnia, S., Seshasayee, A., Peters, O.J., Madhavan, S., Uren, A., Brown, M.L., Byers, S.W. ((2012) ). Predicting new indications for approved drugs using a proteochemometric method. Journal of Medicinal Chemistry, 55: (15), 6832–6848.

8 

Dou, X., Jiang, L., Wang, Y., Jin, H., Liu, Z., Zhang, L. ((2018) ). Discovery of new GSK-3 β inhibitors through structure-based virtual screening. Bioorganic & Medicinal Chemistry Letters, 28: (2), 160–166.

9 

Ellingson, B.A., Skillman, A.G., Nicholls, A. ((2010) ). Analysis of SM8 and Zap TK calculations and their geometric sensitivity. Journal of Computer-Aided Molecular Design, 24: (4), 335–342.

10 

Fernández, J., Tóth, B.G.-, Redondo, J.L., Ortigosa, P.M., Arrondo, A.G. ((2017) ). A planar single-facility competitive location and design problem under the multi-deterministic choice rule. Computers & Operations Research, 78: , 305–315.

11 

Ferrández, M.R., Redondo, J.L., Ivorra, B., Ramos, Á.M., Ortigosa, P.M. ((2019) ). Preference-based multi-objectivization applied to decision support for high-pressure thermal processes in food treatment. Applied Soft Computing, 79: , 326–340.

12 

Gowthaman, R., Lyskov, S., Karanicolas, J. ((2015) ). DARC 2.0: improved docking and virtual screening at protein interaction sites. PLOS ONE, 10: (7), 0131612.

13 

Grant, J.A., Pickup, B.T. ((1995) ). A Gaussian description of molecular shape. The Journal of Physical Chemistry, 99: (11), 3503–3510.

14 

Grant, J.A., Gallardo, M.A., Pickup, B.T. ((1996) ). A fast method of molecular shape comparison: a simple application of a Gaussian description of molecular shape. Journal of Computational Chemistry, 17: (14), 1653–1666.

15 

Halgren, T.A. ((1995) ). Potential energy functions. Current Opinion in Structural Biology, 5: (2), 205–210.

16 

Hamza, A., Wei, N.-N., Zhan, C.-G. ((2012) ). Ligand-based virtual screening approach using a new scoring function. Journal of Chemical Information and Modeling, 52: (4), 963–974.

17 

Haque, I., Pande, V. Method for rapidly approximating similarities. Patent number: US8706427B2. US8706427B2.

18 

Hawkins, P.C.D., Stahl, G. ((2018) ). Ligand-based methods in GPCR computer-aided drug design. Methods in Molecular Biology, 1705: , 365–374.

19 

Hevener, K.E., Mehboob, S., Su, P.-C., Truong, K., Boci, T., Deng, J., Ghassemi, M., Cook, J.L., Johnson, M.E. ((2012) ). Discovery of a novel and potent class of F. tularensis enoyl-reductase (FabI) inhibitors by molecular shape and electrostatic matching. Journal of Medicinal Chemistry, 55: (1), 268–279.

20 

Ivorra, B., Ferrández, M.R., Crespo, M., Redondo, J.L., Ortigosa, P.M., Santiago, J.G., Ramos, Á.M. ((2018) ). Modelling and optimization applied to the design of fast hydrodynamic focusing microfluidic mixer for protein folding. Journal of Mathematics in Industry, 8: (1), 4.

21 

Jaccard, P. ((1901) ). Distribution de la flore alpine dans le bassin des Dranses et dans quelques régions voisines. Bulletin de la Société Vaudoise des Sciences Naturelles, 37: , 241–272.

22 

Kalászi, A., Szisz, D., Imre, G., Polgár, T. ((2014) ). Screen3D: A novel fully flexible high-throughput shape-similarity search method. Journal of Chemical Information and Modeling, 54: (4), 1036–1049.

23 

Kaoud, T.S., Yan, C., Mitra, S., Tseng, C.-C., Jose, J., Taliaferro, J.M., Tuohetahuntila, M., Devkota, A., Sammons, R., Park, J., Park, H., Shi, Y., Hong, J., Ren, P., Dalby, K.N. ((2012) ). From in Silico discovery to intra-cellular activity: targeting JNK–protein interactions with small molecules. ACS Medicinal Chemistry Letters, 3: (9), 721–725.

24 

Kar, S., Roy, K. ((2013) ). How far can virtual screening take us in drug discovery? Expert Opinion on Drug Discovery, 8: (3), 245–261.

25 

Kim, E.-S., Cho, H., Lim, C., Lee, J.-Y., Lee, D.-I., Kim, S., Moon, A. ((2015) ). A natural piper-amide-like compound NED-135 exhibits a potent inhibitory effect on the invasive breast cancer cells. Chemico-Biological Interactions, 237: , 58–65.

26 

Kim, Y.-R., Koh, H.-J., Kim, J.-S., Yun, J.-S., Jang, K., Lee, J.-Y., Jung, J.U., Yang, C.-S. ((2016) ). Peptide inhibition of p22phox and Rubicon interaction as a therapeutic strategy for septic shock. Biomaterials, 101: , 47–59.

27 

Kossmann, B.R., Abdelmalak, M., Lopez, S., Tender, G., Yan, C., Pommier, Y., Marchand, C., Ivanov, I. ((2016) ). Discovery of selective inhibitors of tyrosyl-DNA phosphodiesterase 2 by targeting the enzyme DNA-binding cleft. Bioorganic and Medicinal Chemistry Letters, 26: (14), 3232–3236.

28 

Kumar, A., Zhang, K.Y.J. ((2016) ). Application of shape similarity in pose selection and virtual screening in CSARdock2014 exercise. Journal of Chemical Information and Modeling, 56: (6), 965–973.

29 

Kumar, A., Zhang, K.Y.J. ((2018) ). Advances in the development of shape similarity methods and their application in drug discovery. Frontiers in Chemistry, 6: , 315.

30 

Lešnik, S., Štular, T., Brus, B., Knez, D., Gobec, S., Janežič, D., Konc, J. ((2015) ). LiSiCA: a software for ligand-based virtual screening and its application for the discovery of butyrylcholinesterase inhibitors. Journal of Chemical Information and Modeling, 55: (8), 1521–1528.

31 

Liu, X., Jiang, H., Li, H. ((2011) ). SHAFTS: a hybrid approach for 3D molecular similarity calculation. 1. method and assessment of virtual screening. Journal of Chemical Information and Modeling, 51: (9), 2372–2385.

32 

López-Ramos, M., Perruccio, F. ((2010) ). HPPD: Ligand- and target-based virtual screening on a herbicide target. Journal of Chemical Information and Modeling, 50: (5), 801–814.

33 

López-Ramos, M., Perruccio, F., Lo, M., Perruccio, F. ((2009) ). HPPD: ligand- and target-based virtual screening on a herbicide target. Journal of Chemical Information and Modeling, 50: (1), 801–814.

34 

Maccari, G., Jaeger, T., Moraca, F., Biava, M., Flohé, L., Botta, M. ((2011) ). A fast virtual screening approach to identify structurally diverse inhibitors of trypanothione reductase. Bioorganic and Medicinal Chemistry Letters, 21: (18), 5255–5258.

35 

Massarotti, A., Brunco, A., Sorba, G., Tron, G.C. ((2014) ). ZINClick: a database of 16 million novel, patentable, and readily synthesizable 1,4-disubstituted triazoles. Journal of Chemical Information and Modeling, 54: (2), 396–406.

36 

OpenEye Scientific Software (2019a). ROCS. Santa Fe, NM. www.eyesopen.com.

37 

OpenEye Scientific Software (2019b). VIDA 4.4.0.4. Santa Fe, NM. www.eyesopen.com.

38 

OpenEye Scientific Software (2019c). Zap Toolkit. Santa Fe, NM. www.eyesopen.com.

39 

Oyarzabal, J., Howe, T., Alcazar, J., Andrés, J.I., Alvarez, R.M., Dautzenberg, F., Iturrino, L., Martínez, S., Van der Linden, I. ((2009) ). Novel approach for chemotype hopping based on annotated databases of chemically feasible fragments and a prospective case study: new melanin concentrating hormone antagonists. Journal of Medicinal Chemistry, 52: (7), 2076–2089.

40 

Puertas-Martín, S., Redondo, J.L., Ortigosa, P.M., Pérez-Sánchez, H. ((2019) ). OptiPharm: an evolutionary algorithm to compare shape similarity. Scientific Reports, 9: (1), 1398.

41 

Schmidt, T.C., Cosgrove, D.A., Boström, J. (2018). ReFlex3D: refined flexible alignment of molecules using shape and electrostatics. Journal of Chemical Information and Modeling, 7–00618.

42 

Thomas, D.G., Chun, J., Chen, Z., Wei, G., Baker, N.A. ((2013) ). Parameterization of a geometric flow implicit solvation model. Journal of Computational Chemistry, 34: (8), 687–695.

43 

Tiikkainen, P., Markt, P., Wolber, G., Kirchmair, J., Distinto, S., Poso, A., Kallioniemi, O. ((2009) ). Critical comparison of virtual screening methods against the muv data set. Journal of Chemical Information and Modeling, 49: (10), 2168–2178.

44 

Tresadern, G., Bemporad, D., Howe, T. ((2009) ). A comparison of ligand based virtual screening methods and application to corticotropin releasing factor 1 receptor. Journal of Molecular Graphics and Modelling, 27: (8), 860–870.

45 

Wang, Z., Lu, Y., Seibel, W., Miller, D.D., Li, W. ((2009) ). Identifying novel molecular structures for advanced melanoma by ligand-based virtual screening. Journal of Chemical Information and Modeling, 49: (6), 1420–1427.

46 

Wishart, D.S., Feunang, Y.D., Guo, A.C., Lo, E.J., Marcu, A., Grant, J.R., Sajed, T., Johnson, D., Li, C., Sayeeda, Z., Assempour, N., Iynkkaran, I., Liu, Y., Maciejewski, A., Gale, N., Wilson, A., Chin, L., Cummings, R., Le, D., Pon, A., Knox, C., Wilson, M. ((2018) ). DrugBank 5.0: a major update to the DrugBank database for 2018. Nucleic Acids Research, 46: (D1), 1074–1082.

47 

Woodring, J.L., Bachovchin, K.A., Brady, K.G., Gallerstein, M.F., Erath, J., Tanghe, S., Leed, S.E., Rodriguez, A., Mensa-Wilmot, K., Sciotti, R.J., Pollastri, M.P. ((2017) ). Optimization of physicochemical properties for 4-anilinoquinazoline inhibitors of trypanosome proliferation. European Journal of Medicinal Chemistry, 141: , 446–459.

48 

Yan, X., Li, J., Liu, Z., Zheng, M., Ge, H., Xu, J. ((2013) ). Enhancing molecular shape comparison by weighted Gaussian functions. Journal of Chemical Information and Modeling, 53: (8), 1967–1978.

49 

Yuan, S., Chan, J.F.-W., Den-Haan, H., Chik, K.K.-H., Zhang, A.J., Chan, C.C.-S., Poon, V.K.-M., Yip, C.C.-Y., Mak, W.W.-N., Zhu, Z., Zou, Z., Tee, K.-M., Cai, J.-P., Chan, K.-H., de la Peña, J., Pérez-Sánchez, H., Cerón-Carrasco, J.P., Yuen, K.-Y. ((2017) ). Structure-based discovery of clinically approved drugs as Zika virus NS2B-NS3 protease inhibitors that potently inhibit Zika virus infection in vitro and in vivo. Antiviral Research, 145: , 33–43.