PHARMACEUTICAL DATA MINING
Approaches and Applications for Drug Discovery
Edited by
KONSTANTIN V. BALAKIN
Institute of Physiologically Active Compounds
Russian Academy of Sciences
Pharmaceutical drug discovery and development have historically followed a
sequential process in which relatively small numbers of individual compounds
were synthesized and tested for bioactivity. The information obtained from
such experiments was then used for optimization of lead compounds and their
further progression to drugs. For many years, an expert equipped with
simple statistical techniques of data analysis was a central figure in the analysis
of pharmacological information. With the advent of advanced genome and
proteome technologies, as well as high-throughput synthesis and combinatorial
screening, such operations have been largely replaced by a massively parallel
mode of processing, in which large-scale arrays of multivariate data are
analyzed. The principal challenges are the multidimensionality of such data
and the effect of “combinatorial explosion.” The many interacting chemical,
genomic, proteomic, clinical, and other factors can no longer be handled
with simple statistical techniques. As a result, the effective analysis
of this information-rich space has become a pressing problem. Hence, there
is much current interest in novel computational data mining approaches that
may be applied to the management and utilization of the knowledge obtained
from such information-rich data sets. Put simply, in the era
of post-genomic drug development, extracting knowledge from chemical, biological,
and clinical data is one of the biggest problems. Over the past few
years, various computational concepts and methods have been introduced to
extract relevant information from the accumulated knowledge of chemists,
biologists, and clinicians and to create a robust basis for rational design of
novel pharmaceutical agents.
Reflecting these needs, the present volume brings together contributions
from academic and industrial scientists to address both the implementation of
new data mining technologies in the pharmaceutical industry and the challenges
they currently face in their application. The key question addressed
by these experts is how sophisticated computational data mining techniques
can impact contemporary drug discovery and development.
In reviewing specialized books and other literature sources that address
areas relevant to data mining in pharmaceutical research, it is evident that
highly specialized tools are now available, but it has not become easier for
scientists to select the appropriate method for a particular task. Therefore,
our primary goal is to provide, in a single volume, an accessible, concentrated,
and comprehensive collection of individual chapters that discuss the most
important issues related to pharmaceutical data mining and its role and potential
in contemporary drug discovery and development. The book
should be accessible to nonspecialized readers, with emphasis on practical
application rather than on in-depth theoretical issues.
The book covers some important theoretical and practical aspects of pharmaceutical
data mining within five main sections:
• a general overview of the discipline, from its foundations to contemporary
industrial applications and impact on current and future drug
discovery;
• chemoinformatics-based applications, including selection of chemical
libraries for synthesis and screening, early evaluation of ADME/Tox and
physicochemical properties, mining high-throughput screening data, and
use of chemogenomics-based approaches;
• bioinformatics-based applications, including mining gene expression
data, analysis of protein–ligand interactions, analysis of toxicogenomic
databases, and vaccine development;
• data mining methods in clinical development, including data mining in
pharmacovigilance, predicting individual drug response, and data mining
methods in pharmaceutical formulation;
• data mining algorithms, technologies, and software tools, with emphasis
on advanced data mining algorithms and software tools that are currently
used in the industry or represent promising approaches for future drug
discovery and development, and analysis of resources available in specialized
databases, on the Internet, and in the scientific literature.
It is my sincere hope that this volume will be helpful and interesting not
only to specialists in data mining but also to all scientists working in the field
of drug discovery and development and associated industries.
Konstantin V. Balakin