|M.Sc Student||Harel Shahar|
|Subject||Prototype-Based Chemical Design using Diversity-Driven|
|Department||Department of Computer Science|
|Supervisors||Professor Shaul Markovitch, Dr. Kira Radinsky|
Because the space of potential molecules for pharmacological treatment is practically infinite,
designing a new drug is an expensive and lengthy process. A common technique in drug discovery is
to start from a molecule that already has some of the desired properties. An interdisciplinary
team of scientists then generates hypotheses about the changes required to the prototype. We call
this process prototype-driven hypothesis generation.
In this work, we develop an algorithmic unsupervised approach for prototype-driven hypothesis
generation. Our method is inspired by the known analogy between a chemist's understanding of a
compound and a speaker's understanding of a word (“Atoms are letters, molecules are
the words, supramolecular entities are the sentences and the chapters” [Jean-Marie Lehn 1995]),
which motivates applying Natural Language Processing to Computational Chemistry.
More formally, we design a conditional deep generative model for molecule generation with
The model operates on a given prototype molecule and generates a variety of candidate molecules.
The generated molecules should be novel while sharing desired properties with the prototype. Our
model extends Variational Autoencoders to allow conditional diverse sampling: sampling an example
from the data distribution (drug-like molecules) that is close to a given input. This allows
sampling molecules close to a prototype molecule, and thus increases the probability of generating
a valid drug with similar characteristics. Additionally, we add a diversity component that
introduces parametrized diversity into the generation process, so that sampling can produce
molecules that are novel with respect to the prototype.
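The idea of sampling close to a prototype with a tunable amount of diversity can be illustrated with a minimal sketch. It is not the model developed in the thesis. It assumes a hypothetical encoder has already mapped the prototype to a latent mean `mu` and log-variance `log_var`, and it uses a single scalar `diversity` to scale the sampling noise. The function name and all parameter names are assumptions made for illustration.

```python
import numpy as np


def sample_around_prototype(mu, log_var, diversity, n_samples, rng):
    """Draw latent vectors near a prototype's latent encoding.

    mu, log_var : 1-D arrays, assumed to come from a (hypothetical) VAE
                  encoder applied to the prototype molecule.
    diversity   : non-negative scalar; 0 returns the prototype encoding
                  itself, larger values spread samples further from it.
    n_samples   : number of latent vectors to draw.
    rng         : numpy.random.Generator used for reproducibility.
    """
    mu = np.asarray(mu, dtype=float)
    std = np.exp(0.5 * np.asarray(log_var, dtype=float))
    # Standard reparameterization noise, one row per sample.
    eps = rng.standard_normal((n_samples, mu.shape[0]))
    # Scale the noise by the diversity parameter before shifting by mu.
    return mu + diversity * std * eps


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    mu = np.array([0.5, -1.0, 2.0])   # illustrative values only
    log_var = np.zeros(3)             # unit variance in every dimension
    for d in (0.0, 0.5, 2.0):
        z = sample_around_prototype(mu, log_var, d, 1000, rng)
        mean_dist = np.linalg.norm(z - mu, axis=1).mean()
        print(f"diversity={d}: mean distance from prototype = {mean_dist:.3f}")
```

In a full pipeline, each sampled latent vector would then be passed to a decoder that emits a molecule representation. That step is omitted here because the abstract does not specify the decoder.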
We show that the molecules generated by the system are valid molecules that simultaneously
have a strong connection to the prototype and are novel. In addition, we suggest several ranking
functions for the generated molecule population.
Among the compounds generated by the system, we identified 35 FDA-approved drugs. As
an example, our system generated Isoniazid, one of the main drugs for treating tuberculosis.