QUERY-BASED MOLECULE OPTIMIZATION AND APPLICATIONS TO FUNCTIONAL MOLECULE DISCOVERY

2022

Online Patent


A query-based, generic, end-to-end molecular optimization (“QMO”) system framework, method, and computer program product for optimizing molecules, for example to accelerate drug discovery. The QMO framework decouples representation learning from guided search and applies to any plug-in encoder-decoder with continuous latent representations. The QMO framework directly incorporates evaluations based on chemical modeling and analysis packages and on pre-trained machine-learned prediction models. This enables efficient molecule optimization through a query-based guided search method built on zeroth-order optimization. QMO features an efficient guided search that uses molecular property evaluations and constraints obtained from the predictive models and from the chemical modeling and analysis packages. QMO tasks include optimizing drug-likeness and penalized log P scores under similarity constraints. They also include improving the binding affinity of existing drugs to target proteins of pathogens, such as the SARS-CoV-2 main protease, while preserving the desired drug properties. QMO tasks further include optimizing antimicrobial peptides toward lower toxicity.
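"Query-based guided search based on zeroth-order optimization" means the optimizer never differentiates through the property predictors. It only queries them and builds a pseudo-gradient from the returned loss values. The following is a minimal sketch of that idea. It assumes a standard random-direction estimator with directions drawn uniformly from the unit sphere, followed by plain gradient descent. The function names, the step size `lr`, the smoothing radius `beta`, and the query count are illustrative choices, not values taken from the patent. A toy quadratic stands in for the black-box loss. In a QMO-style system, that loss would instead decode the latent vector and run the property predictors.

```python
import math
import random
from typing import Callable, List, Sequence

Vector = List[float]


def random_unit_vector(dim: int, rng: random.Random) -> Vector:
    """Sample a direction uniformly on the unit sphere (normalized Gaussian draw)."""
    v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    norm = math.sqrt(sum(x * x for x in v)) or 1.0  # guard against an all-zero draw
    return [x / norm for x in v]


def zo_gradient(f: Callable[[Sequence[float]], float], z: Sequence[float],
                num_queries: int, beta: float, rng: random.Random) -> Vector:
    """Pseudo-gradient of f at z built only from function queries (hypothetical helper):

        g = (d / (beta * Q)) * sum_q [f(z + beta * u_q) - f(z)] * u_q
    """
    d = len(z)
    f0 = f(z)  # one query at the current iterate
    g = [0.0] * d
    for _ in range(num_queries):
        u = random_unit_vector(d, rng)
        f_u = f([zi + beta * ui for zi, ui in zip(z, u)])  # one random-direction query
        scale = d * (f_u - f0) / (beta * num_queries)
        for i in range(d):
            g[i] += scale * u[i]
    return g


def zo_descent(f: Callable[[Sequence[float]], float], z0: Sequence[float],
               steps: int = 200, lr: float = 0.05, num_queries: int = 20,
               beta: float = 1e-3, seed: int = 0) -> Vector:
    """Zeroth-order gradient descent: z <- z - lr * pseudo_gradient(z)."""
    rng = random.Random(seed)
    z = list(z0)
    for _ in range(steps):
        g = zo_gradient(f, z, num_queries, beta, rng)
        z = [zi - lr * gi for zi, gi in zip(z, g)]
    return z


if __name__ == "__main__":
    # Toy stand-in for the black-box loss: a quadratic with its minimum at the origin.
    def toy_loss(z: Sequence[float]) -> float:
        return sum(x * x for x in z)

    start = [3.0, -2.0, 1.0]
    end = zo_descent(toy_loss, start)
    print(toy_loss(start), toy_loss(end))
```

The expectation of this estimator equals the gradient of a smoothed version of the loss, so for small `beta` it approximates the true gradient. Raising `num_queries` lowers its variance but costs more predictor calls per step, which is the main budget in a query-based search.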

Title:	QUERY-BASED MOLECULE OPTIMIZATION AND APPLICATIONS TO FUNCTIONAL MOLECULE DISCOVERY
Link:	Full text
Published:	2022
Media type:	Patent
Other:	Indexed in: USPTO Patent Applications
	Languages: English
	Document Number: 20220076137
	Publication Date: March 10, 2022
	Appl. No: 17/016640
	Application Filed: September 10, 2020

Claim: 1. A query-based molecule optimization method comprising: modifying, by the at least one hardware processor, a sequence structure corresponding to a molecule to be optimized; running, by the at least one hardware processor, a plurality of machine learned prediction models for said modified sequence structure for predicting a respective plurality of properties of a molecule corresponding to said modified sequence structure and generating loss values as a measure of differences between respective plurality of property predictors and a corresponding respective plurality of specified threshold constraints, and using the generated loss values as a guide for further modifying said sequence structure for evaluation of the respective plurality of predicted properties; and determining by the at least one hardware processor whether said each of the plurality of properties predicted for said corresponding further modified sequence structure satisfy all said corresponding respective plurality of specified threshold constraints; and determining the corresponding further modified sequence structure as an optimized original molecule when each of the plurality of properties predicted for said further modified sequence structure satisfies all said respective plurality of specified threshold constraints.

Claim: 2. The method as claimed in claim 1, wherein said sequence structure of said original molecule is a 1-dimensional sequence of symbols, said method further comprising: encoding, by the at least one hardware processor, the 1-dimensional sequence of symbols by mapping said 1-dimensional sequence of symbols to a data vector in a latent representation vector space, said data vector comprising a latent representation of the 1-dimensional sequence of symbols at a reduced dimension.

Claim: 3. The method as claimed in claim 2, wherein said modifying said sequence structure comprises: adding a perturbation to said data vector, said perturbation comprising a random vector; said method further comprising: decoding said modified data vector to obtain a new modified sequence structure corresponding to the original molecule for said evaluation of the respective plurality of predicted properties.

Claim: 4. The method as claimed in claim 3, wherein said generating loss values comprises solving, using the at least one hardware processor, a loss function.

Claim: 5. The method as claimed in claim 4, wherein said loss function is formulated to: optimize a molecular similarity to said original molecule while satisfying desired chemical properties, said loss function comprising: a first function term quantifying a property validity loss to be minimized and a second function term quantifying a molecular similarity score to be maximized; or optimize chemical properties of said original molecule while satisfying similarity constraints, said loss function comprising: a first function term quantifying a molecular constraint loss to be minimized and a second function term quantifying a molecular property score to be maximized.

Claim: 6. The method as claimed in claim 4, wherein said solving the loss function comprises: obtaining an objective function comprising loss value terms resulting from solving said loss function; and performing a zeroth order gradient descent to solve said objective function; and obtaining a pseudo gradient loss value estimate from solving said objective function.

Claim: 7. The method as claimed in claim 6, further comprising: applying one or more random direction queries using said random vector from said latent representation vector space to obtain the pseudo gradient loss value estimate; and updating an iterate of said objective function as a function of the loss value of the current iteration and said pseudo gradient loss value estimate.

Claim: 8. A non-transitory computer readable medium comprising instructions that, when executed by at least one hardware processor, configure the at least one hardware processor to: modify a sequence structure corresponding to a molecule to be optimized; run a plurality of machine learned prediction models for said modified sequence structure for predicting a respective plurality of properties of a molecule corresponding to said modified sequence structure and generate loss values as a measure of differences between a respective plurality of property predictors and a corresponding respective plurality of specified threshold constraints, and using the generated loss values as a guide for further modifying said sequence structure for evaluation of the respective plurality of predicted properties; determining whether said each of the plurality of properties predicted for the corresponding further modified sequence structure satisfy all said corresponding respective plurality of specified threshold constraints; and determine the corresponding further modified sequence structure as an optimized original molecule when each of the plurality of properties predicted for said further modified sequence structure satisfies all said respective plurality of specified threshold constraints.

Claim: 9. The non-transitory computer readable medium as claimed in claim 8, wherein said sequence structure of said original molecule is a 1-dimensional sequence of symbols, said instructions further configure the at least one hardware processor to: encode the 1-dimensional sequence of symbols by mapping said 1-dimensional sequence of symbols to a data vector in a latent representation vector space, said data vector comprising a latent representation of the 1-dimensional sequence of symbols at a reduced dimension.

Claim: 10. The non-transitory computer readable medium as claimed in claim 9, wherein to modify said sequence structure, said instructions further configure the at least one hardware processor to: add a perturbation to said data vector, said perturbation comprising a random vector; and said instructions further configuring the at least one hardware processor to: decode said modified data vector to obtain a new modified sequence structure corresponding to the original molecule for said evaluation of the respective plurality of predicted properties.

Claim: 11. The non-transitory computer readable medium as claimed in claim 10, wherein to generate said loss values, said instructions further configure the at least one hardware processor to: solve a loss function.

Claim: 12. The non-transitory computer readable medium as claimed in claim 11, wherein said loss function is formulated to: optimize a molecular similarity to said original molecule while satisfying desired chemical properties, said loss function comprising: a first function term quantifying a property validity loss to be minimized and a second function term quantifying a molecular similarity score to be maximized; or optimize chemical properties of said original molecule while satisfying similarity constraints, said loss function comprising: a first function term quantifying a molecular constraint loss to be minimized and a second function term quantifying a molecular property score to be maximized.

Claim: 13. The non-transitory computer readable medium as claimed in claim 11, wherein to solve the loss function, the instructions further configure the at least one hardware processor to: obtain an objective function comprising loss value terms resulting from solving said loss function; and perform a zeroth order gradient descent to solve said objective function; and obtain a pseudo gradient loss value estimate from solving said objective function.

Claim: 14. The non-transitory computer readable medium as claimed in claim 13, wherein the instructions further configure the at least one hardware processor to: apply one or more random direction queries using said random vector from said latent representation vector space to obtain the pseudo gradient loss value estimate; and update an iterate of said objective function as a function of the loss value of the current iteration and said pseudo gradient loss value estimate.

Claim: 15. A computer-implemented query-based molecule optimization system comprising: a memory storage device; and a hardware processor coupled to said memory storage device and configured to perform a method to: modify a sequence structure corresponding to a molecule to be optimized; run a plurality of machine learned prediction models for said modified sequence structure for predicting a respective plurality of properties of a molecule corresponding to said modified sequence structure and generate loss values as a measure of differences between a respective plurality of property predictors and a corresponding respective plurality of specified threshold constraints, and using the generated loss values as a guide for further modifying said sequence structure for evaluation of the respective plurality of predicted properties; determine whether said each of the plurality of properties predicted for the corresponding further modified sequence structure satisfy all said corresponding respective plurality of specified threshold constraints; and determine the corresponding further modified sequence structure as an optimized original molecule when each of the plurality of properties predicted for said further modified sequence structure satisfies all said respective plurality of specified threshold constraints.

Claim: 16. The computer-implemented system as claimed in claim 15, wherein said sequence structure of said original molecule is a 1-dimensional sequence of symbols, said hardware processor further configured to: encode the 1-dimensional sequence of symbols by mapping said 1-dimensional sequence of symbols to a data vector in a latent representation vector space, said data vector comprising a latent representation of the 1-dimensional sequence of symbols at a reduced dimension.

Claim: 17. The computer-implemented system as claimed in claim 16, wherein said modifying said sequence structure comprises: adding a perturbation to said data vector, said perturbation comprising a random vector; and said hardware processor is further configured to: decode said further modified data vector to obtain a new modified sequence structure corresponding to the original molecule for said evaluation of the respective plurality of predicted properties.

Claim: 18. The computer-implemented system as claimed in claim 17, wherein to generate said loss values, said hardware processor is further configured to: solve a loss function, said loss function is formulated to: optimize a molecular similarity to said original molecule while satisfying desired chemical properties, said loss function comprising: a first function term quantifying a property validity loss to be minimized and a second function term quantifying a molecular similarity score to be maximized; or optimize chemical properties of said original molecule while satisfying similarity constraints, said loss function comprising: a first function term quantifying a molecular constraint loss to be minimized and a second function term quantifying a molecular property score to be maximized.

Claim: 19. The computer-implemented system as claimed in claim 18, wherein to solve the loss function, the hardware processor is further configured to: obtain an objective function comprising loss value terms resulting from solving said loss function; and perform a zeroth order gradient descent to solve said objective function; and obtain a pseudo gradient loss value estimate from solving said objective function.

Claim: 20. The computer-implemented system as claimed in claim 19, wherein the hardware processor is further configured to: apply one or more random direction queries using said random vector from said latent representation vector space to obtain the pseudo gradient loss value estimate; and update an iterate of said objective function as a function of the loss value of the current iteration and said pseudo gradient loss value estimate.

Current International Class: 06; 06; 06
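Claims 1 and 5 name the loss terms and the stopping test but leave their concrete form open. The following is a minimal sketch under several stated assumptions. Each threshold constraint is scored with a hinge-style shortfall, and a weight `lam` balances the two terms. Every property is treated as "higher is better"; a lower-is-better property such as toxicity would have to be negated first. The second formulation's "molecular constraint loss" is read here as a similarity-constraint shortfall. All names (`hinge`, `property_validity_loss`, `satisfies_all`, `qed`, `binding_score`) are hypothetical. The similarity value is taken as given; in practice it might come from a fingerprint-based score, which is also an assumption. The scalar these functions return is what a zeroth-order optimizer would query at each decoded candidate.

```python
from typing import Dict

# Hypothetical mapping from property name to predicted value (or threshold).
Props = Dict[str, float]


def hinge(value: float, threshold: float) -> float:
    """Shortfall of `value` below `threshold`; 0.0 once the constraint holds."""
    return max(0.0, threshold - value)


def property_validity_loss(props: Props, thresholds: Props) -> float:
    """Sum of shortfalls over all constrained properties (assumes higher is better)."""
    return sum(hinge(props[name], t) for name, t in thresholds.items())


def similarity_objective_loss(props: Props, thresholds: Props,
                              similarity: float, lam: float = 1.0) -> float:
    """Claim 5, first form: minimize property-validity loss, maximize similarity."""
    return property_validity_loss(props, thresholds) - lam * similarity


def property_objective_loss(score: float, similarity: float,
                            sim_threshold: float, lam: float = 1.0) -> float:
    """Claim 5, second form: minimize the similarity-constraint loss, maximize a property score."""
    return hinge(similarity, sim_threshold) - lam * score


def satisfies_all(props: Props, thresholds: Props) -> bool:
    """Claim 1 stopping test: every predicted property meets its threshold."""
    return all(props[name] >= t for name, t in thresholds.items())


if __name__ == "__main__":
    thresholds = {"qed": 0.6, "binding_score": 1.0}  # hypothetical names and thresholds
    candidate = {"qed": 0.5, "binding_score": 2.0}   # hypothetical predictor outputs
    print(property_validity_loss(candidate, thresholds))
    print(satisfies_all(candidate, thresholds))
    print(similarity_objective_loss(candidate, thresholds, similarity=0.4))
```

The hinge form drops to zero once a constraint is met. After that point only the similarity or score term keeps pushing the search, which matches the claims' split between a term to minimize and a term to maximize.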
