Molecular Design Method based on New Molecular Representation and Variational Auto-encoder


Li Kai, LiNing, Zhang Wei, Gao Ming, Beijing Information Science and Technology University, China


Based on the traditional VAE, a novel neural network model is presented, with the latest molecular representation, SELFIES, to improve the effect of generating new molecules. In this model, multi-layer convolutional network and Fisher information are added to the original encoding layer to learn the data characteristics and guide the encoding process, which makes the features of the data hiding layer more aggregated, and integrates the Long Short Term Memory neural network (LSTM) into the decoding layer for better data generation, which effectively solves the degradation phenomenon generated by the encoding layer and decoding layer of the original VAE model. Through experiments on zinc molecular data sets, it is found that the similarity in the new VAE is 8.47% higher than that of the original ones. SELFIES are better at generating a variety of molecules than the traditional molecular representation, SELFIES. Experiments have shown that using SELFIES and the new VAE model presented in this paper can improve the effectiveness of generating new molecules.


VAE, Molecular notation, Multilayer convolutional network, Fisher information, LSTM.

Full Text  Volume 13, Number 3