Abstract: Gradient descent is widely used for large-scale optimization problems in machine learning; in particular, it plays a central role in computing and correcting the connection strengths of neural networks in deep learning. However, choosing a proper learning rate for SGD can be difficult: a rate that is too small leads to painfully slow convergence, while one that is too large can hinder convergence. In this paper, we present a novel variance reduction technique, termed SMVRG, that applies a moving average of gradients. The variance reduction allows SMVRG to use a large learning rate, and the method only needs to store the current gradient and the previous average gradient. We apply SMVRG to Long Short-Term Memory (LSTM) networks. Experiments on two data sets, IMDB (movie reviews) and SemEval-2016 (sentiment analysis in Twitter), show that our method improves the results significantly.
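The abstract does not spell out the update rule, so the following is only a minimal sketch of the idea it describes: keep a moving average of stochastic gradients (which has lower variance than any single gradient, permitting a larger learning rate) and retain just the current gradient and the previous average. The function name `smvrg_step`, the exponential-averaging form, and all hyperparameter values here are assumptions for illustration, not the paper's actual algorithm.

```python
import numpy as np

def smvrg_step(w, g, avg_grad, lr=0.3, beta=0.9):
    """One assumed SMVRG-style step: step along a moving average of gradients.

    Only the current stochastic gradient `g` and the previous average
    `avg_grad` are needed, matching the storage cost stated in the abstract.
    The exponential form of the average is an illustrative assumption.
    """
    new_avg = beta * avg_grad + (1.0 - beta) * g  # lower-variance gradient estimate
    return w - lr * new_avg, new_avg

# Toy demo on f(w) = 0.5 * ||w||^2, whose true gradient is w; we add noise
# to mimic stochastic gradients and check the iterate approaches the optimum.
rng = np.random.default_rng(0)
w = np.array([4.0, -3.0])
avg = np.zeros_like(w)
for _ in range(200):
    g = w + rng.normal(scale=0.5, size=w.shape)  # noisy stochastic gradient
    w, avg = smvrg_step(w, g, avg)
print(np.linalg.norm(w))  # should end up small, near the optimum at 0
```

Even with this simplified averaging rule, the iterate settles near the optimum despite the gradient noise, illustrating why a variance-reduced estimate tolerates a larger learning rate than raw SGD.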