Large Scale Distributed Deep Networks (Chinese translation)
Scaling Distributed Machine Learning with the Parameter Server
Communication Efficient Distributed Machine Learning with the Parameter Server
More Effective Distributed ML via a Stale Synchronous Parallel Parameter Server
An overview of gradient descent optimization algorithms
A summary and comparison of common optimization methods (SGD / Momentum / Nesterov / Adagrad / Adadelta)
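The last entry compares classic first-order update rules. As a rough sketch of what each rule does, here is a toy 1-D example on f(w) = w²; the learning rates, momentum/decay constants, and step counts are chosen only for illustration, not taken from any of the references above:

```python
import math

def grad(w):
    # Gradient of f(w) = w**2; the minimum is at w = 0.
    return 2.0 * w

def sgd(w, lr=0.1, steps=100):
    # Plain gradient descent: step against the gradient.
    for _ in range(steps):
        w -= lr * grad(w)
    return w

def momentum(w, lr=0.1, mu=0.9, steps=200):
    # Accumulate a velocity term that smooths successive gradients.
    v = 0.0
    for _ in range(steps):
        v = mu * v - lr * grad(w)
        w += v
    return w

def nesterov(w, lr=0.1, mu=0.9, steps=200):
    # Like momentum, but evaluate the gradient at the look-ahead point.
    v = 0.0
    for _ in range(steps):
        v = mu * v - lr * grad(w + mu * v)
        w += v
    return w

def adagrad(w, lr=0.5, eps=1e-8, steps=200):
    # Scale each step by the accumulated squared gradients.
    g2 = 0.0
    for _ in range(steps):
        g = grad(w)
        g2 += g * g
        w -= lr * g / (math.sqrt(g2) + eps)
    return w

def adadelta(w, rho=0.95, eps=1e-6, steps=2000):
    # Learning-rate-free: use running averages of squared gradients
    # and squared updates to size each step.
    eg2, edx2 = 0.0, 0.0
    for _ in range(steps):
        g = grad(w)
        eg2 = rho * eg2 + (1 - rho) * g * g
        dx = -math.sqrt(edx2 + eps) / math.sqrt(eg2 + eps) * g
        edx2 = rho * edx2 + (1 - rho) * dx * dx
        w += dx
    return w

for name, fn in [("sgd", sgd), ("momentum", momentum), ("nesterov", nesterov),
                 ("adagrad", adagrad), ("adadelta", adadelta)]:
    print(f"{name:9s} w = {fn(5.0):.4f}")
```

All five drive w from 5.0 toward the minimum at 0; Adadelta starts very slowly here because its update magnitude has to bootstrap from eps, which is why it is given more steps.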