Bayesian aggregation has attractive properties in both theory and practice, and has been shown to be more stable and flexible than selecting a single model. For large models, however, optimizing and performing inference with posterior models is resource-intensive in practice. This work therefore considers a general framework for performing Bayesian aggregation on over-parametrized models, especially neural networks. In particular, rather than using the explicit Gibbs distribution of conventional models, we leverage samples from a Markov chain Monte Carlo (MCMC) process based on Langevin-like dynamics with anisotropic noise, and aggregate the models through recalibration on the training data. Depending on the shape of the noise, the corresponding posterior has favorable properties in the over-parametrized setting. Moreover, recalibration techniques can be applied to obtain an efficient, well-calibrated model at inference time.
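As a rough illustration of sampling with Langevin-like dynamics under anisotropic noise and then aggregating the sampled models, the following minimal sketch uses Bayesian logistic regression on synthetic data. It is not the method of this work: the toy model, the fixed diagonal preconditioner `precond`, the step size, and all function names are assumptions chosen for illustration, and the recalibration step is not shown.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def grad_neg_log_posterior(w, X, y, prior_precision=1.0):
    # Gradient of U(w) = -log p(y | X, w) - log p(w) for logistic
    # regression with a zero-mean Gaussian prior (assumed toy model).
    return X.T @ (sigmoid(X @ w) - y) + prior_precision * w


def anisotropic_langevin(X, y, precond, step=1e-3, n_steps=5000,
                         burn_in=1000, thin=20, seed=0):
    # Unadjusted Langevin iteration with a constant diagonal
    # preconditioner G = diag(precond):
    #   w <- w - step * G * grad U(w) + sqrt(2 * step * G) * xi,
    # where xi is standard Gaussian noise. Different entries of
    # `precond` give the injected noise a different scale per
    # coordinate, i.e. anisotropic noise.
    rng = np.random.default_rng(seed)
    w = np.zeros(X.shape[1])
    samples = []
    for t in range(n_steps):
        xi = rng.standard_normal(w.shape)
        grad = grad_neg_log_posterior(w, X, y)
        w = w - step * precond * grad + np.sqrt(2.0 * step * precond) * xi
        # Keep every `thin`-th iterate after the burn-in period.
        if t >= burn_in and (t - burn_in) % thin == 0:
            samples.append(w.copy())
    return np.array(samples)


def aggregate_predict(samples, X):
    # Aggregate by averaging the predictive probabilities of all
    # sampled models (a Monte Carlo estimate of the posterior predictive).
    return sigmoid(X @ samples.T).mean(axis=1)


# Synthetic, linearly separable data (hypothetical example values).
data_rng = np.random.default_rng(1)
X = data_rng.standard_normal((200, 2))
y = (X @ np.array([2.0, -1.0]) > 0).astype(float)

samples = anisotropic_langevin(X, y, precond=np.array([2.0, 0.5]))
probs = aggregate_predict(samples, X)
accuracy = np.mean((probs > 0.5) == (y == 1.0))
```

Because the preconditioner here is constant, the continuous-time dynamics keep the same target posterior as the isotropic version; a state-dependent noise shape would require an additional correction term that this sketch does not include.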