

BDAI Key Laboratory Graduate Salon, Session 29: Parameter-Efficient Mixture-of-Experts Architecture for Pre-trained Language Models



Talk title: Parameter-Efficient Mixture-of-Experts Architecture for Pre-trained Language Models



Abstract: The state-of-the-art Mixture-of-Experts (MoE) architecture has achieved several remarkable successes in increasing model capacity. However, its widespread adoption has been hindered by complexity, memory consumption, and training instability. Here we propose a novel parameter-efficient MoE architecture that shares information among different experts. Specifically, we use matrix product operators (MPO, a tensor decomposition from quantum many-body physics) to reconstruct the parameter matrix in the expert layer, and we increase model capacity for pre-trained language models by sharing the central tensor (containing the core information) among different experts while keeping a separate auxiliary tensor (complementing the central tensor) for each expert. We also design a gradient mask strategy for the tensor structure of MPO to alleviate overfitting. Extensive experiments based on T5 and GPT show improved performance and efficiency in increasing pre-trained language model capacity (27.2x fewer parameters for comparable model performance, compared with Switch Transformers). We additionally demonstrate that our approach improves positive transfer in multi-task learning.
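To illustrate why sharing the central tensor saves parameters, the following is a minimal counting sketch, not the authors' implementation. The function name and the tensor sizes are hypothetical; the abstract does not state the actual sizes, and the 27.2x figure comes from the experiments, not from this arithmetic.

```python
def moe_param_counts(num_experts, central_params, auxiliary_params):
    """Compare expert-layer parameter counts with and without sharing.

    central_params and auxiliary_params are hypothetical sizes of the
    central tensor and of all auxiliary tensors of a single expert.
    """
    # Baseline: every expert stores its own central and auxiliary tensors.
    unshared = num_experts * (central_params + auxiliary_params)
    # Shared variant: one central tensor for all experts, plus
    # expert-specific auxiliary tensors.
    shared = central_params + num_experts * auxiliary_params
    return unshared, shared


# Hypothetical sizes chosen only for illustration.
unshared, shared = moe_param_counts(
    num_experts=8, central_params=1_000_000, auxiliary_params=50_000
)
print(unshared, shared, unshared / shared)
```

The saving grows with the number of experts and with the fraction of parameters held in the central tensor, since only the small auxiliary tensors are replicated.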

Talk title: Unbiased Sequential Recommendation with Latent Confounders



Abstract: Sequential recommendation holds the promise of understanding user preference by capturing successive behavior correlations. Existing research focuses on designing different models to better fit offline datasets. However, the observational data may have been contaminated by exposure or selection biases, which renders the learned sequential models unreliable. To solve this fundamental problem, we propose to reformulate the sequential recommendation task with the potential outcome framework, where we are able to clearly understand the data bias mechanism and correct it by re-weighting the training instances with the inverse propensity score (IPS). For more robust modeling, a clipping strategy is applied to the IPS estimation to reduce the variance of the learning objective. To make our framework more practical, we design a parameterized model to remove the impact of potential latent confounders. Finally, we theoretically analyze the unbiasedness of the proposed framework under both vanilla and clipped IPS estimation. To the best of our knowledge, this is the first work on debiased sequential recommendation. We conduct extensive experiments on both synthetic and real-world datasets to demonstrate the effectiveness of our framework.
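The re-weighting idea can be sketched as follows. This is a generic clipped-IPS objective under stated assumptions, not the paper's exact formulation: the function name, the `clip_min` threshold, and the choice to clip by bounding each propensity from below are all hypothetical, and the propensities are taken as given rather than estimated.

```python
def clipped_ips_loss(losses, propensities, clip_min=0.1):
    """Average per-instance losses weighted by 1 / max(propensity, clip_min).

    losses:       per-instance training losses (assumed precomputed)
    propensities: estimated exposure probabilities in (0, 1]
    clip_min:     hypothetical lower bound; 0.0 recovers vanilla IPS
    """
    if len(losses) != len(propensities):
        raise ValueError("losses and propensities must have the same length")
    if not losses:
        raise ValueError("at least one instance is required")
    total = 0.0
    for loss, p in zip(losses, propensities):
        if not 0.0 < p <= 1.0:
            raise ValueError("propensities must lie in (0, 1]")
        # Bounding the propensity from below caps the weight at 1 / clip_min,
        # which limits the variance contributed by rarely exposed items.
        total += loss / max(p, clip_min)
    return total / len(losses)


# The second instance has a tiny propensity and dominates vanilla IPS.
vanilla = clipped_ips_loss([1.0, 2.0], [0.5, 0.01], clip_min=0.0)
clipped = clipped_ips_loss([1.0, 2.0], [0.5, 0.01], clip_min=0.1)
print(vanilla, clipped)
```

Clipping trades a small bias for lower variance, which is why the unbiasedness analysis has to treat the vanilla and clipped estimators separately.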