Heterogeneous Multisource Transfer Learning via Model Averaging for Positive-Unlabeled Data

Jialei Liu, Jun Liao, Kuangnan Fang

Paper, 2025-11-14, English

Description

Positive-Unlabeled (PU) learning is challenging because no explicitly labeled negative samples are available, particularly in high-stakes domains such as fraud detection and medical diagnosis. To address data scarcity and privacy constraints, we propose a novel transfer learning framework based on model averaging that integrates information from heterogeneous data sources (fully binary-labeled, semi-supervised, and PU data sets) without direct data sharing. For each type of source domain, a tailored logistic regression model is fitted, and knowledge is transferred to the PU target domain through model averaging. The optimal weights for combining the source models are chosen by a cross-validation criterion that minimizes the Kullback-Leibler divergence. We establish theoretical guarantees for weight optimality and convergence under both misspecified and correctly specified target models, and extend them to high-dimensional settings using sparsity-penalized estimators. Extensive simulations and an analysis of real-world credit risk data show that our method outperforms competing methods in predictive accuracy and robustness, especially when labeled data are limited and the environments are heterogeneous.
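
To make the weighting step concrete, the following is a minimal sketch of choosing model-averaging weights by minimizing a held-out Kullback-Leibler criterion. It is not the authors' estimator. It assumes, for illustration only, that the candidate models are averaged at the level of predicted probabilities, that the held-out labels are fully observed binary labels (so no PU correction is applied), and that the weights lie on the probability simplex. The function name `kl_weights` and its parameters `n_iter` and `lr` are hypothetical. For binary outcomes, minimizing the mean negative log-likelihood is equivalent to minimizing the empirical KL divergence up to a constant that does not depend on the weights.

```python
import numpy as np

def kl_weights(probs, y, n_iter=500, lr=0.5):
    """Simplex weights that minimize the held-out mean negative log-likelihood.

    probs : (n, K) array of held-out predicted probabilities, one column
            per candidate model (hypothetical input layout).
    y     : (n,) array of binary labels; assumed fully observed here,
            which is a simplification of the PU setting.
    """
    probs = np.clip(np.asarray(probs, dtype=float), 1e-8, 1 - 1e-8)
    y = np.asarray(y, dtype=float)
    n, K = probs.shape
    w = np.full(K, 1.0 / K)  # start from equal weights
    for _ in range(n_iter):
        p = probs @ w  # averaged probability for each observation
        # Gradient of the mean negative log-likelihood with respect to w.
        grad = -((y / p) - ((1 - y) / (1 - p))) @ probs / n
        # Exponentiated-gradient step keeps w positive; renormalizing
        # keeps it on the simplex.
        w = w * np.exp(-lr * grad)
        w /= w.sum()
    return w
```

In a cross-validation version, each column of `probs` would hold out-of-fold predictions, so that a model is never scored on the observations used to fit it. The exponentiated-gradient update is one convenient way to respect the simplex constraint; a constrained solver would serve equally well.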