
Parallel Assisted Learning

Abstract

In the era of big data, a population’s multimodal data are often collected and preserved by different business and government entities. These entities often hold local machine learning data, models, and tasks that they cannot share with others. Meanwhile, an entity often needs to seek assistance from others to enhance its learning quality without sharing proprietary information. How can an entity be assisted while it is assisting others? We develop a general method called Parallel Assisted Learning (PAL) that applies to the context where entities perform supervised learning and can collate their data according to a common data identifier. Under the PAL mechanism, a learning entity that receives assistance is obligated to assist others, without any entity needing to reveal its local data, model, or learning objective. Consequently, each entity can significantly improve its particular task. The applicability of the proposed approach is demonstrated by data experiments.

An obligation to assist others

Our work is motivated by the following question: how can each entity be bound to assist others while it is being assisted?

In this work, we develop a general method called Parallel Assisted Learning (PAL) to realize such reciprocal learning. PAL enables collaborations among heterogeneous entities (with distinct and private objectives) to accomplish entity-specific missions. Our work was inspired by Pareto Improvement, a notion in economics that concerns improving individuals’ utilities without harming anyone else. An essential characteristic of PAL is that the assistance allows each entity to have its unique input data, model, and objective (task labels), none of which are required to be shared. This fundamentally distinguishes it from existing distributed learning frameworks such as Federated Learning, which operate on a globally shared model and objective. Two examples are provided below to illustrate the application scenarios.

  • Two clinics in a city hold features collected from the same set of patients. The data can be retrospectively collated by a non-private patient ID. One clinic holds features from lab tests, and the other holds pharmaceutical features. Due to limited infrastructure or regulatory constraints, they cannot migrate data to each other. Also, each of them has a private learning task. They will invoke the PAL protocol to improve their single-clinic performance. Ideally, they will achieve near-oracle performance, as if the data were centralized and the tasks were transparent.
  • Two organizations collect surveys from the same cohort of mobile users. The user data can be collated by a non-private username. One collects economic features such as working hours and salary level, and the other focuses on demographic information such as gender and age. They may assist each other’s learning tasks in a privacy-preserving manner.
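To make the setting above concrete, here is a minimal toy sketch of one round of assistance between two entities whose records are aligned by a shared identifier. It is inspired by residual-exchange assisted learning and is not the actual PAL protocol (which is reciprocal and more general); all data, the least-squares models, and the helper `fit_predict` are hypothetical illustrations. The key point it demonstrates: entity A shares only task residuals, never its raw features, model, or label definition, yet entity B's private features can still reduce A's error.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200

# A shared (non-private) identifier aligns the two entities' records.
ids = np.arange(n)

# Entity A's private features, and entity B's private features.
X_a = rng.normal(size=(n, 3))
X_b = rng.normal(size=(n, 2))

# A's private label depends on both entities' features (toy ground truth).
y_a = X_a @ np.array([1.0, -2.0, 0.5]) + X_b @ np.array([0.8, -1.2]) \
      + rng.normal(scale=0.1, size=n)

def fit_predict(X, target):
    """Least-squares fit on local features; return in-sample predictions."""
    X1 = np.column_stack([X, np.ones(len(X))])
    coef, *_ = np.linalg.lstsq(X1, target, rcond=None)
    return X1 @ coef

# Step 1: A fits a local model and computes its residuals.
pred_a = fit_predict(X_a, y_a)
resid = y_a - pred_a

# Step 2: A sends only the ID-aligned residuals to B. B fits them with
# its own private features and returns fitted values -- no raw features
# or model parameters cross the boundary.
assist = fit_predict(X_b, resid)
pred_a_assisted = pred_a + assist

mse_solo = np.mean((y_a - pred_a) ** 2)
mse_assisted = np.mean((y_a - pred_a_assisted) ** 2)
print(f"solo MSE: {mse_solo:.3f}, assisted MSE: {mse_assisted:.3f}")
```

Because A's label genuinely depends on B's features, the assisted error drops well below the solo error. Under PAL, the same primitive would run in both directions, so B is assisted in return for its assistance.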

In general, PAL is expected to be useful in one or more of the following scenarios. First, an entity does not want to provide dedicated resources (such as machines or cloud-based interfaces) to assist others before it sees the performance gain from being assisted. Second, an entity favors an incentive structure under which collaborators will not cheat during either the training or the prediction stage; cheating would otherwise terminate the collaboration and hinder everyone’s performance gain. Third, an entity without an established reputation can still collaborate with known brands in an economical way.

References

  • Censor, Yair. “Pareto optimality in multiobjective problems.” Applied Mathematics and Optimization 4, no. 1 (1977): 41-59.
  • Wang, Xinran, Jiawei Zhang, Mingyi Hong, Yuhong Yang, and Jie Ding. “Parallel Assisted Learning.” IEEE Transactions on Signal Processing 70 (2022): 5848-5858.