
Assisted Learning: A Framework for Multi-Organization Learning

Abstract

In an increasing number of AI scenarios, collaboration among different organizations or agents (e.g., humans and robots, mobile units) is often essential to accomplish an organization-specific mission. However, to avoid leaking useful and possibly proprietary information, organizations typically enforce stringent security constraints on sharing modeling algorithms and data, which significantly limits collaboration. In this work, we introduce the Assisted Learning framework, which lets organizations assist each other in supervised learning tasks without revealing any organization’s algorithm, data, or even task. An organization seeks assistance by broadcasting task-specific but nonsensitive statistics and incorporating others’ feedback over one or more iterations to eventually improve its predictive performance. Theoretical and experimental studies, including real-world medical benchmarks, show that Assisted Learning can often achieve near-oracle performance, as if data and training processes were centralized.

A Solution for Assisted Regression

Assisted Regression: In the regression context, a general Assisted Learning protocol was developed for the following scenario. Suppose agents A and B each hold a unique data modality for the same population of subjects, and each has its own private modeling process. Agent A intends to be assisted by B.

In the training stage, agent A first uses its own algorithm and data to fit a regression model, denoted by ‘a1’. Agent A then sends the fitted residuals, together with the associated non-private data IDs (e.g., timestamp, email, or username), to agent B. Agent B collates the residuals with its local data and fits a private model ‘b1’ using the residuals as labels. Agent B then sends its own fitted residuals back to agent A for the next round of training. The procedure continues until A no longer needs assistance, for example, when the predictive performance on a set of validation data is good enough.
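The residual-passing loop above can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: it assumes both agents use ordinary least squares, and all names and synthetic data (`X_a`, `X_b`, `y`) are hypothetical stand-ins for the agents' private feature blocks, aligned row-by-row via a shared non-private ID.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: agents A and B hold private feature blocks over the
# same n subjects, with rows aligned by a shared, non-private ID.
n = 200
X_a = rng.normal(size=(n, 3))   # A's private data
X_b = rng.normal(size=(n, 2))   # B's private data
y = X_a @ [1.0, -2.0, 0.5] + X_b @ [3.0, 1.5] + 0.1 * rng.normal(size=n)

def fit_and_residual(X, target):
    """Privately fit least squares; return the coefficients and residuals."""
    coef, *_ = np.linalg.lstsq(X, target, rcond=None)
    return coef, target - X @ coef

models_a, models_b = [], []
residual = y                      # only A ever sees the true labels y
for _ in range(10):               # assistance rounds
    coef_a, residual = fit_and_residual(X_a, residual)  # A fits 'a_k', sends residuals to B
    models_a.append(coef_a)
    coef_b, residual = fit_and_residual(X_b, residual)  # B fits 'b_k', sends its residuals back
    models_b.append(coef_b)
```

Only residuals and IDs cross the boundary between A and B; neither the raw features nor the fitted models are ever exchanged.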

Figure: An illustration of the two-stage procedure for agent B (Bob) to assist agent A (Alice).

So how does B assist A in the prediction stage? For any future object observed by both A and B, agent A applies its locally trained models ‘a1’, ‘a2’, …, and sums their predictions; agent B does the same with its models. Agent A then combines its own partial prediction with the part received from B (usually by addition) to produce the final prediction. It is noteworthy that in both stages, B never needs to know A’s objective (namely, the original label y). Both agents train with their own private models, and their training data (X) are never shared.
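The two stages can be put together in a hedged end-to-end sketch. As before, this assumes least-squares agents on hypothetical synthetic data; the split into `part_a` and `part_b` mirrors how each agent sums its own models' outputs locally before A adds the two parts.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical aligned private feature blocks for agents A and B.
n_train, n_test = 150, 50
X_a = rng.normal(size=(n_train + n_test, 3))
X_b = rng.normal(size=(n_train + n_test, 2))
y = X_a @ [1.0, -2.0, 0.5] + X_b @ [3.0, 1.5] + 0.1 * rng.normal(size=n_train + n_test)

# --- Training stage: residual passing between A and B ---
models_a, models_b = [], []
residual = y[:n_train]
for _ in range(8):
    coef_a, *_ = np.linalg.lstsq(X_a[:n_train], residual, rcond=None)
    models_a.append(coef_a)
    residual = residual - X_a[:n_train] @ coef_a
    coef_b, *_ = np.linalg.lstsq(X_b[:n_train], residual, rcond=None)
    models_b.append(coef_b)
    residual = residual - X_b[:n_train] @ coef_b

# --- Prediction stage: each agent sums its own models' outputs locally ---
part_a = sum(X_a[n_train:] @ c for c in models_a)  # computed by A
part_b = sum(X_b[n_train:] @ c for c in models_b)  # computed by B, sent to A
y_hat = part_a + part_b                            # A adds the two parts

rmse = np.sqrt(np.mean((y_hat - y[n_train:]) ** 2))
```

Note that `part_b` is the only quantity B must transmit at prediction time; it reveals neither B's features nor B's fitted models.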

It was shown that in many cases A can achieve near-oracle performance, as if the private data and models were centralized and a model selection technique were applied. The intuition can be illustrated in the simple scenario where both A and B use linear models: each round of the learning stage iteratively projects the task onto A’s and B’s column spaces. It can be proved that this is asymptotically equivalent to a single projection onto the combined column space of A and B, as if the data were centralized.
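In this linear scenario, the argument can be written compactly as an alternating projection (a standard von Neumann-type argument; the notation below is ours). Let $P_A$ and $P_B$ denote the orthogonal projections onto the column spaces of A's and B's design matrices, and let $r_k$ be the residual passed around after round $k$:

```latex
r_0 = y, \qquad r_k = (I - P_B)(I - P_A)\, r_{k-1}, \qquad k = 1, 2, \dots
```

Since $(I - P_A)$ and $(I - P_B)$ project onto the orthogonal complements of the two column spaces, von Neumann's alternating projection theorem gives

```latex
\lim_{k \to \infty} r_k = (I - P_{A+B})\, y,
```

where $P_{A+B}$ is the projection onto the sum of the two column spaces; the limiting residual is exactly that of the centralized least-squares fit on the pooled data.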

References

  • Xian, Xun, Xinran Wang, Jie Ding, and Reza Ghanadan. “Assisted Learning: A Framework for Multi-Organization Learning.” Advances in Neural Information Processing Systems 33 (2020): 14580–14591.