GAL: Gradient Assisted Learning for Decentralized Multi-Organization Collaborations

Abstract

Collaborations among multiple organizations, such as financial institutions, medical centers, and retail markets, are crucial to providing improved service and performance in decentralized settings. However, the underlying organizations may have little interest in sharing their local data, models, and objective functions. These constraints have created new challenges for multi-organization collaboration. In this work, we propose Gradient Assisted Learning (GAL), a new method for multiple organizations to assist each other in supervised learning tasks without sharing local data, models, or objective functions. In this framework, all participants collaboratively optimize the aggregate of local loss functions, and each participant autonomously builds its own model by iteratively fitting the gradients of the overarching objective function. We also provide asymptotic convergence analysis and practical case studies of GAL. Experimental studies demonstrate that GAL can achieve performance close to that of centralized learning, in which all data, models, and objective functions are fully disclosed.

Toward a Community of Shared Interest

The main idea of Gradient Assisted Learning (GAL) is outlined below. In the training stage, the organization to be assisted, denoted by Alice, calculates a set of ‘residuals’ and broadcasts them to the other organizations. These residuals approximate the fastest direction for reducing the training loss in hindsight. The other organizations then fit the residuals using their local data, models, and objective functions and send the fitted values back to Alice. Alice assigns weights to each organization so as to best approximate the fastest direction of learning, and then performs a line search for the optimal gradient assisted learning rate along that direction. This procedure is repeated until Alice achieves sufficient learning performance. In the inference stage, the other organizations send their locally predicted values to Alice, who assembles them to generate the final prediction. We show that the number of assistance rounds needed to attain centralized performance is often small (e.g., fewer than ten). This is appealing since GAL is primarily developed for large organizations with rich computational resources, for which a small number of interactions reduces communication and networking costs.
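To make this loop concrete, below is a minimal sketch in Python/NumPy of one possible instantiation, assuming a regression task with squared loss and linear least-squares local fits. The names `Organization`, `gal_train`, and `gal_predict` are illustrative, not from the paper, and each organization's local model and objective can in general be arbitrary.

```python
import numpy as np

class Organization:
    """A participant that fits Alice's broadcast residuals on its private data."""

    def __init__(self, train_features, test_features):
        self.train_X = train_features   # private training features
        self.test_X = test_features     # private inference features
        self.models = []                # one local model per assistance round

    def fit_residuals(self, residuals):
        # Fit a local model to Alice's residuals; here, ordinary least squares.
        coef, *_ = np.linalg.lstsq(self.train_X, residuals, rcond=None)
        self.models.append(coef)
        return self.train_X @ coef      # fitted values sent back to Alice

    def local_predictions(self, t):
        # Locally predicted values for round t, sent to Alice at inference time.
        return self.test_X @ self.models[t]


def gal_train(alice_targets, organizations, rounds=10):
    """Training loop run by Alice (squared loss assumed throughout)."""
    prediction = np.zeros(len(alice_targets))
    history = []                        # (weights, learning rate) per round
    for _ in range(rounds):
        # 1. Residuals: the negative gradient of the loss at the current
        #    prediction; for squared loss 0.5 * ||y - F||^2 this is y - F.
        residuals = alice_targets - prediction
        # 2. Each organization fits the residuals with its local data and model.
        fitted = np.column_stack(
            [org.fit_residuals(residuals) for org in organizations])
        # 3. Alice weights the organizations to best approximate the residuals.
        weights, *_ = np.linalg.lstsq(fitted, residuals, rcond=None)
        direction = fitted @ weights
        # 4. Line search for the gradient assisted learning rate; for squared
        #    loss the minimizer along the chosen direction is in closed form.
        eta = (direction @ residuals) / (direction @ direction + 1e-12)
        prediction += eta * direction
        history.append((weights, eta))
    return prediction, history


def gal_predict(organizations, history):
    """Inference stage: Alice assembles the organizations' local predictions."""
    prediction = np.zeros(organizations[0].test_X.shape[0])
    for t, (weights, eta) in enumerate(history):
        local = np.column_stack(
            [org.local_predictions(t) for org in organizations])
        prediction += eta * (local @ weights)
    return prediction
```

For example, with features split vertically between two organizations:

```python
rng = np.random.default_rng(0)
X, beta = rng.normal(size=(200, 6)), rng.normal(size=6)
y = X @ beta
orgs = [Organization(X[:, :3], X[:, :3]), Organization(X[:, 3:], X[:, 3:])]
fit, history = gal_train(y, orgs, rounds=5)
print(np.mean((gal_predict(orgs, history) - y) ** 2))  # MSE shrinks with rounds
```

Note that only residuals, fitted values, weights, and predictions cross organizational boundaries; the data, local models, and objective functions never leave their owners.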

Our main contributions are summarized below. 

  • We propose the Gradient Assisted Learning (GAL) algorithm, which is suitable for large-scale autonomous decentralized learning. It can effectively exploit task-relevant information held by vertically decentralized organizations, and it enables simultaneous collaboration among organizations without sharing data, models, or objective functions.
  • We provide asymptotic convergence analysis and practical case studies of GAL. For the case of vertically distributed data, GAL generalizes the classical Gradient Boosting algorithm (see the update rule sketched after this list).
  • Our proposed method significantly outperforms learning baselines and achieves near-oracle performance on various benchmark datasets. Compared with existing works, GAL does not need frequent synchronization among organizations, and it significantly reduces computation and communication overhead.
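Concretely, in illustrative notation (the symbols below are chosen for exposition and are not taken verbatim from the paper), each assistance round maintained by Alice takes the form

```latex
% r_t: pseudo-residuals, i.e., the negative gradient of Alice's loss \ell
%      with respect to the current ensemble prediction F_{t-1};
% f_{t,m}: organization m's local fit to r_t;
% w_{t,m}: Alice's weights; \eta_t: the line-searched learning rate.
\[
  r_t \;=\; -\left.\frac{\partial \,\ell(y, F)}{\partial F}\right|_{F = F_{t-1}},
  \qquad
  F_t \;=\; F_{t-1} \;+\; \eta_t \sum_{m=1}^{M} w_{t,m}\, f_{t,m}.
\]
```

With a single participating organization (M = 1) and unit weight, the update reduces to the classical gradient-boosting step F_t = F_{t-1} + η_t f_t, which is the sense in which GAL generalizes Gradient Boosting.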

Organizations in our learning framework form a community of shared interest. Each service-providing organization can offer end-to-end assistance to another organization without sharing anyone’s proprietary data, models, or objective functions. In practice, the participating organizations may receive financial rewards from the organization being assisted. Moreover, every organization in this framework can pose its own task and seek help from others. As a result, all organizations become mutually beneficial to one another.

References

Diao, Enmao, Jie Ding, and Vahid Tarokh. “GAL: Gradient Assisted Learning for Decentralized Multi-Organization Collaborations.” Advances in Neural Information Processing Systems (NeurIPS), 2022.