Paper title
FedVLN: Privacy-preserving Federated Vision-and-Language Navigation
Paper authors
Abstract
Data privacy is a central problem for embodied agents that can perceive the environment, communicate with humans, and act in the real world. While helping humans complete tasks, the agent may observe and process sensitive user information, such as house environments and human activities. In this work, we introduce privacy-preserving embodied agent learning for the task of Vision-and-Language Navigation (VLN), where an embodied agent navigates house environments by following natural language instructions. We view each house environment as a local client, which shares nothing other than local updates with the cloud server and other clients, and propose a novel federated vision-and-language navigation (FedVLN) framework to protect data privacy during both training and pre-exploration. In particular, we propose a decentralized training strategy that confines each client's data to its local model training, and a federated pre-exploration method that performs partial model aggregation to improve model generalizability to unseen environments. Extensive results on the R2R and RxR datasets show that under our FedVLN framework, decentralized VLN models achieve results comparable to centralized training while protecting the privacy of seen environments, and federated pre-exploration significantly outperforms centralized pre-exploration while preserving the privacy of unseen environments.
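The abstract does not spell out the aggregation rule, so the following is only a minimal sketch of the general idea: each client (house environment) sends a parameter update rather than its data, and the server averages only a chosen subset of parameters (partial aggregation). The sample-weighted average, the function name `aggregate`, and the parameter names `shared_module` and `local_module` are all assumptions made for illustration, not details taken from the paper.

```python
from typing import Dict, List, Tuple

# Hypothetical representation: parameter name -> flat list of weights.
Params = Dict[str, List[float]]


def aggregate(
    global_params: Params,
    client_updates: List[Tuple[Params, int]],
    shared_keys: List[str],
) -> Params:
    """Apply the sample-weighted mean of client deltas to the global model.

    client_updates holds (delta, num_examples) pairs; only these deltas,
    never the raw environment data, reach the server. Parameters not named
    in shared_keys are left untouched (partial aggregation).
    """
    total = sum(n for _, n in client_updates)
    # Copy so the caller's global_params is not modified in place.
    new_params = {name: list(values) for name, values in global_params.items()}
    for name in shared_keys:
        for i in range(len(new_params[name])):
            weighted = sum(delta[name][i] * n for delta, n in client_updates)
            new_params[name][i] += weighted / total
    return new_params


# Toy example: two clients, one aggregated module and one that stays local.
global_params = {"shared_module": [0.0, 0.0], "local_module": [1.0]}
updates = [
    ({"shared_module": [1.0, 2.0], "local_module": [5.0]}, 1),
    ({"shared_module": [3.0, 4.0], "local_module": [7.0]}, 3),
]
result = aggregate(global_params, updates, shared_keys=["shared_module"])
print(result)
```

In this toy run the client with three examples pulls the averaged update toward its own delta, while `local_module` keeps its original value because it is excluded from aggregation. Which modules a real system would share is a design choice this sketch leaves open.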