Paper Title
A Topic Modeling Approach to Classifying Open Street Map Health Clinics and Schools in Sub-Saharan Africa
Authors
Abstract
Data deprivation, or the lack of easily available and actionable information on the well-being of individuals, is a significant challenge for the developing world and an impediment to the design and operationalization of policies intended to alleviate poverty. In this paper we explore the suitability of data derived from OpenStreetMap to proxy for the location of two crucial public services: schools and health clinics. Thanks to the efforts of thousands of digital humanitarians, online mapping repositories such as OpenStreetMap contain millions of records on buildings and other structures, delineating their location and often their use. Unfortunately, much of this data is locked in complex, unstructured text, rendering it seemingly unsuitable for classifying schools or clinics. We apply a scalable, unsupervised learning method to unlabeled OpenStreetMap building data to extract the locations of schools and health clinics in ten countries in Africa. We find that the topic modeling approach greatly improves performance relative to relying on structured keys alone. We validate our results by comparing the schools and clinics identified by our OSM method with those identified by the WHO, and we describe OSM coverage gaps more broadly.
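The abstract contrasts two ways of finding schools and clinics in OSM data: reading structured keys alone, or running an unsupervised topic model over the free text stored in tags. The sketch illustrates that contrast under assumptions of our own. The `amenity` value sets, the skipped yes/no values, the seed words, the topic count, the `min_mass` threshold, and every helper name (`classify_by_keys`, `tokenize`, `lda_gibbs`, `label_topics`, `classify`) are hypothetical. LDA fitted by collapsed Gibbs sampling stands in for whatever topic model and topic-labeling procedure the authors actually used. The validation against WHO data is not shown.

```python
import random

# Illustrative assumptions, not the paper's exact value lists.
SCHOOL_VALUES = {"school", "kindergarten", "college", "university"}
CLINIC_VALUES = {"clinic", "hospital", "doctors"}
STOP_VALUES = {"yes", "no"}

def classify_by_keys(tags):
    """Structured-key baseline: return 'school', 'clinic', or None from amenity alone."""
    amenity = tags.get("amenity", "")
    if amenity in SCHOOL_VALUES:
        return "school"
    if amenity in CLINIC_VALUES:
        return "clinic"
    return None

def tokenize(tags):
    """Split every tag value into lowercase words, skipping bare yes/no flags."""
    tokens = []
    for value in tags.values():
        for word in value.replace("_", " ").lower().split():
            if word not in STOP_VALUES:
                tokens.append(word)
    return tokens

def lda_gibbs(docs, n_topics, alpha=0.1, beta=0.01, iters=200, seed=0):
    """Fit LDA by collapsed Gibbs sampling; return (doc_topic, topic_word, vocab)."""
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    index = {w: i for i, w in enumerate(vocab)}
    V = len(vocab)
    ndk = [[0] * n_topics for _ in docs]      # topic counts per document
    nkw = [[0] * V for _ in range(n_topics)]  # word counts per topic
    nk = [0] * n_topics                       # total tokens per topic
    z = []                                    # current topic of every token
    for d, doc in enumerate(docs):
        z.append([])
        for w in doc:
            k = rng.randrange(n_topics)
            z[d].append(k)
            ndk[d][k] += 1
            nkw[k][index[w]] += 1
            nk[k] += 1
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k, v = z[d][i], index[w]
                # Remove this token's assignment, then resample it from the full conditional.
                ndk[d][k] -= 1
                nkw[k][v] -= 1
                nk[k] -= 1
                weights = [(ndk[d][t] + alpha) * (nkw[t][v] + beta) / (nk[t] + V * beta)
                           for t in range(n_topics)]
                k = rng.choices(range(n_topics), weights=weights)[0]
                z[d][i] = k
                ndk[d][k] += 1
                nkw[k][v] += 1
                nk[k] += 1
    doc_topic = [[(ndk[d][t] + alpha) / (len(doc) + n_topics * alpha) for t in range(n_topics)]
                 for d, doc in enumerate(docs)]
    topic_word = [[(nkw[t][v] + beta) / (nk[t] + V * beta) for v in range(V)]
                  for t in range(n_topics)]
    return doc_topic, topic_word, vocab

def label_topics(topic_word, vocab, seeds, min_mass=0.05):
    """Hypothetical rule: name each topic after the seed class holding most of its mass."""
    index = {w: i for i, w in enumerate(vocab)}
    labels = []
    for dist in topic_word:
        mass = {c: sum(dist[index[w]] for w in ws if w in index) for c, ws in seeds.items()}
        best = max(mass, key=mass.get)
        labels.append(best if mass[best] >= min_mass else None)
    return labels

def classify(records, n_topics=3, seeds=None):
    """Trust the amenity key when present; otherwise use the dominant topic's label."""
    seeds = seeds or {"school": {"school", "primary", "secondary", "academy"},
                      "clinic": {"clinic", "health", "hospital", "dispensary"}}
    docs = [tokenize(tags) for tags in records]
    doc_topic, topic_word, vocab = lda_gibbs(docs, n_topics)
    topic_labels = label_topics(topic_word, vocab, seeds)
    results = []
    for tags, doc, theta in zip(records, docs, doc_topic):
        label = classify_by_keys(tags)
        if label is None and doc:  # records with no text stay unclassified
            label = topic_labels[max(range(n_topics), key=lambda t: theta[t])]
        results.append(label)
    return results

# Made-up toy records for illustration only (not real OSM data).
records = [
    {"amenity": "school", "name": "Example Academy"},
    {"building": "yes", "name": "Example Primary School"},
    {"building": "yes", "name": "Example Health Centre"},
    {"building": "yes", "description": "village dispensary and maternity"},
]
print(classify(records))
```

Labeling topics by seed words is only one option. Inspecting each topic's top words by hand is another. In this sketch the topic model is consulted only when the `amenity` key gives no answer. That keeps the structured baseline's labels intact while still giving a label to records whose use appears only in names or descriptions.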