Paper Title

Similarity between Units of Natural Language: The Transition from Coarse to Fine Estimation

Author

Mu, Wenchuan

Abstract

Capturing the similarity between units of human language is crucial for explaining how humans associate different objects, and its computation has therefore been widely studied and applied. As the amount of information around us grows, computing similarity becomes increasingly complex. In many domains, such as legal or medical affairs, measuring similarity requires extra care and precision, because even small details within a language unit can have significant real-world effects. My research goal in this thesis is to develop regression models that account for similarity between language units in a more refined way. The computation of similarity has come a long way, but approaches to debugging similarity measures are often based on continually fitting human judgment values. To address this, my goal is to develop an algorithm that precisely catches loopholes in a similarity computation. Furthermore, most methods define the similarity they compute only vaguely and are often difficult to interpret. The proposed framework addresses both shortcomings: it continually improves the model by catching different loopholes, and every refinement of the model comes with a reasonable explanation. The regression model introduced in this thesis is called progressively refined similarity computation; it combines attack testing with adversarial training. The resulting similarity regression model achieves state-of-the-art performance in handling edge cases.
