Paper Title
Automated Single-Label Patent Classification using Ensemble Classifiers
Paper Authors
Paper Abstract
Many thousands of patent applications arrive at patent offices around the world every day. One important subtask when a patent application is submitted is to assign one or more classification codes from the complex, hierarchical patent classification schemes, so that the application can be routed to a patent examiner who is knowledgeable about the specific technical field. This task is typically undertaken by patent professionals; however, owing to the large number of applications and the potential complexity of an invention, they are usually overwhelmed. There is therefore a need for this manual code assignment task to be supported, or even fully automated, by classification systems that classify patent applications, ideally with an accuracy close to that of patent professionals. As with many other text analysis problems, this intellectually demanding task has in recent years been studied using word embeddings and deep learning techniques. In this paper we briefly review these research efforts and experiment with similar deep learning techniques, using different feature representations, on automatic patent classification at the sub-class level. In addition, we present an innovative method of ensemble classifiers trained on different parts of the patent document. To the best of our knowledge, this is the first time that an ensemble method has been proposed for the patent classification problem. Our first results are promising: they show that an ensemble architecture of classifiers significantly outperforms current state-of-the-art techniques that use the same classifiers as standalone solutions.
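The abstract does not say how the outputs of the classifiers trained on different parts of a patent document are combined. As a minimal sketch only, the block below assumes soft voting, that is, a weighted average of per-section class probabilities followed by picking the single best sub-class. The function name `soft_vote`, the section names, the sub-class codes, and the probabilities are all hypothetical and chosen for illustration; they are not taken from the paper.

```python
from typing import Dict, Optional, Tuple

# Hypothetical input shape: section name -> {sub-class code -> probability}.
SectionProbs = Dict[str, Dict[str, float]]


def soft_vote(
    section_probs: SectionProbs,
    weights: Optional[Dict[str, float]] = None,
) -> Tuple[str, Dict[str, float]]:
    """Combine per-section class probabilities by weighted averaging.

    Assumption: soft voting. The paper may combine classifiers differently.
    """
    if not section_probs:
        raise ValueError("need at least one section classifier output")
    if weights is None:
        # Default: every section classifier counts equally.
        weights = {section: 1.0 for section in section_probs}
    total_weight = sum(weights[section] for section in section_probs)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")

    combined: Dict[str, float] = {}
    for section, probs in section_probs.items():
        w = weights[section] / total_weight
        for label, p in probs.items():
            combined[label] = combined.get(label, 0.0) + w * p

    # Single-label setting: keep only the highest-scoring sub-class.
    best = max(combined, key=lambda label: combined[label])
    return best, combined


# Illustrative (made-up) outputs of three section-specific classifiers.
example = {
    "title": {"G06F": 0.5, "H04L": 0.3, "G06N": 0.2},
    "abstract": {"G06F": 0.2, "H04L": 0.6, "G06N": 0.2},
    "claims": {"G06F": 0.3, "H04L": 0.5, "G06N": 0.2},
}
label, scores = soft_vote(example)
print(label, scores)
```

In this made-up example the title classifier alone would pick a different sub-class than the other two sections; averaging lets the sections that agree outweigh it. Passing a `weights` dictionary would let one section count more than another, which is again an illustrative design choice rather than something the abstract specifies.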