Paper Title
Technical Progress Analysis Using a Dynamic Topic Model for Technical Terms to Revise Patent Classification Codes
Authors
Abstract
Japanese patents are assigned a patent classification code, FI (File Index), that is unique to Japan. FI is a subdivision of the IPC (International Patent Classification) that is tailored to Japanese technology. FIs are revised to keep up with technological developments, and these revisions have established more than 30,000 new FIs since 2006. However, the revisions require considerable time and effort, and because they are not automated, they are inefficient. Using machine learning to assist in the revision of patent classification codes (FI) should therefore improve both accuracy and efficiency. This study analyzes patent documents from this new perspective of machine-learning-assisted revision of patent classification codes. To analyze time-series changes in patents, we used the dynamic topic model (DTM), an extension of latent Dirichlet allocation (LDA). Unlike English, Japanese requires morphological analysis to split text into words. Patents contain many technical words that are not used in everyday life, so morphological analysis with a general-purpose dictionary is not sufficient. We therefore used a technique for extracting technical terms from text and applied DTM to the extracted terms. Specifically, we extracted technical terms from Japanese patents in the lighting class F21, applied DTM to trace technological progress over 14 years, and compared the results with the actual revisions of the patent classification codes. We then analyzed the results from the perspective of revising patent classification codes with machine learning. The analysis found that topics on the rise corresponded to technologies judged to be new.
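The last step of the abstract, spotting topics that are "on the rise", can be illustrated with a small sketch. The abstract does not say how a rise was measured, so everything below is an assumption made for illustration: the least-squares slope criterion, the function names `trend_slope` and `rising_topics`, the `min_slope` threshold, and the made-up yearly topic shares. In practice the shares would come from a fitted DTM rather than being typed in by hand.

```python
from statistics import mean


def trend_slope(values):
    """Least-squares slope of values against their index (0, 1, 2, ...)."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two time slices")
    x_mean = (n - 1) / 2
    y_mean = mean(values)
    num = sum((i - x_mean) * (v - y_mean) for i, v in enumerate(values))
    den = sum((i - x_mean) ** 2 for i in range(n))
    return num / den


def rising_topics(topic_shares, min_slope=0.0):
    """Return ids of topics whose share grows by more than min_slope per
    time slice, ordered from steepest rise to shallowest."""
    slopes = {topic: trend_slope(series) for topic, series in topic_shares.items()}
    rising = [topic for topic, slope in slopes.items() if slope > min_slope]
    return sorted(rising, key=lambda topic: -slopes[topic])


# Hypothetical yearly topic shares (not data from the paper).
shares = {
    "topic_a": [0.10, 0.12, 0.15, 0.20],  # growing
    "topic_b": [0.30, 0.28, 0.25, 0.20],  # shrinking
    "topic_c": [0.05, 0.05, 0.05, 0.05],  # flat
}

print(rising_topics(shares, min_slope=0.01))
```

With these illustrative numbers only `topic_a` clears the threshold. A slope over the whole period is just one possible criterion; comparing the first and last years, or testing for a change point near a known FI revision date, would be equally plausible readings of "on the rise".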