Paper Title
Exploring Stroke-Level Modifications for Scene Text Editing
Paper Authors
Paper Abstract
Scene text editing (STE) aims to replace the text in an image with desired new text while preserving the background and style of the original text. However, due to complicated background textures and varied text styles, existing methods fall short of generating clear and legible edited text images. In this study, we attribute the poor editing performance to two problems: 1) Implicit decoupling structure. Previous methods that edit the whole image must learn different translation rules for background and text regions simultaneously. 2) Domain gap. Due to the lack of edited real scene text images, the network can only be trained on synthetic pairs and performs poorly on real-world images. To handle these problems, we propose a novel network that works by MOdifying Scene Text image at strokE Level (MOSTEL). First, we generate stroke guidance maps to explicitly indicate the regions to be edited. Unlike the implicit guidance obtained by directly modifying all pixels at the image level, such explicit instructions filter out distractions from the background and guide the network to focus on the editing rules of text regions. Second, we propose a Semi-supervised Hybrid Learning scheme to train the network with both labeled synthetic images and unpaired real scene text images. Thus, the STE model is adapted to real-world data distributions. Moreover, two new datasets (Tamper-Syn2k and Tamper-Scene) are proposed to fill the gap in public evaluation datasets. Extensive experiments demonstrate that our MOSTEL outperforms previous methods both qualitatively and quantitatively. Datasets and code will be available at https://github.com/qqqyd/MOSTEL.
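The stroke guidance map described in the abstract can be understood as an explicit per-pixel mask that confines editing to text strokes while leaving background pixels untouched. The sketch below illustrates only this compositing idea; the function name, array shapes, and hard mask values are assumptions for illustration, not the paper's actual network, which learns the guidance map end to end.

```python
import numpy as np

def composite_with_stroke_guidance(background, rendered_text, guidance):
    """Blend a rendering of the new text into the original image.

    guidance: per-pixel map in [0, 1], 1 on text strokes to be edited,
    0 on background pixels that must be preserved. (Illustrative only;
    MOSTEL predicts this map with a network rather than taking it as input.)
    """
    guidance = guidance[..., None]  # broadcast the mask over RGB channels
    return guidance * rendered_text + (1.0 - guidance) * background

# Toy 2x2 RGB example: the guidance map marks only the top-left pixel
# as a stroke, so only that pixel takes the new-text value.
bg = np.zeros((2, 2, 3))            # original background (all zeros)
txt = np.ones((2, 2, 3))            # rendered new text (all ones)
mask = np.array([[1.0, 0.0],
                 [0.0, 0.0]])       # stroke guidance map
out = composite_with_stroke_guidance(bg, txt, mask)
```

Because the mask multiplies the output directly, pixels with guidance 0 are copied verbatim from the background, which is the "explicit instruction" the abstract contrasts with learning to reproduce the whole image implicitly.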