{"id":5421,"date":"2024-03-13T09:39:32","date_gmt":"2024-03-13T01:39:32","guid":{"rendered":"https:\/\/www.1ai.net\/?p=5421"},"modified":"2024-03-13T09:39:32","modified_gmt":"2024-03-13T01:39:32","slug":"magi%e5%8f%af%e8%87%aa%e5%8a%a8%e5%b0%86%e6%bc%ab%e7%94%bb%e8%bd%ac%e5%bd%95%e6%88%90%e6%96%87%e5%ad%97-%e5%b9%b6%e8%87%aa%e5%8a%a8%e7%94%9f%e6%88%90%e5%89%a7%e6%9c%ac","status":"publish","type":"post","link":"https:\/\/www.1ai.net\/en\/5421.html","title":{"rendered":"Magi: Automatically transcribe comics into text and automatically generate scripts"},"content":{"rendered":"<p>The Visual Geometry Group in the Department of Engineering Science at the University of Oxford has developed a model called <a href=\"https:\/\/www.1ai.net\/en\/tag\/magi\" title=\"See articles tagged Magi\" target=\"_blank\" >Magi<\/a> that can automatically transcribe <a href=\"https:\/\/www.1ai.net\/en\/tag\/%e6%bc%ab%e7%94%bb\" title=\"See articles tagged comics\" target=\"_blank\" >comic<\/a> pages into text and generate a script.<\/p>\n<p>The model achieves fully automated script generation by recognizing the panels, text blocks and characters on a comic page. Its main functions include panel detection, which identifies the individual panels on a page, and text block detection, which locates the text blocks within panels, usually containing dialogue or narrative text. 
In addition, the model detects the characters appearing on the page and clusters them by identity, so that different characters can be told apart.<\/p>\n<p class=\"article-content__img\"><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-full wp-image-5422\" src=\"https:\/\/www.1ai.net\/wp-content\/uploads\/2024\/03\/6384583530946891364935011.jpg\" alt=\"\" width=\"839\" height=\"495\" \/><\/p>\n<p>The Magi model also associates each text block with its speaker, determining which character on the page spoke which line, which keeps the script accurate. At the same time, it sorts the text blocks into reading order, so that the script follows the same narrative logic as the original comic and the reader can experience the full story through text alone.<\/p>\n<p>In addition to the Magi model itself, the project includes a dataset called Mangadex-1.5M, which contains about 1.5 million comic pages covering a wide range of genres and art styles. The dataset is designed to support training Magi models on the tasks involved in automatically understanding comic pages and generating scripts: panel detection, text block and character detection, character identity clustering, and text-speaker association.<\/p>\n<p>Through this project, the researchers hope to advance automated processing and understanding in the field of comics.<\/p>\n<p>Paper: <a href=\"https:\/\/arxiv.org\/abs\/2401.10224\">https:\/\/arxiv.org\/abs\/2401.10224<\/a><\/p>","protected":false},"excerpt":{"rendered":"<p>The Visual Geometry Group in the Department of Engineering Science at the University of Oxford has developed a model called Magi that automatically transcribes comic pages into text and generates scripts. The model achieves fully automated script generation by recognizing panels, text blocks and characters on a comic page. 
Its main functions include panel detection, which recognizes individual panels on a comic page, and text block detection, which recognizes blocks of text in panels, usually containing dialog or narrative text. In addition, the model is capable of detecting character images on the page and clustering them according to their identity in order to distinguish between different characters. The Magi model can also correlate the text with the speaker to determine which text was spoken by which character on the page, ensuring script accuracy. At the same time, the model will sort the text blocks in the order in which the comics are read<\/p>","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[146],"tags":[1654,1601],"collection":[],"class_list":["post-5421","post","type-post","status-publish","format-standard","hentry","category-news","tag-magi","tag-1601"],"acf":[],"_links":{"self":[{"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/posts\/5421","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/comments?post=5421"}],"version-history":[{"count":0,"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/posts\/5421\/revisions"}],"wp:attachment":[{"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/media?parent=5421"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/categories?post=5421"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/tags?post=5421"},{"taxonomy":"collection","embeddable":true,"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/
collection?post=5421"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}
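The article mentions that Magi sorts detected text blocks into reading order so the transcript matches how the comic is read. As a rough illustration of that subtask only, here is a minimal heuristic for right-to-left manga pages; this is not Magi's actual algorithm (the paper treats ordering jointly with the detected layout), and the box coordinates below are hypothetical.

```python
# Toy sketch of reading-order sorting for manga text blocks.
# Assumption: each detected block is an axis-aligned bounding box
# (x0, y0, x1, y1) in page-pixel coordinates. Not the Magi method.

def reading_order(blocks, row_tolerance=20):
    """Sort text-block boxes top-to-bottom, then right-to-left.

    Blocks whose top edges lie within `row_tolerance` pixels of the
    first block in a row are treated as the same row, and a row is
    read right-to-left, as in Japanese manga.
    """
    # First pass: group blocks into rows by their top edge (y0).
    rows = []
    for box in sorted(blocks, key=lambda b: b[1]):
        if rows and abs(box[1] - rows[-1][0][1]) <= row_tolerance:
            rows[-1].append(box)
        else:
            rows.append([box])
    # Second pass: within each row, order by descending x0
    # so the rightmost balloon is read first.
    ordered = []
    for row in rows:
        ordered.extend(sorted(row, key=lambda b: -b[0]))
    return ordered

# Three hypothetical speech-balloon boxes: two on the top row,
# one lower on the page.
balloons = [(10, 15, 60, 80), (300, 10, 360, 70), (150, 200, 210, 260)]
print(reading_order(balloons))
```

The rightmost top balloon comes first, then its left neighbour, then the lower balloon. A real system would also have to respect panel boundaries, which is where joint panel detection matters.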