IBM Releases Lightweight Vision-Language Model

IBM has officially released Granite-Docling-258M, a small vision-language model aimed at end-to-end document conversion. It is licensed under Apache 2.0 and is available on Hugging Face (https://huggingface.co/ibm-granite/granite-docling-258M). With 258 million parameters, it is a lightweight model designed specifically for document conversion. Its output preserves the full structure of the page, including layout, tables, mathematical formulas, lists and code blocks, and IBM describes it as more accurate than traditional OCR software. According to IBM, the core of Granite-Docling is DocTags, a general-purpose document-structure markup designed by IBM Research that precisely describes the type, coordinates, reading order and cross-element relationships of page elements. DocTags separates content from layout, so the model "first identifies the extent of each element, then performs OCR on it". After conversion, DocTags output can be exported directly to Markdown, JSON, HTML and other formats, or passed to the Docling library for further processing.
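The idea of describing each page element by its type, coordinates and reading order, and then exporting the result to other formats, can be illustrated with a minimal Python sketch. This is not the actual DocTags syntax or the Docling API: the `PageElement` class, its field names, and the `to_markdown` and `to_json` helpers are hypothetical names invented purely for illustration.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class PageElement:
    """Hypothetical stand-in for one tagged page element (not real DocTags)."""
    kind: str                         # assumed element types: "heading", "text", "list_item", "formula"
    bbox: tuple[int, int, int, int]   # assumed coordinates: (left, top, right, bottom)
    order: int                        # position in the reading order
    content: str                      # recognized text, kept separate from layout


def to_markdown(elements: list[PageElement]) -> str:
    """Render elements as Markdown, following reading order rather than input order."""
    parts = []
    for el in sorted(elements, key=lambda e: e.order):
        if el.kind == "heading":
            parts.append(f"# {el.content}")
        elif el.kind == "list_item":
            parts.append(f"- {el.content}")
        elif el.kind == "formula":
            parts.append(f"$${el.content}$$")
        else:
            parts.append(el.content)
    return "\n\n".join(parts)


def to_json(elements: list[PageElement]) -> str:
    """Serialize the same elements, layout information included, as JSON."""
    ordered = sorted(elements, key=lambda e: e.order)
    return json.dumps([asdict(e) for e in ordered])


# Elements are deliberately listed out of reading order.
elements = [
    PageElement("text", (50, 120, 550, 160), 1, "The model has 258 million parameters."),
    PageElement("heading", (50, 40, 550, 80), 0, "Release notes"),
    PageElement("list_item", (70, 180, 550, 200), 2, "Apache 2.0 license"),
]

markdown = to_markdown(elements)
as_json = to_json(elements)
print(markdown)
```

Because content and layout are stored in separate fields, the same list of elements can feed either exporter: the Markdown output ignores the bounding boxes, while the JSON output keeps them for downstream processing.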
