Harnessing the Power of Layout-Aware Neural Models for Complex Table Structure Resolution

Sarang Shrivastava
3 min read · May 12, 2023


Image taken from our paper

In this post, I summarize the key findings of our paper.

In the vast landscape of data processing and analytics, tables are one of the most common and efficient ways to present and digest large amounts of data. They are integral to financial reports, research papers, technical documents, and many other informational sources. However, extracting and understanding the information within these tables can be daunting, especially when dealing with complex, multi-layered tables.

Traditional methods for table extraction have leaned heavily on rule-based systems or hand-crafted features. While these methods have their merits, they frequently fall short when confronted with hierarchical tables, particularly those that contain multi-level header structures and non-adjacent parent-child relationships.

Transitioning to Layout-Aware Neural Models

As we move forward in the data-driven era, a more sophisticated method for resolving table structures has emerged: layout-aware neural models. These models, with LayoutLM being a prominent example, offer a powerful new approach to untangling the intricate maze of complex tables.

Unlike traditional methods that rely on explicit rules or predefined features, LayoutLM leverages both the semantic and structural aspects of these complex tables by incorporating the bounding-box information of tokens during pretraining. Because it is pre-trained on a large, diverse document corpus, it adapts easily and learns to predict table structures, effectively circumventing the need for hard-set rules or features.
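To make this concrete, here is a minimal sketch of feeding words and their bounding boxes to LayoutLM for token classification, using the Hugging Face transformers library. The label count, words, and box coordinates below are illustrative assumptions, not the setup from our paper.

```python
import torch
from transformers import LayoutLMTokenizer, LayoutLMForTokenClassification

tokenizer = LayoutLMTokenizer.from_pretrained("microsoft/layoutlm-base-uncased")
model = LayoutLMForTokenClassification.from_pretrained(
    "microsoft/layoutlm-base-uncased", num_labels=4  # illustrative label set
)

words = ["Revenue", "2021", "2022"]
# One box per word, normalized to a 0-1000 page grid: [x0, y0, x1, y1].
word_boxes = [[60, 40, 180, 60], [400, 40, 470, 60], [500, 40, 570, 60]]

tokens, boxes = [], []
for word, box in zip(words, word_boxes):
    pieces = tokenizer.tokenize(word)
    tokens.extend(pieces)
    boxes.extend([box] * len(pieces))  # subword pieces share the word's box

input_ids = (
    [tokenizer.cls_token_id]
    + tokenizer.convert_tokens_to_ids(tokens)
    + [tokenizer.sep_token_id]
)
boxes = [[0, 0, 0, 0]] + boxes + [[1000, 1000, 1000, 1000]]  # special-token boxes

outputs = model(input_ids=torch.tensor([input_ids]), bbox=torch.tensor([boxes]))
predicted_labels = outputs.logits.argmax(-1)  # one cell-type label per token
```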

Delving into Hierarchical Tables

To test the efficacy of LayoutLM, an extensive investigation was conducted using the IBM FinTabNet dataset. This dataset is a treasure trove of tabular data, comprising 112,887 tables spread across 89,646 pages of S&P 500 companies' earnings reports.

The primary task involved labeling cell types and recognizing parent-child relationships among the column header cells. These annotations captured the hierarchical structure of the tables and provided a rich resource for training and testing the LayoutLM model.
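For a rough picture of what such annotations can look like, here is a hypothetical schema: each cell carries a type label and an optional pointer to its parent header cell. The field names and example values are assumptions for illustration, not the paper's actual annotation format.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class CellAnnotation:
    cell_id: int
    text: str
    bbox: List[int]            # [x0, y0, x1, y1] on the page
    label: str                 # e.g. "column_header" or "data"
    parent_id: Optional[int]   # parent header cell; None for top-level cells

# A two-level header: "Revenue" spans the "2021" and "2022" columns.
cells = [
    CellAnnotation(0, "Revenue", [60, 40, 570, 60], "column_header", None),
    CellAnnotation(1, "2021", [400, 62, 470, 80], "column_header", 0),
    CellAnnotation(2, "2022", [500, 62, 570, 80], "column_header", 0),
]
```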

To ensure a robust evaluation, the annotation process was conducted in two rounds. Initially, a random sample of 500 tables was drawn from the base dataset for annotation. A preliminary model was then trained using this initial dataset. This trained model was subsequently used to select a more complex set of tables for the second round of annotation. In total, 887 tables were manually annotated.
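The selection criterion isn't detailed above, but one plausible sketch of "use a preliminary model to find harder tables" is uncertainty-based sampling: score each unlabeled table by the model's average prediction confidence and send the least confident ones to annotators. `pick_hard_tables` and its inputs are hypothetical.

```python
import torch

def mean_confidence(logits: torch.Tensor) -> float:
    # Highest softmax probability per token, averaged over the whole table.
    return torch.softmax(logits, dim=-1).max(dim=-1).values.mean().item()

def pick_hard_tables(tables, model, k=100):
    # `tables` is assumed to be an iterable of model-ready input dicts.
    with torch.no_grad():
        scored = [(mean_confidence(model(**t).logits), t) for t in tables]
    scored.sort(key=lambda pair: pair[0])  # least confident first
    return [t for _, t in scored[:k]]
```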

LayoutLM: A New Benchmark for Table Structure Resolution

To measure the effectiveness of LayoutLM, it was compared against heuristic models and BERT, a popular language representation model. The results were impressive: LayoutLM outperformed both models by a significant margin. On the task of cell label prediction, it achieved an F1 score of 95.08, and on relation prediction, it attained a score of 69.7.
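For intuition on the relation-prediction number, one common convention is to score predicted (parent, child) pairs against gold pairs with precision, recall, and F1; the exact metric in the paper may differ in detail. A self-contained sketch:

```python
def relation_f1(predicted: set, gold: set) -> float:
    # F1 over (parent_id, child_id) pairs.
    if not predicted or not gold:
        return 0.0
    true_pos = len(predicted & gold)
    precision = true_pos / len(predicted)
    recall = true_pos / len(gold)
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

pred = {(0, 1), (0, 2), (3, 4)}  # predicted parent-child header pairs
gold = {(0, 1), (0, 2)}
print(relation_f1(pred, gold))   # 0.8
```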

These results not only demonstrate the efficacy of LayoutLM but also underscore the potential of layout-aware models in handling complex hierarchical table structures. The model’s performance also led to the creation of soft labels for the entire IBM FinTabNet dataset, further augmenting its utility.
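"Soft labels" here can be read as the model's full class-probability distribution per token, rather than a single hard decision; a minimal sketch with illustrative shapes:

```python
import torch

logits = torch.randn(1, 5, 4)                # (batch, tokens, cell classes)
soft_labels = torch.softmax(logits, dim=-1)  # each row sums to 1.0
hard_labels = soft_labels.argmax(dim=-1)     # what a hard label would collapse to
```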

Looking Ahead: The Future of Table Structure Resolution

The success of LayoutLM in resolving complex table structures has profound implications. Complex tables with intricate hierarchical structures are not the exception but the norm in many data-rich domains such as finance. As such, the ability to accurately understand and extract information from these tables is of paramount importance.

However, the journey of enhancing table structure resolution is far from over. Additional structural aspects, such as row hierarchy and caption-to-content block relationships, remain unexplored territory. The future of table extraction and understanding is promising, and layout-aware models are leading the way.

Thank you for joining us on this exploration today.
