VUB models teach Artificial Intelligence to read tables more accurately
"We want our models to understand the underlying structure of tables, just like humans do"
The growing volume of reports, invoices, scientific publications and other business documents increasingly challenges companies and institutions to process information quickly and reliably. The tables in those documents contain a lot of information but are often difficult for AI to interpret. In his doctoral research at VUB, entitled Representation Learning for Table Understanding in Intelligent Document Processing, Willy Carlos Tchuitcheu (Mathematics & Data Science Research Group) developed an innovative method that teaches computers to handle tables much better. His findings are an important asset for applications in artificial intelligence and automatic document processing.
The core data of a document are often summarised in tables, yet tables frequently pose problems for current AI systems. Many Large Language Models convert tables into linear text, losing the two-dimensional structure, headings and relationships between cells. This leads to errors and inaccuracies. "We found that many LLMs often fail to preserve permutation invariance, meaning that a shuffled version of a table is treated as a completely new table", says Tchuitcheu. "This indicates that LLMs do not fully understand the underlying structure of tables, which can lead to misinterpretation. The consequences of this misinterpretation include weak associations between similar tables, resulting in poor accuracy when AI agents answer questions."
Tchuitcheu introduced the so-called Table Understanding principle, a theoretical framework that describes how people interpret tables by automatically connecting each cell with the correct row and column headings. From that principle, he developed a structure-aware method that no longer reduces tables to plain text.
"Our goal was to enable AI systems to understand tables more naturally", Tchuitcheu explains. "We want models to go beyond simply mimicking a principle determined by their training with textual data, and instead understand the underlying structure, just like humans. This makes for more reliable analysis and faster actionable insights, especially in industries where tabular data play a strategic role."
The new approach proves particularly robust, in part because it takes into account permutation invariance, the fact that tables typically retain their meaning even when rows or columns are rearranged. As a result, the model performs consistently even when the shape of the table changes.
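To make the idea of permutation invariance concrete, here is a minimal Python sketch. It is a toy illustration only and not Tchuitcheu's method: the function names (`linearize`, `cell_triples`, `permute`) and the invoice data are hypothetical. It compares a row-by-row text rendering of a table with a representation that links each cell to its row key and column heading, before and after the rows and columns are shuffled.

```python
def linearize(header, rows):
    """Flatten a table into plain text, row by row (order-sensitive)."""
    lines = [" | ".join(header)]
    lines += [" | ".join(str(value) for value in row) for row in rows]
    return "\n".join(lines)


def cell_triples(header, rows, key_name):
    """Represent a table as an unordered set of
    (row key, column heading, cell value) triples."""
    key_index = header.index(key_name)  # locate the key column by name
    triples = set()
    for row in rows:
        row_key = row[key_index]
        for column_name, value in zip(header, row):
            triples.add((row_key, column_name, value))
    return frozenset(triples)


def permute(header, rows, row_order, col_order):
    """Return a copy of the table with rows and columns reordered."""
    new_header = [header[j] for j in col_order]
    new_rows = [[rows[i][j] for j in col_order] for i in row_order]
    return new_header, new_rows


# Hypothetical example table.
header = ["invoice", "customer", "amount"]
rows = [
    ["A-001", "Acme", 120.0],
    ["A-002", "Globex", 75.5],
    ["A-003", "Initech", 310.0],
]

shuffled_header, shuffled_rows = permute(
    header, rows, row_order=[2, 0, 1], col_order=[1, 2, 0]
)

# The text rendering changes when the table is shuffled ...
text_changed = linearize(header, rows) != linearize(shuffled_header, shuffled_rows)

# ... while the header-linked representation stays the same.
structure_same = cell_triples(header, rows, "invoice") == cell_triples(
    shuffled_header, shuffled_rows, "invoice"
)

print(text_changed, structure_same)
```

In this sketch the shuffled table produces a different text string but an identical set of triples, which is the property a structure-aware representation is meant to preserve.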
Promoter Prof Ann Dooms emphasises the importance of the research for the broader evolution of artificial intelligence. "Document processing is a crucial component in many social and economic processes", she says. "Willy Tchuitcheu's work shows that we can make AI systems much more reliable by making them look at tables fundamentally differently. It opens the door to new applications in administrative automation, in scientific analysis and in data-intensive industries."
The PhD research shows strong results in two central applications: automatically recognising column types and answering queries based on table data. In addition, the method increases the speed and accuracy of information extraction, which is important for companies processing large volumes of documents.
Co-promoter Prof Tan Lu: "Although large language models are increasingly used for tasks such as document processing and automated reasoning, mathematically motivated modelling frameworks, such as Tchuitcheu's work, remain an important angle. In addition to data-driven automation, mathematical modelling enables deeper interpretability, transparency and reliability, which is essential for reliable AI systems."
About the researcher
Willy Carlos Tchuitcheu received his master's degree in mathematical sciences from the African Institute for Mathematical Sciences in Rwanda in 2019. He worked for three years as a research engineer at Camertronix in Cameroon and started his PhD at VUB within the Department of Mathematics and Data Science in 2021. His research resulted in three articles as first author in international journals, a patent application and a Best Poster Award at the Flanders AI Research Day 2021. He has also co-authored two additional publications, one of them again as first author.
For more information or interview requests:
Willy Carlos Tchuitcheu, Department of Mathematics and Data Science: Willy.Carlos.Tchuitcheu@vub.be
Professor Ann Dooms: ann.dooms@vub.be
Professor Tan Lu: tan.lu@vub.be