The document madness.

As a columnist, you shouldn’t really write about your own business. But when the magazine is about documents, I can’t help myself. At my company, Parashift, we are working on a technological solution for the versatile extraction of data from documents. The topic keeps me on my toes day and night. So I hope you can forgive me.

Document extraction is not solved, at least not in the way our time demands

When I first got involved with document extraction, we needed a solution for reading many different documents. My initial inquiries with established providers at the time revealed that this was not a problem at all. However, the more we looked into the solutions, the clearer it became that the providers’ definition of “the problem is solved” differed considerably from ours. While I imagined an API to which I could send a document and get results back, the providers talked about large setups, license costs and lengthy projects. We couldn’t afford any of that at the time.

An incredibly complex technical problem

Out of sheer necessity, we started to develop our own technology for document extraction. Anyone who starts extracting document data with machine learning will find that good results come quite quickly at first. This is misleading: although good initial results are exciting from a technological point of view, they rarely have high business value. And the challenge becomes more complex the longer you work on it. We realized that we needed to bring four components together to fully solve document extraction: a huge, global volume of documents, an autonomously learning machine learning cluster, cloud infrastructure that can handle high volumes in different compliance zones, and high-quality training data.

The first step was to bring these four components together on one platform. Since you can’t simply buy documents, and mass annotation for training data is far too expensive, we also built the four components directly into our business model. Today, we understand our model this way: all stakeholders who deal with documents (customers, partners, BPO providers, consultants) contribute by using our platform, so that together we can take big steps towards the goal of versatile document extraction.

Document extraction as a game changer

Time and again, we see customers spending an incredible amount of time processing documents manually. We see how the lack of autonomous document processing blocks processes and, much worse, paralyzes digitization efforts. This is because while companies invest a lot of money in solutions for high-volume document types, an investment in low-volume document types is almost never justified.

However, as these low-volume document types together usually account for more than 70% of all documents in a company, the majority of the tedious, manual work remains. If we hadn’t gotten so used to it over the years, we would never accept this amount of work. Versatile, autonomous document extraction will therefore unleash digitization and enable a whole new level of automation.
