Coforge's open-source-based content extraction framework extracts unstructured data from sources such as email, PDF, Word, Excel and scanned documents using AI, ML, NLP, OCR and other cognitive technologies.
Coforge has developed SLICE, a solution accelerator for Intelligent Content Extraction that extracts content from structured and unstructured data in sources such as PDF, Word, Excel, HTML and images. The solution is built on open technologies and can be customized for different types of documents. The framework extracts both printed text and handwritten content, along with markings such as checkboxes and radio buttons and data held in tables. Industries such as travel, hospitality, banking, finance, insurance, logistics, retail and healthcare can benefit from it.
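To make the checkbox extraction mentioned above concrete, here is a minimal sketch of one common approach: count the dark pixels inside a cropped checkbox region and treat the box as ticked when they exceed a chosen ratio. This is an illustration only, not SLICE's actual method. The function name `is_checked`, the thresholds and the sample pixel grids are all hypothetical, and the region is assumed to be already located and cropped (for example by a computer vision library).

```python
from typing import List

# A grayscale region: rows of pixel values, 0 (black) to 255 (white).
Gray = List[List[int]]


def is_checked(region: Gray, dark_threshold: int = 128,
               fill_ratio: float = 0.15) -> bool:
    """Return True if the share of dark pixels reaches fill_ratio.

    Both thresholds are illustrative assumptions and would need tuning
    against real scanned forms.
    """
    pixels = [p for row in region for p in row]
    if not pixels:
        return False  # an empty crop carries no mark
    dark = sum(1 for p in pixels if p < dark_threshold)
    return dark / len(pixels) >= fill_ratio


# Two hypothetical 4x4 checkbox interiors: one blank, one with a diagonal stroke.
blank = [[255] * 4 for _ in range(4)]
ticked = [[0 if r == c else 255 for c in range(4)] for r in range(4)]

print(is_checked(blank))
print(is_checked(ticked))
```

A fixed ratio is simple to reason about, but it is sensitive to scan quality and to box borders leaking into the crop, so a real pipeline would normally trim the border and calibrate the thresholds per form type.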
Most established organizations hold thousands of documents that must be digitized before value can be extracted from them. These documents take various forms, such as legal contracts, customer forms, feedback forms, consignment notes, receipts and invoices. No one-size-fits-all solution on the market extracts information from all of these documents. The available OCR/ICR solutions are either expensive, inflexible, or too narrowly focused on a specific type of document.
Extraction of content from images, PDF, Word, Excel, Unstructured Text
Dashboard to view extraction accuracy
Customization for a bespoke solution for niche problems not solved by COTS products
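As an illustration of the kind of figures an extraction-accuracy dashboard could display, the sketch below compares extracted field values against hand-labelled ground truth. It is not taken from SLICE: the function names, the field names and the sample values are hypothetical, and the similarity score comes from Python's standard `difflib.SequenceMatcher`.

```python
from difflib import SequenceMatcher
from typing import Dict


def field_accuracy(expected: Dict[str, str],
                   extracted: Dict[str, str]) -> Dict[str, float]:
    """Per-field similarity in [0, 1]; a field that was not extracted scores 0.0."""
    scores = {}
    for name, truth in expected.items():
        value = extracted.get(name)
        if value is None:
            scores[name] = 0.0
        else:
            scores[name] = SequenceMatcher(None, truth, value).ratio()
    return scores


def exact_match_rate(expected: Dict[str, str],
                     extracted: Dict[str, str]) -> float:
    """Share of expected fields whose extracted value matches exactly."""
    if not expected:
        return 0.0
    hits = sum(1 for name, truth in expected.items()
               if extracted.get(name) == truth)
    return hits / len(expected)


# Hypothetical ground truth and extraction result for one invoice.
expected = {"invoice_no": "INV-1001", "total": "250.00", "date": "2021-03-04"}
extracted = {"invoice_no": "INV-1001", "total": "25O.00"}  # letter O misread for zero

scores = field_accuracy(expected, extracted)
for name, score in scores.items():
    print(f"{name}: {score:.2f}")
print(f"exact match rate: {exact_match_rate(expected, extracted):.2f}")
```

Reporting both a similarity score and an exact-match rate separates near misses, such as a single misread character, from fields that were not extracted at all.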
Flexible and Customizable
Based on Open Technologies
Industry Agnostic
Scalable
Low Cost of Ownership
On Premise or Cloud Based Deployment
Trained Developers
Flexible Pricing Models
Python, OpenCV, Tesseract, Flask, spaCy, Tabula
The demonstration shows how SLICE extracts content from printed forms and handwritten text, and how its computer vision capabilities can be applied to complex extraction problems.
Live Demo