SLICE

Content Extraction Framework

A Coforge content extraction framework, built on open source, for extracting unstructured data from sources like email, PDF, Word, Excel and scanned documents using AI, ML, NLP, OCR and other cognitive technologies.

Coforge has developed SLICE, a solution accelerator for Intelligent Content Extraction that takes an intelligent approach to extracting content from structured and unstructured data in sources like PDF, Word, Excel, HTML and images. The solution is built on open technologies and is easily customizable for different types of documents. The framework can extract both printed text and handwritten content, along with specific markings like checkboxes and radio buttons and data present in tables. Industries like travel, hospitality, banking, finance, insurance, logistics, retail and healthcare can benefit from it.
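As an illustration of what detecting a marking like a checkbox can involve, here is a minimal sketch that decides whether a box is ticked by measuring the share of dark pixels inside it. This is not SLICE's actual method; the function name `is_box_marked`, the `blank_box` helper, the thresholds and the synthetic 6x6 image are all assumptions made for the example, and a real pipeline would more likely work on image arrays loaded through a library like OpenCV.

```python
from typing import List, Tuple

Gray = List[List[int]]  # rows of 0-255 grayscale values (0 = black)


def is_box_marked(image: Gray, box: Tuple[int, int, int, int],
                  dark_threshold: int = 128, fill_ratio: float = 0.2,
                  border: int = 1) -> bool:
    """Return True if the checkbox region holds enough dark pixels to count as ticked.

    box is (top, left, height, width). `border` pixels on each side are skipped
    so that the printed outline of the box is not counted as ink.
    All names and default values here are hypothetical.
    """
    top, left, height, width = box
    rows = range(top + border, top + height - border)
    cols = range(left + border, left + width - border)
    total = len(rows) * len(cols)
    if total <= 0:
        return False  # region too small to judge
    dark = sum(1 for r in rows for c in cols if image[r][c] < dark_threshold)
    return dark / total >= fill_ratio


def blank_box(size: int = 6) -> Gray:
    """Synthetic checkbox: black one-pixel outline around a white interior."""
    return [[0 if r in (0, size - 1) or c in (0, size - 1) else 255
             for c in range(size)] for r in range(size)]


empty = blank_box()
ticked = blank_box()
for i in range(1, 5):
    ticked[i][i] = 0        # one stroke of an "X"
    ticked[i][5 - i] = 0    # the other stroke

print(is_box_marked(empty, (0, 0, 6, 6)))
print(is_box_marked(ticked, (0, 0, 6, 6)))
```

Skipping the border keeps the printed outline from inflating the dark-pixel count, and the fill ratio would need tuning per form type, since a light tick covers far less of the box than a heavy cross.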

    Solution Need

    • Most established organizations have thousands of documents that need to be digitized before value can be extracted from them. These documents come in various forms, like legal contracts, customer forms, feedback forms, consignment notes, receipts and invoices. There is no one-size-fits-all solution in the market for extracting information from such documents. The OCR/ICR solutions available are either expensive, inflexible or too focused on a specific type of document.

    Features

    • Extraction of content from images, PDF, Word, Excel, Unstructured Text

    • Dashboard to view extraction accuracy

    • Customization for a bespoke solution for niche problems not solved by COTS products

    Benefits

    • Flexible and Customizable

    • Based on Open Technologies

    • Industry agnostic

    • Scalable

    • Low cost of Ownership

    • On Premise or Cloud Based Deployment

    • Trained Developers

    • Flexible Pricing Models

    Technology Stack

    Python, OpenCV, Tesseract, Flask, spaCy, Tabula

    SLICE Demo

    The demonstration shows how SLICE extracts content from printed forms and handwritten content, and how its advanced computer vision can be used to solve complex problems.
