Hi everyone, I need help extracting Arabic text from an image using VB.NET. I have only found programs that extract English text, none that handle Arabic.
I used the following code, which supports only English.

Imports MODI
Imports System.IO
Public Class Form1

    Private Sub Button1_Click(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles Button1.Click
        ' Stop if the user cancels the dialog, so OCR never runs on an empty path.
        If OpenFileDialog1.ShowDialog() <> DialogResult.OK Then Return
        Dim filepath As String = OpenFileDialog1.FileName
        Dim extract As String = Me.ExtractTextFromImage(filepath)

        ' A WinForms Label renders line breaks directly; "<br />" would appear as literal text.
        Label1.Text = extract
    End Sub

    ' Runs MODI OCR on the image and returns the recognized text (English only).
    Private Function ExtractTextFromImage(ByVal filepath As String) As String
        Dim modiDocument As New Document()
        modiDocument.Create(filepath)
        modiDocument.OCR(MiLANGUAGES.miLANG_ENGLISH)
        ' Only the first page of the document is read.
        Dim modiImage As MODI.Image = TryCast(modiDocument.Images(0), MODI.Image)
        Dim extractedText As String = modiImage.Layout.Text
        modiDocument.Close()
        Return extractedText
    End Function

End Class

Tesseract 3.0+ should have Arabic support. It's written in C++ (with a C API), but there are wrappers for other languages, including two for .NET.
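
As a rough sketch of what that could look like from VB.NET: the example below assumes the open-source `Tesseract` NuGet package (one of the .NET wrappers) is referenced, and that a `tessdata` folder containing `ara.traineddata` sits next to the executable. The module name `ArabicOcrSketch` and the function name `ExtractArabicText` are made up for illustration, and I have not tested this against your images.

```vbnet
Imports Tesseract

Module ArabicOcrSketch
    ' Hypothetical helper: runs OCR on one image using the Arabic language data.
    ' Assumption: "./tessdata" exists and contains ara.traineddata.
    Function ExtractArabicText(ByVal imagePath As String) As String
        Using engine As New TesseractEngine("./tessdata", "ara", EngineMode.Default)
            Using img As Pix = Pix.LoadFromFile(imagePath)
                Using page As Tesseract.Page = engine.Process(img)
                    ' GetText returns the recognized text for the whole page.
                    Return page.GetText()
                End Using
            End Using
        End Using
    End Function
End Module
```

In the form from the question, the call in `Button1_Click` would then become `ExtractArabicText(filepath)` instead of `Me.ExtractTextFromImage(filepath)`. Setting `Label1.RightToLeft = RightToLeft.Yes` may also help the Arabic text display in the expected direction.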

If that proves too hard, there is also a JavaScript port of Tesseract that supports Arabic. You'll need Node.js though.

Edited by Traevel: spelling

Votes + Comments
Was thinking of Tesseract but didn't check language support. Bad me. I have used it. It can be fun (as in, image processing is required).