First posting here, but I've gotten a lot of help from this site.

Here is my problem. I'm trying to automate some procedures for my production line. I have a Word document whose text I have read into a string. What I'd like to do is search that string for some text that follows a couple of keywords in the document.

For example:

Display = Procedure.IndexOf("shall read")

Here I want to return the next 15 characters after this string.

Please help; creating a search engine is a pain.


Are you building this in Word VBA, are you calling Word from your application, or do you already have the string in your app and just need to handle it?
Please post any code you've got, so we can see how far you've got.

This is how I am calling the document. I'm trying to develop the program in a way that doesn't require the user's system to have Word licensed.

        If My.Computer.FileSystem.FileExists("\\mc-000\WebFTP\Drawings\LoveCalProcedures\65726210test.doc") Then
            MsgBox("File found.")
        Else
            MsgBox("File not found.")
        End If
        Dim objApp As Word.Application
        Dim objDoc As Word.Document

        'Open new instance
        objApp = New Word.Application
        objApp.Visible = False
        objDoc = New Word.Document
        objDoc = objApp.Documents.Open("\\mc-000\WebFTP\Drawings\LoveCalProcedures\65726210test.doc")

        Procedure = objDoc.Content.Text
        'objDoc.Close()
        objApp.Application.Quit()

I want to search this document for varying strings after a keyword string. So far I can only grab the first instance of this statement, but even that is still pretty crude code.

    Sub ShallReadRoutine()
        LEDDisplaySearch = Procedure.IndexOf("shall read")
        LEDDisplayFind = Procedure.Substring(LEDDisplaySearch, 25)
        LEDDisplay = Strings.Right(LEDDisplayFind, 15)
        TopDisplay = Strings.Left(LEDDisplay, 7)
        BottomDisplay = Strings.Right(LEDDisplay, 7)
        UUTBox.RichTextBox4.Text = TopDisplay
        UUTBox.RichTextBox3.Text = BottomDisplay
    End Sub

I think that calling Word.Application requires Word to be installed, and obviously licensed.
Anyway:

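Sub ShallReadRoutine()
    ' Keywords to look for in the document text
    Dim keywords() As String = {"shall read", "test1", "test2"}

    For Each key As String In keywords
        LEDDisplaySearch = Procedure.IndexOf(key)

        ' IndexOf returns -1 when the keyword is not found
        If LEDDisplaySearch <> -1 Then
            ' Skip past the keyword itself (10 characters for "shall read"),
            ' then read two 7-character fields separated by one character
            TopDisplay = Procedure.Substring(LEDDisplaySearch + key.Length, 7)
            BottomDisplay = Procedure.Substring(LEDDisplaySearch + key.Length + 8, 7)

            Select Case key
                Case "shall read"
                    UUTBox.RichTextBox4.Text = TopDisplay
                    UUTBox.RichTextBox3.Text = BottomDisplay
                Case Else
                    ' cobj stands for whatever lookup returns the control with that name
                    cobj(key & "upper").Text = TopDisplay
                    cobj(key & "lower").Text = BottomDisplay
            End Select
        End If
    Next
End Sub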

This might give you an idea.
If you are looking for the next "shall read" part, use the overload of IndexOf that takes a startIndex, and pass it the index of the last match plus however many characters you have already read.
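To make the startIndex idea concrete, here is a rough sketch of walking through every occurrence of a keyword. The module and function names (KeywordSearch, FindAllAfter) are made up for illustration and are not part of the code above; the only framework calls used are String.IndexOf and String.Substring.

```vbnet
Imports System
Imports System.Collections.Generic

Module KeywordSearch
    ' Hypothetical helper: returns up to "count" characters following
    ' every occurrence of "keyword" in "source".
    Function FindAllAfter(source As String, keyword As String, count As Integer) As List(Of String)
        Dim results As New List(Of String)

        ' An empty keyword matches everywhere and would never advance
        If String.IsNullOrEmpty(keyword) Then Return results

        Dim position As Integer = source.IndexOf(keyword, StringComparison.Ordinal)
        While position <> -1
            ' Start reading right after the keyword
            Dim start As Integer = position + keyword.Length
            ' Do not read past the end of the text
            Dim available As Integer = Math.Min(count, source.Length - start)
            results.Add(source.Substring(start, available))

            ' Overload with startIndex: continue searching after this match
            position = source.IndexOf(keyword, start, StringComparison.Ordinal)
        End While

        Return results
    End Function
End Module
```

Calling FindAllAfter(Procedure, "shall read", 15) should then give one string per occurrence, which can be split into the top and bottom display values the same way as before.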
Try to use meaningful names for your objects. A name like RichTextBox4 will only get you lost inside your code and will make troubleshooting or future releases harder.
In the case else part I've used test1upper, test1lower, test2upper and test2lower as textboxes (or richtextboxes) and used the same code for both sets of objects. If this isn't your case, you can repeat the case "shall read" part with different objects or whatever.

Please note that the above code hasn't been tested and may contain typos or other errors.

Good luck.

Hi,

As adam_k said, if you are using the Word libraries and interop to access Word, the user's machine must have a copy of Word installed.

Furthermore, if you specifically include a reference to a Word library, you are depending on the user having the same version of Word installed, i.e. Word 2003 uses a different library than Word 2010.

To get round the library issue it is better to use late binding to access Word:

'Example using an included reference to the Word library - depends on the client having the same version

Imports Microsoft.Office.Interop

Dim MyWord As Word.Application
Dim MyDoc As Word.Document

MyWord = New Word.Application
MyDoc = New Word.Document 'I don't think you actually need this line
MyDoc = MyWord.Documents.Open(FilePath)

'Example using late binding - uses whatever version is found on the client
'(late binding needs Option Strict Off)
Dim MyWord As Object
Dim MyDoc As Object

MyWord = CreateObject("Word.Application")
'CreateObject looks up whatever program is registered under that name
MyDoc = MyWord.Documents.Open(FilePath)

Adam,
Thank you for the method of searching for the keywords. I'm still self-teaching here, so I'm a little unclear on the terminology. Can you dumb down what this part of the code actually does?

Sub ShallReadRoutine()
dim keywords() = {"shall read","test1","test2"}
for each key as string in keywords
LEDDisplaySearch = Procedure.IndexOf(key)

Also, you mention the "overloaded IndexOf". How do I call this function? (I assume this refers to the last instance of my Procedure.IndexOf(key).)

G_Waddell,
Maybe I should have started 2 topics. A work associate told me that if I add the Microsoft Word Object Library to my references, a user would be able to strip the Word document text and process the data as a new string. It seems I was misinformed. There is no way my company is going to pay for Microsoft Word licensing on the 11 computers I am planning to purchase for this project. Any advice on a work-around?

The dim keywords() = {..} will declare an array.
The for each key as string in keywords will step through the array's elements, one keyword per pass.
This way you can use the same code and save yourself a huge pile of if...else if.... else...
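As a small illustration of the array plus For Each pattern on its own, here is a sketch that checks which keywords occur in a piece of text. The names KeywordLoop and KeywordsPresent are invented for this example only.

```vbnet
Imports System
Imports System.Collections.Generic

Module KeywordLoop
    ' Hypothetical helper: returns the keywords that occur in "source",
    ' in the same order as they appear in the array.
    Function KeywordsPresent(source As String, keywords As String()) As List(Of String)
        Dim found As New List(Of String)

        ' The loop body runs once per array element; "key" holds the current one
        For Each key As String In keywords
            ' IndexOf returns -1 when the text does not contain the keyword
            If source.IndexOf(key, StringComparison.Ordinal) <> -1 Then
                found.Add(key)
            End If
        Next

        Return found
    End Function
End Module
```

Adding another keyword then only means adding one more element to the array; the loop itself stays the same.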
If you can't go with Word, perhaps it's time to get back to why you are using a Word file in the first place. Can't it be a simple text file?

So a work associate told me that if I add the Microsoft Word Object Library to my references a user would be able to strip the Word document text and process the data as a new string. It seems I was misinformed. There is no way my company is going to pay for Microsoft Word licensing on the 11 computers I am planning to purchase for this project. Any advice on a work-around?

"\\mc-000\WebFTP\Drawings\LoveCalProcedures\65726210test.doc"

Does the Word Document absolutely need to be stored in the Word 97-2003 "doc" format? If it could be stored in the newer "docx" format, you can use the Open XML SDK 2.0 for Microsoft Office to read the file without any licensing issues.

Retrieving the document text is as simple as:

   ' Requires Imports DocumentFormat.OpenXml.Packaging at the top of the file
   Dim doctext As String
   ' Open read-only; Using disposes the package even if reading fails
   Using wpdoc As WordprocessingDocument = WordprocessingDocument.Open("D:\My Documents\programming.docx", False)
       doctext = wpdoc.MainDocumentPart.Document.Body.InnerText
   End Using

In the .doc format the text itself is stored as plain text; a hex editor should let you see the control codes that mark where the text sits. With that, you can parse the file to find the text.

Thanks for the help, guys. I should have no problem changing the documents to the .docx format. Great answers, great community.

As the guys said above, there are ways and means to get around using the word application... It just means you will not be using the Interop application.

If your issue is solved, please mark this thread as solved.
