Hi,

I want to extract certain words from a sentence entered by the user. For example, if the user enters "Jones born in 1967", the program should extract the words 'Jones', 'born' and '1967', skip the word 'in', and print the extracted words. Is there any way I can do this? Please guide me.

Thanks

How much have you done so far? Have you tried to put together an algorithm, even if just in pseudocode, for what you are trying to accomplish and how you might think to achieve it?

try this out

Dim InputText As String = "Jack born in, 1987"
Dim Separators() = New Char() {" "c, ","c} ' you can add more separators
Dim Words = InputText.Split(Separators)
' Filter words: drop empty or one-letter entries and the word "in" (any casing)
Words = Words.Where(Function(W) W.Trim.Length > 1 AndAlso _
                                Not W.Equals("in", StringComparison.OrdinalIgnoreCase) _
                                ).ToArray

For Each Word In Words
    MsgBox(Word)
Next

hope this helps

Thanks vbnetskywalker for the code.

I have a little problem here which I forgot to mention. What if users enter other sentences, such as "President Obama is from US", "Today is Friday", etc.? Do I have to change the input text and the excluded words each time?

Sure.

1 - For the input: I don't know where you get the input from. It might be a TextBox, an InputBox, a file, etc.

2 - For the excluded words: I don't know what kind of words you don't want to extract, so you have to list them manually.

And after all that, I don't know what exactly you're trying to do.
If you provide more info on your whole app, that would help.
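
In the meantime, here is a minimal sketch of how the earlier snippet could take both the sentence and the excluded words as parameters, so neither has to be hard-coded. The names ExtractWords and stopWords are made up for illustration, and the stop-word list is only a sample.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Linq

Module ExtractWordsSketch
    ' Hypothetical helper: returns every word of inputText that is not in stopWords.
    Function ExtractWords(ByVal inputText As String, ByVal stopWords As IEnumerable(Of String)) As String()
        ' Case-insensitive set, so "In" and "in" are treated alike.
        Dim skip As New HashSet(Of String)(stopWords, StringComparer.OrdinalIgnoreCase)
        Dim separators() As Char = {" "c, ","c, "."c, "?"c}
        Return inputText.Split(separators, StringSplitOptions.RemoveEmptyEntries) _
                        .Where(Function(w) Not skip.Contains(w)) _
                        .ToArray()
    End Function

    Sub Main()
        ' Sample stop words (an assumption); in a real app they could come from a file or database.
        Dim stopWords() As String = {"in", "is", "from", "a", "an"}
        Dim sentences() As String = {"Jones born in 1967", "President Obama is from US", "Today is Friday"}
        For Each sentence As String In sentences
            Console.WriteLine(String.Join(", ", ExtractWords(sentence, stopWords)))
        Next
    End Sub
End Module
```

The input could just as well come from a TextBox or a web method parameter; only the call to ExtractWords would change.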

Hi vbnetskywalker,

Thanks again. Here are the details. I have created a database in MS SQL Server with the clues and patterns that I want to extract. The clues are words like 'Lake', 'Mount', 'Co', 'Company', 'Corp', etc. The patterns are phrases like 'born in', 'company such as', 'located in', etc. But I want to discard words like 'such as', 'in', 'a', 'an', etc.

What I want is that when a user enters a short sentence like 'Jones is born in 1975', the program recognizes and prints out the words 'Jones', '1975' and 'born'.

The database already records that Jones is a person, 1975 is a year, and born is a pattern.

Well, in that case you can use the function InputText.Replace(,),
or better, use RegEx.

Perform the replace before extracting the words.
If you don't know how to use RegEx, I'll have to post it for you in about 6-7 hours, because I have to sleep now. Excuse me.

Thanks vbnetskywalker.

I do need your help on how to use RegEx.

Is it possible to do the parsing for the problem I described above with regular expressions?

Mister nickelmann, I'm so sorry. I promised to post the code after 6-7 hours (because I had to sleep), but something went wrong and I couldn't.
Anyway, here is a very good article about RegEx,
and I'm sure you will be able to accomplish your goal after reading it.

http://www.codeproject.com/KB/dotnet/regextutorial.aspx

Here is a bit of code to show the outline of what you could use:

' Requires: Imports System.Text.RegularExpressions
Dim Input As String = "User Input"
Dim RegExPattern As String = "user|input" ' matches either "user" or "input"
Dim Matches = Regex.Matches(Input, _
                        RegExPattern, _
                        RegexOptions.IgnoreCase)

For Each m As Match In Matches
    MsgBox(m.Value)
Next

I'm sorry again for being late.
hope this helps
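
Building on the outline above, here is a rough sketch of how the pattern could be assembled from a list of words instead of being typed by hand. BuildPattern and the sample word list are assumptions for illustration. The \b anchors are added so that only whole words match.

```vbnet
Imports System
Imports System.Linq
Imports System.Text.RegularExpressions

Module RegexSketch
    ' Hypothetical helper: turns a word list into a pattern such as \b(?:Jones|born|1975)\b
    Function BuildPattern(ByVal words As String()) As String
        ' Regex.Escape guards against words that contain regex metacharacters.
        Dim alternatives As String = String.Join("|", words.Select(Function(w) Regex.Escape(w)).ToArray())
        Return "\b(?:" & alternatives & ")\b"
    End Function

    Sub Main()
        Dim wanted() As String = {"Jones", "born", "1975"}   ' sample words, not from a real database
        Dim sentence As String = "Jones is born in 1975"
        For Each m As Match In Regex.Matches(sentence, BuildPattern(wanted), RegexOptions.IgnoreCase)
            Console.WriteLine(m.Value)
        Next
    End Sub
End Module
```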

Thanks vbnetskywalker. It's ok for the delay.

I read the article and it helps a lot. But if I want to find the subjects and objects in the sentence, how do I do that?

Hi,
I think it's pretty hard to do such a thing (at least with my knowledge and experience).
I mean, we're talking about a natural language here!

Anyway, my recommendation is to post a complete example of an expected input and the expected output, and let's see what the experts here can come up with for you.

I don't think it's going to be possible, though.
Sorry for not being much help.

Hi, I have drawn up the plan already. I am supposed to tokenize the input and then remove the bad words (e.g. 'a', 'an', 'the', 'that', 'than', etc.). After that, I should match the remaining words against the words in the database. When a word matches, the program tells the user what it is, e.g. 'Corp' is a company. It should also check for proper nouns. I have some of the code already but I am stuck. Can anyone help me?

Imports System.Web
Imports System.Web.Services
Imports System.Web.Services.Protocols
Imports System.Collections
Imports System.Text.RegularExpressions


<WebService(Namespace:="http://tempuri.org/")> _
<WebServiceBinding(ConformsTo:=WsiProfiles.BasicProfile1_1)> _
<Global.Microsoft.VisualBasic.CompilerServices.DesignerGenerated()> _
Public Class Service
     Inherits System.Web.Services.WebService

    <WebMethod()> _
    Public Function Main(ByVal inputStr As String) As String

        Dim tockenizedWord() As String = {}
        Dim i As Integer = 0

        While inputStr.Length > 0
            tockenizedWord(i) = strTockenizer(inputStr)

            If tockenizedWord(i).length Then
                i = i
            Else
                i = i + 1
            End If

        End While

        'stnMaching()

        Return "done"
    End Function

    Public Function strTockenizer(ByVal inputStr As String) As String

        Dim tocWord As String

        if () then
            Return tocWord
        Else
            Return ""
        End If


    End Function

    Public Function stnMatching() As String
        Return "null"
    End Function

End Class
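
A note on the code above, in case it helps: tockenizedWord is declared as a zero-length array, so assigning to tockenizedWord(i) would throw an IndexOutOfRangeException, and inputStr is never shortened, so the While loop could not end. One possible way around both, sketched here with a made-up Tokenize function, is to collect the matches of \w+ into a List(Of String):

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Text.RegularExpressions

Module TokenizerSketch
    ' Returns the words of inputStr in order; \w+ matches runs of letters, digits and underscores.
    Function Tokenize(ByVal inputStr As String) As List(Of String)
        Dim tokens As New List(Of String)   ' grows as needed, unlike a fixed-size array
        For Each m As Match In Regex.Matches(inputStr, "\w+")
            tokens.Add(m.Value)
        Next
        Return tokens
    End Function

    Sub Main()
        Dim tokens As List(Of String) = Tokenize("Jones is born in 1975")
        Console.WriteLine(tokens.Count)
        Console.WriteLine(String.Join(", ", tokens.ToArray()))
    End Sub
End Module
```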

So sorry, everyone. I think I confused myself about the output I am supposed to get. Here is what the program is supposed to do:

Step 1: Split the sentence entered by the user and count the number of words.
Step 2: Extract the subjects and objects (e.g. nsubj, pobj, etc.) from the sentence, or remove the bad words (stop words) from the split words.
Step 3: Check which words match the clues and patterns. E.g. for "Jones was born in 1975", the matching words are Jones, born, 1975.
Step 4: Check for proper nouns (words that start with a capital letter).

I have done the splitting of the sentence into words, but I am not sure how I should continue the program.

Here is my code. Sorry, I forgot to post it earlier.

Imports System.Web
Imports System.Web.Services
Imports System.Web.Services.Protocols
Imports System.Text.RegularExpressions

<WebService(Namespace:="http://tempuri.org/")> _
<WebServiceBinding(ConformsTo:=WsiProfiles.BasicProfile1_1)> _
<Global.Microsoft.VisualBasic.CompilerServices.DesignerGenerated()> _
Public Class Service
     Inherits System.Web.Services.WebService


    Sub main()
        'Sample String
        Dim sampleText As String = "User input"

        'Split the sentence into words on spaces and punctuation
        Dim separators() As String = {" ", ".", ",", "?"}
        Dim wordArray() As String = sampleText.Split(separators, _
                                       StringSplitOptions.RemoveEmptyEntries)

        ' Sort the result 
        Array.Sort(wordArray)

    End Sub

    'To count the words
    Public Function getWordCount(ByVal InputString As String) As Integer
        'Trim first so leading or trailing whitespace does not add empty entries
        Return Split(System.Text.RegularExpressions.Regex.Replace(InputString.Trim(), "\s+", Space(1))).Length
    End Function


End Class
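
For steps 3 and 4 listed above, a rough sketch might look like the following. The Dictionary is only a stand-in for the database of clues and patterns, its contents are invented, and Classify is a hypothetical name. Treating any four-digit number as a year is also just an assumption.

```vbnet
Imports System
Imports System.Collections.Generic

Module ClueMatchSketch
    ' Returns a label for the word, or Nothing when nothing applies.
    Function Classify(ByVal word As String, ByVal clues As Dictionary(Of String, String)) As String
        Dim label As String = Nothing
        Dim year As Integer
        If clues.TryGetValue(word, label) Then Return label                             ' step 3: clue/pattern match
        If word.Length = 4 AndAlso Integer.TryParse(word, year) Then Return "year"      ' assumption: 4 digits = year
        If Char.IsUpper(word(0)) Then Return "possible proper noun"                     ' step 4: capitalised word
        Return Nothing
    End Function

    Sub Main()
        ' Stand-in for the database table of clues and patterns (made-up contents).
        Dim clues As New Dictionary(Of String, String)(StringComparer.OrdinalIgnoreCase)
        clues.Add("Jones", "person")
        clues.Add("born", "pattern")
        clues.Add("Corp", "company")

        Dim separators() As Char = {" "c, "."c, ","c, "?"c}
        Dim words() As String = "Jones was born in 1975".Split(separators, StringSplitOptions.RemoveEmptyEntries)
        Console.WriteLine("Word count: " & words.Length)

        For Each word As String In words
            Dim label As String = Classify(word, clues)
            If label IsNot Nothing Then Console.WriteLine(word & " -> " & label)
        Next
    End Sub
End Module
```

Finding grammatical subjects and objects (nsubj, pobj) is a different problem and is not attempted here.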

Hi Nickelmann...

I hope this helps you...

Protected Sub btnExtract_Click(ByVal sender As Object, ByVal e As EventArgs) Handles btnExtract.Click
        Dim textEntry As String = txtWordEntry.Text
        txtExtractedWords.Text = ""
        Dim words As String() = textEntry.Split(New Char() {" "c})
        Dim word As String
        For Each word In words
            If validWord(word) Then
                txtExtractedWords.Text = txtExtractedWords.Text & word & ", "
            End If
        Next
    End Sub

    Function validWord(ByVal theWord As String) As Boolean
        If theWord = "in" Then
            Return False
        Else
            Return True
        End If
    End Function

Evan Dela-Grammaticas, NeonBlue Business Consultant

I remember that about 10 years ago there used to be a natural language scripting engine called Gerbil. It came with a web interface and was meant to simulate natural language intelligence.

I think you might need to consider some sort of language plugin to help you out here.

Evan Dela-Grammaticas, NeonBlue Business Consultant

Hi NeonBlue007,
Thanks for the sample code. Since I want to keep changing the sentence, I was wondering whether it is possible to change it to

Function validWord(ByVal theWord As String)
        If theWord = "stopWords.txt" Then
           .....

This is because for every sentence I'll be extracting different subject and object words and removing different stop words.
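
Comparing theWord with the string "stopWords.txt" would only test whether the word is literally that file name. One possible approach, sketched below, is to load the file's contents into a set once and have validWord look the word up. The file layout (one stop word per line) and the LoadStopWords name are assumptions, and the demo writes its own small file to the temp folder.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.IO

Module StopWordFileSketch
    Private stopWords As HashSet(Of String)

    ' Loads one stop word per line; blank lines are ignored.
    Sub LoadStopWords(ByVal fileName As String)
        stopWords = New HashSet(Of String)(StringComparer.OrdinalIgnoreCase)
        For Each line As String In File.ReadAllLines(fileName)
            If line.Trim().Length > 0 Then stopWords.Add(line.Trim())
        Next
    End Sub

    Function validWord(ByVal theWord As String) As Boolean
        Return Not stopWords.Contains(theWord)
    End Function

    Sub Main()
        ' Demo only: write a small stop-word file to the temp folder, then read it back.
        Dim fileName As String = Path.Combine(Path.GetTempPath(), "stopWords.txt")
        File.WriteAllLines(fileName, New String() {"in", "a", "an", "the"})
        LoadStopWords(fileName)
        For Each word As String In "Jones born in 1967".Split(" "c)
            If validWord(word) Then Console.WriteLine(word)
        Next
    End Sub
End Module
```

Changing the stop words then means editing the text file, not the code.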

Hi all, I think I have what I wanted now. Thanks for all your help, but I still have a minor problem. How can I call the functions?

Imports System.Web
Imports System.Web.Services
Imports System.Web.Services.Protocols
Imports System.Text.RegularExpressions
Imports System
Imports System.IO
Imports System.Text

<WebService(Namespace:="http://tempuri.org/")> _
<WebServiceBinding(ConformsTo:=WsiProfiles.BasicProfile1_1)> _
<Global.Microsoft.VisualBasic.CompilerServices.DesignerGenerated()> _
Public Class Service
     Inherits System.Web.Services.WebService

    'To eliminate stop words and replace with a space and eliminate extra spaces
    Public Function StripStopWords(ByVal s As String) As String
        'Dim StopWords As String = ReadFile("C:\Users\jaimiechin\Desktop\stopwords.txt").Trim
        'Read the whole stop-word file (words separated by whitespace)
        Dim StopWords As String = File.ReadAllText("C:\Users\jaimiechin\Documents\Visual Studio 2005\WebSites\extract1\badWords.txt").Trim
        Dim StopWordsRegex As String = Regex.Replace(StopWords, "\s+", "|") ' about|after|all|also etc.
        StopWordsRegex = String.Format("\s?\b(?:{0})\b\s?", StopWordsRegex)
        Dim Result As String = Regex.Replace(s, StopWordsRegex, " ", RegexOptions.IgnoreCase) ' replace each stop word with a space
        Result = Regex.Replace(Result, "\s+", " ") ' eliminate any multiple spaces
        Return Result
    End Function

    'To count the words and split the sentence
    Public Function getWordCount(ByVal InputString As String) As Integer
        Return Split(System.Text.RegularExpressions.Regex.Replace(InputString, "\s+", Space(1))).Length
    End Function

    <WebMethod()> _
    Public Function main(ByVal inputStr As String) As String
        'Dim input As String = "User Input"
        Dim output As String

        If output = String.Format("[A-Z][a-z][0-9]") Or ("\d{1,2}\/\d{1,2}\/\d{4}") Or ("\w+@[a-zA-Z_]+?\.[a-zA-Z]{2,6}") Then
            Return output
        Else
            Return ("Error")
        End If


        


    End Function

    
End Class

I want to input "are Jones and Jonas" and have the program eliminate "are" and "and", as they are bad words, and keep "Jones" and "Jonas". It should then split the remaining words and display them.
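
As a sketch of how the functions might be called for that input: a function in the same class or module is called by name with its arguments, and its return value can be used directly. The version below is a console program, not a web service, so it can run on its own. The stop words are passed in as a parameter, and ExtractWords is a made-up name that plays the role of the web method.

```vbnet
Imports System
Imports System.Linq
Imports System.Text.RegularExpressions

Module CallSketch
    ' Same idea as StripStopWords above, but the stop words arrive as a parameter.
    Function StripStopWords(ByVal s As String, ByVal stopWords As String()) As String
        Dim alternatives As String = String.Join("|", stopWords.Select(Function(w) Regex.Escape(w)).ToArray())
        Dim pattern As String = String.Format("\b(?:{0})\b", alternatives)
        Dim result As String = Regex.Replace(s, pattern, " ", RegexOptions.IgnoreCase)
        Return Regex.Replace(result, "\s+", " ").Trim()   ' collapse repeated spaces
    End Function

    ' Plays the role of the web method: calls the helper, then splits what is left.
    Function ExtractWords(ByVal inputStr As String) As String
        Dim cleaned As String = StripStopWords(inputStr, New String() {"are", "and"})
        Dim words() As String = cleaned.Split(New Char() {" "c}, StringSplitOptions.RemoveEmptyEntries)
        Return String.Join(", ", words)
    End Function

    Sub Main()
        Console.WriteLine(ExtractWords("are Jones and Jonas"))   ' prints: Jones, Jonas
    End Sub
End Module
```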

OK… let’s see, if we are going to do this I suppose we should do it properly. Now I am nowhere near as good at programming as I would like to be, so if any of you code monkeys out there have any better ways of doing this, feel free to educate me.
I would start by…

1. Create a database with 3 tables in it… tableNouns, tableVerbs, tableTrash
2. OnPageLoad, create 3 corresponding arrays arrayNouns, arrayVerbs, arrayTrash
3. Split user input into an array of words

For each word in UserInput
    If word is NOT in arrayNouns then
        If word is NOT in arrayVerbs then
            If word is NOT in arrayTrash then
                Ask user what to do with word
                Load word into database
                Load word into array
            Else
                Treat word as trash
            End If
        Else
            Treat word as a Verb
        End If
    Else
        Treat word as a noun
    End If
Next

This way, your application becomes better and better with each word although initially you will need to ‘train’ it.

Does this help?

Evan Dela-Grammaticas, NeonBlue Business Consultant
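
A minimal sketch of the lookup part of that plan, with in-memory sets standing in for tableNouns, tableVerbs and tableTrash. The sample contents and the Classify name are invented for illustration, and the "ask the user and store the word" step is only marked with a comment.

```vbnet
Imports System
Imports System.Collections.Generic

Module ClassifySketch
    ' In-memory stand-ins for tableNouns, tableVerbs and tableTrash (sample contents are made up).
    Private nouns As New HashSet(Of String)(New String() {"Jones", "Jonas"}, StringComparer.OrdinalIgnoreCase)
    Private verbs As New HashSet(Of String)(New String() {"born"}, StringComparer.OrdinalIgnoreCase)
    Private trash As New HashSet(Of String)(New String() {"in", "is", "a", "an"}, StringComparer.OrdinalIgnoreCase)

    Function Classify(ByVal word As String) As String
        If nouns.Contains(word) Then Return "noun"
        If verbs.Contains(word) Then Return "verb"
        If trash.Contains(word) Then Return "trash"
        Return "unknown"   ' this is where the user would be asked and the word stored
    End Function

    Sub Main()
        For Each word As String In "Jones is born in 1975".Split(" "c)
            Console.WriteLine(word & ": " & Classify(word))
        Next
    End Sub
End Module
```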

Hi,

Is there a way to remove stop words from the sentence entered by the user? The stop words are in an XML file.
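
One possible approach, sketched with LINQ to XML (System.Xml.Linq). The layout of the XML file was not posted, so the <stopwords><word>…</word></stopwords> structure below is purely an assumption, as is the LoadStopWords name. The sample parses an inline string so it can run without a file.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Linq
Imports System.Xml.Linq

Module XmlStopWordSketch
    ' Collects the text of every <word> element into a case-insensitive set.
    Function LoadStopWords(ByVal doc As XDocument) As HashSet(Of String)
        Dim values = doc.Descendants("word").Select(Function(e) e.Value.Trim())
        Return New HashSet(Of String)(values, StringComparer.OrdinalIgnoreCase)
    End Function

    Sub Main()
        ' For a file on disk, XDocument.Load with the file path would replace Parse.
        Dim doc As XDocument = XDocument.Parse("<stopwords><word>a</word><word>an</word><word>in</word></stopwords>")
        Dim stopWords As HashSet(Of String) = LoadStopWords(doc)
        Dim kept() As String = "Jones born in 1967".Split(" "c).Where(Function(w) Not stopWords.Contains(w)).ToArray()
        Console.WriteLine(String.Join(", ", kept))   ' prints: Jones, born, 1967
    End Sub
End Module
```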
