nickelmann

Hi,

Is there a way to remove stop words from a sentence entered by the user? The stop words are stored in an XML file.
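A minimal sketch of loading the stop words from XML. It assumes (hypothetically) that the file looks like <stopwords><word>a</word><word>the</word></stopwords>; change the XPath "/stopwords/word" if your file is laid out differently. It uses XmlDocument and a case-insensitive HashSet(Of String) (.NET Framework 3.5 or later), so "The" and "the" count as the same stop word.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Xml

Public Module StopWordLoader
    ' Reads every <word> element into a case-insensitive set.
    ' Assumed (hypothetical) layout: <stopwords><word>a</word><word>the</word></stopwords>
    Public Function LoadStopWords(ByVal xml As String) As HashSet(Of String)
        Dim doc As New XmlDocument()
        doc.LoadXml(xml)

        Dim words As New HashSet(Of String)(StringComparer.OrdinalIgnoreCase)
        For Each node As XmlNode In doc.SelectNodes("/stopwords/word")
            Dim w As String = node.InnerText.Trim()
            If w.Length > 0 Then words.Add(w) ' skip empty <word/> elements
        Next
        Return words
    End Function
End Module
```

To read the real file, pass File.ReadAllText("stopwords.xml") (with Imports System.IO); the file name is only an example.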

nickelmann

Hi all, I think I have what I wanted now. Thanks for all your help, but I still have a minor problem: how do I call these functions?

Imports System
Imports System.IO
Imports System.Text.RegularExpressions
Imports System.Web
Imports System.Web.Services
Imports System.Web.Services.Protocols

<WebService(Namespace:="http://tempuri.org/")> _
<WebServiceBinding(ConformsTo:=WsiProfiles.BasicProfile1_1)> _
<Global.Microsoft.VisualBasic.CompilerServices.DesignerGenerated()> _
Public Class Service
    Inherits System.Web.Services.WebService

    'Remove stop words (replace each with a space), then collapse the extra spaces
    Public Function StripStopWords(ByVal s As String) As String
        'Read the whole stop-word file, not just its first line
        Dim stopWordText As String = File.ReadAllText("C:\Users\jaimiechin\Documents\Visual Studio 2005\WebSites\extract1\badWords.txt").Trim()
        If stopWordText.Length = 0 Then Return s.Trim()

        'Build the alternation from the file contents (about|after|all|also ...), not from the file name,
        'escaping each word so characters such as "." are matched literally
        Dim words() As String = Regex.Split(stopWordText, "\s+")
        For i As Integer = 0 To words.Length - 1
            words(i) = Regex.Escape(words(i))
        Next
        Dim stopWordsRegex As String = String.Format("\b(?:{0})\b", String.Join("|", words))

        Dim result As String = Regex.Replace(s, stopWordsRegex, " ", RegexOptions.IgnoreCase) 'replace each stop word with a space
        Return Regex.Replace(result, "\s+", " ").Trim() 'eliminate multiple spaces and trim the ends
    End Function

    'Count the words in the input; runs of whitespace count as one separator
    Public Function getWordCount(ByVal InputString As String) As Integer
        Dim trimmed As String = InputString.Trim()
        If trimmed.Length = 0 Then Return 0
        Return Regex.Split(trimmed, "\s+").Length
    End Function

    <WebMethod()> _
    Public Function main(ByVal inputStr As String) As String
        'Call the helper functions directly from the web method, e.g.
        '"are Jones and Jonas" -> "Jones, Jonas" when "are" and "and" are in badWords.txt
        Dim cleaned As String = StripStopWords(inputStr)
        If cleaned.Length = 0 Then
            Return "Error" 'nothing left after removing the stop words
        End If

        'Split the remaining words and return them as a comma-separated list
        Return String.Join(", ", cleaned.Split(" "c))
    End Function

End Class

For example, if I input "are Jones and Jonas", the program should remove "are" and "and" because they are bad words and keep "Jones" and "Jonas". It should then split the remaining words and display them.
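A word-by-word alternative to the regex approach, sketched under the assumption that the stop words are already in a case-insensitive set (the module and function names are hypothetical). It splits on spaces, skips any token in the set, and joins what is left with ", ", which gives exactly the output described above.

```vbnet
Imports System
Imports System.Collections.Generic

Public Module StopWordFilter
    ' Splits the sentence on spaces, drops every token found in stopWords,
    ' and returns the remaining words joined with ", ".
    ' Case-insensitivity comes from the comparer of the set passed in.
    Public Function RemoveStopWords(ByVal sentence As String, ByVal stopWords As ICollection(Of String)) As String
        Dim kept As New List(Of String)()
        For Each token As String In sentence.Split(New Char() {" "c}, StringSplitOptions.RemoveEmptyEntries)
            If Not stopWords.Contains(token) Then kept.Add(token)
        Next
        Return String.Join(", ", kept.ToArray())
    End Function
End Module
```

With stops = New HashSet(Of String)(New String() {"are", "and"}, StringComparer.OrdinalIgnoreCase), RemoveStopWords("are Jones and Jonas", stops) returns "Jones, Jonas". Matching whole tokens avoids the regex pitfalls of escaping and word boundaries, at the cost of not handling punctuation attached to a word.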

nickelmann

Hi Nickelmann...

I hope this helps you...

Protected Sub btnExtract_Click(ByVal sender As Object, ByVal e As EventArgs) Handles btnExtract.Click
    Dim textEntry As String = txtWordEntry.Text
    txtExtractedWords.Text = ""
    'Split the entry on spaces, ignoring empty entries caused by double spaces
    Dim words As String() = textEntry.Split(New Char() {" "c}, StringSplitOptions.RemoveEmptyEntries)
    For Each word As String In words
        'Append every word that is not a stop word
        If validWord(word) Then
            txtExtractedWords.Text = txtExtractedWords.Text & word & ", "
        End If
    Next
End Sub

'Return False for a stop word (only "in" here), True for any other word
Function validWord(ByVal theWord As String) As Boolean
    If theWord = "in" Then
        Return False
    Else
        Return True
    End If
End Function

Evan Dela-Grammticas, NeonBlue Business Consultant

Hi NeonBlue007,
Thanks for the sample code. Since the sentence will keep changing, is it possible to change it to something like this?

Function validWord(ByVal theWord As String)
        If theWord = "stopWords.txt" Then
           .....

This is because every sentence will have different subjects and objects to extract and different stop words to remove.
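Comparing theWord = "stopWords.txt" only checks whether the word is literally the text stopWords.txt; the file is never opened. One way to make the check data-driven is to read the file once into a set and look each word up. A sketch, assuming one stop word per line in the file (the module and helper names are hypothetical):

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.IO

Public Module StopWordList
    ' Case-insensitive set of stop words, filled once and reused for every sentence
    Private ReadOnly stopWords As New HashSet(Of String)(StringComparer.OrdinalIgnoreCase)

    ' Adds each non-blank line (one stop word per line) to the set
    Public Sub LoadStopWordLines(ByVal lines As IEnumerable(Of String))
        For Each line As String In lines
            Dim w As String = line.Trim()
            If w.Length > 0 Then stopWords.Add(w)
        Next
    End Sub

    ' Convenience wrapper: read the stop words from a text file on disk
    Public Sub LoadStopWordFile(ByVal path As String)
        LoadStopWordLines(File.ReadAllLines(path))
    End Sub

    ' Same contract as validWord above: False for a stop word, True for any other word
    Public Function validWord(ByVal theWord As String) As Boolean
        Return Not stopWords.Contains(theWord)
    End Function
End Module
```

Call LoadStopWordFile(Server.MapPath("stopWords.txt")) once, for example when the page loads, and leave btnExtract_Click as it is. Changing the stop words then only means editing the text file, not the code.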

nickelmann

Here is my code. Sorry, I forgot to post it with my earlier message.

Imports System.Web
Imports System.Web.Services
Imports System.Web.Services.Protocols
Imports System.Text.RegularExpressions

<WebService(Namespace:="http://tempuri.org/")> _
<WebServiceBinding(ConformsTo:=WsiProfiles.BasicProfile1_1)> _
<Global.Microsoft.VisualBasic.CompilerServices.DesignerGenerated()> _
Public Class Service
     Inherits System.Web.Services.WebService


    Sub main()
        'Sample string
        Dim sampleText As String = "User input"

        'Split on spaces and punctuation; RemoveEmptyEntries drops the empty strings left by double spaces
        Dim separators() As String = {" ", ".", ",", "?"}
        Dim wordArray() As String = sampleText.Split(separators, _
                                       StringSplitOptions.RemoveEmptyEntries)

        'Sort the result
        Array.Sort(wordArray)

    End Sub

    'Count the words; runs of whitespace count as one separator
    Public Function getWordCount(ByVal InputString As String) As Integer
        Dim trimmed As String = InputString.Trim()
        If trimmed.Length = 0 Then Return 0
        Return Regex.Split(trimmed, "\s+").Length
    End Function


End Class
nickelmann

Sorry, everyone. I think I confused myself about the output I am supposed to get. Here are the steps the program is supposed to follow:

Step 1: Split the sentence entered by the user into words and count them.
Step 2: Extract the subjects and objects (e.g. nsubj, pobj) from the sentence, or remove the bad words (stop words) from the split words.
Step 3: Find the words that match the clues and patterns. For example, in "Jones was born in 1975" the matching words are Jones, born and 1975.
Step 4: Check for proper nouns (words that start with a capital letter).
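Steps 1 and 4 can be sketched with plain string functions. This is only an illustration with hypothetical names: the separators and the capital-letter rule are the simple ones described above, so the first word of every sentence (e.g. "Today") will also look like a proper noun.

```vbnet
Imports System

Public Module SentenceSteps
    ' Step 1: split on spaces, full stops, commas and question marks, dropping empty entries
    Public Function SplitWords(ByVal sentence As String) As String()
        Return sentence.Split(New String() {" ", ".", ",", "?"}, StringSplitOptions.RemoveEmptyEntries)
    End Function

    ' Step 1: the word count is the number of pieces after splitting
    Public Function CountWords(ByVal sentence As String) As Integer
        Return SplitWords(sentence).Length
    End Function

    ' Step 4: treat a word as a proper noun if its first character is an upper-case letter
    Public Function IsProperNoun(ByVal word As String) As Boolean
        Return word.Length > 0 AndAlso Char.IsUpper(word(0))
    End Function
End Module
```

For "Jones was born in 1975." this gives the words Jones, was, born, in, 1975 (a count of 5), and only Jones passes IsProperNoun.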

I have done the word splitting, but I am not sure how to continue the program.

nickelmann

Hi, I have drawn up the plan now. I am supposed to tokenize the input and then remove the bad words (e.g. 'a', 'an', 'the', 'that', 'than'). After that, I should match the remaining words against a database of words. When a word matches the database, the program tells the user what the word is, e.g. 'Corp' indicates a company. It should also check for proper nouns. I have some of the code already, but I am stuck. Can anyone help me?

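Imports System.Collections.Generic
Imports System.Text.RegularExpressions
Imports System.Web
Imports System.Web.Services
Imports System.Web.Services.Protocols


<WebService(Namespace:="http://tempuri.org/")> _
<WebServiceBinding(ConformsTo:=WsiProfiles.BasicProfile1_1)> _
<Global.Microsoft.VisualBasic.CompilerServices.DesignerGenerated()> _
Public Class Service
    Inherits System.Web.Services.WebService

    <WebMethod()> _
    Public Function Main(ByVal inputStr As String) As String

        'A List grows as words are added; an empty array ({}) cannot be filled by index
        Dim tockenizedWord As New List(Of String)()
        Dim remaining As String = inputStr

        'Each call to strTockenizer removes one word from the front of "remaining",
        'so the loop stops once the whole input has been consumed
        While remaining.Length > 0
            Dim word As String = strTockenizer(remaining)
            If word.Length > 0 Then
                tockenizedWord.Add(word)
            End If
        End While

        'stnMatching()

        Return String.Join(", ", tockenizedWord.ToArray())
    End Function

    'Return the first word of inputStr and remove it (and any separators before it) from inputStr
    Public Function strTockenizer(ByRef inputStr As String) As String

        Dim m As Match = Regex.Match(inputStr, "^\W*(\w+)")

        If m.Success Then
            inputStr = inputStr.Substring(m.Length)
            Return m.Groups(1).Value
        Else
            inputStr = "" 'only spaces or punctuation are left
            Return ""
        End If

    End Function

    Public Function stnMatching() As String
        Return "null"
    End Function

End Class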
nickelmann

Thanks vbnetskywalker. Don't worry about the delay.

I read it and it helps a lot. But how do I find the subjects and objects in the sentence?

nickelmann

Is it possible to do this parsing with regular expressions, given the problem I described above?
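Regular expressions work well for fixed patterns such as "born in" followed by a year, but they cannot find subjects and objects in general: labels like nsubj and pobj come from a dependency parser, which analyses the grammar of the sentence rather than matching text. A hedged sketch of the fixed-pattern case only, assuming the name is one or more capitalised words at the start of the sentence (the module and function names are hypothetical):

```vbnet
Imports System
Imports System.Text.RegularExpressions

Public Module BornInPattern
    ' One fixed pattern: "<Name> [is|was] born in <4-digit year>".
    ' Named groups pull out the pieces; sentences of any other shape do not match.
    Private ReadOnly BornIn As New Regex( _
        "^(?<name>[A-Z]\w*(?:\s+[A-Z]\w*)*)\s+(?:(?:is|was)\s+)?born\s+in\s+(?<year>\d{4})\b")

    ' Returns "name, born, year", or "" when the sentence does not fit the pattern
    Public Function ExtractBornIn(ByVal sentence As String) As String
        Dim m As Match = BornIn.Match(sentence.Trim())
        If Not m.Success Then Return ""
        Return m.Groups("name").Value & ", born, " & m.Groups("year").Value
    End Function
End Module
```

ExtractBornIn("Jones is born in 1975") returns "Jones, born, 1975". Each new kind of sentence needs its own pattern, which is the main limitation of this approach.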

nickelmann

Thanks vbnetskywalker.

I do need your help on how to use RegEx.

nickelmann

Hi vbnetskywalker,

Thanks again. Here are the details. I have created a database in MS SQL Server containing the clues and patterns that I want to extract. Clues are words like 'Lake', 'Mount', 'Co', 'Company' and 'Corp'; patterns are phrases like 'born in', 'company such as' and 'located in'. I want to discard words like 'such as', 'in', 'a' and 'an'.

What I want is that when a user enters a short sentence like 'Jones is born in 1975', the program recognizes and prints words like 'Jones, 1975, born'.

This is because the database already records that Jones is a person, 1975 is a year, and 'born' is a pattern.
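A sketch of the lookup step, with an in-memory Dictionary standing in for the SQL Server table (its contents in the example are made up, and the names are hypothetical). Words that are not in the table are simply skipped, so 'is' and 'in' drop out without a separate list, and any 4-digit number is assumed to be a year. It works word by word, so a two-word pattern such as 'born in' would have to be matched as a phrase instead.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Text.RegularExpressions

Public Module ClueLookup
    ' Labels each word found in the lookup table; words not in the table are skipped.
    ' "known" stands in for the database table: word -> category (e.g. "Jones" -> "person").
    Public Function LabelWords(ByVal sentence As String, ByVal known As IDictionary(Of String, String)) As String
        Dim found As New List(Of String)()
        For Each word As String In sentence.Split(New Char() {" "c}, StringSplitOptions.RemoveEmptyEntries)
            Dim label As String = Nothing
            If known.TryGetValue(word, label) Then
                found.Add(word & " (" & label & ")")
            ElseIf Regex.IsMatch(word, "^\d{4}$") Then
                found.Add(word & " (year)") ' assumption: any 4-digit number is a year
            End If
        Next
        Return String.Join(", ", found.ToArray())
    End Function
End Module
```

With a table containing Jones -> person and born -> pattern, LabelWords("Jones is born in 1975", table) returns "Jones (person), born (pattern), 1975 (year)". In the real program the Dictionary could be filled from the database, for example with a SqlDataReader from System.Data.SqlClient.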

nickelmann

Thanks vbnetskywalker for the code.

There is a small problem I forgot to mention. What about other sentences that users might enter, such as "President Obama is from US" or "Today is Friday"? Do I have to change the input text and the list of words not to extract each time?
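You should not need to change the code for each sentence if the patterns are treated as data rather than code. A minimal sketch, assuming a hypothetical in-memory pattern list (in practice it would be read from your patterns table): each pattern is turned into a regex that captures the text on either side of it.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Public Module PatternMatcher
    ' Hypothetical pattern table; in the real program these rows would come from the database
    Public ReadOnly Patterns() As String = {"born in", "is from", "located in"}

    ' Returns "left | pattern | right" for the first pattern found, or "" if none match.
    ' Regex.Escape keeps the pattern text literal (it also escapes the space).
    Public Function MatchPattern(ByVal sentence As String) As String
        For Each p As String In Patterns
            Dim rx As New Regex("^(.+?)\s+" & Regex.Escape(p) & "\s+(.+)$", RegexOptions.IgnoreCase)
            Dim m As Match = rx.Match(sentence.Trim())
            If m.Success Then
                Return m.Groups(1).Value & " | " & p & " | " & m.Groups(2).Value
            End If
        Next
        Return ""
    End Function
End Module
```

MatchPattern("President Obama is from US") returns "President Obama | is from | US", and "Today is Friday" returns an empty string because none of these patterns occurs in it. Supporting a new kind of sentence then means adding a row to the pattern list, not editing the program.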

nickelmann

Hi,

I want to extract certain words from a sentence entered by the user. For example, if the user enters "Jones born in 1967", the program should extract the words 'Jones', 'born' and '1967', skip the word 'in', and print the extracted words. Is there any way I can do this? Please guide me.

Thanks