Hi,
Is there a way to remove stop words from a sentence entered by the user? The stop words are stored in an XML file.
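One possible approach is to load the stop words from the XML into a set and keep only the words of the sentence that are not in it. The sketch below is only an illustration: the thread never shows the XML file, so the layout `<stopwords><word>...</word></stopwords>` and the names `StopWordXml`, `LoadStopWords`, and `RemoveStopWords` are assumptions. It also assumes .NET Framework 3.5 or later for `HashSet(Of T)`.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Xml

Module StopWordXml
    ' Assumed layout: <stopwords><word>are</word><word>and</word></stopwords>
    Function LoadStopWords(ByVal xml As String) As HashSet(Of String)
        Dim doc As New XmlDocument()
        doc.LoadXml(xml) ' use doc.Load(path) to read from a file instead
        Dim result As New HashSet(Of String)(StringComparer.OrdinalIgnoreCase)
        For Each node As XmlNode In doc.SelectNodes("//word")
            result.Add(node.InnerText.Trim())
        Next
        Return result
    End Function

    ' Keeps every space-separated word that is not in the stop-word set.
    Function RemoveStopWords(ByVal sentence As String, ByVal stopWords As HashSet(Of String)) As String
        Dim kept As New List(Of String)
        For Each w As String In sentence.Split(New Char() {" "c}, StringSplitOptions.RemoveEmptyEntries)
            If Not stopWords.Contains(w) Then kept.Add(w)
        Next
        Return String.Join(" ", kept.ToArray())
    End Function

    Sub Main()
        Dim xml As String = "<stopwords><word>are</word><word>and</word></stopwords>"
        Console.WriteLine(RemoveStopWords("are Jones and Jonas", LoadStopWords(xml))) ' prints Jones Jonas
    End Sub
End Module
```

The set uses a case-insensitive comparer, so "Are" and "are" are treated as the same stop word.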
Hi all, I think I have what I wanted now. Thanks for all your help, but I still have a minor problem: how do I call the functions?
Imports System.Web
Imports System.Web.Services
Imports System.Web.Services.Protocols
Imports System.Text.RegularExpressions
Imports System
Imports System.IO
Imports System.Text
<WebService(Namespace:="http://tempuri.org/")> _
<WebServiceBinding(ConformsTo:=WsiProfiles.BasicProfile1_1)> _
<Global.Microsoft.VisualBasic.CompilerServices.DesignerGenerated()> _
Public Class Service
    Inherits System.Web.Services.WebService

    ' Stop-word list: words separated by whitespace (spaces or line breaks).
    Private Const StopWordsPath As String = "C:\Users\jaimiechin\Documents\Visual Studio 2005\WebSites\extract1\badWords.txt"

    ' Replaces every stop word with a space, then collapses extra spaces.
    Public Function StripStopWords(ByVal s As String) As String
        ' Read the file contents; the original passed the file name itself to Regex.Replace.
        Dim stopWords As String = File.ReadAllText(StopWordsPath).Trim()
        If stopWords.Length = 0 Then Return s
        ' Escape each word so characters such as "." are matched literally.
        Dim words() As String = Regex.Split(stopWords, "\s+")
        For i As Integer = 0 To words.Length - 1
            words(i) = Regex.Escape(words(i))
        Next
        ' Builds about|after|all|also etc.
        Dim StopWordsRegex As String = String.Format("\s?\b(?:{0})\b\s?", String.Join("|", words))
        Dim Result As String = Regex.Replace(s, StopWordsRegex, " ", RegexOptions.IgnoreCase)
        Return Regex.Replace(Result, "\s+", " ").Trim()
    End Function

    ' Counts the whitespace-separated words in the sentence.
    Public Function getWordCount(ByVal InputString As String) As Integer
        Return Split(Regex.Replace(InputString.Trim(), "\s+", Space(1))).Length
    End Function

    <WebMethod()> _
    Public Function main(ByVal inputStr As String) As String
        ' Call the helper directly; the original never assigned output.
        Dim output As String = StripStopWords(inputStr)
        ' Patterns kept from the original post. The first one needs an uppercase letter,
        ' a lowercase letter, and a digit in a row, so plain words return "Error".
        If Regex.IsMatch(output, "[A-Z][a-z][0-9]") OrElse _
           Regex.IsMatch(output, "\d{1,2}\/\d{1,2}\/\d{4}") OrElse _
           Regex.IsMatch(output, "\w+@[a-zA-Z_]+?\.[a-zA-Z]{2,6}") Then
            Return output
        Else
            Return "Error"
        End If
    End Function
End Class
I want to enter "are Jones and Jonas" and have the program remove "are" and "and" because they are bad words, keeping "Jones" and "Jonas". It should then split the remaining words and display them.
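On calling the functions: a function in the same class or module is called by name, and its return value can be passed straight to the next function. Here is a minimal sketch of the "are Jones and Jonas" case. To keep it runnable without a file, the stop words are passed in as an array, which is a change from the posted `StripStopWords`; the module name `CallDemo` and the helper `GetWords` are made up for the example.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module CallDemo
    ' Same idea as StripStopWords above, but the stop words are a parameter
    ' (hypothetical signature) instead of being read from badWords.txt.
    Function StripStopWords(ByVal s As String, ByVal stopWords() As String) As String
        Dim escaped(stopWords.Length - 1) As String
        For i As Integer = 0 To stopWords.Length - 1
            escaped(i) = Regex.Escape(stopWords(i))
        Next
        Dim pattern As String = "\b(?:" & String.Join("|", escaped) & ")\b"
        Dim result As String = Regex.Replace(s, pattern, " ", RegexOptions.IgnoreCase)
        Return Regex.Replace(result, "\s+", " ").Trim()
    End Function

    ' Splits the cleaned sentence into its words.
    Function GetWords(ByVal s As String) As String()
        Return s.Split(New Char() {" "c}, StringSplitOptions.RemoveEmptyEntries)
    End Function

    Sub Main()
        ' Call one function, then feed its result to the next.
        Dim cleaned As String = StripStopWords("are Jones and Jonas", New String() {"are", "and"})
        Dim words() As String = GetWords(cleaned)
        Console.WriteLine(String.Join(", ", words)) ' prints Jones, Jonas
        Console.WriteLine(words.Length)             ' prints 2
    End Sub
End Module
```

In the web service, the same calls would go inside the `<WebMethod()>` function, since only methods marked with that attribute are exposed to callers.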
Hi Nickelmann...
I hope this helps you...
Protected Sub btnExtract_Click(ByVal sender As Object, ByVal e As EventArgs) Handles btnExtract.Click
    Dim textEntry As String = txtWordEntry.Text
    txtExtractedWords.Text = ""
    ' Split the entry on spaces and keep only the valid words.
    Dim words As String() = textEntry.Split(New Char() {" "c})
    Dim word As String
    For Each word In words
        If validWord(word) Then
            txtExtractedWords.Text = txtExtractedWords.Text & word & ", "
        End If
    Next
End Sub

' Returns False for words that should not be extracted.
Function validWord(ByVal theWord As String) As Boolean
    If theWord = "in" Then
        Return False
    Else
        Return True
    End If
End Function

Evan Dela-Grammticas, NeonBlue Business Consultant
Hi NeonBlue007,
Thanks for the sample code. Since the sentence will keep changing, would it be possible to change the function to:
Function validWord(ByVal theWord As String)
If theWord = "stopWords.txt" Then
.....
This is because every sentence will have different subjects and objects to extract and different stop words to remove.
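Comparing `theWord` to the string "stopWords.txt" would only test whether the word is literally that file name. What probably works better is to read the file once and check each word against its contents. A rough sketch, assuming one stop word per line in the file and .NET Framework 3.5 or later for `HashSet(Of T)`; the module name `StopWordFile` and the temporary file are stand-ins for illustration:

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.IO

Module StopWordFile
    Private stopWords As New HashSet(Of String)(StringComparer.OrdinalIgnoreCase)

    ' Reads one stop word per line (assumed file layout), skipping blank lines.
    Sub LoadStopWords(ByVal filePath As String)
        stopWords.Clear()
        For Each line As String In File.ReadAllLines(filePath)
            If line.Trim().Length > 0 Then stopWords.Add(line.Trim())
        Next
    End Sub

    ' A word is valid when it is not in the loaded stop-word list.
    Function validWord(ByVal theWord As String) As Boolean
        Return Not stopWords.Contains(theWord)
    End Function

    Sub Main()
        ' A temporary file stands in for stopWords.txt.
        Dim filePath As String = System.IO.Path.GetTempFileName()
        File.WriteAllLines(filePath, New String() {"in", "a", "an"})
        LoadStopWords(filePath)
        Console.WriteLine(validWord("in"))    ' prints False
        Console.WriteLine(validWord("Jones")) ' prints True
        File.Delete(filePath)
    End Sub
End Module
```

With this layout, changing the stop words means editing the text file, not the code.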
Here is my code. Sorry, I forgot to post it earlier.
Imports System.Web
Imports System.Web.Services
Imports System.Web.Services.Protocols
Imports System.Text.RegularExpressions
<WebService(Namespace:="http://tempuri.org/")> _
<WebServiceBinding(ConformsTo:=WsiProfiles.BasicProfile1_1)> _
<Global.Microsoft.VisualBasic.CompilerServices.DesignerGenerated()> _
Public Class Service
    Inherits System.Web.Services.WebService

    Sub main()
        ' Sample string
        Dim sampleText As String = "User input"
        ' Split on spaces and punctuation, dropping the empty entries
        ' that consecutive separators would otherwise produce
        Dim separators() As String = {" ", ".", ",", "?"}
        Dim wordArray() As String = sampleText.Split(separators, _
            StringSplitOptions.RemoveEmptyEntries)
        ' Sort the result
        Array.Sort(wordArray)
    End Sub

    ' To count the words
    Public Function getWordCount(ByVal InputString As String) As Integer
        Return Split(System.Text.RegularExpressions.Regex.Replace(InputString, "\s+", Space(1))).Length
    End Function
End Class
Sorry, everyone. I think I confused myself about the output I am supposed to get. Here is what the program is supposed to do:
Step 1: Split the sentence entered by the user into words and count them.
Step 2: Extract the subjects and objects (e.g. nsubj, pobj, etc.) from the sentence, or remove the bad words (stop words) from the split words.
Step 3: Check for words that match the clues and patterns. E.g. for "Jones was born in 1975", the matching words are Jones, born, 1975.
Step 4: Check for proper nouns (words that start with a capital letter).
I have done the splitting of the sentence into words, but I am not sure how to continue the program.
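Steps 1, 2, and 4 above could be sketched roughly as follows. This is only an outline under assumptions: the stop words are passed in as an array instead of coming from a file, and the names `SentenceSteps`, `Tokenize`, `RemoveStopWords`, and `IsCapitalized` are invented for the example.

```vbnet
Imports System
Imports System.Collections.Generic

Module SentenceSteps
    ' Step 1: split on spaces and common punctuation.
    Function Tokenize(ByVal sentence As String) As String()
        Return sentence.Split(New Char() {" "c, "."c, ","c, "?"c}, StringSplitOptions.RemoveEmptyEntries)
    End Function

    ' Step 2: drop the stop words, ignoring case.
    Function RemoveStopWords(ByVal tokens() As String, ByVal stopWords() As String) As List(Of String)
        Dim kept As New List(Of String)
        For Each t As String In tokens
            Dim isStop As Boolean = False
            For Each s As String In stopWords
                If String.Equals(s, t, StringComparison.OrdinalIgnoreCase) Then isStop = True
            Next
            If Not isStop Then kept.Add(t)
        Next
        Return kept
    End Function

    ' Step 4: a word counts as a proper-noun candidate if it starts with a capital letter.
    Function IsCapitalized(ByVal word As String) As Boolean
        Return word.Length > 0 AndAlso Char.IsUpper(word(0))
    End Function

    Sub Main()
        Dim tokens() As String = Tokenize("Jones was born in 1975")
        Console.WriteLine(tokens.Length) ' prints 5
        For Each w As String In RemoveStopWords(tokens, New String() {"was", "in"})
            Console.WriteLine(w & " capitalized: " & IsCapitalized(w).ToString())
        Next
    End Sub
End Module
```

The capital-letter test is a rough heuristic: the first word of any sentence is capitalized whether or not it is a proper noun, so that position may need separate handling.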
Hi, I have drawn up the plan already. I am supposed to tokenize the input and then remove the bad words (e.g. 'a', 'an', 'the', 'that', 'than', etc.). After that, I should match the remaining words against the database of words. When a word matches the database, the program tells the user what that word is, e.g. Corp is a company. It will also check for proper nouns. I have some of the code already, but I am stuck. Can anyone help me?
Imports System.Web
Imports System.Web.Services
Imports System.Web.Services.Protocols
Imports System.Collections
Imports System.Text.RegularExpressions
<WebService(Namespace:="http://tempuri.org/")> _
<WebServiceBinding(ConformsTo:=WsiProfiles.BasicProfile1_1)> _
<Global.Microsoft.VisualBasic.CompilerServices.DesignerGenerated()> _
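Public Class Service
    Inherits System.Web.Services.WebService

    <WebMethod()> _
    Public Function Main(ByVal inputStr As String) As String
        ' A List grows as tokens are found; the original indexed into an empty array.
        Dim tockenizedWord As New System.Collections.Generic.List(Of String)
        ' strTockenizer shortens inputStr on every call, so the loop terminates.
        While inputStr.Length > 0
            Dim token As String = strTockenizer(inputStr)
            If token.Length > 0 Then
                tockenizedWord.Add(token)
            End If
        End While
        'stnMaching()
        Return "done"
    End Function

    ' Returns the next space-delimited token and removes it from inputStr.
    Public Function strTockenizer(ByRef inputStr As String) As String
        inputStr = inputStr.TrimStart()
        If inputStr.Length = 0 Then
            Return ""
        End If
        Dim tocWord As String
        Dim splitAt As Integer = inputStr.IndexOf(" "c)
        If splitAt < 0 Then
            ' Last token: take everything that is left.
            tocWord = inputStr
            inputStr = ""
        Else
            tocWord = inputStr.Substring(0, splitAt)
            inputStr = inputStr.Substring(splitAt + 1)
        End If
        Return tocWord
    End Function

    ' Placeholder for matching tokens against the database.
    Public Function stnMatching() As String
        Return "null"
    End Function
End Class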
Thanks vbnetskywalker. No problem about the delay.
I read it and it helps a lot. But if I want to find the objects and subjects in the sentence, how do I do that?
Is it possible to do the parsing with regular expressions for the problem I described above?
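For fixed phrasings, a regular expression with named groups can pull out the pieces. The sketch below handles only one hypothetical pattern, "Name [is|was] born in YEAR"; the pattern and the names `RegexParse` and `ExtractBorn` are assumptions for illustration. Finding grammatical subjects and objects (nsubj, pobj) in arbitrary sentences generally needs a dependency parser, which a regex cannot replace.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module RegexParse
    ' Hypothetical pattern: <Name> [is|was] born in <4-digit year>
    Private ReadOnly BornPattern As New Regex( _
        "\b(?<name>[A-Z][a-z]+)\s+(?:(?:is|was)\s+)?(?<pattern>born)\s+in\s+(?<year>\d{4})\b")

    ' Returns {name, "born", year}, or an empty array when the sentence does not match.
    Function ExtractBorn(ByVal sentence As String) As String()
        Dim m As Match = BornPattern.Match(sentence)
        If Not m.Success Then Return New String() {}
        Return New String() {m.Groups("name").Value, m.Groups("pattern").Value, m.Groups("year").Value}
    End Function

    Sub Main()
        Console.WriteLine(String.Join(", ", ExtractBorn("Jones is born in 1975"))) ' prints Jones, born, 1975
        Console.WriteLine(String.Join(", ", ExtractBorn("Jones born in 1967")))    ' prints Jones, born, 1967
    End Sub
End Module
```

Each additional phrasing ("located in", "company such as") would need its own pattern, so this approach suits a short, known list of patterns.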
Thanks vbnetskywalker.
I do need your help with how to use Regex.
Hi vbnetskywalker,
Thanks again. Here are the details. I have created a database in MS SQL Server with the clues and patterns that I want to extract. The clues are words like 'Lake', 'Mount', 'Co', 'Company', 'Corp', etc. The patterns are phrases like 'born in', 'company such as', 'located in', etc. But I want to discard words like 'such as', 'in', 'a', 'an', etc.
What I want is that when a user enters a short sentence like 'Jones is born in 1975', the program recognizes and prints out the words 'Jones, 1975, born'.
The database already records that Jones is a person, 1975 is a year, and born is a pattern.
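The lookup described above can be tried out without the database first. In this sketch a `Dictionary` stands in for the SQL Server table, and its entries are made-up samples. Treating any four-digit token as a year is an added assumption, used so that years do not each need a row. The names `ClueLookup`, `BuildLookup`, `Label`, and `IsYear` are hypothetical.

```vbnet
Imports System
Imports System.Collections.Generic

Module ClueLookup
    ' In-memory stand-in for the SQL Server table (hypothetical contents).
    Function BuildLookup() As Dictionary(Of String, String)
        Dim table As New Dictionary(Of String, String)(StringComparer.OrdinalIgnoreCase)
        table.Add("Jones", "person")
        table.Add("born", "pattern")
        table.Add("Corp", "company")
        Return table
    End Function

    ' Assumption: any token made of exactly four digits is treated as a year.
    Function IsYear(ByVal token As String) As Boolean
        If token.Length <> 4 Then Return False
        For Each c As Char In token
            If Not Char.IsDigit(c) Then Return False
        Next
        Return True
    End Function

    ' Returns "word (label)" for each token found in the table or recognized as a year.
    Function Label(ByVal sentence As String, ByVal table As Dictionary(Of String, String)) As List(Of String)
        Dim found As New List(Of String)
        For Each token As String In sentence.Split(New Char() {" "c}, StringSplitOptions.RemoveEmptyEntries)
            Dim kind As String = Nothing
            If table.TryGetValue(token, kind) Then
                found.Add(token & " (" & kind & ")")
            ElseIf IsYear(token) Then
                found.Add(token & " (year)")
            End If
        Next
        Return found
    End Function

    Sub Main()
        Dim labels As List(Of String) = Label("Jones is born in 1975", BuildLookup())
        Console.WriteLine(String.Join(", ", labels.ToArray())) ' prints Jones (person), born (pattern), 1975 (year)
    End Sub
End Module
```

Because only recognized words are kept, words like 'is' and 'in' drop out without a separate discard list, and a different input sentence needs no code change as long as its words are in the table.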
Thanks vbnetskywalker for the code.
I have a small problem that I forgot to mention. What if users enter other sentences, such as "President Obama is from US" or "Today is Friday"? Do I have to change the input text and the list of words not to extract each time?
Hi,
I want to extract certain words from a sentence entered by the user. For example, if the user enters "Jones born in 1967", the program should extract the words 'Jones', 'born', and '1967', skip the word 'in', and print the extracted words. Is there any way I can do this? Please guide me.
Thanks