Apply a regex I already have to a text file

Question

elvis1 0 Newbie Poster

15 Years Ago

Hi guys, I am a complete newbie (but learning slowly). I am trying to write a script that uses a regex to check a file for proxy addresses.

The fact is that I lack the knowledge and do not know how to make it work. This is my lame approach.

#this opens the file containing a list of proxy address in a txt file
fob=open('C:\Documents and Settings\Desktop\file.txt','r')
listme=fob.readlines()

##need to check for strings that are really proxy addresses using #this regex 
import re
[0-9]{1,3}\\.[0-9]{1,3}\\.[0-9]{1,3}\\.[0-9]{1,3}\\:[0-9]{1,5}

#writes the matched proxies into a new filename
fob=open('C:\Documents and Settings\Desktop\file2.txt','w') 
fob.writelines()
fob=close()

Best Regards and many thanks

file-system python regex


snippsat 661 Master Poster

15 Years Ago

You can look at this; I made some changes to the regex.
It should find valid proxies OK now.

import re

'''-->proxy.txt
202.9945.29.27:80
221.11.27.110:8080
11111.45454.4454.444
114.30.47.10:80
116.52.155.237:80
204.73.37.11255:80
220.227.90.154:8080455
'''
proxy_file = 'c:/test/proxy.txt'

proxy_pattern = r'[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\:[0-9]{1,5}\D'
my_compile = re.compile(proxy_pattern)

file2_read = open(proxy_file, 'r')
new_proxy = open('c:/test/new_proxy.txt', 'w')

for currentline in file2_read:
    match_obj = my_compile.search(currentline)
    if match_obj:        
        print(currentline.rstrip())  ## test print
        new_proxy.write(currentline)    

file2_read.close()
new_proxy.close()

'''-->new_proxy.txt
221.11.27.110:8080
114.30.47.10:80
116.52.155.237:80
'''
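
One caveat with the pattern above: the trailing \D needs a non-digit character after the port, so the last line of a file that has no final newline would be skipped, and because search() is not anchored, a line such as 1221.11.27.110:8080 still matches starting from its second character. A minimal sketch of one way around both, assuming each line holds exactly one candidate address (is_proxy_line is a made-up name, not something from this thread):

```python
import re

# Same digit counts as the pattern above, but anchored at both ends
# instead of relying on a trailing \D.
PROXY_RE = re.compile(r'^[0-9]{1,3}(\.[0-9]{1,3}){3}:[0-9]{1,5}$')

def is_proxy_line(line):
    """Return True if the stripped line is exactly ip:port by digit counts."""
    return PROXY_RE.match(line.strip()) is not None

if __name__ == '__main__':
    samples = ['221.11.27.110:8080\n', '1221.11.27.110:8080\n',
               '114.30.47.10:80', '220.227.90.154:8080455\n']
    for s in samples:
        print('%s -> %s' % (s.strip(), is_proxy_line(s)))
```

This still only counts digits; it does not check that the numbers are in a sensible range.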


snippsat 661 Master Poster

15 Years Ago

And result from my program is not:

Yes, it works fine as an alternative to regex.
But the question was about how to use that regex and write to a file.

how could I append this code (which seems to be what I need) into yours?
http://www.daniweb.com/forums/post11...ml#post1192186
Many Thanks

I don't know what you mean here.


TrustyTony 888 ex-Moderator Team Colleague Featured Poster · Answer 1 · 2010-05-04T17:37:28+00:00

Sorry, but because nobody has given an answer yet, let me offer a Pythonic version using my 'magic separator check' approach.

[0-9]{1,3}\\.[0-9]{1,3}\\.[0-9]{1,3}\\.[0-9]{1,3}\\:[0-9]{1,5}

I do have some basics on regular expressions, though they come in some variations.

I interpret a proxy as something of the form
0+.0+.0+.0+:9+
where 0+ is min 1, max 3 digits
and 9+ is min 1, max 5 digits.

For that I give this magic match, adapted from an earlier thread (my email check snippet). I left the debug prints in so you can see how I got the checks.

# -*- coding: latin1 -*-
def validateProxy(a):
    sep=[x for x in a if not x.isdigit()]
##    print sep  debug
    if sep != ['.','.','.',':']: return False
    end=a
    for i in sep:
        part,i,end=end.partition(i)
        if i=='.' and (1<=len(part)<=3):
##            print i,3,part # debug
            continue
        elif i==':' and 1<=len(part)<=3 and 1<=len(end)<=5:
##            print i,5,part # debug
            continue
        else:
             return False
    return True

if __name__ == '__main__':
    proxy = [ "1.1.10.23:123","10.2.100.100:", "123.123.1.0:12345",
               "123.9.123.1234:12345", "123.123.1.:123456","123,123,1:12345"
               ]
    print("Valid proxies are:")
    for i in filter(validateProxy,proxy): print('\t' + i)
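
Note that counting digits alone still accepts addresses such as 999.999.999.999:99999. If numeric ranges matter, a check along these lines could be used instead. This is only a sketch: it assumes IPv4 octets of 0 to 255 and ports of 1 to 65535, and in_range_proxy and _all_digits are made-up names, not part of the code above.

```python
DIGITS = '0123456789'

def _all_digits(s):
    """True for a non-empty string made only of ASCII digits."""
    return s != '' and all(c in DIGITS for c in s)

def in_range_proxy(text):
    """Check 'a.b.c.d:port' with each octet in 0..255 and port in 1..65535."""
    host, sep, port = text.strip().partition(':')
    if sep != ':' or not _all_digits(port) or len(port) > 5:
        return False
    octets = host.split('.')
    if len(octets) != 4:
        return False
    for octet in octets:
        # Length is checked before int() so long digit runs are rejected early.
        if not _all_digits(octet) or len(octet) > 3 or int(octet) > 255:
            return False
    return 1 <= int(port) <= 65535

if __name__ == '__main__':
    for candidate in ["1.1.10.23:123", "999.999.999.999:99999", "10.2.100.100:"]:
        print('%s -> %s' % (candidate, in_range_proxy(candidate)))
```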

elvis1 0 Newbie Poster · Answer 2 · 2010-05-07T19:36:49+00:00

Many thanks for the answer, mate. To be honest, I do not understand the code. Does it do what I described? Any idea what the faults in my code are?

Thanks again
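
On the faults in the script from the first post, a few things stop it from running: the backslashes in the path are not escaped (the \f in '\file.txt' is read as a form feed character), the regex is written as a bare line instead of being passed to re as a string, writelines() is called with no argument, and fob=close() should be fob.close(). A minimal corrected sketch that keeps the original readlines/writelines idea is below; the function name filter_proxies is a placeholder, and the pattern is the original one unchanged, so it is looser than the versions in the replies.

```python
import re

# The original pattern, written as a raw string with single backslashes.
PATTERN = re.compile(r'[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}:[0-9]{1,5}')

def filter_proxies(in_path, out_path):
    """Copy lines from in_path that contain a proxy-like string to out_path."""
    fob = open(in_path, 'r')
    listme = fob.readlines()
    fob.close()

    # Keep only the lines in which the regex finds a match.
    matched = [line for line in listme if PATTERN.search(line)]

    fob = open(out_path, 'w')
    fob.writelines(matched)   # writelines needs the list of lines to write
    fob.close()               # close is a method of the file object
    return len(matched)
```

It could be called as filter_proxies('c:/test/proxy.txt', 'c:/test/new_proxy.txt'); forward slashes or raw strings avoid the backslash problem in Windows paths.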

TrustyTony 888 ex-Moderator Team Colleague Featured Poster · Answer 3 · 2010-05-08T00:43:18+00:00

And result from my program is not:

# -*- coding: latin1 -*-
def validateProxy(a):
    """ Drop the digits and check that the separators conform to the pattern,
        split on the separators and check that each part is 1 to 3 digits long,
        check that the end after ':' is 1 to 5 long """
    sep=[x for x in a if not x.isdigit()]
    if sep != ['.','.','.',':']: return False
    end=a
    for i in sep:
         ## part until separator, separator, part after separator until end
        part,i,end=end.partition(i)
        if i=='.' and (1<=len(part)<=3):
            continue ## 1..3 OK
        elif i==':' and 1<=len(part)<=3 and 1<=len(end)<=5:
            continue ## last 1..3 OK and end 1..5 long
        else:
             return False
    return True

if __name__ == '__main__':
    proxy = [ "202.9945.29.27:80",
              "221.11.27.110:8080",
              "11111.45454.4454.444",
              "114.30.47.10:80",
              "116.52.155.237:80",
              "204.73.37.11255:80",
              "220.227.90.154:8080455"]
    print("Valid proxies are:")
    for i in filter(validateProxy,proxy): print('\t' + i)
""" Result:
Valid proxies are:
	221.11.27.110:8080
	114.30.47.10:80
	116.52.155.237:80
>>> """

elvis1 0 Newbie Poster · Answer 4 · 2010-05-08T01:12:27+00:00

@snippsat: nice code! I will try to extend it a bit more (noobness alert :P) by adding a way to eliminate duplicate proxy addresses.

How could I add this code (which seems to be what I need) to yours?

http://www.daniweb.com/forums/post1192186.html#post1192186

Many Thanks

TrustyTony 888 ex-Moderator Team Colleague Featured Poster · Answer 5 · 2010-05-08T04:59:47+00:00

Here are both the re version and my version, with duplicate removal added, but you said you wanted to add this yourself?

import re

proxy_file = 'proxy.txt'

def validate_proxy_re(this):
    proxy_pattern = r'[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\:[0-9]{1,5}\D'
    my_compile = re.compile(proxy_pattern)
    match_obj = my_compile.search(this)
    if match_obj:        
        return this
    else:
        return ''

def validate_proxy_py(this):
    """ Drop the digits and check that the separators conform to the pattern,
        split on the separators and check that each part is 1 to 3 digits long,
        check that the end after ':' is 1 to 5 long """
    sep=[x for x in this.strip() if not x.isdigit()] ## strip to remove whitespace
    if sep != ['.','.','.',':']: return '' 
    end=this.strip() ## strip here too, or the trailing newline counts toward the port length
    for i in sep:
         ## part until separator, separator, part after separator until end
        part,i,end=end.partition(i)
        if i=='.' and (1<=len(part)<=3):
            continue ## 1..3 OK
        elif i==':' and 1<=len(part)<=3 and 1<=len(end)<=5:
            continue ## last 1..3 OK and end 1..5 long
        else:
             return ''
    return this

new_proxy = open('new_proxy.txt', 'w')

proxies=set()
for currentline in open(proxy_file, 'r'):
    if currentline not in proxies:
        validated=validate_proxy_re(currentline)
##        validated=validate_proxy_py(currentline)
        
        if validated:
            new_proxy.write(validated)
            proxies.add(validated)

new_proxy.close()
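
One thing to watch in the loop above is that lines are compared with their newline still attached, so a last line without a trailing newline would not be recognised as a duplicate of an earlier copy. Below is a sketch of the duplicate removal done on stripped lines, keeping first-seen order. The names unique_valid and is_valid are illustrative only; is_valid is assumed to be any callable that takes the raw line and returns something truthy when the line is valid.

```python
def unique_valid(lines, is_valid):
    """Return stripped lines that pass is_valid, without duplicates, in order."""
    seen = set()
    result = []
    for line in lines:
        proxy = line.strip()
        # Skip blanks and repeats before running the validator.
        if proxy and proxy not in seen and is_valid(line):
            seen.add(proxy)
            result.append(proxy)
    return result

if __name__ == '__main__':
    import re
    pattern = re.compile(r'^[0-9]{1,3}(\.[0-9]{1,3}){3}:[0-9]{1,5}$')
    sample = ['114.30.47.10:80\n', '221.11.27.110:8080\n', '114.30.47.10:80']
    for proxy in unique_valid(sample, pattern.match):
        print(proxy)
```

When writing the result to a file, the newline has to be added back, for example with new_proxy.write(proxy + '\n').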

elvis1 0 Newbie Poster · Answer 6 · 2010-05-08T09:55:01+00:00

tonyjv: many thanks, I will take a look at your code. Very interesting. Many thanks again!
