I have to get three pieces of information from the files I have:
1. The name of the owner of the file, which appears after the <foaf:name> tag.
2. The ID of the owner of the file, which is embedded in the filename. For example,
http%3A%2F%2Ftalk.ie%2Fvbulletin%2Ffoaf.php%3Fu%3D12
belongs to user 12. The pattern Fu%3D always appears before the ID in the
filename and nowhere else in it.
3. The people known by the owner of the file; there may be more than one.
The pattern foaf.php?u= can be used to find their IDs.
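Each of the three patterns listed above maps onto a regular expression. This is a minimal sketch for Python 3 using the standard `re` module. It assumes the whole file has already been read into one string. `owner_name`, `owner_id` and `known_ids` are hypothetical helper names, and `sample` is a few lines adapted from the example file shown further down.

```python
import re

def owner_name(text):
    """Return the text between the first <foaf:name> and </foaf:name>, or None."""
    match = re.search(r"<foaf:name>(.*?)</foaf:name>", text)
    return match.group(1) if match else None

def owner_id(filename):
    """Return the digits that follow 'Fu%3D' in the URL-encoded filename, or None."""
    match = re.search(r"Fu%3D(\d+)", filename)
    return match.group(1) if match else None

def known_ids(text):
    """Return every ID that follows 'foaf.php?u=' in the file, as strings."""
    # '\?' matches a literal question mark; '\d+' stops at the first non-digit ('#').
    return re.findall(r"foaf\.php\?u=(\d+)", text)

sample = '''<foaf:name>Donna</foaf:name>
<foaf:Person rdf:about="http://talk.ie/vbulletin/foaf.php?u=21#person">
<foaf:Person rdf:about="http://talk.ie/vbulletin/foaf.php?u=3673#person">'''

print(owner_name(sample))  # Donna
print(owner_id("http%3A%2F%2Ftalk.ie%2Fvbulletin%2Ffoaf.php%3Fu%3D12.txt"))  # 12
print(known_ids(sample))   # ['21', '3673']
```

Because `findall` returns a fresh list for each file, every match stays a separate ID.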
Here is the dir2.txt file:
http%3A%2F%2Ftalk.ie%2Fvbulletin%2Ffoaf.php%3Fu%3D12.txt
http%3A%2F%2Ftalk.ie%2Fvbulletin%2Ffoaf.php%3Fu%3D4.txt
http%3A%2F%2Ftalk.ie%2Fvbulletin%2Ffoaf.php%3Fu%3D374.txt
http%3A%2F%2Ftalk.ie%2Fvbulletin%2Ffoaf.php%3Fu%3D103.txt
http%3A%2F%2Ftalk.ie%2Fvbulletin%2Ffoaf.php%3Fu%3D57.txt
http%3A%2F%2Ftalk.ie%2Fvbulletin%2Ffoaf.php%3Fu%3D98.txt
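The names in dir2.txt are percent-encoded URLs. That means the owner ID can also be read by decoding the name and parsing its query string, instead of searching for Fu%3D. This is a sketch for Python 3 using `urllib.parse` from the standard library. It assumes every name ends in `.txt`, as in the listing above, and `id_from_filename` is a hypothetical helper name.

```python
from urllib.parse import unquote, urlparse, parse_qs

def id_from_filename(filename):
    """Decode one dir2.txt entry and return the value of its 'u' query parameter."""
    # 'http%3A%2F%2F...%3Fu%3D12.txt' -> 'http://talk.ie/vbulletin/foaf.php?u=12.txt'
    url = unquote(filename.strip())
    if url.endswith(".txt"):
        url = url[:-len(".txt")]  # drop the extension added when the file was saved
    query = parse_qs(urlparse(url).query)  # e.g. {'u': ['12']}
    return query["u"][0]

print(id_from_filename("http%3A%2F%2Ftalk.ie%2Fvbulletin%2Ffoaf.php%3Fu%3D12.txt"))  # 12
```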
Here is an example of what is in one of these files:
<?xml version="1.0" encoding="iso-8859-1" ?>
<rdf:RDF
xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
xmlns:foaf="http://xmlns.com/foaf/0.1/">
<foaf:name>Donna</foaf:name>
<foaf:nick>Donna</foaf:nick>
<foaf:knows>
<foaf:Person rdf:about="http://talk.ie/vbulletin/foaf.php?u=21#person">
<foaf:nick>Kath</foaf:nick>
</foaf:Person>
</foaf:knows>
<foaf:knows>
<foaf:Person rdf:about="http://talk.ie/vbulletin/foaf.php?u=3673#person">
<foaf:nick>Mick</foaf:nick>
</foaf:Person>
</foaf:knows>
</foaf:Person>
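If the saved files are well-formed RDF/XML, an XML parser avoids matching on raw text. The excerpt above is trimmed, so well-formedness is an assumption here. This sketch uses the standard library's `xml.etree.ElementTree` with the RDF and FOAF namespaces from the file header. `parse_foaf` is a hypothetical name, and `sample` is a made-up, well-formed stand-in rather than one of the real files.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlparse, parse_qs

# Namespace URIs copied from the xmlns declarations in the file header.
NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "foaf": "http://xmlns.com/foaf/0.1/",
}
ABOUT = "{%s}about" % NS["rdf"]  # ElementTree's name for the rdf:about attribute

def parse_foaf(root):
    """Return (owner name, IDs of known people) from a parsed rdf:RDF element."""
    name_el = root.find(".//foaf:name", NS)  # first foaf:name in document order
    name = name_el.text if name_el is not None else None
    ids = []
    for person in root.findall(".//foaf:knows/foaf:Person", NS):
        about = person.get(ABOUT)  # e.g. http://talk.ie/vbulletin/foaf.php?u=21#person
        if about:
            ids.extend(parse_qs(urlparse(about).query).get("u", []))
    return name, ids

# Made-up, well-formed stand-in for one of the saved files.
sample = """<rdf:RDF
 xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
 xmlns:foaf="http://xmlns.com/foaf/0.1/">
<foaf:Person>
<foaf:name>Donna</foaf:name>
<foaf:knows>
<foaf:Person rdf:about="http://talk.ie/vbulletin/foaf.php?u=21#person"><foaf:nick>Kath</foaf:nick></foaf:Person>
</foaf:knows>
<foaf:knows>
<foaf:Person rdf:about="http://talk.ie/vbulletin/foaf.php?u=3673#person"><foaf:nick>Mick</foaf:nick></foaf:Person>
</foaf:knows>
</foaf:Person>
</rdf:RDF>"""

print(parse_foaf(ET.fromstring(sample)))  # ('Donna', ['21', '3673'])
```

For a real file, `ET.parse(filename).getroot()` reads the raw bytes and honours the iso-8859-1 declaration.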
This code takes the user IDs and stores them in an output file. I need to edit it so that it extracts the three pieces of information listed above:
def add_userid(filename):
    currfile = open(filename)
    searchterm = "foaf.php?u="
    length = len(searchterm)
    userid = ''
    for line in currfile:
        found = line.find(searchterm)
        if found != -1:
            position = found + length
            i = position
            while i < len(line) and line[i] != '#':
                userid = userid + line[i]
                i += 1
            print "ID of user is", userid
            writetofile("output.txt", userid)

def writetofile(filename, userid):
    currfile = open(filename, 'a+')
    currfile.write(userid)
    currfile.write('\n') # to save each ID on a line
    currfile.close()

def readfiles(filename):
    filelist = open(filename)
    for line in filelist:
        files = line[:-1]
        print files
        add_userid(files)
>>> readfiles('dir2.txt')
http%3A%2F%2Ftalk.ie%2Fvbulletin%2Ffoaf.php%3Fu%3D12.txt
ID of user is 21
ID of user is 213673
http%3A%2F%2Ftalk.ie%2Fvbulletin%2Ffoaf.php%3Fu%3D4.txt
ID of user is 98
ID of user is 98194
ID of user is 98194265
ID of user is 98194265343
ID of user is 98194265343393
ID of user is 98194265343393585
ID of user is 98194265343393585851
ID of user is 981942653433935858511026
ID of user is 9819426534339358585110261163
ID of user is 98194265343393585851102611631172
ID of user is 981942653433935858511026116311721353
ID of user is 9819426534339358585110261163117213531955
ID of user is 98194265343393585851102611631172135319552160
ID of user is 981942653433935858511026116311721353195521602300
ID of user is 9819426534339358585110261163117213531955216023002563
ID of user is 98194265343393585851102611631172135319552160230025633091
ID of user is 981942653433935858511026116311721353195521602300256330913116
ID of user is 9819426534339358585110261163117213531955216023002563309131163289
ID of user is 98194265343393585851102611631172135319552160230025633091311632894091
ID of user is 981942653433935858511026116311721353195521602300256330913116328940915013
ID of user is 9819426534339358585110261163117213531955216023002563309131163289409150135419
ID of user is 98194265343393585851102611631172135319552160230025633091311632894091501354196202
http%3A%2F%2Ftalk.ie%2Fvbulletin%2Ffoaf.php%3Fu%3D374.txt
ID of user is 4
ID of user is 435
ID of user is 43544
ID of user is 4354448
ID of user is 435444852
ID of user is 43544485254
ID of user is 4354448525473
ID of user is 435444852547398
ID of user is 435444852547398108
ID of user is 435444852547398108109
ID of user is 435444852547398108109111
ID of user is 435444852547398108109111136
ID of user is 435444852547398108109111136156
http%3A%2F%2Ftalk.ie%2Fvbulletin%2Ffoaf.php%3Fu%3D103.txt
ID of user is 17954
ID of user is 179544
ID of user is 17954498
http%3A%2F%2Ftalk.ie%2Fvbulletin%2Ffoaf.php%3Fu%3D57.txt
ID of user is 4
ID of user is 41724
ID of user is 4172498
http%3A%2F%2Ftalk.ie%2Fvbulletin%2Ffoaf.php%3Fu%3D98.txt
ID of user is 422
ID of user is 42259856
It gives the first ID correctly, but then appends each ID it finds after that to the previous ones. For example, it prints:
ID of user is 422
ID of user is 42259856
when it should print:
ID of user is 422
ID of user is 59856
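The IDs run together because `userid = ''` runs only once, before the `for` loop, so every later match is appended to the same string. Resetting `userid` inside the `if found != -1:` block gives one ID per match.

The sketch below is written for Python 3. It keeps the original line-scanning approach and also picks up the owner's name and ID. It assumes the names in dir2.txt can be opened from the working directory and that the files are ISO-8859-1 encoded, as their XML header states. `extract_info`, the constant names and the tab-separated output line are hypothetical choices, not part of the original code.

```python
NAMETAG = "<foaf:name>"
IDTAG = "Fu%3D"
SEARCHTERM = "foaf.php?u="

def extract_info(filename, lines):
    """Return (owner name, owner ID, list of known IDs) for one saved FOAF file."""
    name = None
    known = []
    for line in lines:
        line = line.rstrip("\r\n")
        # 1. Owner's name: text after the first <foaf:name> tag.
        if name is None and NAMETAG in line:
            start = line.find(NAMETAG) + len(NAMETAG)
            end = line.find("</foaf:name>", start)
            name = line[start:end] if end != -1 else line[start:]
        # 3. Known people: characters after foaf.php?u= up to the '#'.
        found = line.find(SEARCHTERM)
        if found != -1:
            userid = ''  # reset for every match; this fixes the concatenation
            i = found + len(SEARCHTERM)
            while i < len(line) and line[i] != '#':
                userid += line[i]
                i += 1
            known.append(userid)
    # 2. Owner's ID: only in the filename, between 'Fu%3D' and '.txt'.
    owner = None
    pos = filename.find(IDTAG)
    if pos != -1:
        start = pos + len(IDTAG)
        end = filename.find(".", start)
        owner = filename[start:end] if end != -1 else filename[start:]
    return name, owner, known

def readfiles(listfile, outfile="output.txt"):
    """Process every file named in listfile, appending one tab-separated line each."""
    with open(listfile) as names, open(outfile, "a") as out:
        for entry in names:
            filename = entry.strip()  # unlike line[:-1], safe if the last line has no '\n'
            if not filename:
                continue
            with open(filename, encoding="iso-8859-1") as f:
                name, owner, known = extract_info(filename, f)
            print("ID of user is", owner, "name is", name, "knows", known)
            out.write("%s\t%s\t%s\n" % (owner, name, ",".join(known)))
```

Running `readfiles('dir2.txt')` appends one line per file to output.txt. For the example file above (user 12, judging by the output), that line would hold `12`, `Donna` and `21,3673`, separated by tabs.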