Hi, I am undertaking a piece of work and may need a bit of help.

The problem I need to solve is as follows:

I am requesting a text-based document through an HTTP request, and I currently have the document I want from that request. Now I need to look for an element within this text and remove it.

Does anyone have an idea how I could do this? My code is below. Also, I am requesting the document from a Solr database, which basically outputs the results from the query in a text view but

#!/bin/env python2.5
from urllib2 import *
import sys
import os
import pickle
import logging
from optparse import OptionParser, OptionGroup
import urllib

def set_solr_query(docID):
	return '[internal network http]'
	

def request_doc(url):
	conn = urlopen(url)
	rsp = eval( conn.read() )
	docedit = rsp['response']['docs'][0]
	#docedit = fileobject.read()
	#doc.readlines()
	find_keyword(docedit)
	print "number of matches=", rsp['response']['numFound']
	for doc in rsp['response']['docs']:
		print 'Year =', doc['year']
    
	
	
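def find_keyword(x):
	# Despite its name, this only pickles the document to disk; it does not search it yet.
	print "opened File object and dumped doc into pickle"
	file_pi = open('filename_pi.obj', 'wb')  # binary mode is the safe choice for pickle data
	pickle.dump(x,file_pi)
	file_pi.close()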
 


def main(argv):

   print '+++++++++++++++++++++++++++++++++++++++CONFIGURATIONS++++++++++++++++++++++++++++++++++++++++++++++'
   docID=argv[1]
   keyword=argv[2]

   url = set_solr_query(docID)
   request_doc(url)

   
   print docID
   print keyword
   print url
   print '++++++++++++++++++++++++++++++++++++++++++++END++++++++++++++++++++++++++++++++++++++++++++++++++++'

   

if __name__ == "__main__":
   main(sys.argv)

The only part of the above code I cannot publish is the internal HTTP request; apologies for this.

I hope someone can help point a first-time Python user in the right direction; I would be very grateful. I look forward to hearing back from someone.


Thanks

Dan

I'd look into BeautifulSoup if you're looking to break this thing down into an object-hierarchy-type structure. I haven't used it much, but I know there are plenty of examples on this site of how to implement it.
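
Since the posted code already turns the Solr response into a Python dictionary (rsp['response']['docs'][0]), the removal might not need a parser at all. Below is a rough sketch, not a definitive answer. It assumes "element" means either a whole field or a piece of text inside a field. The function name remove_keyword and the sample document are made up for illustration and are not from the original script. It is written to run on current Python; on Python 2 the string check would probably need basestring instead of str.

```python
def remove_keyword(doc, keyword):
    """Return a new dict based on doc with keyword removed.

    A field named exactly `keyword` is dropped entirely. In every other
    field that holds a string, occurrences of `keyword` are deleted from
    the value. The original dict is left untouched.
    """
    cleaned = {}
    for field, value in doc.items():
        if field == keyword:
            continue  # drop the whole field
        if isinstance(value, str):
            value = value.replace(keyword, '')
        cleaned[field] = value
    return cleaned


# Hypothetical document shaped like rsp['response']['docs'][0]
doc = {'id': '42', 'year': 2009, 'title': 'draft annual report'}

print(remove_keyword(doc, 'year'))    # removes the 'year' field
print(remove_keyword(doc, 'draft '))  # strips the text from 'title'
```

In the posted script this could be called from request_doc in place of find_keyword, with the keyword from argv[2] passed through.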
