Hello friends, I am new to DaniWeb and new to Python.
I have a few problems, so I need help.
I have a text document containing the source code of a web page. How can I extract a string that sits between two different strings in a line and write it to a new file? After that, the program must keep searching for more such strings in the same line.
I have 50 lines of around 150,000 characters, e.g. (',...3627s-/a<<*g12-5<d/ajqjh5i1/*//*,.,.,-,')

Example: " (a,b){var c=encodeURIComponent,d=["//www.google.com/gen_204?atyp=i&zx=",(new Da "
How can I extract just '204?atyp=i&zx'?

In this example my string is always between 'google.com/gen_' and '="', so I need help writing code that opens a file, reads it, extracts the exact string between two strings in a line, writes it to a new file, and then continues searching the rest of that line and the remaining lines of the file.
Also, is it possible that my program cannot see all 150,000 characters of a single line of the web page? When it writes the line to the new file, the line has only about 72,775 characters.
So, is it possible to write a program that opens a file, reads it, takes a first string and a second string as input, and writes the string between the two to a new file?
If you can write any part of the code, I would be very grateful.
Thanks

All 8 Replies

I have posted a code snippet for exactly this case, since it is quite typical:
Picking piece of string between separators

These pieces are also simple to get with a regular expression using the re module. You must then take care that greedy matching does not start at the first opening separator and match everything up to the last closing separator. From the re documentation:

{m,n}?
Causes the resulting RE to match from m to n repetitions of the preceding RE, attempting to match as few repetitions as possible. This is the non-greedy version of the previous qualifier. For example, on the 6-character string 'aaaaaa', a{3,5} will match 5 'a' characters, while a{3,5}? will only match 3 characters.
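
To make the greedy versus non-greedy difference concrete, here is a minimal sketch. The sample line and the markers 'gen_' and '="' are made up for illustration; the same idea applies to `*?` and `+?` as to `{m,n}?`.

```python
import re

# Hypothetical sample: two marked values on the same line.
line = 'gen_AAA=" junk gen_BBB="'

# Greedy: .* runs from the first 'gen_' to the LAST '="' on the line.
greedy = re.findall(r'gen_(.*)="', line)

# Non-greedy: .*? stops at the NEAREST '="', so each value is found separately.
lazy = re.findall(r'gen_(.*?)="', line)

print(greedy)  # ['AAA=" junk gen_BBB']
print(lazy)    # ['AAA', 'BBB']
```

With long lines that contain many separators, the non-greedy form is usually the one you want.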

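You may have to give more exact information, such as posting more of the file.
Here is a regex example with the string you posted; it may need some changes for your real data. The group is non-greedy so that each match stops at the nearest closing separator.

import re

s = '(a,b){var c=encodeURIComponent,d=["//www.google.com/gen_204?atyp=i&zx=",(new Da "'
# Capture the text between 'gen_' and the nearest '="'; it must start with a digit.
r = re.findall(r'gen_(\d.*?)="', s)
print(r) #['204?atyp=i&zx']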
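
The original question also asked for the whole workflow: open a file, find every string between two markers on every line, and write the hits to a new file. The following is only a sketch of one way to do that in Python 3. The function names `find_all_between` and `extract_to_file` are invented here, and reading the file as UTF-8 is an assumption about the data.

```python
import re

def find_all_between(text, left, right):
    """Return every shortest substring found between `left` and `right`."""
    # re.escape makes the markers literal, so characters like '?' or '.' are safe.
    pattern = re.compile(re.escape(left) + r'(.*?)' + re.escape(right))
    return pattern.findall(text)

def extract_to_file(in_path, out_path, left, right):
    """Scan in_path line by line; write each hit on its own line of out_path."""
    count = 0
    # Assumption: the input is UTF-8; undecodable bytes are replaced, not fatal.
    with open(in_path, encoding='utf-8', errors='replace') as src, \
         open(out_path, 'w', encoding='utf-8') as dst:
        for line in src:
            for hit in find_all_between(line, left, right):
                dst.write(hit + '\n')
                count += 1
    return count

# Hypothetical line with two hits, modelled on the string from the first post.
sample = 'x google.com/gen_204?atyp=i&zx=",(new Da y google.com/gen_205?x=" z'
print(find_all_between(sample, 'google.com/gen_', '="'))
# ['204?atyp=i&zx', '205?x']
```

A call such as extract_to_file('kod1.txt', 'found.txt', 'google.com/gen_', '="') would then process the real file; 'found.txt' is just a placeholder name.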

When I use pyTony's method, it looks like this:

def between(left,right,s):
    before,_,a = s.partition(left)
    a,_,after = a.partition(right)
    return a

myFile = open('kod1.txt')
for lines in myFile.readlines():
     
    s = myFile.readline()
    print between('<script type="text/ja','" src="http://s',s)
  
myFile.close()

and then it just prints 50 empty lines. I use the 'tags' from kod1.txt.

But when I use snippsat's method, it looks like this:

import re

myFile = open('kod1.txt')
for lines in myFile.readlines():
    
    s = myFile.readline()
r = re.findall(r'\<script type="text/javascript"(\d.*)src.php/v1/yP/r/jMx', s)
print(r)

and it just prints []. I replaced the 'tags' with something else, and it still does not work.

Can you post a sample of the file?
You are making some basic mistakes, so what you are doing will not work.
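
One of those mistakes can be shown in a few lines: `readlines()` reads the whole file up front, so calling `readline()` inside the loop returns an empty string every time, and the loop variable is never used. The sketch below uses `io.StringIO` with two made-up lines as a stand-in for open('kod1.txt'), so it runs without the real file.

```python
import io

def between(left, right, s):
    """Text after the first `left` up to the next `right`.

    Returns '' if `left` is absent; if `right` is absent, everything
    after `left` is returned.
    """
    _, _, rest = s.partition(left)
    middle, _, _ = rest.partition(right)
    return middle

# Stand-in for open('kod1.txt'); the two lines are hypothetical sample data.
fake_file = io.StringIO('aaa<1>bbb\nccc<2>ddd\n')

# Broken pattern: readlines() has already consumed the file,
# so readline() returns '' on every pass.
broken = [fake_file.readline() for _ in fake_file.readlines()]

fake_file.seek(0)
# Fixed pattern: iterate over the file and use the loop variable itself.
fixed = [between('<', '>', line) for line in fake_file]

print(broken)  # ['', '']
print(fixed)   # ['1', '2']
```

The regex attempt has the same reading problem, and its `re.findall` call sits outside the loop, so it runs only once on an empty string. In addition, `(\d.*)` only matches if a digit follows the left marker directly.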

false;}" title="Search"><span class="hidden_elem">Search</span></button></span></span></div></div></div><input type="hidden" name="init" id="init" value="quick" /><input type="hidden" name="tas" class="search_sid_input" value="search_preload" /><input type="hidden" name="search_first_focus" id="search_first_focus" value="" /><

All of that is from one line, about 1/50 of the line.

So how would I find 'eload" /><i' in this string? If you are writing code, please also show how to open the file properly. Thanks.

You are going to struggle with this if your regex and Python skills are not so good.
This file is a mix of JavaScript and HTML.
Regex and HTML are not the best fit; that is why parsers like lxml and BeautifulSoup exist.

So this time you want to find something completely different from the first post.
I use with open(), so you do not need to close the file object.

import re

pattern = re.compile(r'pr(.+i)n')
with open('test.txt') as f:
    for match in pattern.finditer(f.read()):
        print(match.group(1)) #eload" /><i

And why do you want to find only part of a word and some tag delimiter?

I need critical information from the code. It is a link, but the link is split in two halves, e.g.: 'ss=\"passiveName\" href=\"http:\/\/www.example.com\/profile.php?id=0000000000\" data-hovercard=\"\/ajax\/hoverc'
So I need a program that goes from tag to tag, e.g. tag1 'ss=\"passiveName\" href=\"http:\/\/', tag2 '\', and tag3, in this example '\" data-hovercard=\"\/ajax\/hoverc'.
Then it should write 'www.example.com/profile.php?id=0000000000' to a new file, or just extract the ID, and it must keep looking for new links between the tags.
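
For the escaped link described here, one possible approach is to match the left and right markers literally, capture the shortest text between them, and then turn each `\/` back into `/`. This is a sketch in Python 3 that assumes the markers always look exactly like the sample in this post; the variable names are invented.

```python
import re

# Sample copied from the post; in a raw string the backslashes stay literal.
sample = r'ss=\"passiveName\" href=\"http:\/\/www.example.com\/profile.php?id=0000000000\" data-hovercard=\"\/ajax\/hoverc'

# In the pattern, \\ matches one literal backslash.
link_re = re.compile(r'href=\\"http:\\/\\/(.*?)\\" data-hovercard=')
id_re = re.compile(r'[?&]id=(\d+)')

# Undo the \/ escaping in every captured link.
links = [hit.replace('\\/', '/') for hit in link_re.findall(sample)]

# Pull the numeric id out of each link that has one.
ids = [m.group(1) for m in map(id_re.search, links) if m]

print(links)  # ['www.example.com/profile.php?id=0000000000']
print(ids)    # ['0000000000']
```

Because `findall` returns every hit, the same code keeps finding further links when the text contains more than one.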

And nobody has answered whether this is possible with my program:

filehandle = p.cevapi()
myFile = open('kod1.txt','w')
for lines in filehandle:
    
    myFile.write(lines)
   
myFile.close()

'p.cevapi' is the return value of a function, so the page's code is in there. My question is: is it possible that my program only gets to about 72,000 characters when that line has 150,000 in the source code?

Does anyone know the answer to this last question? I also have another one.
In which text encoding does my program write to the file when I download the code of a web page? Is it ASCII or something else? When I try to find some code from the web page in my file, the text is changed.
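
The thread does not show what p.cevapi() returns, so the following is only a sketch of how both questions could be checked. A page is not necessarily ASCII; its encoding is whatever the server or the page itself declares. Writing the raw bytes in binary mode keeps them exactly as received, and comparing lengths shows whether anything was lost on the way to disk. The `raw` value below is a made-up stand-in for downloaded bytes, and UTF-8 is an assumption.

```python
import os
import tempfile

# Hypothetical stand-in for the raw bytes a download would return.
raw = 'caf\u00e9 <a href="x">'.encode('utf-8')

path = os.path.join(tempfile.mkdtemp(), 'page.bin')

# Binary mode writes the bytes unchanged: no newline translation,
# no re-encoding.
with open(path, 'wb') as out:
    out.write(raw)

with open(path, 'rb') as src:
    saved = src.read()

# Equal lengths mean nothing was lost between download and file.
print(len(raw), len(saved))

text = saved.decode('utf-8')     # assumes the page really is UTF-8
wrong = saved.decode('latin-1')  # same bytes, wrong codec: the accented letter becomes two characters
print(len(text), len(wrong))
```

If the length already differs before writing, the text was cut short during the download rather than by the file write.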
