Hello all,

I was working on an R script that reads a huge text file and does some calculations on it, but I found that it takes a long time to run, so I'm trying to convert it to Python.

Is there any way to do this automatically? I'm having some trouble doing the conversion line by line.

Here is the code, if someone can help.

The script calculates three values, A, Cres and mb, for each (p,q), which is a pair of line indices.

library(pgirmess)  # provides write.delim
tab=read.table("test.txt",sep=" ")
tab$value =((tab$V3+tab$V4)/(tab$V2+tab$V1))
final=NULL
rowsnb=nrow(tab)

for(q in seq(1,rowsnb,by=150)) {
    for(p in seq(1,rowsnb,by=150)) {
        A=(rowsnb*(q^2))

        for(i in 1:q) {
            Cres=(tab[i,]$value)*i
        }

        mb=((rowsnb*A)/Cres)*A
        final <- rbind(final, data.frame(avalue=A,Cvalue=Cres,mbvalue=mb))
    }
}
write.delim(final, file = "final.txt", row.names = TRUE, quote = FALSE, sep = " ")

Thanks for the help.

Could you post a sample of the data, say the 20x20 top-left corner? And the formula for the output in ordinary mathematical notation, with one example worked out from that data?

Hi,

An example of the input is attached.

I will change the code a little so that it produces some output for you.

The code now looks like this:

library(pgirmess)
tab=read.table("test1.txt",stringsAsFactors=FALSE)

tab$value = log2((tab$V3+tab$V4)/(tab$V2+tab$V1))
re=NULL
rowsnb=nrow(tab)

for(q in seq(1,rowsnb,by=1)) {
    for(p in seq(1,rowsnb,by=1)) {
        A=(rowsnb*(q^2)) + ((rowsnb*(rowsnb+1)*(2*rowsnb+1))/6)

        for(i in 1:q) {
            Cres=(tab[i,]$value)*i
        }

        mb=((rowsnb*A)/Cres)*A

        re<- rbind(re, data.frame(avalue=A,Cvalue=Cres,mbvalue=mb))
    }
}
write.delim(re, file = "final.txt", row.names = TRUE, quote = FALSE, sep = " ")

Please find attached a file containing part of the data I'm working on.

And this is part of the output of the new script:

avalue Cvalue mbvalue
1 2127 0.173331602885562 469818086.513428
2 2127 0.173331602885562 469818086.513428
3 2127 0.173331602885562 469818086.513428
4 2127 0.173331602885562 469818086.513428
5 2127 0.173331602885562 469818086.513428
6 2127 0.173331602885562 469818086.513428
7 2127 0.173331602885562 469818086.513428
8 2127 0.173331602885562 469818086.513428
9 2127 0.173331602885562 469818086.513428
10 2127 0.173331602885562 469818086.513428
11 2127 0.173331602885562 469818086.513428
12 2127 0.173331602885562 469818086.513428
13 2127 0.173331602885562 469818086.513428
Attachments
0.190000	0.280000	0.250000	0.280000
0.190000	0.280000	0.240000	0.290000
0.200000	0.280000	0.230000	0.290000
0.200000	0.290000	0.220000	0.290000
0.200000	0.290000	0.220000	0.290000
0.200000	0.290000	0.220000	0.290000
0.190000	0.290000	0.220000	0.300000
0.190000	0.300000	0.220000	0.290000
0.200000	0.300000	0.210000	0.290000
0.210000	0.290000	0.210000	0.290000
0.200000	0.290000	0.210000	0.300000
0.210000	0.280000	0.210000	0.300000
0.210000	0.280000	0.200000	0.310000
0.210000	0.270000	0.200000	0.320000
0.220000	0.270000	0.190000	0.320000
0.210000	0.280000	0.190000	0.320000
0.220000	0.280000	0.190000	0.310000
0.210000	0.280000	0.200000	0.310000

Are all the output rows the same apart from the incrementing index?
Is there no mathematical formula that someone who knows only Python, not R, could follow?

That's normal. The full data set is very large, so this small amount of data will give the same result.

OK, then I can't help with a direct translation. I did some interactive learning to find out what those R lines do, and it is quite simple in Python/NumPy too. But maybe your problem is not the language but the logic. Your inner loop is invariant: it computes the same values again and again. This changed R script gave the same result file; the loop invariants are now computed only once, in the outer loop:

library(pgirmess)
tab=read.table("test1.txt",stringsAsFactors=FALSE)

tab$value = log2((tab$V3+tab$V4)/(tab$V2+tab$V1))
re=NULL
rowsnb=nrow(tab)

for(q in seq(1,rowsnb,by=1)){
    A=(rowsnb*(q^2)) + ((rowsnb*(rowsnb+1)*(2*rowsnb+1))/6)
    for(i in 1:q){
         Cres=(tab[i,]$value)*i
    }
    mb=((rowsnb*A)/Cres)*A
    for(p in seq(1,rowsnb,by=1)){
         re<- rbind(re, data.frame(avalue=A,Cvalue=Cres,mbvalue=mb))
    }
}
write.delim(re, file = "final.txt", row.names = TRUE, quote = FALSE, sep = " ")

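I checked that the new result matches the old saved result "final1.txt" with:

# Compare the old and new result files line by line
with open('final1.txt') as old, open('final.txt') as new:
    for a, b in zip(old, new):
        if a != b:
            print(a)
            print(b)
            print(80 * '-')

This printed nothing, so there are no differences between the files.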

Edited 5 Years Ago by pyTony: n/a

Thanks, but this will not solve the problem, because each new value of Cres should be added to the old value.
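
If the intent is that Cres accumulates, so that for each q it is the sum of value[i]*i over i = 1..q, a minimal Python sketch could look like the one below. This reading is an assumption: it is not what the posted R code does, since that code overwrites Cres on every pass of the inner loop. The function name `running_cres` and the sample values are made up for illustration.

```python
from itertools import accumulate

def running_cres(values):
    """Return, for each q, the sum of values[i-1] * i over i = 1..q."""
    # Weight each value by its 1-based row index, then take the running total.
    weighted = (i * v for i, v in enumerate(values, 1))
    return list(accumulate(weighted))

# Hypothetical sample values, chosen so the sums are easy to check by hand
values = [1.0, 2.0, 3.0]
print(running_cres(values))  # [1.0, 5.0, 14.0]
```

Computing the running total once avoids restarting the inner loop from i = 1 for every q, which matters when the file is huge.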

I cannot code the algorithm, as you have not given it. All I found is that line 4 adds the third and fourth values, and the second and first values, of each line, divides the two sums and takes log2 of the result, so it produces one value per row. rowsnb is the number of lines in the file. A large part of the formula for A is also invariant in the loop, so your code becomes:

library(pgirmess)
tab=read.table("test1.txt",stringsAsFactors=FALSE)

tab$value = log2((tab$V3+tab$V4)/(tab$V2+tab$V1))
re=NULL
rowsnb=nrow(tab)
row_stuff = ((rowsnb*(rowsnb+1)*(2*rowsnb+1))/6)

for(q in seq(1,rowsnb,by=1)){
    A=(rowsnb*(q^2)) + row_stuff
    for(i in 1:q){
         Cres=(tab[i,]$value)*i
    }
    mb=((rowsnb*A)/Cres)*A
    for(p in seq(1,rowsnb,by=1)){
         re<- rbind(re, data.frame(avalue=A,Cvalue=Cres,mbvalue=mb))
    }
}
write.delim(re, file = "final.txt", row.names = TRUE, quote = FALSE, sep = " ")
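
To make the description above concrete, here is a small Python sketch that applies the same formulas by hand to the first line of the posted sample (0.19 0.28 0.25 0.28), with rowsnb = 18 (the number of lines in the sample) and q = 1. The variable names are only for illustration. Up to rounding, the results should agree with the first row of the posted output (2127, 0.17333..., 469818086.51...).

```python
from math import log2

row = (0.19, 0.28, 0.25, 0.28)  # first line of the posted sample (V1..V4)
rowsnb = 18                      # number of lines in the posted sample
q = 1

# One value per row: log2((V3 + V4) / (V1 + V2))
value = log2((row[2] + row[3]) / (row[0] + row[1]))
# Loop-invariant part of A: the sum of the squares 1..rowsnb
row_stuff = rowsnb * (rowsnb + 1) * (2 * rowsnb + 1) // 6
A = rowsnb * q**2 + row_stuff
# The R inner loop overwrites Cres, so only its last term (i = q) survives
Cres = value * q
mb = rowsnb * A / Cres * A

print(A, Cres, mb)
```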


This Python script gives the same 324-line output as the originally posted R script (L:\R\final.txt), apart from slight formatting differences:

from __future__ import print_function
from math import log

# Read the whitespace-separated input: one tuple of floats per line
with open('test1.txt') as inp:
    table = [tuple(map(float, line.split())) for line in inp if line.strip()]

# One value per row: log2((V3 + V4) / (V1 + V2))
values = [log(sum(row[2:]) / sum(row[:2]), 2) for row in table]
rowsnb = len(table)
# Loop-invariant part of A: the sum of the squares 1..rowsnb
row_stuff = rowsnb * (rowsnb + 1) * (2 * rowsnb + 1) // 6
# As in the R script, Cres keeps only the last term of the inner loop
Cres = [q * value for q, value in enumerate(values, 1)]
A = [rowsnb * q**2 + row_stuff for q in range(1, rowsnb + 1)]
mb = [rowsnb * a * a / cres if cres else float('inf') for a, cres in zip(A, Cres)]

# Mode 'w' truncates an existing file, so no explicit delete is needed
with open('final2.txt', 'w') as outp:
    result = ['%d %.14f %.5f' % row for row in zip(A, Cres, mb)]
    print('avalue Cvalue mbvalue', file=outp)
    # Each result row is repeated rowsnb times, once per value of p
    repeated = (r for r in result for _ in range(rowsnb))
    print('\n'.join('%i %s' % (rowno, row)
                    for rowno, row in enumerate(repeated, 1)),
          file=outp)

# Test that the output matches the R output within the limit
limit = 1E-5
with open('final2.txt') as mine, open('L:/R/final.txt') as theirs:
    for a, b in zip(mine, theirs):
        if '.' in a and any(abs(float(aval) - float(bval)) > limit
                            for aval, bval in zip(a.split(), b.split())):
            print(a)
            print(b)
            print(80 * '-')