From R to Python

Question

weblover 0 Junior Poster

13 Years Ago

Hello all,

I was working on an R script that reads a huge text file and performs some calculations on it, but I found that it takes a long time to do the job, so I'm trying to convert it to Python.

Is there any way to do this automatically? I'm having some trouble doing the conversion line by line.

Here is the code, if someone can help.

This script calculates 3 values, A, Cres and mb, for each (p,q), which is the line index pair.

library(pgirmess)  # provides write.delim()
tab=read.table("test.txt",sep=" ")
# one derived value per row: (V3+V4)/(V1+V2)
tab$value=((tab$V3+tab$V4)/(tab$V2+tab$V1))
final=NULL
rowsnb=nrow(tab)

for(q in seq(1,rowsnb,by=150)) {
    for(p in seq(1,rowsnb,by=150)) {
        A=(rowsnb*(q^2))
        # Cres is overwritten on every pass, so only the i == q term survives
        for(i in 1:q) {
            Cres=(tab[i,]$value)*i
        }
        mb=((rowsnb*A)/Cres)*A
        final <- rbind(final, data.frame(avalue=A,Cvalue=Cres,mbvalue=mb))
    }
}
write.delim(final, file = "final.txt", row.names = TRUE, quote = FALSE, sep = " ")

Thanks for any help.
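A line-by-line translation can be done by hand. The following is a minimal Python sketch of the script above, not an automatic conversion: it assumes each row of test.txt has at least four space-separated numbers, and the names `translate` and `run` and the `step` parameter are hypothetical. Like the R code, it keeps only the last value of Cres from the inner loop.

```python
def translate(rows, step=150):
    """Mirror the R loops; rows is a list of numeric rows (V1..V4 at least).

    Hypothetical helper: assumes >= 4 numeric columns per row.
    """
    # tab$value = (V3 + V4) / (V2 + V1), one value per row
    values = [(r[2] + r[3]) / (r[1] + r[0]) for r in rows]
    rowsnb = len(rows)
    final = []
    for q in range(1, rowsnb + 1, step):          # seq(1, rowsnb, by=step)
        for p in range(1, rowsnb + 1, step):
            A = rowsnb * q ** 2
            for i in range(1, q + 1):             # for(i in 1:q)
                Cres = values[i - 1] * i          # overwritten each pass, as in R
            mb = ((rowsnb * A) / Cres) * A
            final.append((A, Cres, mb))
    return final


def run(infile='test.txt', outfile='final.txt'):
    """Hypothetical driver: read the input file and write numbered result rows."""
    with open(infile) as f:
        rows = [list(map(float, line.split())) for line in f if line.strip()]
    with open(outfile, 'w') as out:
        out.write('avalue Cvalue mbvalue\n')
        for n, (A, Cres, mb) in enumerate(translate(rows), 1):
            out.write('%d %s %s %s\n' % (n, A, Cres, mb))
```

Calling `run()` reads test.txt and writes final.txt; `translate(rows, step=1)` gives the full (p, q) grid for a small in-memory sample.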


TrustyTony 888 pyMod

13 Years Ago

Could you post a sample of the data, say the 20x20 top-left corner? And the formula for the output in normal mathematical form, with one example calculated from that data?

TrustyTony 888 pyMod

13 Years Ago

Are all the output rows the same except for the incrementing index?
Is there no mathematical formula that someone who knows only Python, not R, could understand?

TrustyTony 888 pyMod

13 Years Ago

OK, I don't know about automatic translation, but I experimented interactively to learn what those R lines do, and it is quite simple in Python/NumPy too. Maybe your problem is not the language but the logic: your inner loop is invariant and recomputes the same values again and again. This modified R script gave the same result file, with the inner-loop invariants computed only once, in the outer loop:

library(pgirmess)
tab=read.table("test1.txt",stringsAsFactors=FALSE)

tab$value = log2((tab$V3+tab$V4)/(tab$V2+tab$V1))
re=NULL
rowsnb=nrow(tab)

for(q in seq(1,rowsnb,by=1)){
    A=(rowsnb*(q^2)) + ((rowsnb*(rowsnb+1)*(2*rowsnb+1))/6)
    for(i in 1:q){
         Cres=(tab[i,]$value)*i
    }
    mb=((rowsnb*A)/Cres)*A
    for(p in seq(1,rowsnb,by=1)){
         re<- rbind(re, data.frame(avalue=A,Cvalue=Cres,mbvalue=mb))
    }
}
write.delim(re, file = "final.txt", row.names = TRUE, quote = FALSE, sep = " ")

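I checked that the new result matches the previously saved "final1.txt" with:

# Compare the old and new result files line by line; print any differing pair
with open('final1.txt') as old, open('final.txt') as new:
    for a, b in zip(old, new):
        if a != b:
            print(a)
            print(b)
            print(80*'-')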

It reported no differences between the files.
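The key point of the rewrite above is that `for(i in 1:q)` only keeps its last assignment, so it can be replaced by a single expression. A small Python check of that, using made-up values (the function names are hypothetical):

```python
def cres_by_loop(values, q):
    """Literal translation of the R inner loop: every pass overwrites Cres."""
    for i in range(1, q + 1):
        Cres = values[i - 1] * i
    return Cres


def cres_direct(values, q):
    """Only the last pass (i == q) survives, so compute it directly."""
    return values[q - 1] * q


sample = [0.5, -1.25, 2.0, 3.5]  # made-up per-row values
for q in range(1, len(sample) + 1):
    assert cres_by_loop(sample, q) == cres_direct(sample, q)
```

Since nothing inside the p loop depends on p either, each q produces rowsnb identical rows, which is why the rewritten script computes A, Cres and mb once per q and only repeats the `rbind`.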



weblover 0 Junior Poster · Answer 1 · 2011-05-24T14:21:54+00:00

Hi,

Here is an example of the input:

I will change the code a little so that it produces an output I can show you.

The code will look like this:

library(pgirmess)
tab=read.table("test1.txt",stringsAsFactors=FALSE)

tab$value = log2((tab$V3+tab$V4)/(tab$V2+tab$V1))
re=NULL
rowsnb=nrow(tab)

for(q in seq(1,rowsnb,by=1)) {
    for(p in seq(1,rowsnb,by=1)) {
        A=(rowsnb*(q^2)) + ((rowsnb*(rowsnb+1)*(2*rowsnb+1))/6)
        for(i in 1:q) {
            Cres=(tab[i,]$value)*i
        }
        mb=((rowsnb*A)/Cres)*A
        re <- rbind(re, data.frame(avalue=A,Cvalue=Cres,mbvalue=mb))
    }
}
write.delim(re, file = "final.txt", row.names = TRUE, quote = FALSE, sep = " ")

Please find attached a file containing part of the data I'm working on.

And here is part of the output of the new script:

avalue Cvalue mbvalue
1 2127 0.173331602885562 469818086.513428
2 2127 0.173331602885562 469818086.513428
3 2127 0.173331602885562 469818086.513428
4 2127 0.173331602885562 469818086.513428
5 2127 0.173331602885562 469818086.513428
6 2127 0.173331602885562 469818086.513428
7 2127 0.173331602885562 469818086.513428
8 2127 0.173331602885562 469818086.513428
9 2127 0.173331602885562 469818086.513428
10 2127 0.173331602885562 469818086.513428
11 2127 0.173331602885562 469818086.513428
12 2127 0.173331602885562 469818086.513428
13 2127 0.173331602885562 469818086.513428
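As a sanity check of the first output row: the Python script later in the thread reports a 324-line output, which implies rowsnb = 18 (18 x 18 = 324). Assuming that, q = 1 reproduces the avalue and mbvalue shown above from the printed Cvalue:

```python
rowsnb = 18                      # inferred from the 324-line output (18 * 18)
q = 1
Cres = 0.173331602885562         # Cvalue from the output above

# A = rowsnb*q^2 + rowsnb*(rowsnb+1)*(2*rowsnb+1)/6
A = rowsnb * q**2 + rowsnb * (rowsnb + 1) * (2 * rowsnb + 1) // 6
mb = ((rowsnb * A) / Cres) * A

print(A)            # 2127
print(round(mb, 3))
```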

weblover 0 Junior Poster · Answer 2 · 2011-05-24T15:37:10+00:00

That's expected: the real data set is much larger, so this small sample gives the same result on every row.

weblover 0 Junior Poster · Answer 3 · 2011-05-24T18:17:00+00:00

Thanks, but this will not solve the problem, because each new term of Cres should be added to the previous value of Cres.
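If "added to the previous value" means Cres should be a running sum, Cres_q = sum of value_i * i for i = 1..q (an assumption; the formula is never stated in the thread), then in R the inner loop would need `Cres=0` before it and `Cres=Cres+(tab[i,]$value)*i` inside it. In Python this is a prefix sum, which replaces the O(n^2) nested loop with a single O(n) pass. The function names below are hypothetical:

```python
from itertools import accumulate


def cumulative_cres(values):
    """Running sum of value_i * i for i = 1..q, one entry per q (assumed intent)."""
    return list(accumulate(v * i for i, v in enumerate(values, 1)))


def cumulative_cres_naive(values):
    """Same result with the nested loop, as the R code would compute it."""
    out = []
    for q in range(1, len(values) + 1):
        Cres = 0.0
        for i in range(1, q + 1):
            Cres += values[i - 1] * i      # add each new term to the old value
        out.append(Cres)
    return out
```

Both add the terms in the same left-to-right order, so they return identical floats.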

TrustyTony 888 pyMod Team Colleague Featured Poster · Answer 4 · 2011-05-24T18:32:39+00:00

I cannot code the algorithm because you have not given it. All I found is that line 4 adds the 3rd and 4th values and the 1st and 2nd values of each line, divides the sums, and takes log2 of the result, so it produces one value per row. rowsnb is the number of lines in the file. A large part of the formula for A is also loop-invariant, so your code becomes:

library(pgirmess)
tab=read.table("test1.txt",stringsAsFactors=FALSE)

tab$value = log2((tab$V3+tab$V4)/(tab$V2+tab$V1))
re=NULL
rowsnb=nrow(tab)
row_stuff = ((rowsnb*(rowsnb+1)*(2*rowsnb+1))/6)

for(q in seq(1,rowsnb,by=1)){
    A=(rowsnb*(q^2)) + row_stuff
    for(i in 1:q){
         Cres=(tab[i,]$value)*i
    }
    mb=((rowsnb*A)/Cres)*A
    for(p in seq(1,rowsnb,by=1)){
         re<- rbind(re, data.frame(avalue=A,Cvalue=Cres,mbvalue=mb))
    }
}
write.delim(re, file = "final.txt", row.names = TRUE, quote = FALSE, sep = " ")
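`row_stuff` is the closed form of the sum of squares 1^2 + 2^2 + ... + rowsnb^2, which depends only on rowsnb and so can be computed once outside the loop. A quick Python check (the function name is hypothetical):

```python
def row_stuff(n):
    """Closed form n(n+1)(2n+1)/6; the product is always divisible by 6."""
    return n * (n + 1) * (2 * n + 1) // 6


# compare against the explicit sum of squares for small n
for n in range(0, 50):
    assert row_stuff(n) == sum(i * i for i in range(1, n + 1))
```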

weblover 0 Junior Poster · Answer 5 · 2011-05-24T18:39:17+00:00


I'll give this a try.

TrustyTony 888 pyMod Team Colleague Featured Poster · Answer 6 · 2011-05-25T05:05:20+00:00

This Python script gives the same 324-line output (L:\R\final.txt) as the originally posted R script, apart from slight formatting differences:

from __future__ import print_function
from math import log

# Read every row as floats; keep a list (not a generator) so len() works below
with open('test1.txt') as inp:
    table = [tuple(map(float, line.split())) for line in inp if line.strip()]

# log2((V3 + V4) / (V1 + V2)) for each row, as in the R script
values = [log(sum(row[2:4]) / sum(row[:2]), 2) for row in table]
rowsnb = len(table)
# loop-invariant part of A
row_stuff = rowsnb * (rowsnb + 1) * (2 * rowsnb + 1) // 6
# the R inner loop leaves Cres = value[q] * q (1-based q)
Cres = [q * value for q, value in enumerate(values, 1)]
A = [rowsnb * q**2 + row_stuff for q in range(1, rowsnb + 1)]
mb = [rowsnb * a * a / cres if cres else float('inf') for a, cres in zip(A, Cres)]

# Mode 'w' truncates any old file, so no separate os.remove() is needed
with open('final2.txt', 'w') as outp:
    result = ['%d %.14f %.5f' % row for row in zip(A, Cres, mb)]
    print('avalue Cvalue mbvalue', file=outp)
    # each row is repeated rowsnb times (the p loop), with a running row number
    repeated = (r for r in result for _ in range(rowsnb))
    for rowno, row in enumerate(repeated, 1):
        print('%i %s' % (rowno, row), file=outp)

# Test that the output matches the R output within a tolerance
limit = 1E-5
with open('final2.txt') as new, open('L:/R/final.txt') as old:
    for a, b in zip(new, old):
        if '.' in a and any(abs(float(aval) - float(bval)) > limit
                            for aval, bval in zip(a.split(), b.split())):
            print(a)
            print(b)
            print(80 * '-')
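Since the output has rowsnb^2 lines, it may help to produce them one at a time instead of building one large string, and to test the logic on an in-memory sample before running it on the real file. This is a sketch: the name `result_lines` is hypothetical, and it assumes four numeric columns per line as in the R script. It still reads all input rows first, because rowsnb must be known before A can be computed.

```python
from math import log


def result_lines(lines):
    """Yield the header and numbered rows in the same layout as final2.txt."""
    rows = [tuple(map(float, line.split())) for line in lines if line.strip()]
    rowsnb = len(rows)
    row_stuff = rowsnb * (rowsnb + 1) * (2 * rowsnb + 1) // 6
    yield 'avalue Cvalue mbvalue'
    rowno = 0
    for q, row in enumerate(rows, 1):
        value = log((row[2] + row[3]) / (row[0] + row[1]), 2)
        Cres = q * value                     # last pass of the R inner loop
        A = rowsnb * q ** 2 + row_stuff
        mb = rowsnb * A * A / Cres if Cres else float('inf')
        line = '%d %.14f %.5f' % (A, Cres, mb)
        for _ in range(rowsnb):              # the p loop repeats each row
            rowno += 1
            yield '%i %s' % (rowno, line)
```

For the real file, iterate over `result_lines(open('test1.txt'))` and print each line to the output file.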