
Generating a distance matrix

Hello. I am quite new to Python, so I have a question. I have a CSV file containing the names of 30 cities and their coordinates (lat and long). I want to generate a distance matrix for these cities. How can I do this?

toritza
Junior Poster in Training
51 posts since Feb 2011
Reputation Points: 10
Solved Threads: 0
 

You can generate a list of lists of distances from each city to every other city with a list comprehension or a for loop, using the Pythagorean formula if exactness is not needed. Otherwise, you need to look up how to calculate distances on the surface of a sphere with a 40000 km circumference.
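
As a rough illustration of the first option, here is a minimal sketch in Python 3 syntax. The city names and coordinates are made up, and `flat_distance` is a hypothetical helper that treats a small patch of the globe as flat, so it is only reasonable for points that are close together.

```python
from math import radians, cos, sqrt, pi

# Radius derived from the 40000 km circumference mentioned above.
EARTH_RADIUS_KM = 40000.0 / (2 * pi)

def flat_distance(lat1, lon1, lat2, lon2):
    """Pythagorean approximation; degrees in, kilometres out."""
    # Shrink the east-west difference by the cosine of the mean latitude.
    x = radians(lon2 - lon1) * cos(radians((lat1 + lat2) / 2.0))
    y = radians(lat2 - lat1)
    return EARTH_RADIUS_KM * sqrt(x * x + y * y)

# Hypothetical sample data: (name, lat, lon)
cities = [("A", 48.0, 2.0), ("B", 48.5, 2.5), ("C", 49.0, 3.0)]

# List of lists: one row per city, one column per city.
matrix = [
    [flat_distance(lat1, lon1, lat2, lon2) for (_, lat2, lon2) in cities]
    for (_, lat1, lon1) in cities
]
```

Here `matrix[i][j]` holds the approximate distance between city `i` and city `j`, with zeros on the diagonal.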

pyTony
pyMod
Moderator
5,359 posts since Apr 2010
Reputation Points: 782
Solved Threads: 852
 

The formula I need is the second one, but it seems that my file is not quite in the right format:
name1,lat1 long1
name2,lat2 long2
name3,lat3 long3
name4,lat4 long4

I tried using the replace function (I tried to replace the space between lat and long with ','), but some of the city names are compound (Los Angeles), so I could not get it to work when trying to define distance_matrix. My question is: how can I replace the space between lat and long without changing the compound names?
If someone could give me some advice, I would be grateful.

toritza
 

Use line.rsplit(None, 1), then split the first element of the result on the comma.
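
A small sketch of that two-step split, in Python 3 syntax. The sample line and the function name `parse_line` are invented for illustration; the point is that splitting from the right leaves the space inside a compound name untouched.

```python
def parse_line(line):
    """Split 'name,first second' into [name, first, second]."""
    # Split once from the right on whitespace: the last coordinate comes off,
    # while spaces inside the city name are left alone.
    head, last = line.rsplit(None, 1)
    # The remaining part has exactly one comma, between name and first coordinate.
    name, first = head.split(",")
    return [name, first, last]

record = parse_line("Los Angeles,34.05 -118.24")  # hypothetical sample line
```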

pyTony
 



Thank you for the advice. I could print it in the shell in this format:
['name1,lat1','long1']
['name2,lat2','long2']
['name3,lat3','long3']

I still have 2 problems to solve before I can define a distance_matrix.
---The first one is: where should I place the split(',') of the first element of the list, and how could I then write to a new file so that I get ['name1','lat1','long1']?

---The second one is: I tried to write the lines with this code, but it shows an error:
lines = open("Better.csv").read().split(';')
lines.sort()
ot = open("Ord.csv", "w")
for line in lines:
    print line.rsplit(None,1)
    ot.write(line.rsplit(None,1))
ot.close()

  File "C:/Users/gg/Desktop/city.py", line 6, in <module>
    ot.write(line.rsplit(None,1))
TypeError: expected a character buffer object
So how could I fix this second problem?

I am sorry again if any of my questions are silly, but as I said, I am just beginning to learn Python.
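
The TypeError comes from passing a list to write(), which accepts only a string. One way around it, sketched here in Python 3 syntax, is to join the pieces into a single string first. The `io.StringIO` object is only a stand-in for the real output file so that the example runs without touching the disk.

```python
import io

parts = ["name1,lat1", "long1"]   # what rsplit(None, 1) returned for one line

out = io.StringIO()               # stand-in for open("Ord.csv", "w")
# write() needs a string, so join the list items with a comma first.
out.write(",".join(parts) + "\n")

result = out.getvalue()
```

Because the first element already contains a comma, joining with "," gives the three fields in one comma-separated line.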

toritza
 

In the end I was able to write the file in the format I wanted, using this code:

lines = open("Better.csv").read().split(';')
lines.sort()
t = open("Ord.csv", "w")
for line in lines:
    res = line.rsplit(None,1)
    ts = str(res)
    t.write(ts+'\n')
t.close()
words = open("Ord.csv", "r")
words2 = open("Ord2.csv", "w")
for word in words:
    res = word.rsplit(',')
    ts = str(res)
    words2.write(ts +'\n')
words.close()
words2.close()

f=open("Ord3.csv","w")
s=open("Ord2.csv").read()
s=s.replace('[','')
s=s.replace("']","")
s=s.replace("]","")
s=s.replace("'","")
s=s.replace('" ','"')
s=s.replace(' "',' ')
s=s.replace('"','')
f.write(s +'\n')
f.close()


Now I have another problem, though. My format is
name, lat, long\n
name2, lat2, long2\n

How could I remove the \n and write the wanted format to a new file?
name, lat, long
name2, lat2, long2

I tried using strip(), but the \n remained there.

toritza
 

use strip('\n')

zizuno
Junior Poster in Training
62 posts since Jan 2011
Reputation Points: 10
Solved Threads: 8
 


I tried to add this to my code, but still nothing:

s=s.rstrip('\n')

What is the problem?
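
A possible explanation, assuming the file was produced with the str(res) approach from the earlier code: str() applied to a list writes each item the way it would appear in source code, so a real newline inside an item turns into the two ordinary characters backslash and n. strip('\n') only removes real newline characters, and only at the ends of the string. A short Python 3 sketch of the effect and of one way to avoid it:

```python
line = "name,lat,long\n"            # a line as read from a file

as_text = str(line.split(","))      # list converted to text via str()
# as_text now contains a backslash followed by 'n', not a real newline
has_literal_backslash_n = "\\n" in as_text

unchanged = as_text.rstrip("\n")    # nothing to strip: no real newline at the end

# Stripping each field before joining avoids the problem altogether.
clean = ", ".join(field.strip() for field in line.split(","))
```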

toritza
 



If you could attach Better.csv to a post, we could see what your code does...

Gribouillis
Posting Maven
Moderator
2,786 posts since Jul 2008
Reputation Points: 1,044
Solved Threads: 691
 

I tried to upload it, but I got this error:

Better.csv:
Invalid File
Should I change the format of the file to txt?

My file Better.csv has this format:
name3,long3 lat3;name1,long1 lat1;name2,long2 lat2;

Some of the names are compound. I first tried to sort them and write them sorted to another file. After that I split long and lat using split(None,1), but it gave a format I did not really need. Using the code posted before, I could write my new file in a format I could almost use for my matrix;
I say almost because each line has \n.

So should I change the format of the file to be able to attach it?

toritza
 



You can zip the file and attach the zipped file.

Gribouillis
 



Here is the zipped file.

Attachments Better.zip (14.09KB)
toritza
 

This could be what you want

SRC_FILE = "Better.csv"
DST_FILE = "Output.csv"

def gen_triples():
    with open(SRC_FILE) as fin:
        for record in fin.read().split(';'):
            if not record: # ignore empty record
                continue
            name, coords = record.split(",")
            lat, lon = coords.split()
            yield name, lat, lon

def write_output():
    with open(DST_FILE, "w") as fout:
        for name, lat, lon in sorted(gen_triples()):
            fout.write("{na}, {la}, {lo}\n".format(na = name, la = lat, lo = lon))
            
if __name__ == "__main__":
    write_output()

Try to understand every bit of it :)

Gribouillis
 


It is quite advanced. I am just beginning to learn Python. I will try to understand it, but can my original code be changed in any way to reach the same result?

toritza
 


Yes, the key points are the way the records read from the source file are transformed into a tuple (name, lat, lon) containing the 3 values, and then how these 3 values are formatted for the output file. You can write similar code without 'yield', 'with', and even without functions. For example, instead of using 'yield', you could append the tuple to a list.
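
A sketch of that list-based variant, in Python 3 syntax. It works on a string rather than a file so that it can be tried directly in the shell; the sample data and the name `parse_records` are made up for illustration.

```python
def parse_records(text):
    """Turn 'name,a b;name2,c d;' into a sorted list of (name, a, b) tuples."""
    triples = []
    for record in text.split(";"):
        record = record.strip()
        if not record:               # ignore empty records
            continue
        name, coords = record.split(",")
        first, second = coords.split()
        triples.append((name, first, second))   # append instead of yield
    triples.sort()
    return triples

sample = "Paris,2.35 48.85;Los Angeles,-118.24 34.05;\n"   # hypothetical data
lines = ["{0}, {1}, {2}".format(*triple) for triple in parse_records(sample)]
```

Each entry of `lines` can then be written to the output file followed by a newline.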

Gribouillis
 

I will try to make my own code work, but meanwhile, coming back to the title of the thread, I now want to generate the distance matrix for my cities.

I tried to use the following code to generate it:

from math import sin as sin, cos as cos, acos as acos, radians as radians

coords_list=open("Output.csv").read()
print coords_list
ff=open("Te.csv","w")
        
def distance_matrix(coords_list):
    '''Calculates the distances (in km) between any two cities based on the formulas
    c = sin(lati1)*sin(lati2)+cos(longi1-longi2)*cos(lati1)*cos(lati2)
    d = EARTH_RADIUS*Arccos(c)
    where EARTH_RADIUS is in km and the angles are in radians.
    Source: http://mathforum.org/library/drmath/view/54680.html
    This function returns the matrix.'''
    
    matrix={}
    EARTH_RADIUS = 6378.1
    #Populate the matrix.
    for (name2,longi2,lati2) in coords_list:
        for (name1,longi1,lati1) in coords_list:
            if name1!=name2:
                #if name1==name2, then c will be equal to 1, and acos(c) will fail
                c = sin(radians(lati1)) * sin(radians(lati2)) + \
                    cos(radians(longi1-longi2)) * \
                    cos(radians(lati1)) * cos(radians(lati2))
                distance = EARTH_RADIUS * acos(c)
                matrix[name1,name2] = distance
            else:
                #Case when name1==name2...                                               
                matrix[name1,name2] = 0.0
    return matrix
    print matrix
    ff.write(matrix)
ff.close()


But it only prints the file (Output.csv) in the shell without actually creating the matrix.
So what is wrong with this code?
Again, I am sorry if I ask some silly questions, but I am just beginning with Python.

toritza
 



The block def distance_matrix(coords_list): ... defines a function distance_matrix() but does not execute that function. A function is executed when it is called, so replace the last line by

distance_matrix(coords_list) # call the function
ff.close()

There are other problems. The coords_list should be a list of tuples extracted from the CSV file and not a string, so you could write

coords_list = list()
for line in open("Output.csv"):
    line = line.strip()
    if line:
        name, lat, lon = line.split(",")
        lat = float(lat) # convert string to float value
        lon = float(lon)
        coords_list.append( (name, lat, lon) )

Also, ff.write(matrix) will not work, because write() expects a string and matrix is a dict. I suggest

from pprint import pprint
pprint(matrix, ff)

Another problem is that I thought the data were (name, latitude, longitude) and you seem to use (name, longitude, latitude).

Gribouillis
 

The format of the data in Output.csv is name, longitude, latitude. I remember that in some of my first posts I might have used the wrong format. I am sorry.

I tried to repair my code using the advice you gave, but this code

from math import sin as sin, cos as cos, acos as acos, radians as radians

ff=open("Te.csv","w")

coords_list = list()
for line in open("Output.csv"):
    line = line.strip()
    if line:
        name, lon, lat = line.split(",")
        lon = float(lon) # convert string to float value
        lat = float(lat)
        coords_list.append( (name, lon, lat) )
    print line
        
def distance_matrix(coords_list):
    '''Calculates the distances (in km) between any two cities based on the formulas
    c = sin(lat1)*sin(lat2)+cos(lon1-lon2)*cos(lat1)*cos(lat2)
    d = EARTH_RADIUS*Arccos(c)
    where EARTH_RADIUS is in km and the angles are in radians.
    Source: http://mathforum.org/library/drmath/view/54680.html
    This function returns the matrix.'''
    
    matrix={}
    EARTH_RADIUS = 6378.1
    #Populate the matrix.
    for (name2,lon2,lat2) in coords_list:
        for (name1,lon1,lat1) in coords_list:
            if name1!=name2:
                #if name1==name2, then c will be equal to 1, and acos(c) will fail
                c = sin(radians(lat1)) * sin(radians(lat2)) + \
                    cos(radians(lon1-lon2)) * \
                    cos(radians(lat1)) * cos(radians(lat2))
                distance = EARTH_RADIUS * acos(c)
                matrix[name1,name2] = distance
            else:
                #Case when name1==name2...                                               
                matrix[name1,name2] = 0.0
distance_matrix(coords_list) # call the function
from pprint import pprint
pprint(matrix, ff)
ff.close()

gives this error:
Traceback (most recent call last):
  File "C:\Users\gg\Desktop\t.py", line 40, in <module>
    pprint(matrix, ff)
NameError: name 'matrix' is not defined

If you could tell me what is wrong, I would be grateful. I am just beginning to learn Python and these are new things for me.

toritza
 



The problem is that matrix is a local variable of the function distance_matrix. It does not exist outside the function. The solution is for your function to return the matrix. Here is the code. It generates a 23 MB matrix file:

from math import sin, cos, acos, radians

ff=open("Te.csv","w")

coords_list = list()
for line in open("Output.csv"):
    line = line.strip()
    if line:
        name, lon, lat = line.split(",")
        lon = float(lon) # convert string to float value
        lat = float(lat)
        coords_list.append( (name, lon, lat) )
        
def distance_matrix(coords_list):
    '''Calculates the distances (in km) between any two cities based on the formulas
    c = sin(lat1)*sin(lat2)+cos(lon1-lon2)*cos(lat1)*cos(lat2)
    d = EARTH_RADIUS*Arccos(c)
    where EARTH_RADIUS is in km and the angles are in radians.
    Source: http://mathforum.org/library/drmath/view/54680.html
    This function returns the matrix.'''
    
    matrix={}
    EARTH_RADIUS = 6378.1
    #Populate the matrix.
    for (name2,lon2,lat2) in coords_list:
        for (name1,lon1,lat1) in coords_list:
            if name1!=name2:
                #if name1==name2, then c will be equal to 1, and acos(c) will fail
                c = sin(radians(lat1)) * sin(radians(lat2)) + \
                    cos(radians(lon1-lon2)) * \
                    cos(radians(lat1)) * cos(radians(lat2))
                distance = EARTH_RADIUS * acos(c)
                matrix[name1,name2] = distance
            else:
                #Case when name1==name2...                                               
                matrix[name1,name2] = 0.0
    return matrix

matrix = distance_matrix(coords_list) # call the function and catch return value
from pprint import pprint
pprint(matrix, ff)
ff.close()
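
One caveat about the formula, offered as a side note rather than as part of the code above: floating-point rounding can push c very slightly outside the range [-1, 1], for example when two differently named entries share the same coordinates, and acos() then raises a ValueError. A minimal sketch of a guard in Python 3 syntax, where `great_circle_km` is a hypothetical helper name and the argument order follows the (lon, lat) convention used in the thread:

```python
from math import sin, cos, acos, radians, pi

EARTH_RADIUS = 6378.1  # km, the same value as in the code above

def great_circle_km(lon1, lat1, lon2, lat2):
    """Spherical law of cosines with c clamped to the domain of acos."""
    c = (sin(radians(lat1)) * sin(radians(lat2))
         + cos(radians(lon1 - lon2)) * cos(radians(lat1)) * cos(radians(lat2)))
    c = max(-1.0, min(1.0, c))   # guard against rounding errors
    return EARTH_RADIUS * acos(c)

same_point = great_circle_km(10.0, 45.0, 10.0, 45.0)
quarter_turn = great_circle_km(0.0, 0.0, 90.0, 0.0)   # a quarter of the equator
```

With the clamp in place, the special case for name1 == name2 is no longer needed to avoid the error, although keeping it does no harm.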

The format of the generated file could be improved. Our output is not in CSV format.
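
If real CSV output is wanted, one option is the standard csv module. The following is only a sketch in Python 3 syntax: `write_matrix_csv` is a hypothetical helper, the sample distances are invented, and the dictionary is assumed to be keyed by (name1, name2) pairs as in the code above. An `io.StringIO` object stands in for the output file.

```python
import csv
import io

def write_matrix_csv(matrix, names, stream):
    """Write a header row of names, then one row of distances per city."""
    writer = csv.writer(stream, lineterminator="\n")
    writer.writerow([""] + names)            # empty top-left corner cell
    for row_name in names:
        cells = ["%.1f" % matrix[row_name, col_name] for col_name in names]
        writer.writerow([row_name] + cells)

# Hypothetical two-city matrix
sample = {("A", "A"): 0.0, ("A", "B"): 12.5, ("B", "A"): 12.5, ("B", "B"): 0.0}
out = io.StringIO()                          # stand-in for open("Te.csv", "w")
write_matrix_csv(sample, ["A", "B"], out)
csv_text = out.getvalue()
```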

Gribouillis
 


The matrix I wanted to print should have looked like this:

         name1    name2    name3
name1    0        distA    distB
name2    distC    0        distD
name3    distF    distV    0

I want to print it in the shell like this and write it to a file too.
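
A possible way to get that layout, sketched in Python 3 syntax under the assumption that the matrix is a dictionary keyed by (name1, name2) pairs as in the code earlier in the thread. `format_table` and the sample values are invented for illustration; the column width of 12 characters is arbitrary and would need to be larger than the longest city name.

```python
def format_table(matrix, names, width=12):
    """Return the matrix as one string with fixed-width, aligned columns."""
    header = " " * width + "".join(name.rjust(width) for name in names)
    rows = [header]
    for row_name in names:
        cells = "".join(("%.1f" % matrix[row_name, col_name]).rjust(width)
                        for col_name in names)
        rows.append(row_name.ljust(width) + cells)
    return "\n".join(rows)

# Hypothetical two-city matrix
sample = {("A", "A"): 0.0, ("A", "B"): 12.5, ("B", "A"): 12.5, ("B", "B"): 0.0}
table = format_table(sample, ["A", "B"])
print(table)   # the same string can be written to a file with write(table + "\n")
```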

toritza
 

This question has already been solved
