Hello. I am quite new to Python, so I have a question. I have a CSV file containing the names of 30 cities and their coordinates (lat and long). I want to generate a distance matrix for these cities. How can I do this?

## All 85 Replies

You can generate a list of lists of distances from each city to every other city with a list comprehension or a for loop. If exactness is not needed, use the Pythagorean formula; otherwise you need to read up on calculating distances on the surface of a sphere with a circumference of 40,000 km.
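
A rough sketch of the first (Pythagorean) option, assuming the coordinates are already in a list of `(name, lat, lon)` tuples. The sample values and the name `flat_matrix` are invented for illustration, and the result is in degrees rather than km, so it is only useful for quick comparisons:

```python
from math import hypot

# Invented sample coordinates (name, lat, lon) in degrees.
cities = [("A", 0.0, 0.0), ("B", 3.0, 4.0), ("C", 6.0, 8.0)]

# Row i holds the planar (Pythagorean) distances from city i to every city.
# This ignores the Earth's curvature and the result is in degrees, not km.
flat_matrix = [
    [hypot(lat1 - lat2, lon1 - lon2) for (_, lat2, lon2) in cities]
    for (_, lat1, lon1) in cities
]
print(flat_matrix)
```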

The formula I need is the second one, but it seems that my file is not quite in the right format:
name1,lat1 long1
name2,lat2 long2
name3,lat3 long3
name4,lat4 long4

I tried using the replace function (to replace the space between lat and long with ','), but some of the city names are compound (Los Angeles), so I could not define distance_matrix. My question is: how can I replace the space between lat and long without changing the compound names?
If someone could give me some advice, I would be grateful.

Use line.rsplit(None, 1), then split the first element of the result on the comma.
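
A minimal sketch of this suggestion, assuming each line looks like `name,lat long`. The helper name `parse_line` and the sample values are made up for illustration:

```python
def parse_line(line):
    """Split 'name,lat long' into three strings."""
    # rsplit(None, 1) splits once, from the right, on whitespace,
    # so spaces inside a compound name are left alone.
    head, second = line.rsplit(None, 1)
    # head is now 'name,lat'; split it on the comma.
    name, first = head.split(",")
    return name, first, second

print(parse_line("Los Angeles,34.05 -118.24"))
```

The values stay strings here; they would still need `float()` before any distance calculation.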

Thank you for the advice. I could print the lines in the shell in this format.

I still have two problems to solve before I can define a distance_matrix.
---The first one is: where should I place the split(',') of the first element of the list, and how can I then write the result to a new file so that I can print it?

---The second one is: I tried to write the lines, but it shows this error:
lines.sort()
ot = open("Ord.csv", "w")
for line in lines:
print line.rsplit(None,1)
ot.write(line.rsplit(None,1))
ot.close()

File "C:/Users/gg/Desktop/city.py", line 6, in <module>
ot.write(line.rsplit(None,1))
TypeError: expected a character buffer object
So how could I fix this second problem?

I am sorry again if any of my questions are silly, but as I said, I am just beginning to learn Python.
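
The TypeError arises because `write()` accepts a string while `rsplit()` returns a list. Here is a small sketch of one way around it, joining the pieces with a comma before writing. `io.StringIO` stands in for the real output file and the sample lines are invented:

```python
import io

lines = ["Los Angeles,34.05 -118.24", "Boston,42.36 -71.06"]
out = io.StringIO()  # stand-in for open("Ord.csv", "w")
for line in sorted(lines):
    parts = line.rsplit(None, 1)        # a list: ['name,lat', 'long']
    out.write(",".join(parts) + "\n")   # join the list into one string
result = out.getvalue()
print(result)
```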

In the end I was able to write the file in the format I wanted, using this code:

``````lines = open("Better.csv").read().split(';')
lines.sort()
t = open("Ord.csv", "w")
for line in lines:
    res = line.rsplit(None,1)
    ts = str(res)
    t.write(ts+'\n')
t.close()
words = open("Ord.csv", "r")
words2 = open("Ord2.csv", "w")
for word in words:
    res = word.rsplit(',')
    ts = str(res)
    words2.write(ts +'\n')
words.close()
words2.close()

f=open("Ord3.csv","w")
s=open("Ord2.csv").read() # assumed: the line that defines s is missing from the post
s=s.replace('[','')
s=s.replace("']","")
s=s.replace("]","")
s=s.replace("'","")
s=s.replace('" ','"')
s=s.replace(' "',' ')
s=s.replace('"','')
f.write(s +'\n')
f.close()``````

Now I have another problem. My format is
name, lat, long\n
name2, lat2, long2\n

How can I remove the \n and write the wanted format to a new file?
name, lat, long
name2, lat2, long2

I tried using strip(), but the \n remained.

use strip('\n')

I tried adding this to my code, but still nothing changed:

``s=s.rstrip('\n')``

What is the problem?
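
One likely explanation, assuming the text in `s` was produced by calling `str()` on lists as in the code above: `str()` of a list shows the repr of each element, so a newline inside an element becomes the two characters backslash and `n`. `rstrip('\n')` only removes real newline characters at the end of the string, so it leaves those alone. A short demonstration with made-up data:

```python
res = "name,lat long\n".rsplit(",")   # last element still ends with a newline
text = str(res)                       # the repr shows a backslash followed by 'n'
print(text)

# rstrip('\n') changes nothing: the text ends with ']' and holds no real newline.
unchanged = text.rstrip("\n")

# Stripping each piece before building the output avoids the problem.
clean = ", ".join(part.strip() for part in res)
print(clean)
```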

If you could attach Better.csv to a post, we could see what your code does...

I tried to upload it, but I got this error:

Better.csv:
Invalid File
Should I change the file format to txt?

My file Better.csv has this format:
name3,long3 lat3;name1,long1 lat1;name2,long2 lat2;

Some of the names are compound. I first sorted the records and wrote them, sorted, to another file. After that I split long and lat using split(None,1), but that gave a format I did not really need. With the code posted before, I could write my new file in a format I can almost use for my matrix;
I say almost because each line has \n.

So should I change the format of the file to be able to attach it?

You can zip the file and attach the zipped file.

Here is the zipped file.

This could be what you want

``````SRC_FILE = "Better.csv"
DST_FILE = "Output.csv"

def gen_triples():
    with open(SRC_FILE) as fin:
        # loop reconstructed (missing from the post): records are separated by ';'
        for record in fin.read().split(";"):
            record = record.strip()
            if not record: # ignore empty record
                continue
            name, coords = record.split(",")
            lat, lon = coords.split()
            yield name, lat, lon

def write_output():
    with open(DST_FILE, "w") as fout:
        for name, lat, lon in sorted(gen_triples()):
            fout.write("{na}, {la}, {lo}\n".format(na = name, la = lat, lo = lon))

if __name__ == "__main__":
    write_output()``````

Try to understand every bit of it :)

It is quite advanced. I am just beginning to learn Python. I will try to understand it, but can my original code be changed in any way to reach the same result?

Yes. The key points are how the records read from the source file are transformed into a tuple (name, lat, lon) containing the 3 values, and then how these 3 values are formatted for the output file. You can write similar code without 'yield', 'with', and even without functions. For example, instead of using 'yield', you could append the tuple to a list.
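
As an illustration of that last remark, here is a sketch of the same transformation with a plain list and no functions. The sample string stands in for the contents of Better.csv, and its values are invented:

```python
# Sample data in the Better.csv layout; in the real script this
# string would come from open("Better.csv").read().
data = "Rome,41.9 12.5;Los Angeles,34.05 -118.24;Boston,42.36 -71.06;"

triples = []                      # plays the role of the generator
for record in data.split(";"):
    record = record.strip()
    if not record:                # skip the empty piece after the last ';'
        continue
    name, coords = record.split(",")
    first, second = coords.split()
    triples.append((name, first, second))

triples.sort()                    # tuples sort by name first
output_lines = []
for name, first, second in triples:
    output_lines.append("{0}, {1}, {2}\n".format(name, first, second))
print("".join(output_lines))
```

Writing `output_lines` to a file with `writelines()` would then give the same kind of output file as the generator version.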

I will try to make my own code work, but meanwhile, coming back to the title of the thread, I now want to generate the distance matrix for my cities.

I tried to use the following code to generate it:

``````from math import sin as sin, cos as cos, acos as acos, radians as radians

print coords_list
ff=open("Te.csv","w")

def distance_matrix(coords_list):
    '''Calculates the distances (in km) between any two cities based on the formulas
    c = sin(lati1)*sin(lati2)+cos(longi1-longi2)*cos(lati1)*cos(lati2)
    Source: http://mathforum.org/library/drmath/view/54680.html
    This function returns the matrix.'''

    matrix={}
    #Populate the matrix.
    for (name2,longi2,lati2) in coords_list:
        for (name1,longi1,lati1) in coords_list:
            if name1!=name2:
                #if name1==name2, then c will be equal to 1, and acos(c) will fail
                matrix[name1,name2] = distance
            else:
                #Case when name1==name2...
                matrix[name1,name2] = 0.0
    return matrix
print matrix
ff.write(matrix)
ff.close()``````

But it only prints the file (Output.csv) in the shell without actually creating the matrix.
So what is wrong with this code?
Again, I am sorry if I ask a silly question, but my knowledge of Python is just beginning.

The block `def distance_matrix(coords_list): ...` defines a function distance_matrix() but does not execute that function. A function is executed when it is called, so replace the last line by

``````distance_matrix(coords_list) # call the function
ff.close()``````

There are other problems. coords_list should be a list of tuples extracted from the csv file and not a string, so you could write

``````coords_list = list()
for line in open("Output.csv"):
    line = line.strip()
    if line:
        name, lat, lon = line.split(",")
        lat = float(lat) # convert string to float value
        lon = float(lon)
        coords_list.append( (name, lat, lon) )``````

Also, ff.write(matrix) will produce awful results. I suggest

``````from pprint import pprint
pprint(matrix, ff)``````

Another problem is that I thought the data were (name, latitude, longitude) and you seem to use (name, longitude, latitude).

The format of the data in Output.csv is name, longitude, latitude. I remember that in some of my first posts I might have used the wrong format. I am sorry.

I tried to repair my code using the advice you gave, but this code

``````from math import sin as sin, cos as cos, acos as acos, radians as radians

ff=open("Te.csv","w")

coords_list = list()
for line in open("Output.csv"):
    line = line.strip()
    if line:
        name, lon, lat = line.split(",")
        lon = float(lon) # convert string to float value
        lat = float(lat)
        coords_list.append( (name, lon, lat) )
        print line

def distance_matrix(coords_list):
    '''Calculates the distances (in km) between any two cities based on the formulas
    c = sin(lati1)*sin(lat2)+cos(lon1-lon2)*cos(lat1)*cos(lati2)
    Source: http://mathforum.org/library/drmath/view/54680.html
    This function returns the matrix.'''

    matrix={}
    #Populate the matrix.
    for (name2,lon2,lat2) in coords_list:
        for (name1,lon1,lat1) in coords_list:
            if name1!=name2:
                #if name1==name2, then c will be equal to 1, and acos(c) will fail
                matrix[name1,name2] = distance
            else:
                #Case when name1==name2...
                matrix[name1,name2] = 0.0
distance_matrix(coords_list) # call the function
from pprint import pprint
pprint(matrix, ff)
ff.close()``````

gives this error:
Traceback (most recent call last):
File "C:\Users\gg\Desktop\t.py", line 40, in <module>
pprint(matrix, ff)
NameError: name 'matrix' is not defined

So if you could tell me what is wrong, I would be grateful. I am just beginning to learn Python and these are new things for me.

The problem is that matrix is a local variable of the function distance_matrix. It does not exist outside the function. The solution is that your function returns the matrix. Here is the code. It generates a 23 MB matrix file

``````from math import sin, cos, acos, radians

EARTH_RADIUS_KM = 6371.0 # assumed mean Earth radius; the value used in the original post was lost

ff=open("Te.csv","w")

coords_list = list()
for line in open("Output.csv"):
    line = line.strip()
    if line:
        name, lon, lat = line.split(",")
        lon = float(lon) # convert string to float value
        lat = float(lat)
        coords_list.append( (name, lon, lat) )

def distance_matrix(coords_list):
    '''Calculates the distances (in km) between any two cities based on the formulas
    c = sin(lat1)*sin(lat2)+cos(lon1-lon2)*cos(lat1)*cos(lat2)
    Source: http://mathforum.org/library/drmath/view/54680.html
    This function returns the matrix.'''

    matrix={}
    #Populate the matrix.
    for (name2,lon2,lat2) in coords_list:
        for (name1,lon1,lat1) in coords_list:
            if name1!=name2:
                #if name1==name2, c should equal 1, but rounding can push it above 1 and acos(c) would fail
                # the next lines are reconstructed from the docstring formula (missing from the post)
                c = (sin(radians(lat1))*sin(radians(lat2)) +
                     cos(radians(lon1-lon2))*cos(radians(lat1))*cos(radians(lat2)))
                c = max(-1.0, min(1.0, c)) # keep c inside the domain of acos
                distance = EARTH_RADIUS_KM*acos(c)
                matrix[name1,name2] = distance
            else:
                #Case when name1==name2...
                matrix[name1,name2] = 0.0
    return matrix

matrix = distance_matrix(coords_list) # call the function and catch return value
from pprint import pprint
pprint(matrix, ff)
ff.close()``````

The format of the generated file could be improved. Our output is not a csv format.
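
For testing the formula on its own, the docstring expression can be sketched as a small standalone function. The radius constant and the function name `great_circle_km` are assumptions for illustration, not values taken from the thread:

```python
from math import sin, cos, acos, radians

EARTH_RADIUS_KM = 6371.0  # assumed mean radius, not taken from the thread

def great_circle_km(lat1, lon1, lat2, lon2):
    """Distance in km between two points given in degrees."""
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    c = sin(lat1) * sin(lat2) + cos(lat1) * cos(lat2) * cos(lon1 - lon2)
    # Rounding can push c slightly outside [-1, 1]; clamp before acos.
    c = max(-1.0, min(1.0, c))
    return EARTH_RADIUS_KM * acos(c)

print(great_circle_km(0.0, 0.0, 0.0, 180.0))
```

Two points on the equator 180 degrees apart should come out as half the circumference, which is a quick sanity check for the argument order.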

The matrix I wanted to print should have looked like this:

``````         name1    name2    name3
name1    0        distA    distB
name2    distC    0        distD
name3    distE    distF    0``````

I want to print it in the shell like this and write it to a file too.

There are 733 towns, some with very long names, while a typical shell line has 80 characters. I think your format is not realistic :)

``````         name1    name2    name3
name1    0        distA    distB
name2    distC    0        distD
name3    distE    distF    0``````

But is there any possibility to write it in this format to another file?
And if it is possible, what should the format of the file be?

Yes, it is possible to write this format to a file. Notice that the width of each column will be determined by the length of the town's name.

Start by writing only the first 2 lines to see if it is feasible. You must learn a bit of string formatting; read a few posts of this code snippet: http://www.daniweb.com/code/snippet232375.html .

The file should be a .txt, or maybe even a .csv if you add commas.
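
A minimal sketch of the string formatting involved, assuming the matrix is a dict keyed by `(name1, name2)` as in the code above. The three cities' distances below are invented, and `format_table` is a hypothetical helper name:

```python
# Hypothetical three-city matrix in the same dict layout as above.
names = ["Aiud", "Alba Iulia", "Albac"]
matrix = {
    ("Aiud", "Aiud"): 0.0, ("Aiud", "Alba Iulia"): 25.3, ("Aiud", "Albac"): 60.1,
    ("Alba Iulia", "Aiud"): 25.3, ("Alba Iulia", "Alba Iulia"): 0.0,
    ("Alba Iulia", "Albac"): 58.7,
    ("Albac", "Aiud"): 60.1, ("Albac", "Alba Iulia"): 58.7, ("Albac", "Albac"): 0.0,
}

width = max(len(n) for n in names) + 2   # one column width for every cell

def format_table(names, matrix, width):
    rows = []
    # Header: an empty corner cell, then the city names.
    header = " " * width + "".join(n.ljust(width) for n in names)
    rows.append(header.rstrip())
    for a in names:
        cells = "".join("{0:.1f}".format(matrix[a, b]).ljust(width) for b in names)
        rows.append((a.ljust(width) + cells).rstrip())
    return "\n".join(rows) + "\n"

table = format_table(names, matrix, width)
print(table)
```

The same string can be passed to a file's `write()` method. With hundreds of towns the lines become very long, so this layout is mainly practical for small samples.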

I will read about string formatting, but how could I alter the code so that, even for just those 3 cities, it writes lines in this format? Should I define the matrix in another way?

``````         name1    name2    name3
name1    0        distA    distB
name2    distC    0        distD
name3    distE    distF    0``````

I think you should first write a csv file in the format

``````name1, name1, 0
name1, name2, dist
name1, name3, dist``````

This can be done by replacing pprint(matrix, ff) with a few lines of code (use the list `sorted(matrix.items())`). Once you have this csv file, your code could easily read the data in this file to produce the second format.
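
Those few lines could look roughly like the sketch below. The two-city matrix is invented, and `io.StringIO` stands in for the real Te.csv file:

```python
import io

# Hypothetical two-city matrix in the dict layout built by distance_matrix().
matrix = {
    ("Aiud", "Aiud"): 0.0,
    ("Aiud", "Albac"): 60.1,
    ("Albac", "Aiud"): 60.1,
    ("Albac", "Albac"): 0.0,
}

out = io.StringIO()  # stand-in for open("Te.csv", "w")
# Each item is ((name1, name2), distance); sorting orders by name1, then name2.
for (name1, name2), dist in sorted(matrix.items()):
    out.write("{0}, {1}, {2}\n".format(name1, name2, dist))
csv_text = out.getvalue()
print(csv_text)
```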

So if I understood it right:
I use just 3 cities in Output.csv,
run the code,
then use the replace function to get my file (Te.csv) into the format

``````name1, name1, 0
name1, name2, dist
name1, name3, dist``````

After that, I did not really understand what I have to do.

The replace() function is not the appropriate tool. You must learn how to transform data read from a file into data usable by your program. Write the following functions:

• A function get_entries(matrix) which takes our matrix and returns an ordered list of Python tuples `("name1", "name2", distance)`. These tuples are very easy for Python to manipulate. The distance should be a floating point number.
• A function entry_to_line(entry) which takes a tuple as above and returns a string `"name1, name2, distance\n"`.
• A function line_to_entry(line) which takes such a string as argument and returns a Python tuple ("name1", "name2", distance). Here again, the distance should be a float.

Use these functions to create a Te.csv with the above format instead of what we have written before.

I tried to write the first function like this:

``````TEST_FILE = "T3.csv"
def get_entries(matrix):
    with open(TEST_FILE,"w") as fil:
        for name1, name2, distance in sorted(distance_matrix(coords_list)):
            fil.write("{na1}, {na2}, {dist}\n".format(na1 = name1, na2 = name2, dist = distance))

if __name__ == "__main__":
    get_entries(matrix)``````

where T3 is the file generated after I run the matrix code with just 3 cities.

With this code I get an error like this:
for name1, name2, distance in sorted(distance_matrix(coords_list)):
ValueError: need more than 2 values to unpack

You don't understand: functions must have precise parameters and return values. For get_entries(), you don't need a file. First look at the content of the matrix (if you are using Python 3, replace iteritems() with items())

``````def print_one_item(matrix):
    """Print one item from the matrix generated by distance_matrix()
    This function prints: (('Mestecanis', 'Recea'), 396.19161575474294)"""
    for item in matrix.iteritems():
        print( repr(item) )
        return # exit the loop``````

Running this function prints `(('Mestecanis', 'Recea'), 396.19161575474294)` . You see that the matrix items are a pair (a tuple of length 2) containing a pair of cities and a number. To write get_entries() we only need to transform these tuples into triples

``````def get_entries(matrix):
    """Take a distance matrix and returns a ordered list of tuples (city, city, distance)"""
    result = list()
    for item in matrix.iteritems():
        key, value = item # key is like ('Mestecanis', 'Recea'), value like 396.19161575474294
        cityA, cityB = key # cityA is a string like 'Mestecanis', and cityB 'Recea'
        entry = (cityA, cityB, value) # a triple like ('Mestecanis', 'Recea', 396.19161575474294)
        result.append(entry)
    result.sort()
    return result # returns the sorted list of triples

if __name__ == "__main__":
    matrix = distance_matrix(coords_list) # call the function
    entries =  get_entries(matrix)
    print(entries[:10]) # print the first 10 entries``````

This code prints

``[('Acatari', 'Acatari', 0.0), ('Acatari', 'Acis', 183.19842862166209), ('Acatari', 'Adamclisi', 372.52641231771526), ('Acatari', 'Adjud', 200.36162156879055), ('Acatari', 'Afumati', 251.49065927408915), ('Acatari', 'Agas', 121.63622537704428), ('Acatari', 'Agigea', 409.27692015889204), ('Acatari', 'Aiud', 72.639681086080628), ('Acatari', 'Alba Iulia', 92.832203609566207), ('Acatari', 'Albac', 127.94211546456722)]``

Now, your turn: save the following program as a separate file

``````def entry_to_line(entry):
    pass # your code here

def line_to_entry(line):
    pass # your code here

if __name__ ==  "__main__":
    entry = ('Acatari', 'Aiud', 72.639)
    line = entry_to_line(entry)
    assert line == "Acatari, Aiud, 72.639\n"
    entry2 = line_to_entry(line)
    assert entry2 == entry``````

Complete the functions with your code until it runs without errors. You don't need a file or a matrix.
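
For readers who want to check their attempt, here is one possible completion. It is only a sketch and assumes that city names never contain a comma:

```python
def entry_to_line(entry):
    """Format a (city, city, distance) tuple as one text line."""
    city_a, city_b, distance = entry
    return "{0}, {1}, {2}\n".format(city_a, city_b, distance)

def line_to_entry(line):
    """Parse a line written by entry_to_line() back into a tuple."""
    city_a, city_b, distance = line.strip().split(",")
    # strip() removes the space that follows each comma
    return (city_a.strip(), city_b.strip(), float(distance))

entry = ("Acatari", "Alba Iulia", 92.832)
line = entry_to_line(entry)
print(repr(line))
print(line_to_entry(line) == entry)
```

Splitting on the comma rather than on whitespace keeps compound names such as 'Alba Iulia' intact.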

Indeed, I have a lot to learn about Python, but I have a question: why is the replace function not something we could use?

I tried using this code

``````f=open("Te2.csv","w")
s=s.replace('{','')
s=s.replace('}','')
s=s.replace('(','')
s=s.replace(')','')
s=s.replace("'","")
s=s.replace(":",",")
f.write(s)
f.close()

w = open("Te2.csv",'w')
w.writelines([item for item in lines[:10]])
w.close()``````

and I got the same result written in the new file Te2

``````Acatari, Acatari, 0.0,
Acatari, Acis, 183.1984286216621,