Hello. I am quite new to Python, so I have a question. I have a CSV file containing the names of 30 cities and their coordinates (lat and long). I want to generate a distance matrix for these cities. How can I do this?
You can generate a list of lists holding the distance from each city to every other city, using a list comprehension or a for loop. If exactness is not needed, the Pythagorean formula is enough; otherwise you need to look into calculating distances on the surface of a sphere with a circumference of 40000 km.
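To make the two options concrete, here is a rough sketch of both. The function names and constants are made up for illustration, and the cos() scaling of the longitude difference in the flat version is my own addition to keep east-west distances sensible away from the equator. Coordinates are assumed to be in degrees.

```python
from math import sin, cos, acos, sqrt, radians, pi

CIRCUMFERENCE_KM = 40000.0                 # the "ball of 40000 km" from the post
RADIUS_KM = CIRCUMFERENCE_KM / (2 * pi)    # about 6366 km
KM_PER_DEGREE = CIRCUMFERENCE_KM / 360.0   # about 111 km per degree


def flat_distance(lat1, lon1, lat2, lon2):
    """Pythagoras on a locally flat map; only reasonable for nearby points."""
    dy = (lat2 - lat1) * KM_PER_DEGREE
    # Assumption: shrink the east-west step by cos(mean latitude).
    dx = (lon2 - lon1) * KM_PER_DEGREE * cos(radians((lat1 + lat2) / 2.0))
    return sqrt(dx * dx + dy * dy)


def sphere_distance(lat1, lon1, lat2, lon2):
    """Great-circle distance using the spherical law of cosines."""
    c = (sin(radians(lat1)) * sin(radians(lat2)) +
         cos(radians(lat1)) * cos(radians(lat2)) * cos(radians(lon2 - lon1)))
    c = max(-1.0, min(1.0, c))  # rounding can push c slightly outside [-1, 1]
    return RADIUS_KM * acos(c)
```

For points that are close together the two functions should agree closely; over long distances only the second one is meaningful.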
The formula I need is the second one, but it seems that my file is not quite in the right format:
name1,lat1 long1
name2,lat2 long2
name3,lat3 long3
name4,lat4 long4
I tried using the replace function (to replace the space between lat and long with ','), but some of the city names are compound (Los Angeles), so I could not get it to work when trying to define distance_matrix. My question is: how can I replace the space between lat and long without changing the compound names?
If someone could give me some advice I would be grateful.
Use line.rsplit(None, 1), then split the first element of the result on the comma.
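A small sketch of that suggestion, in case it helps. The function name parse_line and the sample coordinates are made up, and it assumes the city name itself contains no comma.

```python
def parse_line(line):
    # Split once, from the right, on whitespace. Spaces inside the
    # city name are left alone because only the last gap is used.
    head, second = line.rsplit(None, 1)
    # head is now "name,first_coordinate"; split it on the comma.
    name, first = head.split(",")
    return [name, first, second]


print(parse_line("Los Angeles,34.05 -118.24"))
```

The result is a list of three strings, with the compound name kept in one piece.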
Thank you for the advice. I can now print to the shell in this format:
['name1,lat1','long1']
['name2,lat2','long2']
['name3,lat3','long3']
I still have 2 problems to solve before I can define a distance_matrix.
---The first one is: where should I place the split(',') of the first element of the list, and how can I then write to a new file so that I get ['name1','lat1','long1']?
---The second one is: I have tried to write the lines, but this code gives an error:
lines = open("Better.csv").read().split(';')
lines.sort()
ot = open("Ord.csv", "w")
for line in lines:
    print line.rsplit(None,1)
    ot.write(line.rsplit(None,1))
ot.close()
  File "C:/Users/gg/Desktop/city.py", line 6, in <module>
    ot.write(line.rsplit(None,1))
TypeError: expected a character buffer object
So how could I fix this second problem?
Again, I am sorry if any of my questions are silly, but as I said, I am just beginning to learn Python.
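About the TypeError: write() wants a string, while rsplit() returns a list. One way around it, sketched below, is to join the pieces back into a single string before writing. The helper name format_record and the sample records are made up for illustration.

```python
def format_record(line):
    parts = line.rsplit(None, 1)      # a list, e.g. ["name1,lat1", "long1"]
    return ",".join(parts) + "\n"     # one string, which write() accepts


# Made-up records standing in for the contents of Better.csv.
records = ["name2,lat2 long2", "Los Angeles,lat1 long1"]
output_lines = [format_record(line) for line in sorted(records)]

# With a real file you would call ot.write(format_record(line)) in the loop.
print("".join(output_lines))
```

Joining with "," also gives name,lat,long directly, so no second pass over the file is needed.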
In the end I was able to write the file in the format I wanted, using this code:
lines = open("Better.csv").read().split(';')
lines.sort()
t = open("Ord.csv", "w")
for line in lines:
    res = line.rsplit(None,1)
    ts = str(res)
    t.write(ts+'\n')
t.close()
words = open("Ord.csv", "r")
words2 = open("Ord2.csv", "w")
for word in words:
    res = word.rsplit(',')
    ts = str(res)
    words2.write(ts +'\n')
words.close()
words2.close()
f=open("Ord3.csv","w")
s=open("Ord2.csv").read()
s=s.replace('[','')
s=s.replace("']","")
s=s.replace("]","")
s=s.replace("'","")
s=s.replace('" ','"')
s=s.replace(' "',' ')
s=s.replace('"','')
f.write(s +'\n')
f.close()
Now I have another problem. My format is
name, lat, long\n
name2, lat2, long2\n
How can I remove the \n and write the wanted format to a new file?
name, lat, long
name2, lat2, long2
I tried using strip() but the \n remained there.
Use strip('\n').
I tried adding this to my code, but still nothing:
s=s.rstrip('\n')
What is the problem?
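One possible explanation, assuming the file was produced by the str(res) code posted earlier: str() of a list shows a newline inside an element as two visible characters, a backslash followed by the letter n. rstrip('\n') only removes real newline characters, so it leaves those two characters alone. The sample list below is made up to show the effect.

```python
# The last field still ends with a real newline, as when iterating over a file.
res = ["name1", " lat1", " long1\n"]
text = str(res)            # the list's printed form, as in the earlier code

# The printed form contains a backslash and an 'n', not a newline.
print(text)

# rstrip('\n') looks for real newlines only, so nothing changes here.
unchanged = text.rstrip("\n")

# Removing the two-character sequence needs an escaped backslash.
cleaned = text.replace("\\n", "")
print(cleaned)
```

It is usually simpler to strip each line before splitting it, and to build the output with ",".join(...) instead of str() on a list, so the stray characters never get into the file.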
If you could attach Better.csv to a post, we could see what your code does...
I tried to upload it but I got this error:
Better.csv:
Invalid File
Should I change the format of the file to txt?
My file Better.csv has this format:
name3,long3 lat3;name1,long1 lat1;name2,long2 lat2;
Some of the names are compound. I first tried to sort them and write them sorted to another file. After that I split long and lat using rsplit(None,1), but it gave a format I did not really need. So, using the code posted before, I could write my new file in a format I could almost use for my matrix.
I say almost because each line has \n.
So should I change the format of the file to be able to attach it?
You can zip the file and attach the zipped file.
This could be what you want
SRC_FILE = "Better.csv"
DST_FILE = "Output.csv"

def gen_triples():
    with open(SRC_FILE) as fin:
        for record in fin.read().split(';'):
            if not record: # ignore empty record
                continue
            name, coords = record.split(",")
            lat, lon = coords.split()
            yield name, lat, lon

def write_output():
    with open(DST_FILE, "w") as fout:
        for name, lat, lon in sorted(gen_triples()):
            fout.write("{na}, {la}, {lo}\n".format(na = name, la = lat, lo = lon))

if __name__ == "__main__":
    write_output()
Try to understand every bit of it :)
It is quite advanced. I am just beginning to learn Python. I will try to understand it, but can my original code be changed in some way to get the same result?
Yes. The key points are how the records read from the source file are transformed into a tuple (name, lat, lon) containing the 3 values, and then how these 3 values are formatted for the output file. You can write similar code without 'yield', 'with', and even without functions. For example, instead of using 'yield', you could append the tuple to a list.
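As a rough illustration of that idea, here is a version with no functions, no 'with' and no 'yield'. To keep it self-contained it works on a made-up string in the Better.csv layout instead of reading the real file; the variable names are only for illustration.

```python
# Made-up sample in the same layout as Better.csv.
# With the real file this would be: data = open("Better.csv").read()
data = "name3,3.0 30.0;Los Angeles,1.0 10.0;name2,2.0 20.0;"

triples = []
for record in data.split(";"):
    record = record.strip()
    if not record:                  # skip the empty piece after the last ';'
        continue
    name, coords = record.split(",")
    first, second = coords.split()
    triples.append((name, first, second))   # instead of 'yield'

triples.sort()                      # sorts by name, the first tuple item

lines = []
for name, first, second in triples:
    lines.append("%s, %s, %s\n" % (name, first, second))
output = "".join(lines)

# With a real file you would write 'output' to Output.csv here.
print(output)
```

The list version holds all the records in memory at once, which is not a concern for 30 cities.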
I will try to make my own code work, but meanwhile, coming back to the title of the thread, I now want to generate the distance matrix for my cities.
I tried to use the following code to generate it:
from math import sin as sin, cos as cos, acos as acos, radians as radians
coords_list=open("Output.csv").read()
print coords_list
ff=open("Te.csv","w")
def distance_matrix(coords_list):
    '''Calculates the distances (in km) between any two cities based on the formulas
    c = sin(lati1)*sin(lati2)+cos(longi1-longi2)*cos(lati1)*cos(lati2)
    d = EARTH_RADIUS*Arccos(c)
    where EARTH_RADIUS is in km and the angles are in radians.
    Source: http://mathforum.org/library/drmath/view/54680.html
    This function returns the matrix.'''
    matrix={}
    EARTH_RADIUS = 6378.1
    #Populate the matrix.
    for (name2,longi2,lati2) in coords_list:
        for (name1,longi1,lati1) in coords_list:
            if name1!=name2:
                #if name1==name2, then c will be equal to 1, and acos(c) will fail
                c = sin(radians(lati1)) * sin(radians(lati2)) + \
                    cos(radians(longi1-longi2)) * \
                    cos(radians(lati1)) * cos(radians(lati2))
                distance = EARTH_RADIUS * acos(c)
                matrix[name1,name2] = distance
            else:
                #Case when name1==name2...
                matrix[name1,name2] = 0.0
    return matrix
    print matrix
    ff.write(matrix)
ff.close()
But it only prints the file (Output.csv) in the shell without actually creating the matrix.
So what is wrong with this code?
Again, I am sorry if I ask a silly question, but I am only beginning to learn Python.
The block def distance_matrix(coords_list): ... defines a function distance_matrix() but does not execute it. A function is executed only when it is called, so replace the last line with
distance_matrix(coords_list) # call the function
ff.close()
There are other problems. coords_list should be a list of tuples extracted from the csv file, not a string, so you could write
coords_list = list()
for line in open("Output.csv"):
    line = line.strip()
    if line:
        name, lat, lon = line.split(",")
        lat = float(lat) # convert string to float value
        lon = float(lon)
        coords_list.append( (name, lat, lon) )
Also, ff.write(matrix) will produce awful results. I suggest
from pprint import pprint
pprint(matrix, ff)
Another problem is that I thought the data were (name, latitude, longitude), and you seem to use (name, longitude, latitude).
The format of the data in Output.csv is name, longitude, latitude. I remember that in some of my first posts I might have used the wrong format. I am sorry.
I tried to repair my code using the advice you gave, but this code
from math import sin as sin, cos as cos, acos as acos, radians as radians
ff=open("Te.csv","w")
coords_list = list()
for line in open("Output.csv"):
    line = line.strip()
    if line:
        name, lon, lat = line.split(",")
        lon = float(lon) # convert string to float value
        lat = float(lat)
        coords_list.append( (name, lon, lat) )
        print line
def distance_matrix(coords_list):
    '''Calculates the distances (in km) between any two cities based on the formulas
    c = sin(lati1)*sin(lat2)+cos(lon1-lon2)*cos(lat1)*cos(lati2)
    d = EARTH_RADIUS*Arccos(c)
    where EARTH_RADIUS is in km and the angles are in radians.
    Source: http://mathforum.org/library/drmath/view/54680.html
    This function returns the matrix.'''
    matrix={}
    EARTH_RADIUS = 6378.1
    #Populate the matrix.
    for (name2,lon2,lat2) in coords_list:
        for (name1,lon1,lat1) in coords_list:
            if name1!=name2:
                #if name1==name2, then c will be equal to 1, and acos(c) will fail
                c = sin(radians(lat1)) * sin(radians(lat2)) + \
                    cos(radians(lon1-lon2)) * \
                    cos(radians(lat1)) * cos(radians(lat2))
                distance = EARTH_RADIUS * acos(c)
                matrix[name1,name2] = distance
            else:
                #Case when name1==name2...
                matrix[name1,name2] = 0.0
distance_matrix(coords_list) # call the function
from pprint import pprint
pprint(matrix, ff)
ff.close()
gives this error:
Traceback (most recent call last):
  File "C:\Users\gg\Desktop\t.py", line 40, in <module>
    pprint(matrix, ff)
NameError: name 'matrix' is not defined
If you could tell me what is wrong I would be grateful. I am just beginning to learn Python and these things are new to me.
The problem is that matrix is a local variable of the function distance_matrix. It does not exist outside the function. The solution is to have your function return the matrix. Here is the code. It generates a 23 MB matrix file.
from math import sin, cos, acos, radians
ff=open("Te.csv","w")
coords_list = list()
for line in open("Output.csv"):
    line = line.strip()
    if line:
        name, lon, lat = line.split(",")
        lon = float(lon) # convert string to float value
        lat = float(lat)
        coords_list.append( (name, lon, lat) )
def distance_matrix(coords_list):
    '''Calculates the distances (in km) between any two cities based on the formulas
    c = sin(lat1)*sin(lat2)+cos(lon1-lon2)*cos(lat1)*cos(lat2)
    d = EARTH_RADIUS*Arccos(c)
    where EARTH_RADIUS is in km and the angles are in radians.
    Source: http://mathforum.org/library/drmath/view/54680.html
    This function returns the matrix.'''
    matrix={}
    EARTH_RADIUS = 6378.1
    #Populate the matrix.
    for (name2,lon2,lat2) in coords_list:
        for (name1,lon1,lat1) in coords_list:
            if name1!=name2:
                #if name1==name2, c should be 1, but rounding can push it
                #slightly above 1, and acos(c) then raises ValueError
                c = sin(radians(lat1)) * sin(radians(lat2)) + \
                    cos(radians(lon1-lon2)) * \
                    cos(radians(lat1)) * cos(radians(lat2))
                distance = EARTH_RADIUS * acos(c)
                matrix[name1,name2] = distance
            else:
                #Case when name1==name2...
                matrix[name1,name2] = 0.0
    return matrix
matrix = distance_matrix(coords_list) # call the function and catch return value
from pprint import pprint
pprint(matrix, ff)
ff.close()
The format of the generated file could be improved. Our output is not in csv format.
The matrix I wanted to print should have looked like this:
        name1   name2   name3
name1   0       distA   distB
name2   distC   0       distD
name3   distF   distV   0
I want to print it in the shell like this and write it to a file too.
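One way to get that layout, as a sketch only: build each row as a string with fixed-width columns, assuming the matrix is a dictionary keyed by (name1, name2) as in the code above. The helper name format_table, the column width and the sample distances are made up for illustration.

```python
def format_table(matrix, names, width=12):
    """Return the matrix as text: one header row, then one row per city."""
    # Header: an empty corner cell followed by the column names.
    header = " " * width + "".join(name.rjust(width) for name in names)
    rows = [header]
    for row_name in names:
        cells = "".join(("%.1f" % matrix[row_name, col_name]).rjust(width)
                        for col_name in names)
        rows.append(row_name.ljust(width) + cells)
    return "\n".join(rows)


# Made-up distances, only to show the layout.
matrix = {("A", "A"): 0.0, ("A", "B"): 5.0,
          ("B", "A"): 5.0, ("B", "B"): 0.0}
table = format_table(matrix, ["A", "B"], width=6)
print(table)
```

The same string can go to a file with ff.write(table + "\n"). With the real data, the list of names could be taken from coords_list, for example [name for (name, lon, lat) in coords_list], and the width should be at least as large as the longest city name.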