Extracting Customized Data from OpenStreetMap into Pandas DataFrames

usmanmalik57


I was recently working on a project that required me to extract location information from OpenStreetMap, an openly licensed map database of the world. The OpenStreetMap database lets you extract location data along with its metadata, which is stored as tags. My task was to extract locations along with all their associated tags.

This article explains how I extracted customized location information from OpenStreetMap in Python.

Before I explain the code, it helps to understand how locations are organized in the OpenStreetMap database. At a high level, the database groups them into the following categories:

  1. Nodes: data points on maps, primarily representing a single entity, e.g., a bench, a chair, a telephone booth, etc.
  2. Ways: an ordered list of nodes, for example, a street or road.
  3. Relations: an ordered list of nodes, ways, or other relations, for example, an intersection, a public park, etc.
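
To make the three categories concrete, here is a minimal sketch of the data model using plain dataclasses. These classes are simplified stand-ins written for this illustration, not the classes that any OpenStreetMap library actually exposes, and the ids, coordinates, and names are made up.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    id: int
    lat: float
    lon: float
    tags: dict = field(default_factory=dict)


@dataclass
class Way:
    id: int
    node_ids: list  # ordered references to node ids
    tags: dict = field(default_factory=dict)


@dataclass
class Relation:
    id: int
    members: list  # ordered (type, id, role) triples
    tags: dict = field(default_factory=dict)


# Made-up elements, for illustration only
bench = Node(1, 51.5200, -0.1570, {"amenity": "bench"})
street = Way(10, [1, 2, 3], {"highway": "primary", "name": "Example Street"})
route = Relation(100, [("way", 10, "")], {"type": "route", "name": "Example Route"})

# Every category carries tags, so the same "has a name" test works on all three
named = [e for e in (bench, street, route) if "name" in e.tags]
print([type(e).__name__ for e in named])
```

The point to take away is that tags are a key/value mapping attached to every element, whatever its category.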

The Problem

I needed to extract the tags from all the nodes, ways, and relations within a geographical area that have a name tag. In other words, I wanted information about the named nodes, ways, and relations within a specific area.

As an example, in this article I will extract location information for the named nodes, ways, and relations around Baker Street in London.

I will use overpy, a Python wrapper for the Overpass API, to extract information from the OpenStreetMap database. Given a bounding box defined by the latitude and longitude of its south, west, north, and east boundaries, the library returns all the nodes, ways, or relations that fall inside it.

Though you can get locations with specific tag values using overpy, I could not find a way to filter all locations having specific tag(s), whatever their values. For instance, I could not find a method in the library that returns a list of all locations having name tags.

Therefore, I had to further process the results returned by overpy to keep only the locations with name tags. Finally, all the extracted nodes, ways, and relations, along with their tags, are converted to a Pandas dataframe.
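
As a side note, the Overpass QL query language, which is what gets passed to query(), has a has-key filter written as ["key"]. In principle this allows the name filter to be pushed into the query itself instead of being applied afterwards. The sketch below only builds such a query string; the helper name named_elements_query and the timeout value are assumptions made for this illustration, and the rest of the article keeps the post-filtering approach.

```python
def named_elements_query(element_type, south, west, north, east, timeout=60):
    """Build an Overpass QL query for elements of one type that carry a name tag."""
    if element_type not in ("node", "way", "relation"):
        raise ValueError(f"unsupported element type: {element_type!r}")
    bbox = f"{south},{west},{north},{east}"
    # ["name"] keeps only elements that have a name tag, whatever its value
    return f'[timeout:{timeout}];{element_type}["name"]({bbox});out;'


query = named_elements_query("node", 51.5133, -0.1821, 51.5319, -0.1312)
print(query)
```

The resulting string could be passed to the query() method in the same way as the query strings used later in this article.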

You need to install the overpy library with the following command before running the scripts in this article (the last section also requires Pandas):

$ pip install overpy

Before extracting the data, you need the latitude and longitude coordinates of the bounding box for your location.

Start by going to the OpenStreetMap homepage and searching for the name of the location.

Next, click the Export button at the top of the page. The export panel shows the latitude and longitude values of the bounding box's south, west, north, and east boundaries; the queries in this article use 51.5133, -0.1821, 51.5319, and -0.1312, respectively.
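
Overpass expects the bounding box in the order south, west, north, east, which is easy to get wrong when copying values by hand. The sketch below wraps the four values in a small named tuple that checks the latitudes and formats the box for a query. The BoundingBox class and its method names are hypothetical helpers written for this illustration; they are not part of overpy.

```python
from typing import NamedTuple


class BoundingBox(NamedTuple):
    south: float
    west: float
    north: float
    east: float

    def validate(self):
        # Latitudes must be in range and south must not exceed north
        if not (-90 <= self.south <= self.north <= 90):
            raise ValueError("latitudes must satisfy -90 <= south <= north <= 90")
        # West may exceed east for boxes crossing the antimeridian, so only check ranges
        if not (-180 <= self.west <= 180 and -180 <= self.east <= 180):
            raise ValueError("longitudes must lie between -180 and 180")
        return self

    def as_overpass(self):
        # Overpass order: south, west, north, east
        return f"({self.south}, {self.west}, {self.north}, {self.east})"


bbox = BoundingBox(south=51.5133, west=-0.1821, north=51.5319, east=-0.1312).validate()
query = f"[timeout:100000];node{bbox.as_overpass()};out;"
print(query)
```

Naming the fields makes a swapped south and north fail early with a clear error rather than silently returning an empty result.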


Extracting Nodes

Let’s first extract the nodes from our desired location. To do so, you need to perform the following steps:

  1. Import the overpy module
  2. Create an object of the Overpass() class.
  3. Call the query() method and pass it your query string.

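The following script performs the above three steps and prints the number of nodes returned. When I ran it, the output showed 94566 nodes within our desired location.

import overpy
api = overpy.Overpass()
# Bounding box order: south, west, north, east
result = api.query("[timeout:100000];node(51.5133, -0.1821, 51.5319, -0.1312);out;")
# Number of nodes returned by the query
print(len(result.nodes))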



Next, we will keep only the nodes that have a name tag.

nodes_with_nametags = []
for node in result.nodes:
    # Keep only the nodes that carry a name tag
    if len(node.tags) > 0 and 'name' in node.tags:
        nodes_with_nametags.append(node)




Extracting Ways

You can extract ways in the same way as you extracted nodes.

result = api.query("[timeout:100000];way(51.5133, -0.1821, 51.5319, -0.1312);out;")



And similarly, you can extract ways with name tags via the following script.

ways_with_nametags = []
for way in result.ways:
    # Keep only the ways that carry a name tag
    if len(way.tags) > 0 and 'name' in way.tags:
        ways_with_nametags.append(way)




Extracting Relations

Finally, the following two scripts extract all the relations and then filter relations with name tags.

result = api.query("[timeout:100000];relation(51.5133, -0.1821, 51.5319, -0.1312);out;")



relations_with_nametags = []
for relation in result.relations:
    # Keep only the relations that carry a name tag
    if len(relation.tags) > 0 and 'name' in relation.tags:
        relations_with_nametags.append(relation)
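
The three filtering loops for nodes, ways, and relations share the same shape, so they can be folded into one helper. The sketch below is one possible way to do that. The function name with_tag is hypothetical, and the SimpleNamespace objects are stand-ins for overpy results, assuming only that each element exposes a tags dictionary. Because 'name' in tags is already false for an empty dictionary, the separate length check is not needed.

```python
from types import SimpleNamespace


def with_tag(elements, key="name"):
    """Return the elements whose tags contain the given key."""
    return [el for el in elements if key in el.tags]


# Stand-ins for overpy elements; only the .tags attribute matters here
sample = [
    SimpleNamespace(id=1, tags={"amenity": "bench"}),
    SimpleNamespace(id=2, tags={"name": "Example Cafe", "amenity": "cafe"}),
    SimpleNamespace(id=3, tags={}),
]

named = with_tag(sample)
print([el.id for el in named])
```

With real results, the same helper could be called once each on result.nodes, result.ways, and result.relations.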




Creating a Locations DataFrame

We have extracted the desired nodes, ways, and relations. Now we will create a Pandas dataframe where rows correspond to location entities (nodes, ways, relations) and columns correspond to the tags of those entities.

To create such a dataframe, we need to create a list of all the unique tags in nodes, ways, and relations.

The following script defines the get_unique_tags() function, which returns a list of all unique tags from a list of locations. Later, when building the rows, I also add an extra column, Node_Way_Relation, which keeps track of the type of each location.

def get_unique_tags(locations):
    # Collect every tag key that appears on at least one location
    unique_tags = set()

    for loc in locations:
        unique_tags.update(loc.tags.keys())

    return list(unique_tags)

The script below calls the get_unique_tags() function to get a list of all unique tags from the extracted nodes.

unique_node_tags = get_unique_tags(nodes_with_nametags)



The script below returns a list of all unique tags from the extracted ways.

unique_way_tags = get_unique_tags(ways_with_nametags)



Similarly, the following script returns a list of all unique tags from the extracted relations.

unique_relation_tags = get_unique_tags(relations_with_nametags)



Finally, the script below returns the union of unique tags from nodes, ways, and relations.

all_tags_list = list(set(unique_node_tags + unique_way_tags + unique_relation_tags))
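
One thing to be aware of is that list(set(...)) returns the tags in an arbitrary order, so the dataframe columns can come out in a different order from one run to the next. If a stable column order matters, a sorted union is a small variation. The sketch below assumes that each location exposes a tags dictionary; the function name collect_tags and the sample data are made up for this illustration.

```python
from types import SimpleNamespace


def collect_tags(*location_lists):
    """Return the sorted union of tag keys across several lists of locations."""
    tags = set()
    for locations in location_lists:
        for loc in locations:
            tags.update(loc.tags.keys())
    return sorted(tags)


# Stand-ins for the filtered nodes, ways, and relations
nodes = [SimpleNamespace(tags={"name": "A", "amenity": "cafe"})]
ways = [SimpleNamespace(tags={"name": "B", "highway": "primary"})]
relations = [SimpleNamespace(tags={"name": "C", "type": "route"})]

all_tags = collect_tags(nodes, ways, relations)
print(all_tags)
```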



We need to create a Pandas dataframe whose columns correspond to tags. One way to do so is to create a dictionary whose keys correspond to tag names. Later, we will iterate through nodes, ways, and relations and assign tag values to corresponding keys in the dictionary. The dictionary will be converted to a Pandas dataframe.

The following script creates the tags dictionary.

tag_dictionary = dict.fromkeys(all_tags_list)

The script below iterates through all the extracted nodes and assigns the tag values from the nodes to the corresponding keys in the tag_dictionary you created in the previous step. The dictionary is added to a list of dictionaries, i.e., tag_dictionaries.

tag_dictionaries = []

for node in nodes_with_nametags:
    # Start from a row where every known tag is None
    tag_dictionary = dict.fromkeys(all_tags_list)

    for k, v in node.tags.items():
        tag_dictionary[k] = v

    tag_dictionary["Node_Way_Relation"] = "node"
    tag_dictionaries.append(tag_dictionary)


Similarly, the script below creates tag dictionaries for extracted ways.

for way in ways_with_nametags:
    # Start from a row where every known tag is None
    tag_dictionary = dict.fromkeys(all_tags_list)

    for k, v in way.tags.items():
        tag_dictionary[k] = v

    tag_dictionary["Node_Way_Relation"] = "way"
    tag_dictionaries.append(tag_dictionary)


Finally, the following script creates tag dictionaries for extracted relations.

for relation in relations_with_nametags:
    # Start from a row where every known tag is None
    tag_dictionary = dict.fromkeys(all_tags_list)

    for k, v in relation.tags.items():
        tag_dictionary[k] = v

    tag_dictionary["Node_Way_Relation"] = "relation"
    tag_dictionaries.append(tag_dictionary)
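
The three loops in this section differ only in the input list and the label stored in Node_Way_Relation, so they can be expressed as a single function. The following is a minimal sketch of that idea. The function name build_rows and the sample elements are hypothetical, and each element is assumed to expose a tags dictionary.

```python
from types import SimpleNamespace


def build_rows(locations, kind, all_tags):
    """Build one dictionary per location, with every known tag as a key."""
    rows = []
    for loc in locations:
        row = dict.fromkeys(all_tags)    # every tag starts as None
        row.update(loc.tags)             # fill in the tags this location has
        row["Node_Way_Relation"] = kind  # record the type of location
        rows.append(row)
    return rows


all_tags = ["amenity", "highway", "name"]
nodes = [SimpleNamespace(tags={"name": "Example Cafe", "amenity": "cafe"})]
ways = [SimpleNamespace(tags={"name": "Example Street", "highway": "primary"})]

rows = build_rows(nodes, "node", all_tags) + build_rows(ways, "way", all_tags)
print(rows[0])
```

Tags that a location does not have stay as None, which Pandas treats as missing values.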





You can convert the tag_dictionaries into a Pandas dataframe using the following script. In the output, you can see the first five rows of the Pandas dataframe.

import pandas as pd

# Each dictionary becomes one row; its keys become the columns
locations_df = pd.DataFrame(tag_dictionaries)
print(locations_df.head())



You can manipulate this Pandas dataframe and extract the required information. For instance, I extracted all locations with a name in Italian using the following script.

locations_df_filtered = locations_df[locations_df['name:it'].notnull()]
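
Note that a column such as name:it only exists if at least one extracted location carries that tag; otherwise indexing it raises a KeyError. The sketch below shows one defensive way to write the same filter. The helper name rows_with_tag and the two sample rows are made up for this illustration.

```python
import pandas as pd


def rows_with_tag(df, tag):
    """Return the rows that have a value for the tag, or no rows if the column is absent."""
    if tag not in df.columns:
        return df.iloc[0:0]  # empty frame with the same columns
    return df[df[tag].notna()]


# Made-up rows in the same shape as the tag dictionaries built earlier
sample_rows = [
    {"name": "Baker Street", "name:it": None, "Node_Way_Relation": "way"},
    {"name": "London", "name:it": "Londra", "Node_Way_Relation": "relation"},
]
sample_df = pd.DataFrame(sample_rows)

italian = rows_with_tag(sample_df, "name:it")
missing = rows_with_tag(sample_df, "name:xx")
print(italian["name"].tolist(), len(missing))
```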




The overpy library lets you extract information from the OpenStreetMap database. However, I could not find a way in the library to extract all nodes, ways, or relations having a specific tag, whatever its value. To get them, you need to further process the results that overpy returns, as I explained in this article. I hope this information will be helpful to you.
