Review of OpenStreetMap (in 2021)

There has been more than one revolutionary event in the field of web maps since the beginning of the 21st century. One is the rise of Google Maps. Another is the success of the OpenStreetMap project. Google Maps is built for end-user consumers, while OpenStreetMap is built for users, developers, and machines on the internet to realize the full potential of geographic data with global coverage. The difference comes from the fact that OpenStreetMap is an open geographic database.

Google Maps and OpenStreetMap are essentially two different products, although both aim to give people a better understanding of the world. The functionalities of OpenStreetMap are limited only by the applications that developers create. I will review OpenStreetMap and its ecosystem using the same criteria from three aspects: data, functionalities, and interoperability & extensibility.

1. Data

OpenStreetMap devotes its collaborative effort to feature data, that is, geographic data in vector format, with coordinates describing points, lines, and polygons. The data is all about coordinates; no imagery or terrain is provided. External sources such as satellite imagery from Bing and Mapbox are used as backgrounds on top of which users trace features. Because the database is open, everyone can download, process, publish, and distribute the data. According to the statistics, there are currently more than 7 billion nodes, 800 million ways, and 9 million relations in the database. It is likely the largest geographic database in the world, even compared with Google Maps’ vault of vector data.
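To make the three element types concrete, here is a minimal sketch of nodes, ways, and relations as plain Python classes. The class and field names are my own choices for illustration, not an official OpenStreetMap library, and the sample coordinates are made up.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Node:
    """A single point: an id plus latitude/longitude in degrees."""
    id: int
    lat: float
    lon: float
    tags: Dict[str, str] = field(default_factory=dict)


@dataclass
class Way:
    """An ordered list of node ids: a line, or a polygon when closed."""
    id: int
    node_ids: List[int]
    tags: Dict[str, str] = field(default_factory=dict)

    def is_closed(self) -> bool:
        # A way that ends on the node it started from forms a ring.
        return len(self.node_ids) > 2 and self.node_ids[0] == self.node_ids[-1]


@dataclass
class Relation:
    """A group of members, each a (element type, element id, role) triple."""
    id: int
    members: List[Tuple[str, int, str]]
    tags: Dict[str, str] = field(default_factory=dict)


# Made-up sample data: a tiny square building traced from four corners.
corners = [(0.0, 0.0), (0.0, 0.001), (0.001, 0.001), (0.001, 0.0)]
nodes = {i: Node(i, lat, lon) for i, (lat, lon) in enumerate(corners, start=1)}
building = Way(100, [1, 2, 3, 4, 1], {"building": "yes"})

# Resolving node ids to coordinates gives the polygon outline.
outline = [(nodes[n].lat, nodes[n].lon) for n in building.node_ids]
print(building.is_closed(), len(outline))
```

The point of the sketch is that a way stores no coordinates of its own; it only references nodes, which is why the node count dwarfs the other two.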

2. Functionalities 

Various applications have been developed around the OpenStreetMap database, including all kinds of maps, software, and services for drivers, cyclists, fishers, hikers, and others. Here is a short list of the ones I am aware of:

1. Online editing. This is where OpenStreetMap started. 

2. Interactive maps; more are listed on this page.

3. Geo-name searching. The service is called “Nominatim” and is accessible on the main site.

4. Route planning. “OSRM” and “GraphHopper”, among others, are available on the main site.

5. Map rendering with customized styles. All kinds of map rendering engines can be used on the database, Mapnik for instance.

6. Database subsetting and tailoring. Developers can do what they want with the database within the terms of the ODbL.
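As an illustration of the geo-name search item in the list above, here is a small sketch that builds a request URL for the public Nominatim search endpoint. The helper name `build_search_url` is hypothetical. The sketch only constructs the URL and sends nothing; anyone actually calling the public server should read its usage policy first, which asks for an identifying User-Agent and a low request rate.

```python
from urllib.parse import urlencode

# Public Nominatim search endpoint hosted by the OpenStreetMap project.
NOMINATIM_SEARCH = "https://nominatim.openstreetmap.org/search"


def build_search_url(place_name: str, limit: int = 1) -> str:
    """Return a search URL asking for JSON results for a free-form place name."""
    params = {"q": place_name, "format": "jsonv2", "limit": limit}
    return NOMINATIM_SEARCH + "?" + urlencode(params)


url = build_search_url("Amazon River")
print(url)
```

The response to such a request is a JSON list of candidate places, each with coordinates and a reference back to the underlying OpenStreetMap element.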

3. Interoperability & Extensibility

OpenStreetMap is more powerful than Google Maps in terms of interoperability & extensibility. The only limit is your imagination. In my opinion, the most promising application of OpenStreetMap data will be to act as building bricks with which everyone can create a map of their own. The geographic features in OpenStreetMap are like a vocabulary. People use words and phrases to write posts on the internet, to express themselves, to communicate, and to have fun. In the same way, people can use features from the OpenStreetMap database to compose a map to express themselves, communicate, and have fun. People can already write, talk, and make videos to express themselves and communicate online with ease, so why not craft a map to do that in the near future?

Review of Google Maps (in 2021)

Undoubtedly, Google Maps is the most successful product Google has ever built alongside its search engine. Since its launch in 2005, Google Maps has reinvented maps on the internet and changed how maps are shown, served, and used on various devices. Today Google Maps has evolved from a single-page web application into a full-featured platform that includes many products, such as the mobile app and APIs. Here I will comment on Google Maps from the perspective of a map enthusiast and map developer, covering three aspects: data, functionalities, and interoperability & extensibility. I will emphasize the web version.

1. Data

The strength of Google Maps mostly comes from the completeness, currency, consistency, and efficiency of the geographic data Google has collected and manages. No other advantage, such as a simple UI, aesthetic quality, or performance optimization, can substitute for the data. Here is a list of the categories of this data:
1. High-spatial-resolution satellite imagery (with global coverage and a time-series archive that can be viewed in Google Earth Pro)
2. Vector data (including road networks, water bodies, building outlines, administrative boundaries, POIs, etc.)
3. Terrain data (rendered as contour lines or color relief)
4. Street views
5. 3D models
6. Transit routes
7. Place reviews
Some of this data was so exclusive and expensive that only paying clients, such as government or military departments, could use it before Google brought it to the general public. I believe that Google maintains the largest geospatial database in the world, although others could surpass it in some subcategories; OpenStreetMap’s geo-feature database, for example, has the potential to surpass Google’s. Sometimes Google Maps is the only choice because no other data source is available on the market, especially for satellite imagery. However, Google has not used the full power of this data; I will discuss this in the section “What Google Maps could have done but didn’t”.
Let’s close this section with an answer to a simple question: is Google Maps’ data good enough for a user? The answer is: absolutely. Google Maps defines the upper bound of a map service’s data capability. Without such a product, map users would not even know what good map data looks like.

2. Functionalities

Google Maps is so versatile in its functionalities that it is no longer a simple map. Traditionally, a map is a print of drawings or graphics of geographic things. By reading it, you can get the locations, extents, distributions, and distances of things on Earth. But as software, Google Maps is interactive, searchable, and able to give detailed navigation instructions. Here is a summary of the main functionalities:

  1. Map interactions such as panning, zooming in/out 
  2. Place searching 
  3. Route planning and turn-by-turn navigation
  4. Geocoding and reverse geocoding
  5. Map creating and sharing
  6. Map style customization by layers control and customized map import
  7. Printing and saving

Map interaction built by arranging map tiles was first created by the Google Maps team using AJAX in 2005, an innovative design then and the de facto solution now. Searching, route planning, and navigation are really good because Google has been an expert in answering queries since its birth. However, compared with OpenStreetMap, map creation, sharing, and customization are not good enough. Even with Google Map Maker, the data users created was not well integrated into the underlying data. We will close this section with an answer to another simple question: does Google Maps have good functionalities? The answer is: yes, it does for most scenarios, unless you are obsessed with map creation and curation.

3. Interoperability & Extensibility
To some extent, Google is so focused on the end user’s experience that many of its products do not encourage developers to extend them. Or Google may think that extensions would hinder the original design or damage its perfection. For Google Maps, the JavaScript API v3 is available for developers who want to create their own maps and applications. Developers can overlay their homebrew data on top of Google Maps’ data, and there are more than 150 Google Maps JavaScript samples on the docs page for people to learn from. You can take a look here if you are interested. But if I were asked whether Google Maps is good enough in terms of interoperability & extensibility, I would answer: not really, because you have little control over the data Google Maps offers. The granularity of control is so coarse that your only option is to toggle a layer on or off. You cannot select a subset of geo-features such as roads, waterways, or buildings, let alone the vertices of the features.

After all, Google Maps is great but not perfect. It is still evolving, and many of its weaknesses remain to be improved. For example, have you ever wondered why, when you search for “Amazon River” or “Congo River” on Google Maps, you don’t get a highlighted outline of the result as you do when you search for “U.S.A.”?

How to extract all titles of biography and geography articles from English Wikipedia

There are more than 1.8 million biography articles and 1.2 million geography articles in English Wikipedia. Here we extract the titles of all these articles from Wikipedia’s data files.

  1. Get the dump file of Wikipedia articles from the database download page. (file size ~ 20GB)
  2. Get the database dump file of category links, whose name looks like this one: “enwiki-20210501-categorylinks.sql.gz” (file size ~ 3GB)
  3. Restore the category links dump file to a MySQL server running in Docker. (database size ~ 30GB)
  4. Query the table “categorylinks” on the MySQL server and save the result. The query retrieves the ids of all pages that are in the category “WikiProject_Biography_articles”.
  5. Parse the Wikipedia articles file downloaded in step 1 with Python, SAX, and mwparserfromhell, retrieving the titles of all biography articles whose talk page id is in the result set of step 4.
  6. Retrieve the titles of all geography articles from the file downloaded in step 1 by filtering for articles that contain the {{Coord}} template.
  7. The titles of all biography articles and geography articles in English Wikipedia as of 2021 can be downloaded below.
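The parsing steps above can be sketched with Python's built-in SAX parser. This is a minimal illustration and not the exact script used for the post: the class name `TitleCollector` and the tiny sample document are made up, a regular expression stands in for mwparserfromhell when detecting the Coord template, and the sketch assumes the dump being scanned actually contains the pages whose ids came out of the category query.

```python
import io
import re
import xml.sax

# Matches the opening of a Coord template, e.g. "{{Coord|..." or "{{coord |...".
COORD_RE = re.compile(r"\{\{\s*coord\s*[|}]", re.IGNORECASE)


class TitleCollector(xml.sax.ContentHandler):
    """Streams <page> elements and records the titles of interest."""

    def __init__(self, wanted_ids=frozenset()):
        super().__init__()
        self.wanted_ids = wanted_ids   # page ids from the category query
        self.titles_by_id = []         # titles whose page id is wanted
        self.titles_with_coord = []    # main-namespace titles with Coord
        self.path = []                 # stack of currently open elements
        self.buffer = []               # text chunks of the current element
        self.page = {}

    def startElement(self, name, attrs):
        self.path.append(name)
        self.buffer = []
        if name == "page":
            self.page = {}

    def characters(self, content):
        self.buffer.append(content)

    def endElement(self, name):
        text = "".join(self.buffer)
        parent = self.path[-2] if len(self.path) > 1 else None
        if name in ("title", "ns", "id") and parent == "page":
            self.page[name] = text     # skips revision and contributor ids
        elif name == "text":
            self.page["has_coord"] = bool(COORD_RE.search(text))
        elif name == "page":
            if int(self.page.get("id", -1)) in self.wanted_ids:
                self.titles_by_id.append(self.page["title"])
            if self.page.get("ns") == "0" and self.page.get("has_coord"):
                self.titles_with_coord.append(self.page["title"])
        self.path.pop()
        self.buffer = []


# Made-up two-page sample in the shape of a MediaWiki XML export.
SAMPLE = """<mediawiki>
  <page><title>Amazon River</title><ns>0</ns><id>1</id>
    <revision><id>11</id><text>{{Coord|0|10|S|49|W}} A river.</text></revision></page>
  <page><title>Talk:Ada Lovelace</title><ns>1</ns><id>2</id>
    <revision><id>22</id><text>{{WikiProject Biography}}</text></revision></page>
</mediawiki>"""

handler = TitleCollector(wanted_ids={2})
xml.sax.parse(io.BytesIO(SAMPLE.encode("utf-8")), handler)
print(handler.titles_with_coord)
print(handler.titles_by_id)
```

SAX is a reasonable fit here because it streams the file element by element, so a 20GB dump never has to be held in memory. For a compressed dump, the same handler can be fed a file object opened with `bz2.open(path, "rb")`.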

How many biography articles and geography articles are there in English Wikipedia?

According to the statistics from Wikipedia, especially the information conveyed in this image, biography and geography are the two largest categories of English Wikipedia. In fact, measured by the number of articles, nearly half of Wikipedia’s articles fall into these two categories in my survey.

Getting the number of biography articles on Wikipedia is not as trivial as it seems. Because Wikipedia is crowdsourced, articles come and go at any moment, and their style and format are not consistent. However, some concepts are very helpful for Wikipedia statistics. These include “category” and “template”.

One can derive the number of biography articles by counting the pages under the category “WikiProject_Biography_articles”. Here is the query result from the MediaWiki table “category”.

List of category names with their page counts, in descending order

From the query results above, we can tell that the number of pages in the “WikiProject_Biography_articles” category is 1.8 million. This is a solid estimate of the number of biography articles on Wikipedia.
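For readers who want to see the shape of such queries, here is a toy sketch that uses an in-memory SQLite database in place of the restored MySQL dump. The column names (`cat_title`, `cat_pages`, `cl_from`, `cl_to`) follow the MediaWiki schema as I understand it for 2021 dumps and should be checked against the actual files; the rows and the second category name are made up.

```python
import sqlite3

# Toy stand-in for the restored dump: two tables with a handful of fake rows.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE category (cat_title TEXT, cat_pages INTEGER);
    CREATE TABLE categorylinks (cl_from INTEGER, cl_to TEXT);
    INSERT INTO category VALUES
        ('WikiProject_Biography_articles', 3),
        ('Hypothetical_small_category', 1);
    INSERT INTO categorylinks VALUES
        (101, 'WikiProject_Biography_articles'),
        (102, 'WikiProject_Biography_articles'),
        (103, 'WikiProject_Biography_articles'),
        (104, 'Hypothetical_small_category');
""")

# Category names with their page counts, largest first.
top_categories = conn.execute(
    "SELECT cat_title, cat_pages FROM category ORDER BY cat_pages DESC"
).fetchall()

# Ids of the pages that belong to the biography category.
biography_page_ids = [row[0] for row in conn.execute(
    "SELECT cl_from FROM categorylinks WHERE cl_to = ? ORDER BY cl_from",
    ("WikiProject_Biography_articles",),
)]

print(top_categories)
print(biography_page_ids)
```

The first query gives the per-category count directly, while the second returns the page ids themselves, which is what a later title-extraction pass needs.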

Unfortunately, there is no single category that ‘tags’ geography articles. We can estimate their number by counting articles that include the ‘Coord’ template, which is used in geographic articles. By traversing the 20GB dump file of all Wikipedia articles and detecting this template, we get a count of 1.2 million.

So, here are the conclusions:

More than one-fourth (1.8 million / 6.3 million) of all articles on Wikipedia are biographies, and nearly one-fifth (1.2 million / 6.3 million) are geography articles. Combined, they make up almost half (3 million / 6.3 million) of English Wikipedia. To show this kind of data on open maps, the fundamental step is to extract that information; I will cover the work in the next post.
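The fractions quoted here are easy to check with a few lines of Python. The figures are the rounded ones from this post, and adding the two counts assumes that few articles are both a biography and carry a Coord template.

```python
# Rounded article counts from this post (English Wikipedia, 2021).
TOTAL_ARTICLES = 6.3e6
BIOGRAPHY = 1.8e6
GEOGRAPHY = 1.2e6

shares = {
    "biography": BIOGRAPHY / TOTAL_ARTICLES,
    "geography": GEOGRAPHY / TOTAL_ARTICLES,
    # Assumes the two groups barely overlap.
    "combined": (BIOGRAPHY + GEOGRAPHY) / TOTAL_ARTICLES,
}

for name, share in shares.items():
    print(f"{name}: {share:.1%}")
# biography: 28.6%
# geography: 19.0%
# combined: 47.6%
```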