I presented an API for location entity recognition and geocoding at FOSDEM 2016: Geocoding the World with OpenAddresses.
This API can extract and geocode locations from Wikipedia articles or microblog posts with a high confidence score in places where the data is available. The EU version running at geocode.xyz covers some countries pretty well (e.g., Spain, the Netherlands, Belgium, Denmark) and others not so well (Poland, Slovakia, Estonia), because in the latter case openaddresses.io is missing street address names.
If you have your own address data, get a copy of the server on AWS and load your data into the local MySQL server. Then build your own real-time Twitter feed location tracker, or map the news of the day.
Or better yet, contribute to openaddresses.io to improve this tool.
A minimal location entity set is an alphanumeric description of a location which maps to a single point (Latitude, Longitude) without ambiguity, and is the shortest such description.
La Fàbrica del Sol, Passeig de Joan Salvat Papasseit, 1, 08003 Barcelona, Spain
can be reduced to 1 SALVAT PAPASSEIT, BARCELONA, ES
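To make the definition concrete, here is a minimal sketch of the idea in Python. It is not the algorithm the geocoder uses; it assumes a hypothetical toy list of records (with placeholder coordinates) and brute-forces the shortest subset of address tokens that still matches exactly one point.

```python
from itertools import combinations

# Toy records (hypothetical): each maps a set of address tokens to a point.
# Coordinates are placeholders, not real positions.
TOY_RECORDS = [
    ({"1", "SALVAT", "PAPASSEIT", "BARCELONA", "ES"}, (1.0, 1.0)),
    ({"2", "SALVAT", "PAPASSEIT", "BARCELONA", "ES"}, (2.0, 2.0)),
    ({"1", "SALVAT", "PAPASSEIT", "MADRID", "ES"}, (3.0, 3.0)),
    ({"1", "GRAN", "VIA", "BARCELONA", "ES"}, (4.0, 4.0)),
]


def matching_points(query, records):
    """Return the distinct points whose record contains every query token."""
    wanted = set(query)
    return {point for tokens, point in records if wanted <= tokens}


def minimal_entity(full_tokens, records):
    """Shortest token subset (by character length) that maps to exactly one point."""
    unambiguous = [
        subset
        for size in range(1, len(full_tokens) + 1)
        for subset in combinations(full_tokens, size)
        if len(matching_points(subset, records)) == 1
    ]
    if not unambiguous:
        return None
    # Prefer the fewest characters; break ties deterministically.
    return min(unambiguous, key=lambda s: (len(" ".join(s)), s))
```

How short the result gets depends entirely on what else is in the dataset: the more neighbouring addresses share tokens, the more tokens are needed to stay unambiguous. The exhaustive search is exponential in the number of tokens, so it only serves as an illustration.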
Many geocoding systems attempt to return as much data as possible (alternative road names, neighborhood name, timezones, geohashes, Ordnance Survey gridrefs, calling codes, what3words, sunrise/sunset, and more). All this seems unnecessary, since you can get the additional information if you perform a reverse geocoding lookup on the point.
Returning so much redundant information on an address lookup seems like a waste of space to me.
Let’s build a two-step geocoding system: one that a) returns only the minimal info on the first request and b) returns all the remaining location details on the second.
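A rough sketch of what the two steps could look like, in Python. Everything here is an assumption made for illustration: the function names `forward` and `reverse`, the in-memory dictionaries standing in for the address database, and the placeholder coordinates. It is not the geocode.xyz API.

```python
from typing import Optional

# Hypothetical in-memory stand-ins for the address database.
# The coordinates are placeholders, not the real position.
_POINTS = {
    "1 SALVAT PAPASSEIT, BARCELONA, ES": (1.0, 1.0),
}
_DETAILS = {
    (1.0, 1.0): {
        "place": "La Fàbrica del Sol",
        "street": "Passeig de Joan Salvat Papasseit",
        "number": "1",
        "postcode": "08003",
        "city": "Barcelona",
        "country": "Spain",
    },
}


def forward(query: str) -> Optional[dict]:
    """Step a): return only the minimal entity and its point."""
    key = query.strip().upper()
    point = _POINTS.get(key)
    if point is None:
        return None
    lat, lon = point
    return {"entity": key, "lat": lat, "lon": lon}


def reverse(lat: float, lon: float) -> Optional[dict]:
    """Step b): return the remaining details for a point.

    A real reverse geocoder would search for the nearest known point;
    the exact-match lookup here is a simplification.
    """
    return _DETAILS.get((lat, lon))
```

A client that only needs coordinates stops after `forward`. One that wants the postcode or the place name makes the second call, `reverse(hit["lat"], hit["lon"])`, and pays for the extra payload only then.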
CanadaPost’s lawsuit, now in its fourth year, is ongoing, and it looks like it is finally getting a court date soon (they have been quiet for a while, probably wishing to keep this under wraps until after the Federal Election).
Either way, we are still here, and we are still providing a free database of postal addresses and postal codes that is bigger and better than ever.
The main database has grown considerably in the last 4 years, further proof that crowdsourcing works! As of the last update on 2015-09-30, 12613 new postal codes were added, with the total now approaching 1 million.
You may download all these data for free at http://geocoder.ca/?freedata=1 (under the Creative Commons Attribution 2.5 Canada License.)
I will keep you posted on the latest developments from the federal court. All the best and thank you for your support.
Ervin Ruci Geocoder.ca
@Yapc::EU 2015 explaining fuzzy geocoding with Geocode.xyz
I gave a talk on crowdsourcing at the State of the Map (openstreetmap.org) conference. (The slides are posted on my Twitter feed https://twitter.com/geolytica/status/531501476708114432 , also available on Vimeo)
One observation I took home from the conference is that the state of public data around the world is similar to that in Canada, in the sense that governments and their affiliated entities hold on to the data for as long as possible, despite the fact that doing so adversely affects the state of their economies and goes against the public good. (According to OSM France’s “Bano” project, a country loses up to 0.5% of its GDP due to the lack of publicly available addressing data. Source)
Crowdsourcing the data is not an optimal solution in the absence of a data feed from the authoritative source, because it results in datasets that contain errors. Still, in my view this seems the only way to open up the data when the decision makers are convinced that keeping the data closed is better for their budgets. (An interesting figure of 0.5 billion pounds was thrown around as the value of a closed post code list by the Royal Mail CEO Moya Greene, in her arguments for keeping the dataset closed. She also happened to be Canadapost’s CEO at the time they started their legal efforts aimed at enforcing CP’s alleged intellectual property rights over Canadian postal codes.)
In France, on the other hand, they don’t have such problems, as they have not yet made the effort to create a post code system like the Canadian or the British ones, so it is hard to make the half billion dollar argument there. Still, that does not mean that whatever system they have is open. People from the “Bano” project had to lobby hard to get the list of up to 1000 postal codes created by the French postal service opened to the public, and when they finally succeeded, the list was full of errors. Not only that, but the French postal service has 4 different street address datasets (one for regular mail, one for advertising mail, one for parcels and another one for a purpose I can’t remember now). All 4 have quality issues, and the 4 different departments that created them neither talk nor cooperate with each other to improve their respective datasets. Funny stories of government inefficiency at the public’s expense. The Economist also wrote a piece on this topic a month ago.
In closing, public data all around the world is at various stages of unavailability because certain people of influence are convinced it is worth a lot of money. Nobody has yet shown how much money they are actually making from licensing this data. I doubt it is half a billion. Or 0.5% of the GDP.
I am certain it is more akin to a hidden tax we all have to pay.
11 of the 17 years I’ve been in Canada I’ve resided at this postal code: “K2C 1N5.” It is the only otherwise meaningless alphanumeric string I remember most clearly, having written it over the years on countless letters, web forms, tax forms, applications and the like. As of today I can no longer call it my ‘postal code.’ Because as of today I have been personally sued in federal court over the use of the trademarked word combination: ‘postal code.’ So, I am taking the personal decision to just call it my ‘zip code.’ You’ll still know what I mean.
Sounds incredible right? Wrong!
The Canadapost corporation has just amended their statement of claim against my company, geocoder.ca, to name me personally as a defendant in their ongoing legal action to assert copyright over everything ‘postal code.’ (Both the Amended Statement of Claim and the Amended Statement of Defense will be posted here soon).
These people do move slowly (it has been over a year since their original claim), but they do move, and just when I thought they were ready to drop their claims, which were ridiculous to begin with. It is funny, because just a few days ago I was reading in the news about the financial woes of the crown corporation, which enjoys a monopoly position in certain areas. It is also fitting, though, for a mismanaged company to spend its last dollars on lawyers; we’ve seen it before (remember SCO?).
Other than filling my mailbox with junk mail, misusing the legal system to pursue absurd claims appears to be what those overpaid pencil sharpeners see as the way out of the woods.
Maybe they envision a world where every website that mentions the word ‘postal code’ or uses the postal code to identify a location, pays a fee to the corporation. They want me to be the first one to do so.
They even extended their claim to another website I own, namely foodpages.ca, to seize any profits this website makes by having people type various location entities (address, intersection, city, postal code) to find a nearby restaurant. Perhaps in their ideal world, these profits belong to them because of the ‘postal code’ option.
Alas, there are thousands of websites that let users look up information associated with a postal code. Some of them use the geocoder.ca free XML port. There are lots of other alternatives too. Those websites would operate just fine if they stopped accepting postal codes, mind you. The user would simply have to type in a few more letters to enter their address instead.
In my case, though, they overdo it a bit. The revised statement of claim also mentions as targets the so-called “Ruci Websites” (FoodPages.ca, FoodHouse.com, AussieSalon.com, Yelpus.com, DineHere.us, FoodPages.us and SalonPages.com).
This is just absurdly funny. Their $1000 an hour lawyers cannot even muster the competence to run a simple whois query, which would reveal that at least three of these websites have nothing to do with me. Indeed, they are not websites at all, just domain names registered by some unknown spammer and plastered with ads. I just verified that.
Maybe someone at Canadapost just heard the proverbial saying that “there is money on the internet” and they unleashed their lawyers to get it. Maybe they think that their ‘postal code’ (if the courts decide it is ‘theirs’ in the first place) is the backbone of the internet and these websites cannot exist without this ingredient. So, they must pay the ‘postal code’ tax!
Seeing no end to this madness, I am taking the following preemptive action. I am renaming the ‘postal code’ to ‘zip code’ on every website, blog, application or other publication I write.
I will also kindly ask the federal government to stop asking me about my postal code on tax forms and in any other communications they have with me. For copyright reasons I might not be able to answer that question, and I also hold them responsible for bringing this situation to a head.
Finally, as I mentioned a few paragraphs ago, it is unfortunate that the corporation is facing financial problems. It is tough being short $1 billion a year. They will never get it from me though. From a very very large number of people in my position, maybe, but I doubt it. Taxpayers who cover their $1b shortfall, CIPPIC, which is representing me pro bono, myself, and those who have donated to my legal defense fund will bear the cost of such legal action.
A company founded and still operating on the mentality of the pre-internet age, or even the pre-transistor age, has to understand the realities of new technologies. Starting with the simplest fact: that a ‘postal code’ is not ‘solely for the purpose of sending mail via Canadapost’, as they claim. There are opportunities for this old company to make money in this new environment, and they are much better positioned than a one-man company like geocoder.ca to do so.
I do not know who advises the Canada Post CEO on all these matters, but I do have one piece of advice for him. Fire them!
Or better yet. Fire Yourself!