Vancouver business licence data mapping at Open Data Hack Day
Today at the Open Data Hack Day, we spent some time working with the Vancouver business licence data, trying to see if we could make an interesting visualization. The set has 59,356 rows of business licences registered in Vancouver and I was most interested in a map of employees per neighbourhood. Unfortunately, we found some problems with the data.
The biggest issue was that the number of employees per licence is self-reported and, as a result, very inaccurate. For example, Inlets Bistro & Lounge is reported as having 148,358 employees, which is obviously incorrect. Many businesses also have multiple licences, and it is unclear whether these cover different departments of the same company or are simply the same business reported more than once. As a specific example, PricewaterhouseCoopers LLP has 45 active business licences for the same address, with reported employee counts of 0, 1, 3, 700, and 850. Choosing how to use these records is then very difficult.
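One way to make those choices explicit is to flag the suspicious records before doing any aggregation. The sketch below is only an illustration, not something we ran on the day: the field names (`name`, `address`, `employees`), the placeholder addresses, the 10,000-employee cutoff, and the "Corner Cafe" row are all assumptions. Only the employee counts quoted above come from the dataset.

```python
from collections import defaultdict

# Hypothetical records: field names and addresses are placeholders,
# not the real column names or values from the licence dataset.
licences = [
    {"name": "Inlets Bistro & Lounge", "address": "A", "employees": 148358},
    {"name": "PricewaterhouseCoopers LLP", "address": "B", "employees": 700},
    {"name": "PricewaterhouseCoopers LLP", "address": "B", "employees": 850},
    {"name": "PricewaterhouseCoopers LLP", "address": "B", "employees": 0},
    {"name": "Corner Cafe", "address": "C", "employees": 6},  # made-up row
]

MAX_PLAUSIBLE = 10_000  # assumed cutoff; pick whatever suits your analysis


def flag_implausible(rows, limit=MAX_PLAUSIBLE):
    """Return the rows whose self-reported employee count exceeds the limit."""
    return [row for row in rows if row["employees"] > limit]


def group_duplicates(rows):
    """Map (name, address) to its employee counts, keeping only repeated pairs."""
    groups = defaultdict(list)
    for row in rows:
        groups[(row["name"], row["address"])].append(row["employees"])
    return {key: counts for key, counts in groups.items() if len(counts) > 1}


implausible = flag_implausible(licences)
duplicates = group_duplicates(licences)
```

This does not decide which of the duplicate counts is right (sum, maximum, or something else); it just surfaces the records where that decision has to be made.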
Another problem is that although the business types and sub-types are fairly well defined within the dataset, they do not match the classifications used by external sources. One suggestion that arose was to conform to the North American Industry Classification System, which would allow the data to be mashed up with data from Statistics Canada. This could allow for a much richer analysis.
We managed to geocode the data using Google Fusion Tables (many thanks to Tim), which resulted in the following map:
It was very interesting to see where all of the businesses were; some of them were even located in the US. I also managed to create a map of the reported number of employees for each neighbourhood from the licences:

This is not a very useful map since the data was so bad and it is certainly not reflective of the actual number of employees in Vancouver, but it was interesting to try.
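For anyone who wants to repeat the experiment, the per-neighbourhood totals behind a map like this come down to a group-and-sum. Here is a minimal sketch, assuming each geocoded licence row carries a `neighbourhood` and an `employees` field (both hypothetical names). The optional `cap` is my own addition to limit the damage from outliers, and the sample rows are made up for illustration.

```python
from collections import Counter


def employees_by_neighbourhood(rows, cap=None):
    """Sum reported employees per neighbourhood.

    If cap is given, each licence contributes at most `cap` employees,
    which blunts the effect of obviously wrong self-reported counts.
    """
    totals = Counter()
    for row in rows:
        count = row["employees"]
        if cap is not None:
            count = min(count, cap)
        totals[row["neighbourhood"]] += count
    return dict(totals)


# Made-up sample rows; field names are assumptions, not the real schema.
sample = [
    {"neighbourhood": "Downtown", "employees": 850},
    {"neighbourhood": "Downtown", "employees": 148358},
    {"neighbourhood": "Kitsilano", "employees": 12},
]

raw = employees_by_neighbourhood(sample)
capped = employees_by_neighbourhood(sample, cap=1000)
```

Comparing `raw` with `capped` shows how much a single bad record can dominate a neighbourhood's total, which is essentially what went wrong with the map above.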

Please see http://code.google.com/p/google-refine/ for all of your data-refinement needs!
You still have to figure out why your data is inconsistent, but at least you'll be using fancy tools to do so.
Oh wow, this looks really great. Love that it installs on your computer too so you're not uploading huge datasets to Google to be lost in the ether. Awesome!