Born to Dog Run: My Fight With The NYC OpenData Dog Run API


NYC OpenData API

Last week I was introduced to the NYC OpenData API and I started poking around. I thought it would be fun to make a little program and see if the past few months of learning Ruby could actually produce something tangible. I figured that there was going to be some data set among the 1300+ on the NYC OpenData site that was both manageable and interesting enough to devote some time to parsing. So I found myself gravitating towards the one that had dogs.

NYC OpenData provides an XML directory of dog runs and off-leash areas. A typical entry looks like this:

<Name>Wallenberg Forest Park Off-Leash Area</Name>
<Address>Palisade Avenue, Douglas Avenue, West 235 Street</Address>

It is a nightmare

Honestly this data set is not malicious. It is just very confused. It does not appear to know what it wants to let its users know or even if it wants its users to know anything at all. I am guessing that this data set just lists all parks that contain "dog friendly areas." This information might be easy for them to pull off their own website, as parks with dog friendly areas are flagged with a dog friendly cartoon icon. But it does not make for easy programming.

Any program that uses the Dog Run API would likely want to know the name of the dog friendly area, its location, and whether it is a dog run or a larger off-leash area. However, the entry above does not make the first two pieces of data very clear or standardized.
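To make those three pieces of data concrete, here is a minimal sketch of pulling them out of the XML with REXML, which ships with Ruby. The `<Name>` and `<Address>` values are the ones quoted in this post, but the `<DogRuns>` and `<Facility>` wrapper elements are assumptions about the feed's layout, and guessing the type from the name is a heuristic of this sketch rather than anything the data set promises.

```ruby
require "rexml/document"

# Sample data: the <Name> and <Address> values come from entries quoted in
# this post. The <DogRuns> and <Facility> wrappers are assumed, not confirmed.
SAMPLE_XML = <<~XML
  <DogRuns>
    <Facility>
      <Name>Wallenberg Forest Park Off-Leash Area</Name>
      <Address>Palisade Avenue, Douglas Avenue, West 235 Street</Address>
    </Facility>
    <Facility>
      <Name>Chelsea Waterside Park Dog Run</Name>
      <Address>11th Ave and 22nd Street</Address>
    </Facility>
  </DogRuns>
XML

# Returns the stripped text of a child element, or "" if it is missing or empty.
def child_text(element, tag)
  child = element.elements[tag]
  child ? child.text.to_s.strip : ""
end

# Turns the XML into an array of hashes with :name, :address and :kind keys.
def parse_dog_runs(xml)
  doc = REXML::Document.new(xml)
  doc.get_elements("//Facility").map do |facility|
    name = child_text(facility, "Name")
    {
      name: name,
      address: child_text(facility, "Address"),
      # Heuristic (an assumption): guess the type from the wording of the name.
      kind: name.match?(/off-leash/i) ? :off_leash_area : :dog_run
    }
  end
end

parse_dog_runs(SAMPLE_XML).each do |run|
  puts "#{run[:kind]}: #{run[:name]} (#{run[:address]})"
end
```

Even in this tidy version, the address stays a free-form string, which is where the real trouble starts.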

Wallenberg Forest

First, consider the name. In the above example, the dog run is called "Wallenberg Forest Park Off-Leash Area." This area is part of the Raoul Wallenberg Forest, a beautiful little patch of trees across the street from my elementary school (P.S. 24 class of '99 RULES!). But until I searched for "Raoul Wallenberg Forest" while writing this post I had no idea that was the official name for this park. I had always referred to it mentally as the part of Seton Park where the kids who had pegs on their bikes hung out. This is not a very useful name either, though it made me think more carefully about all of the names in the API. Also it made me feel a longing for my lost childhood.

An Address Ain't Nothing Without A Number

While I could moan and groan about the official name of the park where I learned to ride a bike (that didn't even have pegs), naming wouldn't matter if I had a good address to look up. Other APIs in NYC OpenData (like the DYCD After School Program API) have location attributes that list both latitude and longitude coordinates and a human readable street address. As you can see in the Wallenberg Forest entry, the dog run API has neither:

<Address>Palisade Avenue, Douglas Avenue, West 235 Street</Address>

The address is just a comma separated list of three streets that do not intersect (here I am pulling data from my childhood). Some address entries are more specific and list intersections or ranges of blocks. A few entries indicate the whole park is dog friendly. The address attribute for Central Park is particularly fun:

Though there are no enclosed dog runs, there are 23 particularly dog-friendly areas scattered throughout the Park.

No addresses indicate what borough or zip code a dog run is in. And many entries are empty.
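The varieties of address described above can be roughly sorted with a few string checks. This is only a sketch: the category names, the regular expression, and the twelve-word cutoff are all assumptions chosen to fit the handful of examples quoted in this post, and real entries would surely find ways around them.

```ruby
# Roughly sorts an address string into one of the shapes seen in the data set.
# The categories and thresholds are illustrative assumptions, not a standard.
def classify_address(address)
  text = address.to_s.strip
  return :empty if text.empty?

  # Full sentences (like the Central Park entry) are descriptions, not addresses.
  return :description if text.end_with?(".") || text.split.size > 12

  # "11th Ave and 22nd Street" style entries name a single intersection.
  return :intersection if text.match?(/\s(and|&|at)\s/i)

  # A bare comma-separated list of streets, like the Wallenberg Forest entry.
  return :street_list if text.include?(",")

  :unknown
end

[
  "Palisade Avenue, Douglas Avenue, West 235 Street",
  "11th Ave and 22nd Street",
  "Though there are no enclosed dog runs, there are 23 particularly dog-friendly areas scattered throughout the Park.",
  ""
].each do |address|
  puts "#{classify_address(address)}: #{address.inspect}"
end
```

Only the intersection shape looks like something a geocoder has a fair chance with; the others would probably need a human.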

This does not make it easy to search for dog runs in any reasonable way. Although it did make me realize that I was being unreasonable in my desire for dog runs to have a mailing address that I could look up. But dog runs, even the giant off-leash areas of Central and Prospect parks, have latitude and longitude coordinates. And this API failed to even include those. Originally I had hoped to be able to write a program that showed dog runs close to a given location. But at this point it looked as if I would just find many particularly dog-friendly areas scattered throughout a mess of XML.


Back when I was just a fresh faced kid from the Bronx, I assumed that I could use some Ruby code to compare one set of latitude and longitude coordinates to another and compute the distance between those two points. But even if the API returned accurate latitude and longitude coordinates, I knew that the math involved would be hard, and I have forgotten my spherical geometry. Luckily the Geocoder gem remembers those things.

Geocoder can do many more things than just spherical geometry. In fact, its documentation discusses the geometry almost as an afterthought. Geocoder helps create geocoded address objects that know many things about their relationship with other geocoded objects. But instead of trying to understand our entire geocoded world, I focused on two methods: distance_between and coordinates.

First, Geocoder contains a Calculations module with a distance_between method. This method takes two sets of latitude and longitude coordinates (each as an array) and returns the distance (in miles) between them:

flatiron_building = [40.741109, -73.989452]
flatiron_school = [40.705329, -74.0139696]
Geocoder::Calculations.distance_between(flatiron_building, flatiron_school)
=> 2.7856416746890913
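For the curious, the spherical geometry I was dreading can be sketched with the haversine formula in plain Ruby. This is not necessarily how Geocoder computes its answer; the function names and the Earth radius constant are my own assumptions, so small differences from the gem's result are to be expected.

```ruby
# Mean Earth radius in miles (an approximation; the Earth is not a perfect sphere).
EARTH_RADIUS_MILES = 3958.8

def to_radians(degrees)
  degrees * Math::PI / 180
end

# Great-circle distance in miles between two [latitude, longitude] arrays,
# using the haversine formula.
def haversine_miles(point_a, point_b)
  lat1, lon1 = point_a.map { |degrees| to_radians(degrees) }
  lat2, lon2 = point_b.map { |degrees| to_radians(degrees) }

  a = Math.sin((lat2 - lat1) / 2)**2 +
      Math.cos(lat1) * Math.cos(lat2) * Math.sin((lon2 - lon1) / 2)**2

  2 * EARTH_RADIUS_MILES * Math.asin(Math.sqrt(a))
end

flatiron_building = [40.741109, -73.989452]
flatiron_school = [40.705329, -74.0139696]
puts haversine_miles(flatiron_building, flatiron_school).round(2)
```

The result should land within a few hundredths of a mile of the Geocoder figure above, which is plenty close for finding a dog run.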

Even more powerful is the coordinates method that takes in a string and attempts to return an array with latitude and longitude coordinates.

Geocoder.coordinates("11 Broadway, New York, NY, 10004")
=> [40.705329, -74.0139696]

This method could be treated like Google Maps. Except that if it did not know how to treat the string like an address, it would break down. So can it handle dog runs?

Geocoder.coordinates("Palisade Avenue, Douglas Avenue, West 235 Street")
=> nil

Not quite.

Finding Some Dog Runs, Some Of The Time

But Geocoder can find some of the dog runs, some of the time. When the dog run address contains an intersection, Geocoder runs perfectly. For example, the Chelsea Waterside Park Dog Run has an address of "11th Ave and 22nd Street". By appending "New York, NY" to the string to let Geocoder know what city the dog run is in, I was able to get the correct coordinates.

Geocoder.coordinates("11th Ave and 22nd Street, New York, NY")
=> [40.74819850000001, -74.00740619999999]
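The append-the-city trick can be wrapped in a small helper that also skips the empty entries. The sketch below is an assumption about how one might structure it: `geocode_dog_run` and `STUB_LOOKUP` are hypothetical names, and the lookup is passed in so the example runs without a network connection. The stub only knows the one result shown above.

```ruby
# Offline stand-in for a real geocoding call. It only knows the single
# result quoted in this post and returns nil for everything else.
KNOWN_COORDINATES = {
  "11th Ave and 22nd Street, New York, NY" => [40.74819850000001, -74.00740619999999]
}.freeze
STUB_LOOKUP = ->(query) { KNOWN_COORDINATES[query] }

# Appends the city to a dog run address and hands it to the given lookup.
# Returns [latitude, longitude], or nil for blank or unrecognized addresses.
def geocode_dog_run(address, lookup:, city: "New York, NY")
  text = address.to_s.strip
  return nil if text.empty?

  lookup.call("#{text}, #{city}")
end

p geocode_dog_run("11th Ave and 22nd Street", lookup: STUB_LOOKUP)
p geocode_dog_run("Palisade Avenue, Douglas Avenue, West 235 Street", lookup: STUB_LOOKUP)
p geocode_dog_run("", lookup: STUB_LOOKUP)
```

With the gem installed, passing `lookup: Geocoder.method(:coordinates)` in place of the stub should send the same strings to the real service.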

You can check here. In fact, enough of the dog runs have addresses that trying to parse the DogRun XML file produced a number of accurate addresses. Storing these addresses in an array allowed me to see how close a given location was to a dog run. Even though it only works a little bit, some of the time.
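The "how close is a dog run" step can be sketched as a sort over whatever entries managed to get coordinates. This is an illustration rather than the program linked below: the names are my own, the distance uses a flat-map (equirectangular) approximation that is only reasonable at city scale, and the second entry in the sample list is hypothetical, borrowing the Flatiron Building coordinates from earlier in this post.

```ruby
# Approximate miles per degree of latitude.
MILES_PER_DEGREE = 69.09

# Flat-map (equirectangular) distance in miles between two [lat, lon] arrays.
# Fine across a city, increasingly wrong over long distances.
def approx_miles(a, b)
  mean_lat = (a[0] + b[0]) / 2 * Math::PI / 180
  dlat = b[0] - a[0]
  dlon = (b[1] - a[1]) * Math.cos(mean_lat)
  MILES_PER_DEGREE * Math.sqrt(dlat**2 + dlon**2)
end

# Returns the closest geocoded runs to origin, nearest first.
# Entries whose address could not be geocoded (nil coordinates) are skipped.
def nearest_dog_runs(origin, runs, limit: 3)
  runs
    .reject { |run| run[:coordinates].nil? }
    .map { |run| run.merge(miles: approx_miles(origin, run[:coordinates])) }
    .sort_by { |run| run[:miles] }
    .first(limit)
end

RUNS = [
  { name: "Chelsea Waterside Park Dog Run",
    coordinates: [40.74819850000001, -74.00740619999999] },
  # Hypothetical entry for illustration only; not in the data set.
  { name: "Hypothetical run at the Flatiron Building",
    coordinates: [40.741109, -73.989452] },
  # Geocoding returned nil for this address, so it has no coordinates.
  { name: "Wallenberg Forest Park Off-Leash Area", coordinates: nil }
].freeze

flatiron_school = [40.705329, -74.0139696]
nearest_dog_runs(flatiron_school, RUNS).each do |run|
  puts "#{run[:miles].round(2)} miles: #{run[:name]}"
end
```

Dropping the entries without coordinates is the "some of the time" part: they simply never show up in the results.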

So I wrote a program to do this. It works kinda slowly but it gets the job done (kinda). Here is a link to it on Github. But when you have a slow dog who doesn't behave all of the time do you get rid of it? Or do you take it to a particularly dog friendly area and have a fetch?