The Simple Act of Integrating a Weather API

The new LPGA website has launched. I talk about one of the features that should have been pretty easy, and all of the tools I used to solve it :)

Over the course of developing the new LPGA website, one requirement came up that really didn't cause me any concern at all: display "real-time" weather for a live tournament. The third-party API landscape is rich, and since this is part of an official project, I can get a paid subscription for something if it comes to that. However, this wasn't even our first option.

Initially, the LPGA's data team, who I like and deal with a bit, said they would bring the weather into their database, which is where I pull all of my data from: stats, past tournament data, results, even scorecards. There are millions of records. So that was great, and we waited on that. However, when it was added to their API, I noticed it was not updating, so I asked. Their intention was to pull in the weather at the end of the tournament. They're not getting a real-time feed either; they just get it in a single dump at the end from their scoring partner. Our requirements were a bit different, since we wanted to show the weather on the website while a tournament is going on. I needed to find something else.

Enter tomorrow.io. I figured anything would do: if we don't end up using tomorrow.io, I can keep the interface the same and just pull the data from a new API. Development plowed forward. I got myself an API key, read through the documentation, built it into my data sync tool to store the tournament data and the weather by date, and set it to run every 30 minutes. The downside of the free tier API key that I got is that I'm only able to make 4 requests per hour. The sync runs twice an hour for each of the LPGA and Epson Tour, but only when that tour has a tournament going on. That's 4 requests in the busiest possible hour, and over an extended period of time it'll average much less than that. This was ready! So easy :)
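Here's a minimal sketch of what "keep the interface the same" could look like in Go. Everything in it is made up for illustration: the WeatherProvider interface, the Conditions struct, and the fixedProvider stand-in are hypothetical names, not my actual sync tool or tomorrow.io's API. A real provider would make an HTTP call inside Current.

```go
package main

import (
	"errors"
	"fmt"
)

// Conditions is a hypothetical, provider-neutral weather reading.
type Conditions struct {
	TemperatureC float64
	Summary      string
}

// WeatherProvider is the seam: the sync tool only depends on this interface,
// so swapping weather APIs means writing one new implementation.
type WeatherProvider interface {
	Current(lat, lon float64) (Conditions, error)
}

// fixedProvider is a stand-in for a real HTTP-backed provider. It returns
// canned conditions so the sketch runs without a network or an API key.
type fixedProvider struct{ canned Conditions }

func (p fixedProvider) Current(lat, lon float64) (Conditions, error) {
	if lat < -90 || lat > 90 || lon < -180 || lon > 180 {
		return Conditions{}, errors.New("coordinates out of range")
	}
	return p.canned, nil
}

// describe formats a reading from whichever provider is plugged in.
func describe(p WeatherProvider, lat, lon float64) (string, error) {
	c, err := p.Current(lat, lon)
	if err != nil {
		return "", err
	}
	return fmt.Sprintf("%.1f C, %s", c.TemperatureC, c.Summary), nil
}

func main() {
	p := fixedProvider{canned: Conditions{TemperatureC: 21.5, Summary: "clear"}}
	out, err := describe(p, 46.4, 6.59)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(out)
}
```

The point is only that describe never knows which API is behind the interface.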

Tomorrow.io has a few options to look up the weather for a specific location. It can take a city name, or a US zip code, or a UK postcode, or latitude and longitude. The LPGA is a global organization with tournaments happening all over the world every year. The name and postal code options would not suffice: I did try passing some international locations to it, without success. That left latitude and longitude.

Quickly realizing that this wouldn't work, I set out to find a suitable option for getting the coordinates of any location. I've done this before with geolocation APIs, but I wanted to go a different route. In my search I found geonames.org. It offers a ton of data as downloadable text files (great), but much of it is data that I'm not really interested in. To leave myself the most options, I downloaded the file that indicates it has data for all cities with a population of 500 people or more, cities500.txt. That's a TON of cities. The zip file for it is 11 MB and it contains a single text file that is 35 MB! It's big.

So I looked to extract the data that I wanted and put it into a format that would be simple. This led to my Go project that I simply called geonames. It reads the data and outputs the city, country, state, time zone, and the coveted latitude and longitude. geonames.org also offers a lot of different file formats for different types of data, and my program probably only works against the cities files. Geonames is a great resource! The other issue I ran into at this early stage was that the country codes in the geonames.org files are two characters, but my data source, the LPGA data, has country codes that are 3 characters. Ok, I'll need a mapping. Luckily, I found this almost immediately.
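For the curious, here's a rough sketch of the extraction step, not the actual geonames code. It assumes the documented layout of the geonames.org cities files: 19 tab-separated columns, with the name in column 1, latitude and longitude in columns 4 and 5, country code in column 8, the admin1 code (the state abbreviation for US cities) in column 10, and the time zone in column 17, counting from zero. The City struct, its JSON field names, and the sample line's values are made up for illustration.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strconv"
	"strings"
)

// City holds the handful of columns worth keeping. The JSON field names
// are hypothetical, not the real output format.
type City struct {
	Name     string  `json:"city"`
	Country  string  `json:"country"`
	State    string  `json:"state,omitempty"`
	TimeZone string  `json:"timezone"`
	Lat      float64 `json:"lat"`
	Lon      float64 `json:"lon"`
}

// parseLine converts one tab-separated line of a cities file into a City.
func parseLine(line string) (City, error) {
	f := strings.Split(line, "\t")
	if len(f) < 19 {
		return City{}, fmt.Errorf("expected 19 tab-separated fields, got %d", len(f))
	}
	lat, err := strconv.ParseFloat(f[4], 64)
	if err != nil {
		return City{}, fmt.Errorf("bad latitude %q: %w", f[4], err)
	}
	lon, err := strconv.ParseFloat(f[5], 64)
	if err != nil {
		return City{}, fmt.Errorf("bad longitude %q: %w", f[5], err)
	}
	return City{Name: f[1], Country: f[8], State: f[10], TimeZone: f[17], Lat: lat, Lon: lon}, nil
}

func main() {
	// Made-up sample values in the real column layout.
	line := "1\tOrlando\tOrlando\t\t28.5\t-81.4\tP\tPPL\tUS\t\tFL\t095\t\t\t300000\t\t30\tAmerica/New_York\t2020-01-01"
	c, err := parseLine(line)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	b, _ := json.Marshal(c)
	fmt.Println(string(b))
}
```

Outside the US the admin1 column is not a state abbreviation, so treating it as "state" is a simplification.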

So now my process is this: take the country from the tournament, find the 2-letter code, then take the city, optional state, and 2-letter country code to find the location and get its coordinates, and use those to get the weather. DONE!
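That process can be sketched roughly like this. It's a simplified illustration, not my real code: the Place struct, the locate function, the three-entry country map, and the sample coordinates are all made up for the example (the codes themselves are the standard ISO 3166-1 ones). A linear scan is fine for a sketch; a real version over hundreds of thousands of cities would want a map keyed on city and country.

```go
package main

import (
	"fmt"
	"strings"
)

// Place is a hypothetical record from the generated cities data.
type Place struct {
	City, State, Country string
	Lat, Lon             float64
}

// alpha3to2 maps 3-letter country codes to 2-letter ones.
// Only a few entries are shown here.
var alpha3to2 = map[string]string{"USA": "US", "FRA": "FR", "KOR": "KR"}

// locate finds a place by city, optional state, and 3-letter country code.
func locate(places []Place, city, state, country3 string) (Place, error) {
	cc, ok := alpha3to2[strings.ToUpper(country3)]
	if !ok {
		return Place{}, fmt.Errorf("no 2-letter code for %q", country3)
	}
	for _, p := range places {
		if !strings.EqualFold(p.City, city) || p.Country != cc {
			continue
		}
		// The state is only checked when the tournament provides one.
		if state != "" && !strings.EqualFold(p.State, state) {
			continue
		}
		return p, nil
	}
	return Place{}, fmt.Errorf("no match for %q (state %q, country %s)", city, state, cc)
}

func main() {
	// Illustrative coordinates, not taken from the real data.
	places := []Place{
		{City: "Orlando", State: "FL", Country: "US", Lat: 28.5, Lon: -81.4},
		{City: "Evian-les-Bains", Country: "FR", Lat: 46.4, Lon: 6.59},
	}
	p, err := locate(places, "Orlando", "FL", "USA")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("%s: %.2f, %.2f\n", p.City, p.Lat, p.Lon)
}
```

The coordinates that come back are what gets passed to the weather lookup.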

This was obviously not done. The first issue I came across was a tournament in a city that didn't appear in the geonames.org database. For the curious, this was Galloway, NJ. I checked the population, and it's over 500... (35K). So I don't know why it wasn't in there. The problem with cities500.txt, and even my generated JSON, which is still 24 MB, is that you can't just open it in a text editor and search it. So I actually wrote other utilities to find certain strings within the file and output the entire line. But I digress.
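A utility like that can be tiny. This is a sketch of the idea rather than the tool I actually wrote; matchingLines and matchingLinesInString are hypothetical names, and in real use you would pass in a file from os.Open instead of a string.

```go
package main

import (
	"bufio"
	"fmt"
	"io"
	"strings"
)

// matchingLines returns every full line from r that contains needle.
func matchingLines(r io.Reader, needle string) ([]string, error) {
	sc := bufio.NewScanner(r)
	// Allow lines up to 1 MB, since some lines carry long lists of alternate names.
	sc.Buffer(make([]byte, 0, 64*1024), 1024*1024)
	var out []string
	for sc.Scan() {
		if strings.Contains(sc.Text(), needle) {
			out = append(out, sc.Text())
		}
	}
	return out, sc.Err()
}

// matchingLinesInString is a convenience wrapper for in-memory text.
func matchingLinesInString(text, needle string) ([]string, error) {
	return matchingLines(strings.NewReader(text), needle)
}

func main() {
	// Made-up sample lines standing in for the real file.
	sample := "1\tOrlando\tUS\n2\tGalloway\tUS\n3\tParis\tFR\n"
	lines, err := matchingLinesInString(sample, "Galloway")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for _, l := range lines {
		fmt.Println(l)
	}
}
```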

So instead of seeing how Galloway, NJ might be represented in the cities500.txt file, I just decided to modify my geonames program to be able to add cities. It takes a separate manual.json file as its input, in the same format as the output, but with data pre-populated. It just adds those entries to the list after processing the cities500.txt file.
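The merge step might look something like this sketch. The City struct and its JSON field names are assumptions for the example, not the real manual.json schema, and the Galloway coordinates are approximate.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// City uses hypothetical JSON field names shared by the generated output
// and the manual entries.
type City struct {
	Name     string  `json:"city"`
	Country  string  `json:"country"`
	State    string  `json:"state,omitempty"`
	TimeZone string  `json:"timezone"`
	Lat      float64 `json:"lat"`
	Lon      float64 `json:"lon"`
}

// mergeManual appends hand-maintained entries to the cities parsed from
// the geonames file.
func mergeManual(parsed []City, manualJSON []byte) ([]City, error) {
	var manual []City
	if err := json.Unmarshal(manualJSON, &manual); err != nil {
		return nil, fmt.Errorf("reading manual entries: %w", err)
	}
	return append(parsed, manual...), nil
}

func main() {
	parsed := []City{{Name: "Orlando", Country: "US", State: "FL", TimeZone: "America/New_York", Lat: 28.5, Lon: -81.4}}
	// In real use this would come from os.ReadFile("manual.json").
	manual := []byte(`[{"city":"Galloway","country":"US","state":"NJ","timezone":"America/New_York","lat":39.46,"lon":-74.47}]`)
	all, err := mergeManual(parsed, manual)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(len(all), "cities, last one:", all[len(all)-1].Name)
}
```

Because the manual entries are appended last, a lookup that stops at the first match will prefer the geonames entry if a city ever shows up in both places.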

While it's an extra step, a lot of tournaments are held in the same city and on the same course every year, so this won't have to be updated too frequently. Now I'm DONE!

Not quite :) I came across my first international tournament. It being international shouldn't have been a problem, and if the entire world spoke English and used only letters in the English alphabet, I would have been done. But of course, that's not going to be the case. The geonames.org database and text files are UTF-8 and use the local spellings, while the LPGA database takes a city like "Évian-les-Bains" and turns it into "Evian-les-Bains, France": same names, sans accents. Back to my geonames program!

I googled for a little bit, and learned about the Go package transform (golang.org/x/text/transform). Together with its sibling packages runes and norm, it offers exactly what I need, which is a way to turn accented characters in a string into their closest representation in ASCII. I used it like this:

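// clean strips accents from name, e.g. "Évian" becomes "Evian".
// Imports: log, unicode, and golang.org/x/text/{runes,transform,unicode/norm}.
func clean(name string) string {
	// NFD splits each accented character into a base letter plus combining
	// marks; runes.Remove then drops the marks (Unicode category Mn).
	result, _, err := transform.String(transform.Chain(norm.NFD, runes.Remove(runes.In(unicode.Mn))), name)
	if err != nil {
		log.Println(err)
		return name
	}
	return result
}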

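It transforms a string using norm.NFD (norm is from the golang.org/x/text/unicode/norm package), Unicode normalization form D, which decomposes each accented character into a base letter followed by its combining marks. Then runes.Remove drops every rune that is in unicode.Mn, or "Mark, nonspacing", which leaves just the base letters. I don't really know much about Unicode normalization or other specific usages of the transform package, but it mostly worked. Evian-les-Bains was now in my database, not Évian-les-Bains. This time I might be done, finally!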

It has now been weeks since I've had to touch this solution, and we've been running it in testing against many tournaments throughout. No doubt I'll find cities that don't exist in the cities500.txt database, and those will have to be updated. Maybe I'll try to automate that as well, or provide an admin interface to update my cities.json file! Lots of options. In the meantime, enjoy weather on the new lpga.com tournaments!

Happy coding!
