Data Science, Vision Quests, and The Raft

18 Dec 2013

Wherein I ramble about a few of the things on my mind as I hurtle eastward at 780km/h.

Data Science: Like regular science, but with … data?

Throughout the last few years there has been a lot of hype surrounding the role of the so-called “Data Scientist” (DS), to the extent that some journalists have called it the “Sexiest Career of the decade” (Google it). A somewhat rare (but less so every minute) specimen, the DS simultaneously possesses several moderately scarce skills, such as fundamental knowledge of statistics and linear algebra, and a proficiency in one-or-more programming languages. They may have even blogged about some application of the former to the latter (or vice versa).

The DS’s role is theoretically that of a scientist; scientists ask questions (form hypothesis) and attempt to divine truth through empirical experiments that produce data.

The Vision Quest

Any project which is not guided by a question with a tangible answer, is analogous to a drug-fuelled vision quest. If something cannot be measured it does not exist. Success is impossible if it has not been defined.

That is not to say that vision quests are not inherently valuable experiences. When engaged on a vision quest (hallucinatory or otherwise) we are exploring an unfamiliar landscape, all the while developing contextual awareness. These can also be fantastic learning experiences for all those involved (but not necessarily the most efficient learning experiences).

Projects develop from questions, and questions are formed/found in vision quests. Before you start a company, I’ll expect that you’ve already returned from the wilderness, and any lingering effects of the many hallucinogens involved have worn off.

If you come back from a vision quest without a question, or at least an interesting story, you had a bad trip.

The Raft

Picture a mysterious island, separated from the mainland by a treacherous or otherwise shenanigan-ridden body of water. The Analyst wishes to explore the island, it’s not entirely clear why (one might speculate that the island is populated by licentious bayesian nudists). At the shore, she gathers the tools that she’ll need for the journey: A pocket-knife, first-aid kit, a working knowledge of postgres, and a copy of “Python for Data Analysis” by Wes McKinney. By using her noteworthy intuition, and through no small amount of frustration, she assembles out of the surrounding wilderness something that resembles a pile of leaves and sticks. This is the raft. After convincing herself (via logical inference) that the raft will indeed float, she throws it into the water, and jumps aboard. As if through providence, the raft doesn’t disintegrate.

Some days pass …

One morning the Analyst returns, ragged and weary, and covered in something that could be described as blood, sweat and tears, but not necessarily limited to those three. A crazed look on her face, she (somewhat deliriously) explains that the other side was wonderful, and that the rest of the world should have the opportunity to experience it.

A group of engineers is commissioned to build a bridge across the water: the project is completed over budget within some reasonable time frame.

Wheels within wheels

Back when I was commuting in New York, I ended up reading quite a bit. During that time I became a fan of Neil Stephenson’s work—Tangent: I highly recommend “Anathem”, and “The Cryptonomicon”.

The idea that really stuck with me after reading the Cryptonomicon, is made even more poignant in today’s world of easily accessible/available compute. I’m paraphrasing, for the actual (much better written) quote you’ll need to read the book:

Imagine yourself in London, and you had attached a special device to the top all the citizen’s heads. This device simply measures the altitude (to arbitrary precision) every second or so—thus you could infer from slight changes in altitude when a person is walking, or when they stepped on to, or off of a street curb.

The postulate: if you combined all the data from all of these devices from as many people as possible, you could infer (with exceptional accuracy) the entire (internally consistent) street plan of London, down to the locations of individual curbs.