spencer nelson

"How do I become a data scientist?"

I got an email recently asking something along these lines:

I'm a smart ex-engineer who likes stats. I want to be a data scientist. How difficult will it be for me to find a job doing data science work at a startup?

I think a lot of people have this question. It's interesting that it's common given how hard it is to find a quality data scientist. I sent back an email which looked more or less like the following post - these are my thoughts on how to get hired, as seen from the eyes of someone who actually really wants to find more talented data scientists to hire.

Quantitative intuition

The first thing that I'm looking for is general quantitative sense. This is a really low bar. It's not really important whether someone knows the quirks of Cauchy distributions or whatever. All I want is to see lots of confidence in thinking about things quantitatively. Example things I'd talk about:

(answer: retention, obviously - compounding interest is kind of a big deal)

(answer: they should expect a very steep concentration of viewers in a small set of channels, pretty much anyone who has been around large data sets quickly notices the power law distributions that show up everywhere)

Programming skill

The second thing that I'm looking for is programming skill. This is less important than quantitative sense, since it's easier to teach, but teaching takes a lot of time so it's still pretty important. Again, not looking for a whole lot here, mostly willingness to get their hands dirty. Example questions:

(answer: I'm really checking that they don't overthink it here - simple, direct solutions are what we want. This is a really simple problem and they should recognize that, not hem and haw or try to be cute. Regex, subtring matching with python, perl, grep - all are fine. The point is whether you recognize that it's simple.)

(answer I don't expect anyone to say yes to all of these or anything - this just points to breadth of experience, and more importantly it predicts how quickly they could start doing real work, because that summarizes our toolset pretty well)

Business knowledge

The third thing is business knowledge and creativity. This is hard to evaluate, but it's probably the most important facet. Depth of knowledge in our industry is tremendously helpful: it lets someone put their numbers in context and give them a sense of what's reasonable.

And that's pretty crucial. It's really dangerous to have a data scientist who doesn't know what "reasonable" is - they have a tough time identifying bugs or bad data.

For this I usually just ask them to talk a little about what they'd be interested in researching in their first few months at twitch. Bad answer: "I want to use ensemble methods incorporating deep learning to make recommendations!" Good answer: "I want to figure out what differentiates successful League of Legends broadcasters from unsuccessful ones!"

Oh yeah, and also the usuals

The whole time, I'm also trying to evaluate some intangibles, like

Nobody is perfect

This might be an intimidating list of requirements at a glance, but I don't think the bar is very high in any area, really. The hard part is finding someone who is reasonably good in all three areas.

Most startups need generalist data scientists; Kaggle champs who do tons of machine learning just don't have enough work to do at a startup, and analysts who can't program will need too much support to be a net positive. So, the goal is breadth of skills more than anything. I'd bet it's different at much larger companies than Twitch - that's just my experience.

That said, the ideal is a T-shaped person who is waaaaay better than anyone else on the current team in at least one area so that they can help us all get better.

Be a generalist!

Anyways, I hope that gives you some sense of the way I think about things when looking for somebody. I don't think every company would have the same criteria as me - smaller companies will care more about generalism and gumption, while larger companies will look for more specialized people who do one axis particularly well, but I think the overall scheme is pretty common.

Now is a good time to slip in the ad: Twitch is hiring like crazy! It's a great place to work, you'd love it here. Seriously.

So drop me a line: s@spenczar.com (PS: We're looking for data engineers too, not just science-y analyst types!)