Stackline data scientist Ameya Gokhale is headed to the SoCal AI & Data Science Conference this weekend to present his hierarchical product classification model. We pulled him away from his work for a quick conversation about data science at Stackline and his career journey thus far.
Tell me about life before Stackline.
During university, I started considering career opportunities in the overlap between what I liked, what I was good at, and where I could find the most career potential. Data science was a natural fit. Plus it was named ‘the sexiest career of the 21st century,’ so I figured I had to be on the right track!
But after taking a role in digital advertising management in my home country of India, I quickly realized I wanted to work in a more technical industry, so I applied to Carnegie Mellon’s graduate program, which included three amazing semesters in Adelaide, Australia. After that, I returned to the States to find a research-oriented company where I could really fast-track my data science learning curve.
Stackline was actually one of the first places I applied, and I instantly liked the close-knit culture and the innovative nature of the work.
Data science job seekers need to be really vigilant. The field is so attractive that companies have started advertising data science roles that are in fact more like data analytics roles. At Stackline, it’s the real deal.
From your point of view, what’s the difference between data analytics and data science?
Data analytics leans towards business performance and how a company can grow using insights from data. Data science is the application of machine learning to solve complex business problems.
What projects are you working on here?
My first project deals with classification of products at scale. We get data on millions of products, and we want to classify those products down to the sub-category level rather than just the category level like most other data providers in this space. But how do you reliably determine the difference between a fashion boot and a snow boot, for instance, or a 2-in-1 laptop and a gaming laptop? And how do you do that across thousands, or hundreds of thousands, of similar products?
As businesses grow, product lines tend to grow, and the number of niche products serving niche segments of the market has grown almost exponentially with the rise of online commerce. If I sell 2-in-1 laptops, I don’t necessarily gain a lot of meaningful insight if I’m measuring the performance of my laptops against the entire universe of electronics products. I need to know how I’m competing against laptops more specifically. And even within that, I need to understand how my 2-in-1 laptop is competing against other 2-in-1 laptops that serve the needs of a similar audience. Only when I have accurate data at that level of granularity can I make business decisions that build my competitive advantage.
We developed a hierarchical model of sorting that leads to much more accurate results at the sub-category level. We assign a product to a “group” first with a high level of confidence. Then we assign it a category, then a sub-category. We’ve also combined two classification algorithms in such a way that we benefit from the strengths of both while mitigating their weaknesses.
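Editor's note: the group, then category, then sub-category cascade described above can be illustrated with a minimal Python sketch. This is not Stackline's model. The token-overlap scorer, the class names (TokenClassifier, HierarchicalClassifier), the 0.6 confidence threshold, and the four training titles are all hypothetical stand-ins chosen for illustration, and the sketch does not attempt the two-algorithm combination, since the interview does not name the algorithms.

```python
from collections import Counter, defaultdict


class TokenClassifier:
    """Toy flat classifier (hypothetical): scores a label by how many of the
    title's tokens appeared in that label's training titles."""

    def __init__(self):
        self.vocab = defaultdict(Counter)

    def fit(self, titles, labels):
        for title, label in zip(titles, labels):
            self.vocab[label].update(title.lower().split())
        return self

    def predict(self, title):
        tokens = title.lower().split()
        scores = {
            label: sum(1 for t in tokens if t in counts)
            for label, counts in self.vocab.items()
        }
        total = sum(scores.values())
        if total == 0:
            return None, 0.0
        best = max(scores, key=scores.get)
        # Confidence is the winning label's share of all matched tokens.
        return best, scores[best] / total


class HierarchicalClassifier:
    """Top-down cascade: one flat classifier per node of the taxonomy."""

    def __init__(self, threshold=0.6):  # threshold value is an assumption
        self.threshold = threshold
        self.models = {}  # path prefix (tuple) -> TokenClassifier

    def fit(self, titles, paths):
        grouped = defaultdict(lambda: ([], []))
        for title, path in zip(titles, paths):
            for depth in range(len(path)):
                xs, ys = grouped[path[:depth]]
                xs.append(title)
                ys.append(path[depth])
        for prefix, (xs, ys) in grouped.items():
            self.models[prefix] = TokenClassifier().fit(xs, ys)
        return self

    def predict(self, title):
        path = ()
        # Descend one level at a time; stop when confidence is too low.
        while path in self.models:
            label, confidence = self.models[path].predict(title)
            if label is None or confidence < self.threshold:
                break
            path += (label,)
        return path


# Made-up training data: (title, (group, category, sub-category)).
TRAINING = [
    ("waterproof insulated snow boot", ("Apparel", "Footwear", "Snow Boots")),
    ("leather ankle fashion boot", ("Apparel", "Footwear", "Fashion Boots")),
    ("convertible touchscreen 2-in-1 laptop", ("Electronics", "Laptops", "2-in-1 Laptops")),
    ("rgb keyboard gaming laptop", ("Electronics", "Laptops", "Gaming Laptops")),
]

titles, paths = zip(*TRAINING)
model = HierarchicalClassifier().fit(titles, paths)

print(model.predict("insulated snow boot for winter"))
print(model.predict("touchscreen gaming laptop"))
```

In this sketch, a title that matches one sub-category clearly is classified all the way down, while an ambiguous title such as "touchscreen gaming laptop" stops at the category level rather than guessing between the two laptop sub-categories.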
How do you feel the data science team at Stackline is pushing the boundaries of the industry?
Everyone at Stackline feels that their team’s job is the most important. And that’s awesome. That means we are all giving our very best. But data science here is definitely at the forefront of the industry. Data has been called “the new oil,” and I think that’s very true. The amount of power a company has is highly correlated with the amount of data it has. Data projects are extremely important right now and should be treated extremely seriously, as they are at Stackline.
How much cross-disciplinary collaboration do you have here?
I work with the engineers all of the time to get data to feed my models or to get feedback on the downstream impacts of my work. And I feel like I can approach anyone on the Channel Operations team at any time to understand the business case for our technologies. It’s a very collaborative and open culture.
What’s unique about the way Stackline approaches data science projects?
We’re not afraid of trying new things. There’s a lot of emphasis on R&D. You can’t always expect immediate results from the work of data scientists because half of the job is research. If you can build a successful prototype, you can scale it. But the work upfront takes a lot of time, and Stackline appreciates the full breadth of the process, which is pretty unique. Of course we have deadlines and time is of the essence, but research is also of the essence. We are given the confidence and the freedom to test out different approaches and contribute in our own way.
You had opportunities to join some of the most storied tech companies in the world. What tipped the scales towards Stackline?
I realized during my time at Carnegie Mellon that I like to work in a particular way, with the freedom to explore, try out new things even if they fail, and determine the best possible solution. I don’t want to be told exactly what to do. But in big companies, there is a rigid structure and a lot less freedom to deliver the best solution as opposed to the dictated solution.
I can also take a lot of ownership over my work here and then directly see its impact on the direction of the company. I’m not hunting to find my little snippet of code and wondering if it's being used. I know the work I’m doing matters, and I believe the work the company is doing matters. Nothing can beat that.
What makes you believe so strongly in Stackline?
I feel everyone’s motivation. When I talk with my friends working elsewhere, I don’t hear that they are on teams with the same rigorous work ethic or the same passion for the direction of the business. I see the way people work here, and they clearly think we can build on what we have and the amount of data that we have to create something extremely special.
What does it take to thrive at Stackline?
Be relentless in your work. And be kind.
What does it take to thrive in a technical role at Stackline?
You have the freedom to code in the language you’re comfortable in, but you should be really comfortable with at least one base language.
What are you going to be speaking about at the upcoming SoCal AI & Data Science Conference?
This conference has a great reputation in the data science community on the West Coast, and I had planned to go as a participant. But after getting in touch with the conference organizer — he’s a member of the International Data Science & Data Engineering Association, which I’d also like to join — I learned that anyone can submit a speaking proposal, and I decided to go for it. He let me know a few weeks later that the conference wanted me to host a session on our innovative hierarchical classification model.
What are some of the other themes the conference will focus on this weekend?
Data science is really split into two tracks: the academic track and the applied track. This conference brings those two tracks together for conversations about the market applications of cutting-edge research in the field. Many different industries are represented: healthcare, IoT, finance, robotics, retail. It’s a fantastic opportunity to get caught up on the evolution of data science in multiple realms.
I want my conversation about data science in the retail sector to reinforce for my fellow data scientists that hierarchical modeling and highly granular classification can be implemented at tremendous scale. There’s never been a better time to be in this industry or on the Stackline team, and I’m excited to show off our progress.