Data science is a fairly new discipline for most companies; even a few years ago, many didn’t have any data scientists on their payroll. Now, though, enterprises might have a large team of people whose job it is to build analytic models and extract insights from the data streaming into these organizations. There are major problems, though, including an inability to share information, which leaves data siloed and inaccessible to the rest of the organization. It’s no wonder that 85 percent of big data strategies are currently failing.
Now comes Vectice, a new company that is attempting to solve these problems by giving data scientists the ability to share, collaborate on and track their assets on a single platform compatible with the apps data science teams are already using. On Tuesday, the company announced that it raised a $3 million seed round co-led by Spider Capital and Crosslink Capital, along with Global Founders Capital, early executives from Salesforce and other “leading enterprise cloud and data science executives.”
The company was founded only in March by former Arrayent CEO and Vator.tv co-founder Cyril Brignone, along with former Lattice Engines CTO Gregory Haardt. It came about after the two reached out to numerous enterprises they had been working with, particularly Fortune 2000 companies, to understand how data science was being organized in the enterprise.
“What we discovered in those interviews is that there are a lot of things that are not working as well as they could be. It’s really hard for those data scientist teams to track their assets and build on their knowledge, especially with the large turnover that there is in those teams. It’s hard for those teams to easily track the assets that they’re working with. It’s not just about code, but it’s also about all the experiments they run, the model they build, the data sets they use, the notebooks they create,” Brignone told me in an interview.
“It’s about tracking all those assets that they have and the workflows and the knowledge on top of them, and then trusting the decisions that are made in all those projects, and then encouraging the collaboration with the business they work with. And all the things I just listed, most of the time they are still done through conversations, through email, through meetings and kind of being traced in a gigantic Excel file, or Google Doc kind of tool. So, it’s really hard for those teams to be successful once they start expanding to tens or hundreds of people.”
Another problem is high turnover, Haardt added, which occurs “because they have a lot of new graduates, and people are easily changing jobs in that profession, given the lack of talent available for enterprises to hire.”
“It’s very difficult for companies even to preserve the existing knowledge, and when they leave the company, all the work they’ve done leaves with that person. The other challenge is around traceability. It’s really difficult for enterprises to understand, when there is this decision that is made, how the team came to that conclusion. These problems therefore affect everyone from CIO, management team, data teams to new hires,” he said.
Vectice’s solution is to take all of the different tools that data scientists are currently using, including GitHub, Airflow, GitLab and Google Docs, and put all of that data onto a single dashboard that everyone can view across the organization.
The point is not to be another tool for these data scientists to use, but to be a repository for all of their information, Haardt explained.
“We want to be one of your systems of record, but we don’t want to be another execution environment for those teams, because they’re already using a lot of different tools to do that and sometimes information is in those tools. So, if you are using some cloud vendors like AWS SageMaker, Azure ML or Google AI to do your data science experiments, you’re gonna have a lot of information in those systems,” he said.
“The problem with that is the information gets captured, depending on the project, in different systems, and you don’t really have an end-to-end view. You will have a lot of the information in that system, part of the information in a different system, so you don’t have visibility end-to-end on all the knowledge you created along the project life cycle.”
The other thing that Vectice does is asset tracking, meaning it helps data scientists understand the relationships between all their different assets and how they relate to each other.
“If you have a model, what version of the model are you currently deploying in the environment, and what version of the data set have you used? When was the last time it was refreshed? Just to be able to understand all the relationships between the assets you have and how they were built, who built them and so on. That’s what people will refer to maybe in some other industries as metadata management,” said Haardt.
“You can continue, as a data scientist, to use whatever tool you want to use; we’re not replacing your tool, we are giving you a way to have a global view of all of your assets across your tools. And one of those assets is knowledge. One of those assets is, how do they relate to each other, meaning what data did I use to train which model? What data did I use to test which model? What experiments did I do on that data that came into building that model? Which of those models are running in production? Which of those following are not?” Brignone added.
“There is no solution today to track those assets, there is no solution today, that we can see on the market, to help you figure out the relationship between those assets, and there is no solution today that helps to push the knowledge sharing of your team on top of those assets and the relationship between them. So it’s about managing all of that metadata and knowledge on top of your AI.”
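To make the idea of asset relationships concrete, here is a minimal sketch of the kind of metadata the founders describe: which data set trained or tested which model version, and which version is in production. It is not Vectice’s product or API; the `Asset` and `Registry` classes, the relation names and the sample records are all hypothetical and exist only for illustration.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Asset:
    """One tracked item, such as a data set or a model (hypothetical schema)."""
    name: str
    kind: str      # e.g. "dataset" or "model"
    version: str
    owner: str     # who built it


@dataclass
class Registry:
    """Hypothetical in-memory store of relationships between assets."""
    links: list = field(default_factory=list)      # (source, relation, target)
    production: set = field(default_factory=set)   # models currently deployed

    def link(self, source, relation, target):
        """Record that `source` was used on `target`, e.g. data 'trained' a model."""
        self.links.append((source, relation, target))

    def deploy(self, model):
        """Mark a model version as running in production."""
        self.production.add(model)

    def inputs_of(self, target, relation):
        """Return the assets tied to `target` by `relation`."""
        return [s for s, r, t in self.links if t == target and r == relation]


# Made-up sample records, for illustration only.
train = Asset("orders", "dataset", "2020-04", "alice")
test = Asset("orders-holdout", "dataset", "2020-04", "alice")
model_v1 = Asset("churn", "model", "1", "bob")
model_v2 = Asset("churn", "model", "2", "bob")

registry = Registry()
registry.link(train, "trained", model_v1)
registry.link(train, "trained", model_v2)
registry.link(test, "tested", model_v2)
registry.deploy(model_v2)

# What data trained version 2, and is version 1 still in production?
print(registry.inputs_of(model_v2, "trained"))
print(model_v1 in registry.production)
```

A real system of record would presumably pull these links from the tools teams already use rather than rely on manual entry; the sketch only shows the shape of the questions such metadata can answer.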
Since Vectice is less than two months old right now, it doesn’t have any customers yet, but it has been working for the last six or seven months with around 30 Fortune 2000 companies (the names of which could not be disclosed at this time) to understand their problems and how to fix them.
The reason that the company is going after larger clients, and not SMBs, is that the solution is geared toward large teams of data scientists who work in different divisions, Brignone explained.
“When you have 20 or 50 data scientists, and some of them will be working a week with the procurement team of that company to help do a new model to facilitate and save the procurement and supply chain, but the following week that same data scientist may be working on another project for marketing, for things that are happening on the website. Those data scientists serve multiple functions, and a project with many of the functions of the company, so it’s impossible to scale it if you don’t have a way to organize all your assets, organize the link between those assets and organize your knowledge on top of them,” he said, comparing the product to a CRM: a small company with just one or two salespeople might not need that kind of software, but a larger company would.
“Once you have 20 sales guys, how do you manage all of the accounts, all the contacts in those accounts? How do you make sure that if they leave that you don’t lose everything, all the knowledge about how to sell to those accounts? How do you make predictions about what’s going to be the revenue? What do you know if you don’t have a CRM? It’s impossible. And same thing with data science, if you only have one data scientist then maybe you don’t need such a big system, but if you have 20 or 30 of them, then it’s critical, we believe, for the success of data science in the enterprise.”
The company will be using the new funding to build its product, with the expectation of launching a private beta in August. That also means expanding its team by hiring more engineers and salespeople; Vectice currently has a headcount of 14 and plans to more than double that number by the end of the year.
Ultimately, both founders believe that data science is going to be crucial to the success of the enterprise in the years to come, and they want Vectice to be the company that allows enterprises to make the most of their data science assets.
“We believe that data science is going to become essential to the success of most enterprises, and we want to equip those teams to succeed as quickly as possible. Everybody that we’ve seen in the industry is working on creating a better way to run a model, a better way to do a prediction, so all on the technical layer. We believe that for this new function in the enterprise there’s a fair amount to be done on the software to support the people and support the team to do more with the people they have,” said Brignone.
“What we want to concentrate on is equipping those teams, not with a way to do the data science itself, or the implementation of the model itself, but to align better with the business side enough to have a bigger impact on the business.”
Added Haardt, “Data science is central to the future of the enterprise, and we see data science as a team sport. And the only way we’re going to enable this trust and this predictability into their data science initiatives is by providing supporting tools and supporting mechanisms for the enterprise to leverage those data science teams. So, the long term goal is to be this driving function to enable those enterprises so we get them return on their investment and build trust and get everybody to collaborate around those data science initiatives.”
This article originally appeared in Vator.TV