Technology
Robin Sutara on Responsible AI, Governance, Diversity, and People Behind Data
In this episode of Future of Data and AI, host Raja Iqbal interviews Robin Sutara, Chief Data Strategy Officer at Databricks, about her unique journey from Apache helicopters to the forefront of data ...
Interactive Transcript
spk_0
I ran out of funding for my university and I ended up enlisting in the US Army.
spk_0
I told everybody when I got out of the army, I was going to go work for Microsoft.
spk_0
I always tell people I've gone full circle now, Apache to Apache.
spk_0
The tagline for your career: from Excel to Data Lakehouse.
spk_0
Databricks has done some phenomenal work actually.
spk_0
I can almost always tell when I walk into an organization, depending on what team is using
spk_0
the Databricks platform, how long they've been with us.
spk_0
Are enterprises fine-tuning their models or are they building their custom models?
spk_0
How do you see AI adoption actually shaping the future of workforce?
spk_0
We can solve some amazing problems with data and AI, but it does require not just the data being
spk_0
diverse, but the people being diverse. My 20-something kid says, well,
spk_0
ChatGPT said this happened in the 80s and I was like, uh,
spk_0
baby, I was there in the 80s. I can tell you for sure that did not happen.
spk_0
Will AI result in job elimination, job creation,
spk_0
or job displacement? Job evolution.
spk_0
Hello everyone and welcome to Future of Data and AI.
spk_0
I'm your host, Raja Iqbal. My guest today is Robin Sutara.
spk_0
Robin is the Chief Data Strategy Officer at Databricks.
spk_0
Robin has previously worked in key roles like Chief Data Officer for Microsoft UK
spk_0
and Chief Operating Officer of Azure Data Engineering.
spk_0
It is my pleasure to have Robin on this show. Welcome to the show, Robin.
spk_0
Thank you, Raja. So glad to be here.
spk_0
So I was looking at your background. You started your career as a technician for Apache helicopters.
spk_0
And now you are in data and AI roles at Databricks. So tell us about the journey. I mean,
spk_0
I would not say it is unusual, but interesting, right? So I mean, from Apache helicopters
spk_0
to Databricks in a very important key role. So tell us about the journey.
spk_0
Yeah, I think it maybe isn't unusual these days for people to come from differing backgrounds
spk_0
into data. I think I am a bit eclectic in that I didn't start as a traditional sort of
spk_0
undergraduate STEM background, with the caveat that I actually did start university studying
spk_0
computer engineering. But at the time, I think AI was really random number
spk_0
generation, and the first chipsets were really what people were focused on. I mean, I always tease.
spk_0
My first coding project was in Ada and Fortran. So I don't even know that people
spk_0
use those languages anymore. So I really do think I've always had sort of this interest in technology
spk_0
and sort of that background. Unfortunately, due to a lack of funding,
spk_0
I had to figure out an alternative route to sort of enter into the career field. And so,
spk_0
after two years of school, I ran out of funding for my university and I ended up enlisting
spk_0
in the US Army, which is where based on testing, they sort of placed you into roles. And I was
spk_0
fortunate to land at the time with the Apache AH-64 helicopters doing the electrical and weapon
spk_0
systems. And so it was very much still gravitating toward my passion around technology.
spk_0
But how was I going to be able to turn this into a career? And so I was really just turning
spk_0
sort of screwdrivers, right? Loading Hellfire missiles, loading the 50 millimeter machine guns
spk_0
while stationed in Korea on the DMZ for a while. And so I really just sort of focused on
spk_0
how was I going to serve my time in the military so that I could use my GI Bill and eventually go back
spk_0
to school for technology. Interestingly, at the time though, while I was stationed in Fort Campbell,
spk_0
Kentucky, Microsoft came out with this amazing product called Excel, which I'm sure everybody
spk_0
at some point has used that as their database of choice. But because it was still relatively novel
spk_0
and new, we were trying to figure out, particularly at my duty station, how are we going to use things
spk_0
like Excel to be able to track things like maintenance records, Apache helicopter parts, to be able
spk_0
to deliver. And because it was a computer sort of typing job, they thought it fell to the girl.
spk_0
So I think now they all sort of regret making that decision, but it was wonderful for me to have
spk_0
the opportunity to figure that out. That was my first, I think, footstep into data, and to say,
spk_0
oh, wow, we can really start to think about how to re-optimize our processes, how do we make
spk_0
sure we have the right parts at the right place at the right time to be able to do these repairs as
spk_0
efficiently as possible. And so I told everybody when I got out of the army, I was going to go work
spk_0
for Microsoft and they just really never thought that that would happen. So when I got out, I went to
spk_0
night school and did my MCSE while I did computer hardware repair during the day. And then I was super
spk_0
fortunate to get an opportunity to interview for Microsoft to come in and do IE5 support on Windows
spk_0
3.1. And so I got hired into Microsoft in the late 90s and then I had a fabulous 20 plus year
spk_0
career in various sort of roles, starting in technology and moving back and forth, I think,
spk_0
between business and technical sort of roles. As you mentioned, sort of my last two roles were
spk_0
those that were really trying to help Microsoft think about their internal transformation and how
spk_0
were they going to use data and AI to be able to deliver on those transformational goals that
spk_0
Satya Nadella came in and had for the company. And so I had the opportunity to be a chief operating
spk_0
officer for Azure Data Engineering, which is the group that owns everything from SQL Server
spk_0
on-prem up to, at the time, the point of visualization. So that's all the databases,
spk_0
the warehouse, ingestion tools, governance via Purview, etc. So very, very exciting times as they
spk_0
were looking to do exponential growth. How could I help them drive the business to be more data driven,
spk_0
as opposed to just being conversational in their decision making? Could we really sort of ground those
spk_0
decisions in data? And then based on that role, I got asked to move to London in the
spk_0
United Kingdom and serve as the chief data officer, where my role was sort of half internal facing
spk_0
on helping the organization be more data driven, focus on data and AI and the capabilities of the
spk_0
platform to help the company operate internally, and then externally, how could we represent that
spk_0
to the organization, give feedback into the product group on what customers were trying to do?
spk_0
And then as you mentioned, the last two and a half years now with Databricks in this role as the field
spk_0
CDO, which essentially means I get to travel the world and advise organizations on how to
spk_0
partner with Databricks as they think about using our data platform as their foundation for
spk_0
their data and AI transformations that they're looking to do internally. How do they think about not
spk_0
just the technology but the people, the process, the organizational design, operating models, etc.
spk_0
So can I bring the 20 plus years of experience at Microsoft, bring some best practices from the 12,000
spk_0
customers of Databricks to really help our customers be successful? That is great. Thank you for the
spk_0
overview, Robin. So as you were describing it to me, I had this, I was almost thinking like,
spk_0
you know, the tagline for your career: from Excel to Data Lakehouse. If you look at this, you
spk_0
started with Excel, that was your first job and you're advising people how to scale. So
spk_0
beyond those million rows, I think that was. I always tell people I've gone full circle now,
spk_0
Apache to Apache, right? So I've gone from Apache helicopters. That's very interesting of you to
spk_0
tell me. I love that. That's also a good one. So you worked as a chief data officer for Microsoft,
spk_0
okay? And then now you're working in the US. So you have been on both sides of the ocean, right?
spk_0
So what similarities and dissimilarities do you see, and anything that comes up as a result of that, in terms
spk_0
of enterprise adoption of data and AI in Europe versus the United States, right? I think regulations
spk_0
are the biggest difference. Yeah. Maybe any other thing that you see in terms of enterprise adoption
spk_0
of AI or for that matter, you know, data and any challenges that come with that journey. I think
spk_0
that's a great question. I actually think, again, because my experience is relatively limited to
spk_0
developed countries as opposed to developing, I would probably say my point of view is maybe narrow
spk_0
in that aspect. Because if I think about the UK and many of the EU countries that exist as well as,
spk_0
you know, sort of Canada, the US, and Mexico, those are the primary areas where I've worked. For many of
spk_0
them, they are a developed country. And so the problems that they're facing are actually relatively
spk_0
similar. So you brought up regulatory requirements, etc. Right? If I think about the EU AI Act and GDPR
spk_0
and sort of all of those things, and then I look at the US and think about, while we don't have the
spk_0
equivalent of the EU AI Act, right, we do have state legislation that emulates similar sort of
spk_0
requirements, regulatory requirements at this point. You see the CCPA in California.
spk_0
The CCPA, Delaware has a version. I think even right now, you'll see there are multiple states,
spk_0
like California and Delaware, that tend to lead in this space. States have created some level of AI
spk_0
regulation waiting for federal legislation or regulation to come into place. And so while we say
spk_0
you know, they are different countries and there are different sort of cultural, you know,
spk_0
expectations and backgrounds in organizations, which I would say is the biggest difference,
spk_0
right? British organizations are definitely different culturally than American organizations,
spk_0
which are different than Canadian or different than, you know, Mexican organizations and companies
spk_0
that we work with. And so if I think, well, even between Germany and the UK, right, like there
spk_0
is a difference, I think, with the people when you think about a transformation, but when I think
spk_0
about the technical requirements for many of those countries, it's very similar, right? So the
spk_0
regulatory requirements may be slightly different, but for the most part, they're all talking about,
spk_0
you know, explainability, transparency, end-to-end lineage, you know, the impact of that on consumers
spk_0
and can you articulate that all the way from the data sets that you use to the models,
spk_0
to the algorithms within the models, the weightings, etc., to the data products and services.
spk_0
And things like GDPR, CCPA, if a consumer wants to be forgotten, do you have the technical
spk_0
capabilities to be able to deliver against that expectation, the consumer expectation or citizen
spk_0
or patient expectation, whatever it might be? And so that tends to be very, very similar in developed
spk_0
countries, regardless of where you reside. But I do find, for most organizations, the biggest
spk_0
difference that they're struggling with are the expectations of their employees, the expectations
spk_0
of their consumers, etc., in the data and AI products and services that they're delivering for them.
spk_0
So let me maybe give you an example. So when I moved with Microsoft from the US to the UK,
spk_0
it was actually during the course of the pandemic and it was one week after Brexit. So I landed in
spk_0
the UK in London, immediately had to go into lockdown and there were very few products on the
spk_0
shelf at the time because most of the lorries were being stopped at the border because they hadn't
spk_0
figured out that whole EU-UK movement of the supply chain. And so it's super sort of fascinating
spk_0
to think about, okay, how much information, despite the fact that I now live in the UK, where I have
spk_0
much broader protection as a consumer, how much personal information or health information
spk_0
was I willing to give up to NHS or Tesco as a retailer to get things like my groceries delivered,
spk_0
because there was such a shortage of supply. So I think examples like that sort of show us that,
spk_0
yes, there are regulatory requirements, but situational requirements may create an environment
spk_0
where you're willing to rethink what information you're willing to disclose, what you're
spk_0
willing that information to be used for. And now that we've come out of the pandemic, you're
spk_0
seeing sort of maybe a re-hardening, I think, of some of those GDPR requirements, particularly being
spk_0
enforced out of Europe, less so in the US. I think as we continue to go through regulatory
spk_0
decision-making with the new presidential cabinet and members that exist today, there is still
spk_0
a little bit of uncertainty for us. And so I think it's always interesting to watch those global
spk_0
dynamics and what you're willing to opt into or out of and how it impacts an organizational
spk_0
decision on how they're going to use data or AI to deliver. So then enterprises, let's actually
spk_0
continue the discussion here. So there's the regulatory environment, do you think that the
spk_0
difference in the regulatory environment can make a difference in terms of how adoption
spk_0
happens, because the EU tends to be very conservative when it comes to how to govern AI and data?
spk_0
Do you think that imposes some barrier on how enterprises are going to use AI?
spk_0
For some organizations, it can slow the pace of innovation, but to be honest with you over the
spk_0
last 18 months, I think since this generative AI hype has happened, most organizations
spk_0
are actually trying to figure out practical applications of how to leverage AI in some capacity,
spk_0
whether it's internal facing process optimization, employee empowerment, whatever they're looking
spk_0
to deliver, even in the EU, where maybe they're less likely to do a customer-facing application because
spk_0
of those risks or regulatory requirements. It doesn't mean I think that they're slowing down the
spk_0
pace of innovation. I think they're just rethinking the application of those AI capabilities.
spk_0
And so how do they do it more internally where they can minimize their risks, etc. But I think,
spk_0
to be honest with you, I think GDPR has been around for a significant period of time,
spk_0
the EU AI Act was communicated, I think, relatively early. And so for many organizations,
spk_0
until I think they see litigation around the EU AI Act and how it's actually being enforced for
spk_0
many of them, I don't see it actually slowing down their innovation. I do think they are being
spk_0
cautious on making sure that they have the right components in place. They have the right reporting
spk_0
in place. They have the right capabilities that should they be questioned or should the
spk_0
regulator come back to ask about a specific application or AI execution implementation that they're
spk_0
doing across their environment. I think they are making sure that they're taking that into account.
spk_0
But I don't see actually a difference in the work that I have done between the US and the EU
spk_0
or the UK in sort of their pace of innovation or their want to drive or desire to leverage AI to
spk_0
innovate just as quickly as the other side of the pond. I don't see it being prohibitive
spk_0
for organizations to actually execute against it. In fact, most US organizations are slower than
spk_0
what I see in the UK or the EU because there's still a lot of uncertainty in the US on what is the
spk_0
regulatory requirement going to be and particularly in regulated industries like financial services,
spk_0
healthcare, you know, public sector government where there's higher levels of scrutiny,
spk_0
there's maybe almost a slower pace of innovation there for those customer facing applications that
spk_0
we saw 18 months ago in the UK or the EU. But I don't see there being a big difference between
spk_0
organizations, how they're executing what they're executing or their pace of innovation, to be honest.
spk_0
That's a very interesting viewpoint, because usually... so what I hear from you, and please correct
spk_0
me if I'm interpreting it incorrectly, what I hear from you is that the absence of regulation is
spk_0
actually slowing things down as opposed to accelerating them. You know, because sometimes, I mean,
spk_0
we hear about how the EU is creating a lot of regulations, you know, which is
spk_0
slowing down how innovation can happen. But in a way, what I hear from you is slightly different,
spk_0
right? Because they're clear on what regulations are there, that's actually allowing them to adopt,
spk_0
especially in more regulated industries like healthcare and finance.
spk_0
Yeah, in broad strokes, yes, right? You will always have, there will always be a difference between
spk_0
organizations, their appetite for risk versus innovation. I think every company is trying to decide
spk_0
what that balance is and what is the right thing for them to do, you know, how much risk are
spk_0
they willing to sort of take on? Databricks has done some phenomenal work actually. We have a field
spk_0
CISO who has put together an entire security framework that takes into account 68 attributes of AI
spk_0
that organizations should think about and come to agreement across legal compliance, the business,
spk_0
IT, etc., and how much of that, right, sort of establishing what is that risk versus innovation
spk_0
balance that they're willing to take on in an effort to actually leverage and execute against
spk_0
their AI strategies. And so as much as I'm making sort of a broad statement, I would say every
spk_0
organization has to determine for themselves how much risk appetite they're willing to take on
spk_0
to allow for a pace of innovation because it is a give and take, right? And making sure that they're
spk_0
providing it. But I do find that for most EU organizations, because they have those standards that
spk_0
they're going to be held accountable to and the monetary implication, right, of not complying could
spk_0
be relatively significant, that almost lays a base, you know, a basis for them to be able to leverage
spk_0
whereas in the US, we had an executive order, the executive order is no longer, you know,
spk_0
in effect, there's just a lot of uncertainty. And so for many of them, unless they have a state
spk_0
legislation or regulation that they're trying to comply with based on their business operations
spk_0
within that state's boundaries, for most of those organizations, it is a lot more of how do we balance
spk_0
innovation and the pace of innovation while minimizing our risk or our exposure for whatever the
spk_0
legislation will be at the time that it comes out. But again, for most of them, if you look at the
spk_0
legislation, regardless of whether it's state or in the case of the EU or the UK's version of the EU
spk_0
AI Act, for most of them, the fundamentals of it are the same, right? There has to be traceability,
spk_0
explainability, you know, transparency, like there's just fundamental things that are required of
spk_0
whatever they are. And so for those organizations that have a little bit more of a risk appetite
spk_0
to innovate, they're still executing with those foundations in place to protect themselves as
spk_0
regulation and legislation starts to get decided outside of the EU. And so I don't know that I see
spk_0
either side of the pond completely slowing down on the pace of innovation. If anything, I think
spk_0
the pace of innovation in the technology space is now creating an environment where organizations
spk_0
can leverage technology in better ways to be able to execute whatever pace of innovation that they
spk_0
want to execute on. So, I mean, is it safe to say that more than regulatory environment, it is the
spk_0
industry, perhaps the culture of the company, the company size, the risk appetite that dictates
spk_0
how quickly they adopt as opposed to. Correct. And then probably the one other factor I would add in
spk_0
there would be the culture of the organization. Like you said, the company themselves, because I
spk_0
have worked with some organizations who have created some amazing innovation, but they can't get the
spk_0
business users to actually use that data product or service or AI that they've created within the
spk_0
organization. And for many of them, it's because they forgot about the people and they forgot
spk_0
about the change management that would be required or how to bring the organization along on the
spk_0
evolution of the innovation that they're trying to execute. So it's a complex, I think.
spk_0
Yeah, it's a complex and powerful formula. And I would love to just tell you in broad terms, but
spk_0
I don't see any particular country or region or area of the world, at least in my travels and
spk_0
interactions, that is moving significantly slower than anybody else. Okay, that's great. And
spk_0
thanks for elaborating on this. In terms of when you look at Databricks, it is interesting that Databricks
spk_0
started as a machine learning company. And then for a long time, people almost forgot that they
spk_0
started as a machine learning company and they became the data platform. So the data platform of
spk_0
choice, significant player in that space. And then now going full circle. So now again, AI,
spk_0
so tell us how is Databricks actually adapting to this? Because they really are seen as a
spk_0
data platform at the moment. So at least that's how I look at it. I mean, maybe there are others who
spk_0
look at it as an ML company, but yeah, I mean, for a long time, they have been the data platform.
spk_0
Yeah, I can almost always tell when I walk into an organization, depending on what team is using
spk_0
the Databricks platform, how long they've been with us, right, as a customer. So as you mentioned,
spk_0
I mean, the company was founded 11 years ago, five PhDs out of UC Berkeley, the creators of Spark.
spk_0
So really, they were trying to solve the big data and ML sort of issues that organizations
spk_0
were struggling with. I think early on, they realized that it wasn't just unstructured data in the
spk_0
lake that they needed to be concerned with. It was also the structured data. And so they went from
spk_0
that, how do we solve this big data ML problem to how do we now help customers have a data platform
spk_0
that allows them to bridge the gap between their structured and unstructured data. And that was
spk_0
the creation of the lakehouse, right, eight years ago, and being able to, I think, evolve since then,
spk_0
that we are the data platform of choice for many, many organizations. And so it's been interesting
spk_0
that if you walk into a company where it's primarily used by the data science team, they've probably been with
spk_0
us since the beginning. If it's the data engineering team that's the
spk_0
strongest group using the Databricks platform, they tend to be the lakehouse sort of era of users
spk_0
that came into the company. And today, right, if I think about it, we have actually thought about
spk_0
how do we continue to evolve the platform leveraging the technical capabilities that exist today
spk_0
that didn't exist when the company started 11 years ago. And so a lot of that has been
spk_0
how do we apply AI and generative AI and agentic systems, etc. within the platform. So there's
spk_0
sort of two problems that we're looking to solve. We did the acquisition of MosaicML about two
spk_0
years ago was that announcement. And we really had to rethink like what does a data platform of the
spk_0
future look like? It's no longer sort of this separation between AI and BI and sort of engineering
spk_0
versus data science, etc. And so while we had broken some of those barriers with the lakehouse,
spk_0
I think what we're thinking about now is a data intelligence platform. And there are two factors
spk_0
to that. I think one is how do we help companies think about the AI that they're looking to build,
spk_0
right? And how do we make sure that they can do that on their structured and unstructured data?
spk_0
So built on the lakehouse foundations. But how do we think about the pace of technology that's
spk_0
happening in AI now? So the top model of today isn't necessarily going to be the top model of tomorrow.
spk_0
And so for many organizations, it was, can we create a platform that allows you to do data science
spk_0
in a way that allows you to do things like MLOps or LLMOps or swap models in and out depending on
spk_0
you know what had the best return on your investment based on the sort of solution that you're
spk_0
trying to solve for? How do we do things like compound or agentic systems like can the platform
spk_0
support that natively building it in? Because again, the intent of Databricks has always been,
spk_0
how do we help companies minimize the amount of data copies that they have to create? How do we
spk_0
help them break down those silos? So it doesn't do any organization any good if they
spk_0
constantly are having to copy data in and out to be able to create a new model or a new AI solution
spk_0
or a new data product to be able to deliver. And so I think foundationally we have thought about
spk_0
how do we build on the lakehouse and help companies with their own AI goals, objectives,
spk_0
and missions? But how do we also leverage AI within our platform? So you know when I started with
spk_0
the company two and a half years ago, it was very much about how do we make Databricks simple? How
spk_0
do we make sure it stays an open platform so that we're not you know locking organizational data
spk_0
into our platform that they can move it as they need it or be able to use it with other tools
spk_0
and solutions that are built on open systems? How do we make sure that we're doing that cost
spk_0
effectively? Now it's a lot of how do we make the platform smarter? How do we actually leverage
spk_0
AI inside of the platform? How do we disrupt ourselves? I mean, right, our CEO essentially
spk_0
stood up, you know, a year and a half ago when ChatGPT and large language models were really sort
spk_0
of at the precipice and said, if we had to recreate Databricks, how would we disrupt ourselves?
spk_0
How would we do things differently? How would we rethink how we actually built some of these
spk_0
products? And so it's been a phenomenal pace of innovation for Databricks over the last 18
spk_0
months to really think about how would we have done things differently? How do we make sure that we're
spk_0
creating a platform that's bridging the gap between your structured data, your unstructured data
spk_0
with some level of governance and control so that you have full end-to-end lineage and explainability
spk_0
and enforcement of policy, etc., across not just structured data, but also your AI assets,
spk_0
your notebooks, your models, etc. And so for us it's really been about how do we disrupt in
spk_0
that space? And so how do we leverage AI in the platform to do things like understand business
spk_0
semantics so that we understand, right? So for example, Databricks employees are called
spk_0
Bricksters. So if I type, right, into a natural language interface, Brickster, I'm not looking for
spk_0
you know, a Lego, I'm looking for a Databricks employee. And so how do we, you know, leverage AI in
spk_0
the platform to be able to understand that what does revenue mean, what is our fiscal year,
spk_0
what's a quarter, how do we define, you know, EMEA or Americas, etc. And the other thing is,
spk_0
how do you actually leverage the platform to make it more cost-effective? How do we make sure that
spk_0
we're helping organizations, one of the biggest issues continues to be just infrastructure
spk_0
management, right? Turning clusters or servers on and off or setting up workspaces or optimizing it
spk_0
or prioritizing jobs or optimizing your queries, etc. So how do we leverage AI in the platform to
spk_0
automate some of those things so that organizations can focus their talent, the limited talent that
spk_0
exists, right? How do they make sure that they're focusing them on the right thing, which is solving
spk_0
problems for the business? It's not managing infrastructure. And so I think you'll continue to see us
spk_0
innovate and disrupt in that space. How do we help companies create AI? And then how do we use AI
spk_0
in the platform so that it's intelligent about the organization, about sort of the things that
spk_0
matter? Yeah, so let me expand on this example, the Bricksters example, right? So now,
spk_0
there can be company-specific jargon, there can be company-specific terminology. And for that
spk_0
matter, a company's intellectual property, their proprietary data that is sitting inside the platform
spk_0
and hopefully, hopefully, OpenAI and other models, they don't have access to it. And
spk_0
so that knowledge is not built into your models, whether they are closed-source or open-source models,
spk_0
right? Now, and that's why RAG, or as we call it, retrieval-augmented generation, those kinds of
spk_0
approaches exist. You fine-tune models and sometimes you build domain-specific, you know,
spk_0
very specific to your company and your own custom models. So does Databricks... And for that matter,
spk_0
let's, even before I go to Databricks, I mean, so for your businesses, I mean, do you see the
spk_0
appetite for, you know, RAG being the platform of choice? Are enterprises fine-tuning their models,
spk_0
or are they building their custom models? What do you see, actually? How are they dealing with their
spk_0
own proprietary data when it comes to building applications? So maybe just one correction. So
spk_0
based on an organization's usage of the platform and leveraging the metadata that goes into
spk_0
Unity Catalog, we do know those things. Granted, it's just within that organization and that workspace.
spk_0
But because of that, we are able to extrapolate what do we think this table does? We, you know,
spk_0
can we start to use AI to do things like row and column tagging? Like that, right? But again,
spk_0
all of that, like you said, is an organization's intellectual property and so it exists within
spk_0
their instance of Unity Catalog within their workspace so that we can leverage the metadata in
spk_0
that way. And the platform is able to then expose that to them in a way that allows them to actually
spk_0
apply. What that is is an example, though, of, you know, essentially what we call compound systems,
spk_0
right? I don't think for most organizations, I do think, right? At the very beginning, everybody
spk_0
thought, oh, we were going to be able to leverage these proprietary models and we're just going to do
spk_0
prompt engineering. So how do we train everybody to be a prompt engineer? Then they realized, oh,
spk_0
wait, that's really expensive to leverage a thing like OpenAI for every use case, and they
spk_0
don't necessarily have the domain knowledge of our organization or we don't want to give access
spk_0
to our intellectual property, right? That feeds potentially back into the model or, right? We also,
spk_0
I think the example of Samsung is maybe one of the most famous, where the employees, you know,
spk_0
were really trying to do the right thing for the company, put trade secrets into ChatGPT,
spk_0
and essentially now exposed that outside of their organization. And so I think there's still,
spk_0
again, talking about that risk versus innovation. I still think there is some level of
spk_0
risk aversion to leveraging these big open source models. Even if you're doing a RAG implementation
spk_0
against those open source models, I do think for many organizations, there's still this question
spk_0
of how much control do we have and how much are we exposing things that are really, really
spk_0
our intellectual property and are essentially, you know, the foundation of our company and why we
spk_0
exist and what makes us unique or valuable compared to our competitors. And so for most organizations,
spk_0
it is some level of compound AI systems that they're creating. They're leveraging OpenAI or,
spk_0
you know, proprietary models to be able to solve some part of the problem. They're creating their own
spk_0
models to be able to, right, and actually doing net-new creation of models via open source capabilities
spk_0
or even things that they're building in-house to be able to deliver. Can they execute a piece of it
spk_0
with a small language model or leverage RAG against a piece of it? Then the question becomes,
spk_0
how do you leverage the platform to tie those pieces together so that you're actually then able
spk_0
to deliver some value proposition? So if I think of examples like insurance is a great example,
spk_0
like how do you actually process claims? So it's a very simple, you know, not a simple,
spk_0
right, but it's very much comprised of very similar sort of problems that they're
spk_0
trying to solve. Can you do some level of document extraction? Can you do some level of, you know,
spk_0
OCR capabilities to be able to read handwritten notes from the insurance adjuster in the field? How
spk_0
do you tie that together with, you know, some level of computer vision or unstructured data via
spk_0
pictures that get uploaded? How do you now actually validate that those pictures are not deepfakes
spk_0
and that they're actually legitimate? Right? I think we saw this rise of insurance companies
spk_0
dealing with false claims with AI generated, you know, pictures of traffic accidents that they
spk_0
were paying out on. So I think that's a great example of insurance companies have actually started
spk_0
thinking about how do we do things like fraud and solve for those cases? And it's a compound system.
spk_0
It's leveraging OpenAI for the right component of that. They're creating some level of capability
spk_0
internally, leveraging RAG on top of that and then being able to pull those together into an
spk_0
end-to-end compound system to be able to leverage agentic capabilities to automate some of it and
spk_0
then the human in the loop at the end to validate, et cetera. But it's literally saving hundreds
spk_0
of thousands of hours for, you know, claims adjusters to be able to process that volume of information.
spk_0
And I heard this agentic AI and multi-agent systems a few times in the conversation so far.
spk_0
So is Databricks more around, you know, bringing in the current existing ecosystem? So, you know,
spk_0
you have frameworks like LangChain and LlamaIndex, the most notable ones, and there are others too.
spk_0
So, do you have your own frameworks for multi-agent collaboration, or
spk_0
probably the platform has it built in? I'm just curious, right? So because I work on the
spk_0
technical side of it, we have a bootcamp as well where we teach people. I'm just curious. I mean,
spk_0
how is Databricks approaching it? Yeah, so we have frameworks that are built into the platform
spk_0
to be able to deliver against some of that. But again, like I mentioned, it's an open system.
spk_0
So if you want to bring LangChain in, you can. I think there's lots of organizations who have some
spk_0
level of capabilities that they've already built or been able to deliver. And so our intent has
spk_0
always been can the platform support an open ecosystem so that organizations can bring in
spk_0
the right capabilities that they need to be able to do and support those. But we are thinking
spk_0
about how do we help organizations automate some of that? How much can we create as a result?
spk_0
There'll be some big exciting announcements that come out at our summit in June of this year
spk_0
in San Francisco. So if you can't attend in person, I highly recommend that you join us virtually
spk_0
because there will be, I think this will be a big space of announcements of innovation of what
spk_0
the teams have been working on over the past year. And when it comes to open source versus closed source.
spk_0
So you know, you can use OpenAI, you can use Llama or any of the open source models. So where do you
spk_0
see the future? Do you think it's going to be open source or is it going to be closed source or
spk_0
something else? I think that sort of ties back I think to the regulatory legislative requirements
spk_0
that we're going to see. I do think there will be some level of proprietary models that solve very
spk_0
niche sort of problems or issues as part of a broader compound or agentic system. But I think
spk_0
lots of organizations are starting to really think about if I have to do something like explain
spk_0
to the regulators what this model is, what weightings were used, what data went into it, etc. I think I
spk_0
see more and more organizations trying to figure out how much of that can they make open so that they
spk_0
have more control and more visibility and explainability into it. But there are definitely proprietary
spk_0
models that are able to deliver efficiently or effectively. So I think provided that they're super
spk_0
clear on helping organizations be able to explain to the regulators specifically what part of the
spk_0
compound system that proprietary model is looking to solve for. I think we'll continue to see a
spk_0
combination of both. But I think that's the power of something like the Databricks platform. Can
spk_0
you tie into a compound system? Can you use components of a proprietary model and an open model
spk_0
to be able to solve for the business problem or output that you're looking to solve for?
spk_0
And when you see enterprises on their journey to the AI adoption, what do you think is a common
spk_0
call it, common myth, what do they commonly get wrong, and what is the biggest barrier? I mean,
spk_0
so in general what stops them or what holds them back from adopting AI for their business processes.
spk_0
For almost every organization it turns into a people process issue, in my observation,
spk_0
I'd say. So either you don't have the right data talent that's been educated and enabled on sort
spk_0
of the capabilities of the technology or the platform or you haven't thought beyond just
spk_0
enabling your data personas to the business users that actually have to leverage the systems or
spk_0
tools. And so for almost every organization we've talked about digital transformation or data
spk_0
transformation for decades. Now I think it's almost this new era of people are super afraid of what
spk_0
they don't understand. And so for many organizations, their biggest hindrance is, how do we not just do
spk_0
this migration of legacy sort of technical debt or process debt or business operational debt?
spk_0
How do we not do that? Right, we don't want to bring that technical debt into a new format.
spk_0
So for example, when cloud first came out, I remember lots of organizations just doing almost
spk_0
a lift-and-shift from on-prem to the cloud. The problem is they never thought about modernization. So
spk_0
right, so could you optimize the way that that warehouse was constructed to take advantage of the
spk_0
cloud and not just bring the current construct from on-prem, right, into a different infrastructure.
spk_0
I think we're seeing that same thing with AI now. Yes, the business process works in these steps
spk_0
of ABC. So they're not thinking about now how do we change business process to be Y and Z, right?
spk_0
And really rethinking internal processes to take advantage of the technical capabilities.
spk_0
And I think for many organizations that's hindering sort of their ability to innovate as fast as they
spk_0
want because all you're doing is bringing your business or process or enablement debt from one
spk_0
format or one technology to another. And so we really have to think about how do we bring an
spk_0
organization along on that journey to be able to say what could you do if you weren't having to do
spk_0
that manual task of rationalizing 100 Excel spreadsheets every week to be able to report on revenue?
spk_0
Right? And so what are the things that you never have time to do? And how do we think about enabling
spk_0
technology to get rid of some of that stuff? And people are really afraid of, do I have a job at the
spk_0
end of it? How much is going to be automated? What is AI going to displace or replace part of
spk_0
that? So I think it's no different than the industrial revolution where machines started to take over
spk_0
manual processes that people were doing. We have to take people along on that journey to say,
spk_0
how are we going to enable you? Because you have the domain knowledge, the understanding of the
spk_0
processes, the understanding of the business, the understanding of our customers or patients or
spk_0
clients, whatever that might be. And so I think for most organizations, it is that unlocking the
spk_0
domain expertise of the organization and making sure that the technology is an enabler, not something
spk_0
to be feared by the organization. Yeah, so that's great. So Robin, you mentioned jobs and that was
spk_0
the next, I would use this as a segue to talk about society. So at the end of the day, we have
spk_0
so we live in this society, our jobs, our physical health, mental health, how we work, where we work,
spk_0
all of that is also important to us. So how do you feel about or where do you see these things are
spk_0
going to be? I mean, no one knows exactly what is going to happen, but how do you see AI adoption
spk_0
actually shaping the future of workforce? Yeah, so like I said, I do see for many organizations
spk_0
that I work with, it is actually optimizing or improving productivity. I think for many organizations,
spk_0
though, it comes back to that culture, like how do you make sure that the organization understands
spk_0
the value proposition of that productivity increase. It's always so amusing, I think, when people
spk_0
say, oh, well, we saved 15 minutes for all 20,000 employees at the company and it's like, okay,
spk_0
so how did you translate that into something else? Like, what were they able to do now as a result of
spk_0
saving the 15 minutes a day? Did you now say, okay, now there's the opportunity for you to deliver
spk_0
against the thought leadership or the 10X project that you haven't had time to do? And so
spk_0
I think lots of organizations are sort of missing that step of, oh, now we have to actually
spk_0
help the organization understand because otherwise they just see pieces of their current role or
spk_0
function slowly slipping away as we start to automate or leverage AI to be able to do that. And so
spk_0
I think those organizations that are doing it really well are taking the company, the entire
spk_0
company and thinking about enablement. You're right, absolutely, you know, organizations like yours
spk_0
that are helping us really bring up the data science capabilities and the next level of data
spk_0
scientists that we'll have. But how are we thinking about the business user in the finance department who's
spk_0
been doing that same role or function for the last 25 years? That domain knowledge is invaluable,
spk_0
right, and being able to translate. And so how do we sit with them and understand, like, what are
spk_0
their pain points and show them that value? And I think sometimes we miss that opportunity. And so
spk_0
I think those organizations that are going to be able to innovate and truly transform themselves
spk_0
in a way that, you know, at a pace that they want to, they're going to have to think about every
spk_0
persona across the organization and how do they create a way that allows them to go on that
spk_0
transformation journey? That's a very interesting point. So we work with companies as well. And then
spk_0
one of the barriers that we have seen internally is reluctance, because, you know, some of the workers
spk_0
they think that they are going to be replaced. Do you see that? I mean, have you heard of this barrier
spk_0
to adoption that workers intentionally do not want to adopt AI? Yeah, it happens at almost every
spk_0
organization. You're going to have some persona, right, that just can't see the future. They can't
spk_0
see the art of the possible. They can't see what their job would look like. And maybe that is because
spk_0
a majority of their current function could be automated or we could leverage AI in such a capability.
spk_0
And so for those, I really think about, you know, how do we think about pivoting? So for example,
spk_0
I do think data is probably the best space in the world for us to bring diverse perspectives,
spk_0
diverse points of view. And that requires us thinking about how do we enable somebody that has
spk_0
domain area of expertise or industry knowledge? How do we think about, you know, giving them the
spk_0
tools, the capabilities to be able to do things differently so that they can rethink their job,
spk_0
or what that would look like as a result of the knowledge that they have about the industry,
spk_0
or the knowledge that they have about the process, et cetera. And I would love to say that, like,
spk_0
that's an instantaneous thing. But if you think about it, I worked on digital transformation at
spk_0
Microsoft for 15 years and then left. And they were still transforming. They're still
spk_0
transforming today after I left. And so for these organizations, I think it's just going to be this
spk_0
instant thing. I would say it's all about those quick wins that you're able to execute, but it's
spk_0
also about giving your organization and the people across your organization, particularly those that
spk_0
are worried about what does their job of tomorrow look like. And to think now about what's the
spk_0
enablement that you can give them, what's the training that you can give them? Where do you have some
spk_0
technical exuberance or desire to learn and, you know, create new capabilities? How do you grow
spk_0
and foster that today? And then how do you rethink those that don't necessarily have the same
spk_0
technical background or technical aptitude? What does a future for them look like? And how do you
spk_0
recreate their function? And it's a, you're right, it's a long process to take them on that journey.
spk_0
And so making sure that you're investing in the employees where it makes sense to take them
spk_0
along in that journey with you. And as humans, we know what these tools and technologies
spk_0
capable of. And we also know the limitations. So speaking of large language models, they are
spk_0
built on data sets that are inherently biased. And bias comes from, I mean, these companies
spk_0
that gather data from data that has been generated by humans and humans are inherently, I mean,
spk_0
we have our own biases. So as a human, does this worry you, or are you optimistic that eventually
spk_0
there is going to be some self correction that is going to happen? So that's the first part of my
spk_0
question in terms of bias. And in general, overreliance of humans on these tools, I mean, does this
spk_0
worry you? Because I hear all sorts of opinions. Some people, you know, they are worried. Some people
spk_0
they say, no, I mean, I'm optimistic when it comes to technology. I would love to get the take,
spk_0
maybe, more of Robin Sutara, not the field CDO of Databricks, but as a human, I mean, how do you
spk_0
feel about this? Yeah, I think for anyone who has never read the book Invisible Women by Caroline
spk_0
Criado Perez, I think it's a fascinating read on the impact that biased data can have on everything
spk_0
from city planning to job definition to how seat belts in cars are designed, right? They're all
spk_0
designed for those things are designed based on data that is essentially the average man, which
spk_0
is 5'8", 160 pounds. If you look at society, there are very few average men, right? There are very
spk_0
few men that are only 5'8", right? And 160 pounds. And so it was such a fascinating book to sort of
spk_0
read through to say, hey, we are leaving out a majority of society if we inherently depend on
spk_0
just limiting ourselves on the data sets that we have. And that's why I think that there has to be
spk_0
some level of introspection across the data teams and, you know, the data product teams that are
spk_0
being developed. How do you make sure that you have diverse representation on that team? Because
spk_0
you're right, we all have inherent biases. So if I only create a team of all women or I only
spk_0
create a team of all veterans or I only create a team of only Americans, I think I will be very
spk_0
inherently biased then on the products or services, whether they be data or AI, that I'm creating
spk_0
as a company to be able to deliver to society. And so how do we make sure that you are creating
spk_0
organizational teams and structures that give you representation? Because it's going to be
spk_0
somebody in that room that says, hey, wait a minute. Like if we use that, we're missing out on
spk_0
this perspective from Bangalore, right? Or, you know, somebody that didn't graduate university,
spk_0
whether it's socioeconomic or cultural or whatever it might be. And so I really think when we talk
spk_0
human in the loop, it has to be a diverse team of humans in the loop. Like how do we really think
spk_0
about setting up our data teams, our organizational structure, our enablement plans, making sure that
spk_0
we have diverse representation across all of those. Because data in and of itself is inherently
spk_0
biased, right? It was created with biases in mind. And unless you have somebody on the team that's
spk_0
going to help you recognize where those biases might exist, you might do something like plan cities
spk_0
that don't take into account people that don't own cars. So you're now creating a socio-economic
spk_0
bias for those that have to walk to work or take public transport, right? Or things like the
spk_0
snow, there's an example in that book about clearing snow and they didn't actually, they only
spk_0
cleared the roads and not the sidewalks. So now you're essentially creating based on data,
spk_0
right, algorithms, the AI models that were created to prioritize snow clearance. So essentially,
spk_0
you now said anybody that took public transportation that can't afford to drive themselves to work
spk_0
is now put at a disadvantage because they're unable to get to work as a result. Because of the way
spk_0
you prioritize that snow clearing, like things like that you just don't think about, right? When
spk_0
you prioritize getting the roads cleared over the sidewalks to get people to work, etc. And so
spk_0
just really fascinating on if we understand those biases in the first place and somebody can point
spk_0
them out, how can you then start being mindful and planful and taking those things into account
spk_0
for the products that you're going to deliver? And you see it in healthcare, you see it in
spk_0
particularly, I think there's some great examples in there. I've done some work with Women
spk_0
in Data around women's safety, sort of, you know, how do we leverage data to help in the women's
spk_0
safety arena, etc. And so I just think there's so much opportunity as a society. I think data
spk_0
is phenomenal. We can solve some amazing problems with data and AI, but it does require us to be
spk_0
deliberate about creating diverse systems, meaning not just the data being diverse, but the people
spk_0
being diverse that are working on the data products and services to make sure that we can deliver.
spk_0
Yeah. So you remind me with this diversity aspect of it. So I mean,
spk_0
mitigating bias can be a tough problem. I'm not sure if you remember one of the recent Google
spk_0
Gemini releases, you know, they were trying to mitigate bias, right? So show me a room full of CEOs,
spk_0
right? So all white men showing up in the meeting room, show me the founding fathers. I mean,
spk_0
and then DEI can actually sometimes overcorrect things. And now you're showing a mix of all
spk_0
colors and races. For the right reasons, we are trying to mitigate the bias, but in some cases,
spk_0
there's some historical factual correctness that you have to worry about, right? So it can be
spk_0
actually very tough, right? So because, inherently, models, they learn because of bias in data,
spk_0
right? So fundamentally, they learn because of some what we call signal, I mean, technically,
spk_0
it is biased, right? So it's a fascinating problem and it's a very complex problem to solve,
spk_0
actually. And but if you would think about it, I mean, if the Gemini team had had enough diverse
spk_0
teams, right? Working on that version of the release, would they have caught that
spk_0
mitigating bias before it went public? And all of a sudden, they were, you know, creating
spk_0
this, you know, incremental racial backgrounds of the founding fathers, potentially.
spk_0
It's definitely a balance. And I think where there is a huge amount of risk that we won't be
spk_0
able to mitigate all bias, but it does work. This is why I don't think we will ever get to not
spk_0
having some level of human intervention in these things. Because, right, right, it requires that,
spk_0
it requires people to use this system. And this is why I think for organizations,
spk_0
making sure that you're enabling those people that have that domain expertise of your business,
spk_0
of your processes, et cetera. If you think you're going to displace them, you are essentially
spk_0
introducing incremental bias into your company, into your AI solutions that you're creating,
spk_0
because you're getting rid of the people that have the domain area of expertise to say,
spk_0
oh, wait a minute, that's not right, right? Or that's not how we would expect that result to be,
spk_0
et cetera. And so I really think, you know, whether it's organizational or societal or however it is,
spk_0
we have to think about what is that feedback loop? How do we make sure that we're allowing people
spk_0
to give the feedback so that we can constantly work on optimizing and improving the system? Because
spk_0
just inherently having, it's always interesting, you know, my 20-something kid, you know, says, well,
spk_0
ChatGPT said this happened in the 80s, and I was like, uh, baby, I was there in the 80s,
spk_0
I can tell you for sure that did not happen that way, right? And so like you said, they just have this,
spk_0
right, there are people that just have this complete and absolute confidence in the results that
spk_0
they're getting out of the system. We have to make sure that we're creating, you know, a structure
spk_0
that allows people to say, that's not, that's not right, right? My knowledge, my expertise, my
spk_0
insight, and I want to have a team and an organization that can bring that point of view,
spk_0
because I only have a limited, narrow view based on my life experiences, my, you know, my upbringing,
spk_0
my education, et cetera. And so how do we make sure that we're creating organizations and teams
spk_0
and structures to be able to support the people part of it? And most importantly,
spk_0
having the right guardrails on these systems, right? So I think that is very important. So
spk_0
we're coming to a close. Let's actually just wrap up quickly. So in the next phase, I will quickly
spk_0
ask you, you know, some rapid-fire questions, and then you will answer, short answers. I mean,
spk_0
elaborate if you like, but short answers would be fun, right? So if resources were limited,
spk_0
which would you address first in AI: bias mitigation or improving models' correctness or performance?
spk_0
Bias mitigation, absolutely. Yeah. In terms of different industries, they are going to be
spk_0
disrupted by AI in different manners, right? Which one do you think is going to be disrupted? What
spk_0
use cases in which industry do you think will be disrupted as a result of the AI revolution that
spk_0
you are seeing? I think ultimately every industry will be disrupted at some point. I think right now,
spk_0
anything that's sort of professional services, anything that has a dependency, I think, just on
spk_0
aggregation of knowledge into a strategy capability, et cetera. So I think professional services
spk_0
immediately, and at some point all industries are going to be disrupted.
spk_0
So without naming names, I mean, no more of those consulting and services companies. Is that
spk_0
what you're telling me? I think they'll still be around, but I'll have to, I think they'll have to,
spk_0
you know, innovate net-new business models to be able to. So no more business in PowerPoints
spk_0
only. Exactly. Yeah. In terms of jobs, will AI result in job elimination, job creation,
spk_0
or job displacement? Job evolution. Okay. Can you elaborate? I mean, I know I asked you for
spk_0
short answers, but when you say job evolution, what do you mean? I think for most jobs or roles,
spk_0
I think there is the capability to think about how does that role evolve or change as opposed to
spk_0
being completely replaced or displaced by AI. And so again, bringing that domain knowledge and
spk_0
understanding that those employees have in that current role, there's value there,
spk_0
particularly as we think about bias mitigation, et cetera. And so how do we make sure that we're
spk_0
helping them evolve their roles and their functions as opposed to displacing or replacing them with AI?
spk_0
And what is the biggest challenge to enterprise adoption of AI? Is it technology? Is it skill
spk_0
gap? Is it regulations? Is it culture? Something else? I think it's still primarily culture, right? I
spk_0
think it's helping people see the power and the value and then taking them along on that change
spk_0
journey. Okay. In terms of open source versus closed source models, which one is going to be the
spk_0
winner in enterprise AI? I think most organizations are going to use a combination of both, but I do
spk_0
see more open models becoming more prevalent as opposed to closed models for regulatory and legislative
spk_0
requirements of explainability and transparency. Okay. Is it time for me to sell my stocks for
spk_0
like, OpenAI? I don't think it's a great idea, right? So you should not, uh,
spk_0
they'll continue to solve amazingly, you know, big complex problems. We'll definitely still need
spk_0
proprietary, but I think for most organizations, you don't need that kind of power to solve
spk_0
every business problem. And so I think we'll start to see more and more, um, open,
spk_0
sharing of models and capabilities across the ecosystem. And perhaps an extension of the same
spk_0
question, large language models or domain-specific small language models, which one do you see being
spk_0
used? Again, it's going to be a combination of both depending on the use case. We are seeing more
spk_0
and more domain specific, I think, uh, models being created right now. Because large language models
spk_0
have been around much, much longer, uh, and organizations have figured out the limitations on what
spk_0
business problems can and can't be solved with those. And so we're definitely seeing an uptick in,
spk_0
more domain specific business models being created right now. Okay. And my last one here,
spk_0
if you have to mention one book or paper or a thought leader or a talk that I should go and watch,
spk_0
to understand how this revolution is going to unfold. And really, if I want to understand what is
spk_0
going on, what would that book or paper or talk or a thought leader be? I think there are so many
spk_0
phenomenal books that are being created because the space is moving so, so quickly. For me, I,
spk_0
yeah, right, I enjoy reading things like CDO Magazine to sort of get the top, the top issues
spk_0
facing executives today. I enjoy things like the Data Chief podcast and Databricks has some
spk_0
phenomenal blogs. I think that continue to come out not just talking about the technology and the
spk_0
platform, but also how are organizations leveraging those platforms and capabilities and sort of
spk_0
the real business value that they're being able to drive as a result. And for many organizations,
spk_0
that problem that you're trying to solve for is probably not unique to you. And so looking across
spk_0
industries, how do you take something from retail supply chain and provide it to a pharma,
spk_0
right, a pharma company to help solve for supply issues, etc. I think we're going to see a
spk_0
lot more knowledge sharing some of these best practices so that we can continue to push the pace
spk_0
of innovation. And my last question, this is not a rapid-fire question, just a closing thought.
spk_0
What are you excited about as a human and as a technologist when you look at everything that
spk_0
is happening around us? I think what I'm most excited about is the accessibility of technology
spk_0
is now no longer limited to just technologists. As I mentioned, my last coding language was
spk_0
Fortran, right? So it's not super helpful these days. I can get by in SQL enough, but what I love
spk_0
is the fact that even my parents and grandparents have access to the information that's typically
spk_0
they can write Python now. So if I think about the
spk_0
power now that that represents and sort of the impact I think that we can have on society,
spk_0
I think I'm super excited to see what that uncovers for us. Well, thank you so much Robin for
spk_0
your time. It was a pleasure having you. Thank you so much. I really appreciate it. Thank you.