Technology
#142 Bayesian Trees & Deep Learning for Optimization & Big Data, with Gabriel Stechschulte
In this episode of Learning Bayesian Statistics, host Alex Andorra speaks with software engineer Gabriel Stechschulte about his work on Bayesian additive regression trees (BART) and its applications in ...

Transcript
My guest today is Gabriel Stechschulte, a software engineer passionate about probabilistic programming and optimization. Gabriel recently re-implemented BART, Bayesian additive regression trees, in Rust, making the algorithm faster, more flexible, and more suitable for real-world applications. So if you are a PyMC-BART user, I definitely recommend checking out his implementation; it's in the show notes. In our conversation, we dive deep into what makes BART special: its ability to quantify uncertainty, handle different likelihoods, and serve as a strong baseline in settings like optimization and time series. We also explore how BART compares with Gaussian processes and other tree-based methods, and talk about practical challenges like handling missing data, integrating BART into PyMC, and embedding machine learning models into decision-making frameworks. Beyond the code, Gabriel reflects on open-source collaboration, the importance of community support, and where probabilistic programming is headed next. This is Learning Bayesian Statistics, episode 142, recorded September 18, 2025.

Welcome to Learning Bayesian Statistics, a podcast about Bayesian inference: the methods, the projects, and the people who make it possible. I'm your host, Alex Andorra. You can follow me on Twitter at alex_andorra, like the country, for any info about the show. LearnBayesStats.com is the place to be: show notes, becoming a corporate sponsor, looking for Bayesian merch, supporting the show on Patreon, everything is in there. That's LearnBayesStats.com. If you're interested in one-on-one mentorship, online courses, or statistical consulting, feel free to reach out and book a call at topmate.io/alex_andorra. See you around, folks, and best Bayesian wishes to you all. And if today's discussion sparked ideas for your business, well, our team at PyMC Labs can help bring them to life. Check us out at pymc-labs.com.
Alex: Gabriel Stechschulte, welcome to Learning Bayesian Statistics. And I think I butchered your name there.

Gabriel: No, no, it was quite good, yes.

Alex: Okay, okay. I rehearsed it, and that still didn't work. But yeah, thanks a lot for being on the show. I've been meaning to have you here for a while, because you do a lot of very interesting things. We know each other from the PyMC world, but this is the first time we actually meet, almost in person. So that's great; I'm very happy you're here. Thanks a lot for taking the time. As usual, let's start with your background and your origin story. Can you tell us what you're doing nowadays, and how you ended up doing that?
Gabriel: Yeah, for sure. And thanks for having me on. Maybe next time we can meet in person and do some hiking in the mountains. So, my background: currently I'm in an Internet of Things lab, an IoT lab, at the Lucerne University of Applied Sciences and Arts. Within the lab I'm doing various modeling of engineering processes, but I wasn't always doing that. I originally studied economics back in the US, where we were primarily doing econometrics, so frequentist-based statistics. From there I moved to Switzerland to do my master's, and to be with my girlfriend. In my master's I continued in data science, and that's really when I started getting involved in probabilistic programming, and Bayesian statistics in particular. After graduating, I immediately started working in the lab at the university.

Alex: Okay, so a pretty random road, right? It's rare that people end up in Switzerland, especially coming from the US. Or where is my prior wrong?

Gabriel: No, no, yeah, it's a bit odd.

Alex: And so, what do you mean by an Internet of Things lab? I absolutely don't know what that is.
Gabriel: Pretty much, a lot of it has to do with connectivity of hardware. When hardware is connected to the internet, to provide some sort of connectivity, that's what you can think of as the Internet of Things. All the things labeled as smart devices fall under that umbrella term, IoT. And nowadays everything is being IoT-ified: you have your dishwashers connected to the internet, coffee machines, and so forth. That's really what IoT generally means.
Alex: Okay, okay. That sounds very algorithmic and deep-learning-heavy, doesn't it?

Gabriel: Within our group we have a very wide spread of knowledge: you have people who are real specialists in networking and hardware, then data storage and processing, and then, like me, people more on the machine learning and data analysis side. For me, it's: how do you analyze the data coming from various sorts of machines, whether that's manufacturing machines and so forth?
Alex: Okay. And how did you end up doing that? Because that's not what you studied, right? So how did that happen?

Gabriel: I think it came from my bachelor's, from econometrics, doing a lot of time series work. And you can think of IoT as also being a lot of time series work, because when you have these sensors hooked up to the machines, you're logging time series: depending on the frequency, 10 hertz, 50 hertz, you're logging anywhere from one measurement every second to dozens of measurements every second. With that, you get a really nice stream of time series data. I don't know exactly how I got brought into the IoT field specifically, but it stemmed from: oh hey, do you know various time series methods, like seasonal moving averages and state space models? And okay, yeah, some of that can be translated from econometrics over into this IoT role. So it was a bit of a gradual shift.
Alex: Okay, that makes sense. So a lot of time series, state space models, Gaussian processes, I'm guessing, or at least I'm hoping.

Gabriel: Yeah, yeah.

Alex: How did you end up working on Bayes stats in particular? Do you remember when you were first introduced to them, and how often do you use them in your current work?
Gabriel: It goes back to my bachelor's, to the first couple of statistics courses. I remember doing some very basic regression models, and in that course they said: you reject the null hypothesis because of the p-value. And I'm just sitting there thinking: but who, and why? Why is it 0.05, and who came up with that kind of arbitrary threshold? It was always unsatisfying to me: you follow this strict rule and flow diagram, and it's "you do this, or you don't do that." That was always unsettling. From there it was more of a self-discovery, because they never taught Bayesian statistics in my undergrad. So it was: okay, what else is out there? And that's where I came across Richard McElreath's Statistical Rethinking and then Andrew Gelman's Bayesian Data Analysis. That's how I got more introduced to Bayesian statistics. As for how often I use it: pretty much every day. Almost every project I've done in this lab has had to do with probabilistic modeling in some form or another.
Alex: And why is that? How come Bayesian stats seem so interesting and important to your work, and what do they bring that you can't get in the frequentist framework?
Gabriel: The big thing I see with sensors, and IoT in general, particularly in the problems I'm solving, is that, first off, you have a lot of sensor noise. These sensors, and the processes they're measuring, aren't perfect. For example, if you're attaching a sensor to a manufacturing machine, the speeds that sensor is logging aren't necessarily exact; they can fluctuate a little bit. And not only that, the process it's measuring isn't always perfect either. So I look at that and think: probabilistic programming is actually a really good fit here, because we can begin to model the uncertainty, some of the sensor noise, and then the manufacturing process itself. Being able to quantify the uncertainty there is very powerful, because it lets you account for some of the noise in the process and in the measurements. But at the same time, it's really difficult, because you can imagine that some of these settings are logging a lot of data, and traditionally, Bayesian computational methods aren't very good with big data. So in my day-to-day I often see this friction between big data and Bayes. Maybe we can talk about that in a little bit, but you have that kind of friction.
Alex: Yeah, exactly, that's where I was going, and that's where my astonishment comes from. How do you manage to combine this need for uncertainty quantification and intuitive uncertainty interpretation with the need to actually run the models? Unlike in the frequentist world, you need to run the inference, and you have a lot of data; that can be a bottleneck. So how do you thread that needle?
Gabriel: I'd say there are three general techniques that we use. The first one is probably what everyone thinks of: you take your raw data and perform some aggregation on top of it, some sort of resampling, to reduce the size of the data, and then you just continue to apply your usual MCMC on that. The second one is other inference methods, like variational inference. That has proven to be a very good fit, because variational inference methods often need some sort of approximating strategy, and since we have a lot of data, we can come up with a nice subsampling scheme to use within the variational inference method. And the last one is that, luckily, we do have some hardware at our lab, so we can just throw GPUs at the problem. We can use JAX-based tools, so NumPyro, Pyro, and these more traditional deep learning frameworks, for GPU acceleration.

Alex: Yeah, that last approach is quite nice, because you don't have to think too much, right? A NumPyro or PyMC model can just run out of the box on JAX, and you get GPU acceleration without having to do anything else.
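To make the first of those techniques concrete, here is a minimal sketch (my own illustration, not code from Gabriel's lab) of aggregating a raw sensor stream before inference: block-averaging a simulated 10 Hz signal down to 1 Hz shrinks the dataset tenfold while preserving the trend that MCMC will see. All names and numbers are hypothetical.

```python
import numpy as np

def block_average(stream: np.ndarray, factor: int) -> np.ndarray:
    """Downsample a 1-D sensor stream by averaging non-overlapping blocks.

    A 10 Hz stream with factor=10 becomes a 1 Hz stream, shrinking the
    dataset tenfold before any MCMC is run. Trailing samples that do not
    fill a complete block are dropped.
    """
    n = (len(stream) // factor) * factor          # usable length
    return stream[:n].reshape(-1, factor).mean(axis=1)

# Simulated noisy 10 Hz sensor log: one hour of data = 36_000 points.
rng = np.random.default_rng(0)
raw = 5.0 + 0.3 * rng.standard_normal(36_000)

coarse = block_average(raw, factor=10)            # -> 3_600 points at 1 Hz
print(coarse.shape)                               # (3600,)
```

The same idea extends to other summaries (medians, max-pooling for spike detection); the point is simply to shrink the data volume before handing it to a sampler.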
Alex: So I would say, if you have the compute available, I would do that, especially since having to come up with a customized variational inference scheme for big data is much more intricate. And I'm curious about your experience with the different VI algorithms. Maybe you can give a lay of the land to the listeners: where do you see these methods and the different algorithms being useful or not, and what are your practical recommendations?

Gabriel: In that regard, I've mainly used the standard implementations that NumPyro offers, so their autoguides, their mean field, and so forth. We were using those primarily for hierarchical models, and I found that, out of the box, they worked quite well, so I never really had to go off on tangents to figure out which ones work or not.
Alex: Yeah, for sure. I know we're making a lot of effort on the PyMC side too to have more VI. There's a lot of work being done through Google Summer of Code on improving out-of-the-box VI, ADVI included; Jessica Bowski did a lot of work on that. There's the Laplace approximation in pymc-extras, which you can now use in conjunction with the MAP fit, using the MAP estimate to initialize the Laplace approximation. There's also the Pathfinder algorithm, already available in pymc-extras; we'll have a dedicated episode about it with Michael Cao, who developed the Pathfinder module in pymc-extras, so folks, stay tuned for that. And I think I'm even forgetting one VI method we're adding right now, but maybe it will come back to me. So yes, there's a lot of activity on the PyMC side too, and I find that really great, because I think we've been collectively trying, in the last few years, to improve this as a community, or at least to make people more aware of these different algorithms; they can come in very handy.

Gabriel: Who was kind of the pioneer there? At least from an outsider's perspective, it seems a bit like Pyro, and Stan also did some.
Alex: Yeah, Stan has a lot of that. Pathfinder was developed by Bob Carpenter and his team. At the beginning it was developed as an initialization method for NUTS, but they realized the results in themselves were really good, so they also released it as a standalone algorithm. Something you can definitely do as well is initialize NUTS with the Pathfinder results, which can be very useful. So Stan has a lot, NumPyro has a lot. We've had the ADVI module in PyMC for a long time now; now it's getting a bit more love, and we have Pathfinder, as I just said. And I'm sure I'm forgetting one method, but it will come back to me.
Alex: And actually, if you folks want an introduction to these different methods, there's a talk I co-wrote with Chris Fonnesbeck and Michael Cao, whom I was talking about a few minutes ago, for PyData Virginia. I'll put the YouTube video in the show notes, along with the GitHub repo. I was also supposed to fly over to Virginia, but I didn't find any affordable flights, so Chris ended up being the sole presenter. I think he doesn't hate me; he loves presenting, and he's so good at it. The video is awesome; he obviously did a great job presenting the material. So I'll put that in the show notes, folks, because it's a very good lay of the land of what you can do with VI and what the different algorithms are.
Gabriel: And keep an eye out: I think Juan also recently gave a talk at PyData Berlin.

Alex: Yeah, exactly, that's what I was going to say. The videos are not released yet at the time of recording, and I don't think they will be anytime soon; they usually take about two months to release them. So folks, keep an eye on the PyData Berlin YouTube channel and watch Juan Orduz's talk there, where he basically builds on top of our PyData Virginia presentation and shows practical implementations of VI, especially SVI with NumPyro, with really good practical advice. So when the video comes out, it's a very good one.
Alex: And actually, I'm curious whether there's anything you do in particular, Gabriel, when you use VI, to try to make sure the results you're getting back are reasonably close to the posterior. We have those guarantees with NUTS, with MCMC, but we don't with VI algorithms. Usually, something I do is try the model on fake data and make sure it can recover the parameters of interest within a reasonably close range, or, for the parameters it can't recover, try to see if there's a pattern in the bias. Then at least we know there is a bias in the model, and that's really very helpful, because if you can at least get a model running with VI, even knowing there's a small bias, I would argue that's already better than having no model at all because you insist on MCMC. So I think this is useful, but I'm sure you're doing much better things than that, because you have more experience than me on that front.
Gabriel: No, not really, because I take pretty much the same approach. Before scaling out to a more complex model on a problem in industry, I try to simulate what that data or engineering process looks like, and then check, pretty much exactly as you said, how well the algorithm is recovering the parameters, or the posterior, and whether it's able to actually model the problem at hand. It's funny, that's pretty much what I do as well.
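The workflow Alex and Gabriel describe can be sketched in a few lines. This is a hedged toy example: a linear model with known parameters stands in for the real engineering process, and ordinary least squares stands in for whatever approximate inference (VI, MCMC) you would actually run; the point is the simulate, fit, compare-to-truth loop.

```python
import numpy as np

rng = np.random.default_rng(42)

# 1. Choose "true" parameters for the simulated process.
true_intercept, true_slope, noise_sd = 2.0, -0.7, 0.5

# 2. Simulate data from the model we intend to fit.
x = rng.uniform(0.0, 10.0, size=500)
y = true_intercept + true_slope * x + noise_sd * rng.standard_normal(500)

# 3. Fit. Plain least squares stands in here for whatever approximate
#    inference (VI, MCMC, ...) you would actually run on the real model.
X = np.column_stack([np.ones_like(x), x])
est_intercept, est_slope = np.linalg.lstsq(X, y, rcond=None)[0]

# 4. Check recovery: estimates should land close to the truth. Systematic
#    misses here point at a bias in the inference, or a bug in the model.
print(est_intercept, est_slope)
```

Repeating this over many simulated datasets turns the spot check into a calibration study: a consistent pattern in the misses is exactly the kind of bias Alex mentions.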
Alex: Okay, cool. So I'm not doing something obviously stupid; that's good, that is rare. So, anything you want to add about VI, things you've noticed in the wild that worked particularly well or particularly badly, before we move on to another topic?
Alex: So, something I really want to talk to you about, because it's something you've worked on for a long time, and it's really a masterpiece, so thank you first for doing it: your re-implementation of BART, Bayesian additive regression trees, in Rust. People probably know you can do BART models with PyMC, in a package called PyMC-BART. It's an awesome package; I use it whenever I can. But it has the drawback of regression trees, which, if I remember correctly, is that you have as many parameters as you have rows in your dataset. That means the computational demands grow pretty fast: when you start passing 200k observations, it becomes really slow to infer. So what you did is re-implement the sampling algorithm, which is Metropolis-within-Gibbs, I think, or something like that. What algorithm is it, particle Gibbs?

Gabriel: Yes, that's particle Gibbs.

Alex: Particle Gibbs, and you re-implemented that in Rust. So can you talk about that? Basically, why did you start doing it? Give us the elevator pitch for the project before we dive deeper.
Gabriel: The PyMC-BART project, this really comes from Osvaldo. About a year ago he reached out and said: hey, we need to make this thing faster, are you interested? And I'm like: I'm all for it, let's do it. Before that I really hadn't used BART and wasn't too familiar with the method; I was familiar with gradient boosting techniques, which are somewhat similar. But I did have experience with Rust, so that was a good complement. I saw what Adrian Seyboldt is doing with nutpie and Rust, and thought maybe we can share some of the code he's been writing and use it within PyMC-BART, to help at least with the log-probability evaluations and so forth. So this really stemmed from Osvaldo wanting to make it more performant, and me stepping on board and saying: okay, let's re-implement this in Rust and share some of the code base from nutpie.
Alex: Okay. And how was the experience? Was Rust all new to you? How do you even start on such a huge project?

Gabriel: I did have prior experience with Rust from some data processing pipelines in the IoT lab, so the Rust part wasn't entirely new to me. What was new was interoperating with Python, having Python bindings, so that when the Python user calls the BART code, it executes the Rust implementation. As for the implementation process, the approach I took was essentially: let's implement this one to one from the Python implementation into Rust, and from there we can start to optimize the different functions or methods. That way we get a nice performance improvement, instead of immediately rewriting something and then not knowing: okay, now this isn't working right, where did it go wrong, and so forth. I don't know if you want to talk about some of the Rust specifics, or the algorithm specifics?
Alex: Yeah, maybe. It's been a while since we talked about BART and regression trees on the show, so maybe you can introduce tree methods in general; you mentioned gradient boosting, and we obviously mentioned BART. Give us the elevator pitch for BART and tree methods in general, and then I think it will be useful to dive into a bit more of the technical details of the algorithm, to understand how the method really works, why people could be interested in using BART, and in which cases.
Gabriel: At a high level, you have these tree-based methods, and at the simplest level you have your decision tree. That's your logic: if this variable is greater than some value, you go down the tree, and you finally get to a leaf node, and that's your prediction for the target, your response variable. Building up off the decision tree, you can have a random forest, which is a bunch of those decision trees put together into a forest. But then, building on top of that, you have gradient boosting methods, and these are really methods where you learn the residual, the difference between the trees' predictions and the target. When you do that, it's kind of like a meta-learner: you're learning where each tree does better, to come up with a better-predicting ensemble. And this is really where BART is aligned: with the gradient boosted methods rather than with random forests, because BART is doing much the same thing as these gradient boosted methods. But the way it assembles the trees is different: BART assembles these trees by taking random perturbations and then assessing the log-likelihood of that tree.
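A toy sketch of the residual-fitting idea Gabriel describes (my own illustration, not BART or any library's code): each new depth-1 "stump" is fit to what the current ensemble still gets wrong, and its contribution is shrunk before being added.

```python
import numpy as np

def best_stump(x, r):
    """Fit a depth-1 'tree': find the threshold split minimising squared
    error against the residual r, returning (threshold, left_mean, right_mean)."""
    best = None
    for t in np.unique(x):
        left, right = r[x <= t], r[x > t]
        if len(left) == 0 or len(right) == 0:
            continue
        sse = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
        if best is None or sse < best[0]:
            best = (sse, t, left.mean(), right.mean())
    _, t, lm, rm = best
    return t, lm, rm

def stump_predict(x, stump):
    t, lm, rm = stump
    return np.where(x <= t, lm, rm)

# Toy regression target.
rng = np.random.default_rng(1)
x = np.sort(rng.uniform(0, 1, 200))
y = np.sin(2 * np.pi * x) + 0.1 * rng.standard_normal(200)

# Boosting: each new stump is fit to the residual the ensemble leaves behind.
pred = np.zeros_like(y)
for _ in range(50):
    stump = best_stump(x, y - pred)          # learn what is still unexplained
    pred += 0.3 * stump_predict(x, stump)    # shrink each tree's contribution

print(np.mean((y - pred) ** 2))
```

BART keeps this sum-of-trees structure but, as Gabriel says next, replaces the greedy fit with random perturbations scored by log-likelihood, which is what yields a posterior over ensembles.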
Alex: Okay, yeah, so that's closer to the gradient boosting way of doing things, right? Okay, so that's the elevator pitch. Now, when are these models particularly useful, in your experience, and what are their strengths and drawbacks?
Gabriel: One of the strengths: if you want to compare BART with a traditional XGBoost or LightGBM model, one of the big benefits of BART is that you get uncertainty quantification; you have a posterior over decision trees. With that, you can actually stick the model into other things where you want to use the uncertainty. For example, Bayesian optimization traditionally uses Gaussian processes, but you can stick a BART model into the Bayesian optimization routine as well, because you also have the uncertainty there. One of the big drawbacks of BART is that it's famously slow compared to XGBoost or LightGBM. But another nice thing about BART: with XGBoost and LightGBM it's very easy to overfit your data, so you need to look at a lot of loss curves and figure out: okay, when do I stop training, how many trees do I use, how many learners, and so forth, to stop the trees from overfitting. With BART it's really nice, because we have regularizing techniques built in, so we inherently avoid overfitting within the method. That's one really nice pro I see with BART over the others. But yeah, I'd say the big con is that it's significantly slower than the other ones, and that's for multiple reasons.
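The plug-a-surrogate-into-Bayesian-optimization idea Gabriel mentions only needs a posterior mean and standard deviation at candidate points. Here is a hedged sketch of the expected-improvement acquisition (for minimization) computed from a dummy surrogate posterior; in practice `mu` and `sd` would be read off a fitted BART or GP posterior, and the numbers below are entirely made up.

```python
import math
import numpy as np

def expected_improvement(mu, sd, f_best):
    """Expected improvement (for minimisation) of candidate points, given
    the surrogate's posterior mean `mu` and standard deviation `sd`.
    Any model with a posterior over functions (GP, BART, ...) can supply them."""
    mu, sd = np.asarray(mu, float), np.asarray(sd, float)
    z = (f_best - mu) / sd
    Phi = 0.5 * (1.0 + np.vectorize(math.erf)(z / math.sqrt(2.0)))  # normal CDF
    phi = np.exp(-0.5 * z**2) / math.sqrt(2.0 * math.pi)            # normal PDF
    return (f_best - mu) * Phi + sd * phi

# Dummy surrogate posterior over five candidate points (stand-in for the
# mean/sd you would read off a fitted BART or GP posterior).
mu = np.array([1.0, 0.8, 1.2, 0.9, 0.5])
sd = np.array([0.1, 0.3, 0.05, 0.6, 0.2])
f_best = 0.9                                   # best observation so far

ei = expected_improvement(mu, sd, f_best)
print(ei.argmax())   # next point to evaluate: highest expected improvement
```

Note how the acquisition trades off low predicted mean against high uncertainty, which is exactly why the surrogate's uncertainty quantification matters.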
Alex: Yeah, thanks. This is much clearer to me now, and I hope it is for the listeners too. So now I think it's a good time to dive into why that would be: where the bottlenecks are, what the algorithm is per se, and how it works under the hood, so that people really understand the models when they use them.
Gabriel: So, in PyMC-BART we implement, as I think we stated before, particle Gibbs, whereas other implementations might implement a Metropolis-Hastings approach. With the particle Gibbs steps, the way the algorithm works is that we generate a set of trees, maybe 50. In PyMC-BART you define the number of trees and the number of particles; so, for example, you might say: okay, we want 50 trees and 10 particles. Now we're going to perform a series of particle Gibbs steps. At the first step, we loop through all 50 trees. For the first tree, we initialize the 10 particles, or however many you defined, and those 10 particles are just decision trees. We perturb each one: maybe for one we sample a variable and a certain split value, and for another one, another split value. Then we assess the log-likelihood, and at the end we say: okay, this particle, maybe particle 5 out of the 10, is going to replace the current tree, which is tree 1. Then we proceed to the next tree, tree number 2, and go through that same process: initialize 10 particles, perturb each one, weight them according to the log-likelihood, and replace that tree. And we continue until all 50 trees have essentially been replaced. At a high level, those are the main algorithmic steps. It's really quite a simple process, which is quite surprising if you read a lot of these papers.
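Gabriel's description can be sketched as a loop. This is a deliberately simplified toy (stumps instead of real trees, a Gaussian log-likelihood, naive perturbations), meant only to show the shape of one sweep: for each tree, spawn particles, perturb them, weight by the log-likelihood of the full ensemble, and sample a replacement. It is not PyMC-BART's actual code.

```python
import numpy as np

rng = np.random.default_rng(7)

# Toy data and "trees": each tree is a stump (threshold, left value, right value).
x = np.linspace(0, 1, 100)
y = np.sin(2 * np.pi * x) + 0.1 * rng.standard_normal(100)

n_trees, n_particles, sigma = 50, 10, 0.1

def predict(tree, x):
    t, lo, hi = tree
    return np.where(x <= t, lo, hi)

def perturb(tree):
    # Random perturbation: jitter the split point and the leaf values.
    t, lo, hi = tree
    return (np.clip(t + 0.1 * rng.standard_normal(), 0, 1),
            lo + 0.1 * rng.standard_normal(),
            hi + 0.1 * rng.standard_normal())

def log_lik(pred):
    return -0.5 * np.sum((y - pred) ** 2) / sigma**2

# Initialise the ensemble; each tree contributes 1/n_trees of the mean of y.
trees = [(0.5, y.mean() / n_trees, y.mean() / n_trees) for _ in range(n_trees)]
ensemble = sum(predict(t, x) for t in trees)

for i in range(n_trees):                       # one particle-Gibbs sweep
    partial = ensemble - predict(trees[i], x)  # ensemble without tree i
    particles = [perturb(trees[i]) for _ in range(n_particles)]
    logw = np.array([log_lik(partial + predict(p, x)) for p in particles])
    w = np.exp(logw - logw.max())
    w /= w.sum()                               # weights from the log-likelihoods
    pick = rng.choice(n_particles, p=w)        # sample a particle to keep
    trees[i] = particles[pick]
    ensemble = partial + predict(trees[i], x)

print(np.mean((y - ensemble) ** 2))
```

Running many such sweeps, with proper tree-growing moves and priors in place of these jitters, is what produces the posterior over sums of trees.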
Alex: And were you already that versed in BART and tree methods before working on this, or did you get that knowledge by working on the project?
Gabriel: Not really. It was mainly knowledge from working on the project and reading the code base that Osvaldo and others wrote, which was quite readable: really nice, procedural, line by line, "oh, this is what it's doing." That really helped with the intuitive understanding of what the particle Gibbs is doing.
Alex: And I second that: the code base is really well done and well written, and it's quite easy to start contributing to the package. This is really awesome, because I've dabbled a bit with BART for my baseball work, and I haven't tried your Rust implementation yet. It would be very useful for baseball, because there are a lot of use cases for methods like BART, but there is so much data that you often need acceleration somewhere. So whether it's using classic PyMC-BART on a GPU, or using your Rust implementation and adding a GPU on top of that, it should be a really good boost to sampling speed.
Gabriel: Yeah. And I must say, one of the really nice things about PyMC-BART is the several enhancements it has. If you look around online, a lot of the other packages are specifically for Gaussian likelihoods; that's the first one, so you can't really model, say, a Poisson process or anything else. The second one is that we also offer various split rules: if your design matrix has numerical features and categorical features, you can pass split rules specific to each data type, and this is uncommon in other packages, which often just assume everything is a numerical value. Those are the two things I think really differentiate our package. But lastly, we have the BART random variable and the way it's embedded in PyMC, so you can model the linear predictor, or you can model sigma, the noise parameter. That's really nice, because you can build essentially arbitrary probabilistic programs with BART, whereas with other packages it's more: you use that method, and that is the only way you use it.
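The per-data-type split rules Gabriel mentions can be illustrated with a toy proposal function (my own sketch, not PyMC-BART's actual API): continuous features split on a sampled threshold, while categorical features split on a sampled subset of the observed categories.

```python
import numpy as np

rng = np.random.default_rng(3)

def propose_split(column, kind):
    """Propose a split rule for one feature column.

    Continuous features split on a sampled threshold ("x <= t goes left");
    categorical features split on a sampled subset of observed categories
    ("x in S goes left"). Mirrors the idea of per-dtype split rules."""
    if kind == "continuous":
        t = rng.choice(column)                      # threshold from observed values
        return lambda x: x <= t
    if kind == "categorical":
        cats = np.unique(column)
        size = rng.integers(1, len(cats))           # non-empty proper subset
        left = rng.choice(cats, size=size, replace=False)
        return lambda x: np.isin(x, left)
    raise ValueError(f"unknown feature kind: {kind}")

speed = rng.uniform(0, 100, size=8)                 # continuous sensor reading
machine = np.array(["a", "b", "c", "a", "b", "c", "a", "b"])  # categorical id

go_left_speed = propose_split(speed, "continuous")(speed)
go_left_machine = propose_split(machine, "categorical")(machine)
print(go_left_speed, go_left_machine)
```

Treating a machine ID as a number would impose an arbitrary ordering on the categories; a subset-based split avoids that, which is the point of offering rules per data type.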
Alex: Yeah, exactly, this is actually a very good point: it's modular, you can add it as a component in a PyMC model. So you could model your linear predictor with a classic linear regression, and then model your sigma, your standard deviation, with a BART random variable. This is very useful. And I must say that recently, in the current Python implementation, support has been added for more than one BART random variable, which is really great and something that had been requested. So you could do two different BARTs on two different parameters; this is really awesome. In a way, that's starting to look a lot like the GP submodule: GPs, you can add them to PyMC models as you want, and you can have different GPs for any number of parameters in your model. You really cannot do that with GP-focused packages; with most of them, you just use the GP as-is and can't do anything else. And often there are also likelihood limits: in PyMC you can use a GP with any likelihood distribution, while in most packages it's often a normal likelihood at best.
normal normal likelihood biggest that's often hard good yeah how is it so I know on the bar
spk_0
Python pure Python bar we can use any likelihood we want how is it on the rest side now is
spk_0
I remember at the very beginning you had not included yet categorical multinomial ability to
spk_0
to use that kind of likelihood of course I always use that likelihood so I was like damn gonna
spk_0
choose that yet but yeah how how easy it right now when it comes to the likelihood especially the
spk_0
most multi-dimensional one which always I know much more of a pain to develop
spk_0
Yeah, so in regards to the current state of the Rust implementation, there are still some things that aren't implemented one-to-one yet, and I'm still working on that. But the likelihoods — that's been resolved, so you can model multiple different likelihoods. I think the one you were specifically asking about was the different split rules, the categorical and continuous split rules, and those are also now implemented in the Rust implementation. The one thing that's not there yet is multiple BART random variables — I'm still working out some bugs there, so that's still being implemented on our end.
spk_0
Okay, so concretely we can do anything with the Rust implementation that we can with PyMC-BART, except for having more than one BART random variable in the model — otherwise everything is on par right now? Amazing, that's cool. Thanks, Gabriel.
spk_0
That means we'll now be able to use that much, much more on baseball data — this is going to be super fun. And how do you squeeze that in, actually? Is it part of your job, or is it something you do on the side? And maybe you have some advice for people who are interested in doing open-source work like you do — some practical advice on how to squeeze it into work and free time. Because in the end, this is really what research is about, right? Trying to push the envelope on frontier topics that are not only going to pay off for your project, but for your company as a whole, and for a lot of other people.
spk_0
Yeah — luckily, a lot of the stuff I do at work uses these tools, so if our team and I see, hey, it would be really nice if we could speed up BART because it would help our problem, then doing the open source at work aligns quite nicely. But if the problems aren't really related, then that's in my own time. In regards to contributing more generally — honestly, the PyMC and Bambi community is, I think, one of the best in the scientific open-source world. Everyone is very inviting and willing to help. My advice to people starting out is: don't bite off more than you can chew. Pick the low-hanging fruit and then work your way up from there. I've found that to be a fairly safe approach, and it goes over better with the maintainers that way.
spk_0
Yeah, that sounds right, and that's what I recommend to people who reach out to me, too. Maybe one last question on BART: since you use it a lot in your work, what's your experience with these models? What do you find they are very useful for, and where do you see their limitations?
spk_0
Yeah, so I've used them in two scenarios. One of them is embedding BART in a Bayesian optimization routine, which you talked about with Max. The other one is for a time series process that exhibits a kind of partitioned, blocky structure. The time series isn't really smooth — it has this block structure: from point A to point B it's a constant value, then in the next time interval it shoots up to another value and is constant for a little bit. This is quite nice because tree methods are essentially piecewise constant functions, so they're able to model that inherently, quite nicely. It's a very raw, weird time series — I mean, no time series is really continuous, but here you don't even have enough points for it to look continuous.
spk_0
And so, at bottom, the discreteness of the tree structure here is a feature, an asset.
spk_0
Yeah, exactly. And you see this come up in sensor-based time series quite often. If you look at certain profiles over time, you see that block-looking structure as well, and then you think: maybe these tree-based methods might work here.
spk_0
Okay, this is very interesting — I love it. Thanks. Actually, two other questions on that. What about the time intervals? A lot of the time, having fixed time intervals is much easier to deal with. What's your experience here with BART models — are they at all sensitive to the fact that sometimes the time intervals are not equal, which I guess might be the case with sensor data? And related to that, what about missing data, and what about out-of-sample predictions? I know it's a big question, but it's all related.
spk_0
So, in regards to the time intervals — you're saying when the time intervals are unequal, like the time between measurements? Okay. So in regards to BART, or just tree methods in general, I think they're very good for interpolating missing values, because you can impute or interpolate that inherently within the tree. If you have a sensor that didn't log a measurement over a certain time period, but then comes back online and continues logging, with tree methods you do get a nice interpolation there. So you don't really need to do any feature processing beforehand, which is nice because it's handled inherently within the model.
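The blocky, piecewise-constant behavior Gabriel describes is easy to see with a toy one-split regression tree (an illustrative sketch, not the BART sampler itself): trained on a sensor series with an outage, the single best split lands between the two plateaus, and predictions inside the gap are interpolated for free.

```python
import numpy as np

def fit_stump(t, y):
    """Find the single split on t that minimizes squared error."""
    best = (np.inf, None)
    for thr in (t[:-1] + t[1:]) / 2:  # candidate thresholds between points
        left, right = y[t <= thr], y[t > thr]
        sse = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
        if sse < best[0]:
            best = (sse, (thr, left.mean(), right.mean()))
    return best[1]

def predict_stump(stump, t):
    thr, left_mean, right_mean = stump
    return np.where(t <= thr, left_mean, right_mean)

# Blocky sensor series: constant at 1.0, then jumps to 5.0; the sensor
# is offline between t=3 and t=8, so there is a gap in the log.
t = np.array([0., 1., 2., 3., 8., 9., 10.])
y = np.array([1., 1., 1., 1., 5., 5., 5.])
stump = fit_stump(t, y)

# The split lands at t=5.5, inside the outage; queries in the gap are
# interpolated from the surrounding leaves with no preprocessing.
print(predict_stump(stump, np.array([5.0, 6.0])))  # [1. 5.]
```

A real BART posterior averages many such trees, so the jump location itself gets uncertainty attached to it, but the piecewise-constant mechanics are the same.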
spk_0
Okay, yeah — that's great, and that's what I thought: basically, when there is no fixed time interval, it's like a missing-data problem. So they are very good at interpolation — how good are they at extrapolation, at really doing out-of-sample predictions? How does that work here?
spk_0
Yeah — luckily, I haven't really had to use it for out-of-sample predictions.
spk_0
Interesting. I mean, obviously I'm asking because I know tree methods are not good at out-of-sample predictions.
spk_0
And I'm glad I haven't had to. I think that's actually one of the reasons I chose to use it: for a couple of the problems we were modeling — for example, if you have the actuator limits of a robot, you have pretty clear upper and lower bounds from the engineering process, so you know you're not going to be extrapolating past them. And so with BART you have nice interpolation within these actuator limits.
spk_0
Yeah, exactly. And that's actually why I haven't been able to use BART models in production yet, other than for exploring and teaching: most of the time, I work on actual out-of-sample data. Say I work on players — the age of players is not really out of sample, in that all players are human, so you'll never have a player who is 120 years old. But if you're looking at seasons, for instance, well, the years really are out of sample, and so there it's a problem. Or the players themselves are out of sample: what about a player you never saw in your training set? That's often why I couldn't choose tree methods or BART methods — they don't extrapolate, in comparison to Gaussian processes, which are really good at prediction in general, or state-space models.
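The extrapolation limitation is just as easy to demonstrate with a toy piecewise-constant fit (an illustrative sketch, not any particular package): inside the training range the leaves track the trend, but any query beyond the last split falls into the rightmost leaf and gets a flat prediction, no matter how strong the trend is.

```python
import numpy as np

# A linearly increasing series...
t = np.arange(10, dtype=float)
y = 2.0 * t

# ...fit by a "tree" with one leaf per training point: within the
# training range this is a perfect piecewise-constant approximation.
leaf_edges = t
leaf_values = y

def tree_predict(query):
    # Each query falls into the leaf of the nearest training point on its
    # left; anything past the last split lands in the rightmost leaf.
    idx = np.searchsorted(leaf_edges, query, side="right") - 1
    return leaf_values[np.clip(idx, 0, len(leaf_values) - 1)]

print(tree_predict(np.array([4.5])))   # in-range: leaf value 8.0, near the trend
print(tree_predict(np.array([50.0])))  # out-of-range: stuck at the last leaf, 18.0
```

The true value at t=50 is 100, but the tree can never predict anything outside the range of leaf values it learned — which is exactly why bounded domains like actuator limits are the comfortable setting for BART, while a GP with a trend-aware mean or kernel degrades more gracefully out of range.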
spk_0
Okay, awesome. And one last question on BART, I swear: what do you mean by using them in optimization routines? I find that super interesting.
spk_0
Yep — so in regards to the optimization routine, I was specifically talking about Bayesian optimization. Essentially, Bayesian optimization is a sequential optimization process where you typically have some sort of surrogate model — typically a Gaussian process, but it can really be any other method that provides a posterior. So I'm essentially swapping out that GP, putting in the BART model, and using that to optimize some industrial process. With this iterative method, what we're doing is training the model on the historical data, then using — I don't have to get into the details — some sort of function or generator to produce a new set of feature values, or design points, then evaluating those with BART, and then running the loop again: retrain the model, generate some new values, evaluate with BART, and so forth. That's what I mean, generally, by Bayesian optimization — and it's Bayesian because we're using probabilistic methods, from what I can tell.
spk_0
Okay — so is it that BART is included in your loss function?
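The loop just outlined can be sketched end to end in Python. This is a toy stand-in, not Gabriel's system: a bootstrap ensemble of quadratic fits plays the role of the BART (or GP) surrogate's posterior, and a lower-confidence-bound rule plays the acquisition function.

```python
import numpy as np

rng = np.random.default_rng(0)

def objective(x):
    # Stand-in for the real industrial process (e.g., scrap rate);
    # unknown in practice, only observable through noisy evaluations
    return (x - 2.0) ** 2 + rng.normal(0, 0.1, np.shape(x))

def surrogate_posterior(X, y, candidates, n_boot=30):
    # Bootstrap ensemble of quadratic fits as a cheap surrogate posterior
    # (Gabriel swaps in BART here; a GP is the classic choice)
    preds = []
    for _ in range(n_boot):
        idx = rng.integers(0, len(X), len(X))
        coeffs = np.polyfit(X[idx], y[idx], 2)
        preds.append(np.polyval(coeffs, candidates))
    preds = np.array(preds)
    return preds.mean(axis=0), preds.std(axis=0)

# 1) start from historical data
X = rng.uniform(-5, 5, 8)
y = objective(X)

for _ in range(10):
    # 2) generate candidate design points
    candidates = rng.uniform(-5, 5, 64)
    mu, sigma = surrogate_posterior(X, y, candidates)
    # 3) acquisition: lower confidence bound (favor low mean, high uncertainty)
    x_next = candidates[np.argmin(mu - 1.0 * sigma)]
    # 4) evaluate the real process, augment the data, retrain next iteration
    X, y = np.append(X, x_next), np.append(y, objective(x_next))

print(X[np.argmin(y)])  # best input found; the true optimum is x = 2.0
```

Swapping the surrogate for one with a full posterior (BART, a GP) is what makes the "Bayesian" part more than a label: the acquisition rule can then trade off exploitation against genuine posterior uncertainty.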
spk_0
It's the surrogate model. So, for example: if you want to optimize a machine for the scrap rate — how much scrap an industrial machine is producing — you probably don't know the physical equations that govern or produce that scrap. So what's the next best thing we can do? We can turn to data-driven methods. We collect data about the process — maybe you have sensor measurements of how fast the robot arm is moving, how fast material is being fed into the machine — and then you also have measurements like "this much scrap was produced," "no scrap was produced," and so forth. We then use BART, or the GP, to learn the association between the parameters governing the process and whatever metric you're tracking. Now that you have that, that's your function — your mapping from inputs to outputs. And then, within the Bayesian optimization framework, or loop, you're deciding: hey, we want to optimize, we want to produce as little scrap as possible, so we're going to use this model we just trained to propose, or select, the values that produce the least amount of scrap. Does that make sense?
spk_0
Okay, yeah, I think it does. So this is not really that you're using BART inside a loss function when doing optimization — this is something different.
spk_0
Not necessarily in the loss function, no.
spk_0
Okay, nice. Do you have any public writing about that, for people to look at if they're interested in these kinds of methods?
spk_0
We are writing a paper, but it's not published yet, so unfortunately, no.
spk_0
Okay — well, let me know when it is, because then we'll publish that in the LBS sphere, which, as you know, is extremely powerful. So I think that's a good summary of everything. Do you have anything to add, anything I forgot to ask you, or do you think we already did a good job of giving people an idea of how they can use this?
spk_0
No — I mean, our goal is essentially to provide backwards compatibility with the Rust implementation, so it's just a drop-in replacement. But the thing we maybe didn't touch on too much, for some of the Rust people out there, is what some of the interesting Rust-y bits were that resulted in some nice performance gains. I think that could be fun to talk about.
spk_0
Yeah, for sure!
spk_0
One of the areas that was nice to implement with Rust is the tree proposals. What we do with PyMC-BART is we have a prior probability over the depth of the tree: if you think of a binary tree, as you add nodes to it, the depth of the tree increases, and we have a prior probability over how deep a tree can be, which you can actually set as a user with the two parameters alpha and beta. In the tree proposals, we propose a variable to split on and a value to split at, and based on that and the prior probability over the depth of the tree, we can say how likely a tree is to be grown — essentially, how likely the depth is to increase. Traditionally, in the original Python implementation, we would always perform a tree proposal and a systematic resampling to propose the particle to replace the tree. But in the Rust implementation we take a lazy approach: we use a smart pointer, reference counting, to defer — to wait to materialize the grown tree until we know we will accept that tree to grow. So beforehand we compute the proposal and say: this is what it would do if it were chosen or selected. Then, if it is selected — okay, materialize, actually compute the result. It's a bit of a lazy way of doing it, and it's really nice because in the systematic resampling we resample according to the weights, or the log-likelihoods, of the trees. So if you have 20 particles, and after systematic resampling you select, say, 10, and those new particles all come from the same one because it had a high weight, we essentially have to perform a deep copy, or a clone, of that tree structure, which can be very expensive. But since we're now using these smart pointers, and only copy if we know we're going to accept the tree proposal, we get a really nice performance boost there.
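The two ingredients Gabriel describes — a depth prior controlled by alpha and beta, and deferring the expensive clone until a particle is actually selected — can be sketched as follows (illustrative Python; the real implementation does this in Rust with the `Rc` reference-counted smart pointer):

```python
import copy

def p_grow(depth, alpha=0.95, beta=2.0):
    # Depth prior in the Chipman et al. style that BART uses:
    # probability that a node at this depth is split further
    return alpha * (1.0 + depth) ** (-beta)

class LazyParticle:
    """Holds a *reference* to a shared tree plus a pending proposal;
    the deep copy happens only if the particle is actually selected."""
    clones = 0

    def __init__(self, tree, proposal):
        self.tree = tree          # shared reference (Rc<Tree> in Rust)
        self.proposal = proposal  # (feature, split value), computed up front

    def materialize(self):
        LazyParticle.clones += 1  # the expensive part, paid only on acceptance
        new_tree = copy.deepcopy(self.tree)
        new_tree.append(self.proposal)
        return new_tree

tree = [("x0", 0.5)]  # a tiny shared "tree": a list of splits
particles = [LazyParticle(tree, ("x1", i)) for i in range(20)]
selected = particles[:3]  # pretend systematic resampling kept these three
grown = [p.materialize() for p in selected]

print(p_grow(0), p_grow(3))   # growing is likely at the root, unlikely deep down
print(LazyParticle.clones)    # 3 deep copies instead of 20
```

The eager version would clone all 20 proposed trees before resampling; the lazy version pays for exactly as many clones as survive selection, which is where the speedup comes from when many particles share one high-weight ancestor.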
spk_0
That's just a lower-level detail that I think is quite cool.
spk_0
Yeah, that is definitely super cool. And so if people want to get started with PyMC-BART, and especially the Rust implementation, what should they do? What should they download?
spk_0
If people want to help, I'd just read the code base to start. Currently the code base is under my repository — I think we can link that in the show notes — and I have several issues there for things that need to be implemented or cleaned up. So I think that would be a very good place to start: just going to the repo and looking at some of the issues. I have them all tagged as "good first issue" and various other tags.
spk_0
Yeah — so I already put that in the show notes; look at that, folks, if you want to start getting involved. I was also asking because, if people want to start using it, what I would advise — and that's going to be in the show notes, too — is the PyMC-BART website: look at the tutorial notebooks, then just install the Rust implementation and run those notebooks with it. You'll see it's literally a drop-in replacement, except when you need two BART random variables, as we were saying — otherwise it's literally the same. I think that's amazing, because it makes it so easy for people to start. And BART models are really good because they are super flexible, they are very easy to understand, and they are usually a very good baseline if you're in a case where tree methods are applicable. So if you are, my practical advice would be: definitely try PyMC-BART, because the model is going to be super easy to write — it just figures out the functional form. It's just one BART variable, you feed that into your likelihood, and you're done, and you see how that works. If you're in the cases we talked about before, where it doesn't work, well, that's going to be for next time. But if you're not in those kinds of cases, I think it's a very good shot.
spk_0
Yep, absolutely.
spk_0
And so, to round things out, Gabriel: I saw that you recently worked on another optimization problem, which is reproducing Uber's marketplace optimization. You have a really good blog post about that, which I put in the show notes, and you also put the code into a GitHub repo that is in the show notes too, folks, if you want to look at it. Do you want to touch on that briefly — basically what it is, what it does, and why people would be interested in it?
spk_0
Uber has a system in place that performs resource allocation. Their problem is this: they're a ride-hailing service with a bunch of different programs, like Uber Eats and your normal driving scenarios, and what Uber can do is influence the marketplace by allocating money to different programs to stimulate supply and demand. This comes from a business problem: as a company, we have a finite amount of money — how much should we allocate to each program within each city, such that we maximize some business metric like gross bookings, which then influences the profit of the company? I was interested in how this even works — how do you perform resource allocation with optimization methods? And what I found quite interesting was that they were embedding a neural network into the optimization algorithm to model the forecasting problem. So you have these two interesting components: the optimization algorithm, and then the fact that they're embedding a neural network into the system to help learn the association between how much money they're allocating and how much that influences a business outcome such as gross bookings.
spk_0
So is that the same idea as what you talked about before? And then you can go back to the optimization.
spk_0
Yeah, it's kind of the same. I don't think a lot of people know this — maybe many do — but you can really embed any machine learning model into an optimization program and then optimize over the features you're using in the model. Essentially, that's what I wanted to do here: embed a machine learning model into an algorithm with an optimization component to produce an optimal allocation. What's really interesting about the optimization algorithm used here, the alternating direction method of multipliers (ADMM), is that it's a distributed optimization algorithm, and it happens in three steps. In the first step, you use the neural network to predict essentially how much gross bookings each city is going to have, given a certain allocation to each program, and you select the values that optimize that objective. In the next step, you perform a consensus step, where you try to get the cities to agree with each other so as to satisfy the constraint — typically something like: Uber can only allocate a million dollars, and we need to divvy up that million across the cities such that the sum does not exceed a million. And then the last step is just a dual update step, and then you iterate over this.
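The three steps can be sketched on a toy allocation problem (all numbers made up; a concave square-root curve stands in for the neural-network forecaster): a per-city maximization, a consensus projection onto the budget constraint, then the dual update.

```python
import numpy as np

# Toy stand-in for the neural-network forecaster: predicted gross bookings
# for city i as a concave function of its allocated budget (made-up gains)
gains = np.array([3.0, 2.0, 1.0])
budget, rho, n = 100.0, 0.1, 3

x = np.zeros(n); z = np.zeros(n); u = np.zeros(n)
grid = np.linspace(0.0, budget, 401)  # candidate per-city allocations

for _ in range(200):
    # 1) per-city step: maximize predicted bookings minus the ADMM penalty
    #    (grid search stands in for querying the forecaster)
    for i in range(n):
        obj = gains[i] * np.sqrt(grid) - (rho / 2) * (grid - z[i] + u[i]) ** 2
        x[i] = grid[np.argmax(obj)]
    # 2) consensus step: shift allocations so they exactly respect the budget
    z = x + u - (np.sum(x + u) - budget) / n
    # 3) dual update: remember how far each city sits from consensus
    u = u + x - z

print(np.round(z, 1))  # final allocation; sums to the total budget
```

Because each city's step only needs its own forecaster query, step 1 parallelizes across cities — that distributability is the point of ADMM here. Replacing the point forecaster with a posterior-producing model, as discussed below, would let each allocation carry uncertainty.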
spk_0
It was a really nice exploration, but I think what could be even more interesting — and something I talked about with Warren — is: what if we embedded more of a probabilistic model in there? Then we'd have the entire posterior over our decision space, and we could say: hey, you should allocate between 150,000 and 200,000 to city A for program B. That's where I see this going, in a way: replacing the neural network with more of a probabilistic model, to have uncertainty over our decisions.
spk_0
Really cool — this is really amazing. And that's actually some public writing we can refer people to if they're interested in this idea. You were explaining before about embedding a BART model into an optimization algorithm — I think this is very close, at least; this one uses a neural network, but it's also very, very cool. And I definitely see some applications in baseball — in the sports world in general. So yeah, this is super cool. Thanks, Gabriel. All the links to that are in the show notes. Any other current or upcoming projects you want to talk about before we close up the show — something you're excited about?
spk_0
Not really any current projects in play. Maybe there are a couple of previous projects where I see more probabilistic programming could come into play, but nothing upcoming at the moment.
spk_0
Okay. And I'm curious: what are you curious to see in the coming months and years? Is there something you would really like to see — in the Bayesian world, maybe, but in the data science world in general — that would have a huge impact and potential on your work, on the things you're able to do?
spk_0
I think a recurring theme of a lot of the problems I work on is optimization, and I'd like to see better tooling around embedding or using machine learning models — probabilistic models — within an optimization framework, whether that's Bayesian optimization, traditional convex optimization, or sequential decision making. Typically now, especially at work, I need to hand-roll all of that together myself, and I think it would be really nice to have a package or framework that really helps with that process.
spk_0
Yeah, I agree — it would be something very interesting and very useful. Amazing. Well, thanks a lot, Gabriel. I am very, very excited to try these new things — the BART Rust part, and also the optimization work. So folks, if you want to contribute to the Rust implementation, the links are in the show notes; we're always looking for people who want to make this better for themselves and everybody at the same time, and I'm sure Gabriel will welcome any help on that. Anything to add, Gabriel, before I ask you the last two questions?
spk_0
No, nothing from my side.
spk_0
Good — I'll take that as a sign I did a good job. So I'm going to ask you the last two questions I ask every guest at the end of the show. First: if you had unlimited time and resources, which problem would you try to solve?
spk_0
Mm-hmm. I think I'm going to defer back to what I was just saying about tooling, and in particular, as a specific problem space, sequential decision making. The big idea — the big pitch there — is: what decision should you take now such that your immediate reward is maximized, but that also takes into account the expectation of the future contribution? This problem space of sequential decision making, sequential optimization, is really quite formal in the control theory world, but in regards to business applications I think it's quite lacking, especially in the open-source world. So developing a library or framework for modeling sequential decision problems would be a great step forward — that's something I would really like to work on.
spk_0
Hmm, yeah, that definitely sounds like it would be very useful.
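The "maximize the immediate reward plus the expected future contribution" idea Gabriel pitches is the Bellman recursion; a minimal value-iteration sketch on a made-up two-state, two-action problem (all rewards and transition probabilities invented for illustration):

```python
import numpy as np

# Tiny sequential decision problem: 2 states, 2 actions.
R = np.array([[1.0, 0.0],     # R[state, action]: immediate reward
              [0.0, 2.0]])
P = np.array([[[0.9, 0.1],    # P[state, action, next_state]: transition probs
               [0.1, 0.9]],
              [[0.8, 0.2],
               [0.2, 0.8]]])
gamma = 0.9                   # discount on future contributions

V = np.zeros(2)
for _ in range(500):
    # Bellman update: Q(s, a) = r(s, a) + gamma * E[V(s')]
    Q = R + gamma * (P @ V)
    V = Q.max(axis=1)

policy = Q.argmax(axis=1)
print(policy)  # best action to take now, in each state
```

Here the recursion learns that state 0 should forgo its small immediate reward to reach state 1, where the larger reward lives — exactly the "act now while accounting for the future" trade-off; real business problems swap the known `R` and `P` for learned (ideally probabilistic) models.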
spk_0
And the second question: if you could have dinner with any great scientific mind — dead, alive, or fictional — who would it be?
spk_0
So, I already had dinner with Tomás Capretta in Buenos Aires, at a restaurant you recommended —
spk_0
So I feel like I was at that dinner, too, you know.
spk_0
Yeah. But I would probably say Richard Feynman, because I've read some of his biographies, and I think it would just be a fun dinner, right? A lot of technical people can be quite boring or socially awkward, but Feynman being both technical and fun, I think it would be a very good dinner experience.
spk_0
Yeah, definitely great on all those counts — Feynman sounded very interesting and cool, and, well, technical people can be what you said. So this is a great choice. Your Feynman dinner is getting crowded, though, I can tell you — this is a popular choice, so we're going to have to scooch over at the dinner table. But you know, we should go to Buenos Aires, to that same restaurant — I'm sure Feynman would have a lot of things to say about it.
spk_0
I think so, too. I forget the name, otherwise I would recommend it right now.
spk_0
Yeah, me too, actually — I'm blanking on the name. Tomás, come to our rescue! Awesome. Well, Gabriel, that was a great show. Thank you so much for taking the time. The show notes are going to be full for this one, folks, so make sure to take a look at them. And, Gabriel, next time you have a fun and useful project like that, you're welcome anytime on the show. Otherwise, I'm really looking forward to meeting you in person in Switzerland at some point — I'm definitely going to come and do some hiking over there, which my wife and I love. Gabriel, thank you again for taking the time and being on this show.
spk_0
Yeah, thank you so much — it's been a lot of fun.
spk_0
This has been another episode of Learning Bayesian Statistics. Be sure to rate, review, and follow the show on your favorite podcatcher, and visit learnbayesstats.com for more resources about today's topics, as well as access to more episodes to help you reach a true Bayesian state of mind. That's learnbayesstats.com. Our theme music is "Good Bayesian" by Baba Brinkman, feat. MC Lars and Mega Ran — check out his awesome work at bababrinkman.com. I'm your host, Alex Andorra. You can follow me on Twitter at alex_andorra, like the country. You can support the show and unlock exclusive benefits by visiting patreon.com/learnbayesstats. Thank you so much for listening and for your support. You're truly a good Bayesian — change your predictions after taking information in. And if you're thinking of me less than amazing, let's adjust those expectations. Let me show you how to be a good Bayesian: change calculations after taking fresh data in. Those predictions that your brain is making? Let's get them on a solid foundation.
Topics Covered
Bayesian additive regression trees
BART implementation
probabilistic programming
optimization techniques
time series analysis
Gaussian processes
missing data handling
open source collaboration
community support in data science
IoT data modeling
sensor noise quantification
Bayesian statistics
variational inference methods
machine learning integration
PyMC Labs