Automated policy enforcement for real-time operations, security, and compliance for life sciences
Learn how NIBR leverages Turbot Guardrails to accelerate cloud adoption leading to drug discovery in weeks.
Disclaimer: Automated Transcript
Thanks, everyone, for coming today, especially in this late time slot; hopefully we'll get you in and out in time for dinner. My name's Ken Robbins and I run engineering at the Novartis Institutes for BioMedical Research, also known as NIBR. With me today is Nathan Wallace, who's the founder and CEO of Turbot.
Let me start by giving a little bit of context on what we do at NIBR, to motivate what we're talking about in how we manage policy controls in the cloud. NIBR is the research arm of Novartis, so we deal with early drug discovery all the way up through phase 1 clinical trials. As you can see, there are a lot of scientists, projects, disease areas, and scientific platforms: a lot of complexity, and a lot of different systems at different scales. And as you know, science changes fast and it's constantly changing, so that's the informatics landscape that we need to keep up with. Equally complex is our technical landscape, with its technical diversity, of course.
That's also changing very quickly, and we have lots of engineers and informaticians. We have cloud apps and on-prem, a lot of variety: bleeding-edge new stuff as well as some really old legacy things. So our main challenge is that we have to figure out how we're going to keep up with the science and accelerate drug discovery in the face of this constant change, high complexity, and high diversity, while also keeping control, compliance, security, and monitoring of all this at the same time.
We've been using the cloud since, let's say, 2010, but generally in a very homogeneous way. A couple of years ago we started to build our new cloud strategy, which is really to shift to the cloud in a serious way. As we figured out what that strategy was going to be, we looked at this diversity and the landscape that we had to manage, and it was pretty obvious that we were going to have an order-N problem: we'd be living hand to mouth, managing individual applications one at a time, and that was going to be challenging. So one of the first things we did was this:
We clustered our applications into six fundamental use cases. These are what make sense for us; you'll probably have some overlap with your own, with some differences as well, but for us these are the six use cases that mattered. The clustering criterion is that each of these has a unique functional template, or blueprint, as well as a control requirement. That pair of things together defines the boundary around each use case. So I'll walk through some of these, to give a little bit of a flavor of what they are and to show how we divide them.
The first is what we call tech labs. These are our cloud sandboxes, where an informatician or engineer, anyone who wishes, can go into Amazon and do whatever they want. We transformed the organization by giving people a place to explore and learn Amazon, download open source software, write new algorithms, whatever they might want to do. We give very little friction there and high freedom, but in exchange we say it's only for public data; you can't use any of our private data there. So there are very few controls beyond the limit on what data is allowed, and it's firewalled off almost entirely from the rest of the network.
So that's why it's safe to do that, and that's one sort of pattern.
Next we have informatics labs. This is where our informaticians and our data scientists operate. Again, it's very much like a tech lab, with that sort of ad hoc workload, but in this case they need access to our entire data ecosystem. We have to have more controls because now we're opening up the network, so there's a different shape to those controls, and it's unique to that particular class of use cases.
Then we have applications and services. This is more traditional infrastructure as a service, where you'll have your web applications, web services, and data platforms: things that sit on long-running infrastructure. You can imagine servers that run for a long time and need to be patched, with access controls and security groups. It's just a different shape of control and compliance profile versus some of these other use cases, and this is in fact where we spend most of our time, at least in terms of number of use cases.
Maybe one more here: external partner collaboration is another important and, again, unique profile. Over the last couple of years in particular, NIBR has shifted its focus to what we call opening the framework, where we're accelerating and amplifying the amount of external collaboration that we do with academia and industry partners. We need a place and a way to do that when those collaborations have an informatics content, which of course they always do. So we have an environment for external partner collaboration where we have to handle the control and compliance as well as the data sharing in this kind of common environment, which is again very different from some of these other use cases.
We also have distributed computing. We have a robust internal HPC environment, but we also burst that out to the cloud; again, a different sort of model there. And we have non-Amazon: basically anything
software as a service or platform as a service. We need to manage those as well, and that's more vendor management than it is infrastructure management. So those are the use cases and the clustering that we did, and by coming up with this clustered pattern we can apply templates, so we don't have to do everything across every possible combination of workload.
After we have our requirements, we have to figure out what we're doing: how we're going to build this and how we're going to satisfy these needs. Of course, one of the very early architectural decisions we had to make was our account model. Are we going to have one big single account, or maybe a small number, or a lot of accounts? We were really excited by this idea of using a multi-account model, because that gave us a really good boundary, which seemed to make sense for the diversity of workloads that we have. But like I said, we had been in the cloud for a while in a very homogeneous sort of way, and it really hurt our heads (at least it hurt mine) to try to figure out how we were going to manage policy control, compliance, and monitoring, and know what's going on, which we have a responsibility to do, across a vast number of accounts.
It just seemed like a silly thing to even contemplate. So really by inspection (we didn't run many experiments on this) we pretty much said, yeah, let's not do that. Instead we opted for the single-account model, because that seemed understandable and manageable, and we were sure we could handle that.
I use this picture of a large restaurant as the model for what I mean by the single-account model. In our case we started off with a few accounts. You might have dev, test, and prod as separate accounts, or GxP and non-GxP, that sort of separation. It's still a handful, but any one account is a pool of many different workloads, just like you have many different tables here. So you have this common environment where people need to stay separate from each other but could still bump into each other. You have a common kitchen with a fixed menu that doesn't change very fast; it's very hard to accommodate a lot of different menus with one kitchen. And in order to get something from the kitchen, you have to use these black-vested waiters to go into the kitchen for you. You can't have a shared environment and have everybody autonomously writing their own
IAM controls. Well, that's where we started to find some breaks in this model. We went with this model and started to build out a few workloads, and then a few more, and we recognized very quickly that we were starting to bottleneck, because we had to control the gate of who's going to write IAM. Every time a new workload came in, the team building that workload really couldn't write its own IAM policy, because that had to be a centralized service, just like the waiters going into the kitchen. And there were some other issues. We'd created some tagging standards, but how do you enforce tagging? Now you have to start forcing people into some automation models to make sure you get the tagging right.
Without going into too many details, as we started to scale we could see the cracks in the model. You'll notice I showed before that we have 400 internal applications running today; how are we going to do that when we go to the cloud? With the number of diverse workloads we have, it just seemed like this was not going to work. And while we were scared about our ability to manage many accounts, we recognized that it was better to lean into that and figure out how we were going to automate and solve that problem, because doing it in the single-account model was just too brittle. It wouldn't work. So, in good lean style, we stopped, shifted, and pivoted to a multi-account model.
Here again (I shouldn't be talking about food before dinner in a late session) we've got another food example. This is a food court in the Venetian, but this one's in Macau because they had better pictures. In our multi-account model, each account is just like one of these restaurants. Each restaurant has its own menu with many items on it, but each cuisine is very different, with its own autonomous team and staff, access, and food supply. They have some shared services, plumbing and electricity let's say, but otherwise they're
all independent. Now, shifting back to the cloud, how does that work with multi-account?
Here's what it meant for us. By using what we call an account-as-a-container model, I can get the high diversity that we know we need. Just as, when a new style of cuisine comes in, they can simply build a new restaurant, we can simply create a new account every time. So we can accommodate that diversity, and there's always this clean separation.
Now, if I have a separate account, it's very easy to give autonomy. I can give it to a team: here's your account, you can go work in there, get some stuff done, focus on the results, and not be so worried about making sure that your boundaries are clean.
Another element, which is actually easy to overlook, is that you get unique limits with an account. You probably know Amazon has hard limits on resources, but of course there are also soft limits on just about any resource that you might consume, and these are set at the account level. That means if I have a shared account, with one workload using 60% of a limit and another workload using 60% of it, they are both causing each other to be throttled, and no one knows why, because each one is under the limit. And what if I now have dozens of workloads all using the same resource? The best thing you could do there is increase all your limits to the max, which is kind of like putting a 500-amp fuse in your kitchen circuit. That's not why we have limits. So having limits at the account level helps when you're working with one workload per account.
And of course, with separate accounts I can also give access control per workload, per vertical slice of what it is you're trying to do: who should have access? That's as opposed to more of an infrastructure model, where access is sliced across your role in the organization: you have access to services, you have access to hardware, to applications, or to middleware. Now
I can slice vertically, which is really powerful and empowering, because I can give exactly the right access to whoever should have access to each application, depending on what its needs and control requirements may be.
The obvious one is that you get a limited blast radius. When you have a fixed account boundary, generally the worst that can happen is confined to that account, whether it's a hack or maybe someone running some automation that decides it should delete your EC2 instances and runs amok because of an off-by-one error. It's nice to have that account boundary to limit your damage risk.
And of course cost management is really important. We don't talk a lot about cost, because we're moving to the cloud for agility. We actually do save money, and we try not to focus too much on cost, but if you're not careful, of course, it can grow very fast. With individual accounts, each account sees its own invoice without any tagging, which is really powerful. You can see exactly what you're spending and the breakdown across resources without doing anything at all; Amazon gives that to you for free. You can always aggregate things back up if you want; it's breaking them down that's hard. So the breakdown you get from having a separate account is really powerful in that way. And of course there's one more thing you can do.
If you own an account, you can set a billing alarm, because you know what makes sense for your workload. It's local; everything's localized. Again, when you're in a shared environment it's hard to do that.
So, looking back for a moment at these use cases, just to remind you what we had, I'd like to show how we take them and map them to a multi-account model. Like I said before, we have this notion of a template, which applies to both the function and the policy pattern that we're applying. Here's what a multi-account model looks like at NIBR.
I'll start with the tech labs that I mentioned. For the tech labs we have about 30 different accounts. Each is generally organized around the boundary of a department or a group, and maybe 5 to 15 different individuals work in that account. There's a template that says how to build one of these, a blueprint we can just run to create a tech lab, although once we'd built out a bunch we haven't grown them too quickly. Then there's a set of policy controls that apply to all the tech labs. For each one that exists I don't have to do anything new; I just say, it's a tech lab, therefore it gets this policy, very much like object orientation.
An informatics lab, like I said, is very similar. Here I have an informatics lab for every informatics team. Typically they have maybe five individuals working in them, and these can grow and be added as needed. Again we just apply a simple policy template that's different. The function is a little bit different: in informatics labs we make some of the compiled applications available, which we wouldn't necessarily do in a tech lab. And as I mentioned, we have different controls, so the policy pattern is different.
Where we spend most of our time (they're very cookie-cutter, but it's the vast majority of our accounts) is these applications and services accounts. Here we'll give a separate account for
every discrete workload. Product A, for example, is some application, so we give it a dev, a test, and a prod: three different accounts for the team building that product.
The separation of these environments (they don't have to be separate, but almost always they are) is really powerful. You can imagine you want to run some automation that says dev only runs 8x5. If I can run that only in the dev account, I never even run it in the prod account, so I don't have to worry that maybe something's mistagged, or that there's a bug in the script that's going to start shutting down instances because I made some mistake. Because the environments are separate (and it costs almost nothing to have a separate account), I just get that separation and all this goodness.
Similarly, we get separate cost management. Let's say I'm looking at dev and prod and they cost the same. The team who understands the workload is now looking at their own cost and can optimize it. They can recognize that dev probably shouldn't cost the same as prod; maybe it should only run 8x5 or something less.
And of course another big aspect is access control. Let's say you have a small number of users in prod who have high-privilege access, and everyone else might be read-only; you could even have a totally different team running prod if you want. But I can give superuser access to everybody and their friend in dev and it's not a problem.
One more example would be Product B, on the right there. If you have a pure service application or a single-page web app, then there really is no need to have separate environments, and you can put those in one account. We accommodate that; it's just that most of the time it makes sense to separate them.
And distributed computing, again:
The functional difference here is that usually it's run by a scheduler and there's a large IP space, which is different from, let's say, a tech lab, which doesn't need a large IP space.
Then there's partner collaboration. Amazon created the notion of an account as the natural way to separate customers from each other, so that's a pretty firm boundary. If we're going to do a collaboration, all the parties have to trust us to keep everybody separate and controlled. So we allocate a new account for every partner collaboration, and we can set up the policies and framework for what a partner collaboration looks like; again, it's that template for those accounts. We have one recent example which is really exciting. It's a three-party collaboration, which is a little bit more challenging than the simple two-body problem. We have two universities and an internal team all bringing staff, data, and algorithms into this environment, and all taking some of it out. So it's a shared, safe DMZ where everyone can work comfortably, and it's secure and in control.
And of course there are some common services. Consolidated billing: as I mentioned, there's billing information at the individual account level, but of course you can aggregate that up into a consolidated billing account.
So I have a single PO against all these many, many accounts. Again, networking: we have several Direct Connects, which we bring into a common account that the NetOps team can manage, and those are VPC-peered out to the rest of the accounts.
I mentioned earlier that we were shy at first about leaning into a multi-account strategy, even though we kind of wanted to. Once we decided to, this is where that lives: we have an account that runs our automated guardrails. I like to think of it as the for loop that goes across all accounts and asks, what kind of account are you, and what policy should you get? That's both the policy template and the inherited variants for your unique instance of that template. That's what happens in this automated guardrails account: it oversees and manages and makes sure that everyone stays in control and compliance.
We actually started to build this ourselves, because that was the pivot that we made. That helped us appreciate the problem and understand it, and then recognize that it actually is a very big problem, which is obviously why we were scared at the beginning. Amazon gives you lots of capabilities to do that. We built a bunch, and then recognized that this was, for us at least, much more of a buy than a build decision. That's when we brought in Turbot, which is the tool we use that actually runs those automated guardrails and keeps our environment safe.
So what we're trying to do is use this account-as-a-container model to build a virtuous cycle. Having multiple accounts lets me give teams high autonomy and enablement, which lets them be highly agile and innovative and keep up with this dynamic need we have to adapt to the science very quickly, in order to accelerate drug discovery, which is our mission. But the activation energy to enable that is automation, and that automation is what allows all this goodness that's
on the right side of the picture there. To explain what that actually looks like, beyond what I've described, Nathan from Turbot will come up and explain a little bit more about how Turbot actually does that.
What was really interesting about working with Ken and the team at Novartis was that they realized something really early that takes many of our customers a long time to realize: power has completely shifted from the infrastructure team to the app teams. The cloud moves infrastructure to the app. What that means is that we now have to move to a world where we think about infrastructure being controlled in real time by the software that's working within it. If you have a single manual process, you're dead in the water. If you think you can review something, or do it for them, you've just killed the agility of the cloud. You have to think about it in a different way. Ken and the team were able to see that very, very early, and that led them to some conclusions, like multi-account,
that got them moving.
Now, as we think about how to automate our controls for the cloud, the key is to have a philosophy about how we're approaching that. We want to give agility to our app teams. We need researchers to be able to do research without being stopped. We need web teams to be able to move quickly. We need GxP teams to be able to move as slowly as they would like. But we don't want that slowest common denominator to take over the whole organization any longer. That's traditionally what we did: we'd put the biggest apps in there, they moved the slowest, and with our shared resources we had everybody moving that way. So we have to rethink all of that in the world of cloud, where we now have the ability for those applications to manage their own infrastructure.
At Turbot, as we've worked with lots of life sciences companies and financial companies to do this, we've got a few lessons that I'm going to share with you now, and then we'll talk about how they can all come together to give you a framework for automated guardrails that enables that sort of agility.
The first one, which Ken spoke about a bit, is the multi-account model. We think of it a lot as workload isolation. Accounts are a beautiful, hard blast radius to create that isolation, but even more importantly, we're in a world now with cloud where we can actually isolate entire applications at the networking level. Remember the days when we had a physical server? Then someone had a brainwave, let's create VMs, and now we're all excited about containers. It's the same thing: applications now live in their own virtual data centers, and we're completely isolating them. That's technically interesting, but what's really exciting is that each application is now responsible for its own change control. We can kill the CAB. We can kill those slow choke points that normally bring our organization down. So creating that isolation gives us safety, all the benefits Ken
mentioned, but it also allows us to work independently of each other.
The other part of cloud that completely changes how we have to think about our automation is that there is no way we can compete with it. If you're a central cloud team, you are not going to compete with Amazon's thousand new features a year. If you're trying to choke those off, or compete with them, or build your own, you're going to come to re:Invent every year very, very nervous about all the things you'll have to do when you get back. Instead, if you can think of a way to enable your teams to use all these amazing capabilities (queues, Lambda, whatever those things are) and work out how to enable those services for those teams, all of a sudden you're turning the speed of cloud, which was scary if you were an infrastructure team, into an advantage, because now you're starting to ride that rocket and turn it into something you can use as an organization.
So the next part of thinking about automation is not to abstract it. If we abstract these tools, we now have to keep up with them. If we build a new interface for creating an S3 bucket, we have to move as fast
as S3 changes. Even if we require people to use templates to do that, we have to move as fast as those things. If you get in the way, you are the bottleneck, and you do not want to be the bottleneck when you've got all these teams getting excited at conferences like this one.
The second thing that happens once you start riding that rocket is that, all of a sudden, tutorials are plentiful and Google search results work to your advantage. Instead of training everybody in your internal processes for how to do things and find things, you can now use all of that to your advantage.
That then changes the relationship between our cloud team and our application teams. Before, an application team would say, you'd better do this for me, and then get a project manager to chase you, so you'd get a project manager to combat their project manager, and then we'd all be running around in a bunch of meetings. Now we're in a world where it's actually: just do it. Start the server, create the bucket, do what you need to do. We're now moving with speed. For people who want to get stuff done, that's really exciting, because they can get stuff done. For people who prefer to chase other people for not getting stuff done, it's a little scary, but it depends who you want in your organization.
The other thing, of course, is that even if you felt you could put a manual process in front of all of that, or approve things manually, review JSON files, or check encryption policies, good luck doing that in real time once the app starts using those Amazon APIs. Infrastructure is now controlled in real time by the application. If you try to do anything other than work out how to wrap it in software-defined controls, you'll quickly fall behind. The nature of that relationship is a complete change from where we used to be.
We used to be in a world where it was, hey, I need this; well, have you got capital? And we'd go through all that stuff. Now we're in a world where the teams are just doing it. You're working with them to help them understand how to tackle it, and then you can feed software into that to make it move faster and faster.
Of course, once you've got this set up, where people have the freedom to do the things they want to do, we quickly hit the point where it's: whoa, that's a bit too much. I'm very happy for anyone to create as many S3 buckets as they like and store as much data as they like. We just send them the bill; it's their business unit, their bill. But it had better be encrypted, and I want logging of all access to that environment. There are other rules you might start to set. These are the policies and controls you want to wrap those teams in to make sure you're keeping that environment safe.
You have to think very carefully about what your must rules are versus your should rules. You must use encryption. You must have logging. You should use this region, or maybe you must use this region; it's up to you. So as you're thinking about your automation, quickly get to the point where you're thinking about the policies and the rules you want to wrap around it. This is no longer a world of discussions and grays. This is a world of real-time automated action. If you don't know what your automated action is, if you don't know what your opinion is, you can't write the software, and you can't make those controls work.
As you get those policies flowing, all of a sudden you start to create VPCs with subnets, and then you ask: what are we going to name them? What are the route tables? What's our best practice around that gateway? Should they be high availability? That's just to talk networking for a moment. You're going to have hundreds and hundreds of these policies and decisions as you move through time. Be ready for that. Start
building them slowly and gradually as you think about those problems for your organization. Most of them have been solved before, so there's lots of good advice out there. You can map to NIST, you can map to CIS, that sort of thing. But you're going to have hundreds of decisions that were previously in SOPs, or were just the way Bob, who works down there, always did it. That's the organizational knowledge that's now wrapped up in software.
And of course the other thing is that, because we are large, complex enterprises, particularly in life sciences, there are going to be exceptions. A lot of exceptions. Everything in S3 must be encrypted; we're all happy with that.
But we're doing a public website: okay, I can't encrypt that one. You're going to end up with a lot of exceptions, so be prepared for that. Think about how you're going to tackle that in your environment and how you're going to manage those, control those, time them out, that sort of thing.
We like to think about policies like this, and we try to keep it really simple, similar to the way RFCs are written. Amazon Route 53 must be disabled everywhere; no one can use it, except maybe the DNS team. Must, should: simple rules, with a scope for those rules or exceptions.
Now, once you've got your cloud team setting some rules and policies in conjunction with security, and you've got application teams actually doing interesting things and researchers starting to do great stuff, we're in a cycle where you can actually learn by doing. We're no longer pointing fingers at each other trying to get it done. We're in a place where it's: I need to use this, what tool can I use? The cloud team might be the experts in how to do that, to accelerate each of the teams, while the application teams know their business area best. We've changed the relationship from one of request, fulfil, and finger-pointing to one of: how are we going to do this together? Can we create a policy exception while we work this out? That's generally a pretty reasonable question that security will be happy with.
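To make the must/should idea with scoped, expiring exceptions a little more concrete, here is a minimal Python sketch. It is only an illustration under stated assumptions, not how Turbot represents policies: the `Policy` and `PolicyException` classes, the `evaluate` function, the scope names, and the dates are all hypothetical.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass(frozen=True)
class PolicyException:
    """Hypothetical record of one approved exception to a policy."""
    scope: str      # team or account the exception covers
    reason: str     # why it was granted
    expires: date   # exceptions time out rather than living forever


@dataclass
class Policy:
    """Hypothetical policy: a simple rule with a level and its exceptions."""
    name: str
    level: str      # "must" or "should"
    exceptions: list = field(default_factory=list)

    def applies_to(self, scope: str, today: date) -> bool:
        # The policy applies unless an unexpired exception covers this scope.
        return not any(
            exc.scope == scope and today <= exc.expires
            for exc in self.exceptions
        )


def evaluate(policy: Policy, scope: str, compliant: bool, today: date) -> str:
    """Return "ok", "enforce" (must rule broken) or "warn" (should rule broken)."""
    if compliant or not policy.applies_to(scope, today):
        return "ok"
    return "enforce" if policy.level == "must" else "warn"


if __name__ == "__main__":
    route53_disabled = Policy(
        name="Route 53 must be disabled",
        level="must",
        exceptions=[
            PolicyException("dns-team", "runs corporate DNS", date(2030, 1, 1)),
        ],
    )
    today = date(2029, 6, 1)
    print(evaluate(route53_disabled, "dns-team", False, today))  # ok
    print(evaluate(route53_disabled, "web-team", False, today))  # enforce
```

Giving every exception an expiry date is one way to "time them out": once the date passes, the rule applies again without anyone having to remember to revoke it.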
So you can start to change that relationship and allow people to experiment within that really tight blast radius you've set up, within the set of policies you've defined and the exceptions you're willing to make to those rules. That gives you a collaboration pattern that lets you really start to realign the way your infrastructure teams, your application teams, and your researchers work together. It makes it much, much more collaborative. That cycle then becomes: I need to use queuing; okay, we get better at queuing; the next team can benefit from that; and around and around we go.
Once we have all those policies and these different parts brought together, we're suddenly in a place where we can implement real-time guardrails. This is the critical part. Checking is good, but detecting a problem and instantly correcting it is better. We need to be in a place where we know we're in control of our environment all the time: not as of the audit we did a month ago, but all the time. Every S3 bucket that exists, past, present, and future, must be encrypted, and I want to ensure that in real time.
If you're going to let your users use the Amazon console, CloudFormation, Terraform, and the APIs, you've got to have real-time controls. Otherwise you're going to have to review every piece of code they write and everything going on in that environment to stay in control. That's not practical; it won't work. But if you can start to set rules on the environment, like: if I see an unencrypted bucket, I want to enforce encryption on it, you can now give those teams a lot of freedom, and you no longer have to review all those projects on the way in. That reduces your workload and lets you focus more on the new services, and now we're in a virtuous cycle.
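The difference between checking and correcting can be shown with a very small Python sketch. This is an assumption-laden toy, not a real AWS integration: buckets are plain dictionaries, `sweep_buckets` is a hypothetical name, and flipping the `encrypted` flag stands in for whatever real API call would turn encryption on.

```python
def sweep_buckets(buckets, enforce=False):
    """Return the names of unencrypted buckets.

    With enforce=False this is a check only (an audit finding).
    With enforce=True each finding is also corrected on the spot.
    """
    findings = []
    for bucket in buckets:
        if not bucket.get("encrypted", False):
            findings.append(bucket["name"])
            if enforce:
                # Stand-in for a real remediation call against the cloud API.
                bucket["encrypted"] = True
    return findings


if __name__ == "__main__":
    inventory = [
        {"name": "research-data", "encrypted": True},
        {"name": "scratch-space", "encrypted": False},
    ]
    print(sweep_buckets(inventory))                # check only
    print(sweep_buckets(inventory, enforce=True))  # check and correct
    print(sweep_buckets(inventory))                # nothing left to find
```

Because the corrective step is idempotent, the same sweep can run against existing buckets and against newly created ones, which is what makes "past, present, and future" workable.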
The basic pattern for a guardrail is quite simple. You're going to wire up things like AWS events in CloudWatch, or watch your CloudTrail; it's up to you. Bring those events together from the different regions and accounts and consolidate them into a single place. You can roll out a thousand Lambda functions if you'd prefer, or you can start to bring them together, but the key is you want to get to a place where you have context about how to make your decisions well. You know what your policies are and how you want to treat that bucket in that environment, or that EC2 server in that environment, so you need to combine that event with the context so you can make a good decision. That's when the guardrail comes into play to actually implement the change you want in that environment: turn on encryption, delete the server because it's in an invalid place, whatever those actions are that you want to take. Audit-trail those capabilities and report what's going on back to the users, and you're in that full loop that Ken mentioned: I can see an action happened, I know what I want it to be, I take a response to that, and that creates a new action which I can then go and respond to, and we're going around and around. Once we create this loop, we can allow console access, we can allow API access; we have all these different capabilities ready to go.
Now, as you're thinking about your automation, it's one thing to create guardrails and patterns on people and to say, you can't do that, I can automatically fix that. But what you really want to do is start helping them move faster. We live together in an enterprise for the whole point of working together and moving faster, so occasionally we need to do that, right? So basically what we do is we start to think about how we can create common language and models for these teams.
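The event-plus-context loop described above can be sketched in a few lines of Python. Everything here is an assumption made for illustration: the event dictionaries are simplified (they are not the real CloudTrail or CloudWatch event format), and the account name, policy keys, and action strings are hypothetical.

```python
from typing import Dict, List

# Hypothetical policy context keyed by account. A real system would load this
# from wherever policies and exceptions are managed.
CONTEXT: Dict[str, dict] = {
    "research-dev": {"s3_encryption": "enforce", "allowed_regions": ["us-east-1"]},
}


def decide(event: dict, context: Dict[str, dict]) -> List[str]:
    """Combine one consolidated event with policy context and return corrective actions."""
    policy = context.get(event["account"], {})
    actions = []
    if event["type"] == "s3:CreateBucket" and policy.get("s3_encryption") == "enforce":
        actions.append(f"enable-encryption:{event['resource']}")
    allowed = policy.get("allowed_regions")
    if (
        event["type"] == "ec2:RunInstances"
        and allowed is not None
        and event["region"] not in allowed
    ):
        # The server is in an invalid place for this account.
        actions.append(f"terminate-instance:{event['resource']}")
    return actions


def handle(event: dict, context: Dict[str, dict], audit_log: List[dict]) -> List[str]:
    """Decide, then record both the event and the response for the audit trail."""
    actions = decide(event, context)
    audit_log.append({"event": event, "actions": actions})
    return actions


audit_log: List[dict] = []
bucket_event = {
    "type": "s3:CreateBucket",
    "account": "research-dev",
    "region": "us-east-1",
    "resource": "demo-bucket",
}
print(handle(bucket_event, CONTEXT, audit_log))
```

Keeping `decide` free of side effects means the same function can serve one central consumer or many small Lambda functions, and the audit log captures every decision, including the ones that required no action.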
If I'm talking IAM, I want to talk about read-only versus operator versus admin. I don't want to talk custom JSON policy, because I can have much higher-bandwidth conversations with you. When I'm talking networking, I want to talk private network, isolated network, public-facing DMZ, internal-facing but with direct Internet access. I want patterns that I can talk about, and then when I'm in a security review I can say my application has two DMZ subnets combined with two private subnets, and we all know what that is, and with automation we know it's in control. Common language and model patterns give us very high-bandwidth conversations, and they allow us to roll out faster and faster as an organization as we learn. This is an example for IAM: define your levels. Metadata access means no reading; then read-only; then operator. It's a standard common language that we can all start to understand and use faster and faster.
Of course, once you've got all this moving, and an app's creating something in the environment, spinning up 26 servers in an EMR environment, and we're automatically responding to each of those, tagging them, turning on whatever policies you want, making sure encryption's there, et cetera, we have a fast-moving, infrastructure-controlled environment. What we want to do now is know what the hell happened, for audit-trail and security purposes but also for our own application and development teams, who just need to see what's happening in that environment. So we need high visibility into what actually happened and the change history of that. We basically almost need a diff, like we have in our code, to see what happened over time to the configuration in our environment. So think about the visibility and how you're giving that back to the users as you automate out these tools; otherwise they'll be confused and asking you a million questions.
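The "diff for configuration" idea can be illustrated with a small Python sketch that compares successive snapshots of a resource's settings. The snapshot contents and actor names below are made up for the example; a real history would come from whatever records configuration changes in your environment.

```python
def config_diff(before: dict, after: dict) -> dict:
    """Return {key: (old, new)} for every setting that was added, removed, or changed."""
    changes = {}
    for key in sorted(set(before) | set(after)):
        old, new = before.get(key), after.get(key)
        if old != new:
            changes[key] = (old, new)
    return changes


# Hypothetical change history for one bucket: who acted, and the resulting config.
history = [
    ("user", {"name": "demo-bucket"}),
    ("guardrail", {"name": "demo-bucket", "tags": {"owner": "user"}}),
    ("guardrail", {"name": "demo-bucket", "tags": {"owner": "user"}, "versioning": "Enabled"}),
]

# Walk the history pairwise and report what each step changed.
for (_, previous), (actor, current) in zip(history, history[1:]):
    print(actor, config_diff(previous, current))
```

Showing users who changed what, and in which step, answers most of the "what just happened to my bucket" questions before they become tickets.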
Once we've got those pieces together, we're really in a world of automation. We love to think about and talk about killing the ticket. If you see a ticket, you should be asking how to automate a response to it; if you can't, you haven't defined the problem well enough yet. Once you automate that response, you should never see that ticket again. We are automating out level 1 and level 2: anything that can be scripted for a human to handle can be automated out, and anything that can't be scripted for a human to handle should probably go to the app team. So we can rapidly move through our automation, and once we get those pieces together we end up in a world of software-defined operations. Software-defined infrastructure has software-defined operations to help manage it; it moves in real time and adjusts with it at that pace.
So what I was going to do was attempt to show you real quick what that actually can look like as it all comes together. This is an example of a Turbot environment, which implements a number of automated guardrails. You log in using Active Directory, that sort of thing, so you have that level of control. Users then see the Amazon accounts they have access to: their two out of hundreds, or their three. Each person can see their scope of the environment; they can log in and see things like their billing, et cetera, immediately. What's more important, though, is the fact that they can immediately go to that Amazon console. They are not abstracted from those tools; we want them using that native and direct access in this type of environment.
Now let's say they go to something like S3 and do a very simple action like creating a bucket. There it is. So we create a bucket, and as soon as we do that in this environment, because we have automated real-time controls associated with it, if we come and watch the properties on this bucket for a moment, we'll start to see over the next few seconds those policies take effect in the environment. We saw the tags start to get picked up. When we look at the tags now, it's automatically picked up the tags from the environment and started to set those on that bucket. Refresh again, and we can see it's now turned on access logging for that bucket: a real-time control in response to a policy. It's also done things like say you can't delete that or turn that off. You're starting to give people the ability to create buckets and set up best practices, giving them an appropriate amount of control while letting them have a lot of freedom and access in that environment. As we do that, we also do things like, for example, bucket policy permissions that enforce encryption in transit: real-time policies done as detective and corrective actions in response to an action they took in the console.
If we come back to Turbot, we can see that we've detected the fact that that new S3 bucket's been created. We start to create a history of that action and of what happened with that bucket. We can see the controls here for this bucket, for example the encryption-in-transit control, and the tags that have been set. We can look at that control and see the event that triggered it. This is the visibility I'm talking about: letting people see the events that led to the actions in that environment, so they know how things were done and why they were done. That raised an alarm, which created an automated action, which then automatically closed the alarm. That's visibility into a real-time control operating in the environment.
The decisions we're making there are tied to the policies I mentioned, the idea that you can set those simple policies in your world to determine how you want things to work. For example, you might say in this case this bucket really needs to have versioning turned on; let's enforce versioning. That's now a real-time enforced guardrail that will constantly ensure versioning is on for that bucket. When you're managing your policies, you need to think about exceptions; think about ways that you can see all the exceptions in the environment to a policy.
So here the policy is: I don't care about versioning, except for these buckets. You might use version control to manage these, you might use an Excel spreadsheet; it's up to you. But you're going to have a lot of exceptions, and thinking about how you want to manage those in your environment is important. If we come back to the S3 console and refresh, we should see, for example, that versioning has been turned on as a result of that change.
Now, back in the Turbot console, we can actually start to see the history of change to that bucket, and we can start to find resources throughout the whole environment. For example, the bucket we just created: we can see it there, and we can see its activity history right from when it was created, all the alarms changing, et cetera. We can see the history of configuration changes to that bucket. It was created by Nathan; it was changed by Turbot to add policies and tagging to it; we then made a change to add the encryption to it, which is an automatic guardrail; and it also turned on versioning. So you start to get that history of change in that sort of environment in response to the guardrail rules that you've set. It's worth thinking about how you want to find resources for that sort of stuff. Search could also be for things like IP addresses, or understanding the different parts of your environment, whether it's to find an EC2 server with a certain IP address or within a certain range. You can really start to cut through that environment to help you troubleshoot and see visibly what's happening with everything you've got going on in there.
I mentioned that one of the things that's really important to think about is that permission structure, having a common language with which to talk. We can grant by searching for someone from our directory and then thinking about whether they have S3 admin, metadata, read-only, or operator. This is a much simpler conversation than talking about detailed JSON objects. If we simplify that and make sure it's always the same, again, common language drastically accelerates our conversation. Once you have implemented those pieces, what you then want to do, of course, is make sure that they can understand that in the console as well, so that you have that full native control all the way through. This is not an abstraction; it's an implementation. If we look at the groups, we can see they map immediately to those same things we saw in the other console. So we're looking for that type of common language and patterns deployed at scale, with real-time action and control through that environment. Software-defined operations is really about trying to give you that visibility, as we mentioned, and to comply with those different pieces.
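One way to picture the common permission language (metadata, read-only, operator, admin) is as an ordered set of levels, where each level includes everything below it. The Python sketch below is only an illustration of that idea: the S3 action names are real IAM action names, but the mapping from action to minimum level is a hypothetical choice made for this example, not Turbot's actual definition.

```python
from enum import IntEnum


class Level(IntEnum):
    METADATA = 1   # can see that resources exist, cannot read their contents
    READ_ONLY = 2
    OPERATOR = 3
    ADMIN = 4


# Hypothetical minimum level required for a few S3 actions.
REQUIRED = {
    "s3:ListAllMyBuckets": Level.METADATA,
    "s3:GetObject": Level.READ_ONLY,
    "s3:PutObject": Level.OPERATOR,
    "s3:DeleteBucket": Level.ADMIN,
}


def is_allowed(granted: Level, action: str) -> bool:
    """A grant covers an action if its level is at least the action's minimum level."""
    required = REQUIRED.get(action)
    # Actions that are not mapped are denied rather than guessed at.
    return required is not None and granted >= required


print(is_allowed(Level.OPERATOR, "s3:GetObject"))
print(is_allowed(Level.READ_ONLY, "s3:PutObject"))
```

With a fixed ladder like this, a security review can discuss "operator on S3" in one sentence, and the detailed policy documents become generated output rather than the thing people negotiate over.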
For the strategy nerds out there, you'll notice this is an activity system, as Michael Porter would describe it, and it's effectively what we've just done with those decisions we took on the way through. I need to isolate my workloads so that I can implement different controls in each one; I can combine them by teaching people how to work inside them and implement those patterns at scale. That gives us a powerful set of things that all work together to allow our application teams to move faster and faster while we implement controls around them.
Thanks, Nathan. So you've now got a little bit more of a feel for what the Turbot console looks like and the way those controls automatically happen. It's basically how I sleep at night, because once you set it up the machine is doing the work for you; it makes life easy. So let's now shift a little bit from the how to where it's got us. You can see here this is where we are today in NIBR. This particular model, after our pivot, when we really started to deploy actual workloads in this environment, is not much over a year old now, and this is part of our cloud-first strategy for new workloads. Everything we do in NIBR now is cloud first, but we still have a lot of legacy that we'll migrate eventually. We've done some migrations opportunistically, but most of these numbers here represent what we've done really just within the last year.
We now have hundreds of users that have that direct console access, the Turbot access and through that also the Amazon console access, which is incredibly empowering. We have, as you see, 170 accounts, growing at a pretty good clip, and we think that's probably going to plateau around 500, just to give you an order-of-magnitude feel for the scale.
Of course, we're doing this for a purpose: our mission is to accelerate the science to help patients. So let me tell you about just two examples. Multi-parametric data analysis (MPDA) is an application that we had doing high-throughput screening and high-content screening analysis on-prem. It was a rich application that used a lot of internal infrastructure, and it was at the limit of what it could do; it really couldn't grow any further. That was bad enough, and then the science came back and said, well-level analysis is great, but really we need to do cell-level analysis, which is scientifically much more valid. That increases the amount of data that we're processing by three to three and a half orders of magnitude, because now every well has a few thousand cells in it. So we allocated dev, test, and prod accounts to this team. They re-architected the application to be cloud native, wrote a bunch of CloudFormation, and were empowered to just operate and rebuild MPDA in this cloud environment. You can see that per screen you could have trillions of observations, depending on how many plates are in the screen. So this is really exciting, but scary. Even more exciting is that there is serious conversation about doing time series within the next year or so: for every plate, where today we take one image of a plate of 1,536 wells, we want a time series of perhaps 100 images over time, so now we've got two more orders of magnitude. Oh, and by the way, we'd also like to do Z-stacks, so 10 images in the Z focal plane, which is another factor of 10. So we're talking another three orders of magnitude that we're looking at adding within the next year to year and a half, and that's on top of what we already did. In a two-year span we're talking about a six to six and a half orders of magnitude increase in data volume from screening technologies, which is really quite insane, and a good example of why we really need to be able to react to these things.
The next example is the informatics lab example, and I'll show it in the form of a timeline. Back in July, UK Biobank released half a million genomes with the associated health records, which was an amazing and useful thing. Then the Neale Lab took that data, did a genome-wide association study, and published that out publicly, and they did that very quickly. Then just last October, just over a month ago now, the Im Lab took the Neale Lab data and added transcriptome information to produce an even richer data set, and they published that publicly on S3. That's where our story within NIBR begins. We have an informatician who's watching this, sees it, and thinks, that's some really important data; I could do something with that. He already has access to an informatics lab account, so basically in a day he downloads the data set from S3 in two hours and then reformats it in a way that will be easier to process in Spark. Shortly after that he's published some preliminary analysis in a notebook. A couple of days after that, we've got a bunch of informaticians with a notebook application accessing this data, which is now formatted and processed nicely. By October 25th we've already done enough analysis that we've taken a set of gene targets for about a dozen different diseases and distributed that to the drug discovery teams within NIBR who are looking at those dozen indications with those targets. That's an elapsed time of nine days. This informatician didn't have to put in a provisioning request, and he didn't need us to build anything. He had a lab, he had a bench (it just happens to be a dry bench), and he went out and did some really exciting analysis. It's a good example of what we're trying to do with our architecture, with our cloud team, and with the way we operate: we want to enable our users and get out of the way, and let the science happen at the pace that science can happen. These guys are brilliant people, but hopefully the environment made it easy for them to leverage that. There are some other aspects of the way we've done things that had somewhat surprising outcomes.
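As a quick check on the data-volume figures quoted in the screening example earlier, the arithmetic can be written out in Python. The factors of 100 time points and 10 Z-planes come from the talk; the range of 1,000 to 3,000 cells per well is an assumption standing in for "a few thousand cells".

```python
import math


def orders_of_magnitude(factor: float) -> float:
    """Number of powers of ten represented by a multiplicative factor."""
    return math.log10(factor)


cells_per_well = (1_000, 3_000)  # assumed bounds for "a few thousand cells" per well
time_points = 100                # images per plate over time
z_planes = 10                    # images in the Z focal plane

# Moving from well-level to cell-level analysis.
cell_level = [orders_of_magnitude(c) for c in cells_per_well]

# Adding time series and Z-stacks multiplies the volume by 100 * 10.
imaging = orders_of_magnitude(time_points * z_planes)

# Combined growth over the whole span.
total = [c + imaging for c in cell_level]

print([round(x, 2) for x in cell_level])
print(round(imaging, 2))
print([round(x, 2) for x in total])
```

Under these assumptions the cell-level step lands between 3 and roughly 3.5 orders of magnitude, the imaging changes add 3 more, and the total falls between 6 and roughly 6.5, which is consistent with the figures given in the talk.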
We knew that moving to the cloud in a big way was going to be important to the organization and would be well received, but the way we did it got a somewhat surprising reception. You can see excerpts of some emails that I got, because this really energized and invigorated our engineers and informaticians and everyone who's using this environment. The empowerment model is something people are very thirsty for, and it changed the way we did the change management. People accepted the move to the cloud in a very rich sort of way; they were highly productive, and people started doing side projects. It was a surprising sort of enabler, and we got more results than we expected.
Of course, automation has efficiency benefits. We have some great internal teams, cybersecurity, information governance, and network operations in particular, and others, who all helped, but the core cloud team is four people who manage those 170 accounts and those 400 users, and that's going to keep on growing. And of course we have one big TVH who has not been pulling their weight. But we've got four people operating all of that, which is a testament to automation in general, but also to our ability to shift to a model where, instead of dealing with this sort of order-N problem, we can go to an order-six problem: just come up with six templates, essentially. That's a little bit oversimplified, but you get the idea. We apply a template and let automation do all this work; we define those policies like Nathan was showing, and then we can just let the machine do the work. We don't need to have people in the loop.
So I've covered some of these tips, lessons learned, and observations along the way, but I have some others on this list here that I haven't been able to touch on, and I'm somewhat long-winded, if you didn't notice. I really want to go through all of these and tell you all about this stuff, but I realize there's no way we're going to; it's a whole other talk. So what I did is a blog post on this. If you go to that link, Ken Robbins that link slash reinvented state, it sends you straight to the particular post, where I've basically just expanded on what these bullet points are. We can also talk about them during Q&A if we have any time left, or certainly grab me and Nathan afterwards and we can talk about these as well; that'd be great.
So basically the big take-home here is: if you're in an environment that has the same sort of high rate of change and high diversity that we have, we found that the multi-account model is really powerful, and that account-as-a-container is enabling our users in a significant way. But you can't just let that happen; in order to do that you really need the automation. If you have that model and you automate it so you can keep control, then you actually can enable this high innovation, high agility, and reactivity to the needs of the organization and the science, and not give up control. In fact you maintain even better control: real-time control, compliance, monitoring, security. It's actually better than anything we've done otherwise. So not only can we keep up, but we can do it even better this way.
Finally, I'd like to thank the teams inside NIBR. I mentioned some of the teams, and there are some more here; there were great collaborations from many different participants across the organization. Turbot and Amazon were also essentially surrogate members of our team, so I'd like to thank them as well for all their support in building this environment with us. With that, I think we have time for maybe a few questions. Thanks, everyone, for your attention today.