Case Study

Closing the skills gap - accelerating governance with cloud automation

Learn how software defined operations can deliver agility and ensure compliance.

Turbot Team
5 min. read - Sep 17, 2017

Disclaimer: Automated Transcript

Welcome to Cloud Leaders. The next session in our virtual summit is "Closing the Skills Gap: Accelerating Governance with Cloud Automation," and we will hear from Nathan Wallace, founder and CEO of Turbot. Nathan will be able to answer your questions at the end of this session, but please submit them throughout the presentation so we are ready to answer them as soon as the presentation is done. The session will be recorded and available shortly after it ends. We have several other webinars today as part of our Cloud Leaders virtual summit; look for the link to the next session in the Attachments and Links section of this webinar.

Let's hear from Nathan.

Thanks, and it's great to be here today and to continue our very close partnership with CloudCheckr. At Turbot, what we do is work with many different enterprises, ranging from highly agile development shops with a heavy web presence through to large pharmaceuticals working in highly regulated industries, and we spend all of our time thinking about how to do governance and cloud automation.

Put simply, that boils down to three questions. How can we achieve agility for application teams? How can we ensure control for the enterprise? And how can we accelerate that whole cycle, helping those application teams move faster and faster while keeping better and better controls in the environment? At the end of the day, we just want to help them solve their business problems while keeping things safe.

To do governance in the cloud, we actually face a number of unique challenges, and cloud CoEs have to think about those and redesign their traditional operating models, the way we've run enterprises, to fit better and work better in a cloud environment. From the agility point of view, the first thing we have to accept is that the cloud is one of the most incredible innovation engines we've ever seen. Amazon is shipping a thousand new features a year, Azure is bringing out services to try and catch up with Amazon, and Google is doing its own thing to compete with all of that as well. As we think about unleashing that for our enterprise, we have to accept that we can't compete with it and we can't abstract it; instead, we have to ride that innovation engine and use it to our advantage.

The second thing we need to understand is that there's been a change of power: applications now control their infrastructure. It's no longer a place where applications are begging for infrastructure from shared resources. This is a world where business teams have their own budgets and know there's unlimited capacity available on demand. Those applications control their infrastructure: they create storage, they create servers, they scale up and scale down, they use Lambda, they do all sorts of things to manage their own infrastructure in real time. That means we can no longer just take requests and fulfill them; instead, we have to work out how to enable and accelerate those application teams to handle their new responsibilities.

We have to teach them how to fish. From a control point of view, expectations have gone through the roof: everything must be encrypted, and logging must be enabled everywhere. Everything we thought we wanted to do in our traditional data centers, we are now required and expected to do in the cloud environment. What's more, if we make a mistake there, it's highly visible, highly public, and worse than the mistakes we could perhaps cover up before. The bottom line is we can't afford to miss on keeping our controls in place.

Now, when we combine that software-defined infrastructure, created and managed in real time, with the fact that we have to keep it under control, we really have to move to a world of software-defined operations. Our controls have to be real-time to keep up with that level of flexibility and automation. To do that type of automation, though, we need best practices. We need clear, consistent models. We have to understand how we're going to do identity and access, how we're going to do networking, how we're going to do operating systems. Every single one of those is required to get a server in place in five minutes: What's the networking? What's the security group? What's the operating system? How are our users going to log into it? Every single step there needs to be automated, or you have put manual steps in and broken the flexibility of the cloud. To do that, architecturally, we need to think about the whole; we can't do it piecemeal anymore.

And finally, from the best practice point of view, the really interesting thing is that when you give all the power to these application teams, all of a sudden they stop avoiding us and start asking for help. They want our help to know how to do these things best in the cloud, and that completely changes the nature of our CoE and our services. Where we used to try to provide certain services to people in a defined way, now it's more about collaborating with them, learning from them, and sharing that knowledge back across the organization.

Working through this at different organizations, at Turbot we've found that one of the most critical things you need to do is isolate your different workloads, using Amazon account boundaries, Azure subscription boundaries or GCP project boundaries to your advantage to truly separate them. Back in the day, we had a physical server we'd share; then we moved to virtual machines. The same is now happening to the data center: we're moving to mini virtual data centers for each application.

Full network isolation, full identity and access isolation, breaking that environment up into pieces: that isolation is a critical concept if we want to deliver that level of self-service to these teams and empower them with access without putting everybody else at risk. Now, that type of isolation is hard to imagine unless you have full automation of the stack. Doing it by hand is untenable. Doing it with automation actually provides a wealth of benefits to your organization as you start to get that protection, separation, cost control, ownership and self-service empowerment.

So at Turbot, we work with organizations to help them make that change: to change the way they do operations and to achieve that isolation. The way we do that is what we call software-defined operations. The best way to think about Turbot is as a droid that's part of the cloud team, really helping to enable and accelerate that organization. The fundamental concepts mirror the problems we've stated. Ultimately, you have to give those application teams self-service. That means direct access to the Amazon console, the Azure console and the GCP console, and direct use of the APIs. They need the ability to manage their own infrastructure and to have their applications manage their own infrastructure. We can't force them to go through templates, and we can't force them to go through manual requests; they need direct and continuous access. But coupled with that, to make sure we keep it safe, we need automated controls over that environment, so that when they make changes in real time, we ensure they're up to the standard we require as an organization. So one of the things Turbot does is watch that environment continuously and make corrections according to the policies set for the environment. That becomes a huge accelerator for the cloud team, because they're no longer firefighting.

They're now working with the application teams, knowing that their base level of stuff is automated and under control. Key to that is going beyond detection to continuous automation and the definition of best practices. What are the eight standard subnet types that should exist in a network, and how can we automatically enforce them and keep them under control? What are the standard security groups? How should identity and access work? There are 2,500 permissions in Amazon; how do we simplify that down to a language we can understand and automate? That is what software-defined operations looks like. Different people do it in different ways, whether with some Lambda functions, some standards or some tools. At Turbot, we provide it through a software product and try to really accelerate it.

What I'm going to do now is take a look at what automation looks like in real time. In Turbot, the first thing we do is separate your workloads into different accounts; that's the full isolation model we spoke about. Each one of these is a separate Amazon account, for example, and some of them have Azure subscriptions associated with them, breaking those workloads or applications up into their own environments and giving them the safety of that isolation. Users see the one, two, three or five accounts they have access to. I can go into the CloudCheckr demo one account we're using today. Users are encouraged to log in to the Amazon console immediately. It's like Google: we don't want to waste time looking at the results page, we want to get to the end and do our work, and Turbot feels the same way. Once they're in, they just use the console in the standard ways you'd expect. For example, we can create ourselves a bucket. There's nothing special or exciting about that process; we're just creating a bucket. What's interesting is what's about to happen.

We just created a bucket, and now we can look at it and see what has been done in real time. Notice I did nothing about versioning, nothing about logging and nothing about tagging, but all of these have already been handled in that environment, in real time, by Turbot. It has set up all the best-practice tags, flowing through from different settings in the environment according to policy. It has turned on logging, pointing at a special logging bucket that Turbot previously set up to make sure only the correct people can read it or write to it. That whole environment is basically enforced instantly according to the policies we have set.

Now if we switch back to the Turbot console, we can look at what's happening in this account. If we go to the Activity tab, we'll see that Turbot has already detected that we created that new S3 bucket, and it knows that it was Nathan; that's all tied back to our identity and access through an Active Directory system. If we look at that, we can see the history for the bucket: that it was created, and what information we gave it at the time. Turbot provides a whole summary of the information available, including the activity of everything that just happened. We can see Nathan created a bucket at 4:55, and in the same minute Turbot raised alarms and checked a whole bunch of things: cross-account access, DNS-compliant naming, whether logging was turned on (it was not), and whether the bucket had the correct tags. It went through and checked all those things, remediated them in real time and automatically closed the alarms. What that means for us is that we know we're in control of how this bucket is working and what's happening. If we look at the tagging control, for example, we can see three events here: the event that arrived from Amazon, which is highly detailed, with all the information from the Amazon event that triggered it through CloudWatch, and then the actions that were taken to actually repair and set up that environment. All of that is done in concert with the policies in Turbot that define how you want buckets and other resources to work. This is just S3; there are hundreds of these controls across all the different services.
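As an illustration of the detect, alarm, remediate and close sequence described above, here is a minimal Python sketch. It is not Turbot's implementation: the event shape (a simplified, hypothetical stand-in for the event Amazon delivers), the `POLICY` values, the logging bucket name `org-logging-bucket` and the tag standard are all assumptions, and the bucket is an in-memory object rather than a real S3 resource. In practice each remediation step would call the corresponding S3 operation, such as `PutBucketVersioning`, `PutBucketLogging` or `PutBucketTagging`.

```python
import re
from dataclasses import dataclass, field
from typing import Callable, Optional

# Hypothetical policy values; a real policy engine would load these from configuration.
POLICY = {
    "versioning": True,
    "logging_target": "org-logging-bucket",    # hypothetical central logging bucket
    "required_tags": {"Environment": "demo"},  # hypothetical tag standard
}

# Simplified S3 DNS-compliant naming rule: 3-63 characters of lowercase letters,
# digits, dots and hyphens, starting and ending with a letter or digit.
DNS_NAME = re.compile(r"^[a-z0-9][a-z0-9.-]{1,61}[a-z0-9]$")


@dataclass
class Alarm:
    control: str
    state: str = "alarm"  # set to "ok" once the resource has been remediated


@dataclass
class Bucket:
    name: str
    creator: str
    versioning: bool = False
    logging_target: Optional[str] = None
    tags: dict = field(default_factory=dict)


def enforce(alarms: list, control: str, compliant: bool,
            remediate: Optional[Callable[[], None]] = None) -> None:
    """Raise an alarm for a non-compliant control; fix it and close the alarm if possible."""
    if compliant:
        return
    alarm = Alarm(control)
    alarms.append(alarm)
    if remediate is not None:
        remediate()
        alarm.state = "ok"


def on_create_bucket(event: dict) -> tuple:
    """React to a simplified, hypothetical CreateBucket event in real time."""
    if event.get("eventName") != "CreateBucket":
        raise ValueError("not a CreateBucket event")
    bucket = Bucket(name=event["requestParameters"]["bucketName"], creator=event["user"])
    desired_tags = {**POLICY["required_tags"], "CreatedBy": bucket.creator}
    alarms: list = []

    # A bad name cannot be fixed in place, so this alarm stays open for a human.
    enforce(alarms, "dns_compliant_naming", bool(DNS_NAME.match(bucket.name)))
    enforce(alarms, "versioning", bucket.versioning or not POLICY["versioning"],
            lambda: setattr(bucket, "versioning", True))
    enforce(alarms, "logging", bucket.logging_target == POLICY["logging_target"],
            lambda: setattr(bucket, "logging_target", POLICY["logging_target"]))
    enforce(alarms, "tagging", desired_tags.keys() <= bucket.tags.keys(),
            lambda: bucket.tags.update(
                {k: v for k, v in desired_tags.items() if k not in bucket.tags}))
    return bucket, alarms


if __name__ == "__main__":
    created, raised = on_create_bucket({
        "eventName": "CreateBucket",
        "user": "nathan",
        "requestParameters": {"bucketName": "nathan-demo-bucket"},
    })
    for alarm in raised:
        print(alarm.control, alarm.state)
```

Run on a compliant name such as `nathan-demo-bucket`, the sketch raises and then closes the versioning, logging and tagging alarms; a name that breaks the DNS rules leaves the naming alarm open, since an S3 bucket cannot be renamed in place.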

Think of IAM roles and all sorts of other services for your internal customers: building controls across all of those is what's required to really get to that level of automation. So from Turbot's point of view, we've detected that bucket and automated all those things in real time.

Now let's think about another thing that's quite difficult to manage, and that we can automate to gain more control: permissions. Identity and access permissions are very complex to manage in these environments. Amazon has 2,500 permissions, like I said, and they add new ones every week; tracking those is one of the things we do. Azure has a completely different way of defining its permissions. You need a common language for how to talk about things like permissions, or, for networking, a common language for networking. You need a way to think about these as an organization so you don't spend your whole life in JSON. What we do in Turbot is boil all of those permissions down to fairly standardized sets, for example Certificate Manager Metadata, Operator and Admin, or Core Services Metadata, Read-Only, Operator and Admin. We manage each of those in its own way, which lets you grant simplified permissions. I can then search my Active Directory to grant those permissions to somebody. One of the sets we offer is CloudCheckr permissions, which Turbot automatically synchronizes up to the CloudCheckr website, so I can grant a permission like that. We can also grant permissions in Turbot for things like Amazon: if we give Cody a permission like AWS Operator, that allows operator-level actions, and Turbot also lets you automate the expiration of those permissions. Once we've set those different things up, we can look in the environment to see how they changed things.
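The idea of boiling provider permissions down to a few standard levels can be sketched as a lookup from level to IAM actions, where each level includes everything below it. This is a hypothetical illustration, not Turbot's permission model: the level names follow the Metadata, Read-Only, Operator and Admin sets mentioned in the talk, but the ordering rule, the short list of ACM actions (real action names, far from a complete set) and the expiry handling are assumptions. The output uses the standard IAM policy document structure (`Version`, `Statement`, `Effect`, `Action`, `Resource`).

```python
import json
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional

# Levels ordered from least to most privileged; each includes the ones before it.
LEVELS = ["metadata", "read_only", "operator", "admin"]

# Hypothetical, deliberately incomplete mapping for AWS Certificate Manager (ACM).
ACM_ACTIONS = {
    "metadata": ["acm:ListCertificates"],
    "read_only": ["acm:DescribeCertificate", "acm:GetCertificate"],
    "operator": ["acm:RequestCertificate", "acm:AddTagsToCertificate"],
    "admin": ["acm:DeleteCertificate"],
}


def actions_for(level: str) -> list:
    """All IAM actions granted at `level`, including every lower level."""
    top = LEVELS.index(level)  # raises ValueError for an unknown level
    return [action for lvl in LEVELS[: top + 1] for action in ACM_ACTIONS[lvl]]


def policy_document(level: str) -> dict:
    """Render a simplified level as a standard IAM policy document."""
    return {
        "Version": "2012-10-17",
        "Statement": [{"Effect": "Allow", "Action": actions_for(level), "Resource": "*"}],
    }


@dataclass
class Grant:
    user: str
    level: str
    expires_at: Optional[datetime] = None  # None means the grant never expires

    def is_active(self, now: datetime) -> bool:
        return self.expires_at is None or now < self.expires_at


def grant(user: str, level: str, now: datetime, days: Optional[int] = None) -> Grant:
    """Grant a simplified level, optionally expiring after `days` days."""
    LEVELS.index(level)  # validate the level name
    expires = now + timedelta(days=days) if days is not None else None
    return Grant(user, level, expires)


if __name__ == "__main__":
    now = datetime(2017, 9, 17, tzinfo=timezone.utc)
    cody = grant("cody", "operator", now, days=7)
    print(json.dumps(policy_document(cody.level), indent=2))
    print(cody.is_active(now + timedelta(days=8)))  # False: the grant has expired
```

Keeping the hierarchy in one ordered list means a new action only has to be assigned to one level to flow into every higher level, and an expiry check like `is_active` is the kind of test a scheduled job could use to revoke time-limited grants.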

If we come to the CloudCheckr website, we can look in our settings, for example at our users, and see someone like Raj. With that change, Turbot has automatically synchronized all the permissions for him to have read-only access to just that one Amazon account, which makes those permissions much easier to manage than hand-managing them in all the different environments. Turbot also synchronizes them through to Amazon IAM, for example setting up the user we gave permissions to, Cody in this case, and making sure he has that operator-level permission. You'll notice in the environment that Turbot has automatically created appropriate groups and given them policies. Turbot has created simplified, standardized names for all these things, so we have a consistent language for talking about and referring to them in the environment and seeing how they're working. It's not abstracting the cloud; it's working within it and giving users the ability to see those things.

We do a similar thing across networking. Again, these things are all very important. How do you set up your VPCs? How do you make sure you have standard names for all of your subnets? How do you make sure peering is automated between all those environments, with good names, accept and reject handled, and route tables set up through the whole system? With the myriad of items you've got, you start to think about automating them, and you get more and more of your environment under control.
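Standard subnet names only help if each subnet's type can be recognized from its configuration. The sketch below shows one hypothetical way to do that: it classifies a subnet by where its default route (`0.0.0.0/0`) points, treating an internet gateway target (`igw-` IDs) as a DMZ subnet and a NAT gateway target (`nat-` IDs) as a private, outbound-only subnet. The type names, the `<vpc>-<type>-<az>` naming scheme and the example IDs are assumptions for illustration, not Turbot's standard.

```python
from dataclasses import dataclass


@dataclass
class Subnet:
    subnet_id: str
    vpc_name: str
    az: str
    routes: dict  # destination CIDR -> target ID, e.g. {"0.0.0.0/0": "igw-..."}


def classify(subnet: Subnet) -> str:
    """Map a subnet to a standard type based on where its default route points."""
    default_target = subnet.routes.get("0.0.0.0/0")
    if default_target is None:
        return "isolated"  # no internet route at all
    if default_target.startswith("igw-"):
        return "dmz"       # internet gateway: inbound reachable for public IPs
    if default_target.startswith("nat-"):
        return "private"   # NAT gateway: outbound only
    return "unknown"       # e.g. a peering or transit route; flag for review


def standard_name(subnet: Subnet) -> str:
    """Hypothetical naming scheme: <vpc>-<type>-<availability zone>."""
    return f"{subnet.vpc_name}-{classify(subnet)}-{subnet.az}"


def check_name(subnet: Subnet, current_name: str) -> tuple:
    """Return (compliant, expected_name) so a control could rename drifted subnets."""
    expected = standard_name(subnet)
    return current_name == expected, expected


if __name__ == "__main__":
    dmz = Subnet("subnet-1", "demo1", "us-east-1a",
                 {"10.0.0.0/16": "local", "0.0.0.0/0": "igw-0abc"})
    print(standard_name(dmz))                   # demo1-dmz-us-east-1a
    print(check_name(dmz, "my-public-subnet"))  # (False, 'demo1-dmz-us-east-1a')
```

Deriving the name from the routing, rather than trusting whatever name a team typed, is what lets a control detect drift and correct it automatically.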

You can then move with speed and agility within the standard models you've created. The other exciting thing is that you can start to search for and find the resources you have in the environment. In Turbot, for example, we can look for the bucket we just created and search across the different environments for different resources, see the history of what's going on, and see how things have changed. Here we can see a history, for example, of where we just granted Cody permission in that account: the activity history immediately shows that he was added to the user group and to the operator group. Turbot tracks the full set of changes in the environment. So let's come back and think about what it means to have that degree of automation in the environment. With that type of automation, you have to start thinking about how it impacts your environment and how it changes the way we work in the cloud, and there are a couple of things that are absolutely true here.

First, it's very obvious that with that type of automation your provisioning time goes down. Your time to detect a problem in the environment was literally seconds; it happened faster than we could click between the screens. And the time to correct was also seconds; it was done before we even got there. So you're talking about real-time detection and correction. Compare that with a traditional ticket that we might open in our environment and that takes hours or days to resolve: it's such a stark contrast, and it gives teams so much agility. Onboarding time can be massively reduced. Think about the power of direct access to the console: people with the skills and knowledge to use Amazon or Azure can use them immediately. They don't have to be trained in a myriad of internal processes, project meetings and follow-ups, which we might require with manual processes. So there's a whole bunch of basic metrics we can use to gauge how much advantage we've gained from that automation. But at Turbot, we actually believe the cultural change is even more important: the way your support and the operation of your environment change. To give a couple of examples, the first question you want teams always asking is: how can I remove this manual step? Suppose I had a server, I provisioned it and everything was instant, but getting my users' permissions onto that server took a day.

Then I've just lost my agility. Every manual step that stands between a team and their use of the cloud is a reason they can't automate; it's a reason they have to be manual, and it just snowballs into more and more manual steps. Every manual step comes with project management, those project managers then have other project managers facing them, they argue to try and agree on timelines, and we chase each other around. Every time we kill one of those steps, we flow faster and faster.

Another thing we love to see, and work on with our customers, is the question "how do I kill this ticket?" When a user has a problem, an issue in their environment or an automation problem, the question is not just how to close the ticket. That's important, but what's more important is how to get rid of this ticket so I never see it again. What automation can I put in place so that we never have this issue, never have this bad data, never have this problem occur in our environment again? That's a really important thing, because once you do that you move faster and faster, since you're not bogged down in the simple stuff.

The next thing we believe is important is a shared language: a common model for how to talk about things. If I can say "this is a DMZ subnet" and you know exactly what I mean (it has a public IP address, it has an internet gateway), that's a powerful thing. That's different from, for example, a private subnet, which might have outbound internet access through a gateway but has no inbound internet access because it doesn't route through an internet gateway. If I can talk about a web security group and we both know that means ports 80 and 443, or an app security group, which means its own standard set of higher application ports, then we share a common language and can move much faster. Project reviews move faster, we move with more confidence, and we know how to automate and completely control each of those items. Identity and access is simplified down to talking about operator rights versus admin rights; that's a faster, higher-bandwidth conversation than talking about JSON files and reviewing them in detail.

The last thing we think happens, once you get to software-defined operations and isolation of those teams, is that you can have clear responsibilities. Application teams own their application. They're responsible for their budget, which is clearly reported through that environment and visible to them, so they can make whatever decisions they want. At the same time, we don't care whether they run a hundred servers or one; a cloud center of excellence only cares that every one of those servers is backed up and patched, and that they're not wasting resources.

So we're no longer in a discussion about capacity; those issues have melted away. We're now in a discussion about requirements, our ability to deliver, and whether we're meeting the basic controls we require as an organization. With that shift in the conversation, and with that different approach of software-defined operations coupled with software-defined infrastructure, we can start to radically collapse and close this skills gap: the challenge of finding the right resources in the cloud and delivering on all the projects organizations have to do. The first and most fundamental win is that you bring in a bunch of predefined automation. Turbot is a great example: it has hundreds and hundreds of those presets implementing best practices. That's a whole bunch of undifferentiated heavy lifting you can completely avoid as an organization, which lets you go from a few good experiments or hand-built environments to massive scale almost immediately.

The other really interesting thing about the skills gap is how this changes the game for your teams internally, so let's talk about these in order. The first obvious benefit of automation is time. Your cloud CoE gets a lot of time back because they don't have to triage a thousand S3 buckets to check whether they're set up and managed correctly. They don't have to review every project coming in; they know their controls are automated. Similarly, the application team isn't wasting time on internal processes, reviews, setup or learning things that aren't relevant. They know cloud skills, they know their application, and they can come straight in and get to work. The second benefit is that this time is elevated: we're no longer talking about undifferentiated heavy lifting or simple things like encryption. We're now talking about our applications, how they work and how we want to automate them at a higher level in the environment. With that elevation comes alignment. I no longer have a cloud CoE or a central service team accepting requests from an application team and then battling to fulfill them. We're now in an environment where the application team is empowered to get the resources they need. Similarly, the cloud CoE and its experts in security, networking and cloud operations are empowered to think about policy for the organization and how they want to enforce it. That creates strong alignment: we both want the application to run, and nothing is blocking either of us, no capacity constraints and so on. The team that wants the power is enabled with that power, the team that needs controls has those controls predefined, and now it's in our shared interest to work out how to make things happen together. When teams come with new needs, that's an opportunity for the cloud team to work with them, learn, and bring that learning back out to accelerate the rest of the organization. That's what we see happen in organizations where we have Turbot running as part of the cloud team, really as a member of the team.

We see this flywheel start to turn. The application teams get very excited and engaged because they have this capability, and the cloud CoE and the operations folks are excited because they're no longer stuck in firefighting mode. Turbot makes the basics work, and now they're working together on the problems the business wants solved. That drives so much engagement and excitement; the projects come in and the whole thing keeps flowing. And of course, the more automation you get, the more time you create, which allows you to create more automation, and it becomes a virtuous cycle.

So in closing, the thought I want to leave you with is this: as you move to the cloud, think about the power of software-defined infrastructure. Think about the importance of giving that power directly to the application teams, and understand that we need to do that to give them agility. But coupled with that, we need real-time control; we need software-defined operations. With that combination, we can create a virtuous cycle where we really work differently, help each other and change the way the organization works. That collapses the problems we used to have: time wasted on project management, time wasted on firefighting, and time wasted blaming each other over capacity, requirements or fulfillment. Instead, with the water level lowered, we're left working to get this stuff done together. So if there are any questions, I'd love to answer those now.

Thank you very much, Nathan. That was great, very informative, and it really makes you think about this new way of getting work done. So let's look through the questions. You can still submit your questions using the Questions tab on your screen, and we have a couple already. What is the biggest challenge in finding the right skills to build a cloud center of excellence?

Absolutely, it can be really hard to find the right people. Cloud skills are like hens' teeth, and they're also highly sought after, so it's very difficult to hire people into this role, and even more difficult when you're an organization like a pharmaceutical or financial company.

People want to work on the front-end applications; hiring people straight into managing the back end of the cloud isn't as exciting and doesn't always offer the same rewards. So what we see is that by bringing in that level of automation, you can drastically accelerate your team's ability to cover all the basics you need for cloud. Couple that with the new relationship that forms around what is now truly a cloud center of excellence, rather than an IT service delivery team, and things really accelerate. It makes the role more attractive and the environment more dynamic. So it reduces the amount of skill you need to get going, and it makes the environment more attractive to skilled people afterwards.

Great, thanks, Nathan. All right, another question: what are the advantages of account isolation with regard to cost control?

Yes, absolutely. Obviously people using CloudCheckr often have processes behind this, as do all organizations, and one of the really interesting things about cost reporting is the separation of those costs. It can be very difficult with tagging and the like, and we have a lot of schemes to help with that breakdown, but the huge advantage in cost separation comes once you really embrace a multi-account model. Amazon, for example, is built on the idea of multi-tenancy; they've actually built us a hotel. If we try to put everyone together in one of those hotel rooms, we just make a mess: either we've got a share house that's gone bad and gets messy, or we have to hold a very formal hosted party to keep things under control. Instead, we can use that isolation and break things up into multiple accounts, giving each team its own room in the hotel, so to speak. When we do that, we get control: I know everything in that room is mine, and I know that budget is mine. We know people take care of their own house better than shared spaces; that's a natural thing, and once you give teams that separation, it flows through to full responsibility. Here's a fun story from back in the day. A team came to us and said, "Our S3 bill is $70! This is ridiculous." Yes, $70; in an enterprise context, the meeting costs more than the actual $70. But what was fascinating was that when we said, "Okay, let's look," it turned out they were making hundreds of millions of requests to static websites they'd created in S3.

That was happening because they had a bug in their mobile code that was affecting hundreds and hundreds of users in the organization, sending excess requests and draining batteries. That $70 bill led to a fix that benefited hundreds and hundreds of people, and in a shared account you would never have got that level of visibility.

Well, good story, thank you. Okay, I think we're running up against the clock, and we have another webinar scheduled in 15 minutes. So I want to thank Nathan, I want to thank Turbot, and I want to thank everybody for joining us today; I hope to see everybody in 15 minutes. Look for the link to the next webinar in the Attachments and Links section of this webinar, or visit cloudleaders.io. Thanks again, Nathan and Turbot, and thank you, everyone, for attending.

Thank you.

If you need any assistance, let us know in our Slack community #guardrails channel. If you are new to Turbot, connect with us to learn more!