272 - Lessons learned with Azure Policy
Hey there, and welcome to another episode of Control Alt Azure. I'm Tobias, and I'm back with Jussi. What's up?
Jussi Roine:Hey, Tobias. It's 2025 now. I took some time off, worked a little bit here and there. But there was plenty of time to do fun stuff with the family. We mostly stayed at home.
Jussi Roine:Lots of snow, lots of nice dinners, really slow mornings. That was fun. On a personal note, I did hit a new personal record at the gym with bench press. Perhaps it's a vanity goal, but I realized after hitting that record that it was something I set out to achieve 5 years ago when I got started at the gym.
Jussi Roine:And now that that is done, I'm planning for new goals at the gym, obviously, but also throughout the year as well, to do new exciting stuff and achieve a little bit more again than last year. What's up with you?
Tobias Zimmergren:So on my end, I just came back from a ski trip with the family. I think I mentioned that in a previous episode. We had a really great time on the slopes. And personally, I'm proud that I, myself, who's not a skier and did not grow up skiing, can now master all the open slopes in the system. The black slope was closed, so therefore it was easy to master the, like, semi-advanced red ones.
Tobias Zimmergren:I'm a pretty quick learner. It was amazingly fun. I mostly use Snowblades, like the shorter type of ski that enables me to really have a lot more fun. I could go jumping. I could do some one-eighties.
Tobias Zimmergren:I fell down a little bit, but did some fun tricks along the way as well. That was really fun. But more importantly, the 7 year old made some really good progress. She's also never been skiing in her life. She went 3 days last year, same as me, and now for a week.
Tobias Zimmergren:And she's now sliding down the slopes herself. And we got some training to do with the 4 year old, but we'll definitely plan on another ski trip again. So this was a lot of fun for the family, a lot of good memories built up. And I can feel I'm recharged. I'm mentally recharged.
Tobias Zimmergren:I'm physically recharged. Except maybe for today after doing a lot of laundry and, you know, unpacking the car, getting back from the trip and all that stuff, 10 hour driving. But, yeah, it feels great.
Jussi Roine:That sounds awesome. Skiing, especially with the family, that's always nice. Today, we will be talking about lessons learned with Azure Policy. You might recall that we did an episode on Azure Policy at the general level during episode 25, which feels like it was 4 years ago, and I think it was 4 years ago. And then we had Jesse Loudon as a guest to talk more in-depth on Azure Policy in episode 109.
Jussi Roine:And all of the guidance from there is still very valid. The actual capability of Azure Policy is very mature. I've always felt that Azure Policy is a little bit gray and boring, but it still is the cornerstone of any Azure architecture. So today, we'll talk about a couple of lessons learned in recent months on Azure Policy. But before we get to the lessons, Toby, how would you define what Azure Policy is, just in case somebody's listening and is not intimately familiar with Azure Policy?
Tobias Zimmergren:So without, like, diving into the details, because we can probably do a new episode fully on Azure Policy and what it is and do the sales pitch, it's really a way for you to bring governance to, you know, the resources and the technical platform you have. You know, up from the management group level and tenant level, you can say, in this subscription, in this tenant, we will only allow a specific type of resource to be deployed. It should be deployed this way. Here's the kind of patterns. Here are the rules.
Tobias Zimmergren:Here's the kind of governance policies we want to apply to specific types of resources. For example, if you wanna deploy things in a specific region only, if you wanna avoid certain regions, if you wanna disallow something, if you wanna limit the SKUs you can choose. You know, Azure Policy can help you do things like that. And there's a bunch of built-in ones, and then you can customize and build your own and define your own kind of rules. So it's really a versatile tool to bring governance to your fingertips.
Tobias Zimmergren:That's how I would describe it if you did a napkin presentation or if we're in an elevator for 5 floors. That's pretty much, you know, how I would box it in.
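To make the "only allow specific regions" idea concrete, here is a minimal Python sketch. It is not the Azure Policy engine: the dict only imitates the general if/then shape of a policy rule, the region list is an assumption, and the `matches` and `evaluate` helpers are hypothetical names invented for illustration.

```python
# Toy model of an "allowed locations" rule. The dict imitates the if/then
# shape of a policy rule; the evaluator is a simplified illustration and
# NOT how Azure actually evaluates policies.
ALLOWED_LOCATIONS_RULE = {
    "if": {"not": {"field": "location", "in": ["westeurope", "northeurope"]}},
    "then": {"effect": "deny"},
}


def matches(condition: dict, resource: dict) -> bool:
    """Return True when the resource satisfies this tiny subset of conditions."""
    if "not" in condition:
        return not matches(condition["not"], resource)
    # Only the {"field": ..., "in": [...]} form is supported in this sketch.
    return resource.get(condition["field"]) in condition["in"]


def evaluate(rule: dict, resource: dict) -> str:
    """Return the rule's effect when the 'if' block matches, else 'allow'."""
    if matches(rule["if"], resource):
        return rule["then"]["effect"]
    return "allow"


if __name__ == "__main__":
    print(evaluate(ALLOWED_LOCATIONS_RULE, {"name": "vm1", "location": "westeurope"}))
    print(evaluate(ALLOWED_LOCATIONS_RULE, {"name": "vm2", "location": "eastus"}))
```

A resource in one of the listed regions falls through to "allow", while anything else picks up the rule's "deny" effect.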
Jussi Roine:That's a nice description. When I got working more on Azure Policy way back when, I always initially felt that Azure Policy is like a GPO for the cloud, GPO being the Group Policy Objects you would have in Active Directory. And, well, it sort of is, but it does a lot of other things as well. So you can sort of deny and allow stuff that can or shouldn't happen, but you can also do compliance reporting based on what you defined in your Azure policies. So with this out of the way, let's go through a couple of lessons with Azure Policy.
Jussi Roine:And I've got one here. Let's debate a little bit on this. Let me just set the scene here. The lesson for me was recently, what policies do we have? That was the question.
Jussi Roine:And there was a scenario where we inherited a new environment. Or perhaps somebody did some work on policies, but you don't really have proper documentation on that one. So you need to figure out what policies do we have, how are they deployed, where, when, by whom, and how. And then you also need to figure out if there's any custom policies and how those are managed. And maybe 1 or more of those custom policies are bypassing your built in policies.
Jussi Roine:And is there a reason for that? What if we enable a built in policy now? Would it break something else from custom policies or from your real services? So this was sort of the first big question I was asked. Jussi, can you have a look?
Jussi Roine:What policies do we have? And I thought, well, this will take me 2 and a half minutes. I'm done before lunch, and I can do a long lunch and a long coffee break and then do something else. But it was surprisingly tricky. I've got a couple of insights that I got from this.
Jussi Roine:We're gonna go through those in a little bit. But what sort of ideas are you getting from this if somebody were to ask you, hey, Toby. What policy do we have? How would you start approaching this problem?
Tobias Zimmergren:Yeah. I think this is a great question. Right? Because this is something I used to do a little bit, and this is something I see customers doing as well. And, like, building an inventory is not always easy, especially if you have multiple tenants, or if you have multiple subscriptions.
Tobias Zimmergren:This can be a tricky thing. I've used something like AzAdvertizer, which you can get from azadvertizer.net. It's a third-party tool built by a Microsoft employee, but it's not kind of officially supported by Microsoft. I just want to mention that as a disclaimer. That's a pretty good tool to understand Azure policies and policy definitions and what exists and what that looks like.
Tobias Zimmergren:So that's been a good starting point for me. I've used in the past a little bit of KQL. You could query and say, hey, what do we have here? Use the Azure portal. But as soon as you kind of go beyond a certain boundary, it changes. If you have a single subscription or a few smaller subscriptions, using the Azure portal is pretty easy.
Tobias Zimmergren:Right? But if you have a huge enterprise with multiple tenants or, you know, hundreds or thousands of subscriptions spread out, this becomes a little bit more tricky. So for me, like, my personal experience has been more with AzAdvertizer. I've done some PowerShell scripts to do inventory and stuff like that, but that's kind of the extent. I never did this for the, like, large enterprises with, you know, thousands of subscriptions.
Jussi Roine:Yeah. I fully agree on that one, especially if it's a smaller environment. Typically, if you go to the Azure portal, then you select Policy. You get the Azure Policy compliance view. And from there, you can view all the policies.
Jussi Roine:It's relatively easy to get the big picture from there. But as you said, if you have 300 subscriptions, it's relatively hard now to get a grasp of the big picture. With maybe hundreds or thousands of policies, the UI in the Azure Policy view becomes quite slow. So it starts paging the stuff for you, maybe 50 policies at a time. Then you click next.
Jussi Roine:There's no page numbers. Then you get 50 more. And it's really hard to sort of try to build this mental image that, okay, I have 300 policies. How are they applied? At what level?
Jussi Roine:Are we applying them through management groups or to subscriptions or someplace else? And there's no line numbers in any of the views. So maybe you're seeing a list of policies that these policies are applied. Well, is it 25 policies? Is it 30 policies?
Jussi Roine:You have to count the rows visually on the screen or do some sort of scripting in there. The other sort of lesson as part of this what policies do we have is that you cannot assume that you're a global admin. So, obviously, in your home environment, you're always the global admin. You can do whatever. But perhaps you are a global reader or just reader, and you have to request elevation of permissions to do stuff, and that might be time-bound.
Jussi Roine:So it adds this friction in trying to do stuff that you feel takes 2 minutes or 5 minutes. And if you cannot deploy third-party tools like AzAdvertizer, which is super handy, then you relatively quickly start thinking about building your custom tools or using something outside the environment, if you can connect to that remotely. Just a quick side note on this one, though. If you inherit an existing Azure environment, the Azure Resource Inventory, which is an open source tool, is super handy, and it's available as a PowerShell module now. So when you install ARI, Azure Resource Inventory, you can just type Invoke-ARI, and it goes through the whole environment and produces a really nice looking report.
Jussi Roine:What goes where? What subscriptions do you have? What VNets do you have? What VMs? What are the total resources? It doesn't really give you that much in policies, but at least it is giving you the big picture, which you can then mentally apply to your policy views to try to understand what's where and how and why.
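The "count the rows or do some scripting" step from this lesson can be sketched in a few lines of Python. This assumes you already have a list of policy assignments exported from somewhere, each with a `scope` string; the field name and the sample data are assumptions, and `scope_level` and `count_by_level` are hypothetical helpers. The scope formats themselves follow the usual Azure resource ID layout.

```python
from collections import Counter


def scope_level(scope: str) -> str:
    """Classify an assignment scope by the level it is applied at."""
    parts = [p for p in scope.strip("/").split("/") if p]
    lowered = [p.lower() for p in parts]
    if lowered[:3] == ["providers", "microsoft.management", "managementgroups"]:
        return "management group"
    if lowered[:1] == ["subscriptions"]:
        if len(parts) == 2:
            return "subscription"
        if len(parts) == 4 and lowered[2] == "resourcegroups":
            return "resource group"
        return "resource"
    return "unknown"


def count_by_level(assignments: list) -> Counter:
    """Count assignments per scope level; expects dicts with a 'scope' key."""
    return Counter(scope_level(a["scope"]) for a in assignments)


if __name__ == "__main__":
    # Hypothetical sample data, not taken from a real environment.
    sample = [
        {"name": "allowed-locations", "scope": "/providers/Microsoft.Management/managementGroups/corp"},
        {"name": "require-tags", "scope": "/subscriptions/00000000-0000-0000-0000-000000000000"},
        {"name": "allowed-skus", "scope": "/subscriptions/00000000-0000-0000-0000-000000000000/resourceGroups/rg-app"},
    ]
    for level, count in count_by_level(sample).items():
        print(f"{level}: {count}")
```

This answers the "is it 25 or 30, and at what level" question without counting rows on screen, as long as you can get the assignment list out of the environment in the first place.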
Jussi Roine:So that was the first lesson. The next one, you mentioned KQL already. And we've talked about KQL a couple of times in the past. What's your 2025 take on KQL as a query language today?
Tobias Zimmergren:That's, that's a really that's a really good question. And, you know, couple years back, I loved KQL. And then a few years after that, I disliked it. And then I loved it again. So it kind of depends on what you're doing.
Tobias Zimmergren:Right? It's a specific query language specifically for, you know, pulling out resources on Azure and figuring out what your Azure estate looks like. And, you know, it's connected to Azure Data Explorer, and you can use that, and you can run the kind of heavy queries and do heavy computational stuff with it. So what's my take on this as a query language? Well, you kind of learn it and get used to it, and then it kind of works.
Tobias Zimmergren:But I don't have any opinion on the actual language itself. The question I usually ask myself is, will this get the job done? Like, what am I trying to achieve here? And what's the purpose of me trying to achieve this? Does it fulfill a business need?
Tobias Zimmergren:Is this where the business needs the time invested? Is this what I need to be focusing on? And if so, great. We figure it out. This is a goal we have.
Tobias Zimmergren:This is a mission we have. This is something we need to work towards. How do I get there? And if KQL is that way, great. If it's something else, cool.
Tobias Zimmergren:I'll go with that. So I don't have a strong opinion, just like if you would ask what's my favorite web browser. I don't care. Like, what's my favorite laptop? Same thing.
Tobias Zimmergren:I don't care. What's the goal that we need to achieve and what's the best way to get there? So my own opinions aside, let's go with the best option for the problem that we have. So that's my kind of political answer to that.
Jussi Roine:I admire your political approach in here. Because my opinion here is that KQL is horrible. It's painful to type. It's impossible to memorize. It's hard to quickly glance at a lengthy query to try to figure out what it is doing.
Jussi Roine:It's a little bit like with SQL. Simple SQL statements, not a problem. But then you have a 2 pager, and you're like, this is almost like a functional program. Why aren't you using SQL for this? So for Azure policies, when I couldn't really grasp the big picture with hundreds of policies and hundreds of custom policies, I figured, let's use KQL. And what I had forgotten, and this might be obvious to somebody listening on this one, but what I had forgotten was that KQL comes in a couple of different variants.
Tobias Zimmergren:Mhmm. Yep.
Jussi Roine:There's the one variant that you can run in Resource Graph Explorer in the Azure portal. And then there's the other one, the more beefy one that requires or dictates that you should use Data Explorer and spin up the compute clusters and really dive deep into that one. And those types of queries typically are quite lengthy to execute. So if you're just wanting to do some ad hoc stuff, you typically gravitate toward Resource Graph Explorer. And what bit me here is that I used some of the sample queries from Microsoft on figuring out what Azure policies do I have, and none of them worked.
Jussi Roine:So Resource Graph Explorer was complaining about the syntax, but I was literally copying them from Microsoft Learn. And I was like, I'm not doing a typo in here. There's no IntelliSense. And then it hit me. Oh, maybe this environment I'm executing this in doesn't have the full language capabilities that I'm expecting it to have.
Jussi Roine:I went to Data Explorer, and oh, they work in here. So again, you have KQL, but it's a different type of KQL depending on what your interface is. And that's what I actively dislike about KQL. Any thoughts on this?
Tobias Zimmergren:No. You know, a couple years back when I was operating, like, distributed, globally deployed subscriptions and tenants, this is something I stumbled upon as well. So it's a familiar problem. I have not worked a lot with KQL in detail since, so I would have expected that to become a bit more clear, but I hear the story remains kind of the same. It's a very powerful query language, but you have to know where to apply which query.
Tobias Zimmergren:Otherwise, you'll probably waste a little bit of time, like you just did, trying to execute queries here when it's actually over there you should run the query, even though it's the same language or the same type of query that you're trying to execute.
Jussi Roine:Yeah. And then when the queries fail or the interface is giving you red text but not really telling you why it's failing, then you spend a lot of time trying to troubleshoot the syntax. And once you get it running, you cannot really be sure, am I still getting the stuff that I wanted? And you sort of have to redo the query again with this mindset. So once you've found the queries, typically, what you want to find is list all custom policies, all the metadata on those.
Jussi Roine:List all built in policies, all the metadata on those. List all initiatives, meaning the sort of envelopes that pack together policies and apply them someplace. And also, list the compliance status. And obviously, you can see the compliance status in the policy view. But what's useful in here is to try to understand, do I have one policy affecting 15,000 resources and failing?
Jussi Roine:Or do I have 500 policies affecting a single resource and failing? Either way, the compliance status will be red. But it's important for you to understand, is it a big or a small problem for me to try to fix the compliance status to become green again? So once you have these queries, you export to Excel using the best format known to man, CSV files. And once you have them in Excel, then it becomes easier.
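The "one policy failing on many resources versus many policies failing on one resource" question can be answered directly from a CSV export. Below is a minimal Python sketch. The column names `policy`, `resource`, and `state` and the sample rows are assumptions for illustration, since a real export will have whatever columns your query projected; `summarize` is a hypothetical helper.

```python
import csv
import io
from collections import defaultdict

# Hypothetical export; real column names depend on the query you ran.
SAMPLE_CSV = """policy,resource,state
require-tags,vm-01,NonCompliant
require-tags,vm-02,NonCompliant
require-tags,vm-03,Compliant
allowed-skus,vm-01,NonCompliant
https-only,app-01,Compliant
"""


def summarize(csv_text: str):
    """Return (failing resources per policy, failing policies per resource)."""
    resources_per_policy = defaultdict(set)
    policies_per_resource = defaultdict(set)
    for row in csv.DictReader(io.StringIO(csv_text)):
        if row["state"] != "NonCompliant":
            continue  # only failing evaluations matter for this summary
        resources_per_policy[row["policy"]].add(row["resource"])
        policies_per_resource[row["resource"]].add(row["policy"])
    per_policy = {p: len(r) for p, r in resources_per_policy.items()}
    per_resource = {r: len(p) for r, p in policies_per_resource.items()}
    return per_policy, per_resource


if __name__ == "__main__":
    per_policy, per_resource = summarize(SAMPLE_CSV)
    print("Failing resources per policy:", per_policy)
    print("Failing policies per resource:", per_resource)
```

Sorting either dictionary by count gives a rough sense of whether getting back to green is a big or a small job.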
Jussi Roine:What I did try: GitHub Copilot now has o1 language model support. So with o1, I could quite rapidly craft nice looking KQL queries that have mostly worked. I needed to tweak them a little bit in the Resource Graph Explorer. But before o1, everything I got from GitHub Copilot or from my local LLMs were broken queries, and they wouldn't work at all in me trying to figure out what's happening with the policies. So there's a little bit of a disconnect here between the graphical interface and figuring out the queries to get the same information, but get that to Excel so that I can really dive deep into the source data.
Jussi Roine:So this was lesson number 2. I think we have one more lesson. What did you have in mind?
Tobias Zimmergren:So for me, one of the things is that, you know, I've stepped on that mine a couple of times in production or in systems where we wanted to kind of enforce specific policies, but we didn't test them out, or we didn't have a chance to test them out, or we thought we tested them out. The lesson number 3 for me would be start with audit mode. Always try to start with audit mode. So audit mode in Azure Policy, that's like a non-enforcing evaluation mode, if you will. And this kind of allows you to assess and monitor the compliance without really making any changes to your resources.
Tobias Zimmergren:So when a policy is set to audit mode, then Azure Policy will evaluate resources against that condition, and it will identify non compliant resources. But it will not block the creation of the non compliant resources, and it will not modify the existing ones. And I think that's the key point. You will discover them. You will identify them.
Tobias Zimmergren:You will kind of build visibility. So it doesn't hinder you or enforce anything, but it monitors and brings visibility to kind of the estate you have. And it also says: hey, if you were to apply this policy, you know, here's a bunch of things breaking compliance. That's good for you to know because then you can start working on that before you start breaking your production environments. So for me, you know, that was the number one thing.
Tobias Zimmergren:Whenever we deployed and developed, you know, bigger sets of policies, always go with audit mode. Right? And don't just assume you can start enforcing things because there's a lot of things and a lot of kind of divisions in large enterprises as well that will likely be impacted in one way or another. So why is this important? Well, obviously, enforcing a policy immediately can disrupt your operations.
Tobias Zimmergren:And it can conflict with existing configurations. So you kind of have to be mindful how and when you start enforcing versus just auditing. So the lesson learned here for me over the years is use audit mode for new policies to evaluate compliance without enforcing them. And then this will allow you to better understand the impact first. Then you can refine your policies as necessary if you see that, well, this is not gonna work.
Tobias Zimmergren:Or if we do this, our production is gonna stop. Or if we do this, we're gonna hinder the entire business from doing x, y, or zed. Then you can kind of assess and understand that impact. Then you can refine the policy, and then you can gradually start enforcing it. So maybe you don't need to start at the top and say, hey, for the entire tenant or the, you know, root management group, let's enforce this entire thing.
Tobias Zimmergren:You might wanna do it for specific divisions that might be more mature or in areas where you see a lot of problems, but you don't wanna cause disruptions for the rest of the business. So that's probably lesson number 3, that's my best tip. Start with audit mode to build an understanding of the impact of applying the policies and get some monitoring on this. You know, kind of assess: is this the right policy? Is it configured the right way?
Tobias Zimmergren:Is it gonna help us configure our Azure estate the best way and the right way in a compliant way that we want? And then build your understanding of that audit mode. When you've done that, you can switch that and say, okay, now we know here's what's gonna happen. Here's the impact. This is, you know, we're doing an assessment now and we can see this is what the landscape will look like if we deploy these things.
Tobias Zimmergren:And then you can switch. So that's my number 3. Always go with audit mode first. Because this is a lesson I learned in production where we had a globally distributed system operated across the globe. And one of the things we needed to do was enforce specific regions or rather disallow certain regions, disallow certain SKUs to be deployed while still enabling kind of the DevOps mindset, like developers and dev divisions.
Tobias Zimmergren:And, you know, folks should still be able to deploy certain types of things and run their CI/CD pipelines and stuff like that, but we still had to restrict a bunch of things. And in doing so, we also realized that we kind of just started enforcing something, and then we didn't realize what the full impact of that was until a couple of days later when we had a bunch of problems deploying other things we did not expect. So that's my lesson learned. Start with audit mode, you know, assess the landscape, understand the impact, and then take it from there.
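The "audit first, then enforce gradually per division" approach could be sketched as a small planning helper in Python. Everything here is an assumption for illustration: the scope names, the threshold, and the `plan_effects` function are hypothetical, and a real rollout decision would involve far more than a single non-compliance count.

```python
def plan_effects(noncompliant_by_scope: dict, max_noncompliant: int = 0) -> dict:
    """Suggest an effect per scope after an audit period.

    Scopes whose audited non-compliant count is at or below the threshold
    are proposed for "Deny"; everything else stays on "Audit" so the
    findings can be worked through before anything gets blocked.
    """
    plan = {}
    for scope, count in noncompliant_by_scope.items():
        plan[scope] = "Deny" if count <= max_noncompliant else "Audit"
    return plan


if __name__ == "__main__":
    # Hypothetical audit results per management group.
    audit_results = {"mg-platform": 0, "mg-corp": 3, "mg-sandbox": 12}
    for scope, effect in plan_effects(audit_results).items():
        print(f"{scope}: {effect}")
```

With these sample numbers only `mg-platform` would move to enforcement, while the other two stay in audit until their findings are resolved or the policy is refined.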
Jussi Roine:I like this. And what I'm recalling now is that as part of the Cloud Adoption Framework, there's some additional capabilities beyond audit mode, stuff like DeployIfNotExists and Modify. So regardless of which mode you're planning on deploying your policies in, start with audit mode. Otherwise, if somebody's deploying custom policies over a pipeline, for example, it might take several minutes to deploy. And if you now have Deny or DeployIfNotExists or Modify or something similar, then Azure will trap those deployments and go, hold on.
Jussi Roine:I'm seeing a policy that doesn't do x, but the policy is still being deployed and evaluated. But now you have Azure picking it up in the middle of that process, starting to modify stuff for you, and everything will break. So audit mode definitely is going to be your friend. So the tools that we mentioned, AzAdvertizer, Azure Resource Inventory, and a couple of sample KQL queries, they will be in the show notes. Have a look at those.
Jussi Roine:This was fun with Azure policies. I'm happy I got this done. I don't have to spend that many hours with Azure policies any longer in the next couple of weeks. Perhaps I will get to enjoy them later as well. And the last bit is the unexpected question.
Jussi Roine:It is 2025. Toby, I have a question for you. Are you ready?
Tobias Zimmergren:Let's go.
Jussi Roine:If your life were a sitcom, what would it be called, and what's the theme song?
Tobias Zimmergren:Okay. Well, based on my recent experience and coming back from the ski trip, this would probably be something like The Chronicles of Laundry, which is probably a comedy drama about the never ending laundry battles, probably playing something like Eye of the Tiger in the background because that's what I had on my Spotify today as well, yesterday, as I did 6 baskets of laundry, or 6 machines. So at least right now, after getting back from the ski trip, that's my life. So The Chronicles of Laundry, with Tobias.
Jussi Roine:Yeah. I can feel it as well. We do laundry quite often at home as well, and that's not the tricky bit. But then somebody needs to hang them to dry at 9 o'clock, and somebody needs to collect the dry clothes the next day and fold them neatly and divide them between the kids' bedrooms and wardrobes. And it seems like it's never ending.
Jussi Roine:So, yeah, I would definitely join on that sitcom as well. Alrighty. Thanks for tuning in. See you next week.
Tobias Zimmergren:Alright. See you then.