Making sense of the Power BI Premium Gen2 Metrics App

Yooo! Adam Saxton with Guy in a Cube, and in this video, I am gonna demystify the Premium Gen2 metrics app. I know a lot of people have been waiting for this, so let’s do it. If you’re finding us for the first time, be sure to hit that Subscribe button to stay up to date with all the videos from both Patrick and myself.

All right, Power BI Premium Gen2 metrics app.

It’s had a few updates since it first came out, and I think it’s at a point now where I need to show you what it’s all about and tell you how you can actually use this to help you with managing your premium capacity. If you’re new to it, it may not make a lot of sense, so I wanna help you level up your skills in that area and try to answer your questions. All right, enough of all this talking, you know how we like to do it here in “Guy in a Cube”. Let’s do what? Let’s head over to my machine.

First thing we need to do before we do anything else is we need to install the app. So we have to go to Apps and then we’ll go to Get Apps and search for premium.

All right, you’re gonna see two items here, this is the Gen1 metrics app, so it has a gray background, Gen1. This is the new one, this is the Gen2 app, so let’s go grab that, we’ll say Get It Now. All right, we’ve got our information entered, we’ll hit Continue and it will go ahead and ask us a few things.

So first off we wanna say, yep, we want to install this app, and what it’s going to do is it’s going to install and create a workspace for you. There’s a couple of things we need to do though before we get going here. You’ll notice there’s a new app here, this Premium Capacity Utilization Metrics app, and it’s got a timestamp on it. We need to first go to the report itself. So let’s go in there.

It’s gonna load the report and we’re gonna have to answer a few questions here.

We’ll say Connect, and we’ll enter our capacity ID and the UTC_offset, so this is just where you’re located, right? For me, it’s the great state of Texas, so Central Time, so I’m gonna put negative five. Then there’s a couple of things here, and this isn’t necessarily obvious to folks.

There is a scroll bar here, so if you scroll down, it’s gonna ask you for TimePoint and TimePoint2. These, you can put any date that you want, right? So this doesn’t need to be anything specific. These are parameters that the app uses to manage the range of what it’s going to be looking at, and so it’ll hijack these as part of the app, you don’t need to worry about it. So I’m just gonna put in a date here of December 1st, and I’m gonna use the same date for both.
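Just as a hedged illustration (the capacity ID is a placeholder for your own, and the dates are just the throwaway values I described), the filled-in parameters end up looking something like this:

    Capacity ID: <your capacity’s GUID>
    UTC_offset:  -5          (Central Time, in my case)
    TimePoint:   12/1/2021   (placeholder; the app takes these over)
    TimePoint2:  12/1/2021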

There’s also an Advanced section, but you don’t necessarily need to worry about that, just leave it on automatic refresh, that’s good, and we’ll hit Next.

It’s going to ask us to connect. There are two of these, so we’re just going to hit Sign in and continue. Make sure the privacy level says Organizational. Then we’re going to do it again.

All right, we are done. We’re going to go back out to the capacity, and you’ll notice here that the dataset is actually refreshing, which is good, ’cause this needs to refresh for the first time. Once that’s gone through, if we go back into the report, first off, ignore this message up here; you need to select the capacity in your environment. There’s a couple of things going on here. From an import perspective, this is where the scheduled refresh comes into play: there’s metadata that gets imported, so it’s the list of capacities and the list of artifacts in your environment.

That’s all that’s imported. The actual metrics themselves are using DirectQuery against the Kusto (Azure Data Explorer) store in the backend from a telemetry perspective.

So the actual visuals may seem a little bit slow, and that’s because we’re using DirectQuery against a really big telemetry database that it’s going to get your capacity information from. So just be aware of that; it’s actually close to real-time in terms of the telemetry itself as it stands right now. The other thing I wanna call to your attention: just remember that this video is being recorded as of December 2021, actually like mid-December.

There was an update that just came out, which is why I’m doing this video, but just know that there’s going to be other updates as we go, and so things may start to change a little bit. Once it becomes a big diversion from what’s going on, I’ll do another video just to recap what’s going on.

All right, back to my machine. All right, so now we can see information, this is amazing. I wanna kind of guide you on a journey here.

The first thing you should be paying attention to is this overload minutes per hour visual.

This is your red flag visual, and what I mean by red flag is if you’re seeing information here, that’s a warning sign for your capacity. Overload means you went over your CPU limits for your given capacity. Each capacity, depending on the SKU you have (so if it’s a P1, for example), has a certain amount of what’s referred to as CPU seconds that you can use within a 30-second block. This means you went over that.
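Just to make that math concrete, here’s a quick sketch in Python. To be clear, this isn’t anything out of the app itself, and the backend v-core counts are my assumption based on the published P-SKU sizes:

    # Sketch: CPU-second budget per 30-second evaluation window.
    # Backend v-core counts per SKU are assumed, not read from the app.
    BACKEND_VCORES = {"P1": 4, "P2": 8, "P3": 16}
    WINDOW_SECONDS = 30

    def cpu_budget(sku: str) -> int:
        # Each backend core can contribute one CPU second per wall-clock
        # second, so the budget is cores times the window length.
        return BACKEND_VCORES[sku] * WINDOW_SECONDS

    print(cpu_budget("P1"))  # 120 CPU seconds per 30-second window

So on a P1, once the work in a 30-second window adds up to more than 120 CPU seconds, you’re overloaded.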

If you have overload minutes, one of two things is going to happen: either one, you’re going to get throttled, which means reports are going to start looking sluggish and not perform as great as you would like.

The other thing that could happen, if you have autoscale enabled, is this is where autoscale will start to kick in, if the system deems, “Hey, we need that extra CPU core to satisfy the overload itself.” And that’s all based on the actual timing; I don’t know what the actual algorithm is, but based on the 30-second windows that it’s looking at, it will trigger whether or not we need to go grab a core if autoscale is enabled. So be aware of that: this is your sign that autoscale would have kicked in if you had it enabled. So, this is what was there before the update that happened recently.

Now, what this doesn’t tell you is how close did I get to the overload? All this says is I overloaded, and so it doesn’t necessarily help you in terms of capacity planning, but it does tell you whether or not you’re potentially in crisis mode.

If you see occasional spikes here, that may not be a bad thing, right? So, you need to judge that, what are your users saying? Are there actual perceived slowdowns?

It may just be a blip, and no one really noticed and you’re fine. If you see a consistent pattern of overload here, that is a clear sign that you need to scale up your capacity, right? So, your capacity cannot handle the load, or you’re going to optimize some things or maybe move some workspaces to other capacities, right? You’re in a crisis mode if it’s consistently overloaded. So, you need to pay attention to that, you need to act on that.

But again, if it’s just an occasional blip, it may be okay, or you may know, like, hey, this was end of month processing or end of quarter or end of year processing. I know we were getting hit hard and so maybe that’s a great use case for enabling autoscale, right? You wanna handle that known blip that’s gonna come up and satisfy that load just for that momentary period of time. The other thing we can see here is the datasets that are there and what information is present.

I will come back to some of these other visuals in a minute, but one thing you can do here is we know that Phil’s test dataset is our problem dataset, that’s the one that’s consuming all of the resources.

We can see here from a CPU seconds perspective, it’s got the largest number of CPU seconds, the duration’s also the longest, and you can also see a column called overload minutes. So, if we know that we overloaded, you can tell here which dataset is the one causing the overload, right? So, it’s a way to break down and kind of see what’s going on.

And then you can expand that a little bit; again, DirectQuery, so this could take a second or two. And then we can see here, it’s just a query breakout for a dataset.

The other thing you can see here is this performance profile. You can just kind of see, hey, in general, we’re pretty fast; hey, we kind of slowed down a little bit here. It’s kind of an indicator of how we’re doing from a performance perspective. And then up here, we can actually slice, and we say, Wednesday was this big spike, and we can filter down the other visuals to see, hey, what’s actually going on that day, if we wanna drill in a little bit.

What’s the one thing you’re not seeing here?

Memory, right? So, memory, this is a big thing, and I’ve had this discussion with folks. Because Gen2 changed things from an architecture perspective for Premium, memory’s not really the thing we care about so much with Gen2; it’s all about CPU for the most part. Memory shifted from an overall capacity limit to an individual dataset limit, or artifact limit. So we care about memory in the context of a single dataset, but from an all-up capacity perspective, there’s not a hard limit. There is a CPU limit, and that’s why this report is really focused on CPU and not memory.

That’s a shift, and something where you need to kind of change your thinking to understand that memory is not really the gate at this point; it’s CPU, it’s all CPU.

Another question I’ve had in relation to the CPU is, “Well, is this the front-end cores or the backend cores?” ’Cause that’s something that’s documented in terms of Premium. This is all about the backend cores. The backend cores are the ones that have the limit on them, and that’s where all your dataset processing, all the hard crunching, happens.

Now, I said memory is not necessarily the thing you care about, but one thing you do wanna know is this Artifact size visual; it has a mid-line where it says Dataset refresh limit.

And so, each dataset is bound to the memory limit of the given capacity, so for a P1, it’s 25 gigs per dataset, right? Each dataset has a 25-gig limit. The red line indicates that SKU limit, so in this case, on a P1, it’s a 25-gig limit, and then the yellow line, the dataset refresh limit, is the halfway mark.

So if you’re doing a full refresh, and it’s 11, 12 gigs, you’re bumping right up against the limit, because it’s gonna duplicate that dataset to do the refresh. And so there’s like a shadow dataset that’s there for the refresh, and then it frees up, so that’s something to be aware of as you could hit the individual dataset memory limit if you’re doing a full refresh on a pretty large dataset.
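If you want to sanity-check a dataset against that, here’s a minimal sketch in Python. The 25-gig limit is the P1 number from above, and the times-two estimate is just that shadow-copy behavior; the function itself is mine, not something the app exposes:

    # Sketch: will a full refresh fit inside the per-dataset memory limit?
    # A full refresh holds a second "shadow" copy of the dataset in memory,
    # so the working estimate is roughly twice the dataset size.
    P1_DATASET_LIMIT_GB = 25  # per-dataset memory limit on a P1

    def full_refresh_fits(dataset_size_gb, limit_gb=P1_DATASET_LIMIT_GB):
        return dataset_size_gb * 2 <= limit_gb

    print(full_refresh_fits(11))  # True, but barely: 22 of 25 gigs
    print(full_refresh_fits(13))  # False: 26 gigs blows past the limit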

So be aware of that, and this will give you some kind of indication of it. So, a couple other things we can do here on this dataset: maybe we wanna right-click and drill through into the artifact details or the refresh details. That’s something we can go do.

The other thing we can do is we can go to the Evidence page, and we can see what’s actually going on with overloading. So, if you wanna investigate the overloading a little bit, this will help you understand the actual overloading that’s happening; it can help you drill down a little bit more.

But now, I wanna show you, this was the recent update, and this is what I love about this app.

This is what’s gonna help you understand what’s going on before you ever hit that overload. This is the thing, this visual up here. You’re gonna see background CPU, interactive CPU, and then CPU limit. Let me go and expand this out so we can see it. So now we’ve got the CPU limit; I’m on a P1, right?

So, each capacity has a certain number, defined as CPU seconds, that is your limit within a 30-second block. If I hover over this, you can see that my CPU limit is 120 CPU seconds per 30-second window for a P1. And now you can see here what actually happened. Red is interactive CPU, so these are reports people are interacting with; this isn’t like a refresh in the background. If you did refreshes, you’d see some blue lines here. But now we can see, hey, we overloaded here, and then over here, oh, we got pretty close to the line but we didn’t actually overload. So even for a spike that was kinda close but didn’t actually overload, now I can actually see what that was and go investigate it.

Right-click on the line and you can do a drill through to the time point detail. What this does is, for that given minute and second, you can actually go see what was causing the activity in this block. All right, so from this perspective, this red line is where we are right now, and the other red line we can see is the CPU limit, and what we can do here is we can actually see, okay, what was going on in this time slice? And we can see how many seconds it was taking up.

We’ve got our CPU limit here to remind you, this is our SKU that we’re on, and from here, we can see these are all the items, the durations and the CPU seconds that contributed to the total, and we can see here, the total for this time block is 296 CPU seconds, which is clearly over 120.
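To give you a feel for what that drill-through is doing, here’s a hedged sketch in Python; the operation log shape and the numbers are invented for illustration, since the real data comes out of the telemetry store via DirectQuery. The idea is just: bucket each operation’s CPU seconds into 30-second windows and flag any window that crosses the budget.

    from collections import defaultdict

    # Invented operation log: (start_second, cpu_seconds, artifact).
    operations = [
        (0, 80.0, "Load Test"),
        (10, 216.0, "Load Test"),   # this window will total 296 CPU seconds
        (45, 60.0, "Sales Model"),
    ]

    CPU_LIMIT = 120  # P1 budget per 30-second window
    WINDOW = 30

    # Sum CPU seconds within each 30-second window.
    totals = defaultdict(float)
    for start, cpu, _artifact in operations:
        totals[start // WINDOW] += cpu

    for window, total in sorted(totals.items()):
        status = "OVERLOADED" if total > CPU_LIMIT else "ok"
        print(f"window {window}: {total:.0f} CPU seconds ({status})")
    # window 0: 296 CPU seconds (OVERLOADED), just like this time point
    # window 1: 60 CPU seconds (ok)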

Okay, so what was doing it? And there were a couple of queries here against this given dataset, so that you can know where you need to go optimize or do something. This is the key for where we need to go spend our energy to optimize the capacity so that it’s working efficiently. So even if you didn’t cross that limit line, you can actually go see, hey, there’s one dataset out there that’s really taking a lot of the resources, and we need to go do something about that, whether that’s optimizing the model itself, which is my primary recommendation, it takes a little bit of time, or maybe moving that workspace off to another capacity to kind of isolate it a little bit so it’s not impacting other workloads on this capacity.
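And for the “which dataset do I go chase down” question, the same invented data just needs a group-by. Again, this is a sketch of the idea, not what the app literally runs:

    from collections import defaultdict

    # Same invented shape: (start_second, cpu_seconds, artifact).
    operations = [
        (0, 80.0, "Load Test"),
        (10, 216.0, "Load Test"),
        (45, 60.0, "Sales Model"),
    ]

    # Total CPU seconds per artifact, biggest consumer first.
    by_artifact = defaultdict(float)
    for _start, cpu, artifact in operations:
        by_artifact[artifact] += cpu

    for artifact, cpu in sorted(by_artifact.items(), key=lambda kv: -kv[1]):
        print(f"{artifact}: {cpu:.0f} CPU seconds")
    # Load Test: 296 CPU seconds   <- the optimization (or isolation) candidate
    # Sales Model: 60 CPU seconds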

So, you’d also see background operations too if there were refreshes going on to see how that contributes to the picture as well.

The other thing that’s cool here is we can see, hey, we’re at this time point, but there was also this spike over here, so maybe I just wanna right-click there and go to that point in time as well. And so, we can jump around and go see what’s going on and drill into: what is my capacity doing? What dataset is actually causing the problem? What workspace is that dataset assigned to? And then I can go talk to the people that own that workspace and dataset.

Here we go, this one was 237 CPU seconds, right? Clearly over the 120 CPU seconds. So this is really how I can go dig in. So when you look at these artifact names, this Phil test dataset YR_Max, that is the workspace name. And then we’ve got our Dataset, that’s the artifact type, and then the actual dataset name itself, which is called Load Test.
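Laid out, that naming breaks down like this:

    Workspace:     Phil test dataset YR_Max
    Artifact type: Dataset
    Artifact name: Load Test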

So, let’s go back to the overview; I just wanna show you a few more things here.

So, we can see other items here. If I go to the Operations tab, we can see the different artifacts that are here. In this case, I’ve only got dataset artifacts, but if you’re running paginated reports, dataflows, things of that nature, those would show up differently in this line, so you can see which ones are consuming what. So in this case, if I hover over, this green is datasets, and you’d see different colors for different workload types. So you can get a feel for what’s consuming the CPU the most in that given time window.

And we can select Wednesday; again, this will filter down the other visuals just like a Power BI report, it’s amazing, right? So we can go see where this stuff happened. I can drill down, so I can go into given items a little deeper into Wednesday, right? These are the different hours from a Wednesday perspective, and then we can drill down into the artifact details and the refresh details. And then we can see the durations, the CPU, and then the given users that were actually driving the activity.

And if we go to the Refresh tab, we can actually see information about dataset refreshes all up on the capacity and see what’s going on there as well. In this case, there are no refreshes for this given capacity. All right, a couple other things to be aware of: first off, this does work for embedded capacities as well. So if you’ve got Azure Power BI Embedded capacities, they’ll show up here too.

All the capacities that you have rights to will show up in that dropdown list.

So obviously, it’s targeted more towards admins, but if you give capacity admin rights to other users, they’ll be able to see this as well. So it’s just a way to help spread the load of capacity management. All right, did this help you understand the capacity metrics app? Do you have any other questions? Let me know down in the comments below.

We’ll try and answer as many as we can, and I’ll ask the Product team to maybe weigh in and take a look at those too if I don’t know the answer. And like I said, we will be doing updates for this as major updates come to the app, just to keep you up to date on how to use it and where to go get it.

If you like this video, be sure to hit that big thumbs up button, smash it if you so desire. If it’s your first time here, hit that Subscribe button, and as always, from both Patrick and me, thank you so much for watching. Keep being awesome, and we’ll see you in the next video.
