nc-state logo
Microsoft Power BI
Christopher G. Healey

Introduction

This module will introduce you to Microsoft Power BI (BI), a business intelligence platform designed to provide visualization dashboards, cloud-based collaboration, and multi-platform distribution across workstations and phones. We will work through examples on loading data from Excel worksheets, visualizing the data using different methods, and integrating the visualizations into an interactive BI dashboard.

What is Microsoft Power BI?

Microsoft Power BI (Business Intelligence) is a collection of tools designed to manage, query, and visualize data. The following six components are the ones you are most likely to use, although there are others.

Our main focus in these lectures will be on Power BI Desktop (BI Desktop) and Power BI Website. Additional information can be found on Microsoft's Power BI site if you're interested in exploring additional BI tools.

Setup

To use BI, you can download a copy of BI Desktop, either from the university's Microsoft subscription license or directly from Microsoft. We recommend you use the university's download, since this will also give you the ability to publish desktops to BI's public cloud. There are a number of steps you will need to complete to get BI Desktop up and running.

  1. Go to the university's Office 365 software link, https://office365.ncsu.edu; authenticate if needed.
  2. Choose the Power BI icon. If it isn't on the default list of choices, click the "All Apps →" link to expose it.
  3. Click on Power BI. This will take to to the Power BI public cloud.
  4. In the upper-right corner, click on the down arrow (↓) and choose "Power BI Desktop."
  5. Click "Open Microsoft Store" to jump to BI Desktop's download link on the Microsoft Store.
  6. Choose "Download" to download BI Desktop. If you receive a request dialog to login to your Microsoft account, you can either do that (if you have an account), or simple Cancel and close the dialog. BI Desktop will download regardless.

However you download BI Desktop, the first time you run it a dialog should appear with an option in the center to "Sign In." Do this, and enter your Unity ID. Since the university has a license to Power BI, this should present a standard Unity login screen where you can use your Unity ID and password to sign in to Power BI. Once you do this, you will have the ability to publish to and access the Power BI cloud.

Basic Power BI Desktop

BI Desktop loads with a blank report worksheet opened by default. On the left of the worksheet are three workspace icons allowing access to reports, data, and relationships, respectively. On the right are vertical tabs for filters, visualizations, and data fields. These can be shown or hidden with the > or < icons.

BI Desktop's initial report worksheet after starting the program.
bi-start

In order to visualize, we need to start with a dataset. We'll be using a modified version of a Excel activity dataset originally provided by Peter Aldhous as part of his Tableau demo. Once you've downloaded the activity dataset to your computer, open it in Excel and spend a few minutes familiarizing yourself with its different fields.

Loading Data

To start, we need to load the activity dataset into BI Desktop.

  1. Launch BI Desktop.
  2. Choose "Get data" from the initial introduction dialog.
  3. Select "Excel" from the list of available data sources, and click the "Connect" button.
  4. Open "USDA_activity_dataset.xlsx" from the File Dialog.
  5. Choose the "USDA Activity Dataset" sheet, and click the "Load" button.

Once the Excel worksheet has loaded, you should see a blank BI Desktop worksheet report, with the fields from the USDA Activity Dataset in the "Fields" column on the far right side of the window.

BI Desktop's initial worksheet after loading the activity dataset.
bi-start

Fields

Unlike Tableau or SAS VA, BI Desktop does not try to subdivide data attributes into categorical (qualitative) or continuous (qualitative) values. Instead, the "Fields" column lists each data attribute with a checkbox, an optional symbol (e.g., Σ), and its name from the spreadsheet header row. The checkbox identifies whether the given attribute is in-use for some part of the currently selected visualization. The symbol defines data type: Σ for numeric attributes, 🌐 for a geographic attribute, a calculator-like symbol for a calculated field, and blank for text for other data that BI Desktop has not automatically classified.

Similar to Tableau and SAS VA, however, visualizations are created by selecting a visualization type and using drag-and-drop to define the visualization's properties. For example, to create a "% Smokers" by "State" bar graph, perform the following steps.

  1. Click on "Stacked column chart" in the "Visualizations" column (top row, second icon); a placeholder bar chart will appear in the report worksheet.
  2. Drag "State" to the "Axis" field, to use state name on the X-axis.
  3. Drag "% Smokers" to the "Value" field, to visualize percent smokers in separate vertical bars; notice that this also defines the Y-axis.
  4. Grab the bottom-right handle and drag the visualization to fill the screen.
  5. Notice that if you hover over a bar, "% Smokers" is reported as a large number; this is because the default aggregation is Sum; to change it, right-click on the "% smokers" in the "Value" field and choose "Median" to change the aggregation to median.
  6. By default, the graph is sorted by "% Smokers;" if you wanted to sort by "State", click on the ellipsis symbol (⋯), and choose "Sort by → State."
A vertical bar graph of the median percentage of smokers by state.
bi-state-med-smoker

Dot Map

Next, we'll generate a dot map of the adult obesity rate for each county.

  1. Clear the worksheet selecting any visualizations you've built and pressing the Del key.
  2. Choose the "Map" visualization (third row, sixth icon) to insert a dot map visualization onto the report worksheet; expand the visualization to fill the report canvas.
  3. We will build a hierarchical State–County map, allowing a user to see an overview at the State level, then click on an individual state to "drill down" to the County level.
  4. First, we must define State and County to be geographic attributes; to do this, select "County" in the "Fields" column, choose the "Modeling" tab at the top of the screen, and in the "Properties" region choose a "Data Category" of "County"; do the same for the "State" attribute, but select "State or Province" rather than "County."
  5. Notice that a globe icon has been placed left of both "County" and "State" to identify them as geographic properties.
  6. Drag "State" to the "Location" field, then drag "County" to the "Location" field below "State"; "County" must be below "State" for the drill-down hierarchy to work properly.
  7. At the top of the dot-map visualization are a row of icons. Select the down arrow (↓) to turn on hierarchical navigation.
  8. Now, if you click on any bubble for a given state, the visualization will automatically "zoom in" on that state and show bubbles for each county; click on the up arrow (↑) to return to the state view.
  9. To color the bubbles by "Adult obesity rate," click on the paint roller icon in the "Visualizations" tab to format the visualization.
  10. Click on the "Data colors" label to expand it, then hover over "Default color"; a set of vertical ellipsis (⋮) will appear to the right, click on them and choose the "Conditional formatting" label that appears.
  11. In the "Default color" dialog, choose "Adult obesity rate" in the "Based on field" drop-down menu, and choose "Median" in the "Summarization" drop-down menu.
  12. Select the "Diverging" checkbox, and for the "Minimum", "Center", and "Maximum" color drop-downs, choose a dark blue, white, and a dark red, respectively, then click "OK."
  13. Bubbles in the State and County maps will now be saturated blue for below the median, white for at the median, and saturated red for above the median.
A map showing average obesity rate by state and county using a red–blue colour scale: red for values above the median, blue for values below the median, and saturation for values farther from the median.
bi-obesity-red-blue

Scatterplots

Next, we'll create some scatterplots to compare adult obesity rate to other, possibly related measures.

  1. Select the "Scatter chart" visualization from the "Visualizations" column (third row, second icon); note: make sure no other visualization is selected, otherwise the scatterplot will replace that visualization.
  2. In the "Fields" section of the "Visualizations" column (leftmost icon under the visualization type icons), drag "Adult obesity rate" to the "X Axis" label and "% smokers" to the "Y Axis" label.
  3. Drag "County" to the "Details" label to show a dot for each county.
  4. Click the magnifying glass "Analysis" icon, click on "Trend Line," and click on "+ Add" to add a linear trend line; choose a "Style" of "Dotted."
  5. Right-click on the scatterplot visualization and choose "Copy → Copy visual"; hit Ctrl+V twice to create two copies of the original scatterplot; drag them to position them relative to one another.
  6. In the second scatterplot, drag "% eating few fruits and vegetables" to the "Y Axis" label, replacing "% smokers", then right-click and choose "Median" for the aggregation method.
  7. Repeat these operations with the third scatterplot and the "% who do not exercise" attribute.

Notice that, by default, BI Desktop links data points in the visualizations. For example, click on any bubble in a scatterplot, and the dot map will zoom in on that bubble's state and county. Similarly, if you click on a state bubble to drill down to a state's counties, only that state's county dots will appear in the scatterplots.

Three scatterplots showing comparing average obesity rate by county to percentage of smokers, percentage eating few fruits and vegetables, and percentage who do not exercise.
bi-scatterplot

Dual-Axis Line Graph

To compare "Adult Obesity" and "% Smokers," we want to build a dual-axis line graph. Unfortunately, BI Desktop does not currently have the ability to support two Y-axes for two line graphs, although it does support a combination bar+line graph. Microsoft's current suggestion is to embed an R-based ggplot2 dual-axis graph. Although this seems somewhat high maintenance on BI Desktop's part, it does give us an opportunity to see how to embed R-based plots into a BI dashboard. It is also possible to use Python to construct plots (e.g., with Pyplot) in a similar fashion.

To start, we need to ensure that BI Desktop has access to an R interpreter, and optionally to an R integrated development environment (IDE) so that R can render the plots properly. To do this, choose "File", then "Options and Settings → Options." Select "R scripting" from the "GLOBAL" list, and enter or confirm a path to an R interpreter in the "Detected R home directories:" drop-down list, and an R IDE in the "Detected R IDEs" drop-down list. Because I already have R 3.3.1 and RStudio installed, BI Desktop has located them and automatically populated my fields.

BI Desktop's detected R home directory and R IDE options, located in the R scripting pane of the Options dialog.
bi-R-settings

Once the location of R and its corresponding IDE are set, we can insert an "R Visual" onto our report worksheet. To do this, click "R script visual" (5th row, 4th icon) to insert an R script-based visualization. You should see a ".R" placeholder and an "R script editor" at the bottom of the screen. Data you want to use in the R script must be added to the R visual. Since we want to compare "Adult obesity rate" and "% smokers" by "State", we drag all three attributes from the "Fields" list to the "Values" field in the "Visualizations" column. Notice that comment lines appear in the R editor denoting that a dataframe called dataset will automatically be created whenever the R visual executes or updates.

Next, we add the following R code to produce a dual-axis ggplot2 line graph of "Adult obesity rate" and "% smokers" by "State."

library( "ggplot2" )
library( "reshape2" )

df_long <- melt( dataset, id.vars = "State", measure.vars=c( "Adult obesity rate", "% smokers" ) )
p <- ggplot( df_long, aes( x=State, y=value, colour=variable ) )
p <- p + geom_point( size=2.0 )
p <- p + geom_line( aes( group=variable ), size=1.0 )
p <- p + scale_y_continuous( name="Obesity (%)", sec.axis=sec_axis( ~ ., name="Smokers (%)", labels=function(x) paste0( x,"%" ) ), labels=function(x) paste0( x,"%" ) )
p <- p + scale_colour_manual( labels=c( "obesity", "smokers" ), values=c( "orange", "blue" ) )
p <- p + theme( axis.text.x=element_text( angle=-45, hjust=0 ) )
p <- p + labs( colour="" )
p

Once you have entered this code beneath the line # Paste or type your script code here:, you can run it by clicking on the run icon in the R editor's title bar (). Assuming you pointed to a valid R interpreter and entered the code properly, you should see a dual-axis line graph in your R visual. You can hide the R editor by clicking on the down arrow in its titlebar, or by deselecting the R visual.

Dual-axis line graphs comparing median obesity rate by state to median percentage smokers.
bi-dual-line

One problem with this solution is that, unless you're using Microsoft Power BI Pro, R and Python-based visuals will not render if your dashboard is published to the web. Rather than dealing with R, we can switch to a different visualization type that provides the same "visual search for correlation" capability. An obvious choice is a side-by-side bar chart. This has the added advantage that, in this case, both "Adult obesity rate" and "% smokers" are reported by percent, so we only need a single axis to properly annotate both variables. To create the side-by-side bar chart, we'll first create two new columns in our dataset that report true percentages on the range 0…1 rather than the 0…100 range that both "Adult obesity rate" and "% smokers" use. This will ensure we can format the fields as percentages, to properly show a percent symbol on the bar graph's vertical axis.

  1. In the "Home" tab, choose "New Measure" from the "Calculations" region of the ribbon.
  2. In the dialog that appears, replace Measure = with Obesity_Frac = [Adult obesity rate] / 100.0
  3. Hit Return to enter the new measure, then click the "X" to close the dialog.
  4. Select the new "Obesity_Frac", click the "Modeling" tab at the top of the window, and in the "Formatting" region click the "%" button and choose 1 decimal point to format the field as a percentage with a single digit of precision.
  5. Repeat the same steps, but name the new column "Smoker_Frac" and use "[% smokers]" in place of "[Adult obesity rate]". Remember to format it as a single-decimal percentage.

Now that we have our new measures created, we can construct the side-by-side bar chart.

  1. Click "Clustered column chart" (first row, fourth icon) in the "Visualizations" column.
  2. Drag "State" to the "Axis" field.
  3. Drag "Obesity_Frac" and "Smoker_Frac" to the "Value" field.
  4. Click the down-arrow in "Obesity_Frac" and choose an aggregation type of "Median"; do the same for "Smoker_Frac".

The result is a side-by-side bar chart, organized by state, showing each state's obesity rate and smoker rate as percentages. You can click the … in the top-right of the visualization to choose to sort either ascending or descending by state, obesity, or smoker percentage. As noted above, this has the additional advantage of being fully functional when it's published to the web.

Side-by-side bar graphs comparing median obesity rate to median percentage smokers by state.
bi-side-chart

Dashboard

In some sense, visualizations are dashboards, or "reports" in BI Desktop terminology, since they provide built-in interactivity by default. For example, we have already seen that selecting an item in any visualization automatically "links" to all other visualizations, filtering their contents to match the selected item or items.

BI Desktop does provide some additional elements to add items to a report. In the "Home" tab's "Insert" region there are options for "Text box", "Image", or "Shape" that can be placed on the dashboard. Under "Custom visuals", you can import specialized visualizations or data manipulation widgets (e.g., sliders, word clouds, etc.) from Microsoft's visual repository. Some of these are built by Microsoft, and others are contributed by independent programmers. Power BI provides an API to allow construction of custom visuals.

Finally, slicers, which act like filters, can be inserted directly into a report, to make them more prominent and accessible versus using the Filter column. As an example, we will create a drop-down menu slicer to allow users to filter our visualizations to one or more individual states.

To insert a state slicer into our report, complete the following steps.

  1. Choose "Slicer" from the "Visualizations" column (fifth row, first icon).
  2. Drag "State" to the "Field" field; the slicer will fill with checkboxes and associated state names.
  3. Hover over the slicer, and click the down arrow that appears in the upper-right corner; change the selection from "List" to "Dropdown".
  4. We now have a dropdown menu that allows users to filter the visualizations to a single state, or multiple states by Ctrl+Click'ing multiple state checkboxes; it can be sized and placed on the report like any other visualization.
Side-by-side bar graphs comparing median obesity rate to median percentage smokers by state.
bi-slicer

Notice in this image I've also cleaned up the visualization. I've added a title as a text box, and tweaked the text for axes and titles in the visualizations to convert them from potentially uninformative variable names to more descriptive, English-language explanations. Any one of these changes by itself is not significantly impactful, but the whole is definitely more than the sum of the parts, since the dashboard now appears professional and ready for use by domain experts who are not necessarily visualization or Power BI savvy.

Publishing

The final step is to upload your work to the web. Similar to Tableau Public, this provides a method to easily share your dashboard through a web browser.

To publish to the BI website, make sure you've logged in with your Unity ID and password as describe in the Setup section. Next, Click on "Publish," which is in the "Share" region of the "Home" tab. BI Desktop will ask you if you want to save any changes you've made to the Dashboard, then ask you to "Select a destination." Choose "My workspace" and click "Select." Your report will be uploaded to the BI website, and you will be given options to "Open 'xxx.pbix' in Power BI" or "Get Quick Insights." If you click the "Open..." link, a web browser sheet will open containing your BI report.

Unfortunately, the ability to "Share" your dashboard is again restricted to Power BI Pro users. Providing the URL to the dashboard won't work, since you need to be logged into Power BI with your account credentials in order to access the dashboard. However, this does give you the ability to demonstrate dashboards from the web, which is useful in situations like external presentations, where the BI Desktop is unlikely to be installed on the machine you are using.

This tutorial only begins to explain the power and flexibility of Power BI. You can use Power BI's online documentation to explore BI in more detail. You can also come to talk to me if you're having problems, and I can try to help you find instructions to construct the visualizations you want to build.

Also, remember that, just like Tableau, BI Desktop is a tool to allow you to build visualizations. Like any tool, BI doesn't necessarily guarantee that the visualizations you produce are good visualizations. Try to apply the design principles we discussed in the visualization lecture, to make sure your visualizations are clear, effective, and well structured. The default visualizations that BI provides are unlikely to offer this, since BI doesn't have your domain expertise and understanding of the data's context to decide how best to present it.