Little-Known Ways to Save Time with Python in Power BI
It All Started with a Single Script…
Power BI has built-in functions and tools, but still there are some limitations, for example.
If you want to do the imputation, run statistical analysis, or machine learning, you need to use external tools.
Here we have an option to integrate Python into the Power BI workflow.
Configuration in Power BI Desktop:
First, install a Python distribution on your local machine because Power BI Desktop uses it to execute scripts.
Microsoft recommends Python 3.11 or later for the best experience in Power BI Desktop. Power BI Service supports specific versions listed in its documentation.
Show Power BI where to find your Python installation:
Open Power BI Desktop.
Navigate to the File menu.
Select Options and settings.
Click on Options.
Locate the Python scripting section under the Global settings.
Specify your Python home directory.
Use your preferred IDE to write scripts: VS Code or PyCharm. Then copy it into Power BI once it runs without errors.
Python scripts sometimes combine data from various sources. Power BI blocks these scripts if your privacy levels are too restrictive.
Set your privacy levels to Public or Organizational in Options.
Integration Layers:
Power BI integrates Python in three main areas:
Data import.
Power query.
Custom visualizations.
Data import:
You can find some data sources without a native Power BI connector. Python can fetch this data.
Use the Get Data feature to start this process:
Select Get Data on the Home ribbon.
Choose More and search for Python scripts.
Paste your code into the script window.
If you want to combine 50 Excel files from a single folder, use a Python script to simplify this task.
It reads each file into a list of DataFrames and concatenates them into one table for Power BI.
import pandas as pd
import os
path = ‘C:/DataFiles/’
files = [f for f in os.listdir(path) if f.endswith(’.xlsx’)]
data_frames = []
for file in files:
temp_df = pd.read_excel(os.path.join(path, file))
data_frames.append(temp_df)
final_table = pd.concat(data_frames, ignore_index=True)Power Query:
1) Data Cleaning and Transformation:
Before you dive into the Python implementation, know what operations you should perform in Python.
Perform basic operations with the M language because Power Query is more efficient with its own tools for such simple things.
Power BI and Python interact with each other using temporary CSV-like files and will lead to performance issues, especially for large datasets (> 250 MB).
Use Python here to do things that Power Query cannot: fuzzy matching, ReGex, anomaly detection, or imputation.
Open Transform Data to enter the Power Query Editor.
Select the Transform tab.
Click Run Python Script.
Power BI creates a variable called [dataset] that contains information from the previous step in Pandas DataFrame.
Use it to replace nulls based on column averages with the fillna method.
dataset[’Sales’] = dataset[’Sales’].fillna(dataset[’Sales’].mean())Power Query has an option to run data profiling, but it is limited for large datasets with over one million rows.
[ydata-profiling] library can generate detailed column statistics.
After you have cleaned your data, inspect your numeric columns. Some of the numeric columns have anomalies.
Use the Interquartile Range (IQR) method in Python to find and delete these outliers.
import pandas as pd
Q1 = dataset[’Revenue’].quantile(0.25)
Q3 = dataset[’Revenue’].quantile(0.75)
IQR = Q3 - Q1
lower_bound = Q1 - 1.5 * IQR
upper_bound = Q3 + 1.5 * IQR
filtered_dataset = dataset[
(dataset[”Revenue”] >= lower_bound) & (dataset[”Revenue”] <= upper_bound)
]Before you delete the anomalies, analyze them because sometimes they might be important. Check my detailed guide on this topic here.
2) Statistical tests and Machine Learning:
Statistical tests:
You can run statistical tests like t-tests or chi-square with SciPy or PyMC libraries in Python.
from scipy import stats
import pandas as pd
# Separate the data into the two groups you want to compare
# Replace ‘Category’ and ‘Value’ with your actual column names
group1 = dataset[dataset[’Category’] == ‘Group A’][’Value’]
group2 = dataset[dataset[’Category’] == ‘Group B’][’Value’]
# Run the Independent T-test
t_stat, p_value = stats.ttest_ind(group1, group2)
# Create a DataFrame to return the results to Power BI
df_results = pd.DataFrame({
‘Stat’: [’T-Statistic’, ‘P-Value’],
‘Value’: [t_stat, p_value]
})Time Series Forecasting:
Power BI has a native forecasting feature and uses exponential smoothing for simple trends.
Python has more options for manual optimization of seasonality (for example, weekly cycles) and trends (additive vs. multiplicative) and time series decomposition with ARIMA or Prophet models.
import pandas as pd
from statsmodels.tsa.arima.model import ARIMA
# 1. Prepare the data index (required for statsmodels)
dataset[’Date’] = pd.to_datetime(dataset[’Date’])
dataset.set_index(’Date’, inplace=True)
dataset = dataset.asfreq(’D’, fill_value=0)
# 2. Define and fit the ARIMA model
# Note: (5,1,0) are example (p, d, q) parameters that require optimization
model = ARIMA(dataset[’Sales’], order=(5, 1, 0))
model_fit = model.fit()
# 3. Forecast future values (e.g., 30 days)
forecast = model_fit.forecast(steps=30)
# 4. Format the output for Power BI
forecast_df = forecast.reset_index()
forecast_df.columns = [’Date’, ‘Forecasted_Value’]Regression Analysis:
Use Python to run regression models if you want to find relationships between variables.
In this example, a test determines the statistical relationship between two variables, how [Lead Time] affects the [Average Daily Rate].
from scipy import stats
# Power BI automatically provides your data as a DataFrame named ‘dataset’
# This calculates the slope, intercept, and p-value
slope, intercept, r_value, p_value, std_err = stats.linregress(dataset[’LeadTime’], dataset[’ADR’])
# To output these values into a new Power BI table
import pandas as pd
result_df = pd.DataFrame({
‘Metric’: [’Slope’, ‘Intercept’, ‘P-Value’, ‘R-Squared’],
‘Value’: [slope, intercept, p_value, r_value**2]
})Unsupervised Learning and Segmentation:
Python libraries are one of the best in unsupervised machine learning.
You can group customers based on their purchase frequency and total spend with K-Means Clustering:
from sklearn.cluster import KMeans
import pandas as pd
# Select features for clustering
X = dataset[[’Frequency’, ‘Spend’]]
# Define the model with 3 clusters
kmeans = KMeans(n_clusters=3, random_state=42)
# Fit the model and predict clusters
dataset[’ClusterID’] = kmeans.fit_predict(X)After running this script in Power Query, every customer will get a [ClusterID]. You can use this ID as a legend in a scatter plot to highlight the groups: one group might represent high-value customers and another: occasional buyers.
Visuals:
Power BI does not natively support some visuals. Use Matplotlib, Seaborn, or Plotly to create them.
Distribution Analysis: Violin plots show both spread and probability density (KDE) of data.
Correlation Heatmaps: Use it to identify relationships between multiple variables.
Joint and Pair Plots: They show both the distribution of individual variables and the relationships between them in a single grid.
Here is the code to create a correlation heatmap to identify which factors influence sales with Seaborn:
import matplotlib.pyplot as plt
import seaborn as sns
plt.figure(figsize=(10, 8))
sns.heatmap(dataset.corr(), annot=True, cmap=’coolwarm’)
plt.show()Until recently, Python visuals stayed entirely static, but the Plotly library now supports tooltips and zoom.
import plotly.express as px
import pandas as pd
fig = px.scatter(dataset, x=”Weight”, y=”Height”, color=”Gender”)
fig.show()Python Errors Troubleshooting:
If your script works in your IDE but Power BI could not execute it, check your library versions.
Make sure the version in your local environment matches the version supported by the Power BI Service.
Index Errors: Python uses zero-based indexing. Do not reference columns by their position.
Data Type Mismatch: Power BI expects specific date and number formats. If your Python script returns a date as a string, Power BI will not recognize it. Use pd.to_datetime() function to correct formatting.
Library Missing Errors: If Power BI reports a missing library, use pip list to see every installed package, and use [pip install library_name] in your terminal if it is missing.
Limitations:
Python custom visuals in Power BI could not process more than 150,000 rows. Use them on some subset or pre-aggregate the data.
Most library images are static and cannot perform cross-filtering.
Python visuals process data in memory. If your dataset is very large, you might run into memory limits.
Python operations break query folding. Move these operations to the end of the data cleaning funnel.
The Power BI service does not support all libraries, so you need to check on this link if you can use it in your dashboard: https://learn.microsoft.com/en-us/power-bi/connect-data/service-python-packages-support
Power BI terminates any script running longer than 30 minutes.
Power BI service will not render Python visuals if execution takes more than 1 minute.
From May 2026, Power BI will drop support for Python visualizations in public dashboards and customer embeds.
Desktop vs Fabric:
You can work with Python with Power BI in both Power BI Desktop and Fabric.
Microsoft has a feature within Microsoft Fabric called Fabric Notebooks.
Fabric Notebooks are cloud-based Jupyter notebooks and offer superior computation.
You can perform all data cleaning, statistical tests, and machine learning in service, save the result to OneLake storage, and connect it to your Power BI project via the Direct Lake connector.
Best practices:
Store your complex scripts in a version control system like Git.
Use comments to explain every step.
Pre-Aggregate Data: Never pass raw transactional data to a Python visual if you only need totals.
Limit Visuals: Use few Python visuals per page. Each visual requires a separate engine execution and slows down the report.
Optimize Imports: Only import the modules you need in a specific library.
Push all possible aggregation back to SQL before using Python for large datasets.
Use Power Query and SQL for primary data cleaning and reshaping.
📌Do you want to see more?
Subscribe for more Power BI and Python tips and leave a comment below👇


