Finding the best Data Source in Power BI – Part I

I believe most of the report creators are going to start importing data to Power BI from Excel. It’s really easy to start from there as many users already know how to use Excel.

I reckon that’s the reason why Import data from Excel is also the first option to add data in Power BI Desktop.

But sooner or later importing data from Excel (and picking up whole Folders) could slow down report refresh times.

Additionally, when it comes to data availability and accessibility referencing a file on a local drive is far from ideal.

The problem you might face

I also start most of my reports from Excel import. Given that I have a strong background in the tool, I just find it easier to start out from there.

However, I had to realize that once a model size reaches a certain point I could easily go for a coffee or tea while I wait for the report to refresh.

Don’t get me wrong, I like coffee. But it’s a suboptimal process.

Given that I don’t like to wait, I knew that I have to adjust something. And that’s when I started exploring other connectors and data sources.

I set myself a task – in my day job – to find the best possible data source. Of course, saying best could be subjective, especially that we all have different ways of measuring “best”.

What am I looking for?

There is no simple answer to that question…

But I knew that I want my refresh to run quickly so I don’t have to wait too long for it to run. Additionally, I wanted to create a better process for business continuity. While it may sound weird at first, I prefer to have processes that anyone can take over from the next day.

It means that if I get hit by a bus, the company that I work for will be able to use the datasets and reports that I have created. Essentially, making sure that those pbix files and all the data cleansing will be available for more people. Let’s say the whole Analyst Team.

And lastly, something that’s not too time-consuming to migrate over. The last thing that I want is to create unnecessary extra work for myself.

Clear and objective measures

Now that I clarified what I want, I had to find a way to measure success (and failure).

Luckily, I found Phil Seamark’s blog post about visualising Power BI refreshes:

Visualise your Power BI Refresh – Phil Seamark on DAX

Equipped with this fantastic tool I was ready to embark on my journey to find the best Data Source in Power BI.


Starting point – Baseline

As an analyst I started with setting my benchmark.

How long does it take to refresh my report? The answer is 2 minutes and 59 seconds.

Just out of curiosity, I moved all of the files and folders from a network drive to my local drive to see the best possible speed that I could achieve.

Please note while this seems to be an extremely great result, this is exactly what we are trying to avoid from a data accessibility perspective. Saving the pbix file on my machine with absolute zero access for anyone else is a HUGE RISK.

OneDrive

My first contester is Microsoft’s cloud platform, OneDrive Business – or SharePoint as these two are technically the same.

I moved all of my files to the cloud and replaced Power Query codes with the below.

Dimension tables:

let
Source = SharePoint.Contents(
“https://SHAREPOINT SITE LINK/”
),
Documents = Source{[Name = “Documents”]}[Content],
#”Report Refresh” = Documents{[Name = “Report Refresh”]}[Content],
#”Customers xlsx” = #”Report Refresh”{[Name = “Customers.xlsx”]}[Content],
#”Imported Excel Workbook” = Excel.Workbook(#”Customers xlsx”),
#”Customer_Sheet” = #”Imported Excel Workbook”{[Item = “Customer”, Kind = “Sheet”]}[Data]
in
Customer_Sheet

Fact (sales) table:

let
Source = SharePoint.Contents(“https://SHAREPOINT SITE LINK/”),
Documents = Source{[Name=”Documents”]}[Content],
#”Report Refresh” = Documents{[Name=”Report Refresh”]}[Content],
#”Monthly Sales Report” = #”Report Refresh”{[Name=”Monthly Sales Report”]}[Content]
in
#”Monthly Sales Report”

And that’s it. Migrating to OneDrive is that simple.

Time to check performance. It may not be a huge improvement, but still… I managed to refresh the report in 2 minutes and 5 seconds.

Although I must admit that one time I was able to refresh the report within a minute and a half. It probably has to do something with the fact that I ran the refresh after 11pm, outside of the high internet traffic hours.

Based on my experience so far OneDrive connector is usually on par or slightly better than connecting to a remote server via VPN.

Business Continuity

While report refresh times were roughly the same, moving data to OneDrive from a local file provides the first step in data accessibility.

With this small step I can share a OneDrive folder with my teammates and they can start using the same raw or source data that I use.

It is a huge step in removing data silos. Especially given how little effort it takes.

Conclusion

While I don’t want to spoiler just yet which Data Source or connector I decided to use, I would highly recommend you to try OneDrive. As I said, it doesn’t take too much time to move data and reports there, but the benefits are going to be visible immediately.


If you are a bit more Audio-Visual type of learner you can find a video on my YouTube channel as well.

Make sure to give it a thumbs up 👍 and subscribe as well, so you won’t miss new episodes!

Roland

🙏Thanks for watching and if you liked this video please hit the 👍 button.

📢 Also be sure to subscribe to the channel so you never miss a new video.


0 Comments

Leave a Reply

Your email address will not be published. Required fields are marked *