Skip to content

Daniel Crawford's Data Blog

Tips for performant database designs

Menu
  • Home
  • Blog
  • Resources
  • About
  • Contact
Menu

Synapse Fundamentals for Performance Tuning – Statistics

Posted on August 12, 2022September 5, 2025 by Daniel Crawford

Statistics in Synapse Dedicated SQL Pools is without question the most important and the most frequently overlooked reason for poor query performance.  SQL Server is forgiving in that autocreate and autoupdate statistics are frequently enabled but Synapse only offers autocreate statistics.   Autoupdate statistics is not an option available in Synapse today.  This leads to a few critical questions around statistics…

What is the role of statistics in query performance for Synapse Dedicated SQL Pools compared to SQL Server?  What’s the difference?

Statistics in SQL Server determine whether it is more or less efficient to do certain operations when reading the data.  These operations may include join types, lookups, parallelism, etc…  In Synapse Dedicated SQL Pools, statistics perform an identical function within each distribution but also (and more importantly) determine data movement operations between distributions to complete a query.  These data movement operations include broadcasting or shuffling data to collocate it to complete joins and aggregations.

It is important to note a few things at this point as it directly impacts query performance in Synapse Dedicated SQL Pools:

  1. AutoCreate statistics will automatically create statistics when a query could use this information to create a more efficient query plans based upon query filter and join predicates only and will create single column statistics only. 
  2. Autocreate statistics is done synchronously meaning that queries may encounter performance hit on first run when stats need to be created prior to running a query.  It is best to create and maintain statistics in advance of regular workloads.  This does mean that developers need to understand and anticipate the workload requests coming from clients.
  3. While setting autoupdate statistics is not an option in Synapse Dedicated SQL Pools, it is critical to performance that they are maintained.  In fact individual distributions each have autoupdate statistics enabled by default but the DW engine does not – that is because the query optimizer for Synapse exists on the control node where no data is persisted. 
  4. When you issue the update statistics command, stats are aggregated from individual distributions to be stored on the control node to be used by the optimizer and determine the DSQL plan.  Thus, accurate statistics on the control node are critical to the most optimal query plan.
  5. Every table has a default statistic on it even if stats have never been created on the table.  That table level statistic is set to 1000 records and is used by the optimizer to determine optimal query plan to limit data movement.  This means if you have never updated statistics after table creation, the query optimizer is creating query plans based upon the expectation that there are only 1000 records in the table, regardless if there are 0 records or 1 billion records in the table.  As you can imagine, this can cause some very bad data movement and very bad performance.  This is probably the most frequent cause of poor query performance in Synapse Dedicated SQL Pools.

How and when should I maintain statistics in Synapse?

The how of maintaining statistics is straightforward.  Updating statistics is very easy – it is the same as SQL Server:  UPDATE STATISTICS [schema].[table];  This syntax is simple and covers all statistics on a single table which is a great place to start when doing tuning in a blanket fashion. 

There is one caveat to consider and with large tables in particular: your sampling percentage.  If not specified, your update statistics command will use a sampling percentage that is non-linear to the size of your table.  While allowing for default sampling works well in most scenarios, issues arise when there is potentially low cardinality and therefore skew to the data.  This is a scenario where the default sampling is not enough, and you have to use a larger sampling size.  Increasing sampling size requires more time to perform maintenance but it will pay off for your user queries against that table.

The question of “when” I should update statistics in Synapse is not so easy to answer as to how.  The obvious answer is to update them when they are no longer accurate.  Most documentation points you to checking when statistics were last updated but that doesn’t tell us the ratio of change on the table since they were last updated, and this is where you can run into trouble.  Some will say that it is important to update statistics when they are more than 20% inaccurate but even then, this may not be sufficient.  For Synapse, the best practice guidance provided is to update after every data load, but this may not be possible for very large fact tables with billions of records and updating statistics can take significant time.  In this case, a schedule should be implemented for slowly changing dimensions and large fact tables (perhaps weekly) depending on the ratio of change.

How to Check Accuracy of Statistics:

If you are a DBA and/or responsible for maintenance, you will likely want a query to check accuracy of statistics.  Remember that the important part of stats in Synapse Dedicated SQL Pools is the accuracy of what the control node believes is the record count (as well as histogram, density, etc…) compared to what the sum of the record counts actually is across all distributions.  If the percent difference is greater than X (maybe 20%), you would want to update statistics.  The best way to get the difference is via a script like the one below (Note that this can run a very long time in large environments. As always this is meant to be an example of how to query DMV’s and is not necessarily a production ready script.)

Loading

Categories

  • Architecture Patterns
  • Fabric
  • Performance Tuning
  • Synapse
  • Top 10 Performance Considerations

Archives

  • July 2023
  • January 2023
  • December 2022
  • November 2022
  • September 2022
  • August 2022

Recent Synapse Videos

In this video Bogdan joins Stijn to talk about Microsoft Fabric performance and what happens underneath the hood while processing a query! <br /><br />  <br /><br />Polaris white paper: https://www.vldb.org/pvldb/vol13/p3204-saborit.pdf <br /><br />  <br /><br />Bogdan Crivat - VP Synapse Analytics<br /><br />https://twitter.com/bogdanC_guid <br /><br />https://www.linkedin.com/in/bogdanc/ <br /><br />  <br /><br />Stijn Wynants - Senior Product Manager <br />https://www.linkedin.com/in/stijn-wynants-ba528660/ <br />https://sql-stijn.com/ <br />https://twitter.com/SQLStijn
Performance at Scale with Microsoft Fabric: Query Processing!
As part of our Fabric Espresso series, we're diving deep into the realm of Data Engineering and Data Science. Join us as our senior product managers - Estera Kot, Ted Vilutis, and Stijn Wynants discuss the crucial features of Microsoft Fabric that every data engineer should know about! <br /><br />From insights into data engineering within Microsoft Fabric to decision guides for copying data into Fabric, and shortcuts that point to other storage locations - we've got it all covered in this exciting new video. <br /><br />Check out these key resources for a deep dive: <br /><br />👉 Data Engineering in Microsoft Fabric: https://learn.microsoft.com/en-us/fabric/data-engineering/data-engineering-overview  <br /><br />👉 Decision Guide to Copy Data into Fabric: https://learn.microsoft.com/en-us/fabric/get-started/decision-guide-pipeline-dataflow-spark <br /><br />👉 Shortcuts to Other Storage Locations: https://learn.microsoft.com/en-us/fabric/onelake/onelake-shortcuts <br /><br />  <br /><br />Meet the Speakers: <br /><br />1️⃣ Stijn Wynants: Senior Product Manager at Microsoft <br /><br />LinkedIn: https://www.linkedin.com/in/stijn-wynants-ba528660/ <br /><br />Twitter: https://twitter.com/SQLStijn <br /><br />Blog: https://sql-stijn.com/ <br /><br />  <br /><br />2️⃣ Estera Kot: Senior Product Manager at Microsoft <br /><br />LinkedIn: https://www.linkedin.com/in/esterakot/ <br /><br />Twitter: https://twitter.com/estera_kot <br /><br />  <br /><br />3️⃣ Ted Vilutis: Senior Product Manager at Microsoft <br /><br />LinkedIn: https://www.linkedin.com/in/tedvilutis/ <br /><br />Twitter: https://twitter.com/tvilutis <br /><br />  <br /><br />Don't miss out on this opportunity to learn directly from the Microsoft Fabric Product Group Team and elevate your data engineering skills with Microsoft Fabric! 🚀
Top Microsoft Fabric Features that Every Data Engineer Should Know
Welcome to our Fabric Espresso series! In this video Ambika joins Stijn to talk about table clone in Warehouse within Microsoft Fabric. We will talk a bit more deep on what is a table clone, when and how to use a table clone.<br /><br />Clone table: https://learn.microsoft.com/fabric/data-warehouse/clone-table<br />Clone table using T-SQL: https://learn.microsoft.com/fabric/data-warehouse/tutorial-clone-table<br />CREATE TABLE AS CLONE OF: https://learn.microsoft.com/sql/t-sql/statements/create-table-as-clone-of-transact-sql?view=fabric<br /><br />Ambika Jagadish - Product Manager for Warehouse in Microsoft Fabric<br />www.linkedin.com/in/ambikajagadish
Fabric Espresso: Table clone in Warehouse within Microsoft Fabric
Welcome to the fourth video in our How To series for Real-Time Analytics in Microsoft Fabric!<br /><br />In this video, Guy Reginiano, a Product Manager for Synapse Real-Time Analytics in Microsoft Fabric, will show you how to get data from Azure Event Hubs into a Real-Time Analytics KQL database in Microsoft Fabric.<br /><br />00:07 Introduction<br />00:25 Basic definitions<br />01:12 Event Hubs ingestion demo<br />06:34 Interact with the ingested data using KQL<br />07:54 Create a Cloud Connection in Fabric<br />10:24 Wrap-Up<br /><br />#microsoftfabric #synapserealtimeanalytics
Synapse Real-Time Analytics: Stream data from Event Hubs into a KQL database for powerful analytics
Welcome to our Fabric Espresso series! In this video Bogdan joins Stijn to talk about Microsoft Fabric and what place Azure Synapse has in all of this. We will talk a bit more deep on the engine that was created for Fabric Warehouse and how it compares to Dedicated SQL Pools!<br /><br />Polaris white paper: https://www.vldb.org/pvldb/vol13/p3204-saborit.pdf<br /><br />Bogdan Crivat - VP Synapse Analytics<br />https://twitter.com/bogdanC_guid<br />https://www.linkedin.com/in/bogdanc/<br /><br />Stijn Wynants - Senior Program Manager<br />https://www.linkedin.com/in/stijn-wynants-ba528660/<br />https://sql-stijn.com/<br />https://twitter.com/SQLStijn
Fabric Espresso: Will Fabric replace Azure Synapse?
Load More... Subscribe
This error message is only visible to WordPress admins

Cannot collect videos from this channel. Please make sure this is a valid channel ID.