SQL Server Analysis Services (SSAS) Tabular is a popular choice as an analytical engine for many customers. With its state-of-the-art compression algorithms, multi-threaded query processor and in-memory capabilities, SSAS Tabular can provide super-quick access to data for reporting client applications. However, as a consultant, I have been called in by many clients to resolve slow query performance against SSAS Tabular models. My experience has taught me that most, if not all, of these performance issues can be resolved by addressing the following five areas.
Estimate Current Size and Growth Carefully
Tabular models compress data very well, and on average you can expect a compression ratio of about 10x (though it can be much higher or lower depending on the cardinality of your data). However, a rough figure like this is not precise enough when you are estimating the current size of your model and its future growth. If you already have data in a data warehouse, import a subset of it (say, one month) and measure the model size for that subset. Then extrapolate based on the required number of rows or years as well as the number of columns. If you have no warehouse, try to get at least a subset of the data from the source to measure the model size. Tools like BISM Memory Report and VertiPaq Analyzer can help further in this process.
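The extrapolation step can be sketched as follows. This is a minimal Python illustration and not a feature of any SSAS tool. It assumes that model size grows linearly with row count and, as a cruder approximation, with column count. Real columns differ in cardinality, so per-column sizes measured with a tool like VertiPaq Analyzer give a better basis. The function name and sample figures are hypothetical.

```python
def estimate_model_size_gb(sample_size_gb: float,
                           sample_rows: int,
                           target_rows: int,
                           sample_columns: int,
                           target_columns: int) -> float:
    """Linearly extrapolate a measured sample size to a target row and column count.

    Assumption: size scales proportionally with rows and columns. This holds only
    roughly, and only while column cardinality stays similar to the sample's.
    """
    if sample_rows <= 0 or sample_columns <= 0:
        raise ValueError("sample_rows and sample_columns must be positive")
    row_factor = target_rows / sample_rows
    column_factor = target_columns / sample_columns
    return sample_size_gb * row_factor * column_factor


# Hypothetical example: one month (10 million rows, 30 columns) measured at 0.8 GB,
# extrapolated to three years of data (36 months) with the same columns.
estimate = estimate_model_size_gb(0.8, 10_000_000, 360_000_000, 30, 30)
print(f"Estimated model size: {estimate:.1f} GB")
```

Re-measuring with a second, larger sample is a cheap way to check whether growth really is close to linear for your data.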
It is also important to record how many users will access the system initially, as well as the expected growth in that number.
Select or Upgrade Hardware Appropriately
Everyone knows SSAS Tabular is a memory-intensive application. One major issue I have seen is that RAM is the only thing considered when selecting hardware. Focusing on RAM alone is not enough, because many other variables matter. Assuming all other variables are held constant and the budget is unlimited, these are my recommendations:
- CPU Speed – The faster, the better. A faster clock computes results more quickly, especially when the single-threaded formula engine is the bottleneck.
- CPU Cores – In theory, the more, the better, as more cores help handle concurrent user loads. In practice, however, processors with more cores usually run at lower clock speeds because of power and thermal limits, so you have to balance core count against speed. Also, many software products are licensed per core, so licensing cost rises with the number of cores.
- CPU Sockets – The fewer, the better, as SSAS Tabular is not NUMA-aware up to SQL Server 2014. This is expected to change in SQL Server 2016, which includes some NUMA optimizations. For large Tabular models, a single-socket server may not be feasible, because the maximum amount of RAM a system supports depends on its number of CPU sockets.
- CPU Cache – The larger, the better. Retrieving data from the CPU cache is roughly 10-100x faster than retrieving it from RAM.
- CPU Architecture – The newer, the better, thanks to hardware performance optimizations. For example, an Intel Xeon processor based on the Haswell architecture will be faster than one based on Sandy Bridge, all other variables being equal.
- Amount of RAM – At least 2.5x the model size if the model is processed on the same server. Less RAM may suffice in certain scale-out architectures where the model is processed on a separate server.
- RAM Speed – The faster, the better (yes, RAM has a speed too!). Speed matters a great deal for a memory-bound application like Tabular, so choose faster memory whenever the budget allows.
- Storage – Not important for query performance, since Tabular models are queried from memory. However, if the budget allows, faster storage such as SSDs is not a bad idea. It speeds up maintenance activities like backup and restore, and it brings the Tabular model online faster when the service is restarted.
There are other factors to consider as well, such as network latency and server architecture (scale-out). Depending on the budget and the specific customer requirements, you will have to take a balanced approach.
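To turn the 2.5x guideline into a purchasing figure, size RAM against the projected model size rather than today's size. The Python sketch below assumes compound annual growth, which is an illustrative assumption; substitute your own growth estimate. It only covers the case where the model is processed on the same server, because the guideline gives no figure for scale-out architectures. The function names are hypothetical.

```python
def projected_model_size_gb(current_size_gb: float,
                            annual_growth_rate: float,
                            years: int) -> float:
    """Project model size assuming compound annual growth (an illustrative assumption)."""
    return current_size_gb * (1 + annual_growth_rate) ** years


def min_ram_gb(model_size_gb: float, processing_factor: float = 2.5) -> float:
    """Minimum RAM when the model is processed on the same server (2.5x guideline)."""
    return model_size_gb * processing_factor


# Hypothetical example: a 20 GB model growing 25% per year, sized for three years.
future_size = projected_model_size_gb(20, 0.25, 3)  # 39.0625 GB
print(f"Projected size: {future_size:.1f} GB, minimum RAM: {min_ram_gb(future_size):.1f} GB")
```

The operating system and other services also need headroom, so round the result up to the next practical memory configuration.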
Design the Data Model Properly
Tabular performs very well and, with small models, is extremely forgiving of bad design. However, as the amount of data grows, performance problems begin to show up. In theory, you will get the best performance in SSAS Tabular if all the data is flattened into a single table. In reality, however, this would mean an extremely poor user experience as well as a lengthy and expensive ETL process. So, generally, the best practice is to use a star schema. It is also recommended to include only the relevant columns from the source tables, because each additional column increases the model size, which in turn slows down queries. An increase in the number of rows may still be acceptable, as long as the cardinality of the columns doesn't change much.
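A simplified model of dictionary encoding, which the VertiPaq engine behind Tabular relies on, shows why column cardinality matters more than row count. Each distinct value is stored once in a dictionary, and each row stores an integer index needing about log2(cardinality) bits. The Python sketch below deliberately ignores run-length encoding, value encoding and segment overheads, so it is a rough, pessimistic illustration rather than a size calculator. The average value size and the example columns are assumed inputs.

```python
import math


def estimated_column_bytes(row_count: int, cardinality: int, avg_value_bytes: float) -> float:
    """Rough size of one dictionary-encoded column, ignoring RLE and other compression.

    Each row stores an index into the dictionary using ceil(log2(cardinality)) bits;
    the dictionary stores each distinct value once.
    """
    if cardinality < 1:
        raise ValueError("cardinality must be at least 1")
    bits_per_row = max(1, math.ceil(math.log2(cardinality)))
    index_bytes = row_count * bits_per_row / 8
    dictionary_bytes = cardinality * avg_value_bytes
    return index_bytes + dictionary_bytes


rows = 100_000_000
# Hypothetical columns: a low-cardinality category vs. a near-unique transaction ID.
category = estimated_column_bytes(rows, cardinality=1_000, avg_value_bytes=8)
transaction_id = estimated_column_bytes(rows, cardinality=10_000_000, avg_value_bytes=8)
print(f"Category column:       {category / 1e6:.0f} MB")
print(f"Transaction ID column: {transaction_id / 1e6:.0f} MB")
```

In this model, adding rows while cardinality stays constant leaves the dictionary and the bits per row unchanged. A single near-unique column, by contrast, can cost several times more than a low-cardinality one. That is why dropping unneeded high-cardinality columns tends to pay off.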
Depending on the specific customer requirements, there could be deviations from these best practices. For example, for one client with a very large production model, we built custom aggregate tables alongside the detailed fact table. The resulting measure used a conditional statement to retrieve data from the aggregate table whenever the report did not use detail-level dimension data. Since the aggregate table was only 1/10 the size of the detailed fact table, queries ran about 10x faster whenever the details were not used, which was almost 90% of the time.
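In the model itself this logic lived in a DAX measure. The Python sketch below only illustrates the routing decision and the arithmetic behind it. The table and column names are hypothetical. The approach assumes the measure is additive, so the aggregate table returns the same totals as the detail table. The speedup calculation also assumes that all queries took roughly the same time before the change.

```python
from __future__ import annotations

# Hypothetical detail-level columns that exist only at the detailed fact table's grain.
DETAIL_COLUMNS = {"Product[SKU]", "Customer[CustomerID]"}


def choose_fact_table(query_columns: set[str]) -> str:
    """Route to the detailed fact table only when a detail-level column is used."""
    if query_columns & DETAIL_COLUMNS:
        return "FactSalesDetail"
    return "FactSalesAggregate"


def overall_speedup(share_using_aggregate: float, aggregate_speedup: float) -> float:
    """Average speedup when only a share of queries benefit (Amdahl-style weighting)."""
    relative_time = share_using_aggregate / aggregate_speedup + (1 - share_using_aggregate)
    return 1 / relative_time


print(choose_fact_table({"Date[Year]", "Product[Category]"}))  # FactSalesAggregate
print(choose_fact_table({"Date[Year]", "Product[SKU]"}))       # FactSalesDetail
print(f"{overall_speedup(0.9, 10):.1f}x")                      # 5.3x
```

Under these assumptions, 90% of queries running 10x faster cuts average query time by a factor of about 5.3. The remaining detail-level queries then account for more than half of the total query time.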
Optimize the DAX Calculations
With small models, Tabular is also extremely forgiving of bad DAX code. However, just as with bad design, performance suffers as you add data, add more users, or run more complex queries. Of the five areas in this list, DAX is the most difficult to tune, so it is important to have a strategy for maintaining and tuning its performance. A good place to start is the Performance Tuning of Tabular Models in SQL Server 2012 Analysis Services whitepaper.
Monitor User Query Patterns and Train Users
Once your model is in production, it is important to keep monitoring user query patterns as well as resource usage to spot potential bottlenecks. This lets you determine whether performance issues are caused by inefficient DAX, bad design, or insufficient resources. Most importantly, it shows whether a user is simply using the model inefficiently. For example, in one case we found that slow performance for all users was caused by a single user. He was dumping the entire 100 GB model into spreadsheets so he could perform custom calculations on top of it. This blocked the queries of all other users and made things very slow for them. After a proper round of requirements gathering, we made sure all the calculations that user needed were in the model, and then trained him to use the model for his analytics.
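A simple first pass over collected query data is to see how total query time is distributed across users. This would have surfaced the single heavy user in the case above. The Python sketch below assumes you have already exported query records into a list, for example from a trace or Extended Events session. The QueryRecord layout and the 50% threshold are hypothetical choices for illustration.

```python
from __future__ import annotations

from collections import defaultdict
from dataclasses import dataclass


@dataclass
class QueryRecord:
    user: str         # hypothetical field: the user who ran the query
    duration_ms: int  # hypothetical field: query duration in milliseconds


def heavy_users(records: list[QueryRecord],
                share_threshold: float = 0.5) -> list[tuple[str, float]]:
    """Return (user, share of total query time) for users at or above the threshold."""
    totals = defaultdict(int)
    for record in records:
        totals[record.user] += record.duration_ms
    grand_total = sum(totals.values())
    if grand_total == 0:
        return []
    flagged = [
        (user, ms / grand_total)
        for user, ms in totals.items()
        if ms / grand_total >= share_threshold
    ]
    # Largest consumers first.
    return sorted(flagged, key=lambda pair: pair[1], reverse=True)


log = [
    QueryRecord("alice", 1_200),
    QueryRecord("bob", 900),
    QueryRecord("carol", 600_000),
    QueryRecord("alice", 800),
]
for user, share in heavy_users(log):
    print(f"{user}: {share:.1%} of total query time")
```

A high share alone does not prove misuse, but it tells you which user to talk to first about requirements and training.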
The success of any Tabular project depends on adoption by end users, and it goes without saying that adoption will be much better if the system is fast. These five tips will give you a jumpstart on that journey.
Want to get the most out of your tabular project? Contact us for a consultation.