-
Notifications
You must be signed in to change notification settings - Fork 339
How many data analysis tasks can BI software handle?
Actually, not much!
From the early call for multidimensional analysis to the recent call for agile BI, BI vendors have been emphasizing self-service capabilities, claiming that business personnel can analyze data at their own discretion, and users often have strong needs. Both parties can easily form purchasing behavior as soon as they reach an agreement.
However, in terms of the expected job content for most users who lack BI application experience, the goal of self-service can be said to be far from achieved! From experience, the best situation can only solve about 30% of the problems, while most BI products cannot even reach this number and can only handle about 10% of the requirements.
The fundamental aspect of data analysis lies in data, or rather in calculations, rather than interfaces. The new generation of agile BI products has made significant progress in terms of interface aesthetics and operational convenience, but their data processing capabilities have not yet exceeded the scope of multidimensional analysis.
Multidimensional analysis refers to filtering a pre prepared dataset (called a CUBE) under certain conditions (called slicing) to view certain statistical values in specified dimensions, such as the sales volume and sales amount of a certain product by region and month in a certain year, which are written in SQL in the following way:
SELECT Di,...,SUM(Mi),COUNT(Mi),... FROM T GROUP BY Di,... WHERE Di=... AND Di IN (...) ...
The role of BI software is to make it particularly convenient for users to change filtering conditions and viewing dimensions, allowing them to move from the currently visible state to the target state, which is called rotation, drilling, slicing, etc. This is very meaningful for interactive analysis. However, no matter how convenient and smooth it is, the calculations that can be done are still like this: GROUP BY+WHERE, and there is not much more.
BI products sometimes support some simple cross row/group operations, such as calculating the ratio, ranking, proportion, etc. Obviously, the results of these operations are also of great significance for business analysis. Cross row/group operation is different from regular summary. Regular summary can be calculated with one given set of dimensions, while cross row/group operation requires an additional range level. However, most BI software on the market does not pay attention to this and arbitrarily sets the range level as the entire set. If you want to see the ranking of sales of a certain region in the province, you can only first slice the data within the province, and then look at each province one by one. Otherwise, you can only see the ranking nationwide, which is very inconvenient. When examining BI software, special attention should be paid to whether it supports setting a range layer for cross row/group operations. Of course, manufacturers will say that they can support cross row/group operations, but weak support can lead to some analysis tasks being unable to be implemented.
Achieving multidimensional analysis can only solve about 10% of self-service needs, which is the most common self-service capability of BI products.
The main problem of multidimensional analysis is the model-building process, which involves preparing a dataset in advance and limiting the analyzed data to the dataset. But in practical applications, it often goes beyond this range, which may involve unexpected data items or performing association operations with another dataset, which can lead to re model-building. However, model-building requires the help of technical personnel, making it impossible for business personnel to engage in self-service.
To address this limitation, some BI products have begun to provide associated query capabilities. Usually, a step is added before multidimensional analysis, which can calculate a new dataset based on the association of multiple datasets before conducting multidimensional analysis, or support certain association operations between multiple cubes during the multidimensional analysis process. This is equivalent to allowing business users to build model themselves to a certain extent.
However, implementing associated queries is not easy, and the root cause is that the definition of JOIN in relational databases is too simple, resulting in the association relationships between datasets appearing too cumbersome and beyond the understanding of many business personnel. This dilemma can be improved with the assistance of the BI product interface. A good BI product can enable business personnel to automatically handle the unique relationship between tables. But facing situations that are not unique (common in enterprise applications) is still difficult, and we will discuss this topic in detail later. To fundamentally solve this problem, it is necessary to change the data organization model at the database layer. Almost all BI products do not redefine the data model of the database, which limits their ability to perform associated queries.
A popular example that can be used to test the association ability of BI products is to search for male employees of female managers. This simple query requirement involves multiple associations of the same dataset, which most BI products cannot handle (unless modeled beforehand).
With the ability of associated query, the proportion of self-service requirements that BI products can solve can increase to 20% -30%, depending on the strength of the association ability provided by the product.
The remaining more requirements will involve multi-step procedural calculations, which completely exceeds the design goals of most BI products. However, it is a problem that users particularly expect to solve. Only by achieving this step can business personnel analyze data as they please.
A simple way is to use BI products to export basic data, and the rest can be done by business personnel using desktop tools such as Excel. Excel is flexible and convenient enough to implement multi-step operations relatively freely. In this sense, Excel is the most important BI tool.
However, Excel is powerless in the face of complex data calculations and repetitive operations, and in the case of a large amount of data, it cannot handle them, making it unsuitable for many application scenarios. These missing parts in Excel can be supplemented with esProc SPL, but it's a long story to discuss later.
For the most common self-service data analysis needs of users, the ability of BI products is actually quite weak. The common situation is that BI vendors talk about multidimensional analysis, while users think about problems that require procedural calculations to solve. This misalignment can lead to high expectations and great disappointment. Users need to be clear about their self-service needs: is multidimensional analysis sufficient? How many associated query requirements are there? Will business personnel raise a large number of questions that require procedural calculations? Only in this way can reasonable expectations be set, and one know where the BI product plays a role to oneself, without being confused by the fancy interface and smooth operation of the product, and avoiding regrets afterwards.
In the BI industry, there used to be a term called shelfware, which was used to describe products that were not very useful. The manufacturer personnel jokingly said that they were not selling software, but rather shelfware that was thrown onto the shelf and not moved when purchased. During the demonstration phase, users have high expectations that business personnel can analyze on their own, but in reality, it is completely different. It is used as a reporting tool as a result, and the ability of these BI software to produce reports is still very poor. This is a big pitfall for purchasing BI products!
SPL Resource: SPL Official Website | SPL Blog | Download esProc SPL | SPL Source Code