Cardinality-Oriented Databases

Columnar-oriented databases have gained immense popularity in recent years. Their columnar layout tends to be sparse with respect to any single query: a query typically targets data related to a specific unique value, so the system only needs to process a small segment of each column. Scanning the whole column for every query is inefficient and often unnecessary.

To address this, many columnar databases use sparse indexes or Bloom filters to skip irrelevant data. However, this approach still loads a significant amount of unnecessary data. The high compression ratio of columnar databases can offset the cost of loading this excess data to some extent.
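
As a rough illustration of why block skipping still loads extra data, here is a minimal Python sketch. It assumes a sparse index that keeps only the minimum and maximum value of each fixed-size block; the function names and the in-memory block layout are hypothetical and not taken from any particular database.

```python
def build_blocks(column, block_size):
    """Split a column into fixed-size blocks and record min/max per block."""
    blocks = []
    for start in range(0, len(column), block_size):
        chunk = column[start:start + block_size]
        blocks.append({"min": min(chunk), "max": max(chunk), "rows": chunk})
    return blocks


def scan_for_value(blocks, target):
    """Equality lookup: return (matching_rows, rows_loaded)."""
    matches = 0
    rows_loaded = 0
    for block in blocks:
        # The sparse index lets us skip blocks whose range excludes the target.
        if target < block["min"] or target > block["max"]:
            continue
        # A block that may contain the target must be loaded in full,
        # including rows that hold other values.
        rows_loaded += len(block["rows"])
        matches += sum(1 for v in block["rows"] if v == target)
    return matches, rows_loaded


column = ["a", "b", "a", "c", "d", "d", "e", "f"]
blocks = build_blocks(column, block_size=4)
matches, rows_loaded = scan_for_value(blocks, "a")
print(matches, rows_loaded)
```

In this toy data the second block is skipped, but the first block is loaded in full even though only some of its rows match. That gap between rows loaded and rows matched is the excess data described above.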

In contrast, cardinality-oriented databases focus specifically on values. They operate under the assumption that most queries target a few specific values rather than the entire dataset. With this approach, the data for each unique value is stored in its own segment. Consequently, a query only loads and scans the segments containing the relevant values. This strategy significantly reduces the amount of data loaded, thereby cutting wasted bandwidth and improving overall efficiency.
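
The value-segment idea can be sketched in a few lines of Python. This is a toy in-memory model under stated assumptions: one segment per unique value of a single key, held in a dictionary. The names `build_value_segments` and `query_value` are hypothetical, and a real system would store segments on disk or in object storage.

```python
from collections import defaultdict


def build_value_segments(rows, key):
    """Group rows into one segment per unique value of `key`."""
    segments = defaultdict(list)
    for row in rows:
        segments[row[key]].append(row)
    return dict(segments)


def query_value(segments, value):
    """Load only the segment for `value`; return (rows, rows_loaded)."""
    segment = segments.get(value, [])
    return segment, len(segment)


rows = [
    {"user": "a", "latency_ms": 12},
    {"user": "b", "latency_ms": 40},
    {"user": "a", "latency_ms": 15},
    {"user": "c", "latency_ms": 9},
]
segments = build_value_segments(rows, key="user")
hits, rows_loaded = query_value(segments, "a")
print(rows_loaded, [r["latency_ms"] for r in hits])
```

Here every row loaded for the query belongs to the requested value, so no bandwidth is spent on rows that are later filtered out.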

Figure: Comparison of columnar-oriented vs. cardinality-oriented databases

Conclusion

For complex analysis requests that focus on specific unique values, a cardinality-oriented design proves superior to columnar-oriented approaches. By shifting work from CPU to IO, it can achieve lower costs and better performance with minimal computational resources, taking advantage of improved IO capabilities.