Schemaless
Schemaless Data Ingestion and Querying
ClapDB enterprise edition supports schemaless data ingestion and querying. This feature is designed to handle both flexible and performance.
What is Schemaless
The Role of a Schema
- Perform schema validation during data ingestion.
- Compress data appropriately based on type during storage.
- Index data appropriately.
- Use type-appropriate utility functions during queries.
- Different data types have different comparison methods, affecting how data is ordered.
From the above points, the main functions of a schema are evident during data writing and querying phases.
Why Schemaless is Needed
Data in OLAP databases is read-only and historical data does not change. Therefore, when an alter table operation occurs, old data should retain its previous type and still be accessible. Users may frequently perform DDL (Data Definition Language) operations such as: Adding a new column at any time. Changing a column’s type at any time. Creating a new value for an enum type. For various reasons, data ingested may involve a mixture of different data types.
Schemaless in ClapDB
Constraints
- Under normal working conditions, columns cannot be deleted.
- Data can only relinquish its past schema definition after being archived.
- At the same point in time, within the same database, tables cannot have duplicate names, and within the same table, columns cannot have duplicate names.
Definition
- A database consists of several tables, which in turn are made up of several columns.
- Columns are defined by type, but these are not unique definitions. Instead, they are dynamic types composed of multiple types. Each row of data should belong to one of these types. At any given moment, a column possesses a list of types. If the ingested data does not specify a type, the system will attempt to convert it using these types in order until successful.
- When querying, you can specify which type of data in a column to query.
- Modifications to the schema do not affect old data; they only constrain the validation and casting of new data during ingestion.
Concepts
Immutable
ClapDB is a data platform designed for store & query, and the data is immutable. Once data is written, it cannot be modified by default. This is a fundamental principle of ClapDB.
So update is not supported in ClapDB, and the data is append-only. (user could just use the truncate
or delete
to remove the data.)
Table Schema Mode
Static
This mode behaves like a traditional database, the column schema is fixed, and the data type is fixed.
Dynamic
This mode allows the column schema to be changed, and the data type is flexible.
Dynamic table column schema constraint
In dynamic schema table, user can set the column dynamic detail manually.
If you want a column is a union of multiple types, you can __n
as the suffix of the column name, and the n
is the index of the union type, n
should start from 0, and unique.
if you do not use __n
, the column is struct type of a fixed type.
DDL
Ingesting Data
- Ingesting data can specify the type of the data, just use
column_name__n
as the column name, and then
is the index of the union type. - Ingesting data can do not specify the type of the data, the system will try to convert the data to the column type, the order is from 0 to largest index.
- if the column’s last index type is
text
, the data will be always ingesting success(may be truncated to a shorter length).
Query Data
- If the table is static, the ClapDB will behave like a traditional database.
- With dynmaic schema table, you can query the data with the type of the column use the
__n
suffiex. - Or use
::type
to specify the sub type of the column, if you do not want to use__n
suffix(maybe forgotten). - If not use
__n
suffix either::type
, ClapDB will try to all sub types of the column with the query. - If
::type
is used with__n
suffix,::type
still works like a cast operator.