Ingesting

Where the data comes from

ClapDB is a data warehouse, which means it is a central repository for all the data that your organization collects. This data can come from a variety of sources.

In ClapDB Free, you can add data only from object storage, and currently only from AWS S3.

Ingesting throughput and latency

Throughput

As a true cloud-native data warehouse, ClapDB is designed to handle write volumes at any scale (more than 100 PB per day).

Latency

ClapDB ingestion writes data to a message queue, from which it is batched and written to the storage layer. This process can be parallelized and distributed across any number of Lambdas, which keeps latency very low. However, because processing has to batch a certain amount of data first, query latency for ClapDB Free / Pro is typically on the order of minutes. If you have higher write throughput, you can configure faster processing cycles. With ClapDB Enterprise, you can achieve sub-second query latency.
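To make the trade-off between batching and latency concrete, here is a minimal sketch of a size-or-age micro-batcher. It is an illustration only, not ClapDB's implementation: the class name `MicroBatcher`, its parameters, and the `flush` callback are all hypothetical. A batch is flushed when it holds `max_records` records or when its oldest record is `max_age_seconds` old, so a shorter age limit (a "faster processing cycle") means data reaches storage sooner.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class MicroBatcher:
    """Hypothetical size-or-age batcher; not part of ClapDB."""

    flush: Callable[[List[bytes]], None]  # called with each completed batch
    max_records: int = 1000               # flush when this many records are buffered
    max_age_seconds: float = 60.0         # or when the oldest record is this old
    _buffer: List[bytes] = field(default_factory=list)
    _oldest: Optional[float] = None       # arrival time of the oldest buffered record

    def add(self, record: bytes, now: float) -> None:
        """Buffer one record that arrived at time `now` (seconds)."""
        if not self._buffer:
            self._oldest = now
        self._buffer.append(record)
        self._maybe_flush(now)

    def tick(self, now: float) -> None:
        """Call periodically so a quiet stream still gets flushed."""
        self._maybe_flush(now)

    def _maybe_flush(self, now: float) -> None:
        if not self._buffer:
            return
        full = len(self._buffer) >= self.max_records
        stale = now - self._oldest >= self.max_age_seconds
        if full or stale:
            batch, self._buffer = self._buffer, []
            self._oldest = None
            self.flush(batch)
```

In this sketch the worst-case delay before a record is written is bounded by `max_age_seconds`, while a busy stream flushes earlier because batches fill up first.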

Data formats supported

  • ndjson
  • csv
  • tsv

Use an INSERT ... SELECT statement to add data to ClapDB.

To read from S3 in the SELECT, use the s3() table function. See s3 table function.

insert into hdfs_plain_logs select * from s3('https://clapdb-datasets-ap-south-1.s3.dualstack.ap-south-1.amazonaws.com/hdfs_plain_logs.ndjson', 'NDJSON')
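When many objects need loading, the statement above can be generated rather than typed by hand. The following is a minimal sketch: `build_s3_insert` is a hypothetical helper, and passing 'CSV' or 'TSV' as the second argument of s3() is an assumption (only 'NDJSON' appears in the example above).

```python
def build_s3_insert(table: str, url: str, fmt: str = "NDJSON") -> str:
    """Return an INSERT ... SELECT statement that loads one S3 object.

    Hypothetical helper. The 'CSV' and 'TSV' format names are assumed;
    only 'NDJSON' is shown in the ClapDB example.
    """
    fmt = fmt.upper()
    if fmt not in {"NDJSON", "CSV", "TSV"}:
        raise ValueError(f"unsupported format: {fmt}")
    # Allow only simple identifiers so the table name cannot inject SQL.
    if not table.replace("_", "").isalnum():
        raise ValueError(f"invalid table name: {table!r}")
    # Escape single quotes inside the URL string literal.
    escaped_url = url.replace("'", "''")
    return f"insert into {table} select * from s3('{escaped_url}', '{fmt}')"


statement = build_s3_insert(
    "hdfs_plain_logs",
    "https://clapdb-datasets-ap-south-1.s3.dualstack.ap-south-1.amazonaws.com/hdfs_plain_logs.ndjson",
)
print(statement)
```

The generated string can then be submitted in the same way as any other SQL statement.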

How to create a table

Before ingesting, make sure the destination table has been created.

ClapDB’s DDL data is based on JSON, and you can access it with HTTP POST and GET requests.

We suggest using clapctl to generate the correct HTTP POST data.

Terminal window
clapctl sql -n demo-for-rookie -v -s "create table demo(enter_time time, body text, tenant_id int);"

This generates the following output:

Terminal window
curl -X 'POST' -d 'create table demo(enter_time time, body text, tenant_id int);' -H 'Authorization: **your credential**' -H 'Cache-Control: no-cache' -H 'Content-Type: application/json' -H 'X-Pset-Value: null=NULL' '**your clapdb gateway address**'

See ClapDB DDL for more details.
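The curl command generated by clapctl can also be reproduced from a script. The sketch below builds an equivalent request with Python's standard library, copying the headers from the curl output above. `build_ddl_request` is a hypothetical helper, and the gateway address and credential are placeholders you must supply yourself.

```python
import urllib.request


def build_ddl_request(gateway_url: str, credential: str, sql: str) -> urllib.request.Request:
    """Build a POST request mirroring the curl command printed by clapctl.

    Hypothetical helper; the headers are copied from the curl output.
    """
    return urllib.request.Request(
        gateway_url,
        data=sql.encode("utf-8"),  # the SQL text is sent as the raw request body
        method="POST",
        headers={
            "Authorization": credential,
            "Cache-Control": "no-cache",
            "Content-Type": "application/json",
            "X-Pset-Value": "null=NULL",
        },
    )


request = build_ddl_request(
    "https://gateway.example.invalid/",  # placeholder, not a real ClapDB address
    "your-credential",                   # placeholder credential
    "create table demo(enter_time time, body text, tenant_id int);",
)
# To actually send it (requires a reachable gateway):
# with urllib.request.urlopen(request) as response:
#     print(response.status, response.read().decode("utf-8"))
```

Building the request separately from sending it makes it easy to inspect the method, headers, and body before anything touches the gateway.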

HTTP

Currently, the HTTP interface accepts only one row per request.

Terminal window
curl --location --request POST "${PREFIX}production/local/public/large_table" \
--header 'Content-Type: application/json' \
--data-raw '{
"id": null,
"project_id": 11,
"sign": 100
}'
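Because each request carries a single row, loading several rows over HTTP means issuing one POST per row. A minimal sketch of that loop follows. `build_row_requests` is a hypothetical helper and the table URL is a placeholder. Note that `json.dumps` serialises Python's `None` as the JSON literal `null`; that the endpoint accepts this spelling is an assumption.

```python
import json
import urllib.request
from typing import Iterable, Iterator


def build_row_requests(table_url: str, rows: Iterable[dict]) -> Iterator[urllib.request.Request]:
    """Yield one POST request per row (hypothetical helper)."""
    for row in rows:
        body = json.dumps(row).encode("utf-8")  # None becomes JSON null
        yield urllib.request.Request(
            table_url,
            data=body,
            method="POST",
            headers={"Content-Type": "application/json"},
        )


rows = [
    {"id": None, "project_id": 11, "sign": 100},
    {"id": None, "project_id": 12, "sign": 101},
]
# Placeholder URL; substitute your own prefix and table path.
requests_to_send = list(
    build_row_requests("https://gateway.example.invalid/production/local/public/large_table", rows)
)
# To actually send them (requires a reachable gateway):
# for req in requests_to_send:
#     urllib.request.urlopen(req).close()
```

For larger volumes, the S3 path described earlier is likely a better fit than row-at-a-time HTTP posts.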

SQL client

ClapDB provides a SQL client for accessing data, and you can also use it to ingest data into ClapDB.

Ingesting through the SQL client is supported in ClapDB Enterprise.

For example:

INSERT INTO cars (brand, model, year) VALUES ('Ford', 'Mustang', 1964);

Auto ingest from other data sources

ClapDB Enterprise will support automatic import from other data sources:

  1. OLTP databases, like MySQL, PostgreSQL, Oracle, SQL Server, etc.
  2. Online database or table services, like Google Sheets, Airtable, etc.
  3. SaaS services, like Salesforce, HubSpot, Google Analytics, etc.
  4. Open-source software, like Kafka, Redis, etc.