Skip to content

Integrating S3 with ClapDB

You can insert data from S3 into ClapDB.

S3 Table Functions

The s3 table function allows you to read files from S3 compatible storage. The outline for this syntax is:

s3(path, [aws_access_key_id, aws_secret_access_key,] [format])

where:

  • path Bucket URL with a path to the file.
  • format — The format of the file.

Preparation

To interact with our S3-based dataset, we prepare a standard table as our destination. The statement below creates a table named mgbench_logs2 in the default database:

CREATE TABLE mgbench_logs2 (
log_time timestamp,
client_ip ipv4,
request text,
status_code uint16,
object_size uint64
);

Inserting Data from S3

To read and insert the data into our instance, We combine our s3 function with a simple INSERT statement to achieve this.

Note that we aren’t required to list our columns because our target table provides the required structure. This requires the columns to appear in the order specified in the table DDL statement: columns are mapped according to their position in the SELECT clause.

columns can also be mapped according to their names using ‘CSVWithNames’.

insert into mgbench_logs2
select * from s3('https://clapdb-datasets-ap-south-1.s3.dualstack.ap-south-1.amazonaws.com/mgbench2.csv', 'CSV')

NOTE: the bucket clapdb-datasets-ap-south-1 is public readable, so we don’t need specify aws_access_key_id and aws_secret_access_key.