Benchmark

The ez-prep command in Singularity provides a streamlined approach to benchmarking.

Preparing Test Data

First, generate the data to benchmark against. Sparse files are used here to take disk IO time out of the benchmark. Singularity does not currently perform CID deduplication, so it processes these files the same way it would process random bytes.

mkdir dataset
truncate -s 1024G dataset/1T.bin
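To confirm that a file created this way is actually sparse, its logical length can be compared with the space allocated on disk. The following is a minimal sketch, assuming GNU coreutils `stat` on Linux; the helper name `sparse_report` and the small 64M demo file are hypothetical and not part of Singularity.

```shell
# Hypothetical helper: print "<apparent bytes> <allocated bytes>" for a file.
# Assumes GNU stat: %s = logical size, %b = allocated blocks, %B = bytes per block.
sparse_report() {
  apparent=$(stat -c %s "$1")
  allocated=$(( $(stat -c %b "$1") * $(stat -c %B "$1") ))
  echo "$apparent $allocated"
}

# Demo on a small sparse file in a temporary directory.
demo_dir=$(mktemp -d)
truncate -s 64M "$demo_dir/sparse.bin"
sparse_report "$demo_dir/sparse.bin"
rm -rf "$demo_dir"
```

On most filesystems the second number stays far below the first for a sparse file, which is why reading it costs almost no disk IO.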

To include disk IO time in the benchmark, create a file of random bytes instead:

dd if=/dev/urandom of=dataset/8G.bin bs=1M count=8192

Using ez-prep

The ez-prep command streamlines data preparation from a local folder with minimal configurable options.

Benchmarking Inline Preparation

Inline preparation removes the need to export CAR files and saves metadata directly to the database. The command below passes an empty --output-dir:

time singularity ez-prep --output-dir '' ./dataset

Benchmarking with In-Memory Database

To minimize disk IO, opt for an in-memory database:

time singularity ez-prep --output-dir '' --database-file '' ./dataset

Benchmarking with Multiple Workers

To make use of all CPU cores, set the concurrency with -j. Each worker uses approximately 4 CPU cores, so the command below starts one worker per 4 cores, plus one:

time singularity ez-prep --output-dir '' -j $(($(nproc) / 4 + 1)) ./dataset
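The worker-count expression in that command can be tried on its own before running a benchmark. This is a small sketch of the same integer arithmetic; the function name `workers_for` is hypothetical and only wraps the expression shown above.

```shell
# Hypothetical helper: workers for a given core count, using the same
# formula as the command above (integer division by 4, plus one).
workers_for() {
  echo $(( $1 / 4 + 1 ))
}

workers_for 16          # a 16-core machine gives 5 workers
workers_for "$(nproc)"  # value for the current machine
```

Because shell division truncates, a machine with fewer than 4 cores still gets one worker.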

Interpreting Results

Typical output will resemble:

real    0m20.379s
user    0m44.937s
sys     0m8.981s

  • real: Actual elapsed (wall-clock) time. Using more workers should reduce this time.

  • user: CPU time spent in user space. Dividing user by real approximates the number of CPU cores used (here, 44.937 / 20.379 ≈ 2.2).

  • sys: CPU time spent in kernel space, which in this workload is largely attributable to disk IO.
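The user-to-real ratio can be computed directly from the numbers that time prints. The sketch below uses awk for the floating-point division; the function name `cores_used` is hypothetical, and the inputs are the seconds values from the sample output.

```shell
# Hypothetical helper: approximate CPU cores used as user / real.
# Arguments: real seconds, user seconds.
cores_used() {
  awk -v real="$1" -v user="$2" 'BEGIN { printf "%.1f\n", user / real }'
}

cores_used 20.379 44.937   # prints 2.2
```

A result well below the machine's core count suggests that adding workers may still help.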

Comparison

The following benchmarks were conducted on a random 8G file:

| Tool | clock time (sec) | cpu time (sec) | memory (KB) |
| --- | --- | --- | --- |
| Singularity w/ inline prep | 15.66 | 51.82 | 99 |
| Singularity w/o inline prep | 19.13 | 51.51 | 99 |
| go-fil-dataprep | 16.39 | 43.94 | 83 |
| generate-car | 42.6 | 56.08 | 44 |
| go-car + stream-commp | 70.21 | 139.01 | 42 |
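Clock times are easier to compare as throughput. The sketch below divides the input size by the elapsed time, assuming the 8G file is 8192 MiB as produced by the dd command earlier (bs=1M count=8192); the function name `throughput_mib_s` is hypothetical.

```shell
# Hypothetical helper: throughput in MiB/s, rounded to a whole number.
# Arguments: input size in MiB, clock time in seconds.
throughput_mib_s() {
  awk -v size="$1" -v secs="$2" 'BEGIN { printf "%.0f\n", size / secs }'
}

throughput_mib_s 8192 15.66   # Singularity w/ inline prep: prints 523
throughput_mib_s 8192 70.21   # go-car + stream-commp: prints 117
```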
