Performance Tuning

Last updated 1 year ago

Singularity offers a range of configuration options for optimizing data preparation performance. This guide explains each option and how to tune it.

Inline Preparation

  • Description: Inline preparation eliminates the need for extra disk space to store CAR files. However, it incurs a small overhead in database lookups and storage.

  • Implications: The overhead is usually negligible but can become significant for datasets containing many small files.

  • Configuration: To disable it, use --no-inline with singularity prep create.

  • Further Reading: Inline Preparation (in the Topics section).
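
To make the trade-off concrete, here is a simplified model, not Singularity's implementation or database schema: instead of writing CAR bytes to disk, each block is recorded as a reference (the hypothetical `BlockRef`: source path, offset, length), and the bytes are re-read from the source when requested. Every block served costs one lookup, which is the overhead described above.

```python
import os
import tempfile
from dataclasses import dataclass


@dataclass(frozen=True)
class BlockRef:
    """Hypothetical database row: where one block's bytes live in the source file."""
    path: str
    offset: int
    length: int


def index_file(path, block_size):
    """Record references to fixed-size blocks instead of copying the bytes into a CAR file."""
    size = os.path.getsize(path)
    return [BlockRef(path, off, min(block_size, size - off)) for off in range(0, size, block_size)]


def read_block(ref):
    """Serve one block on demand by seeking into the original file (one lookup per block)."""
    with open(ref.path, "rb") as f:
        f.seek(ref.offset)
        return f.read(ref.length)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        src = os.path.join(d, "data.bin")
        with open(src, "wb") as f:
            f.write(b"hello inline preparation")
        refs = index_file(src, block_size=8)
        rebuilt = b"".join(read_block(r) for r in refs)
        print(len(refs), rebuilt == b"hello inline preparation")  # 3 True
```

In this model, every file needs at least one reference regardless of its size, so a dataset of many small files produces many more lookups per byte served.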

DAG Updates

  • Description: During preparation, Singularity refreshes the DAG and CID for each directory, which is useful for real-time tracking of changes.

  • Implications: This introduces a slight database overhead as directories get updated each time a CAR file is prepared.

  • Configuration: To disable, use --no-dag with singularity prep create.
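
A rough illustration of where the database overhead comes from, under simplifying assumptions: the directory identifiers below are plain SHA-256 digests over sorted child entries rather than real IPFS CIDs, and `DirNode` and `set_child` are hypothetical names. Because a directory's identifier depends on its children, changing one entry forces every ancestor directory to be recomputed and saved again.

```python
import hashlib


class DirNode:
    """Toy directory whose identifier is derived from its children (a stand-in for a real CID)."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.children = {}  # child name -> child identifier
        self.cid = self._compute()

    def _compute(self):
        # Hash the sorted child entries so the identifier changes whenever any child changes.
        h = hashlib.sha256()
        for child in sorted(self.children):
            h.update(child.encode() + b"\0" + self.children[child].encode() + b"\0")
        return h.hexdigest()

    def set_child(self, name, child_id):
        """Add or update one entry, then recompute this directory and every ancestor.
        Returns how many directory records had to be rewritten."""
        self.children[name] = child_id
        node, writes = self, 0
        while node is not None:
            node.cid = node._compute()
            writes += 1
            if node.parent is not None:
                node.parent.children[node.name] = node.cid
            node = node.parent
        return writes


if __name__ == "__main__":
    root = DirNode("root")
    photos = DirNode("photos", parent=root)
    root.set_child("photos", photos.cid)  # register the empty subdirectory
    before = root.cid
    print(photos.set_child("a.jpg", "file-1"), root.cid != before)  # 2 True
```

The deeper the directory tree, the more records each update touches, which is the cost that --no-dag avoids.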

Parallelism in Data Preparation

Scanning

  • Description: Scanning traverses the source storage to build a list of files. It is fast on local storage but can be slow on remote storage such as S3.

  • Configuration:

    • Enable Parallelism: Use --client-scan-concurrency <number> with singularity storage create or singularity storage update.

    • Note: Enabling parallel scanning can cause files to be processed in a non-deterministic order.
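
A minimal sketch of why parallel scanning changes the order, not Singularity's scanner: the directory tree `TREE` and `list_dir` are hypothetical stand-ins for a slow remote listing API. Several listings are in flight at once, and files are collected in whatever order the listings finish, so each run yields the same set of files but not necessarily the same sequence.

```python
import random
import time
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait

# Hypothetical remote tree: directory -> (subdirectories, files), standing in for S3 prefixes.
TREE = {
    "": (["a", "b"], ["root.txt"]),
    "a": (["a/x"], ["a/1.txt", "a/2.txt"]),
    "b": ([], ["b/3.txt"]),
    "a/x": ([], ["a/x/4.txt"]),
}


def list_dir(path):
    """Simulate one remote listing call with variable latency."""
    time.sleep(random.uniform(0, 0.01))
    return TREE[path]


def scan(concurrency):
    """Walk the tree with up to `concurrency` listing calls in flight.
    Files are appended as listings complete, so their order can differ between runs."""
    files = []
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        pending = {pool.submit(list_dir, "")}
        while pending:
            done, pending = wait(pending, return_when=FIRST_COMPLETED)
            for fut in done:
                subdirs, names = fut.result()
                files.extend(names)
                pending |= {pool.submit(list_dir, d) for d in subdirs}
    return files


if __name__ == "__main__":
    for _ in range(3):
        print(scan(concurrency=4))  # same five files each time; order may vary
```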

Packing

  • Description: Packing combines multiple files into a single CAR file, an operation that is both CPU-intensive and IO-intensive. When the source is remote storage limited by network throughput or latency, increasing parallelism can improve performance.

  • Configuration:

    • Adjust Parallelism: Use --concurrency <number> with singularity run dataset-worker.
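
The benefit for network-bound sources can be seen in a toy model, which is an assumption and not how the dataset worker is built: `fetch` sleeps to imitate remote read latency, and `pack` hashes the concatenated bytes as a placeholder for building a CAR file. Running several batches at once overlaps the waiting.

```python
import hashlib
import time
from concurrent.futures import ThreadPoolExecutor


def fetch(name):
    """Stand-in for reading one file from remote storage; the sleep models network latency."""
    time.sleep(0.01)
    return name.encode() * 3


def pack(batch):
    """Concatenate a batch of files and return a digest (a placeholder for building a CAR file)."""
    data = b"".join(fetch(n) for n in batch)
    return hashlib.sha256(data).hexdigest()


def pack_all(batches, concurrency):
    """Run up to `concurrency` pack jobs at a time; map() keeps results in input order."""
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        return list(pool.map(pack, batches))


if __name__ == "__main__":
    batches = [[f"file-{i}-{j}" for j in range(4)] for i in range(8)]
    for c in (1, 4):
        start = time.perf_counter()
        pack_all(batches, c)
        print(f"concurrency={c}: {time.perf_counter() - start:.2f}s")
```

Because the simulated work is almost entirely waiting, the second timing should come out at roughly a quarter of the first. This sketch models only the IO-bound side of packing; CPU-bound work scales with the available cores rather than with waiting time.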

Use Server's Last Modified Time

  • Description: Some remote storage backends, such as AWS S3, provide both a custom modification time (mtime) stored with the object and a server-side last modified time. By default, Singularity checks for a custom mtime and uses it if available; otherwise, it falls back to the server's last modified time.

  • Implications: Skipping the custom mtime check and using the server's last modified time directly can reduce the number of requests made to the remote storage.

  • Configuration: To use the server's last modified time and skip fetching object metadata, use --client-use-server-mod-time with singularity storage create or singularity storage update.
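
The request savings can be illustrated with a toy model in which every name is hypothetical (this is neither rclone's nor Singularity's code). It assumes, as rclone documents for S3-like backends, that the original mtime is kept in per-object metadata that costs one extra request per object, whereas the server's last modified time already comes back with the listing.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass
class ListedObject:
    """Hypothetical per-object entry returned by one listing request."""
    key: str
    server_modified: datetime


class FakeBucket:
    """Toy remote store that counts every request it serves."""

    def __init__(self, objects, custom_mtimes):
        self.objects = objects
        self.custom_mtimes = custom_mtimes  # key -> mtime kept in object metadata
        self.requests = 0

    def list_objects(self):
        self.requests += 1  # one listing call covers all objects
        return list(self.objects)

    def head_object(self, key):
        self.requests += 1  # one metadata call per object
        return self.custom_mtimes.get(key)


def mod_times(bucket, use_server_mod_time):
    """Prefer the custom mtime unless told to trust the server's last modified time."""
    result = {}
    for obj in bucket.list_objects():
        custom = None if use_server_mod_time else bucket.head_object(obj.key)
        result[obj.key] = obj.server_modified if custom is None else custom
    return result


if __name__ == "__main__":
    t = datetime(2024, 1, 1, tzinfo=timezone.utc)
    objs = [ListedObject(k, t) for k in ("a", "b", "c")]
    for flag in (False, True):
        bucket = FakeBucket(objs, {"a": datetime(2023, 6, 1, tzinfo=timezone.utc)})
        mod_times(bucket, use_server_mod_time=flag)
        print(flag, bucket.requests)  # prints "False 4" then "True 1"
```

With N objects the default path costs 1 + N requests in this model, while trusting the server's time costs a single listing request.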

Retry Strategy

Retry on Network Request

  • Description: When listing a remote folder or opening a file fails, Singularity relies on rclone's retry mechanism.

  • Configuration: To increase the number of retries, use --client-low-level-retries <number> with singularity storage create or singularity storage update.
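
The idea behind a low-level retry is simply to repeat a single failing request a bounded number of times. The sketch below is a generic illustration, not rclone's implementation: it treats the number as the total count of attempts and retries only `ConnectionError`, both of which are assumptions, and `FlakyListing` is a hypothetical test double.

```python
def with_low_level_retries(op, retries):
    """Call `op` up to `retries` times in total; re-raise the last error if every attempt fails."""
    if retries < 1:
        raise ValueError("retries must be at least 1")
    last_err = None
    for _ in range(retries):
        try:
            return op()
        except ConnectionError as err:  # assumption: only transient network errors are retried
            last_err = err
    raise last_err


class FlakyListing:
    """Hypothetical remote listing call that fails `failures` times before succeeding."""

    def __init__(self, failures):
        self.failures = failures
        self.calls = 0

    def __call__(self):
        self.calls += 1
        if self.calls <= self.failures:
            raise ConnectionError("transient listing failure")
        return ["a.txt", "b.txt"]


if __name__ == "__main__":
    listing = FlakyListing(failures=2)
    print(with_low_level_retries(listing, retries=3), listing.calls)  # ['a.txt', 'b.txt'] 3
```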

Retry on Network IO

  • Description: Even when a network request succeeds, the subsequent data transfer can still fail over an unstable connection. Singularity can retry such reads and resume from the last successfully read position.

  • Configuration: Use the following flags with singularity storage create or singularity storage update.

```
--client-retry-backoff value      # Delay backoff for retrying IO read errors (default: 1s)
--client-retry-backoff-exp value  # Exponential delay backoff for retrying IO read errors (default: 1.0)
--client-retry-delay value        # Initial delay before retrying IO read errors (default: 1s)
--client-retry-max value          # Max number of retries for IO read errors (default: 10)
```
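
A sketch of retry-and-resume, written under stated assumptions rather than taken from Singularity's source. `open_at(offset)` is a hypothetical callable that reopens the source at a byte offset. The parameters mirror the flags above (`delay` for --client-retry-delay, `backoff` for --client-retry-backoff, `backoff_exp` for --client-retry-backoff-exp, `max_retries` for --client-retry-max) with the same defaults, but the way they combine (next wait = previous wait × backoff_exp + backoff) is an assumed model.

```python
import io
import time


class ResumingReader:
    """Re-opens a source at the last good offset after a read error.

    `open_at(offset)` is a hypothetical callable returning a readable stream positioned at `offset`.
    Delay model (an ASSUMPTION, not confirmed from Singularity's source): the first wait is `delay`,
    and each later wait is previous_wait * backoff_exp + backoff.
    """

    def __init__(self, open_at, delay=1.0, backoff=1.0, backoff_exp=1.0, max_retries=10, sleep=time.sleep):
        self.open_at = open_at
        self.offset = 0
        self.stream = open_at(0)
        self.delay = delay
        self.backoff = backoff
        self.backoff_exp = backoff_exp
        self.max_retries = max_retries
        self.retries = 0  # counted over the whole stream in this sketch
        self.sleep = sleep

    def read(self, n=-1):
        while True:
            try:
                chunk = self.stream.read(n)
            except OSError:
                if self.retries >= self.max_retries:
                    raise
                self.retries += 1
                self.sleep(self.delay)
                self.delay = self.delay * self.backoff_exp + self.backoff
                self.stream = self.open_at(self.offset)  # resume from the last good offset
                continue
            self.offset += len(chunk)
            return chunk


class FlakyStream:
    """Test double: serves data[offset:] and raises OSError once `fail_after` bytes have been served."""

    def __init__(self, data, offset, fail_after=None):
        self.buf = io.BytesIO(data[offset:])
        self.fail_after = fail_after
        self.served = 0

    def read(self, n=-1):
        if self.fail_after is not None and self.served >= self.fail_after:
            raise OSError("connection reset")
        chunk = self.buf.read(n)
        self.served += len(chunk)
        return chunk


if __name__ == "__main__":
    data, opened, waits = b"0123456789", [], []

    def open_at(offset):
        opened.append(offset)
        # The first connection drops after 4 bytes; reconnections succeed.
        return FlakyStream(data, offset, fail_after=4 if len(opened) == 1 else None)

    reader = ResumingReader(open_at, sleep=waits.append)
    out = b""
    while chunk := reader.read(4):
        out += chunk
    print(out, opened, waits)  # b'0123456789' [0, 4] [1.0]
```

Under this assumed model and the default values, successive waits grow linearly (1s, 2s, 3s, ...); raising the exponent above 1.0 makes them grow geometrically instead.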

Skip Inaccessible Files

  • Description: Permission restrictions can prevent certain files on remote storage from being read. Such problems may only surface when the file is opened, which causes the packing job to fail.

  • Configuration: To skip inaccessible files, use --client-skip-inaccessible-files with singularity storage create or singularity storage update.
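
The difference between the two behaviours can be sketched as follows; this is an illustration, not Singularity's code. The `opener` parameter and `fake_open` are hypothetical stand-ins for remote storage in which one file is unreadable. Without skipping, a single permission error aborts the whole batch; with skipping, the file is left out and recorded.

```python
import io


def pack_files(paths, skip_inaccessible, opener=open):
    """Read each file into a pack.

    Without skipping, the first PermissionError aborts the whole job; with skipping, the file is
    left out and recorded. `opener` lets a fake stand in for remote storage.
    """
    packed, skipped = [], []
    for path in paths:
        try:
            with opener(path, "rb") as f:
                packed.append((path, f.read()))
        except PermissionError:
            if not skip_inaccessible:
                raise
            skipped.append(path)
    return packed, skipped


def fake_open(path, mode="rb"):
    """Hypothetical remote store in which `secret.bin` is not readable."""
    if path == "secret.bin":
        raise PermissionError(f"access denied: {path}")
    return io.BytesIO(path.encode())


if __name__ == "__main__":
    packed, skipped = pack_files(["a.bin", "secret.bin", "b.bin"], skip_inaccessible=True, opener=fake_open)
    print([p for p, _ in packed], skipped)  # ['a.bin', 'b.bin'] ['secret.bin']
```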