omicverse.alignment.parallel_fastq_dump
- omicverse.alignment.parallel_fastq_dump(sra_id, threads=1, outdir='.', tmpdir=None, min_spot_id=1, max_spot_id=None, split_files=False, gzip=False, **kwargs)
Download SRA data in parallel using parallel-fastq-dump.
This function wraps the parallel-fastq-dump tool, which speeds up retrieval of sequencing data from the NCBI Sequence Read Archive (SRA) by dividing the requested spot range among several threads.
- Parameters:
sra_id (str) – SRA accession ID (for example SRR2244401).
threads (int, optional) – Number of parallel threads used by parallel-fastq-dump.
outdir (str, optional) – Output directory for downloaded FASTQ files.
tmpdir (str|None, optional) – Temporary directory for chunk/intermediate files.
min_spot_id (int, optional) – Minimum SRA spot ID to download.
max_spot_id (int|None, optional) – Maximum SRA spot ID to download. None downloads all remaining spots.
split_files (bool, optional) – Split paired-end reads into separate *_1/*_2 FASTQ files.
gzip (bool, optional) – Compress output FASTQ files using gzip.
**kwargs – Additional flags passed through to parallel-fastq-dump.
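The mapping from these parameters to a command line can be sketched as follows. This is a minimal illustration, not omicverse's actual implementation: build_pfd_command is a hypothetical helper, and the flag names (--sra-id, --threads, --outdir, --tmpdir, --minSpotId, --maxSpotId, --split-files, --gzip) are assumed from the parallel-fastq-dump command-line help. The sketch only assembles the argument list and does not run the tool.

```python
from typing import List, Optional


def build_pfd_command(
    sra_id: str,
    threads: int = 1,
    outdir: str = ".",
    tmpdir: Optional[str] = None,
    min_spot_id: int = 1,
    max_spot_id: Optional[int] = None,
    split_files: bool = False,
    gzip: bool = False,
) -> List[str]:
    """Hypothetical helper: assemble an argv list for parallel-fastq-dump.

    Flag names are assumptions based on the tool's CLI help, not taken
    from the omicverse source.
    """
    cmd = [
        "parallel-fastq-dump",
        "--sra-id", sra_id,
        "--threads", str(threads),
        "--outdir", outdir,
        "--minSpotId", str(min_spot_id),
    ]
    # Optional arguments are only added when the caller sets them.
    if tmpdir is not None:
        cmd += ["--tmpdir", tmpdir]
    if max_spot_id is not None:
        cmd += ["--maxSpotId", str(max_spot_id)]
    # Boolean switches are forwarded to fastq-dump as bare flags.
    if split_files:
        cmd.append("--split-files")
    if gzip:
        cmd.append("--gzip")
    return cmd


cmd = build_pfd_command("SRR2244401", threads=4, split_files=True, gzip=True)
print(" ".join(cmd))
```

Building the command as a list rather than a single string avoids shell quoting problems if the list is later handed to subprocess.run.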
- Return type:
dict[str, str|int]
- Returns:
Download metadata including input parameters and discovered output FASTQ paths.
Examples:
>>> import omicverse as ov
>>> # Download SRA data with 4 threads and split files
>>> result = ov.alignment.parallel_fastq_dump(
...     sra_id='SRR2244401',
...     threads=4,
...     outdir='fastq_output/',
...     split_files=True,
...     gzip=True
... )
>>> # Download with spot range limit
>>> result = ov.alignment.parallel_fastq_dump(
...     sra_id='SRR2244401',
...     threads=8,
...     outdir='fastq_output/',
...     min_spot_id=1,
...     max_spot_id=100000,
...     split_files=True,
...     gzip=True
... )
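The keys of the returned dictionary are not listed here, so one way to confirm which FASTQ files a download produced is to scan the output directory directly. The sketch below assumes output files are named after the accession (for example SRR2244401_1.fastq.gz); find_fastq_files is a hypothetical helper and is not part of omicverse. The demo uses empty placeholder files in a temporary directory so it runs without downloading anything.

```python
import tempfile
from pathlib import Path
from typing import List


def find_fastq_files(outdir: str, sra_id: str) -> List[Path]:
    """Hypothetical helper: list FASTQ files in outdir whose names start with sra_id."""
    patterns = (f"{sra_id}*.fastq", f"{sra_id}*.fastq.gz")
    found = set()
    for pattern in patterns:
        found.update(Path(outdir).glob(pattern))
    # Sort so paired files (_1, _2) come back in a stable order.
    return sorted(found)


# Demo with placeholder files standing in for a real download.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("SRR2244401_1.fastq.gz", "SRR2244401_2.fastq.gz", "notes.txt"):
        (Path(tmp) / name).touch()
    names = [p.name for p in find_fastq_files(tmp, "SRR2244401")]

print(names)
```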