omicverse.alignment.bulk_rnaseq_pipeline
- omicverse.alignment.bulk_rnaseq_pipeline(sra_ids=None, samples=None, genome_dir='star_index', gtf='genes.gtf', output_dir='pipeline_output', genome_fasta_files=None, threads=8, memory='50G', jobs=None, skip_download=False, skip_qc=False, gzip_fastq=True, gene_mapping=True, auto_install=True, overwrite=False)
Run a complete bulk RNA-seq pipeline from SRA accessions or local FASTQs.
The pipeline chains: prefetch -> fqdump -> fastp -> STAR -> featureCount.
- Parameters:
sra_ids (str or list of str, optional) – SRA accession IDs to download. Required unless samples is provided.
samples (tuple or list of tuples, optional) – Pre-existing FASTQ sample tuples (name, fq1, fq2_or_None). When provided, the download step is skipped automatically.
genome_dir (str) – Path to the STAR genome index directory, either an existing index or the location where one is built.
gtf (str) – Path to the GTF annotation file.
output_dir (str) – Root output directory. Sub-directories are created per step.
genome_fasta_files (list of str, optional) – Genome FASTA file(s) for auto-building the STAR index.
threads (int) – Threads per tool invocation.
memory (str) – Memory limit for STAR BAM sorting (e.g. '50G').
jobs (int, optional) – Number of concurrent jobs. None auto-detects.
skip_download (bool) – Skip the prefetch + fqdump steps (requires samples).
skip_qc (bool) – Skip the fastp QC step.
gzip_fastq (bool) – Compress FASTQ output from fqdump.
gene_mapping (bool) – Map gene_id to gene_name in featureCounts output.
auto_install (bool) – Auto-install missing CLI tools via conda/mamba.
overwrite (bool) – Force re-run even when outputs already exist.
- Returns:
Merged gene-level count matrix (genes x samples).
- Return type:
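
Example: a minimal sketch of running the pipeline on local FASTQ files through the samples argument. The helpers collect_samples and run_from_local_fastqs are hypothetical and not part of omicverse, and the file naming convention (<name>_1.fastq.gz / <name>_2.fastq.gz for paired-end reads, <name>.fastq.gz for single-end reads) is an assumption made for illustration. Only keyword arguments listed in the signature above are passed to bulk_rnaseq_pipeline.

```python
from pathlib import Path


def collect_samples(fastq_dir):
    """Build (name, fq1, fq2_or_None) tuples from a directory of FASTQ files.

    Hypothetical helper. Assumes paired-end reads are named
    <name>_1.fastq.gz / <name>_2.fastq.gz and single-end reads
    are named <name>.fastq.gz.
    """
    fastq_dir = Path(fastq_dir)
    samples = []

    # Files ending in _1.fastq.gz: pair with the matching _2 file if present.
    for fq1 in sorted(fastq_dir.glob("*_1.fastq.gz")):
        name = fq1.name[: -len("_1.fastq.gz")]
        fq2 = fastq_dir / f"{name}_2.fastq.gz"
        samples.append((name, str(fq1), str(fq2) if fq2.exists() else None))

    # Remaining .fastq.gz files without a _1/_2 suffix are treated as single-end.
    for fq in sorted(fastq_dir.glob("*.fastq.gz")):
        if fq.name.endswith(("_1.fastq.gz", "_2.fastq.gz")):
            continue
        samples.append((fq.name[: -len(".fastq.gz")], str(fq), None))

    return samples


def run_from_local_fastqs(fastq_dir, genome_dir, gtf, output_dir="pipeline_output"):
    """Run the pipeline on local FASTQs, skipping the download steps."""
    # Imported lazily so collect_samples can be used without omicverse installed.
    import omicverse as ov

    samples = collect_samples(fastq_dir)
    if not samples:
        raise ValueError(f"no FASTQ files found in {fastq_dir}")

    # skip_download=True requires samples, as documented above.
    return ov.alignment.bulk_rnaseq_pipeline(
        samples=samples,
        genome_dir=genome_dir,
        gtf=gtf,
        output_dir=output_dir,
        threads=8,
        skip_download=True,
    )
```

Calling run_from_local_fastqs("fastq/", "star_index", "genes.gtf") with real paths would return the merged gene-level count matrix described under Returns. The paths shown here are placeholders.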