Running

In order to run pyIPSA workflow you have to specify input and output folders in the configuration file. Open file config/config.yaml in pyIPSA directory and specify your desired input and output folders. Paths must be absolute or relative to pyIPSA directory. Input folder must have at least one alignment file (BAM).

To run pyIPSA use the following command while in root directory:

$ snakemake --cores <number of cores>

To run in cluster environment using Grid Engine:

$ snakemake --cluster qsub --j <number of jobs>

Also you can create your own custom config. Just copy default config to the same folder and change the values you need. To run with custom config:

$ snakemake --configfile config/my_config.yaml

Default config file config/config.yaml must be present along with custom one.

For other running options consult with snakemake docs.

Configuration

config/config.yaml has many other useful options:

  • pooled - if True, merge junctions from all samples before retrieving sites if True

  • primary - if True, use only primary alignment for multimapped reads

  • unique - if True, do not use multimapped reads

  • threads - number of threads used to read single alignment file

  • min_offset - minimal offset when aggregating junctions

  • min_intron_length - minimal allowed length of junction

  • max_intron_length - maximal allowed length of junction

  • entropy - minimal value of entropy used for filtering out junctions or sites

  • total_count - minimal allowed count while filtering out junctions

  • gtag - if True, use only junctions with GT/AG splice sites

  • genome_filenames - stores full names of genomes

  • genome_urls - stores URLs to genome files

  • annotation_urls - stores URLs to annotation files