API

seqtools.utils

seqtools.utils.fileOpen(fname, mode='rt', encoding='latin-1')[source]

Open a file, including gzip files

Parameters:
  • fname – The filename to open. Gzip files are distinguished by ending in ‘.gz’
  • mode – File mode for opening [default ‘rt’]
  • encoding – Text encoding for the file [default ‘latin-1’]
Returns:

An open file handle

Note: Python's gzip module is very slow, so this function uses subprocess to call command-line gzip instead.
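The approach described in the note can be sketched as follows. This is not the code behind fileOpen: the helper name open_maybe_gzip is hypothetical, it handles reading only, and the fallback to the standard-library gzip module when no gzip executable is on the PATH is an assumption added for portability.

```python
import gzip
import io
import shutil
import subprocess


def open_maybe_gzip(fname, encoding="latin-1"):
    """Hypothetical helper: open a plain or gzip-compressed file for text reading."""
    if not fname.endswith(".gz"):
        return open(fname, "rt", encoding=encoding)
    if shutil.which("gzip") is None:
        # Assumption: fall back to the standard library when no gzip binary exists.
        return gzip.open(fname, "rt", encoding=encoding)
    # Let an external gzip process decompress to a pipe, then decode its output.
    proc = subprocess.Popen(["gzip", "-dc", fname], stdout=subprocess.PIPE)
    return io.TextIOWrapper(proc.stdout, encoding=encoding)
```

The returned handle can be iterated line by line like any text file. A fuller version would also keep a reference to the child process and wait on it when the handle is closed.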

seqtools.utils.revcomp(sequence)[source]

Reverse complement a string

Parameters: sequence – The DNA string (all caps)

This function includes a caching mechanism, so watch memory usage!
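As an illustration of reverse complementing with a cache, here is a minimal sketch. It is not the implementation of revcomp: the name revcomp_sketch, the handling of N, and the bounded functools.lru_cache are all assumptions chosen for the example. The bound is there so that the cache cannot grow without limit.

```python
from functools import lru_cache

# Translation table for upper-case DNA; N maps to itself.
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


@lru_cache(maxsize=1024)  # bounded, so cached results cannot grow without limit
def revcomp_sketch(sequence):
    """Hypothetical stand-in: reverse complement of an upper-case DNA string."""
    # Complement each base, then reverse the whole string.
    return sequence.translate(_COMPLEMENT)[::-1]
```

For example, revcomp_sketch('ATGC') returns 'GCAT'. Repeated calls with the same sequence are served from the cache.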

seqtools.utils.sortVcfBySequence(vcf, seqnames, seqmap=None)[source]

Sort a tabixed VCF file by sequence names

>>> import vcf
>>> from seqtools.utils import sortVcfBySequence
>>> v = vcf.Reader(filename='vcf.gz')
>>> w = vcf.Writer(open('sorted.vcf', 'w'), v)
>>> faifile = open('ucsc.hg19.fasta.fai')
>>> # .fai files are tab-delimited; the first column is the sequence name
>>> seqnames = [x.split('\t')[0] for x in faifile]
>>> faifile.close()
>>> for rec in sortVcfBySequence(v, seqnames):
...     w.write_record(rec)
>>> w.close()
>>> v.close()

seqtools.fastq

class seqtools.fastq.FastqRecord(header, sequence, line3, quality)[source]

Very simple fastq class containing header, sequence, line3, and quality as strings
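To illustrate the four-field layout, here is a sketch that groups FASTQ lines into records. The names FastqRecordSketch and read_fastq_records are hypothetical and are not part of seqtools. The sketch assumes well-formed input with exactly four lines per record.

```python
from collections import namedtuple
from itertools import islice

# Hypothetical stand-in with the four fields named in the description above.
FastqRecordSketch = namedtuple("FastqRecordSketch", "header sequence line3 quality")


def read_fastq_records(lines):
    """Yield one record per four lines from an iterable of FASTQ lines."""
    it = iter(lines)
    while True:
        # Take the next four lines and strip their trailing newlines.
        chunk = [line.rstrip("\n") for line in islice(it, 4)]
        if len(chunk) < 4:
            return  # end of input (a trailing partial record is dropped)
        yield FastqRecordSketch(*chunk)
```

Any iterable of lines works as input, including an open text file handle.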

class seqtools.fastq.Fastq(fname)[source]
close()[source]
name()[source]

seqtools.varscan

Parse VCF output from VarScan and fix the ALT column to adhere to the VCF specification.

seqtools.varscan.fixLine(line)[source]

Fix a VarScan VCF line

Prints the output to stdout. Fixes the ALT column and converts the FREQ field to a floating-point value, which is easier to filter on.

Parameters: line – a pre-split and stripped VarScan line

seqtools.varscan.fixVarscanVcfFile(iterable)[source]

Takes an iterator over a VarScan VCF file and returns an iterator over fixed VCF lines, including the header.

Parameters: iterable – any iterable of the VCF lines
Returns: An iterator over fixed VCF lines

Usage is like so:

>>> from seqtools.varscan import fixVarscanVcfFile
>>> varscan = fixVarscanVcfFile(open('filename.vcf', 'r'))
>>> for line in varscan:
...     print(line)

seqtools.vcf

seqtools.demultiplexer

seqtools.strucvar

seqtools.strucvar.crest.crestLineToBedLines(crestline, extrastring=None)[source]

Takes a line from a CREST file and turns it into two BED lines

Parameters:
  • crestline – a single string representing the CREST output
  • extrastring – a single string to concatenate to the BED output, useful for including sample information, etc.
Returns:

A string containing the two BED lines