Artificial Intelligence Programming Lab (AIPLab) Forum




Topic: [Tool] HTSeq: Analysing high-throughput sequencing data (read 1902 times)


  • Administrator
  • Hero Member
  • Posts: 1839
« on: October 24, 2016, 11:51:51 pm »
HTSeq: Analysing high-throughput sequencing data with Python

HTSeq is a Python package that provides infrastructure to process data from high-throughput sequencing assays.
  • Please see the chapter A tour through HTSeq first for an overview of the kind of analysis you can do with HTSeq and the design of the package, and then look at the reference documentation.
  • While the main purpose of HTSeq is to allow you to write your own analysis scripts, customized to your needs, there are also a couple of stand-alone scripts for common tasks that can be used without any Python knowledge. See the Scripts section in the overview below for what is available.

HTSeq is described in the following publication:
Simon Anders, Paul Theodor Pyl, Wolfgang Huber
HTSeq — A Python framework to work with high-throughput sequencing data
Bioinformatics (2014), in print, online at doi:10.1093/bioinformatics/btu638

If you use HTSeq in research, please cite this paper in your publication.

Documentation overview
    Prerequisites and installation

    Download links and installation instructions can be found in this chapter.

    A tour through HTSeq

    The Tour shows you how to get started. It explains how to install HTSeq, and then demonstrates typical analysis steps with explicit examples. Read this first, and then see the Reference for details.

    A detailed use case: TSS plots

    This chapter illustrates typical usage patterns for HTSeq by explaining in detail three different solutions to the same programming task.

    Counting reads

    This chapter explores in detail the use case of counting the overlap of reads with annotation features, and explains how to implement custom logic by writing your own customized counting scripts.

    Reference documentation

    The various classes of HTSeq are described here.
      Reference overview

      A brief overview of all classes.

      Sequences and FASTA/FASTQ files

      The classes Sequence and SequenceWithQualities represent sequences and reads (i.e., sequences with base-call quality information). The classes FastaReader and FastqReader parse FASTA and FASTQ files.

      Genomic intervals and genomic arrays

      The classes GenomicInterval and GenomicPosition represent intervals and positions in a genome. The class GenomicArray is an all-purpose container with easy access via a genomic interval or position, and GenomicArrayOfSets is a special case useful for dealing with genomic features (such as genes, exons, etc.).

      Read alignments

      The classes described here process the output of short-read aligners in various formats (e.g., SAM): they represent output files and alignments, i.e., reads with their alignment information.

      Features

      The classes GenomicFeature and GFF_Reader help you work with genomic annotation data.

      Other parsers

      This page describes classes for parsing VCF, Wiggle and BED files.

      Scripts

      The following scripts can be used without any Python knowledge.
        Quality Assessment with htseq-qa

        Given a FASTQ or SAM file, this script produces a PDF file with plots depicting the base calls and base-call qualities by position in the read. This is useful for assessing the technical quality of a sequencing run.

        Counting reads in features with htseq-count

        Given a SAM file with alignments and a GFF file with genomic features, this script counts how many reads map to each feature.
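The Sequences and FASTA/FASTQ files entry above mentions FastqReader and SequenceWithQualities. As a minimal, stdlib-only sketch of what such a reader has to do (not HTSeq's implementation), the code below assumes plain four-line FASTQ records and Phred+33 quality encoding; the `Read` tuple and the `read_fastq` function are hypothetical names.

```python
import io
from typing import Iterator, List, NamedTuple, TextIO


class Read(NamedTuple):
    """Hypothetical stand-in for a sequence with base-call qualities."""
    name: str
    seq: str
    qual: List[int]  # one Phred score per base


def read_fastq(handle: TextIO) -> Iterator[Read]:
    """Yield reads from a FASTQ stream with exactly four lines per record.

    Assumes Phred+33 encoding (score = ASCII code - 33); multi-line FASTQ is not supported.
    """
    while True:
        header = handle.readline().rstrip("\r\n")
        if not header:  # end of input
            return
        seq = handle.readline().rstrip("\r\n")
        plus = handle.readline().rstrip("\r\n")
        qual_str = handle.readline().rstrip("\r\n")
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed FASTQ record: {header!r}")
        if len(seq) != len(qual_str):
            raise ValueError(f"sequence/quality length mismatch in {header!r}")
        # Decode each quality character into an integer Phred score
        yield Read(header[1:], seq, [ord(c) - 33 for c in qual_str])


example = io.StringIO("@read1\nACGT\n+\nII#5\n@read2\nGGCA\n+\nIIII\n")
reads = list(read_fastq(example))
```

Here `reads[0].qual` is `[40, 40, 2, 20]`, because `I`, `#` and `5` have ASCII codes 73, 35 and 53.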
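The Genomic intervals and genomic arrays entry describes GenomicArrayOfSets as a container of feature labels addressed by genomic intervals. To illustrate the idea only (this is not HTSeq's code), the hypothetical `ToyArrayOfSets` below stores one unstranded chromosome as a step function: a sorted list of breakpoints, each starting a run of positions that share the same set of labels. Coordinates are assumed to be 0-based and half-open.

```python
import bisect
from typing import FrozenSet, Iterator, List, Tuple


class ToyArrayOfSets:
    """Hypothetical sketch: one unstranded chromosome of length `length`, stored as
    steps [starts[i], starts[i+1]) that each carry one frozenset of labels."""

    def __init__(self, length: int) -> None:
        self.length = length
        self.starts: List[int] = [0]
        self.values: List[FrozenSet[str]] = [frozenset()]

    def _split(self, pos: int) -> int:
        """Make sure a step starts at `pos` and return that step's index."""
        i = bisect.bisect_right(self.starts, pos) - 1
        if self.starts[i] != pos:
            self.starts.insert(i + 1, pos)
            self.values.insert(i + 1, self.values[i])  # both halves keep the old set
            i += 1
        return i

    def add(self, start: int, end: int, label: str) -> None:
        """Add `label` to every position in [start, end)."""
        if not 0 <= start < end <= self.length:
            raise ValueError("interval out of range")
        i = self._split(start)
        j = self._split(end) if end < self.length else len(self.starts)
        for k in range(i, j):
            self.values[k] = self.values[k] | {label}

    def steps(self, start: int, end: int) -> Iterator[Tuple[int, int, FrozenSet[str]]]:
        """Yield (step_start, step_end, labels) for each step overlapping [start, end)."""
        i = bisect.bisect_right(self.starts, start) - 1
        while i < len(self.starts) and self.starts[i] < end:
            step_end = self.starts[i + 1] if i + 1 < len(self.starts) else self.length
            yield max(self.starts[i], start), min(step_end, end), self.values[i]
            i += 1


chrom = ToyArrayOfSets(1000)
chrom.add(100, 300, "geneA")
chrom.add(200, 400, "geneB")
steps = list(chrom.steps(150, 350))
```

Querying positions 150 to 350 yields three steps labelled {geneA}, {geneA, geneB} and {geneB}; a per-step view like this is what read-counting code iterates over. HTSeq's real class additionally handles strands and many chromosomes.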
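The Read alignments entry is about turning aligner output such as SAM into alignment objects. The sketch below shows the core of that step with the standard library only; it is not HTSeq's own reader class, and `parse_sam_line` is a hypothetical helper. It relies on the mandatory SAM columns (QNAME, FLAG, RNAME, POS, MAPQ, CIGAR, ...) and derives the aligned reference interval from the CIGAR string.

```python
import re
from typing import Optional, Tuple

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")
REF_CONSUMING = set("MDN=X")  # CIGAR operations that advance along the reference


def parse_sam_line(line: str) -> Optional[Tuple[str, str, int, int, str]]:
    """Return (read_name, chrom, start, end, strand) for a mapped read, else None.

    start/end are converted from SAM's 1-based POS to 0-based, half-open coordinates.
    """
    if line.startswith("@"):  # header line
        return None
    fields = line.rstrip("\r\n").split("\t")
    qname, flag, rname, pos, cigar = fields[0], int(fields[1]), fields[2], int(fields[3]), fields[5]
    if flag & 0x4 or cigar == "*":  # FLAG bit 0x4: segment unmapped
        return None
    # Sum the lengths of operations that consume reference bases (matches, deletions, introns)
    ref_len = sum(int(n) for n, op in CIGAR_OP.findall(cigar) if op in REF_CONSUMING)
    strand = "-" if flag & 0x10 else "+"  # FLAG bit 0x10: SEQ is reverse-complemented
    start = pos - 1
    return qname, rname, start, start + ref_len, strand


spliced = "r1\t16\tchr1\t101\t60\t10M5N20M\t*\t0\t0\t" + "A" * 30 + "\t" + "I" * 30
unmapped = "r2\t4\t*\t0\t0\t*\t*\t0\t0\tACGT\tIIII"
```

For the spliced read `10M5N20M` at POS 101, the 5-base `N` gap counts toward the reference span, so the read covers 0-based positions 100 up to (but not including) 135 on the minus strand.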
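The Features entry refers to GFF_Reader and GenomicFeature. To show what parsing an annotation line involves, here is a hypothetical stdlib-only helper (not HTSeq's API) that accepts both GFF3 attributes (`key=value`) and GTF attributes (`key "value"`). It is only a sketch: it does not decode URL-escaped characters in GFF3 values.

```python
from typing import Dict, Optional


def parse_gff_line(line: str) -> Optional[Dict]:
    """Parse one GFF3 or GTF feature line (hypothetical helper, not HTSeq's API).

    Returns None for comments and blank lines. Coordinates are converted from
    GFF's 1-based, closed [start, end] to 0-based, half-open [start, end).
    """
    if not line.strip() or line.startswith("#"):
        return None
    seqid, source, ftype, start, end, _score, strand, _phase, attr_field = (
        line.rstrip("\r\n").split("\t")
    )
    attrs: Dict[str, str] = {}
    for item in attr_field.split(";"):
        item = item.strip()
        if not item:
            continue
        key, sep, value = item.partition("=")
        if not sep or " " in key:  # GTF style: key "value"
            key, _, value = item.partition(" ")
        attrs[key] = value.strip().strip('"')
    return {"chrom": seqid, "source": source, "type": ftype,
            "start": int(start) - 1, "end": int(end), "strand": strand, "attr": attrs}


gtf = 'chr1\ttest\texon\t1001\t1200\t.\t+\t.\tgene_id "gene1"; transcript_id "tx1";'
gff3 = "chr1\ttest\tgene\t1001\t2000\t.\t-\t.\tID=gene1;Name=ABC"
feature = parse_gff_line(gtf)
```

The GTF exon listed as 1001 to 1200 becomes the 0-based interval starting at 1000 and ending before 1200, the same 200 bases.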
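The Other parsers entry lists BED among the supported formats. One detail worth illustrating is coordinates: BED uses 0-based, half-open intervals, whereas GFF and SAM positions are 1-based. The helper below is a hypothetical stdlib-only sketch, not HTSeq's BED reader; it assumes tab-separated lines and ignores columns beyond the sixth.

```python
from typing import Dict, Optional


def parse_bed_line(line: str) -> Optional[Dict]:
    """Parse one BED line (hypothetical helper, not HTSeq's API).

    BED is already 0-based and half-open, so chromStart/chromEnd are used as-is;
    only the first three columns are mandatory.
    """
    if not line.strip() or line.startswith(("#", "track", "browser")):
        return None
    fields = line.rstrip("\r\n").split("\t")
    score = fields[4] if len(fields) > 4 else "."
    return {
        "chrom": fields[0],
        "start": int(fields[1]),
        "end": int(fields[2]),
        "name": fields[3] if len(fields) > 3 else None,
        "score": None if score == "." else float(score),
        "strand": fields[5] if len(fields) > 5 else ".",
    }


peak = parse_bed_line("chr2\t500\t800\tpeak1\t37\t+")
```

Unlike the GFF and SAM cases, no offset is subtracted here.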
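The htseq-qa entry describes plots of base calls and base-call qualities by position in the read. The numbers behind such plots can be sketched as below. This is a simplified, hypothetical summary (base counts and mean Phred score per position), not htseq-qa's implementation, and it draws no plots; `per_position_stats` is an illustrative name.

```python
from collections import Counter
from typing import Iterable, List, Sequence, Tuple


def per_position_stats(
    reads: Iterable[Tuple[str, Sequence[int]]]
) -> Tuple[List[Counter], List[float]]:
    """reads: (sequence, phred_scores) pairs, possibly of different lengths.

    Returns one Counter of base calls and one mean quality per read position.
    """
    base_counts: List[Counter] = []
    qual_sums: List[int] = []
    n_at_pos: List[int] = []
    for seq, qual in reads:
        for i, (base, q) in enumerate(zip(seq, qual)):
            if i == len(base_counts):  # first read long enough to reach position i
                base_counts.append(Counter())
                qual_sums.append(0)
                n_at_pos.append(0)
            base_counts[i][base] += 1
            qual_sums[i] += q
            n_at_pos[i] += 1
    mean_quality = [s / n for s, n in zip(qual_sums, n_at_pos)]
    return base_counts, mean_quality


counts, means = per_position_stats([("ACG", [30, 20, 10]), ("AGT", [40, 20, 0])])
```

For these two reads the mean qualities by position are 35.0, 20.0 and 5.0; a falling curve like this is the kind of pattern such plots make visible.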
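The Counting reads chapter and the htseq-count entry are about deciding which feature, if any, a read is counted for. htseq-count's documentation describes its overlap modes in terms of S(i), the set of features overlapping position i of the read: `union` takes the union of all S(i), `intersection-strict` their intersection, and `intersection-nonempty` the intersection of the non-empty S(i). A single resulting feature is counted, an empty result goes to `__no_feature`, and more than one goes to `__ambiguous`. The sketch below re-implements only that decision rule in plain Python, assuming the per-position sets are already known (in a real script they would come from a GenomicArrayOfSets lookup). The function names are hypothetical, and `resolve` is where custom counting logic would go.

```python
from collections import Counter
from functools import reduce
from typing import Iterable, List, Set


def resolve(position_sets: List[Set[str]], mode: str = "union") -> str:
    """Combine a read's per-position feature sets S(i) following one of htseq-count's modes."""
    sets = [frozenset(s) for s in position_sets]
    if mode == "union":
        combined = frozenset().union(*sets)
    elif mode == "intersection-strict":
        combined = reduce(frozenset.intersection, sets) if sets else frozenset()
    elif mode == "intersection-nonempty":
        nonempty = [s for s in sets if s]
        combined = reduce(frozenset.intersection, nonempty) if nonempty else frozenset()
    else:
        raise ValueError(f"unknown mode: {mode}")
    if not combined:
        return "__no_feature"
    if len(combined) > 1:
        return "__ambiguous"
    return next(iter(combined))


def count_reads(reads: Iterable[List[Set[str]]], mode: str = "union") -> Counter:
    """Tally one resolved label per read."""
    return Counter(resolve(sets, mode) for sets in reads)


example_reads = [
    [{"geneA"}, {"geneA"}, {"geneA", "geneB"}],  # ends where geneA and geneB overlap
    [{"geneA"}, {"geneA"}, set()],               # runs past the end of geneA
    [set(), set(), set()],                       # outside all features
]
```

The three example reads show why the mode matters: under `union` the first read is ambiguous, under `intersection-strict` the second is treated as lying outside all features, and `intersection-nonempty` assigns both of them to geneA.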
      Appendices

        HTSeq is developed by Simon Anders at EMBL Heidelberg (Genome Biology Unit). Please do not hesitate to contact me (anders at embl dot de) if you have any comments or questions.


        HTSeq is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.

        This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

        The full text of the GNU General Public License, version 3, can be found here:


        © Copyright 2010, Simon Anders. Created using Sphinx 1.2.2.
