MayaChemTools


NAME

InfoSequenceFiles.pl - List information about sequence and alignment files

SYNOPSIS

InfoSequenceFiles.pl SequenceFile(s) AlignmentFile(s)...

InfoSequenceFiles.pl [-a, --all] [-c, --count] [-d, --detail InfoLevel] [-f, --frequency] [--FrequencyBins number | "number,number,[number,...]"] [-h, --help] [-i, --IgnoreGaps yes | no] [-l, --longest] [-s, --shortest] [--SequenceLengths] [-w, --WorkingDir dirname] SequenceFile(s)...

DESCRIPTION

List information about the contents of SequenceFile(s) and AlignmentFile(s): number of sequences, shortest and longest sequences, distribution of sequence lengths and so on. The file names are separated by spaces. All the sequence files in the current directory can be specified by *.aln, *.msf, *.fasta, *.fta, *.pir or any other supported format; additionally, DirName corresponds to all the sequence files in the current directory with any of the supported file extensions: .aln, .msf, .fasta, .fta, and .pir.

Supported sequence formats are: ALN/ClustalW, GCG/MSF, PILEUP/MSF, Pearson/FASTA, and NBRF/PIR. File formats are detected by parsing the contents of SequenceFile(s) and AlignmentFile(s) rather than by relying on file extensions.
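As an illustration of content-based detection, here is a minimal Python sketch. It is not the logic used by InfoSequenceFiles.pl; the function name guess_sequence_format and the heuristics (a leading "CLUSTAL" header, a ">XX;" type code for PIR, a leading ">" for FASTA, an "MSF:" line plus a "//" divider for MSF) are assumptions based on the usual layout of these formats.

```python
import re


def guess_sequence_format(text):
    """Guess a sequence file format from its contents (hypothetical helper)."""
    # Work on non-empty lines only, with surrounding whitespace removed.
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    if not lines:
        return "unknown"
    first = lines[0]
    # ClustalW alignments normally begin with a header starting with "CLUSTAL".
    if first.upper().startswith("CLUSTAL"):
        return "ALN/ClustalW"
    # NBRF/PIR entries begin with ">XX;" where XX is a two-character type code.
    if re.match(r"^>[A-Za-z][A-Za-z0-9];", first):
        return "NBRF/PIR"
    # Pearson/FASTA entries begin with ">" followed by a free-form description.
    if first.startswith(">"):
        return "Pearson/FASTA"
    # GCG and PILEUP MSF files carry an "MSF:" header line and a "//" divider.
    if any("MSF:" in line for line in lines) and "//" in lines:
        return "MSF"
    return "unknown"
```

A FASTA header that happens to look like a PIR type code would be misclassified by this sketch, so it should be read as a demonstration of the idea only.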

OPTIONS

-a, --all

List all the available information.

-c, --count

List the number of sequences. This is the default behavior.

-d, --detail InfoLevel

Level of information to print about sequences for the selected options. Default: 1. Possible values: 1, 2 or 3.

-f, --frequency

List distribution of sequence lengths using the number of bins or the bin range specified by the --FrequencyBins option.

This option is ignored for input files containing only a single sequence.

--FrequencyBins number | "number,number,[number,...]"

This value is used with the -f, --frequency option to list distribution of sequence lengths. A single number specifies the number of bins; a quoted, comma-delimited list specifies a bin range. Default value: 10.

The bin range list is used to group sequence lengths into different groups; it must contain values in ascending order. Examples:

100,200,300,400,500,600
200,400,600,800,1000

The frequency value calculated for a specific bin corresponds to all the sequence lengths which are greater than the previous bin value and less than or equal to the current bin value.
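The bin rule above (greater than the previous bin value, less than or equal to the current one) can be sketched in Python as follows. This is an illustration, not the script's implementation: the function name bin_sequence_lengths is hypothetical, and counting lengths above the last bin value in a separate overflow tally is an assumption, since the text does not say how such lengths are reported.

```python
from bisect import bisect_left


def bin_sequence_lengths(lengths, bin_edges):
    """Count lengths per bin, where each bin covers (previous_edge, current_edge]."""
    edges = list(bin_edges)
    # The bin range list must be in ascending order.
    if any(a >= b for a, b in zip(edges, edges[1:])):
        raise ValueError("bin values must be in ascending order")
    counts = {edge: 0 for edge in edges}
    overflow = 0  # Assumption: lengths above the last bin value are tallied here.
    for length in lengths:
        # bisect_left finds the first edge that is >= length, which is exactly
        # the bin whose upper bound includes this length.
        index = bisect_left(edges, length)
        if index == len(edges):
            overflow += 1
        else:
            counts[edges[index]] += 1
    return counts, overflow


# Example sequence lengths (made up for illustration) and the first bin range above.
counts, overflow = bin_sequence_lengths(
    [95, 100, 101, 250, 300, 720], [100, 200, 300, 400, 500, 600]
)
```

In this example a length of 100 falls in the 100 bin while 101 falls in the 200 bin, which reflects the "less than or equal to the current bin value" rule.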

-h, --help

Print this help message.

-i, --IgnoreGaps yes | no

Ignore gaps during calculation of sequence lengths. Possible values: yes or no. Default value: no.
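The effect of this option on a sequence length can be sketched in Python as below. The helper name sequence_length and the choice of "-" and "." as gap characters are assumptions made for illustration; the characters InfoSequenceFiles.pl actually treats as gaps are not stated here.

```python
# Assumption: "-" and "." mark gaps, as is common in ALN and MSF alignments.
GAP_CHARACTERS = "-."


def sequence_length(sequence, ignore_gaps=False):
    """Return the sequence length, optionally skipping gap characters."""
    if ignore_gaps:
        return sum(1 for residue in sequence if residue not in GAP_CHARACTERS)
    return len(sequence)


aligned = "AC--GT.A"
length_with_gaps = sequence_length(aligned)                       # counts all 8 positions
length_without_gaps = sequence_length(aligned, ignore_gaps=True)  # counts the 5 residues
```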

-l, --longest

List information about the longest sequence: ID, sequence and sequence length. This option is ignored for input files containing only a single sequence.

-s, --shortest

List information about the shortest sequence: ID, sequence and sequence length. This option is ignored for input files containing only a single sequence.
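A minimal Python sketch of the information reported by -l, --longest and -s, --shortest: given (ID, sequence) pairs, it returns the ID, sequence and length of the shortest and longest entries. The function name shortest_and_longest and the record layout are hypothetical. Returning the first entry on ties is an assumption, and gaps are counted as part of the length, as with --IgnoreGaps no.

```python
def shortest_and_longest(records):
    """Return (id, sequence, length) for the shortest and longest sequences.

    records is a list of (sequence_id, sequence) pairs. Returns None for
    input with fewer than two sequences, mirroring the note that these
    options are ignored for single-sequence files.
    """
    if len(records) < 2:
        return None
    with_lengths = [(seq_id, seq, len(seq)) for seq_id, seq in records]
    # min and max return the first matching entry when lengths are tied.
    shortest = min(with_lengths, key=lambda item: item[2])
    longest = max(with_lengths, key=lambda item: item[2])
    return shortest, longest


# Made-up records for illustration.
example = [("seqA", "MKTAYIAK"), ("seqB", "MKT"), ("seqC", "MKTAYIAKQRQ")]
result = shortest_and_longest(example)
```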

--SequenceLengths

List information about sequence lengths.

-w, --WorkingDir dirname

Location of working directory. Default: current directory.

EXAMPLES

To count the number of sequences in sequence files, type:

% InfoSequenceFiles.pl Sample1.fasta
% InfoSequenceFiles.pl Sample1.msf Sample1.aln Sample1.pir
% InfoSequenceFiles.pl *.fasta *.fta *.msf *.pir *.aln

To list all available information with maximum level of available detail for a sequence alignment file Sample1.msf, type:

% InfoSequenceFiles.pl -a -d 3 Sample1.msf

To list sequence length information after ignoring sequence gaps in Sample1.aln file, type:

% InfoSequenceFiles.pl --SequenceLengths --IgnoreGaps Yes Sample1.aln

To list shortest and longest sequence length information after ignoring sequence gaps in Sample1.aln file, type:

% InfoSequenceFiles.pl --longest --shortest --IgnoreGaps Yes Sample1.aln

To list distribution of sequence lengths after ignoring sequence gaps in Sample1.aln file and report the frequency distribution into 10 bins, type:

% InfoSequenceFiles.pl --frequency --FrequencyBins 10 --IgnoreGaps Yes Sample1.aln

To list distribution of sequence lengths after ignoring sequence gaps in Sample1.aln file and report the frequency distribution into specified bin range, type:

% InfoSequenceFiles.pl --frequency --FrequencyBins "150,200,250,300,350" --IgnoreGaps Yes Sample1.aln

AUTHOR

Manish Sud

SEE ALSO

AnalyzeSequenceFilesData.pl, ExtractFromSequenceFiles.pl, InfoAminoAcids.pl, InfoNucleicAcids.pl

COPYRIGHT

Copyright (C) 2024 Manish Sud. All rights reserved.

This file is part of MayaChemTools.

MayaChemTools is free software; you can redistribute it and/or modify it under the terms of the GNU Lesser General Public License as published by the Free Software Foundation; either version 3 of the License, or (at your option) any later version.

 

 

July 1, 2024