InfoSequenceFiles.pl - List information about sequence and alignment files
InfoSequenceFiles.pl SequenceFile(s) AlignmentFile(s)...
InfoSequenceFiles.pl [-a, --all] [-c, --count] [-d, --detail infolevel] [-f, --frequency] [--FrequencyBins number | "number, number, [number,...]"] [-h, --help] [-i, --IgnoreGaps yes | no] [-l, --longest] [-s, --shortest] [--SequenceLengths] [-w, --workingdir dirname] SequenceFile(s)...
List information about the contents of SequenceFile(s) and AlignmentFile(s): number of sequences, shortest and longest sequences, distribution of sequence lengths, and so on. File names are separated by spaces. All the sequence files in the current directory can be specified with *.aln, *.msf, *.fasta, *.fta, *.pir, or any other supported format; additionally, DirName corresponds to all the sequence files in the current directory with any of the supported file extensions: .aln, .msf, .fasta, .fta, and .pir.
Supported sequence formats are: ALN/ClustalW, GCG/MSF, PILEUP/MSF, Pearson/FASTA, and NBRF/PIR. File formats are detected by parsing the contents of SequenceFile(s) and AlignmentFile(s) rather than by relying on file extensions.
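As an illustration of content-based format detection, the following Python sketch guesses a format from well-known markers in the file text. It is not the script's actual detection logic: the function name guess_sequence_format and the specific checks (a leading "CLUSTAL" line, an "MSF:" header line ending in "..", a ">XX;" PIR header, a ">" FASTA header) are assumptions chosen for this example.

```python
import re


def guess_sequence_format(text):
    """Guess a sequence file format from its contents (hypothetical helper)."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    if not lines:
        return None
    first = lines[0]
    # ClustalW alignments conventionally begin with a "CLUSTAL ..." banner.
    if first.upper().startswith("CLUSTAL"):
        return "ALN/ClustalW"
    # GCG/PILEUP MSF files carry a header line such as "MSF: 4 Type: P Check: 1 ..".
    if any("MSF:" in ln and ln.endswith("..") for ln in lines):
        return "MSF"
    # NBRF/PIR entries start with ">", a two-character type code, and ";".
    # This must be tested before FASTA, which also starts with ">".
    if re.match(r"^>[A-Z][A-Z0-9];", first):
        return "NBRF/PIR"
    if first.startswith(">"):
        return "Pearson/FASTA"
    return None


print(guess_sequence_format(">seq1 sample\nMKLV\n"))
```

Checking the more specific PIR header before the generic FASTA header avoids misclassifying PIR files, since both begin with ">".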
-a, --all
List all the available information.
-c, --count
List the number of sequences. This is the default behavior.
-d, --detail infolevel
Level of information to print about sequences for the various options. Possible values: 1, 2, or 3. Default: 1.
-f, --frequency
List the distribution of sequence lengths using the number of bins or the bin range specified with the --FrequencyBins option.
This option is ignored for input files containing only a single sequence.
--FrequencyBins number | "number, number, [number,...]"
This value is used with the -f, --frequency option to list the distribution of sequence lengths using the specified number of bins or bin range. Default value: 10.
The bin range list is used to group sequence lengths into different groups; it must contain values in ascending order. Examples:
The frequency value calculated for a specific bin corresponds to all the sequence lengths which are greater than the previous bin value and less than or equal to the current bin value.
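The binning rule can be sketched in Python as follows. This is a minimal illustration, not the script's implementation: the function name bin_length_frequencies, the sample lengths, and the bin values are made up for the example, and tallying lengths above the last bin value in a separate overflow count is an assumption, since that case is not described here.

```python
from bisect import bisect_left


def bin_length_frequencies(lengths, bins):
    """Count lengths per bin, where bin i covers (bins[i-1], bins[i]]."""
    if list(bins) != sorted(bins):
        raise ValueError("bin values must be in ascending order")
    counts = [0] * len(bins)
    overflow = 0  # assumption: lengths above the last bin value are tallied separately
    for length in lengths:
        # bisect_left returns the first index i with bins[i] >= length,
        # i.e. previous bin value < length <= current bin value.
        i = bisect_left(bins, length)
        if i < len(bins):
            counts[i] += 1
        else:
            overflow += 1
    return counts, overflow


# Hypothetical lengths and bin range, for illustration only.
counts, overflow = bin_length_frequencies([95, 100, 101, 250, 300, 420], [100, 200, 300])
print(counts, overflow)  # [2, 1, 2] 1
```

A length equal to a bin value (100 or 300 above) falls into that bin rather than the next one, matching the "less than or equal to the current bin value" rule.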
-h, --help
Print this help message.
-i, --IgnoreGaps yes | no
Ignore gaps when calculating sequence lengths. Possible values: yes or no. Default value: no.
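The effect of ignoring gaps on a sequence length can be sketched in Python as below. The function name sequence_length and the set of gap characters ("-" and ".") are assumptions for this example; the characters the script itself treats as gaps are not stated here.

```python
GAP_CHARS = "-."  # assumption: common gap symbols in alignment files


def sequence_length(sequence, ignore_gaps=False):
    """Return the sequence length, optionally skipping gap characters."""
    if ignore_gaps:
        return sum(1 for ch in sequence if ch not in GAP_CHARS)
    return len(sequence)


aligned = "MK--LV.A"  # hypothetical aligned sequence
print(sequence_length(aligned))                    # 8
print(sequence_length(aligned, ignore_gaps=True))  # 5
```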
-l, --longest
List information about the longest sequence: ID, sequence, and sequence length. This option is ignored for input files containing only a single sequence.
-s, --shortest
List information about the shortest sequence: ID, sequence, and sequence length. This option is ignored for input files containing only a single sequence.
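A minimal Python sketch of the shortest and longest sequence lookup is shown below. The function name shortest_and_longest and the dictionary input are assumptions for illustration; returning None for fewer than two sequences mirrors the note that these options are ignored for single-sequence files, and ties are resolved in favor of the first sequence encountered, which may differ from the script's behavior.

```python
def shortest_and_longest(sequences):
    """Return ((id, seq, length), (id, seq, length)) for shortest and longest.

    `sequences` maps sequence IDs to sequence strings (hypothetical input shape).
    """
    if len(sequences) < 2:
        return None  # nothing to compare for a single sequence

    def describe(seq_id):
        seq = sequences[seq_id]
        return seq_id, seq, len(seq)

    shortest_id = min(sequences, key=lambda k: len(sequences[k]))
    longest_id = max(sequences, key=lambda k: len(sequences[k]))
    return describe(shortest_id), describe(longest_id)


# Hypothetical data, for illustration only.
print(shortest_and_longest({"A": "MKLV", "B": "MK", "C": "MKLVAA"}))
```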
--SequenceLengths
List information about sequence lengths.
-w, --workingdir dirname
Location of working directory. Default: current directory.
To count the number of sequences in sequence files, type:
To list all available information with the maximum level of detail for a sequence alignment file Sample1.msf, type:
To list sequence length information after ignoring sequence gaps in the Sample1.aln file, type:
To list shortest and longest sequence length information after ignoring sequence gaps in the Sample1.aln file, type:
To list the distribution of sequence lengths after ignoring sequence gaps in the Sample1.aln file and report the frequency distribution in 10 bins, type:
To list the distribution of sequence lengths after ignoring sequence gaps in the Sample1.aln file and report the frequency distribution using a specified bin range, type:
AnalyzeSequenceFilesData.pl, ExtractFromSequenceFiles.pl, InfoAminoAcids.pl, InfoNucleicAcids.pl
Copyright (C) 2024 Manish Sud. All rights reserved.
This file is part of MayaChemTools.
MayaChemTools is free software; you can redistribute it and/or modify it under the terms of the GNU Lesser General Public License as published by the Free Software Foundation; either version 3 of the License, or (at your option) any later version.