ExtractFromSequenceFiles.pl - Extract data from sequence and alignment files
ExtractFromSequenceFiles.pl SequenceFile(s) AlignmentFile(s)...
ExtractFromSequenceFiles.pl [-h, --help] [-i, --IgnoreGaps yes | no] [-m, --mode SequenceID | SequenceNum | SequenceNumRange] [-o, --overwrite] [-r, --root rootname] [-s, --Sequences "SequenceID, [SequenceID,...]" | "SequenceNum, [SequenceNum,...]" | "StartingSeqNum, EndingSeqNum"] [--SequenceIDMatch Exact | Relaxed] [-w, --WorkingDir dirname] SequenceFile(s) AlignmentFile(s)...
Extract specific data from SequenceFile(s) and AlignmentFile(s) and generate FASTA files. You can extract sequences using sequence IDs or sequence numbers.
The file names are separated by spaces. All the sequence files in the current directory can be specified by *.aln, *.msf, *.fasta, *.fta, *.pir, or any other supported format; additionally, DirName corresponds to all the sequence files in the current directory with any of the supported file extensions: .aln, .msf, .fasta, .fta, and .pir.
Supported sequence formats are: ALN/ClustalW, GCG/MSF, PILEUP/MSF, Pearson/FASTA, and NBRF/PIR. Instead of using file extensions, file formats are detected by parsing the contents of SequenceFile(s) and AlignmentFile(s).
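As an illustration of content-based format detection, here is a minimal Python sketch. It is not the script's actual detection logic: the function name guess_format is hypothetical, and the heuristics (a leading "CLUSTAL" line for ALN, a two-character type code followed by ";" for PIR, a leading ">" for FASTA, and an "MSF:" header line for MSF) are assumptions based on the usual layout of these formats.

```python
import re


def guess_format(text):
    """Guess a sequence file format from its contents (hypothetical helper)."""
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    if not lines:
        return None
    first = lines[0]
    # ClustalW alignments conventionally begin with a "CLUSTAL ..." banner line.
    if first.upper().startswith("CLUSTAL"):
        return "ALN"
    # NBRF/PIR entries begin with ">" plus a two-character type code and ";".
    if re.match(r">[A-Z][A-Z0-9];", first):
        return "PIR"
    # Pearson/FASTA entries begin with ">" followed by the sequence ID.
    if first.startswith(">"):
        return "FASTA"
    # GCG/PILEUP MSF files carry a header line containing "MSF:".
    if any("MSF:" in line for line in lines):
        return "MSF"
    return None
```

A file that matches none of these heuristics yields None in this sketch; how the script itself treats unrecognized files is not stated here.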
Print this help message.
Ignore gaps or gap columns during generation of new sequence or alignment file(s). Possible values: yes or no. Default value: yes.
To remove gap columns, all the sequences must have the same length; otherwise, this option is ignored.
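The equal-length requirement can be illustrated with a short Python sketch. It is not taken from the script: the function name remove_gap_columns is hypothetical, and it assumes that a gap column is one in which every sequence has a gap character and that "-" and "." are the gap characters.

```python
def remove_gap_columns(sequences, gap_chars="-."):
    """Drop columns in which every sequence has a gap (hypothetical helper)."""
    lengths = {len(seq) for seq in sequences}
    if len(lengths) != 1:
        # Unequal lengths: columns cannot be compared, so return the input unchanged.
        return list(sequences)
    length = lengths.pop()
    # Keep a column if at least one sequence has a non-gap character there.
    keep = [
        col for col in range(length)
        if not all(seq[col] in gap_chars for seq in sequences)
    ]
    return ["".join(seq[col] for col in keep) for seq in sequences]
```

With sequences of different lengths the function returns them untouched, mirroring the "option is ignored" behavior described above.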
Specify how to extract data from sequence files: extract sequences using sequence IDs or sequence numbers. Possible values: SequenceID | SequenceNum | SequenceNumRange. Default: SequenceNum with value of 1.
Sequence numbers correspond to the position of sequences, starting from 1 for the first sequence in SequenceFile(s) and AlignmentFile(s).
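The 1-based numbering can be sketched in Python as follows. The function name select_by_number is hypothetical, and silently skipping out-of-range numbers is an assumption; the script may report such values differently.

```python
def select_by_number(records, numbers):
    """Return the records at the given 1-based positions (hypothetical helper)."""
    selected = []
    for num in numbers:
        # Sequence number 1 is the first record, so subtract 1 for list indexing.
        if 1 <= num <= len(records):
            selected.append(records[num - 1])
        # Assumption: numbers outside the valid range are skipped.
    return selected
```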
Overwrite existing files.
The new sequence file name is generated using the root: <Root><Mode>.<Ext>. Default new file name: <SequenceFileName><Mode>.<Ext>. This option is ignored for multiple input files.
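The naming rule can be sketched in Python as below. The function name output_file_name and its parameters are hypothetical. The "fasta" extension is an assumption based on the output names in the usage examples, and <SequenceFileName> is assumed to be the input file name without its extension.

```python
from pathlib import Path


def output_file_name(input_file, mode="SequenceNum", root=None, ext="fasta", n_inputs=1):
    """Build <Root><Mode>.<Ext> for a new sequence file (hypothetical helper)."""
    # The root option is ignored when more than one input file is given.
    if root is None or n_inputs > 1:
        # Default root: the input file name without its extension (assumption).
        root = Path(input_file).stem
    return f"{root}{mode}.{ext}"
```

For instance, output_file_name("Sample1.aln") gives "Sample1SequenceNum.fasta", which matches the file names used in the usage examples.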
This value is -m, --mode specific. In general, it's a comma-delimited list of sequence IDs or sequence numbers.
For SequenceID value of -m, --mode option, input value format is: SequenceID,.... Examples:
For SequenceNum value of -m, --mode option, input value format is: SequenceNum,.... Examples:
For SequenceNumRange value of -m, --mode option, input value format is: StartingSeqNum,EndingSeqNum. Examples:
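The three mode-specific value formats above could be parsed along these lines. This is a Python sketch only, not the script's own parsing code: the function name parse_sequences_value is hypothetical, stripping whitespace around commas is an assumption, and so is rejecting a range whose start exceeds its end.

```python
def parse_sequences_value(value, mode):
    """Turn a comma-delimited sequences value into IDs or numbers (hypothetical helper)."""
    # Assumption: whitespace around items is not significant.
    items = [item.strip() for item in value.split(",") if item.strip()]
    if mode == "SequenceID":
        return items
    numbers = [int(item) for item in items]
    if mode == "SequenceNum":
        return numbers
    if mode == "SequenceNumRange":
        if len(numbers) != 2:
            raise ValueError("SequenceNumRange needs StartingSeqNum,EndingSeqNum")
        start, end = numbers
        if start > end:
            raise ValueError("StartingSeqNum must not exceed EndingSeqNum")
        # Expand the inclusive range into individual sequence numbers.
        return list(range(start, end + 1))
    raise ValueError(f"unknown mode: {mode}")
```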
Sequence ID matching criterion for the SequenceID value of the -m, --mode option: match the specified sequence ID exactly or as a substring against sequence IDs in the files. Possible values: Exact | Relaxed. Default: Relaxed. Sequence ID matching is case insensitive for both values.
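The difference between the two matching criteria can be shown with a small Python sketch. The function name id_matches is hypothetical, and the code only mirrors the behavior described above rather than the script's implementation.

```python
def id_matches(specified_id, file_id, match="Relaxed"):
    """Compare a specified sequence ID with an ID from a file (hypothetical helper)."""
    # Matching is case insensitive for both criteria.
    specified, candidate = specified_id.lower(), file_id.lower()
    if match == "Exact":
        return specified == candidate
    if match == "Relaxed":
        # Relaxed: the specified ID may be a substring of the ID in the file.
        return specified in candidate
    raise ValueError(f"unknown match criterion: {match}")
```

With the Relaxed criterion, "q9p993" matches the ID "Q9P993/104-387"; with Exact, it does not.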
Maximum sequence length per line in sequence file(s). Default: 80.
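The effect of a maximum line length on FASTA output can be sketched in Python as follows. The function name format_fasta_record is hypothetical, and the record layout (a ">" header line followed by wrapped sequence lines) is the conventional FASTA layout, assumed here rather than taken from the script.

```python
def format_fasta_record(seq_id, sequence, max_length=80):
    """Format one FASTA record with wrapped sequence lines (hypothetical helper)."""
    if max_length < 1:
        raise ValueError("max_length must be at least 1")
    lines = [f">{seq_id}"]
    # Split the sequence into chunks of at most max_length characters.
    lines.extend(
        sequence[start:start + max_length]
        for start in range(0, len(sequence), max_length)
    )
    return "\n".join(lines) + "\n"
```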
Location of working directory. Default: current directory.
To extract first sequence from Sample1.fasta sequence file and generate Sample1SequenceNum.fasta sequence file, type:
To extract first sequence from Sample1.aln alignment file and generate Sample1SequenceNum.fasta sequence file without any column gaps, type:
To extract first sequence from Sample1.aln alignment file and generate Sample1SequenceNum.fasta sequence file with column gaps, type:
To extract sequence number 1 and 4 from Sample1.fasta sequence file and generate Sample1SequenceNum.fasta sequence file, type:
To extract sequences from sequence number 1 to 4 from Sample1.fasta sequence file and generate Sample1SequenceNumRange.fasta sequence file, type:
To extract the sequence with ID "Q9P993/104-387" from Sample1.fasta sequence file and generate Sample1SequenceID.fasta sequence file, type:
AnalyzeSequenceFilesData.pl, InfoSequenceFiles.pl
Copyright (C) 2024 Manish Sud. All rights reserved.
This file is part of MayaChemTools.
MayaChemTools is free software; you can redistribute it and/or modify it under the terms of the GNU Lesser General Public License as published by the Free Software Foundation; either version 3 of the License, or (at your option) any later version.