RDKitRemoveInvalidMolecules.py - Remove invalid molecules
RDKitRemoveInvalidMolecules.py [--infileParams <Name,Value,...>] [--mode <remove or count>] [--outfileParams <Name,Value,...>] [--overwrite] [-w <dir>] [-o <outfile>] -i <infile>
RDKitRemoveInvalidMolecules.py -h | --help | -e | --examples
Identify and remove invalid molecules based on success or failure of RDKit molecule readers or simply count the number of invalid molecules.
The supported input file formats are: SD (.sdf, .sd), SMILES (.smi, .csv, .tsv, .txt)
The supported output file formats are: SD (.sdf, .sd), SMILES (.smi)
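The idea of treating reader success or failure as the validity test can be sketched in a few lines of Python. This is a minimal illustration, not the script's actual implementation: the helper names partition_by_reader and rdkit_smiles_reader are hypothetical, and the sketch assumes a reader that returns None for input it cannot parse, which is how RDKit's Chem.MolFromSmiles behaves.

```python
from typing import Callable, Iterable, List, Optional, Tuple


def partition_by_reader(
    records: Iterable[str],
    read_mol: Callable[[str], Optional[object]],
) -> Tuple[List[str], List[str]]:
    """Split records into (valid, invalid) based on whether read_mol succeeds.

    A record counts as invalid when the reader returns None or raises.
    """
    valid: List[str] = []
    invalid: List[str] = []
    for record in records:
        try:
            mol = read_mol(record)
        except Exception:
            mol = None
        if mol is not None:
            valid.append(record)
        else:
            invalid.append(record)
    return valid, invalid


def rdkit_smiles_reader() -> Callable[[str], Optional[object]]:
    """Return RDKit's SMILES reader (hypothetical helper; needs RDKit installed)."""
    from rdkit import Chem  # imported lazily so the sketch loads without RDKit

    return Chem.MolFromSmiles
```

With RDKit available, partition_by_reader(smiles_list, rdkit_smiles_reader()) returns the two lists; their lengths give the counts reported in count mode, and the first list is what remove mode would write out.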
Print examples.
Print this help message.
Input file name.
A comma delimited list of parameter name and value pairs for reading molecules from files. The supported parameter names for different file formats, along with their default values, are shown below:
Possible values for smilesDelimiter: space, comma or tab.
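A comma delimited Name,Value,... string of this kind can be turned into a dictionary with a short helper. The sketch below is an assumption about how such a string might be parsed, not the parser used by the script; parse_name_value_pairs and SMILES_DELIMITERS are hypothetical names, and only the three delimiter keywords listed above are taken from this documentation.

```python
from typing import Dict

# Keywords accepted for smilesDelimiter, mapped to the characters they denote.
SMILES_DELIMITERS: Dict[str, str] = {"space": " ", "comma": ",", "tab": "\t"}


def parse_name_value_pairs(spec: str) -> Dict[str, str]:
    """Parse 'Name,Value,Name,Value,...' into a dict of strings."""
    if not spec.strip():
        return {}
    tokens = [token.strip() for token in spec.split(",")]
    if len(tokens) % 2 != 0:
        raise ValueError(f"Expected name,value pairs but got {len(tokens)} items: {spec!r}")
    # Even positions hold names, odd positions hold the matching values.
    return dict(zip(tokens[0::2], tokens[1::2]))


def resolve_smiles_delimiter(params: Dict[str, str]) -> str:
    """Translate the smilesDelimiter keyword into the actual character."""
    keyword = params.get("smilesDelimiter", "space")  # default assumed for this sketch
    if keyword not in SMILES_DELIMITERS:
        raise ValueError(f"smilesDelimiter must be one of {sorted(SMILES_DELIMITERS)}")
    return SMILES_DELIMITERS[keyword]
```

For example, parse_name_value_pairs("smilesDelimiter,comma") yields a dictionary with a single entry, and resolve_smiles_delimiter then maps the keyword comma to the character ",".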
Specify whether to remove invalid molecules and write the remaining valid molecules to the output file, or simply count the number of invalid molecules.
Output file name.
A comma delimited list of parameter name and value pairs for writing molecules to files. The supported parameter names for different file formats, along with their default values, are shown below:
Default value for compute2DCoords: yes for SMILES input file; no for all other file types.
Overwrite existing files.
Location of working directory which defaults to the current directory.
To remove invalid molecules and generate an output SMILES file containing valid molecules, type:
To count the number of valid and invalid molecules without generating any output file, type:
To remove invalid molecules from a CSV SMILES file, with SMILES strings in column 1 and names in column 2, and generate an output SD file containing valid molecules, type:
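For scripted use, the three invocations described above can be assembled from the options in the synopsis. The sketch below only builds and prints the command lines. The file names (Sample.smi, SampleOut.smi, SampleSMILES.csv, SampleOut.sdf) and the build_command helper are hypothetical. The CSV example sets only smilesDelimiter, because the parameter names for choosing the SMILES and name columns are not listed in this section, and it omits compute2DCoords because that defaults to yes for SMILES input.

```python
import shlex
from typing import List, Optional

SCRIPT = "RDKitRemoveInvalidMolecules.py"


def build_command(
    infile: str,
    outfile: Optional[str] = None,
    mode: Optional[str] = None,
    infile_params: Optional[str] = None,
) -> List[str]:
    """Assemble an argument list using only options shown in the synopsis."""
    cmd = [SCRIPT]
    if mode:
        cmd += ["--mode", mode]
    if infile_params:
        cmd += ["--infileParams", infile_params]
    if outfile:
        cmd += ["-o", outfile]
    cmd += ["-i", infile]
    return cmd


examples = [
    # Remove invalid molecules and write a SMILES file of valid ones.
    build_command("Sample.smi", outfile="SampleOut.smi"),
    # Count valid and invalid molecules; no output file is written.
    build_command("Sample.smi", mode="count"),
    # Read a comma delimited SMILES file and write valid molecules to an SD file.
    build_command("SampleSMILES.csv", outfile="SampleOut.sdf",
                  infile_params="smilesDelimiter,comma"),
]

for cmd in examples:
    print(shlex.join(cmd))  # shlex.join needs Python 3.8 or later
```

Each list can be passed directly to subprocess.run(cmd, check=True) once the script is on the PATH.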
RDKitConvertFileFormat.py, RDKitRemoveDuplicateMolecules.py, RDKitRemoveSalts.py, RDKitSearchFunctionalGroups.py, RDKitSearchSMARTS.py, RDKitStandardizeMolecules.py
Copyright (C) 2024 Manish Sud. All rights reserved.
The functionality available in this script is implemented using RDKit, an open source toolkit for cheminformatics developed by Greg Landrum.
This file is part of MayaChemTools.
MayaChemTools is free software; you can redistribute it and/or modify it under the terms of the GNU Lesser General Public License as published by the Free Software Foundation; either version 3 of the License, or (at your option) any later version.