RDKitPerformRGroupDecomposition.py - Perform R group decomposition analysis
RDKitPerformRGroupDecomposition.py [--coreScaffold <ByMCS, BySMARTS or BySMILES>] [--decompositionParams <Name,Value,...>] [--infileParams <Name,Value,...>] [--mcsParams <Name,Value,...>] [--outfileParams <Name,Value,...>] [--overwrite] [--quote <yes or no>] [--removeUnmatched <yes or no>] [--smartsOrSmilesCoreScaffold <text>] [-w <dir>] -i <infile> -o <outfile>
RDKitPerformRGroupDecomposition.py -h | --help | -e | --examples
Perform R group decomposition for a set of molecules in a series containing a common core scaffold. The core scaffold is identified by a SMARTS string, SMILES string, or using maximum common substructure (MCS) search. Multiple core scaffolds may be specified using SMARTS or SMILES strings for a set of molecules corresponding to multiple series.
The core scaffolds along with appropriate R groups are written out as SMILES strings to an SD or text file. Molecules that do not match any specified core scaffold are written to a different output file.
The supported input file formats are: Mol (.mol), SD (.sdf, .sd), SMILES (.smi, .txt, .csv, .tsv)
The supported output file formats are: SD (.sdf, .sd), CSV/TSV (.csv, .tsv, .txt)
Specify a core scaffold for a set of molecules in a series. The core scaffold is identified by an explicit SMARTS string, SMILES string, or using maximum common substructure (MCS) search. Multiple core scaffolds may be specified using SMARTS or SMILES strings for a set of molecules corresponding to multiple series.
Parameter values to use during R group decomposition for a series of molecules. In general, it is a comma delimited list of parameter name and value pairs. The supported parameter names along with their default values are shown below:
A brief description of each supported parameter taken from RDKit documentation, along with their possible values, is as follows.
RGroupCoreAlignment - Mapping of core labels:
RGroupMatching: Greedy, GreedyChunks, Exhaustive
matchOnlyAtRGroups - Allow R group decomposition only at specified R groups. Possible values: yes, no.
removeHydrogenOnlyGroups - Remove all R groups that only have hydrogens. Possible values: yes, no.
removeHydrogensPostMatch - Remove all hydrogens from the output molecules. Possible values: yes, no.
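The parameters above correspond to settings on RDKit's R group decomposition parameters object. The following is a minimal sketch of how they might be applied directly through the RDKit Python API, not the script's actual implementation. The mapping from the script's parameter names to the RDKit attribute names (for example, matchOnlyAtRGroups to onlyMatchAtRGroups) is an assumption, and the core and molecules are illustrative only.

```python
from rdkit import Chem
from rdkit.Chem import rdRGroupDecomposition as rgd

# Assumed mapping of script parameter names to RDKit attributes.
params = rgd.RGroupDecompositionParameters()
params.matchingStrategy = rgd.RGroupMatching.GreedyChunks  # RGroupMatching
params.onlyMatchAtRGroups = False        # matchOnlyAtRGroups,no
params.removeAllHydrogenRGroups = True   # removeHydrogenOnlyGroups,yes
params.removeHydrogensPostMatch = True   # removeHydrogensPostMatch,yes

# Illustrative core scaffold and a small series; the last molecule
# does not contain the core.
core = Chem.MolFromSmiles("c1ccccc1")
smiles = ["Cc1ccccc1", "OCc1ccccc1", "CCO"]
mols = [Chem.MolFromSmiles(s) for s in smiles]

# rows holds one dict per matched molecule (keys such as 'Core' and 'R1');
# unmatched holds the indices of molecules that matched no core.
rows, unmatched = rgd.RGroupDecompose([core], mols, asSmiles=True, options=params)

for row in rows:
    print(row)
print("Unmatched indices:", unmatched)
```

The unmatched indices returned here are the kind of information a caller would need to route molecules without a core scaffold to a separate file.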
Print examples.
Print this help message.
Input file name.
A comma delimited list of parameter name and value pairs for reading molecules from files. The supported parameter names for different file formats, along with their default values, are shown below:
Possible values for smilesDelimiter: space, comma or tab.
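The comma delimited name and value format used by the parameter options can be illustrated with a short sketch. The helper names parse_name_value_pairs and resolve_smiles_delimiter are hypothetical and are not part of the script; the default of space used below is also an assumption of this sketch.

```python
def parse_name_value_pairs(text):
    """Split 'Name,Value,Name,Value,...' into a dict of strings."""
    fields = [field.strip() for field in text.split(",")]
    if len(fields) % 2 != 0:
        raise ValueError("Expected name,value pairs, got: %r" % text)
    return dict(zip(fields[0::2], fields[1::2]))


# Keywords accepted for smilesDelimiter and the characters they stand for.
DELIMITERS = {"space": " ", "comma": ",", "tab": "\t"}


def resolve_smiles_delimiter(params, default="space"):
    """Translate the smilesDelimiter keyword into a delimiter character."""
    keyword = params.get("smilesDelimiter", default).lower()
    if keyword not in DELIMITERS:
        raise ValueError("Unsupported smilesDelimiter: %r" % keyword)
    return DELIMITERS[keyword]


params = parse_name_value_pairs("smilesDelimiter,tab")
delimiter = resolve_smiles_delimiter(params)
```

Validating the keyword up front gives a clear error for a misspelled value rather than a confusing failure later while reading the file.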
Parameter values to use for identifying a maximum common substructure (MCS) in a series of molecules. In general, it is a comma delimited list of parameter name and value pairs. The supported parameter names along with their default values are shown below:
Possible values for atomCompare: CompareAny, CompareElements, CompareIsotopes. Possible values for bondCompare: CompareAny, CompareOrder, CompareOrderExact.
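As a rough illustration of how the atomCompare and bondCompare values are used, the sketch below calls RDKit's rdFMCS.FindMCS directly on a small, illustrative series. It is not the script's own code; the molecules and the timeout value are assumptions chosen for the example.

```python
from rdkit import Chem
from rdkit.Chem import rdFMCS

# Illustrative series sharing a benzene ring.
smiles = ["Cc1ccccc1", "CCc1ccccc1", "Clc1ccccc1"]
mols = [Chem.MolFromSmiles(s) for s in smiles]

# atomCompare and bondCompare take enum members whose names match the
# values listed above.
result = rdFMCS.FindMCS(
    mols,
    atomCompare=rdFMCS.AtomCompare.CompareElements,
    bondCompare=rdFMCS.BondCompare.CompareOrder,
    timeout=60,  # seconds; assumed value for this sketch
)

# The MCS is reported as a SMARTS string, which can serve as a core scaffold.
core = Chem.MolFromSmarts(result.smartsString)
print(result.numAtoms, result.numBonds, result.smartsString)
```

Checking result.canceled before using the SMARTS string is advisable, since a search that hits the timeout may return only a partial substructure.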
A brief description of MCS parameters taken from RDKit documentation is as follows:
Output file name.
A comma delimited list of parameter name and value pairs for writing molecules to files. The supported parameter names for different file formats, along with their default values, are shown below:
Default value for compute2DCoords: yes for SMILES input file; no for all other file types. The kekulize and smilesIsomeric parameters are also used during generation of SMILES strings for CSV/TSV files.
Overwrite existing files.
Quote SMILES strings and molecule names before writing them out to text files. Possible values: yes or no. Default: yes for CSV (.csv) text files; no for TSV (.tsv) and TXT (.txt) text files.
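The extension-based default described above can be sketched as follows. The helper names default_quote and format_row are hypothetical, and the output file names are illustrative; this shows the idea and is not the script's own code.

```python
import os


def default_quote(outfile):
    """Return True for .csv files and False for .tsv and .txt files."""
    extension = os.path.splitext(outfile)[1].lower()
    return extension == ".csv"


def format_row(values, delimiter, quote):
    """Join values into one text line, wrapping each in double quotes if asked."""
    if quote:
        values = ['"%s"' % value for value in values]
    return delimiter.join(values)


row = ["c1ccccc1", "Mol1"]
csv_line = format_row(row, ",", default_quote("SampleOut.csv"))
tsv_line = format_row(row, "\t", default_quote("SampleOut.tsv"))
```

Quoting by default for CSV files is a sensible choice because SMARTS strings and molecule names can themselves contain commas.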
Remove unmatched molecules containing no specified core scaffold from the output file and write them to a different output file.
SMARTS or SMILES string to use for the core scaffold when 'BySMARTS' or 'BySMILES' is specified as the value of the '-c, --coreScaffold' option. Multiple core scaffolds may be specified using a comma delimited set of SMARTS or SMILES strings.
Location of working directory which defaults to the current directory.
To perform R group decomposition for a set of molecules in a series using MCS to identify a core scaffold and write out a CSV file containing R groups, type:
To perform R group decomposition for a set of molecules in a series using a specified core scaffold and write out an SD file containing R groups, type:
To perform R group decomposition for a set of molecules in a series using MCS to identify a core scaffold and write out CSV files containing matched and unmatched molecules without quoting values, type:
To perform R group decomposition for a set of molecules in multiple series using specified core scaffolds and write out a TSV file containing R groups, type:
To perform R group decomposition for a set of molecules in a CSV SMILES file, with SMILES strings in column 1 and names in column 2, and write out a CSV file containing R groups, type:
RDKitConvertFileFormat.py, RDKitSearchFunctionalGroups.py, RDKitSearchSMARTS.py
Copyright (C) 2024 Manish Sud. All rights reserved.
The functionality available in this script is implemented using RDKit, an open source toolkit for cheminformatics developed by Greg Landrum.
This file is part of MayaChemTools.
MayaChemTools is free software; you can redistribute it and/or modify it under the terms of the GNU Lesser General Public License as published by the Free Software Foundation; either version 3 of the License, or (at your option) any later version.