RDKitEnumerateCompoundLibrary.py - Enumerate a virtual compound library
RDKitEnumerateCompoundLibrary.py [--colmode <collabel or colnum>] [--colRxnName <text or number>] [--colRxnSMARTS <text or number>] [--compute2DCoords <yes or no>] [--infileParams <Name,Value,...>] [--mode <RxnByName or RxnBySMARTS>] [--outfileParams <Name,Value,...>] [--overwrite] [--prodMolNames <UseReactants or Sequential>] [--rxnName <text>] [--rxnNamesFile <FileName or auto>] [--smartsRxn <text>] [--sanitize <yes or no>] [-w <dir>] -i <ReactantFile1,...> -o <outfile>
RDKitEnumerateCompoundLibrary.py [--colmode <collabel or colnum>] [--colRxnName <text or number>] [--colRxnSMARTS <text or number>] [--rxnNamesFile <FileName or auto>] -l | --list
RDKitEnumerateCompoundLibrary.py -h | --help | -e | --examples
Perform a combinatorial enumeration of a virtual library of molecules for a reaction specified using a reaction name or SMARTS pattern and reactant input files.
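A combinatorial enumeration applies the reaction to every combination of one molecule from each reactant list. Before any filtering, the library therefore has as many reactant combinations as the product of the list sizes. The minimal sketch below shows only that combination step, using Python's `itertools.product`. The `run_reaction` callable and the string-based `toy_amide` function are hypothetical stand-ins, not MayaChemTools code. In RDKit, the corresponding chemistry step would be `ChemicalReaction.RunReactants`, which takes a tuple of reactant molecules and returns a tuple of product tuples.

```python
import itertools
from typing import Callable, Iterable, Iterator, Sequence, Tuple


def enumerate_library(
    reactant_lists: Sequence[Sequence[str]],
    run_reaction: Callable[[Tuple[str, ...]], Iterable[str]],
) -> Iterator[Tuple[Tuple[str, ...], str]]:
    """Yield (reactant_combination, product) pairs for every combination.

    reactant_lists holds one list per reaction component, in the same
    order as the reactant templates in the reaction SMARTS.
    run_reaction is a hypothetical callable returning zero or more
    products for one combination of reactants.
    """
    for combo in itertools.product(*reactant_lists):
        for product in run_reaction(combo):
            yield combo, product


# Toy "reaction" on plain strings, used only to illustrate the loop:
# it joins an acid label and an amine label into an "amide" label.
def toy_amide(combo):
    acid, amine = combo
    return [f"{acid}-{amine}"]


acids = ["AcidA", "AcidB"]
amines = ["Amine1", "Amine2", "Amine3"]
library = list(enumerate_library([acids, amines], toy_amide))
print(len(library))  # 2 acids x 3 amines -> 6 combinations
```

Because `itertools.product` is lazy, a generator like this never holds every combination in memory at once. That matters for large reactant files.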
The SMARTS patterns for the supported reaction names [ Ref 134 ] are retrieved from the file ReactionNamesAndSMARTS.csv, available in the MayaChemTools data directory. The current list of supported reaction names is shown below:
'1,2,4_triazole_acetohydrazide', '1,2,4_triazole_carboxylic_acid_ester', 3_nitrile_pyridine, Benzimidazole_derivatives_aldehyde, Benzimidazole_derivatives_carboxylic_acid_ester, Benzofuran, Benzothiazole, Benzothiophene, Benzoxazole_aromatic_aldehyde, Benzoxazole_carboxylic_acid, Buchwald_Hartwig, Decarboxylative_coupling, Fischer_indole, Friedlaender_chinoline, Grignard_alcohol, Grignard_carbonyl, Heck_non_terminal_vinyl, Heck_terminal_vinyl, Heteroaromatic_nuc_sub, Huisgen_Cu_catalyzed_1,4_subst, Huisgen_disubst_alkyne, Huisgen_Ru_catalyzed_1,5_subst, Imidazole, Indole, Mitsunobu_imide, Mitsunobu_phenole, Mitsunobu_sulfonamide, Mitsunobu_tetrazole_1, Mitsunobu_tetrazole_2, Mitsunobu_tetrazole_3, Mitsunobu_tetrazole_4, N_arylation_heterocycles, Negishi, Niementowski_quinazoline, Nucl_sub_aromatic_ortho_nitro, Nucl_sub_aromatic_para_nitro, Oxadiazole, Paal_Knorr_pyrrole, Phthalazinone, Pictet_Spengler, Piperidine_indole, Pyrazole, Reductive_amination, Schotten_Baumann_amide, Sonogashira, Spiro_chromanone, Stille, Sulfon_amide, Suzuki, Tetrazole_connect_regioisomer_1, Tetrazole_connect_regioisomer_2, Tetrazole_terminal, Thiazole, Thiourea, Triaryl_imidazole, Urea, Williamson_ether, Wittig
The supported input file formats are: SD (.sdf, .sd), SMILES (.smi, .csv, .tsv, .txt)
The supported output file formats are: SD (.sdf, .sd), SMILES (.smi)
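The sketch below maps file extensions to the formats listed above and checks that all reactant files share one format. It is a minimal illustration only. Treating extensions case-insensitively, and the helper names `file_format` and `common_input_format`, are assumptions and are not taken from the script.

```python
import os

# Extension -> format tables, copied from the lists above.
INPUT_FORMATS = {".sdf": "SD", ".sd": "SD", ".smi": "SMILES", ".csv": "SMILES",
                 ".tsv": "SMILES", ".txt": "SMILES"}
OUTPUT_FORMATS = {".sdf": "SD", ".sd": "SD", ".smi": "SMILES"}


def file_format(path, formats):
    """Return the format for path, assuming case-insensitive extensions."""
    ext = os.path.splitext(path)[1].lower()
    if ext not in formats:
        raise ValueError(f"Unsupported file extension {ext!r} for {path}")
    return formats[ext]


def common_input_format(paths):
    """Return the single input format shared by all reactant files."""
    found = {file_format(p, INPUT_FORMATS) for p in paths}
    if len(found) != 1:
        raise ValueError(f"Reactant files must share one format, found {sorted(found)}")
    return found.pop()


print(common_input_format(["acids.smi", "amines.smi"]))  # SMILES
print(file_format("library.sdf", OUTPUT_FORMATS))        # SD
```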
Specify columns by number or by name in a CSV file containing reaction names along with reaction SMARTS. Possible values: collabel or colnum. You may specify a reaction names file using the '--rxnNamesFile' option.
Column name or number corresponding to reaction names. The default value is automatically set based on the value of '-c, --colmode': 'RxnName' for 'collabel'; Reaction name column number for 'colnum'.
Column name or number corresponding to reaction SMARTS strings. The default value is automatically set based on the value of '-c, --colmode': 'RxnSMARTS' for 'collabel'; Reaction SMARTS column number for 'colnum'.
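The sketch below shows how a column given by label or by number might be resolved to a position in a CSV header, depending on the column mode. It assumes column numbers are 1-based. The function name `resolve_column` is hypothetical.

```python
from typing import List, Union


def resolve_column(header: List[str], spec: Union[str, int], colmode: str) -> int:
    """Return a 0-based index for a column label or a 1-based column number.

    Assumption: 'colnum' values count columns starting at 1.
    """
    if colmode == "collabel":
        if spec not in header:
            raise ValueError(f"Column label {spec!r} not found in {header}")
        return header.index(spec)
    if colmode == "colnum":
        num = int(spec)
        if not 1 <= num <= len(header):
            raise ValueError(f"Column number {num} out of range 1..{len(header)}")
        return num - 1
    raise ValueError(f"Unknown colmode: {colmode!r}")


header = ["RxnName", "RxnSMARTS"]
print(resolve_column(header, "RxnSMARTS", "collabel"))  # 1
print(resolve_column(header, 1, "colnum"))              # 0
```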
Compute 2D coordinates of product molecules before writing them out.
Comma delimited list of reactant file names for enumerating a compound library using a reaction SMARTS. The number of reactant files must match the number of reactants in the reaction SMARTS, and all reactant input files must have the same format.
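The number of reactants is set by the reactant side of the reaction SMARTS, which has the form `reactants>agents>products` and is written with `>>` when there are no agents. The simplified sketch below counts dot-separated reactant fragments at parenthesis depth 0, so a component-level group such as `(C.C)` counts as one reactant. It is an assumption-laden approximation. In RDKit, the robust equivalent is `ChemicalReaction.GetNumReactantTemplates()` on a reaction built with `AllChem.ReactionFromSmarts`. The helper names are hypothetical.

```python
def count_reactant_templates(rxn_smarts: str) -> int:
    """Approximate the number of reactant templates in a reaction SMARTS."""
    parts = rxn_smarts.split(">")
    if len(parts) != 3:
        raise ValueError(f"Not a reaction SMARTS: {rxn_smarts!r}")
    reactants = parts[0]
    if not reactants:
        return 0
    depth = 0
    count = 1
    for ch in reactants:
        if ch in "([":
            depth += 1
        elif ch in ")]":
            depth -= 1
        elif ch == "." and depth == 0:
            # A top-level dot starts a new reactant template.
            count += 1
    return count


def check_reactant_files(rxn_smarts: str, reactant_files: list) -> None:
    """Raise if the number of files differs from the number of reactants."""
    expected = count_reactant_templates(rxn_smarts)
    if len(reactant_files) != expected:
        raise ValueError(
            f"Reaction needs {expected} reactant files, got {len(reactant_files)}"
        )


amide = "[C:1](=[O:2])[OH].[N:3]>>[C:1](=[O:2])[N:3]"
print(count_reactant_templates(amide))  # 2
check_reactant_files(amide, ["acids.smi", "amines.smi"])  # passes silently
```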
A comma delimited list of parameter name and value pairs for reading molecules from files. The supported parameter names for different file formats, along with their default values, are shown below:
Possible values for smilesDelimiter: space, comma or tab. These parameters apply to all reactant input files, which must have the same file format.
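A 'Name,Value,...' string is a flat list of alternating names and values. The minimal sketch below turns such a string into a dictionary and maps the smilesDelimiter values listed above to actual characters. The function name `parse_params` is hypothetical, and the sketch does not reproduce the script's default values.

```python
# smilesDelimiter values listed above, mapped to delimiter characters.
SMILES_DELIMITERS = {"space": " ", "comma": ",", "tab": "\t"}


def parse_params(text: str) -> dict:
    """Parse 'Name,Value,Name,Value,...' into a dict of strings."""
    items = [item.strip() for item in text.split(",")] if text.strip() else []
    if len(items) % 2:
        raise ValueError(f"Expected name,value pairs, got {len(items)} items")
    return dict(zip(items[0::2], items[1::2]))


params = parse_params("smilesDelimiter,tab")
delimiter = SMILES_DELIMITERS[params["smilesDelimiter"]]
print(repr(delimiter))  # '\t'
```

Spelling the delimiter as a word such as "comma" keeps it from clashing with the commas that separate the name and value pairs.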
Print examples.
Print this help message.
List available reaction names along with their SMARTS patterns without performing any enumeration. The reaction SMARTS patterns are also validated.
Indicate whether a reaction is specified by a reaction name or a SMARTS pattern. Possible values: RxnByName or RxnBySMARTS.
Output file name.
A comma delimited list of parameter name and value pairs for writing molecules to files. The supported parameter names for different file formats, along with their default values, are shown below:
Generate names of product molecules from reactant names, or assign names in sequential order. Possible values: UseReactants or Sequential. Molecule name formats: UseReactants - <ReactName1>_<ReactName2>..._Prod<Num>; Sequential - Prod<Num>
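The two naming formats above can be sketched as follows. The text does not say how <Num> is counted. This sketch assumes that <Num> restarts at 1 for each reactant combination in UseReactants mode, and runs across the whole library in Sequential mode. The function `name_products` and its input layout are hypothetical.

```python
from typing import List, Sequence, Tuple


def name_products(
    combos: Sequence[Tuple[Sequence[str], int]], mode: str
) -> List[str]:
    """Name products for each (reactant_names, num_products) combination.

    UseReactants: <ReactName1>_<ReactName2>..._Prod<Num>
    Sequential:   Prod<Num>
    Assumption: <Num> restarts per combination for UseReactants and is
    a single running counter for Sequential.
    """
    names = []
    counter = 0
    for reactant_names, num_products in combos:
        for i in range(1, num_products + 1):
            counter += 1
            if mode == "UseReactants":
                names.append("_".join(reactant_names) + f"_Prod{i}")
            elif mode == "Sequential":
                names.append(f"Prod{counter}")
            else:
                raise ValueError(f"Unknown mode: {mode!r}")
    return names


combos = [(("AcidA", "Amine1"), 1), (("AcidA", "Amine2"), 2)]
print(name_products(combos, "UseReactants"))
# ['AcidA_Amine1_Prod1', 'AcidA_Amine2_Prod1', 'AcidA_Amine2_Prod2']
print(name_products(combos, "Sequential"))
# ['Prod1', 'Prod2', 'Prod3']
```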
Overwrite existing files.
Name of the reaction to use for enumerating a compound library. This option is only used when '-m, --mode' is set to 'RxnByName'.
Name of a file containing reaction names and SMARTS patterns, or 'auto' to use the default file, ReactionNamesAndSMARTS.csv, available in the MayaChemTools data directory.
Default reaction SMARTS file format: RxnName,RxnSMARTS.
The local file format is assumed to be the same as the default file format. You may explicitly specify column names or numbers for reaction names and reaction SMARTS using the '--colRxnName' and '--colRxnSMARTS' options.
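Some reaction names, such as '1,2,4_triazole_acetohydrazide', contain commas, and SMARTS can use commas as an OR operator inside brackets. A file in the RxnName,RxnSMARTS layout therefore needs proper CSV quoting and parsing rather than a plain split on commas. The sketch below uses Python's `csv.DictReader` to load such a file into a name-to-SMARTS dictionary. The helper name `load_reaction_table` and the toy reaction entry are hypothetical and not taken from ReactionNamesAndSMARTS.csv.

```python
import csv
import io


def load_reaction_table(handle, name_col="RxnName", smarts_col="RxnSMARTS"):
    """Map reaction names to SMARTS from a CSV file with a header line."""
    reader = csv.DictReader(handle)
    table = {}
    for row in reader:
        name = row[name_col].strip()
        if name:
            table[name] = row[smarts_col].strip()
    return table


# Toy in-memory file in the default RxnName,RxnSMARTS layout; the quoted
# name contains commas, as some real reaction names do.
sample = io.StringIO(
    "RxnName,RxnSMARTS\n"
    '"Toy_1,2_amide",[C:1](=[O:2])[OH].[N:3]>>[C:1](=[O:2])[N:3]\n'
)
table = load_reaction_table(sample)
print(sorted(table))  # ['Toy_1,2_amide']
```

For a real file, pass a handle opened with `open(path, newline="")`, as the `csv` module documentation recommends.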
SMARTS pattern of a reaction to use for enumerating a compound library. This option is only used when '-m, --mode' is set to 'RxnBySMARTS'.
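The two modes select the reaction SMARTS from different sources. RxnByName looks up '--rxnName' in the reaction names table, and RxnBySMARTS uses '--smartsRxn' directly. The minimal sketch below illustrates that dispatch. Its check that the SMARTS contains exactly two '>' separators is only a rough sanity test; full validation would need a toolkit such as RDKit. The function name and the toy table entry are hypothetical.

```python
def select_reaction_smarts(mode, rxn_table, rxn_name=None, smarts_rxn=None):
    """Return the reaction SMARTS to use for the given mode."""
    if mode == "RxnByName":
        if rxn_name not in rxn_table:
            raise ValueError(f"Unknown reaction name: {rxn_name!r}")
        return rxn_table[rxn_name]
    if mode == "RxnBySMARTS":
        # Rough check only: reactants>agents>products has two '>' separators.
        if not smarts_rxn or smarts_rxn.count(">") != 2:
            raise ValueError(f"Not a reaction SMARTS: {smarts_rxn!r}")
        return smarts_rxn
    raise ValueError(f"Unknown mode: {mode!r}")


toy_table = {"Toy_amide": "[C:1](=[O:2])[OH].[N:3]>>[C:1](=[O:2])[N:3]"}
print(select_reaction_smarts("RxnByName", toy_table, rxn_name="Toy_amide"))
print(select_reaction_smarts("RxnBySMARTS", toy_table, smarts_rxn="C>>C"))
```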
Sanitize product molecules before writing them out.
Location of working directory which defaults to the current directory.
To list all available reaction names along with their SMARTS patterns, type:
To perform a combinatorial enumeration of a virtual compound library corresponding to the named amide reaction, Schotten_Baumann_amide, and write out a SMILES file, type:
To run the previous command using a local reaction names file, with explicit specification of the columns containing reaction names and SMARTS, and write out a SMILES file, type:
To perform a combinatorial enumeration of a virtual compound library corresponding to an amide reaction specified using a SMARTS pattern, and write out an SD file containing sanitized molecules with computed 2D coordinates and molecule names generated from reactant names, type:
To perform a combinatorial enumeration of a virtual compound library corresponding to an amide reaction specified using a SMARTS pattern, and write out an SD file containing unsanitized molecules without 2D coordinates and with sequentially generated molecule names, type:
RDKitConvertFileFormat.py, RDKitFilterPAINS.py, RDKitSearchFunctionalGroups.py, RDKitSearchSMARTS.py
Copyright (C) 2024 Manish Sud. All rights reserved.
The functionality available in this script is implemented using RDKit, an open source toolkit for cheminformatics developed by Greg Landrum.
This file is part of MayaChemTools.
MayaChemTools is free software; you can redistribute it and/or modify it under the terms of the GNU Lesser General Public License as published by the Free Software Foundation; either version 3 of the License, or (at your option) any later version.