1. Wall, L.; Christiansen, T.; Schwartz, R.L. Programming Perl, 2nd edition. O'Reilly Media Inc., September 1996.
2. CPAN: Comprehensive Perl archive network.
3. FSF: Free software foundation.
4. Knuth, D.E. The art of computer programming. Vol. 1-3. 2nd edition. Addison-Wesley, September 1998.
5. Press, W.H.; Teukolsky, S.A.; Vetterling, W.T.; Flannery, B.P. Numerical recipes in C: the art of scientific computing. 2nd edition. Cambridge University Press, 1992.
6. Orwant, J.; MacDonald, J.; Hietaniemi, J. Mastering algorithms with Perl. O'Reilly Media Inc., August 1999.
7. Data for elements in the periodic table.
8. Isotope data for elements in the periodic table.
9. Main data source for amino acids.
10. PerlMol - Perl modules for molecular chemistry.
11. OpenBabel: The open source chemistry toolbox.
12. CDK: The chemistry development kit.
14. CTFile Formats.
15. Conway, D. Object oriented Perl. 1st edition. O'Reilly Media Inc., January 2000.
16. Friedl, J.E.F. Mastering regular expressions. 3rd edition. O'Reilly Media Inc., August 2006.
17. Schulz, G.E.; Schirmer, R.H. Principles of protein structure. Springer-Verlag, January 1997.
18. Saenger, W. Principles of nucleic acid structure. Springer-Verlag, 1983.
19. Cornish-Bowden, A. Nomenclature for incompletely specified bases in nucleic acid sequence. Nucleic Acids Res. 1985, 13, 3021-3030.
20. Clapham, C. A concise Oxford dictionary of mathematics. Oxford University Press, 1990.
21. Cook, J.L. Conversion factors. Oxford University Press, 1993.
22. Pauling, L. The nature of the chemical bond. 3rd edition. Cornell University Press, June 1960.
23. Daylight theory manual.
24. Weininger, D. SMILES, a chemical language and information system. 1. Introduction to methodology and encoding rules. J. Chem. Inf. Comput. Sci. 1988, 28, 31-36.
25. Weininger, D.; Weininger, A.; Weininger, J.L. SMILES. 2. Algorithm for generation of unique SMILES notation. J. Chem. Inf. Comput. Sci. 1989, 29, 97-101.
26. Weininger, D. SMILES. 3. DEPICT. Graphical depiction of chemical structures. J. Chem. Inf. Comput. Sci. 1990, 30, 237-243.
27. OEChem TK manual.
28. Parkin, G. Valence, oxidation number, and formal charge: Three related but fundamentally different concepts. J. Chem. Educ. 2006, 83, 791-799.
29. Gasteiger, J.; Jochum, C. An algorithm for the perception of synthetically important rings. J. Chem. Inf. Comput. Sci. 1979, 19, 43-47.
30. Balducci, R.; Pearlman, R.S. Efficient exact solution of the ring perception problem. J. Chem. Inf. Comput. Sci. 1994, 34, 822-831.
31. Hanser, T.; Jauffret, P.; Kaufmann, G. A new algorithm for exhaustive ring perception in a molecular graph. J. Chem. Inf. Comput. Sci. 1996, 36, 1146-1152.
32. Cahn, R.S.; Ingold, C.; Prelog, V. Specification of molecular chirality. Angew. Chem. Internat. Edit. 1966, 5, 385-415.
33. Prelog, V.; Helmchen, G. Basic principles of the CIP-system and proposals for revision. Angew. Chem. Internat. Edit. 1982, 21, 567-583.
34. Mata, P.; Lobo, A.M.; Marshall, C.; Johnson, P.A. The CIP sequence rules: Analysis and proposal for a revision. Tetrahedron: Asymmetry. 1993, 4, 657-668.
35. Nourse, J.G.; Carhart, R.E.; Smith, D.H.; Djerassi, C. Exhaustive generation of stereoisomers for structure elucidation. J. Am. Chem. Soc. 1979, 101, 1216-1223.
36. Nourse, J.G.; Smith, D.H.; Carhart, R.E.; Djerassi, C. Computer-assisted elucidation of molecular structure with stereochemistry. J. Am. Chem. Soc. 1980, 102, 6289-6295.
37. Fused ring systems.
38. A hash function for hash table lookup.
39. Ralaivola, L.; Swamidass, S.J.; Saigo, H.; Baldi, P. Graph kernels for chemical informatics. Neural Networks. 2005, 18, 1093-1110.
40. Willett, P.; Barnard, J.M.; Downs, G.M. Chemical similarity searching. J. Chem. Inf. Comput. Sci. 1998, 38, 983-996.
41. Holliday, J.D.; Hu, C-Y.; Willett, P. Grouping of coefficients for the calculation of inter-molecular similarity and dissimilarity using 2D fragment bit-strings. Combinatorial Chemistry & High Throughput Screening. 2002, 5, 155-166.
42. Fligner, M.; Verducci, J.; Blower, P. A modification of the Jaccard-Tanimoto similarity index for diverse selection of chemical compounds using binary strings. Technometrics. 2002, 44, 110-119.
43. Wang, Y.; Bajorath, J. Balancing the influence of molecular complexity in fingerprint similarity searching. J. Chem. Inf. Model. 2008, 48, 75-84.
44. Flower, D.R. On the properties of bit string-based measures of chemical similarity. J. Chem. Inf. Comput. Sci. 1998, 38, 379-386.
45. The Enkfil.dat and Eksfil.dat files: The keys to understanding MDL keyset technology.
46. Durant, J.L.; Leland, B.A.; Henry, D.R.; Nourse, J.G. Reoptimization of MDL keys for use in drug discovery. J. Chem. Inf. Comput. Sci. 2002, 42, 1273-1280.
47. Description of public MACCS keys.
48. Morgan, H.L. The generation of a unique machine description for chemical structures - A technique developed at chemical abstracts service. J. Chem. Doc. 1965, 5, 107-112.
49. Penny, R.H. A connectivity code for use in describing chemical structures. J. Chem. Doc. 1965, 5, 113-117.
50. Adamson, G.W.; Cowell, J.; Lynch, M.F.; McLure, A.H.; Town, W.G.; Yapp, M. Strategic considerations in design of a screening system for substructure searches of chemical structure files. J. Chem. Doc. 1973, 13, 153-157.
51. Wipke, W.T.; Krishnan, S.; Ouchi, G.I. Hash functions for rapid storage and retrieval of chemical structures. J. Chem. Inf. Comput. Sci. 1978, 18, 31- .
52. Rogers, D.; Hahn, M. Extended-connectivity fingerprints. J. Chem. Inf. Model. 2010, 50, 742-754.
53. Faulon, J.-L.; Visco, D.P., Jr.; Pophale, R.S. The Signature Molecular Descriptor. 1. Using extended valence sequences in QSAR and QSPR studies. J. Chem. Inf. Comput. Sci. 2003, 43, 707-720.
54. Faulon, J.-L.; Collins, M.J.; Carr, R.D. The signature molecular descriptor. 4. Canonizing molecules using extended valence sequences. J. Chem. Inf. Comput. Sci. 2004, 44, 427-436.
55. Bender, A.; Mussa, H.Y.; Glen, R.C.; Reiling, S. Molecular similarity searching using atom environments, information-based feature selection, and a naive bayesian classifier. J. Chem. Inf. Comput. Sci. 2004, 44, 170-178.
56. Bender, A.; Mussa, H.Y.; Glen, R.C.; Reiling, S. Similarity searching of chemical databases using atom environment descriptors (MOLPRINT 2D): Evaluation of performance. J. Chem. Inf. Comput. Sci. 2004, 44, 1708-1718.
57. Carhart, R.E.; Smith, D.H.; Venkataraghavan, R. Atom pairs as molecular features in structure-activity studies: Definition and application. J. Chem. Inf. Comput. Sci. 1985, 25, 64-73.
58. Nilakantan, R.; Bauman, N.; Dixon, J.S.; Venkataraghavan, R. Topological torsion: A new molecular descriptor for SAR applications. Comparison with other descriptors. J. Chem. Inf. Comput. Sci. 1987, 27, 82-85.
59. Langham, J.L.; Jain, A.N. Accurate and interpretable computational modeling of chemical mutagenicity. J. Chem. Inf. Model. 2008, 48, 1833-1839.
60. Schneider, G.; Neidhart, W.; Giller, T.; Schmid, G. Scaffold-hopping by topological pharmacophore search: A contribution to virtual screening. Angew. Chem. Int. Ed. 1999, 38, 2894-2896.
61. Fechner, U.; Franke, L.; Renner, S.; Schneider, P.; Schneider, G. Comparison of correlation vector methods for ligand-based similarity searching. J. Comput. Aided Mol. Des. 2003, 17, 687-698.
62. Fechner, U.; Schneider, G. Evaluation of distance metrics for ligand-based similarity searching. ChemBioChem. 2004, 5, 538-540.
63. Downs, G.M.; Willett, P.; Fisanick, W. Similarity searching and clustering of chemical-structure databases using molecular property data. J. Chem. Inf. Comput. Sci. 1994, 34, 1094-1102.
64. Chen, X.; Reynolds, C.H. Performance of similarity measures in 2D fragment-based similarity searching: Comparison of structural descriptors and similarity coefficients. J. Chem. Inf. Comput. Sci. 2002, 42, 1407-1414.
65. Steffen, R.; Fechner, U.; Schneider, G. Alignment-free pharmacophore patterns: A correlation-vector approach. Pharmacophores and pharmacophore searches. 2006. Volume 32. Wiley-VCH. 49-80.
66. McGregor, M.J.; Muskal, S.M. Pharmacophore fingerprinting. 1. Application to QSAR and focused library design. J. Chem. Inf. Comput. Sci. 1999, 39, 569-574.
67. Floyd, R.W. Algorithm 97: Shortest path. Communications of the ACM. 1962, 5, 345.
68. Horvath, D. Topological pharmacophores. Cheminformatics approaches to virtual screening. 2008. RSC Publishing. 44-75.
69. Ewing, T.; Baber, C.; Feher, M. Novel 2D fingerprints in ligand-based virtual screening. J. Chem. Inf. Model. 2006, 46, 2423-2431.
70. Watson, P. Naive Bayes classification using 2D pharmacophore feature triplet vectors. J. Chem. Inf. Model. 2008, 48, 166-178.
71. Bonachera, F.; Parent, B.; Barbosa, F.; Froloff, N.; Horvath, D. Fuzzy tricentric pharmacophore fingerprints. 1. Topological fuzzy pharmacophore triplets and adapted molecular similarity scoring schemes. J. Chem. Inf. Model. 2006, 46, 2457-2477.
72. Kearsley, S.K.; Sallamack, S.; Fluder, E.M.; Andose, J.D.; Mosley, R.T.; Sheridan, R.P. Chemical similarity using physiochemical property descriptors. J. Chem. Inf. Comput. Sci. 1996, 36, 118-127.
73. Filimonov, D.; Poroikov, V.; Borodina, Y.; Gloriozova, T. Chemical similarity assessment through multilevel neighborhoods of atoms: Definition and comparison with the other descriptors. J. Chem. Inf. Comput. Sci. 1999, 39, 666-670.
74. RDKit - Cheminformatics and Machine Learning Software.
75. Kier, L.B.; Hall, L.H. Electrotopological state indices for atom types: A novel combination of electronic, topological, and valence state information. J. Chem. Inf. Comput. Sci. 1995, 35, 1039-1045.
76. Kier, L.B.; Hall, L.H. Molecular structure description - The electrotopological state. Academic Press, 1999.
77. Molconn-Z - Program for generation of Molecular Connectivity, Shape, and Information Indices.
78. Kier, L.B.; Hall, L.H. The E-State as the basis for molecular structure space definition and structure similarity. J. Chem. Inf. Comput. Sci. 2000, 40, 784-791.
79. SYBYL atom types.
80. Clark, M.; Cramer III, R.D.; Opdenbosch, N.V. Validation of the general purpose Tripos 5.2 forcefield. J. Comput. Chem. 1989, 10, 982-1012.
81. Rappe, A.K.; Casewit, C.J.; Colwell, K.S.; Goddard III, W.A.; Skiff, W.M. UFF, a full periodic table force field for molecular mechanics and molecular dynamics simulations. J. Am. Chem. Soc. 1992, 114, 10024-10035.
82. Rappe, A.K. Personal communication. 2009.
83. Halgren, T.A. Merck molecular force field. I. Basis, form, scope, parameterization, and performance of MMFF94. J. Comput. Chem. 1996, 17, 490-519.
84. Halgren, T.A. Merck molecular force field. II. MMFF94 van der Waals and electrostatic parameters for intermolecular interactions. J. Comput. Chem. 1996, 17, 520-552.
85. Halgren, T.A. Merck molecular force field. III. Molecular geometries and vibrational frequencies for MMFF94. J. Comput. Chem. 1996, 17, 553-586.
86. Halgren, T.A.; Nachbar, R.B. Merck molecular force field. IV. Conformational energies and geometries for MMFF94. J. Comput. Chem. 1996, 17, 587-615.
87. Halgren, T.A. Merck molecular force field. V. Extension of MMFF94 using experimental data, additional computational data, and empirical rules. J. Comput. Chem. 1996, 17, 616-641.
88. Mayo, S.L.; Olafson, B.A.; Goddard III, W.A. DREIDING: A generic force field for molecular simulations. J. Phys. Chem. 1990, 94, 8897-8909.
89. Wildman, S.A.; Crippen, G.M. Prediction of physicochemical parameters by atomic contributions. J. Chem. Inf. Comput. Sci. 1999, 39, 868-873.
90. Ertl, P.; Rohde, B.; Selzer, P. Fast calculation of molecular polar surface area as a sum of fragment-based contributions and its application to the prediction of drug transport properties. J. Med. Chem. 2000, 43, 3714-3717.
91. Ertl, P. Personal communication. 2010.
92. Veber, D.F.; Johnson, S.R.; Cheng, H.Y.; Smith, B.R.; Ward, K.W.; Kopple, K.D. Molecular properties that influence the oral bioavailability of drug candidates. J. Med. Chem. 2002, 45, 2615-2623.
91. Lipinski, C.A.; Lombardo, F.; Dominy, B.W.; Feeney, P.J. Experimental and computational approaches to estimate solubility and permeability in drug discovery and development settings. Adv. Drug. Del. Rev. 1997, 23, 3-25.
92. Congreve, M.; Carr, R.; Murray, C.; Jhoti, H. A 'rule of three' for fragment-based lead discovery? Drug. Discov. Today. 2003, 8, 876-877.
93. Zhao, Y.H.; Abraham, M.H.; Zissimos, A.M. Fast calculation of van der Waals volume as a sum of atomic and bond contributions and its application to drug compounds. J. Org. Chem. 2003, 68, 7368-7373.
94. Chen, J.; Holliday, J.; Bradshaw, J. A machine learning approach to weighting schemes in the data fusion of similarity coefficients. J. Chem. Inf. Model. 2009, 49, 185-194.
95. Williams, C. Reverse fingerprinting, similarity searching by group fusion and fingerprint bit importance. Molecular Diversity. 2006, 10, 311-332.
96. Whittle, M.; Gillet, V.J.; Willett, P.; Loesel, J. Analysis of data fusion methods in virtual screening: Similarity and group fusion. J. Chem. Inf. Model. 2006, 46, 2206-2219.
97. Hert, J.; Willett, P.; Wilton, D.J.; Acklin, P.; Azzaoui, K.; Jacoby, E.; Schuffenhauer, A. New methods for ligand-based virtual screening: Use of data fusion and machine learning to enhance the effectiveness of similarity searching. J. Chem. Inf. Model. 2006, 46, 462-470.
98. Chu, C-W.; Holliday, J.D.; Willett, P. Effect of data standardization on chemical clustering and similarity searching. J. Chem. Inf. Model. 2009, 49, 155-161.
99. Arif, S.M.; Holliday, J.D.; Willett, P. Inverse frequency weighting of fragments for similarity-based virtual screening. J. Chem. Inf. Model. 2010, 50, 1340-1349.
100. Chen, B.; Mueller, C.; Willett, P. Combination rules for group fusion in similarity-based virtual screening. Mol. Inf. 2010, 29, 533-541.
101. Willett, P. Similarity searching using 2D structural fingerprints. Chemoinformatics and Computational Chemical Biology. Methods in Molecular Biology. 2011, 672, 133-158.
102. Berglund, A.E.; Head, R.D. PZIM: A method for similarity searching using atom environments and 2D alignment. J. Chem. Inf. Model. 2010, 50, 1790-1795.
103. Baldi, P.; Nasr, R. When is chemical similarity significant? The statistical distribution of chemical similarity scores and its extreme values. J. Chem. Inf. Model. 2010, 50, 1205-1222.
104. Godden, J.W.; Stahura, F.L.; Bajorath, J. Anatomy of fingerprint search calculations on structurally diverse sets of active compounds. J. Chem. Inf. Model. 2005, 45, 1812-1819.
105. Geppert, H.; Horvath, T.; Gartner, T.; Wrobel, S.; Bajorath, J. Support-vector-machine-based ranking significantly improves the effectiveness of similarity searching using 2D fingerprints and multiple reference compounds. J. Chem. Inf. Model. 2008, 48, 742-746.
106. Wang, Y.; Geppert, H.; Bajorath, J. Shannon entropy-based fingerprint similarity search strategy. J. Chem. Inf. Model. 2009, 49, 1687-1691.
107. Nisius, B.; Bajorath, J. Molecular fingerprint recombination: Generating hybrid fingerprints for similarity searching from different fingerprint types. ChemMedChem. 2009, 4, 1859-1863.
108. Vogt, M.; Bajorath, J. Predicting the performance of fingerprint similarity searching. Chemoinformatics and Computational Chemical Biology. Methods in Molecular Biology. 2011, 672, 159-173.
109. Muchmore, S.W.; Debe, D.A.; Metz, J.T.; Brown, S.P.; Martin, Y.; Hajduk, P.H. Application of belief theory to similarity data fusion for use in analog searching and lead hopping. J. Chem. Inf. Model. 2008, 48, 941-948.
110. Bender, A.; Jenkins, J.L.; Scheiber, J.; Sukuru, S.C.K.; Glick, M.; Davies, J.W. How similar are similarity searching methods? A principal component analysis of molecular descriptor space. J. Chem. Inf. Model. 2009, 49, 108-119.
111. Sastry, M.; Lowrie, J.F.; Dixon, S.L.; Sherman, W. Large-scale systematic analysis of 2D fingerprint methods and parameters to improve virtual screening enrichments. J. Chem. Inf. Model. 2010, 50, 771-784.
112. Tiikkainen, P.; Markt, P.; Wolber, G.; Kirchmair, J.; Distinto, S.; Poso, A.; Kallioniemi, O. Critical comparison of virtual screening methods against the MUV data set. J. Chem. Inf. Model. 2009, 49, 2168-2178.
113. Venkatraman, V.; Pérez-Nueno, V.I.; Mavridis, L.; Ritchie, D.W. Comprehensive comparison of ligand-based virtual screening tools against the DUD data set reveals limitations of current 3D methods. J. Chem. Inf. Model. 2010, 50, 2079-2093.
114. Chemfp - Cheminformatics fingerprints file formats and tools.
115. Yan, A.; Gasteiger, J. Prediction of aqueous solubility of organic compounds by topological descriptors. QSAR Comb. Sci. 2003, 22, 821-829.
116. Lovering, F.; Bikker, J.; Humblet, C. Escape from flatland: Increasing saturation as an approach to improving clinical success. J. Med. Chem. 2009, 52, 6752-6756.
117. Hann, M.M.; Leach, A.R.; Harper, G. Molecular complexity and its impact on the probability of finding leads for drug discovery. J. Chem. Inf. Comput. Sci. 2001, 41, 856-864.
118. Schuffenhauer, A.; Brown, N.; Selzer, P.; Ertl, P.; Jacoby, E. Relationships between molecular complexity, biological activity, and structural diversity. J. Chem. Inf. Model. 2006, 46, 525-535.
119. Walters, W.P.; Green, J.; Weiss, J.R.; Murcko, M.A. What do medicinal chemists actually make? A 50-year retrospective. J. Med. Chem. 2011, 54, 6405-6416.
120. Park, S.K.; Miller, K.W. Random number generators: Good ones are hard to find. Communications of the ACM. 1988, 31, 1192-1201.
121. Huang, R.; Southall, N.; Wang, Y.; Yasgar, A.; Shinn, P.; Jadhav, A.; Nguyen, D.T.; Austin, C.P. The NCGC pharmaceutical collection: A comprehensive resource of clinically approved drugs enabling repurposing and chemical genomics. Sci. Transl. Med. 2011, 3, 80ps16.
122. Berman, H.M.; Westbrook, J.; Feng, Z.; Gilliland, G.; Bhat, T.N.; Weissig, H.; Shindyalov, I.N.; Bourne, P.E. The Protein Data Bank. Nucleic Acids Res. 2000, 28, 235-242.
123. Jmol: An open-source Java viewer for chemical structures in 3D.
124. Lloyd, D. What is aromaticity? J. Chem. Inf. Comput. Sci. 1996, 36, 442-447.
125. Sayle, R. Cheminformatics toolkits: A personal perspective.
126. Dominus, M.J. Higher-order Perl.
127. OpenSMILES.
128. Vandermeersch, T. OpenSMARTS.