FingerprintsVector
use Fingerprints::FingerprintsVector;
use Fingerprints::FingerprintsVector qw(:all);
FingerprintsVector class provides the following methods:
new, AddValueIDs, AddValues, CityBlockDistanceCoefficient, CosineSimilarityCoefficient, CzekanowskiSimilarityCoefficient, DiceSimilarityCoefficient, EuclideanDistanceCoefficient, GetDescription, GetFingerprintsVectorString, GetID, GetIDsAndValuesPairsString, GetIDsAndValuesString, GetNumOfNonZeroValues, GetNumOfValueIDs, GetNumOfValues, GetSupportedDistanceAndSimilarityCoefficients, GetSupportedDistanceCoefficients, GetSupportedSimilarityCoefficients, GetType, GetValue, GetValueID, GetValueIDs, GetValueIDsString, GetValues, GetValuesAndIDsPairsString, GetValuesAndIDsString, GetValuesString, GetVectorType, HammingDistanceCoefficient, IsFingerprintsVector, JaccardSimilarityCoefficient, ManhattanDistanceCoefficient, NewFromIDsAndValuesPairsString, NewFromIDsAndValuesString, NewFromValuesAndIDsPairsString, NewFromValuesAndIDsString, NewFromValuesString, OchiaiSimilarityCoefficient, SetDescription, SetID, SetType, SetValue, SetValueID, SetValueIDs, SetValues, SetVectorType, SoergelDistanceCoefficient, SorensonSimilarityCoefficient, StringifyFingerprintsVector, TanimotoSimilarityCoefficient
The methods available to create fingerprints vectors from strings and to calculate similarity and distance coefficients between two vectors can also be invoked as class functions.
The FingerprintsVector class supports comparison between vectors containing three different types of values:
Type I: OrderedNumericalValues
Type II: UnorderedNumericalValues
Type III: AlphaNumericalValues
Before similarity or distance calculations are performed between vectors containing UnorderedNumericalValues or AlphaNumericalValues, the vectors are transformed into vectors containing unique OrderedNumericalValues, using the value IDs for UnorderedNumericalValues and the values themselves for AlphaNumericalValues.
Three forms of similarity and distance calculation between two vectors are supported, specified using the CalculationMode option: AlgebraicForm, BinaryForm, and SetTheoreticForm.
For BinaryForm, the ordered list of processed final vector values, containing the value or count of each unique value type, is converted into a binary vector of 1s and 0s corresponding to the presence or absence of values before the similarity or distance between the two vectors is calculated.
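A minimal sketch of this transformation follows. It assumes that AlphaNumericalValues are reduced to a count of each unique value, and that UnorderedNumericalValues are aligned on the union of their value IDs, with an ID missing from one vector contributing 0 there. The helpers alpha_to_ordered_counts and ids_to_ordered_values are hypothetical and are not part of the FingerprintsVector API.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of FingerprintsVector): reduce two
# AlphaNumericalValues lists to count vectors over their sorted union.
sub alpha_to_ordered_counts {
    my ($a_values, $b_values) = @_;
    my (%count_a, %count_b);
    $count_a{$_}++ for @$a_values;
    $count_b{$_}++ for @$b_values;
    # Any consistent order works; a string sort is applied to both vectors.
    my %union = (%count_a, %count_b);
    my @keys  = sort keys %union;
    my @xa    = map { $count_a{$_} // 0 } @keys;
    my @xb    = map { $count_b{$_} // 0 } @keys;
    return (\@keys, \@xa, \@xb);
}

# Hypothetical helper: align two UnorderedNumericalValues vectors on the
# union of their value IDs (IDs assumed unique within each vector);
# an ID missing from one vector is assumed to contribute 0 there.
sub ids_to_ordered_values {
    my ($a_ids, $a_values, $b_ids, $b_values) = @_;
    my (%value_a, %value_b);
    @value_a{@$a_ids} = @$a_values;    # hash slice: ID => value
    @value_b{@$b_ids} = @$b_values;
    my %union = (%value_a, %value_b);
    my @keys  = sort keys %union;
    my @xa    = map { $value_a{$_} // 0 } @keys;
    my @xb    = map { $value_b{$_} // 0 } @keys;
    return (\@keys, \@xa, \@xb);
}

my ($keys, $xa, $xb) = alpha_to_ordered_counts([qw(C N C O)], [qw(C S)]);
print "@$keys | @$xa | @$xb\n";    # C N O S | 2 1 1 0 | 1 0 0 1
```

After this step both vectors have the same size and order, so the OrderedNumericalValues formulas apply directly.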
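The BinaryForm conversion can be sketched as a single map over the processed values; to_binary is a hypothetical helper, and treating any non-zero value as "present" is an assumption consistent with the description above.

```perl
use strict;
use warnings;

# Hypothetical helper: map each processed value or count to 1 when the
# value is present (non-zero) and to 0 when it is absent.
sub to_binary {
    my ($values) = @_;
    return [ map { $_ != 0 ? 1 : 0 } @$values ];
}

my $bits = to_binary([2, 1, 0, 3]);
print "@$bits\n";    # 1 1 0 1
```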
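For two fingerprint vectors A and B of the same size containing OrderedNumericalValues, let:
    N = Number of values in A or B
    Xa = Values of vector A
    Xb = Values of vector B
    Xai = Value of ith element in A
    Xbi = Value of ith element in B
    SUM = Sum over i of the N values
For SetTheoreticForm of calculation between two vectors, let:
    | Xa | = SUM ( Xai )
    | Xb | = SUM ( Xbi )
    | SetIntersectionXaXb | = SUM ( MIN ( Xai, Xbi ) )
    | SetDifferenceXaXb | = SUM ( Xai ) + SUM ( Xbi ) - SUM ( MIN ( Xai, Xbi ) )
For BinaryForm of calculation between two vectors, let:
    Na = Number of bits set to "1" in A = SUM ( Xai )
    Nb = Number of bits set to "1" in B = SUM ( Xbi )
    Nc = Number of bits set to "1" in both A and B = SUM ( Xai * Xbi )
Additionally, for BinaryForm various values also correspond to:
    Na = | Xa |
    Nb = | Xb |
    Nc = | SetIntersectionXaXb |
    | SetDifferenceXaXb | = Na + Nb - Nc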
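The quantities used by the SetTheoreticForm and BinaryForm coefficients can be computed directly from two equal-length vectors. This is a sketch only; set_terms, binary_terms, sum_of, and min2 are hypothetical helpers, and binary_terms assumes its inputs already contain only 0s and 1s. For binary input the two sets of quantities coincide, with Nc equal to the intersection and Na + Nb - Nc equal to the set difference.

```perl
use strict;
use warnings;

sub sum_of { my $s = 0; $s += $_ for @_; return $s }
sub min2   { return $_[0] < $_[1] ? $_[0] : $_[1] }

# | SetIntersectionXaXb | and | SetDifferenceXaXb | for two equal-length vectors.
sub set_terms {
    my ($xa, $xb) = @_;
    my $intersection = sum_of(map { min2($xa->[$_], $xb->[$_]) } 0 .. $#$xa);
    my $difference   = sum_of(@$xa) + sum_of(@$xb) - $intersection;
    return ($intersection, $difference);
}

# Na, Nb and Nc for two equal-length binary (0/1) vectors.
sub binary_terms {
    my ($xa, $xb) = @_;
    my $na = sum_of(@$xa);
    my $nb = sum_of(@$xb);
    my $nc = sum_of(map { $xa->[$_] * $xb->[$_] } 0 .. $#$xa);
    return ($na, $nb, $nc);
}

my @a_bits = (1, 1, 0, 1);
my @b_bits = (1, 0, 1, 1);
my ($na, $nb, $nc) = binary_terms(\@a_bits, \@b_bits);
my ($inter, $diff) = set_terms(\@a_bits, \@b_bits);
print "$na $nb $nc | $inter $diff\n";    # 3 3 2 | 2 4
```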
The similarity and distance coefficients [ Ref 40, Ref 62, Ref 64 ] for a pair of vectors A and B in AlgebraicForm, BinaryForm, and SetTheoreticForm are defined as follows:
CityBlockDistance: ( same as HammingDistance and ManhattanDistance )
AlgebraicForm: SUM ( ABS ( Xai - Xbi ) )
BinaryForm: ( Na - Nc ) + ( Nb - Nc ) = Na + Nb - 2 * Nc
SetTheoreticForm: | SetDifferenceXaXb | - | SetIntersectionXaXb | = SUM ( Xai ) + SUM ( Xbi ) - 2 * ( SUM ( MIN ( Xai, Xbi ) ) )
CosineSimilarity: ( same as OchiaiSimilarity )
AlgebraicForm: SUM ( Xai * Xbi ) / SQRT ( SUM ( Xai ** 2 ) * SUM ( Xbi ** 2 ) )
BinaryForm: Nc / SQRT ( Na * Nb )
SetTheoreticForm: | SetIntersectionXaXb | / SQRT ( |Xa| * |Xb| ) = SUM ( MIN ( Xai, Xbi ) ) / SQRT ( SUM ( Xai ) * SUM ( Xbi ) )
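A sketch of the three Cosine (Ochiai) forms as plain Perl subroutines, written directly from the formulas above. The subroutine names are hypothetical, not the FingerprintsVector methods, and the sketch assumes neither vector is all zeros. For 0/1 vectors the AlgebraicForm reduces to the BinaryForm, because SUM ( Xai * Xbi ) = Nc and SUM ( Xai ** 2 ) = Na.

```perl
use strict;
use warnings;

sub sum_of { my $s = 0; $s += $_ for @_; return $s }
sub min2   { return $_[0] < $_[1] ? $_[0] : $_[1] }

# Assumes neither vector is all zeros, which would divide by zero.
sub cosine_algebraic {
    my ($xa, $xb) = @_;
    my $dot = sum_of(map { $xa->[$_] * $xb->[$_] } 0 .. $#$xa);
    return $dot / sqrt(sum_of(map { $_**2 } @$xa) * sum_of(map { $_**2 } @$xb));
}

sub cosine_binary {
    my ($na, $nb, $nc) = @_;
    return $nc / sqrt($na * $nb);
}

sub cosine_set_theoretic {
    my ($xa, $xb) = @_;
    my $intersection = sum_of(map { min2($xa->[$_], $xb->[$_]) } 0 .. $#$xa);
    return $intersection / sqrt(sum_of(@$xa) * sum_of(@$xb));
}

printf "%.4f\n", cosine_algebraic([1, 2, 0], [2, 1, 1]);        # 0.7303
printf "%.4f\n", cosine_set_theoretic([1, 2, 0], [2, 1, 1]);    # 0.5774
```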
CzekanowskiSimilarity: ( same as DiceSimilarity and SorensonSimilarity )
AlgebraicForm: ( 2 * ( SUM ( Xai * Xbi ) ) ) / ( SUM ( Xai ** 2 ) + SUM ( Xbi ** 2 ) )
BinaryForm: 2 * Nc / ( Na + Nb )
SetTheoreticForm: 2 * | SetIntersectionXaXb | / ( |Xa| + |Xb| ) = 2 * ( SUM ( MIN ( Xai, Xbi ) ) ) / ( SUM ( Xai ) + SUM ( Xbi ) )
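The Czekanowski (Dice, Sorenson) forms can be sketched the same way; the subroutine names are hypothetical, and the sketch assumes the denominators are non-zero.

```perl
use strict;
use warnings;

sub sum_of { my $s = 0; $s += $_ for @_; return $s }
sub min2   { return $_[0] < $_[1] ? $_[0] : $_[1] }

sub dice_algebraic {
    my ($xa, $xb) = @_;
    my $dot = sum_of(map { $xa->[$_] * $xb->[$_] } 0 .. $#$xa);
    return 2 * $dot / (sum_of(map { $_**2 } @$xa) + sum_of(map { $_**2 } @$xb));
}

sub dice_binary {
    my ($na, $nb, $nc) = @_;
    return 2 * $nc / ($na + $nb);
}

sub dice_set_theoretic {
    my ($xa, $xb) = @_;
    my $intersection = sum_of(map { min2($xa->[$_], $xb->[$_]) } 0 .. $#$xa);
    return 2 * $intersection / (sum_of(@$xa) + sum_of(@$xb));
}

printf "%.4f\n", dice_algebraic([1, 2, 0], [2, 1, 1]);        # 0.7273
printf "%.4f\n", dice_set_theoretic([1, 2, 0], [2, 1, 1]);    # 0.5714
```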
DiceSimilarity: ( same as CzekanowskiSimilarity and SorensonSimilarity )
AlgebraicForm: ( 2 * ( SUM ( Xai * Xbi ) ) ) / ( SUM ( Xai ** 2 ) + SUM ( Xbi ** 2 ) )
BinaryForm: 2 * Nc / ( Na + Nb )
SetTheoreticForm: 2 * | SetIntersectionXaXb | / ( |Xa| + |Xb| ) = 2 * ( SUM ( MIN ( Xai, Xbi ) ) ) / ( SUM ( Xai ) + SUM ( Xbi ) )
EuclideanDistance:
AlgebraicForm: SQRT ( SUM ( ( ( Xai - Xbi ) ** 2 ) ) )
BinaryForm: SQRT ( ( Na - Nc ) + ( Nb - Nc ) ) = SQRT ( Na + Nb - 2 * Nc )
SetTheoreticForm: SQRT ( | SetDifferenceXaXb | - | SetIntersectionXaXb | ) = SQRT ( SUM ( Xai ) + SUM ( Xbi ) - 2 * ( SUM ( MIN ( Xai, Xbi ) ) ) )
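A sketch of the three Euclidean forms with hypothetical subroutine names. Note that the SetTheoreticForm is the square root of the SetTheoreticForm CityBlock distance, so for non-binary vectors it generally differs from the AlgebraicForm, as the two printed values show.

```perl
use strict;
use warnings;

sub sum_of { my $s = 0; $s += $_ for @_; return $s }
sub min2   { return $_[0] < $_[1] ? $_[0] : $_[1] }

sub euclidean_algebraic {
    my ($xa, $xb) = @_;
    return sqrt(sum_of(map { ($xa->[$_] - $xb->[$_])**2 } 0 .. $#$xa));
}

sub euclidean_binary {
    my ($na, $nb, $nc) = @_;
    return sqrt($na + $nb - 2 * $nc);
}

sub euclidean_set_theoretic {
    my ($xa, $xb) = @_;
    my $intersection = sum_of(map { min2($xa->[$_], $xb->[$_]) } 0 .. $#$xa);
    return sqrt(sum_of(@$xa) + sum_of(@$xb) - 2 * $intersection);
}

printf "%.4f\n", euclidean_algebraic([2, 1, 0, 3], [1, 1, 2, 0]);        # 3.7417
printf "%.4f\n", euclidean_set_theoretic([2, 1, 0, 3], [1, 1, 2, 0]);    # 2.4495
```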
HammingDistance: ( same as CityBlockDistance and ManhattanDistance )
AlgebraicForm: SUM ( ABS ( Xai - Xbi ) )
BinaryForm: ( Na - Nc ) + ( Nb - Nc ) = Na + Nb - 2 * Nc
SetTheoreticForm: | SetDifferenceXaXb | - | SetIntersectionXaXb | = SUM ( Xai ) + SUM ( Xbi ) - 2 * ( SUM ( MIN ( Xai, Xbi ) ) )
JaccardSimilarity: ( same as TanimotoSimilarity )
AlgebraicForm: SUM ( Xai * Xbi ) / ( SUM ( Xai ** 2 ) + SUM ( Xbi ** 2 ) - SUM ( Xai * Xbi ) )
BinaryForm: Nc / ( ( Na - Nc ) + ( Nb - Nc ) + Nc ) = Nc / ( Na + Nb - Nc )
SetTheoreticForm: | SetIntersectionXaXb | / | SetDifferenceXaXb | = SUM ( MIN ( Xai, Xbi ) ) / ( SUM ( Xai ) + SUM ( Xbi ) - SUM ( MIN ( Xai, Xbi ) ) )
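A sketch of the three Jaccard (Tanimoto) forms; the subroutine names are hypothetical and the sketch assumes the denominators are non-zero.

```perl
use strict;
use warnings;

sub sum_of { my $s = 0; $s += $_ for @_; return $s }
sub min2   { return $_[0] < $_[1] ? $_[0] : $_[1] }

sub tanimoto_algebraic {
    my ($xa, $xb) = @_;
    my $dot = sum_of(map { $xa->[$_] * $xb->[$_] } 0 .. $#$xa);
    return $dot / (sum_of(map { $_**2 } @$xa) + sum_of(map { $_**2 } @$xb) - $dot);
}

sub tanimoto_binary {
    my ($na, $nb, $nc) = @_;
    return $nc / ($na + $nb - $nc);
}

sub tanimoto_set_theoretic {
    my ($xa, $xb) = @_;
    my $intersection = sum_of(map { min2($xa->[$_], $xb->[$_]) } 0 .. $#$xa);
    return $intersection / (sum_of(@$xa) + sum_of(@$xb) - $intersection);
}

printf "%.4f\n", tanimoto_algebraic([1, 2, 0], [2, 1, 1]);        # 0.5714
printf "%.4f\n", tanimoto_set_theoretic([1, 2, 0], [2, 1, 1]);    # 0.4000
```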
ManhattanDistance: ( same as CityBlockDistance and HammingDistance )
AlgebraicForm: SUM ( ABS ( Xai - Xbi ) )
BinaryForm: ( Na - Nc ) + ( Nb - Nc ) = Na + Nb - 2 * Nc
SetTheoreticForm: | SetDifferenceXaXb | - | SetIntersectionXaXb | = SUM ( Xai ) + SUM ( Xbi ) - 2 * ( SUM ( MIN ( Xai, Xbi ) ) )
OchiaiSimilarity: ( same as CosineSimilarity )
AlgebraicForm: SUM ( Xai * Xbi ) / SQRT ( SUM ( Xai ** 2 ) * SUM ( Xbi ** 2 ) )
BinaryForm: Nc / SQRT ( Na * Nb )
SetTheoreticForm: | SetIntersectionXaXb | / SQRT ( |Xa| * |Xb| ) = SUM ( MIN ( Xai, Xbi ) ) / SQRT ( SUM ( Xai ) * SUM ( Xbi ) )
SorensonSimilarity: ( same as CzekanowskiSimilarity and DiceSimilarity )
AlgebraicForm: ( 2 * ( SUM ( Xai * Xbi ) ) ) / ( SUM ( Xai ** 2 ) + SUM ( Xbi ** 2 ) )
BinaryForm: 2 * Nc / ( Na + Nb )
SetTheoreticForm: 2 * | SetIntersectionXaXb | / ( |Xa| + |Xb| ) = 2 * ( SUM ( MIN ( Xai, Xbi ) ) ) / ( SUM ( Xai ) + SUM ( Xbi ) )
SoergelDistance:
AlgebraicForm: SUM ( ABS ( Xai - Xbi ) ) / SUM ( MAX ( Xai, Xbi ) )
BinaryForm: 1 - Nc / ( Na + Nb - Nc ) = ( Na + Nb - 2 * Nc ) / ( Na + Nb - Nc )
SetTheoreticForm: ( | SetDifferenceXaXb | - | SetIntersectionXaXb | ) / | SetDifferenceXaXb | = ( SUM ( Xai ) + SUM ( Xbi ) - 2 * ( SUM ( MIN ( Xai, Xbi ) ) ) ) / ( SUM ( Xai ) + SUM ( Xbi ) - SUM ( MIN ( Xai, Xbi ) ) )
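A sketch of the three Soergel forms with hypothetical subroutine names. Because ABS ( a - b ) = a + b - 2 * MIN ( a, b ) and MAX ( a, b ) = a + b - MIN ( a, b ) for any pair of numbers, the AlgebraicForm and SetTheoreticForm give the same result, which the example reflects.

```perl
use strict;
use warnings;

sub sum_of { my $s = 0; $s += $_ for @_; return $s }
sub min2   { return $_[0] < $_[1] ? $_[0] : $_[1] }
sub max2   { return $_[0] > $_[1] ? $_[0] : $_[1] }

sub soergel_algebraic {
    my ($xa, $xb) = @_;
    my $numerator   = sum_of(map { abs($xa->[$_] - $xb->[$_]) } 0 .. $#$xa);
    my $denominator = sum_of(map { max2($xa->[$_], $xb->[$_]) } 0 .. $#$xa);
    return $numerator / $denominator;
}

sub soergel_binary {
    my ($na, $nb, $nc) = @_;
    return ($na + $nb - 2 * $nc) / ($na + $nb - $nc);
}

sub soergel_set_theoretic {
    my ($xa, $xb) = @_;
    my $intersection = sum_of(map { min2($xa->[$_], $xb->[$_]) } 0 .. $#$xa);
    my $total        = sum_of(@$xa) + sum_of(@$xb);
    return ($total - 2 * $intersection) / ($total - $intersection);
}

print soergel_algebraic([2, 1, 0, 3], [1, 1, 2, 0]), "\n";        # 0.75
print soergel_set_theoretic([2, 1, 0, 3], [1, 1, 2, 0]), "\n";    # 0.75
```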
TanimotoSimilarity: ( same as JaccardSimilarity )
AlgebraicForm: SUM ( Xai * Xbi ) / ( SUM ( Xai ** 2 ) + SUM ( Xbi ** 2 ) - SUM ( Xai * Xbi ) )
BinaryForm: Nc / ( ( Na - Nc ) + ( Nb - Nc ) + Nc ) = Nc / ( Na + Nb - Nc )
SetTheoreticForm: | SetIntersectionXaXb | / | SetDifferenceXaXb | = SUM ( MIN ( Xai, Xbi ) ) / ( SUM ( Xai ) + SUM ( Xbi ) - SUM ( MIN ( Xai, Xbi ) ) )
Using the specified FingerprintsVector property names and values hash, the new method creates a new object and returns a reference to the newly created FingerprintsVector object. By default, the following properties are initialized:
Examples:
Adds specified ValueIDs to FingerprintsVector and returns FingerprintsVector.
Adds specified Values to FingerprintsVector and returns FingerprintsVector.
Returns value of CityBlock distance coefficient between two FingerprintsVectors using optionally specified CalculationMode and optional checking of vector values.
Possible CalculationMode values: AlgebraicForm, BinaryForm or SetTheoreticForm. Default CalculationMode value: AlgebraicForm. Default SkipValuesCheck value: 0.
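The optional CalculationMode and SkipValuesCheck arguments can be illustrated with a hypothetical city_block dispatcher. It applies the documented defaults (AlgebraicForm and 0), but what the real value check verifies is not stated here; purely as an assumption, this sketch checks that both vectors have the same number of values. It is not the MayaChemTools implementation.

```perl
use strict;
use warnings;
use Carp qw(croak);

sub sum_of { my $s = 0; $s += $_ for @_; return $s }
sub min2   { return $_[0] < $_[1] ? $_[0] : $_[1] }

# Hypothetical dispatcher; argument order and the check itself are assumptions.
sub city_block {
    my ($xa, $xb, $mode, $skip_check) = @_;
    $mode       //= 'AlgebraicForm';    # documented default
    $skip_check //= 0;                  # documented default
    # Assumed check: both vectors must contain the same number of values.
    croak "Vectors must be of the same size" if !$skip_check && @$xa != @$xb;

    if ($mode eq 'AlgebraicForm') {
        return sum_of(map { abs($xa->[$_] - $xb->[$_]) } 0 .. $#$xa);
    }
    if ($mode eq 'BinaryForm') {
        # Convert to 0/1 presence vectors, then use Na + Nb - 2 * Nc.
        my @ba = map { $_ ? 1 : 0 } @$xa;
        my @bb = map { $_ ? 1 : 0 } @$xb;
        my $nc = sum_of(map { $ba[$_] * $bb[$_] } 0 .. $#ba);
        return sum_of(@ba) + sum_of(@bb) - 2 * $nc;
    }
    if ($mode eq 'SetTheoreticForm') {
        my $intersection = sum_of(map { min2($xa->[$_], $xb->[$_]) } 0 .. $#$xa);
        return sum_of(@$xa) + sum_of(@$xb) - 2 * $intersection;
    }
    croak "Unknown CalculationMode: $mode";
}

print city_block([2, 1, 0, 3], [1, 1, 2, 0]), "\n";                  # 6
print city_block([2, 1, 0, 3], [1, 1, 2, 0], 'BinaryForm'), "\n";    # 2
```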
Returns value of Cosine similarity coefficient between two FingerprintsVectors using optionally specified CalculationMode and optional checking of vector values.
Possible CalculationMode values: AlgebraicForm, BinaryForm or SetTheoreticForm. Default CalculationMode value: AlgebraicForm. Default SkipValuesCheck value: 0.
Returns value of Czekanowski similarity coefficient between two FingerprintsVectors using optionally specified CalculationMode and optional checking of vector values.
Possible CalculationMode values: AlgebraicForm, BinaryForm or SetTheoreticForm. Default CalculationMode value: AlgebraicForm. Default SkipValuesCheck value: 0.
Returns value of Dice similarity coefficient between two FingerprintsVectors using optionally specified CalculationMode and optional checking of vector values.
Possible CalculationMode values: AlgebraicForm, BinaryForm or SetTheoreticForm. Default CalculationMode value: AlgebraicForm. Default SkipValuesCheck value: 0.
Returns value of Euclidean distance coefficient between two FingerprintsVectors using optionally specified CalculationMode and optional checking of vector values.
Possible CalculationMode values: AlgebraicForm, BinaryForm or SetTheoreticForm. Default CalculationMode value: AlgebraicForm. Default SkipValuesCheck value: 0.
Returns a string containing description of fingerprints vector.
Returns a FingerprintsString containing vector values and/or IDs in FingerprintsVector corresponding to specified Format.
Possible Format values: IDsAndValuesString, IDsAndValues, IDsAndValuesPairsString, IDsAndValuesPairs, ValuesAndIDsString, ValuesAndIDs, ValuesAndIDsPairsString, ValuesAndIDsPairs, ValueIDsString, ValueIDs, ValuesString, or Values.
Returns ID of FingerprintsVector.
Returns VectorType of FingerprintsVector.
Returns FingerprintsVector value IDs and values as space delimited ID/value pair string.
Returns FingerprintsVector value IDs and values as string containing space delimited IDs followed by values with semicolon as IDs and values delimiter.
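The two ID/value string layouts described above can be sketched as follows. The subroutine names are hypothetical, and the exact whitespace around the semicolon in the second layout is an assumption.

```perl
use strict;
use warnings;

# Hypothetical formatter: "ID1 V1 ID2 V2 ..." (space delimited ID/value pairs).
sub ids_and_values_pairs_string {
    my ($ids, $values) = @_;
    return join ' ', map { ($ids->[$_], $values->[$_]) } 0 .. $#$ids;
}

# Hypothetical formatter: "ID1 ID2 ...;V1 V2 ..." (IDs, semicolon, values).
sub ids_and_values_string {
    my ($ids, $values) = @_;
    return join(' ', @$ids) . ';' . join(' ', @$values);
}

my @ids    = qw(C N O);
my @values = (2, 1, 1);
print ids_and_values_pairs_string(\@ids, \@values), "\n";    # C 2 N 1 O 1
print ids_and_values_string(\@ids, \@values), "\n";          # C N O;2 1 1
```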
Returns number of non-zero values in FingerprintsVector.
Returns number of value IDs in FingerprintsVector.
Returns number of values in FingerprintsVector.
Returns an array containing names of supported distance and similarity coefficients.
Returns an array containing names of supported distance coefficients.
Returns an array containing names of supported similarity coefficients.
Returns FingerprintsVector values type.
Returns fingerprints vector Value specified using Index starting at 0.
Returns fingerprints vector ValueID specified using Index starting at 0.
Returns fingerprints vector ValueIDs as an array or reference to an array.
Returns fingerprints vector ValueIDsString with value IDs delimited by space.
Returns fingerprints vector Values as an array or reference to an array.
Returns FingerprintsVector values and value IDs as space delimited value/ID pair string.
Returns FingerprintsVector values and value IDs as string containing space delimited values followed by IDs with semicolon as values and IDs delimiter.
Returns FingerprintsVector values as space delimited string.
Returns value of Hamming distance coefficient between two FingerprintsVectors using optionally specified CalculationMode and optional checking of vector values.
Possible CalculationMode values: AlgebraicForm, BinaryForm or SetTheoreticForm. Default CalculationMode value: AlgebraicForm. Default SkipValuesCheck value: 0.
Returns 1 or 0 based on whether Object is a FingerprintsVector.
Returns value of Jaccard similarity coefficient between two FingerprintsVectors using optionally specified CalculationMode and optional checking of vector values.
Possible CalculationMode values: AlgebraicForm, BinaryForm or SetTheoreticForm. Default CalculationMode value: AlgebraicForm. Default SkipValuesCheck value: 0.
Returns value of Manhattan distance coefficient between two FingerprintsVectors using optionally specified CalculationMode and optional checking of vector values.
Possible CalculationMode values: AlgebraicForm, BinaryForm or SetTheoreticForm. Default CalculationMode value: AlgebraicForm. Default SkipValuesCheck value: 0.
Creates a new FingerprintsVector of ValuesType using IDsAndValuesPairsString containing space delimited value IDs and values pairs and returns new FingerprintsVector object. Possible ValuesType values: OrderedNumericalValues, NumericalValues, or AlphaNumericalValues.
Creates a new FingerprintsVector of ValuesType using IDsAndValuesString, which contains a value IDs string followed by a values string with a semicolon as delimiter, and returns new FingerprintsVector object. The entries within the value IDs and values strings are delimited by spaces. Possible ValuesType values: OrderedNumericalValues, NumericalValues, or AlphaNumericalValues.
Creates a new FingerprintsVector of ValuesType using ValuesAndIDsPairsString containing space delimited value and value IDs pairs and returns new FingerprintsVector object. Possible ValuesType values: OrderedNumericalValues, NumericalValues, or AlphaNumericalValues.
Creates a new FingerprintsVector of ValuesType using ValuesAndIDsString, which contains a values string followed by a value IDs string with a semicolon as delimiter, and returns new FingerprintsVector object. The entries within the values and value IDs strings are delimited by spaces. Possible ValuesType values: OrderedNumericalValues, NumericalValues, or AlphaNumericalValues.
Creates a new FingerprintsVector of ValuesType using ValuesString containing space delimited values and returns new FingerprintsVector object. Possible ValuesType values: OrderedNumericalValues, NumericalValues, or AlphaNumericalValues.
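Parsing two of these string layouts into parallel ID and value lists can be sketched as below. Both parsers are hypothetical and only illustrate the delimiters described above; they do not construct FingerprintsVector objects.

```perl
use strict;
use warnings;
use Carp qw(croak);

# Hypothetical parser for an IDsAndValuesPairsString such as "C 2 N 1 O 1".
sub parse_ids_and_values_pairs_string {
    my ($string) = @_;
    my @tokens = split ' ', $string;    # awk-style split on runs of whitespace
    croak "Odd number of tokens in pairs string" if @tokens % 2;
    my (@ids, @values);
    while (my ($id, $value) = splice(@tokens, 0, 2)) {
        push @ids,    $id;
        push @values, $value;
    }
    return (\@ids, \@values);
}

# Hypothetical parser for an IDsAndValuesString such as "C N O;2 1 1".
sub parse_ids_and_values_string {
    my ($string) = @_;
    my ($ids_part, $values_part) = split /;/, $string, 2;
    croak "Missing ';' between IDs and values" unless defined $values_part;
    my @ids    = split ' ', $ids_part;
    my @values = split ' ', $values_part;
    croak "Number of IDs and values differ" unless @ids == @values;
    return (\@ids, \@values);
}

my ($ids, $values) = parse_ids_and_values_string("C N O;2 1 1");
print "@$ids => @$values\n";    # C N O => 2 1 1
```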
Returns value of Ochiai similarity coefficient between two FingerprintsVectors using optionally specified CalculationMode and optional checking of vector values.
Possible CalculationMode values: AlgebraicForm, BinaryForm or SetTheoreticForm. Default CalculationMode value: AlgebraicForm. Default SkipValuesCheck value: 0.
Sets Description of fingerprints vector and returns FingerprintsVector.
Sets ID of fingerprints vector and returns FingerprintsVector.
Sets VectorType of fingerprints vector and returns FingerprintsVector.
Sets FingerprintsVector values Type and returns FingerprintsVector. Possible Type values: OrderedNumericalValues, NumericalValues, or AlphaNumericalValues.
During calculation of similarity and distance coefficients between two FingerprintsVectors, the following conditions apply to vector type, size, value and value IDs:
Sets a FingerprintsVector value specified by Index starting at 0 to Value along with optional index range check and returns FingerprintsVector.
Sets a FingerprintsVector value ID specified by Index starting at 0 to ValueID along with optional index range check and returns FingerprintsVector.
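Zero-based indexing with an optional range check can be sketched with a hypothetical set_value subroutine; its argument order and the $check_index flag are assumptions, not the FingerprintsVector signature. Without the check, Perl would silently accept an index past the end by growing the array.

```perl
use strict;
use warnings;
use Carp qw(croak);

# Hypothetical sketch of zero-based SetValue semantics with an optional
# index range check; argument order and flag name are assumptions.
sub set_value {
    my ($values, $index, $value, $check_index) = @_;
    if ($check_index) {
        my $last = $#$values;
        croak "Index $index is out of range 0 .. $last"
            if $index < 0 || $index > $last;
    }
    $values->[$index] = $value;
    return $values;    # returned so calls can be chained
}

my @values = (2, 1, 0, 3);
set_value(\@values, 2, 5, 1);
print "@values\n";    # 2 1 5 3
```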
Sets FingerprintsVector value IDs to specified ValueIDs and returns FingerprintsVector.
Sets FingerprintsVector values to specified Values and returns FingerprintsVector.
Returns value of Soergel distance coefficient between two FingerprintsVectors using optionally specified CalculationMode and optional checking of vector values.
Possible CalculationMode values: AlgebraicForm, BinaryForm or SetTheoreticForm. Default CalculationMode value: AlgebraicForm. Default SkipValuesCheck value: 0.
Returns value of Sorenson similarity coefficient between two FingerprintsVectors using optionally specified CalculationMode and optional checking of vector values.
Possible CalculationMode values: AlgebraicForm, BinaryForm or SetTheoreticForm. Default CalculationMode value: AlgebraicForm. Default SkipValuesCheck value: 0.
Returns value of Tanimoto similarity coefficient between two FingerprintsVectors using optionally specified CalculationMode and optional checking of vector values.
Possible CalculationMode values: AlgebraicForm, BinaryForm or SetTheoreticForm. Default CalculationMode value: AlgebraicForm. Default SkipValuesCheck value: 0.
Returns a string containing information about FingerprintsVector object.
BitVector.pm, FingerprintsStringUtil.pm, FingerprintsBitVector.pm, Vector.pm
Copyright (C) 2024 Manish Sud. All rights reserved.
This file is part of MayaChemTools.
MayaChemTools is free software; you can redistribute it and/or modify it under the terms of the GNU Lesser General Public License as published by the Free Software Foundation; either version 3 of the License, or (at your option) any later version.