utils package
Submodules
utils.importAlgorithm module
utils.importAlgorithm.algorithm_selection(algorithm, random_state, contamination)
Select an algorithm by its name token.
- Parameters
algorithm (str, optional (default='iforest', choices=['iforest','lof','ocsvm','robustcovariance','staticautoencoder','luminol','cblof','knn','hbos','sod','pca','dagmm','autoencoder','lstm_ad','lstm_ed'])) – The name of the algorithm.
random_state (np.random.RandomState) – The random state from the given random seeds.
contamination (float in (0., 0.5), optional (default=0.1)) – The amount of contamination of the data set, i.e. the proportion of outliers in the data set. Used when fitting to define the threshold on the decision function.
- Returns
alg – The selected algorithm method.
- Return type
class
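One way to picture the selection step is as a lookup table from name tokens to detector constructors. The sketch below makes several assumptions. `ALGORITHMS`, `_PlaceholderDetector` and `algorithm_selection_sketch` are hypothetical names. The registry covers only a few of the documented tokens. The placeholder class stands in for the real detector implementations, which this documentation does not show. Unlike the documented return type, the sketch returns a configured instance rather than a class.

```python
from typing import Any, Callable, Dict


class _PlaceholderDetector:
    """Hypothetical stand-in for a real detector (e.g. an isolation forest)."""

    def __init__(self, name: str, random_state: Any = None, contamination: float = 0.1):
        self.name = name
        self.random_state = random_state
        self.contamination = contamination


def _make(name: str) -> Callable[..., _PlaceholderDetector]:
    # Build a constructor that remembers which algorithm token was requested.
    return lambda **kw: _PlaceholderDetector(name, **kw)


# Hypothetical registry: algorithm token -> constructor (subset, for illustration only).
ALGORITHMS: Dict[str, Callable[..., _PlaceholderDetector]] = {
    token: _make(token) for token in ("iforest", "lof", "ocsvm", "knn", "hbos", "pca")
}


def algorithm_selection_sketch(algorithm="iforest", random_state=None, contamination=0.1):
    """Hypothetical sketch: look up an algorithm token and instantiate it."""
    if not 0.0 < contamination < 0.5:
        raise ValueError("contamination must be in (0., 0.5)")
    try:
        constructor = ALGORITHMS[algorithm]
    except KeyError:
        raise ValueError(
            f"unknown algorithm {algorithm!r}; choose from {sorted(ALGORITHMS)}"
        ) from None
    return constructor(random_state=random_state, contamination=contamination)
```

A dictionary dispatch keeps the list of valid tokens in one place. An unknown token then fails with a message that lists the accepted names.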
utils.plotUtils module
utils.plotUtils.visualize_distribution(X, prediction, score, path=None)
Visualize the original density distribution of the data in two-dimensional space.
- Parameters
X (numpy array of shape (n_test, n_features)) – Test data.
prediction (numpy array of shape (n_test, )) – The prediction result of the test data.
score (numpy array of shape (n_test, )) – The outlier score of the test data.
path (string) – The saving path for result figures.
utils.plotUtils.visualize_distribution_static(X, prediction, score, path=None)
Visualize the original distribution of the data in two-dimensional space as a scatter plot in which outliers and inliers are drawn in different colors.
- Parameters
X (numpy array of shape (n_test, n_features)) – Test data.
prediction (numpy array of shape (n_test, )) – The prediction result of the test data.
score (numpy array of shape (n_test, )) – The outlier score of the test data.
path (string) – The saving path for result figures.
utils.plotUtils.visualize_distribution_time_serie(ts, value, path=None)
Visualize the time-series data in each individual dimension.
- Parameters
ts (numpy array of shape (n_test, n_features)) – The values of the test time-series data.
value (numpy array of shape (n_test, )) – The outlier score of the test data.
path (string) – The saving path for result figures.
utils.plotUtils.visualize_outlierresult(X, label, path=None)
Visualize the predicted outlier result.
- Parameters
X (numpy array of shape (n_test, n_features)) – The test data.
label (numpy array of shape (n_test, )) – The labels of the test data produced by the algorithm.
path (string) – The saving path for result figures.
utils.plotUtils.visualize_outlierscore(value, label, contamination, path=None)
Visualize the predicted outlier score.
- Parameters
value (numpy array of shape (n_test, )) – The outlier score of the test data.
label (numpy array of shape (n_test, )) – The label of test data produced by the algorithm.
contamination (float in (0., 0.5), optional (default=0.1)) – The amount of contamination of the data set, i.e. the proportion of outliers in the data set. Used when fitting to define the threshold on the decision function.
path (string) – The saving path for result figures.
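The contamination parameter links a score threshold to an expected outlier fraction. The sketch below assumes that higher scores mean more anomalous and uses a nearest-rank quantile. `contamination_threshold` and `label_by_threshold` are hypothetical helpers, not functions from utils.plotUtils, and the package may compute its threshold differently.

```python
import math


def contamination_threshold(scores, contamination=0.1):
    """Hypothetical helper: threshold so that about `contamination` of scores lie above it.

    Uses the nearest-rank (1 - contamination) empirical quantile.
    """
    if not 0.0 < contamination < 0.5:
        raise ValueError("contamination must be in (0., 0.5)")
    ordered = sorted(scores)
    # 1-based nearest-rank position of the (1 - contamination) quantile.
    rank = math.ceil((1.0 - contamination) * len(ordered))
    return ordered[max(rank - 1, 0)]


def label_by_threshold(scores, threshold):
    # 1 marks an outlier (score strictly above the threshold), 0 an inlier.
    return [1 if s > threshold else 0 for s in scores]
```

With ten scores and contamination 0.1, the threshold is the ninth-smallest score, so exactly one point is labeled an outlier.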
utils.utilities module
utils.utilities.check_parameter(param, low=-2147483647, high=2147483647, param_name='', include_left=False, include_right=False)
Check if an input is within the defined range.
- Parameters
param (int, float) – The input parameter to check.
low (int, float) – The lower bound of the range.
high (int, float) – The upper bound of the range.
param_name (str, optional (default='')) – The name of the parameter.
include_left (bool, optional (default=False)) – Whether to include the lower bound (low <= param).
include_right (bool, optional (default=False)) – Whether to include the upper bound (param <= high).
- Returns
within_range – Whether the parameter is within the range between low and high, with each bound included or excluded according to include_left and include_right.
- Return type
bool, or raises an error
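Here is a minimal sketch of the documented range check. It assumes the check returns True on success and raises `ValueError` otherwise. `check_parameter_sketch` is a hypothetical name, and its error types and messages are assumptions, not the package's actual behavior.

```python
def check_parameter_sketch(param, low=-2147483647, high=2147483647, param_name="",
                           include_left=False, include_right=False):
    """Hypothetical re-implementation of the documented range check.

    Returns True when `param` lies in the requested interval; raises otherwise
    (the exception types are assumptions).
    """
    name = param_name or "param"
    # bool is a subclass of int, so reject it explicitly.
    if isinstance(param, bool) or not isinstance(param, (int, float)):
        raise TypeError(f"{name} must be int or float, got {type(param).__name__}")
    if low > high:
        raise ValueError(f"lower bound {low} is greater than upper bound {high}")
    # Choose a strict or non-strict comparison for each end of the interval.
    left_ok = param >= low if include_left else param > low
    right_ok = param <= high if include_right else param < high
    if not (left_ok and right_ok):
        left = "[" if include_left else "("
        right = "]" if include_right else ")"
        raise ValueError(f"{name} = {param} is not in {left}{low}, {high}{right}")
    return True
```

With the default flags both bounds are exclusive. So `check_parameter_sketch(0, 0, 1)` raises, while the same call with `include_left=True` returns True.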
utils.utilities.connect_server(host, user, password)
Connect to a TDengine server.
- Parameters
host (str) – Host name or address of the TDengine server.
user (str) – User name for the TDengine server.
password (str) – Password for the TDengine server.
- Returns
conn (taos.connection.TDengineConnection) – The TDengine connection.
cursor (taos.cursor.TDengineCursor) – The TDengine cursor.
utils.utilities.insert_demo_data(conn, consur, database, table, ground_truth_flag)
Insert demo data: create a database and a table with timestamps, and monitor the progress of these database operations.
- Parameters
conn (taos.connection.TDengineConnection) – The TDengine connection.
cursor (taos.cursor.TDengineCursor) – The TDengine cursor.
database (str) – Name of the database to connect to.
table (str) – Name of the demo data table.
ground_truth_flag (bool, optional (default=False)) – Whether to use the ground truth to evaluate performance.
- Returns
ground_truth – The ground truth numpy array.
- Return type
numpy array
utils.utilities.output_performance(algorithm, ground_truth, y_pred, time, outlierness)
Evaluate and output the performance given the prediction results and the ground truth.
- Parameters
algorithm (str) – Name of the algorithm that was used.
ground_truth (numpy array of shape (n_train, )) – The ground truth numpy array.
y_pred (numpy array of shape (n_train, )) – The prediction results as numpy array.
time (float) – The processing time, e.g. the difference between two time.time() readings.
outlierness (numpy array of shape (n_train, )) – The outlierness results as numpy array.
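The evaluation that output_performance describes can be illustrated with precision, recall and F1 for binary labels. This sketch assumes that label 1 marks an outlier. The documentation does not say which metrics the package actually reports, so treat `output_performance_sketch` as hypothetical. It takes the elapsed time as `elapsed` to avoid shadowing the `time` module, and it ignores `outlierness`.

```python
import time


def output_performance_sketch(algorithm, ground_truth, y_pred, elapsed):
    """Hypothetical sketch: precision/recall/F1 for binary outlier labels (1 = outlier)."""
    pairs = list(zip(ground_truth, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    print(f"{algorithm}: precision={precision:.3f} recall={recall:.3f} "
          f"f1={f1:.3f} time={elapsed:.3f}s")
    return {"precision": precision, "recall": recall, "f1": f1, "time": elapsed}


if __name__ == "__main__":
    start = time.time()
    # Fitting and predicting would happen here.
    output_performance_sketch("iforest", [0, 0, 1, 1], [0, 1, 1, 0], time.time() - start)
```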
utils.utilities.query_data(conn, cursor, database, table, start_time, end_time, time_serie_name, ground_truth=None, time_serie=False, ground_truth_flag=True)
Query data from given time range and table.
- Parameters
conn (taos.connection.TDengineConnection) – The TDengine connection.
cursor (taos.cursor.TDengineCursor) – The TDengine cursor.
database (str) – Name of the database to connect to.
table (str) – Table to query from.
start_time (str) – Start of the time range.
end_time (str) – End of the time range.
time_serie_name (str) – Name of the time-series column in the table.
ground_truth (numpy array of shape (n_samples,), optional (default=None)) – Ground truth values, for evaluation and visualization.
time_serie (bool, optional (default=False)) – Whether to include the timestamps as one of the features.
ground_truth_flag (bool, optional (default=True)) – Whether to use the ground truth to evaluate performance.
- Returns
X – Queried data as DataFrame from given table and time range.
- Return type
pandas DataFrame
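A time-range query over a TDengine table might be assembled as shown below. This is a hypothetical sketch: `build_range_query` is not part of utils.utilities, and the documentation does not specify the exact SQL the package issues. Values are interpolated directly into the string, so every input must come from a trusted source.

```python
def build_range_query(database, table, time_serie_name, start_time, end_time):
    """Hypothetical helper: select all rows whose time column lies in [start_time, end_time].

    Inputs are interpolated without escaping, so they must be trusted.
    """
    return (
        f"SELECT * FROM {database}.{table} "
        f"WHERE {time_serie_name} >= '{start_time}' AND {time_serie_name} <= '{end_time}'"
    )
```

The resulting string could then be passed to `cursor.execute(...)`, with the rows collected via `cursor.fetchall()` before building the DataFrame. This assumes the cursor follows the usual DB-API style.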
utils.utilities.standardizer(X, X_t=None, keep_scalar=False)
Apply Z-score normalization so that the input samples have zero mean and unit variance.
- Parameters
X (numpy array of shape (n_samples, n_features)) – The training samples.
X_t (numpy array of shape (n_samples_new, n_features), optional (default=None)) – The data to be converted.
keep_scalar (bool, optional (default=False)) – Whether to also return the scaler.
- Returns
X_norm (numpy array of shape (n_samples, n_features)) – X after Z-score normalization.
X_t_norm (numpy array of shape (n_samples_new, n_features)) – X_t after Z-score normalization.
scalar (sklearn scaler object) – The scaler used in the conversion.
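Z-score normalization maps each feature value x to (x - mean) / std, where the mean and the population standard deviation are estimated on the training samples X only. The sketch below uses plain Python lists instead of an sklearn scaler. `standardizer_sketch` is hypothetical, and its return layout is assumed from the Returns section above. The `(means, stds)` tuple stands in for the scaler object.

```python
import math


def standardizer_sketch(X, X_t=None, keep_scalar=False):
    """Hypothetical Z-score normalization on lists of rows.

    Statistics come from X only and are applied to both X and X_t.
    """
    n_features = len(X[0])
    means = [sum(row[j] for row in X) / len(X) for j in range(n_features)]
    # Population standard deviation; a zero std is replaced by 1.0 to avoid dividing by zero.
    stds = [
        math.sqrt(sum((row[j] - means[j]) ** 2 for row in X) / len(X)) or 1.0
        for j in range(n_features)
    ]

    def transform(data):
        return [[(row[j] - means[j]) / stds[j] for j in range(n_features)] for row in data]

    X_norm = transform(X)
    if X_t is None:
        return (X_norm, (means, stds)) if keep_scalar else X_norm
    X_t_norm = transform(X_t)
    return (X_norm, X_t_norm, (means, stds)) if keep_scalar else (X_norm, X_t_norm)
```

X_t is scaled with the training statistics rather than its own, so new samples land on the same scale as the data the detector was fitted on.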
utils.utilities.str2bool(v)
Convert a string to a bool.
- Parameters
v (str) – The string to convert: one of ‘yes’, ‘true’, ‘t’, ‘y’, ‘1’ (True), or ‘no’, ‘false’, ‘f’, ‘n’, ‘0’ (False).
- Returns
The converted boolean value.
- Return type
bool
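Here is a minimal sketch of such a converter, wired into argparse as a `type=` callable. `str2bool_sketch` is hypothetical, and the `--ground_truth_flag` option is only an illustration. Raising `argparse.ArgumentTypeError` for unrecognized strings is an assumption about how the package handles invalid input.

```python
import argparse


def str2bool_sketch(v):
    """Hypothetical converter from yes/no style strings to bool."""
    if isinstance(v, bool):
        return v
    s = v.strip().lower()
    if s in ("yes", "true", "t", "y", "1"):
        return True
    if s in ("no", "false", "f", "n", "0"):
        return False
    # argparse turns this exception into a clean usage error.
    raise argparse.ArgumentTypeError(f"Boolean value expected, got {v!r}")


parser = argparse.ArgumentParser()
parser.add_argument("--ground_truth_flag", type=str2bool_sketch, default=False)
```

The callable is needed because `type=bool` would treat any non-empty string, including "false", as True.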