utils package

Submodules

utils.importAlgorithm module

utils.importAlgorithm.algorithm_selection(algorithm, random_state, contamination)

Select an algorithm by its name token.

Parameters
  • algorithm (str, optional (default='iforest', choices=['iforest','lof','ocsvm','robustcovariance','staticautoencoder','luminol','cblof','knn','hbos','sod','pca','dagmm','autoencoder','lstm_ad','lstm_ed'])) – The name of the algorithm.

  • random_state (np.random.RandomState) – The random state created from the given random seed.

  • contamination (float in (0., 0.5), optional (default=0.1)) – The amount of contamination of the data set, i.e. the proportion of outliers in the data set. Used when fitting to define the threshold on the decision function.

Returns

alg – The selected algorithm.

Return type

class
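The selection step amounts to a lookup from a name token to a detector class. The sketch below illustrates that pattern under stated assumptions: `IForest`, `LOF` and `_REGISTRY` are hypothetical stand-ins, not the package's real classes. It also returns a configured instance, whereas the original documents the return type only as a class.

```python
# Hypothetical stand-ins: the real package maps tokens to its own detectors.
class IForest:
    def __init__(self, contamination=0.1, random_state=None):
        self.contamination = contamination
        self.random_state = random_state


class LOF:
    def __init__(self, contamination=0.1, random_state=None):
        self.contamination = contamination
        self.random_state = random_state


# Map each accepted name token to the class that implements it.
_REGISTRY = {
    "iforest": IForest,
    "lof": LOF,
}


def algorithm_selection(algorithm, random_state, contamination):
    """Look up the detector for `algorithm` and configure it (sketch only)."""
    try:
        cls = _REGISTRY[algorithm]
    except KeyError:
        raise ValueError(
            f"Unknown algorithm {algorithm!r}; expected one of {sorted(_REGISTRY)}"
        ) from None
    # Pass the shared hyperparameters through to the chosen detector.
    return cls(contamination=contamination, random_state=random_state)
```

A dictionary keeps the token list and the dispatch in one place, so adding a detector means adding one entry.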

utils.plotUtils module

utils.plotUtils.visualize_distribution(X, prediction, score, path=None)

Visualize the original density distribution of the data in two-dimensional space.

Parameters
  • X (numpy array of shape (n_test, n_features)) – Test data.

  • prediction (numpy array of shape (n_test, )) – The prediction result of the test data.

  • score (numpy array of shape (n_test, )) – The outlier score of the test data.

  • path (string, optional (default=None)) – The path where the result figures are saved.

utils.plotUtils.visualize_distribution_static(X, prediction, score, path=None)

Visualize the original distribution of the data in two-dimensional space, with outliers and inliers drawn as differently colored scatter plots.

Parameters
  • X (numpy array of shape (n_test, n_features)) – Test data.

  • prediction (numpy array of shape (n_test, )) – The prediction result of the test data.

  • score (numpy array of shape (n_test, )) – The outlier score of the test data.

  • path (string, optional (default=None)) – The path where the result figures are saved.

utils.plotUtils.visualize_distribution_time_serie(ts, value, path=None)

Visualize each individual dimension of the time-series data.

Parameters
  • ts (numpy array of shape (n_test, n_features)) – The values of the test time-series data.

  • value (numpy array of shape (n_test, )) – The outlier score of the test data.

  • path (string, optional (default=None)) – The path where the result figures are saved.

utils.plotUtils.visualize_outlierresult(X, label, path=None)

Visualize the predicted outlier result.

Parameters
  • X (numpy array of shape (n_test, n_features)) – The test data.

  • label (numpy array of shape (n_test, )) – The labels of the test data predicted by the algorithm.

  • path (string, optional (default=None)) – The path where the result figures are saved.

utils.plotUtils.visualize_outlierscore(value, label, contamination, path=None)

Visualize the predicted outlier score.

Parameters
  • value (numpy array of shape (n_test, )) – The outlier score of the test data.

  • label (numpy array of shape (n_test, )) – The labels of the test data predicted by the algorithm.

  • contamination (float in (0., 0.5), optional (default=0.1)) – The amount of contamination of the data set, i.e. the proportion of outliers in the data set. Used when fitting to define the threshold on the decision function.

  • path (string, optional (default=None)) – The path where the result figures are saved.
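The contamination parameter determines where the decision threshold on the outlier score falls. A minimal sketch, assuming the PyOD-style convention in which the highest `contamination` fraction of scores is labelled as outliers; `threshold_by_contamination` is a hypothetical helper, and whether this package computes its threshold exactly this way is not stated in the documentation.

```python
import numpy as np


def threshold_by_contamination(score, contamination=0.1):
    """Flag the top `contamination` fraction of scores as outliers (label 1)."""
    score = np.asarray(score, dtype=float)
    # Scores strictly above the (1 - contamination) quantile are outliers.
    threshold = np.percentile(score, 100 * (1 - contamination))
    labels = (score > threshold).astype(int)
    return threshold, labels
```

With ten scores 0 through 9 and `contamination=0.2`, the threshold is 7.2 and only the two largest scores are flagged.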

utils.utilities module

utils.utilities.check_parameter(param, low=-2147483647, high=2147483647, param_name='', include_left=False, include_right=False)

Check if an input is within the defined range.

Parameters
  • param (int, float) – The input parameter to check.

  • low (int, float) – The lower bound of the range.

  • high (int, float) – The upper bound of the range.

  • param_name (str, optional (default='')) – The name of the parameter.

  • include_left (bool, optional (default=False)) – Whether to include the lower bound (low <= param).

  • include_right (bool, optional (default=False)) – Whether to include the upper bound (param <= high).

Returns

within_range – Whether the parameter is within the range defined by low and high.

Return type

bool, or raises an error if the check fails
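A minimal sketch of the range check, assuming a `ValueError` for out-of-range values and a `TypeError` for non-numeric input (the original only says that errors are raised). The signature mirrors the documented one, but the body is an illustration, not the package's code.

```python
import numbers


def check_parameter(param, low=-2147483647, high=2147483647, param_name='',
                    include_left=False, include_right=False):
    """Return True if param lies within the bounds, otherwise raise an error."""
    name = param_name or 'param'
    # Sketch assumption: non-numeric input is rejected with TypeError.
    if not isinstance(param, numbers.Real):
        raise TypeError(f"{name} must be an int or float, got {type(param).__name__}")
    # include_left / include_right switch each bound between strict and inclusive.
    above_low = param >= low if include_left else param > low
    below_high = param <= high if include_right else param < high
    if not (above_low and below_high):
        left = '[' if include_left else '('
        right = ']' if include_right else ')'
        # Sketch assumption: out-of-range values raise ValueError.
        raise ValueError(f"{name} = {param} is not in {left}{low}, {high}{right}")
    return True
```

For example, `check_parameter(0.1, 0, 0.5, 'contamination')` returns True, while `check_parameter(0.5, 0, 0.5, 'contamination')` raises because the upper bound is excluded by default.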

utils.utilities.connect_server(host, user, password)

Connect to the TDengine server.

Parameters
  • host (str) – Host name or address of the TDengine server.

  • user (str) – User name for the TDengine server.

  • password (str) – Password for the TDengine server.

Returns

  • conn (taos.connection.TDengineConnection) – TDengine connection object.

  • cursor (taos.cursor.TDengineCursor) – TDengine cursor object.

utils.utilities.insert_demo_data(conn, consur, database, table, ground_truth_flag)

Insert demo data: create a database and a table, insert time-stamped records into the table, and report the progress of these database operations.

Parameters
  • conn (taos.connection.TDengineConnection) – TDengine connection object.

  • consur (taos.cursor.TDengineCursor) – TDengine cursor object.

  • database (str) – Name of the database to create and connect to.

  • table (str) – Name of the table to insert the demo data into.

  • ground_truth_flag (bool) – Whether to use the ground truth to evaluate the performance.

Returns

ground_truth – The ground truth numpy array.

Return type

numpy array

utils.utilities.output_performance(algorithm, ground_truth, y_pred, time, outlierness)

Evaluate and output the performance given the prediction results and the ground truth.

Parameters
  • algorithm (str) – The name of the algorithm that was used.

  • ground_truth (numpy array of shape (n_train, )) – The ground truth numpy array.

  • y_pred (numpy array of shape (n_train, )) – The prediction results as numpy array.

  • time (float, measured with time.time()) – The elapsed processing time.

  • outlierness (numpy array of shape (n_train, )) – The outlierness results as numpy array.
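The documentation does not list which metrics output_performance reports. As one common choice for binary outlier labels, the hypothetical helper below computes precision, recall and F1, assuming that 1 marks an outlier and 0 an inlier in both arrays.

```python
import numpy as np


def binary_metrics(ground_truth, y_pred):
    """Precision, recall and F1 for binary labels where 1 marks an outlier."""
    gt = np.asarray(ground_truth).astype(bool)
    pred = np.asarray(y_pred).astype(bool)
    tp = np.sum(gt & pred)    # outliers correctly flagged
    fp = np.sum(~gt & pred)   # inliers wrongly flagged
    fn = np.sum(gt & ~pred)   # outliers missed
    # Guard each ratio against an empty denominator.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return float(precision), float(recall), float(f1)
```

For ground truth `[1, 0, 1, 0, 0]` and predictions `[1, 1, 0, 0, 0]`, there is one true positive, one false positive and one false negative, so all three metrics are 0.5.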

utils.utilities.query_data(conn, cursor, database, table, start_time, end_time, time_serie_name, ground_truth=None, time_serie=False, ground_truth_flag=True)

Query data from given time range and table.

Parameters
  • conn (taos.connection.TDengineConnection) – TDengine connection object.

  • cursor (taos.cursor.TDengineCursor) – TDengine cursor object.

  • database (str) – Name of the database to connect to.

  • table (str) – Name of the table to query.

  • start_time (str) – Start of the time range.

  • end_time (str) – End of the time range.

  • time_serie_name (str) – Name of the time-series column in the table.

  • ground_truth (numpy array of shape (n_samples,), optional (default=None)) – Ground truth values, used for evaluation and visualization.

  • time_serie (bool, optional (default=False)) – Whether to include the timestamps as one of the features.

  • ground_truth_flag (bool, optional (default=True)) – Whether to use the ground truth to evaluate the performance.

Returns

X – The data queried from the given table and time range.

Return type

pandas DataFrame
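A time-range query of this kind could be expressed as a plain SQL statement. The sketch assumes the filter is inclusive at both ends and that time_serie_name names the timestamp column; `build_time_range_query` is a hypothetical helper, not part of the documented API.

```python
def build_time_range_query(database, table, time_serie_name, start_time, end_time):
    """Return a SELECT for rows with start_time <= time column <= end_time."""
    # Sketch only: identifiers and timestamps are interpolated directly with no
    # escaping, so the inputs must come from a trusted source.
    return (
        f"SELECT * FROM {database}.{table} "
        f"WHERE {time_serie_name} >= '{start_time}' "
        f"AND {time_serie_name} <= '{end_time}'"
    )
```

The resulting string would then be passed to the cursor's `execute()` method, and the fetched rows wrapped in a pandas DataFrame.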

utils.utilities.standardizer(X, X_t=None, keep_scalar=False)

Apply Z-score normalization so that the input samples have zero mean and unit variance.

Parameters
  • X (numpy array of shape (n_samples, n_features)) – The training samples.

  • X_t (numpy array of shape (n_samples_new, n_features), optional (default=None)) – The data to be converted.

  • keep_scalar (bool, optional (default=False)) – Whether to also return the fitted scaler.

Returns

  • X_norm (numpy array of shape (n_samples, n_features)) – X after the Z-score normalization.

  • X_t_norm (numpy array of shape (n_samples_new, n_features)) – X_t after the Z-score normalization.

  • scalar (sklearn scaler object) – The scaler used in the conversion.
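Z-score normalization subtracts each feature's mean and divides by its standard deviation, both estimated on X, and then applies the same statistics to X_t. A minimal NumPy sketch of that idea; `zscore_sketch` is a hypothetical name, the keep_scalar option is omitted, and it works with raw statistics rather than returning an sklearn scaler object.

```python
import numpy as np


def zscore_sketch(X, X_t=None):
    """Z-score X with its own column statistics and apply them to X_t."""
    X = np.asarray(X, dtype=float)
    mean = X.mean(axis=0)
    std = X.std(axis=0)      # population std (ddof=0), as sklearn's StandardScaler uses
    std[std == 0] = 1.0      # constant columns are centred but left unscaled
    X_norm = (X - mean) / std
    if X_t is None:
        return X_norm
    # X_t is transformed with the statistics of X, never its own.
    X_t_norm = (np.asarray(X_t, dtype=float) - mean) / std
    return X_norm, X_t_norm
```

Reusing the training statistics for X_t keeps test data on the same scale as the data the detector was fitted on.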

utils.utilities.str2bool(v)

Convert a string to a bool.

Parameters

v (str) – The string to convert. 'yes', 'true', 't', 'y' and '1' map to True; 'no', 'false', 'f', 'n' and '0' map to False.

Returns

The boolean value represented by v.

Return type

bool
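A minimal sketch of the conversion, assuming case-insensitive matching and an `argparse.ArgumentTypeError` for unrecognized strings (the original does not say what happens on invalid input). A typical use is as an argparse `type`; the `--ground_truth_flag` option below is hypothetical.

```python
import argparse


def str2bool(v):
    """Map common yes/no spellings to True/False (case-insensitive)."""
    value = v.lower()
    if value in ('yes', 'true', 't', 'y', '1'):
        return True
    if value in ('no', 'false', 'f', 'n', '0'):
        return False
    # Sketch assumption: ArgumentTypeError lets argparse print a clean usage error.
    raise argparse.ArgumentTypeError(f"Boolean value expected, got {v!r}")


parser = argparse.ArgumentParser()
parser.add_argument("--ground_truth_flag", type=str2bool, default=True)  # hypothetical option
args = parser.parse_args(["--ground_truth_flag", "no"])
```

Using a converter instead of `type=bool` matters because `bool("no")` is True: any non-empty string is truthy in Python.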

Module contents