CHE3007S
- DurbinWatson — Durbin-Watson statistic
- backfill — Function to backfill %nan values in a data matrix
- bwrcolormap — Adapted by Dominic de Oliveira
- calculate_bins — A function to calculate the number of bins to be used for a subset of a data source, based on the full dataset provided. The bin width is calculated using the Freedman–Diaconis rule.
- che_boxplot — Draw a box-and-whiskers plot for data provided as column vectors.
- describe — This function calculates summary statistics per column for the dataset provided
- describe_strings — Describes string data. Similar to describe, but reports only the features relevant to strings. May be combined with describe into a single function in future.
- dfhead — Function to return the first 10 rows of a data matrix
- dfinfo — Function that displays the name of each column and the number of non-null entries in that column
- dfview — Function to return the top 5 and bottom 5 rows of a data matrix
- dfview_all — Function to return a full data matrix, formatted neatly with headers
- downsample — Short description on the first line following the function header.
- find_indices — A function to find the indices of one matrix within another. Returns the indices within the "header" vector at which the values in "columns" were located.
- forwardfill — Function to forward-fill %nan values in a data matrix
- get_dummies — One hot encoding function
- group_by — A function to perform a group by operation. This function groups data according to the unique values in the target column of the data, splits the data according to those groups and applies a function to the split data.
- heatmap — Function to plot a colour encoded matrix as a heatmap
- histogram — A function to plot a histogram with data split into different categories
- ingestCSV — Utility function for ingesting and splitting csv data.
- interpolatedfill — Function to fill %nan values in a data matrix by interpolation
- jointplot — Function to plot a colour encoded matrix as a heatmap
- kruskal — Returns the p-value and H statistic for the Kruskal-Wallis test. Provide input as a matrix, a list of matrices, or multiple matrices.
- kurtosis — Kurtosis function
- kurtosis_test — Kurtosis statistical test
- mode_value — Helper function to calculate the modal value of a single column. If several values are tied, returns the first mode encountered.
- normality — Test for normality
- pairplot — Function to plot pairwise relationships within a dataset. A grid is created with each variable plotted against every other. The diagonals display the frequency distributions of the data, which can be plotted as a histogram or as a kernel density estimate of the distribution.
- partial_dependance — Function to plot partial dependence for one variable (2D plot) or two variables (surface plot).
- pcolor — Function to generate a pseudocolor plot with a non-regular rectangular grid.
- percentile — Determination of user selected percentile for vectors or matrices
- permutation_importance — Function to determine permutation importance, based on the R² value, and optionally plot the results.
- plotall — Function to plot time series data for many variables. Automatically opens the appropriate number of figure windows.
- plotallscatter — Function to plot time series data for many variables. Automatically opens the appropriate number of figure windows.
- rank_data — A function to assign a rank to the data
- regression_summary — Function to display key statistics of a linear regression result
- rolling_average — Determine rolling averages to smooth out noisy data
- rotate_tick_labels — Function to rotate the tick labels of an axis using LaTeX
- skewness — D'Agostino skewness
- skewness_test — Skewness test
- split_data — Function to split data into a single
- tiecorrect — A function to calculate the tie correction factor for H tests
- train_test_split — Function to split data into subsets, for testing and training data
- transform_MinMax — Min-max scaling of data
- transform_standardization — Centering and scaling of data
- zscore — Function to perform z-score scaling on data
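
The calculate_bins entry above names the Freedman–Diaconis rule, which sets the bin width to h = 2 * IQR / n^(1/3). The Python sketch below illustrates that rule only; it is not the toolbox's implementation. The function names `freedman_diaconis_width` and `number_of_bins` are hypothetical, and it is an assumption that the width is taken from the full dataset and the bin count is the subset's range divided by that width, rounded up.

```python
import math
import statistics


def freedman_diaconis_width(values):
    """Bin width from the Freedman-Diaconis rule: h = 2 * IQR / n^(1/3)."""
    n = len(values)
    # Quartiles with linear interpolation over the observed range.
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return 2.0 * (q3 - q1) / n ** (1.0 / 3.0)


def number_of_bins(subset, full):
    """Number of bins for `subset`, using a width derived from `full`.

    Assumption: the subset's range is divided by the full-dataset width
    and rounded up, so that every subset shares the same bin width.
    """
    width = freedman_diaconis_width(full)
    if width <= 0:
        # Zero IQR (e.g. constant data): fall back to a single bin.
        return 1
    span = max(subset) - min(subset)
    return max(1, math.ceil(span / width))


if __name__ == "__main__":
    full = list(range(1, 101))
    first_half = list(range(1, 51))
    print(number_of_bins(full, full))
    print(number_of_bins(first_half, full))
```

Deriving the width from the full dataset keeps bin edges comparable when several subsets are drawn on the same axes, which is presumably why the entry distinguishes the subset from the full dataset.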