nan_nhist

Histogram

Calling Sequence

[theText, rawN, x] = nan_nhist(cellValues, 'parameter', value, ...)
t = nan_nhist(Y)
[t, N, X] = nan_nhist(...)
nan_nhist(Y, 'PropertyName', ...)
nan_nhist(Y, 'PropertyName', PropertyValue, ...)

Parameters

Histogram and bin settings:

'binfactor':

Affects the number of bins used. A larger number means more bins.

'samebins':

Makes all the bins align with each other.

'minbins':

The minimum number of bins allowed for each graph

'maxbins':

The maximum number of bins allowed for each graph

'stdtimes':

Number of standard deviations from the mean at which the axes are cut off.

'minx':

Crop the axis and histogram on the left. Alternative name: 'xmin'.

'maxx':

Crop the axis and histogram on the right. Alternative name: 'xmax'.

'proportion':

Plot proportion of total points on the y axis

'pdf':

Plot the pdf on the y axis

'numbers':

Plot the raw numbers on the graph. Alternative name: 'number'.

'smooth':

Plot a smooth line instead of the step function.

Text related parameters:

'titles','legend':

A cell array with strings to put in the legend or

'nolegend':

In case you pass a struct, you may force a legend

'text':

Outputs all numbers to text, even ones that are

'decimalplaces':

Number of decimal places numbers will be output

'npoints':

this will add (number of points) to the legend or

'xlabel':

Label of the lowest X axis

'ylabel':

Label of the Y axis, note that the ylabel default

'fsize':

Font size, default 12. Alternative name: 'fontsize'.

'location':

Sets the location of the legend, for example NorthOutside. Alternative name: 'legendlocation'.

Peripheral elements settings:

'median':

This will plot a stem plot of the median

'mode':

This will plot a stem plot of the mode

'serror':

Will put the mean and 'standard error' bars above

'noerror':

Will remove the mean and standard deviation error

'linewidth':

Sets the width of the lines for all the graphs

'color':

Sets the colormap to decide the colors of the

General Figure Settings:

'separate':

Plot each histogram separately, also use normal

'newfig':

Will make a new figure to plot it in. When using

'eps':

EPS file name of the generated plot to save. It

Description

t = nan_nhist(Y) bins the elements of Y into equally spaced containers and returns a string with information about the distributions. If Y is a cell array or a list, nan_nhist graphs the binned (discrete) probability density function of each data set for comparison on the same graph, and returns a cell array or structure that includes a string for each set of data.

[t, N, X] = nan_nhist(...) also returns the number of items in each bin, N, and the locations of the left edges of each bin, X. If Y is a cell array or structure then the output is in the same form.
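As a rough illustration of the three-output form described above, the following sketch captures the text summary, the bin counts and the bin edges for a single vector. The variable names are arbitrary, and the exact contents of t may differ between toolbox versions.

```scilab
// Sketch only: capture all three outputs for a plain vector.
d = rand(1, 1000, 'normal');   // Gaussian random sample
[t, N, X] = nan_nhist(d);      // also draws the histogram
disp(t);                       // text summary of the distribution
disp(size(N));                 // counts, one per bin (per the description)
disp(size(X));                 // left edges of the bins (per the description)
```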

__________________________________________________________________________

Summary of what the function does:

1) Automatically sets the number and range of the bins to be appropriate for the data.

2) Compares multiple sets of data elegantly on one or more plots, with legend or titles. It also graphs the mean and standard deviations. It can also plot the median and mode.

3) Outputs text with the useful statistics for each distribution.

4) Allows for changing many more parameters.

Highlighted features (see below for details)

'separate' to plot each set on its own axis, but with the same bounds

'binfactor' change the number of bins used, larger value = more bins

'samebins' force all bins to be the same for all plots

'legend' add a legend in the graph (default for structs)

'noerror' remove the mean and std plot from the graph

'median' add the median of the data to the graph

'text' return many details about each graph even if not plotted

Optional Properties

Note: Alternative names to call the properties are listed at the end of each entry.

The bin width is defined in the following way

Disclaimer: this function is specialized to compare data with comparable standard deviations and means, but greatly varying numbers of points.

Scott's choice, used for this function, is a theoretically ideal way of choosing the number of bins. Of course the theory is general and so not rigorous, but I feel it does a good job.

(bin width) = 3.5*std(data points)/(number of points)^(1/3)

I did not follow it exactly though, restricting smaller bin sizes to be divisible by the larger bin sizes. In this way the different conditions can be accurately compared to each other.

The bin width is further adjusted by the user parameter 'binFactor':

(new bin width) = (old bin width) / (binFactor)

This lets the user make the bins larger or smaller to taste. A larger binFactor means more bins. The default is 1.

Source: http://en.wikipedia.org/wiki/Histogram#Number_of_bins_and_width
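The bin-width rule above can be sketched in a few lines of Scilab. This is only an illustration of the two formulas, not the toolbox's internal code: it ignores the divisibility restriction and the minBins/maxBins limits, and the rounding used for the bin count is an assumption.

```scilab
// Sketch of the documented bin-width rule; variable names are illustrative.
data = rand(1, 5000, 'normal');        // sample data
n = length(data);

// Scott's choice: 3.5 * std / n^(1/3)
binWidth = 3.5 * stdev(data) / n^(1/3);

// User adjustment: a larger binFactor gives narrower, more numerous bins
binFactor = 2;                         // 1 is the documented default
newBinWidth = binWidth / binFactor;

// Approximate bin count over the data range (rounding is an assumption)
nBins = ceil((max(data) - min(data)) / newBinWidth);
mprintf("width %.4f -> %.4f, about %d bins\n", binWidth, newBinWidth, nBins);
```

Doubling binFactor halves the bin width, so the approximate bin count roughly doubles.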

Default function behaviour

If you pass it a structure, the field names will become the legend. All of the data outputted will be in structure form with the same field names. If you pass a cell array, then the output will be in cell form. If you pass an array or vector then the data is outputted as a string and two arrays.

The standard deviation will be plotted by default, unless the 'serror' parameter is passed, in which case the standard error = std/sqrt(N) is plotted instead.
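To make the difference between the two error bars concrete, here is a small sketch that computes both quantities directly with Scilab's stdev. The sample and its size are arbitrary choices for illustration.

```scilab
// Standard deviation versus standard error for one sample.
x = rand(1, 400, 'normal');   // 400 Gaussian points
sd = stdev(x);                // spread of the data
se = sd / sqrt(length(x));    // sqrt(400) = 20, so se = sd/20
mprintf("std = %.4f, standard error = %.4f\n", sd, se);
```

With 400 points the standard error bar is 20 times narrower than the standard deviation bar, which is what a call such as nan_nhist(x, 'serror') would be expected to draw.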

There are no maximum or minimum X values.

minBins=10; The minimum number of bins for the histogram

maxBins=100; The maximum number of bins for a histogram

AxisFontSize = 12; 'fsize' the fontsize of everything.

The number of data points is not displayed

The lines in the histograms are black

faceColor = [.7 .7 .7]; The face of the histogram is gray.

It will plot inside the current figure unless 'newfig' is passed, in which case it will make a new figure. It will take over and refit all axes.

linewidth=2; The width of the lines in the error bars and the histogram

stdTimes=4; The axes will be cut off at a maximum of 4 times the standard deviation from the mean. Different data sets will be plotted with a different number of bins.
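The stdTimes default can be pictured with a short sketch that computes the cutoff limits by hand. It assumes the cutoff is applied symmetrically around the mean, which the text above suggests but does not state outright. The variable names are illustrative.

```scilab
// Sketch: axis limits implied by stdTimes (symmetric cutoff is an assumption).
x = rand(1, 10000, 'normal');
stdTimes = 4;                          // documented default
m = mean(x);
s = stdev(x);
lowerLimit = m - stdTimes * s;
upperLimit = m + stdTimes * s;

// Points that would fall inside the plotted range
inside = x(x >= lowerLimit & x <= upperLimit);
mprintf("limits [%.3f, %.3f], %d of %d points inside\n", ..
        lowerLimit, upperLimit, length(inside), length(x));
```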

Acknowledgments

Thank you to the AP-Lab at Boston University for funding me while I developed this function. Thank you to the AP-Lab, Avi and Eli for help with designing and testing it.

Examples

A=list(rand(1,10^5,'normal'),rand(10^3,1,'normal')+1);
nan_nhist(A);
nan_nhist(A,'legend',['u=0','u=1']);
nan_nhist(A,'legend',['u=0','u=1'],'separate');
nan_nhist(A,'color','summer')
nan_nhist(A,'color',[.3 .8 .3],'separate')
nan_nhist(A,'binfactor',4)
nan_nhist(A,'samebins')
nan_nhist(A,'median','noerror')

// example #1: variations around a histogram of a Gaussian random sample
d=rand(1,10000,'normal');
clf();nan_nhist(d,'proportion')
clf();nan_nhist(d)
clf();nan_nhist(d,'legend','rand(1,10000,''normal'')','color',[1 0 0],'proportion')

// example #2: histogram of a binomial (B(6,0.5)) random sample
d = grand(1000,1,"bin", 6, 0.5);
clf()
subplot(2,1,1)
nan_nhist(d,'proportion','legend',"normalized histogram")
subplot(2,1,2)
nan_nhist(d,'legend',"non normalized histogram")

// example #3: histogram of an exponential random sample
lambda = 2;
X = grand(100000,1,"exp", 1/lambda);
Xmax = max(X);
clf()
nan_nhist(X,'pdf','minx',0,'maxx',Xmax);
x = linspace(0,Xmax,100)';
plot2d(x,lambda*exp(-lambda*x),strf="000",style=5)
legends(["exponential random sample histogram" "exact density curve"],[1,5],opt="ur");

Authors
