Home

Introduction

Literature

Installation

Documentation

Tutorial

Data structures of the package

Multivariate adaptive histograms

Package "delt" implements methods for estimating multivariate densities. The methods include multivariate adaptive histograms (greedy histograms and CART-histograms), bootstrap aggregation of adaptive histograms, and stagewise minimization estimators.

Package "delt" is an R package. R is a language and environment for statistical computing and graphics which can be downloaded from R archive network .

Package "delt" depends on package "denpro". The density estimates of package "delt" may be visualized using the visualization tools of package "denpro". See the page for denpro .

The packages are designed by Jussi Klemelä. I am grateful for bug reports.


Contents

  1. Introduction
  2. Literature
  3. Installation
  4. Documentation
  5. Tutorial
  6. Data structures of the package

Introduction:

Multivariate adaptive histograms are histograms with data-dependent partitions. These estimates are formed by minimizing a (complexity penalized) likelihood criterion or an L2 error criterion. Greedy histograms are constructed by applying a greedy optimization to find the partition of the histogram. CART-histogram are constructed by applying a greedy optimization followed by a complexity penalized pruning. A bootstrap aggregation of adaptive histograms is constructed by generating bootstrap samples to make several adaptive histograms and then taking an average of them. A stagewise minimization estimator is a convex combination of greedy histograms.


Literature:

Stagewise minimization is described in "Density estimation with stagewise optimization of the empirical risk" .


Installation instructions:

The programs are provided as an R-package.
  • Download package delt_0.8.3.tar.gz.

  • Windows and X OS binaries are available at http://www.r-project.org/ (-> CRAN -> Choose mirror -> Packages)

  • Installation instructions are provided by issuing command R CMD INSTALL --help or command R INSTALL --help. In unix, one may use the command (possibly as root):

    R CMD INSTALL delt_0.8.0.tar.gz

    For example, in Ubuntu the package will be in directory /usr/local/lib/R/site-library/delt/ and in Suse the package will be in directory /usr/lib/R/library/delt/

  • In R, use the command

    library(delt)

    which makes the functions available.

The function "pcf.greedy.kernel" has an option to use C++ function "densplitter2", and this function can be downloaded from holme page of article "Estimation of Level Set Trees Using Adaptive Partitions".


Documentation:

Below is a listing of the functions, that are included in the package.

Greedy histograms

  • eval.greedy : returns a greedy histogram
  • lstseq.greedy : returns a sequence of greedy histograms, and optionally their level set trees and shape trees
CART-histograms
  • eval.cart : returns an CART-histogram with a given number of leafs; the output is an evaluation tree.
  • lstseq.cart : returns a sequence of CART histograms and their level set trees corresponding to a scale of smoothing parameter values
  • CART histograms step by step
    • densplit : creates an evaluation tree generating an overfitting partition
    • prune : finds a sequence of pruned evaluation trees each generating an overfitting partition
    • eval.pick : picks a pruned evaluation tree from the collection of evaluation trees
Bootstrap aggregation of histograms
  • eval.bagg : returns an estimate which is an bootstrap aggregation of greedy histograms or CART-histograms
  • lstseq.bagg : returns a sequence of level set trees of bootstrap aggregated estimates
Stagewise minimization
  • eval.stage : returns a stagewise minimization estimator
  • eval.stage.gauss : 1D density estimator; a mixture of Gaussians found by stagewise optimization
Discretized kernel estimator with an adaptive partition Clustering
  • lst.cluster : assigns labels to data points according to cluster membership, when the clusters are defined as high density regions
Other utilities
  • partition : finds the partition defined by an evaluation tree
  • plotparti : plots the partition found by function "partition"
Tree transformation
  • lefrig2par : tranforms an evaluation tree so that it can be plotted by "plottree" function from "denpro" package
  • makebina : tranforms an evaluation tree to the tree object of R so that it can be plotted by "plot.tree" function from "tree" package or by "draw.tree" function from "maptree" package
  • rf2tree: transforms the output of function "randomForest" in package "randomForest" to an evaluation tree
  • downhigh: helps to transform a simple evaluation tree to a piecewise constant function
Miscallenous
  • intpcf : returns the integral of a piecewise constant function
  • supp : calculates the bounding box of the data (the smallest rectangle containing the data)
  • findleafs: finds the leafs of a tree

In R use the command help(func) to get online manual for "func".


Tutorial:

library(delt)      # load the library

Short tutorial

Generate the data
dendat<-sim.data(n=500,seed=5,type="mulmodII")

Calculate the estimates
eva<-eval.greedy(dendat,leaf=16)

eva<-eval.cart(dendat,leaf=16)

eva<-eval.bagg(dendat,B=3,leaf=12,prune="on")

eva<-eval.stage(dendat,leaf=10,M=3)

Draw the estimates
lst<-leafsfirst(eva)
plotvolu(lst)

dp<-draw.pcf(eva,pnum=c(60,60))         
persp(dp$x,dp$y,dp$z,theta=-20,phi=30)

CART-histograms

Estimation of a mixture of 2 dimensional Gaussians


# generate data

dendat<-sim.data(n=400,noisedim=0,seed=11,type="fssk")

# estimate 

eva<-eval.cart(dendat,leaf=15)

# we get the same estimate also with the commands

bt<-densplit(dendat)
treeseq<-prune(bt)
treeseq$leafs
eva<-eval.pick(treeseq,leaf=15)  

# plot the estimate

dp<-draw.pcf(eva,pnum=c(80,80),corona=2)
persp(dp$x,dp$y,dp$z,ticktype="detailed",xlab="x",ylab="y",zlab="",
phi=25,theta=-50)

# level set trees

lst<-leafsfirst(eva)  

plotvolu(lst)

plotbary(lst,coordi=1)

# binary tree plots

lr<-lefrig2par(eva)

plottree(lr,modelabel=F)

mb<-makebina(eva)

library(tree)
plot.tree(mb)

library(maptree)
draw.tree(mb)

Projection pursuit example


# generate data

dendat<-sim.data(n=225,noisedim=3,seed=11,type="fssk")

# grow tree

bt<-densplit(dendat,minobs=5)
treeseq<-prune(bt)
treeseq$leafs

leafnum<-19
eva<-eval.pick(treeseq,leafnum)  

# level set trees

lst<-leafsfirst(eva)  

plotvolu(lst)

plotbary(lst,coordi=4,modlabret=TRUE)

# slices

sl<-slicing(eva,c(0,0,0),d1=1,d2=2)

dp<-draw.pcf(sl,pnum=c(60,60))

persp(dp$x,dp$y,dp$z,theta=-50,phi=25,
xlab="x",ylab="y",zlab="",ticktype="detailed")

Bootstrap aggregation

Estimation of a mixture of 2 dimensional Gaussians


# generate data

dendat<-sim.data(n=400,noisedim=0,seed=11,type="fssk")

# estimate the density 

seed<-1
seedf<-10
sample="worpl"
prune="on"

B<-5
leaf<-20
eva<-eval.bagg(dendat,B=B,leaf=leaf,
    seed=seed,seedf=seedf,sample=sample,prune=prune)

# level set trees

lst<-leafsfirst(eva)
td<-treedisc(lst,eva,ngrid=50)
plotvolu(td) 

plotbary(td,coordi=1,modlabret=TRUE)

Projection pursuit example


#generate data

dendat<-sim.data(n=400,noisedim=3,seed=11,type="fssk")

# estimate the density 

seed<-1
seedf<-10
sample="baggworpl"
prune="on"
B<-5
leaf<-20
eva<-eval.bagg(dendat,B=B,leaf=leaf,
    seed=seed,seedf=seedf,sample=sample,prune=prune)

# level set trees

lst<-leafsfirst(eva)
td<-treedisc(lst,eva,ngrid=50)

plotvolu(td)

plotvolu(td,xlim=c(100000,250000))

plotbary(td,coordi=1,modlabret=TRUE)

# slices

sl<-slicing(eva,c(0,0,0),d1=1,d2=2)

dp<-draw.pcf(sl,pnum=c(60,60))

persp(dp$x,dp$y,dp$z,theta=-50,phi=25,
xlab="x",ylab="y",zlab="",ticktype="detailed")

Adaptive grid

A discretized kernel estimator with an adaptive grid is described in the home page of article "Estimation of Level Set Trees Using Adaptive Partitions"

Data structures of the package

See the page for denpro for the definition of a piecewise constant function and for the definition of an evaluation tree.