What is Truncation in Statistics and How Does it Work?

Truncation in statistics is a process that limits values above or below a certain threshold, resulting in a truncated sample. Learn more about how it works and how it differs from censoring.

Charlotte Wilson | 17/12/25 | 1 minute read

In statistics, truncation is a process that limits values above or below a certain threshold, resulting in a truncated sample. With an upper threshold, for example, the exact value of the random variable is known for every observed case below the threshold, while cases above it are missing from the sample entirely. A truncated mean, or trimmed mean, is a related but distinct idea: it is a measure of central tendency, similar to the mean and median, calculated after discarding a certain percentage or fixed number of points from the high and low ends of a probability distribution or sample. Truncation is also different from censoring, in which values beyond the threshold are still recorded but only partial information about them is collected.
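The trimmed mean described above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function name `trimmed_mean`, the `proportion` parameter, and the sample values are all assumptions made for the example.

```python
def trimmed_mean(values, proportion):
    """Mean of `values` after dropping `proportion` of the points from each end.

    `proportion` is the fraction removed from EACH tail, so it must be
    at least 0 and strictly less than 0.5.
    """
    if not 0 <= proportion < 0.5:
        raise ValueError("proportion must be in [0, 0.5)")
    data = sorted(values)
    if not data:
        raise ValueError("values must not be empty")
    # Number of points discarded from each end (rounded down).
    k = int(len(data) * proportion)
    kept = data[k:len(data) - k]
    return sum(kept) / len(kept)


# Hypothetical sample with one extreme value.
sample = [2, 4, 5, 6, 7, 8, 100]
plain_mean = sum(sample) / len(sample)
robust_mean = trimmed_mean(sample, 0.2)  # drops 2 and 100, averages the rest
```

In this made-up sample, trimming one point from each end removes the outlier `100`, so the trimmed mean sits much closer to the bulk of the data than the plain mean does.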

Censoring data means collecting only partial information about data values, while truncating data means removing data values from a dataset altogether. Truncation occurs when values beyond a limit are excluded, either when the data are collected or when they are analyzed. Truncated regression is used to model dependent variables for which some observations are left out of the analysis because of the value of the dependent variable. Such a model, although it depends on the data, imposes only the restriction that the data lie within the truncation limits. In Stan, for instance, the observations can be given a truncated normal distribution.

One situation where it may be advantageous to use a truncated mean is when estimating the location parameter of a Cauchy distribution, a bell-shaped probability distribution with much heavier tails than a normal distribution.
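To make the truncated normal model mentioned above more concrete, here is a rough Python sketch of its density and log-likelihood. It is not Stan code and not taken from the original article: the function names, parameters, and bounds are assumptions for illustration. The idea is simply that the ordinary normal density is rescaled by the probability mass lying between the truncation points, so that it still integrates to one over the allowed range.

```python
import math
from statistics import NormalDist


def truncated_normal_pdf(y, mu, sigma, lower, upper):
    """Density of a normal(mu, sigma) restricted to [lower, upper]."""
    if lower >= upper:
        raise ValueError("lower must be less than upper")
    if y < lower or y > upper:
        return 0.0  # values outside the truncation limits cannot be observed
    dist = NormalDist(mu, sigma)
    # Probability mass of the untruncated normal between the limits.
    mass = dist.cdf(upper) - dist.cdf(lower)
    return dist.pdf(y) / mass


def truncated_normal_loglik(data, mu, sigma, lower, upper):
    """Log-likelihood of `data` under the truncated normal model."""
    total = 0.0
    for y in data:
        density = truncated_normal_pdf(y, mu, sigma, lower, upper)
        if density == 0.0:
            return -math.inf  # an observation outside the limits is impossible
        total += math.log(density)
    return total
```

Inside the limits, the truncated density is always at least as large as the untruncated one, because the same shape has to account for all of the probability within a narrower range.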

For example, if we want to study the size of certain fish based on the specimens caught with a net, fish smaller than the mesh of the net will not be present in our sample. If the truncated fraction is small enough, however, its effect can be ignored when analyzing the data. When the truncation points are not known in advance, they can be estimated as parameters; in Stan, this takes only a slight rearrangement of the variable declarations used in a model with known truncation points.
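The fish example can be illustrated with a small simulation. Everything here is hypothetical: the lengths are drawn from an assumed normal distribution, and the mesh size, seed, and function name `catch_with_net` are chosen only for the sketch. It shows how a net that lets small fish escape produces a truncated sample whose mean overstates the mean of the whole population.

```python
import random
from statistics import mean


def catch_with_net(lengths, mesh_size):
    """Keep only fish at least as long as the mesh size (lower truncation)."""
    return [x for x in lengths if x >= mesh_size]


# Assumed population: lengths in cm, normal with mean 30 and sd 8.
rng = random.Random(0)  # fixed seed so the run is repeatable
population = [rng.gauss(30.0, 8.0) for _ in range(10_000)]

# Fish shorter than the (hypothetical) 25 cm mesh never enter the sample.
caught = catch_with_net(population, mesh_size=25.0)

population_mean = mean(population)
caught_mean = mean(caught)
truncated_fraction = 1 - len(caught) / len(population)
```

Comparing `caught_mean` with `population_mean` shows the upward bias introduced by truncation, and `truncated_fraction` indicates how much of the population the net misses. Lowering `mesh_size` shrinks that fraction, and the two means move closer together.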