Title: | A Convex Optimization Tool for Signal Reconstruction from Multiple Ranked Lists |
---|---|
Description: | A mathematical optimization procedure in combination with statistical bootstrap for the estimation of the latent signals (sometimes called scores) informing the global consensus ranking (often named aggregation ranking). To solve mid/large-scale problems, users should install the 'gurobi' optimiser (available from <https://www.gurobi.com/>). |
Authors: | Luca Vitale Developer [aut], Bastian Pfeifer Maintainer [aut, cre], Michael G. Schimek Supervision [aut] |
Maintainer: | Bastian Pfeifer Maintainer <[email protected]> |
License: | GPL-2 |
Version: | 1.0 |
Built: | 2025-03-09 04:56:52 UTC |
Source: | https://github.com/pievos101/topksignal |
The elbow plot permits the identification of subsets of objects, e.g. top-$k$ or bottom-$q$ objects. On the x-axis all objects are ordered according to their rank positions. On the y-axis the corresponding estimated signal values are displayed. The idea of the elbow plot is to scan, in an exploratory manner, for 'jumps' in the sequence of ordered objects, i.e. to find neighbouring signal estimates that are visibly far apart. The elbowPlot function requires the estimation results from the estimateTheta function.
elbowPlot(estimation, title = "")
estimation | Results from the estimateTheta() function |
title | A title for the plot |
An elbow plot.
data(estimatedSignal)
elbowPlot(estimatedSignal)
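The visual scan for 'jumps' described for the elbow plot can also be approximated numerically. The following is a minimal base-R sketch, not part of TopKSignal: the vector `signal` holds made-up values rather than output of estimateTheta(), and taking the single largest drop as the cut-off is an assumption made only for illustration.

```r
# Toy signal estimates (hypothetical values, not output of estimateTheta())
signal <- c(a = 9.1, b = 8.7, c = 8.5, d = 4.2, e = 3.9, f = 3.6, g = 1.0, h = 0.8)

# Order the objects from the largest to the smallest estimated signal
ordered <- sort(signal, decreasing = TRUE)

# Drop between each object and its successor in the ordered sequence
gaps <- -diff(ordered)
names(gaps) <- head(names(ordered), -1)

# The largest drop suggests a candidate cut-off k for a top-k set
k <- unname(which.max(gaps))
top_k <- names(ordered)[seq_len(k)]

print(round(gaps, 2))
print(top_k)
```

In practice several drops of similar size may appear, so such a rule should complement the plot rather than replace it.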
Object returned by the estimateTheta() function.
A list of various values returned by the estimateTheta() function.
estimation - A data frame with the signal estimation and the standard error computed by the bootstrap for each object
estimatedMatrixNoise - The estimated matrix noise
allBootstraps - The signal estimates from all bootstrap iterations
data(estimatedSignal)
The main function for the estimation of the signals informing the ranks is called estimateTheta(). The required parameters are: (1) a rank matrix, (2) the number of bootstrap samples (500 is recommended), (3) a constant for the support variables \(b>0\) (0.1 is suggested), (4) the solver: gurobi or nloptr, (5) the type of optimization technique: fullLinear, fullQuadratic, restrictedLinear, or restrictedQuadratic (the latter two are recommended), (6) the type of bootstrap sampling scheme: classic.bootstrap or poisson.bootstrap (recommended), and (7) the number of cores for parallel computation. Each bootstrap sample is processed on a dedicated CPU core.
estimateTheta( R.input, b, num.boot, solver, type, bootstrap.type, nCore = ((detectCores() - 1)) )
R.input | A matrix where the rows represent the objects and the columns the assessors (rankers). |
b | The penalization term. The suggested value is 0.1. |
num.boot | The number of bootstrap samples created from the input rank matrix. A positive number is expected. |
solver | A string that indicates which solver to use. Two options are available: 'gurobi' and 'nloptr'. We recommend gurobi for faster computation. Note that a licence is required; see the Gurobi documentation for installation instructions. |
type | A string that indicates which model to use. Four approaches are available: 'restrictedQuadratic', 'fullQuadratic', 'restrictedLinear' and 'fullLinear'. |
bootstrap.type | A string that indicates which bootstrap method to use: 'classic.bootstrap' or 'poisson.bootstrap'. |
nCore | The number of cores used for computation. Each core is used to calculate the signals from a bootstrap sample. The default is detectCores() - 1. |
A list with the estimation information obtained:
estimation - A data frame with the signal estimation and the standard error computed by the bootstrap for each object
estimatedMatrixNoise - The estimated matrix noise
time - The execution time of the procedure
allBootstraps - The signal estimates from all bootstrap iterations
library(TopKSignal)
set.seed(1421)
p = 8
n = 10
input <- generate.rank.matrix(p, n)
rownames(input$R.input) <- c("a","b","c","d","e","f","g","h")
# For the following code Gurobi needs to be installed
## Not run:
estimatedSignal <- estimateTheta(R.input = input$R.input, num.boot = 50, b = 0.1, solver = "gurobi", type = "restrictedQuadratic", bootstrap.type = "poisson.bootstrap", nCore = 1)
## End(Not run)
data(estimatedSignal)
estimatedSignal
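The two bootstrap schemes named by bootstrap.type differ in how the resampling counts are drawn. The following base-R sketch illustrates the general idea only. It assumes that assessors (columns of the rank matrix) are the resampling units, which this page does not state, and the toy matrix `R` and all variable names are hypothetical.

```r
set.seed(1)

# Toy rank matrix: 4 objects (rows) ranked by 6 assessors (columns)
R <- sapply(1:6, function(j) sample(1:4))
rownames(R) <- c("a", "b", "c", "d")
n <- ncol(R)

# Classic bootstrap: draw n column indices with replacement,
# so every resample has exactly n columns
idx_classic <- sample(seq_len(n), size = n, replace = TRUE)
R_classic <- R[, idx_classic, drop = FALSE]

# Poisson bootstrap: each column receives an independent Poisson(1) count,
# so the number of columns in the resample varies from draw to draw
w <- rpois(n, lambda = 1)
idx_poisson <- rep(seq_len(n), times = w)
R_poisson <- R[, idx_poisson, drop = FALSE]

print(dim(R_classic))
print(dim(R_poisson))
```

Because the Poisson counts are independent across columns, they can be generated without coordinating a fixed total, which is one common reason this scheme is considered cheaper.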
The generate.rank.matrix() function requires the user to specify the number of objects (items), called p, and the number of assessors, called n. By default (percentageMissing = 0), the function simulates full ranked lists (i.e. no missing assignments) without ties.
generate.rank.matrix(p, n, percentageMissing = 0)
p | The number of objects. |
n | The number of assessors. |
percentageMissing | The percentage of missing values. Note that missing data should be resolved by the rank() function before calling estimateTheta(). |
A list with simulated data
R.input - The rank matrix
theta.true - The true underlying signals from the assessments
sigmas - The standard error of the noise added for each assessor
matrixNoise - The noise added to the true signals in order to get the final rank matrix
p = 8
n = 10
input <- generate.rank.matrix(p, n)
rownames(input$R.input) <- c("a","b","c","d","e","f","g","h")
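To make the relationship between the returned components (true signals, assessor-specific noise levels, noise matrix, rank matrix) more concrete, here is a minimal base-R sketch of a signal plus noise simulation. It is not the implementation of generate.rank.matrix(): the normal and uniform distributions, the noise range, and the convention that rank 1 denotes the highest score are all assumptions chosen for illustration.

```r
set.seed(42)
p <- 8   # number of objects
n <- 10  # number of assessors

# Hypothetical true signals, one per object
theta <- sort(rnorm(p), decreasing = TRUE)

# Hypothetical assessor-specific noise levels
sigma <- runif(n, min = 0.2, max = 1)

# Noise matrix (p x n): column j has standard deviation sigma[j]
noise <- sapply(sigma, function(s) rnorm(p, mean = 0, sd = s))

# Observed scores = signal + noise (theta is recycled down each column)
scores <- theta + noise

# Rank within each assessor; rank 1 is given to the highest score
R <- apply(scores, 2, function(x) rank(-x, ties.method = "first"))
rownames(R) <- letters[1:p]
print(R)
```

Assessors with a larger noise level tend to produce rankings that deviate more from the ordering of the true signals.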
The heatmap plot allows us to check for specific error patterns associated with the assessors. It displays the noise terms involved in the estimation process. The rows of the noise matrix are ordered by the estimated ranks of the consensus signal values. The columns are ordered by the column error sums: the column with the lowest sum is positioned on the left and the column with the highest sum on the right. Hence, assessors positioned on the left show substantial consensus and are thus more reliable than those positioned to the far right. The heatmap plot is also an exploratory tool for finding a subset of top-ranked objects (the notion of top-$k$ objects; see the package TopKLists on CRAN for details and functions). Beyond exploratory tasks, the noise matrix can serve as input for various inferential purposes, such as testing for assessor group differences. The heatmapPlot function requires the estimation results obtained from the estimateTheta function.
heatmapPlot(estimation, type = "full", title = "")
estimation | The bootstrap estimation obtained from the estimateTheta function |
type | The type of method used. Two options are available: 'full' or 'reduced'. |
title | The title of the plot |
A list with:
plot - A heatmap plot with the noise matrix (ordered values).
matrixNoiseOrdered - The matrix noise ordered by the columns. The objects are ordered by the estimated value.
estimateThetaOrdered - The theta vector ordered by their importance (from the highest value to the lowest).
data(estimatedSignal)
heatmapPlot(estimatedSignal)
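The row and column ordering described for the heatmap can be sketched in base R on a toy noise matrix. This is only an illustration, not the code of heatmapPlot(): the values of `theta_hat` and `noise` are made up, and summing absolute noise values per column is an assumption about what 'column error sums' means.

```r
# Toy signal estimates for four objects (hypothetical values)
theta_hat <- c(a = 2.1, b = 3.4, c = 0.5, d = 1.2)

# Toy noise matrix: rows are objects, columns are assessors
# (matrix() fills column by column)
noise <- matrix(c( 0.1, -0.2,  0.1,  0.0,   # assessor A1
                   0.9, -1.1,  0.7, -0.8,   # assessor A2
                  -0.4,  0.3,  0.5, -0.2),  # assessor A3
                nrow = 4,
                dimnames = list(names(theta_hat), c("A1", "A2", "A3")))

# Rows: objects from the highest to the lowest estimated signal
row_order <- order(theta_hat, decreasing = TRUE)

# Columns: assessors from the smallest to the largest total absolute noise
col_order <- order(colSums(abs(noise)))

noise_ordered <- noise[row_order, col_order]
print(noise_ordered)
```

With this ordering the assessor with the least noise appears in the leftmost column, matching the reading of the plot described above.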
A mathematical optimization procedure in combination with statistical bootstrap for the estimation of the latent signals (sometimes called scores) informing the global consensus ranking (often named aggregation ranking).
When using TopKSignal in your work please cite: Schimek, M. G. et al. (2024). Effective signal reconstruction from multiple ranked lists via convex optimization. Data Mining and Knowledge Discovery. DOI: 10.1007/s10618-023-00991-z.
The goal of estimating consensus signals, and from them consensus ranks (an alternative form of aggregation ranks), across a number of assessors (humans or machines) is achieved via indirect inference. The input rank matrix is fully represented by order constraints. No distance measures or distributional assumptions are involved. The indirect inference procedure is built around a simple signal plus noise model.
TopKSignal implements a set of functions that make it possible to construct artificial ranked lists, to derive sets of constraints from an input rank matrix, to run convex optimization (with a quadratic or a linear objective function), to perform bootstrap estimation (standard or Poisson bootstrap), and to produce numerical and graphical output.
Different mathematical optimization techniques are available: optimization with the full set of constraints or with a computationally cheaper restricted set of constraints, each in combination with either a quadratic or a linear objective function. Different bootstrap sampling schemes are available: the classical bootstrap and the computationally less demanding Poisson bootstrap.
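The difference in size between a full and a restricted set of order constraints can be illustrated with one assessor's ranking. The base-R sketch below is not taken from TopKSignal. It assumes that the restricted set keeps only constraints between consecutively ranked objects, with all others implied by transitivity; this page does not spell out the definition, and the toy ranks in `r` are hypothetical.

```r
# Ranks given by one assessor to five objects (1 = top); toy data
r <- c(a = 2, b = 5, c = 1, d = 3, e = 4)

# Objects from best to worst according to this assessor
objs <- names(sort(r))

# Full set: one constraint for every pair (higher-ranked, lower-ranked)
full <- t(combn(objs, 2))
colnames(full) <- c("above", "below")

# Restricted set (assumed definition): consecutively ranked objects only
restricted <- cbind(above = head(objs, -1), below = tail(objs, -1))

print(nrow(full))        # p * (p - 1) / 2 constraints
print(nrow(restricted))  # p - 1 constraints
```

For p objects the full set grows quadratically per assessor while the restricted set grows linearly, which is consistent with the restricted variants being described as computationally cheaper.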
The violin plot displays the bootstrap distribution of the estimated signals along with their means. Deviations of +/-2 standard errors (SE) from the mean values are also shown in the plot. By analyzing the shape of the distribution and the standard error of the signal of each object, it is possible to evaluate its rank stability with respect to all other objects. The violinPlot function requires (1) the result obtained by the estimation procedure and (2) the 'true' (simulated) signals or ground truth (when available).
violinPlot
violinPlot(estimation, trueSignal = NULL, title = NULL)
estimation | The estimation list from the 'estimateTheta' function |
trueSignal | The true signal (if available) |
title | The title of the plot |
A violin plot with the estimated distribution of each object
data(estimatedSignal)
violinPlot(estimatedSignal)
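The per-object summaries that a violin plot of this kind conveys (bootstrap mean and a band of +/-2 standard errors) can be computed directly from a matrix of bootstrap estimates. The base-R sketch below uses simulated toy values. The layout of `boot` (iterations in rows, objects in columns) and the interval-overlap check are assumptions for illustration, not the documented structure of allBootstraps or a rule used by violinPlot().

```r
set.seed(3)

# Toy bootstrap output: 200 iterations (rows) x 3 objects (columns)
boot <- cbind(a = rnorm(200, mean = 3.0, sd = 0.2),
              b = rnorm(200, mean = 2.8, sd = 0.6),
              c = rnorm(200, mean = 1.0, sd = 0.1))

est <- colMeans(boot)      # bootstrap mean per object
se <- apply(boot, 2, sd)   # bootstrap standard error per object

summary_tab <- data.frame(mean = est,
                          lower = est - 2 * se,
                          upper = est + 2 * se)
print(round(summary_tab, 2))

# Heuristic: overlapping +/-2 SE bands hint at an unstable relative order
overlap <- function(i, j) {
  summary_tab$lower[i] <= summary_tab$upper[j] &&
    summary_tab$lower[j] <= summary_tab$upper[i]
}
print(overlap(1, 2))
print(overlap(1, 3))
```

Objects whose bands are clearly separated can be ranked relative to each other with more confidence than objects whose bands overlap.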