Multidimensional analysis, dimension reduction, categorization with statistical approach - stability and reproducibility

Contents

Slide 2

Peculiarities:
multiple parameters
sparse data sets
mosaic data
fragmentary data
misleading conventions
individual cases

Medical data

Aims:
determine parameters distinguishing groups
predict the affiliation of a new element
test individual hypotheses
Slide 3

Normal distribution

sparse data sets

Medical or biological parameters are usually
restricted in values (for example, greater than 0)
spread over several orders of magnitude

The normal distribution is
unrestricted in the values of a parameter

For group comparison, such parameters should be studied on a log scale
This cannot be applied if a parameter takes values ≤ 0

Groups are determined by two values for each parameter
geometric mean
geometric standard deviation factor
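
The two group descriptors named above can be sketched as follows. This is a minimal illustration, assuming natural logarithms and the sample (n - 1) variance of the logs; the function name `geometric_stats` is hypothetical and not taken from the slides.

```python
import math

def geometric_stats(values):
    """Return (geometric mean, geometric standard deviation factor)."""
    if len(values) < 2:
        raise ValueError("at least two values are needed")
    if any(v <= 0 for v in values):
        raise ValueError("log scale requires strictly positive values")
    logs = [math.log(v) for v in values]
    mean_log = sum(logs) / len(logs)
    # Sample variance of the logs (assumption: n - 1 in the denominator).
    var_log = sum((x - mean_log) ** 2 for x in logs) / (len(logs) - 1)
    return math.exp(mean_log), math.exp(math.sqrt(var_log))

# Values spread over two orders of magnitude.
gm, gsd = geometric_stats([1.0, 10.0, 100.0])
```

The geometric standard deviation is a multiplicative factor: roughly, the group spans from gm / gsd to gm * gsd rather than gm ± sd.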

Slide 4

Schematic picture

Example

Slide 5

Robustness and statweight

We may find that some elements of groups, or some of their parameter values, could be
out of place
falsely affiliated

Statweight is used to penalize outlying values
Normal-like, single-peaked function
Maximal value slightly lower than 1
Exponential penalties for
Deviations from the group mean
Inaccuracy in each value
Interpreted as the effective number of measurements

This makes it possible to use an individual margin for each value
The algorithm works the same way for groups and for elements
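
The statweight formula itself is not preserved in this transcript. The sketch below is only one possible function with the listed properties (single peak, maximum slightly below 1, exponential penalties for deviation and for inaccuracy, sum readable as an effective number of measurements). The functional form, the names `statweight` and `effective_count`, and the `rel_error` parameter are all assumptions.

```python
import math

def statweight(value, group_gm, group_gsd, rel_error=1.1):
    """Hypothetical weight in (0, 1) for one value relative to one group.

    group_gm, group_gsd: geometric mean and deviation factor (gsd > 1).
    rel_error: assumed multiplicative inaccuracy of the value (> 1).
    """
    spread = math.log(group_gsd)
    # Deviation from the group mean, in units of the group log-spread.
    z = (math.log(value) - math.log(group_gm)) / spread
    # Inaccuracy of the value itself, in the same units.
    e = math.log(rel_error) / spread
    return math.exp(-0.5 * (z * z + e * e))

def effective_count(values, group_gm, group_gsd, rel_error=1.1):
    """Sum of weights, read as an effective number of measurements."""
    return sum(statweight(v, group_gm, group_gsd, rel_error) for v in values)

w_center = statweight(10.0, 10.0, 2.0)   # value at the group mean
w_far = statweight(80.0, 10.0, 2.0)      # value three spreads away
n_eff = effective_count([10.0, 12.0, 80.0], 10.0, 2.0)
```

In this sketch the inaccuracy term keeps even a perfectly centered value slightly below weight 1, and an outlying value contributes almost nothing to the effective count.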

Slide 6

Binary classification

group1
group2
new element

Which group is more suitable for the element, and

how do we estimate the probabilities?

Slide 7

Binary classification

group1
group2
new element

Which group is more suitable for the element, and

how do we estimate the probabilities for multiple dimensions?

Summarizing for multiple dimensions:
Fragmentary data, and hence the number of available dimensions, should have no effect
No single dimension should dominate the others

The metric is based on how much an element differs from a group, measured in terms of the fluctuations of the parameter in the group and in the element

The mean value over dimensions should be taken for classification
The classification value for each dimension should be bounded

The R value is a numerical classification of a new element between two preset groups, in the range [0;1]
The R value is an approximation and should not be treated as a probability, but it can serve as a certainty factor
All parameters are put on the same scale
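
The formula for R is not preserved in this transcript, so the following is only a sketch of a metric that satisfies the stated requirements: each dimension gives a bounded score, missing dimensions are skipped, and the mean is taken. The logistic form, the clamp `limit`, the orientation (0 for group 1, 1 for group 2) and all names are assumptions; element-level fluctuations are left out for brevity.

```python
import math

def dimension_score(x, gm1, gsd1, gm2, gsd2, limit=20.0):
    """Bounded per-dimension score in (0, 1): low favors group 1, high group 2."""
    z1 = (math.log(x) - math.log(gm1)) / math.log(gsd1)
    z2 = (math.log(x) - math.log(gm2)) / math.log(gsd2)
    # Difference of squared standardized distances, clamped so that
    # no single dimension can dominate and exp() cannot overflow.
    d = max(-limit, min(limit, 0.5 * (z1 * z1 - z2 * z2)))
    return 1.0 / (1.0 + math.exp(-d))

def r_value(element, group1, group2):
    """Mean of per-dimension scores over the dimensions that are present."""
    scores = []
    for name, x in element.items():
        if x is None or name not in group1 or name not in group2:
            continue  # fragmentary data: skip missing dimensions
        scores.append(dimension_score(x, *group1[name], *group2[name]))
    if not scores:
        raise ValueError("no shared dimensions")
    return sum(scores) / len(scores)

# Each group maps parameter name -> (geometric mean, deviation factor).
group1 = {"a": (1.0, 2.0), "b": (10.0, 2.0)}
group2 = {"a": (8.0, 2.0), "b": (10.0, 2.0)}
r = r_value({"a": 8.0, "b": 10.0, "c": None}, group1, group2)
```

Parameter "a" separates the groups and pulls the score toward group 2, parameter "b" does not separate them and contributes 0.5, and the missing parameter "c" is simply ignored.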

fragmentary data

multiple parameters

Slide 8

Non-numerical data

Discrete data
For binary-state parameters, an arbitrary pair of numbers (> 0) can be assigned
For multistate parameters, each state can be set as a separate parameter

Data outside the measurement range
A zero or undetected level of a parameter can be replaced by an estimate of the minimal detection level divided by the method accuracy. An enlarged deviation value should be assigned to such cases
Values that exceed the maximum value can be processed the same way

These substitutions should be applied to new data with caution, because they can lead to artifacts and mistakes in interpretation.
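
A minimal sketch of the substitution for undetected values follows. It assumes that the "method accuracy" is expressed as a multiplicative factor greater than 1 and that the deviation is carried as a deviation factor next to each value; both conventions, and every name in the block, are hypothetical. Values above the maximum are not covered here.

```python
def substitute_nondetect(value, detection_limit, accuracy_factor,
                         base_gsd, enlarged_gsd):
    """Return (value, deviation factor), replacing zero or undetected readings."""
    if value is None or value <= 0:
        # Estimated minimal detection level divided by the method accuracy,
        # flagged with an enlarged deviation so it carries less weight.
        return detection_limit / accuracy_factor, enlarged_gsd
    return value, base_gsd

measured = substitute_nondetect(4.0, 0.5, 2.0, base_gsd=1.1, enlarged_gsd=3.0)
undetected = substitute_nondetect(0.0, 0.5, 2.0, base_gsd=1.1, enlarged_gsd=3.0)
```

The substituted value stays strictly positive, so it can still be used on a log scale.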

Slide 9

Creating new dependent parameters

Certain experimental models and conditions allow definite assumptions to be derived and expressed as new parameters
difference between the control group and the affected group
time effect for the same object of study
etc.
individual cases

A layer of new parameters can also be derived from the data without any prior assumptions
pairwise linear correlations between parameters can be evaluated

For each group element, divide the parameter value by the group mean to normalize
Plot the normalized values on a two-dimensional plane, separately for each pair of parameters
For each plot, evaluate the angles of the lines between the group center and each group element
Calculate the mean and deviation of the angle
For each “pair plot”, the angle for a new element can be calculated in the same way
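
The steps above can be sketched for a single pair of parameters as follows. The sketch assumes the arithmetic mean as the group mean (the angles are computed on the non-logarithmic scale) and folds angles into [0, pi), since a line has no preferred direction; angles close to the 0/pi boundary would need circular statistics, which are not handled here. All function names are hypothetical.

```python
import math
from statistics import mean, stdev

def fold(angle):
    """Map a direction to [0, pi), treating opposite directions as one line."""
    return angle % math.pi

def pair_angles(xs, ys):
    """Angles of the lines from the group center to each group element."""
    mx, my = mean(xs), mean(ys)
    angles = []
    for x, y in zip(xs, ys):
        # After dividing by the group means the center sits at (1, 1).
        dx, dy = x / mx - 1.0, y / my - 1.0
        if dx == 0.0 and dy == 0.0:
            continue  # an element at the center defines no line
        angles.append(fold(math.atan2(dy, dx)))
    return angles

def new_element_angle(x, y, xs, ys):
    """Angle for a new element, using the means of the existing group."""
    return fold(math.atan2(y / mean(ys) - 1.0, x / mean(xs) - 1.0))

xs = [1.0, 2.0, 3.0]          # first parameter of the group elements
ys = [2.0, 4.0, 6.0]          # second parameter, perfectly proportional
angles = pair_angles(xs, ys)
angle_mean, angle_dev = mean(angles), stdev(angles)
```

For perfectly proportional parameters every element lies on the diagonal through the center, so all angles coincide and the deviation is close to zero.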

The acquired set of new data can be processed as independent parameters along with the primary ones
We may or may not understand the mechanisms that give rise to the new parameters

Angles and similar derived parameters should be calculated on the usual, non-logarithmic scale

Correlation of parameters is a distinct feature arising from certain processes and thus should be described separately
The calculations follow a common pattern, so they can be easily updated and scaled

Slide 10

Data representation

Each element of each group can be treated as a new element
A graph for the groups can then be built from the group elements

Graphs of group elements can be made
for relative affiliations by each parameter
for the summarized relative affiliation

Scalar projection onto the line between the centers of two groups

example

without fluctuation normalization

with fluctuation normalization

Using proposed metric
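
The scalar projection used for comparison in these panels can be sketched as below. Reading "fluctuation normalization" as dividing each coordinate by a per-parameter spread before projecting is an assumption, as are the function and argument names.

```python
def scalar_projection(x, center1, center2, spread=None):
    """Position of x along the line center1 -> center2 (0 at center1, 1 at center2)."""
    if spread is None:
        spread = [1.0] * len(x)  # no fluctuation normalization
    # Rescale every coordinate by its spread before projecting.
    d = [(b - a) / s for a, b, s in zip(center1, center2, spread)]
    v = [(xi - a) / s for xi, a, s in zip(x, center1, spread)]
    norm2 = sum(di * di for di in d)
    if norm2 == 0.0:
        raise ValueError("group centers coincide")
    return sum(vi * di for vi, di in zip(v, d)) / norm2

c1, c2 = [0.0, 0.0], [2.0, 2.0]
plain = scalar_projection([2.0, 0.0], c1, c2)
normalized = scalar_projection([2.0, 0.0], c1, c2, spread=[1.0, 2.0])
```

With normalization, the noisier second parameter counts for less, so the same element moves along the line between the two centers.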

Slide 11

Scalar projection onto the line between the centers of two groups; two pairs of groups are considered

With pair correlations

Using proposed metric,
two pairs of groups are considered

Without pair correlations

without fluctuation normalization

with fluctuation normalization

example

Slide 12

Artifacts

Sparse data sets with multiple parameters are a perfect source of artifacts

Example
Let us assume we have one general distribution
The probability that an element falls in the right wing is 0.5, and the same for the left wing
The probability that two randomly assigned colors (4 elements and 5 elements) end up fully separated is 2/(2^4 * 2^5) = 2/2^9 ≈ 0.003906
So out of 1000 comparisons of identical distributions, about 4 are expected to be falsely discriminated
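
The arithmetic of this example can be checked directly: there are two mirror-image fully separated arrangements among the 2^(4+5) equally likely ways to place nine elements in two wings. The variable names are illustrative only.

```python
n_first, n_second = 4, 5   # sizes of the two randomly colored sets
p_side = 0.5               # chance that one element lands in a given wing

# All of one color in one wing and all of the other color in the opposite
# wing; the factor 2 counts the two mirror-image arrangements.
p_separated = 2 * p_side ** (n_first + n_second)
expected_per_1000 = 1000 * p_separated

print(p_separated)         # 0.00390625
print(expected_per_1000)   # 3.90625
```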

Questioning a single parameter
Rank the M parameters by their P values for the null hypothesis
For a given parameter, recalculate P_i as P_i * M / rank_i
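
The recalculation above can be sketched as follows, assuming that rank 1 goes to the smallest P and that results are capped at 1. Under that reading it resembles the Benjamini-Hochberg adjustment without its monotonicity step; the slide does not name the procedure, so this identification is an assumption.

```python
def rank_adjusted(p_values):
    """Recalculate each P_i as P_i * M / rank_i (rank 1 = smallest P), capped at 1."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    for rank, i in enumerate(order, start=1):
        adjusted[i] = min(1.0, p_values[i] * m / rank)
    return adjusted

adjusted = rank_adjusted([0.01, 0.04, 0.03, 0.2])
```

The smallest P is multiplied by the full number of parameters M, while the largest is left unchanged.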

To distinguish a random separation of groups from a consistent one
Get more data
Study the distribution of values inside the groups
Compare with other parameters

Slide 13

Dimension reduction

A ranking of the M parameters can be formed from the null hypothesis test results for each parameter
Only the given top T ranks are considered
The null hypothesis result P is recalculated as P = P*M/T
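
A minimal sketch of this selection step is given below. It assumes the rescaling P*M/T is applied per retained parameter and capped at 1; the slide may instead mean a single overall test result. The function name and the dictionary layout are hypothetical.

```python
def reduce_dimensions(p_by_parameter, top_t):
    """Keep the top_t parameters with the smallest P and rescale P by M / T."""
    m = len(p_by_parameter)
    if not 0 < top_t <= m:
        raise ValueError("top_t must be between 1 and the number of parameters")
    ranked = sorted(p_by_parameter.items(), key=lambda item: item[1])
    # The factor M / T accounts for the parameters that were tested but dropped.
    return {name: min(1.0, p * m / top_t) for name, p in ranked[:top_t]}

kept = reduce_dimensions({"a": 0.001, "b": 0.5, "c": 0.02, "d": 0.3}, top_t=2)
```

Here two of four parameters are kept, so each retained P is doubled.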

Values for relative affiliations can be used to test the null hypothesis

Only a part of all parameters discriminates the given groups

Dimension reduction
Improves group separation
Reduces the cost of future measurements required to classify new elements
Unused parameters should not be forgotten; they must be accounted for in multiple-comparison corrections

without dimension reduction

Using proposed metric
with fluctuation normalization

with dimension reduction

example

Slide 14

Reproducibility

Group rearrangement
Take one element out of a group
Find all key values for the data set
Repeat for each element of each group
That way we acquire rearranged data sets
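
The rearrangement described above is a leave-one-out scheme and can be sketched as a generator. The data layout (a dictionary of lists per group) and the function name are assumptions; the key values, such as the geometric mean and deviation factor, would then be recomputed on each yielded data set.

```python
def rearranged_data_sets(groups):
    """Yield (group name, removed index, data set) with one element left out."""
    for name, elements in groups.items():
        for i in range(len(elements)):
            reduced = dict(groups)  # shallow copy; other groups are untouched
            reduced[name] = elements[:i] + elements[i + 1:]
            yield name, i, reduced

groups = {"group1": [1.0, 2.0, 4.0], "group2": [8.0, 16.0]}
sets = list(rearranged_data_sets(groups))
```

Five elements in total give five rearranged data sets, each missing exactly one element.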

sparse data sets

Rearranged data can be used for:
Finding key parameters for dimension reduction
based on reverse results of the null hypothesis test for the rearranged data sets
Representation of data

Group rearrangement
Provides a significant re-evaluation in cases of sparse data sets
Lessens the effect of individual outlying parameter values
Provides more reproducible results

For each rearrangement:
The rearranged data sets are different
The set of parameters can be different, due to fragmentary data
But the metric for binary affiliation is the same
Its results can be compared directly

fragmentary data

without groups rearrangements

with groups rearrangements

example

Slide 15

Data representation

rearrangements
dimension reductions

Evaluation of group separation
summarizing relative affiliations on rearranged data
For group elements

Affiliating a new element
summarizing relative affiliations on the full data set
For group elements
For the new element

Affiliation value
Not a probability
It tends toward 0.5 as the number of dimensions grows through non-separating parameters

These values, ranging over [0;1], can be treated like probabilities

Let us treat affiliation as a parameter in order to calculate the final affiliation

0.5

The prevalence of the groups is ignored here, but it should also be taken into final consideration
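
The drift of the affiliation value toward 0.5 can be illustrated with a small calculation. It assumes that the summarized value is a plain mean of per-dimension scores and that a non-separating dimension contributes exactly 0.5; the score 0.95 and the function name are illustrative only.

```python
def summarized_affiliation(separating_scores, n_unseparating):
    """Mean per-dimension score when extra dimensions each contribute 0.5."""
    scores = list(separating_scores) + [0.5] * n_unseparating
    return sum(scores) / len(scores)

# Five strongly separating dimensions, diluted by more and more neutral ones.
for extra in (0, 5, 50, 500):
    print(extra, round(summarized_affiliation([0.95] * 5, extra), 3))
```

The separating dimensions are unchanged throughout, yet the summarized value falls toward 0.5, which is why it should not be read as a probability.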

Slide 16

Example

Estimated probabilities of affiliation to one of two groups, for group elements

Summarized relative affiliation to one of two groups, for group elements

Panels for each of the two quantities above:
All 378 parameters considered
51 parameters considered
Top 5 primary parameters considered