Crosstab
configuration-examples Crosstab
A crosstab is a type of table in a matrix format that displays the (multivariate) frequency distribution of variables.
example
SourceData
districtname | nr_inhabitants | avg_temperature | statename | year |
---|---|---|---|---|
RegA | 100 | 17° | StateI | 2000 |
RegB | 200 | 22° | StateII | 2000 |
RegC | 150 | 25° | StateII | 2000 |
RegD | 50 | 13° | StateIII | 2000 |
… | … | … | … | … |
RegX | 250 | 19° | StateV | 2010 |
Assume this data is read from a .csv storage, the GeoDMS configuration would look like this:
unit<uint32> DistrictTimePeriod: StorageName = "%SourceDataProjDir/districs.csv" , StorageType = "gdal.vect" , StorageReadOnly = "True" { attribute<string> districtname; attribute<nr_persons> nr_inhabitants; attribute<degrees_celsius> avg_temperature; attribute<string> statename; attribute<string> year; }
result
A potential crosstab, based on the source data, could look like this (each cell value is the sum of the number inhabitants for the state in the indicated year) :
statename / year | 2000 | .. | 2010 |
---|---|---|---|
StateI | 500 | … | 800 |
.. | … | … | … |
StateV | 250 | … | 230 |
configuration steps
1) First, if not yet configured, configure state and TimePeriod as domain units.
2) Configure relations from the DistrictTimePeriod domain unit towards the configured state and TimePeriod domain units. This results in two extra attributes in the DistrictTimePeriod unit configuration:
unit<uint32> DistrictTimePeriod: ... { ... attribute<state> state_rel := rlookup(statename, state/label); attribute<TimePeriod> TimePeriod_rel := rlookup(year, TimePeriod/label); }
3) Apply the for_each function on the domain unit you would like to see as columns (in the example TimePeriod).
In the expression you aggregate the values towards the domain unit used for the rows, with a condition the data applies to the column values. The following examples show the configuration for the:
Sum of inhabitants per State and TimePeriod (see result table) :
container CrossTab_SumInhabitants := for_each_nedv( TimePeriod/name ,'sum( DistrictTimePeriod/TimePeriod_rel == ' + string(id(TimePeriod)) +'[TimePeriod] ? DistrictTimePeriod/nr_inhabitants : 0[nr_persons] , DistrictTimePeriod/state_rel)' ,state ,nr_persons );
In which TimePeriod/name is a string attribute with valid tree-item names for the TimePeriod domain unit. In the sum the number of inhabitants is applied if the condition on the TimePeriod is true, if not the value 0 is summed. This works well for quantities, but not for intensive variables like temperature. For these variables, use a missing value indication (for instance 0 / 0) in stead of zero, see next example:
Mean Temperature per State and TimePeriod
container CrossTab_MeanTemperature := for_each_nedv( TimePeriod/name ,'mean( DistrictTimePeriod/TimePeriod_rel == ' + string((id(TimePeriod)) +'[TimePeriod] ? DistrictTimePeriod/AverageTemperature : (0 / 0)[degrees_celsius] , DistrictTimePeriod/state_rel)' ,state ,degrees_celsius );