Collate Columns Based on Content

After `compose_cells`, this function rearranges and renames the attribute columns so that they are properly aligned across data blocks, based on the content of each column.

collate_columns(
  composed_data,
  combine_threshold = 1,
  rest_cols = Inf,
  retain_other_cols = FALSE,
  retain_cell_address = FALSE
)

Arguments

composed_data	output of `compose_cells` (preferably without further processing)
combine_threshold	a numerical threshold (between 0 and 1) for content-based collation of columns. (Default `1`)
rest_cols	number of remaining columns to keep: beyond the joins made under `combine_threshold`, this many further columns are retained. (Default `Inf`)
retain_other_cols	whether to keep other intermediate (and possibly less important) columns. (Default `FALSE`)
retain_cell_address	whether to keep cell-address columns such as `row`, `col` and `data_block`. This may be required for `traceback`. (Default `FALSE`)

Value

A column-collated data.frame

Details

Dependency on stringdist: If stringdist is installed, approximate string matching is enhanced, so the outcome may differ depending on whether stringdist is available.
Possibility of randomness: If an attribute column contains many distinct values, a representative sample of that column is drawn. Calling `set.seed` beforehand is therefore recommended whenever reproducibility matters.
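
Both points can be handled with a short preamble before the call to `collate_columns`. The following is a minimal sketch in base R and is not part of the package: `has_stringdist`, `first_draw` and `second_draw` are illustrative names, the seed value is arbitrary, and `sample()` only stands in for the internal sampling, whose details are not described here.

```r
# Record whether the optional stringdist package is available, since
# results may differ between machines with and without it.
has_stringdist <- requireNamespace("stringdist", quietly = TRUE)
message("stringdist available: ", has_stringdist)

# Fix the random seed so that any sampling is repeatable.
# sample() is only a stand-in for the sampling done internally.
set.seed(2024)
first_draw <- sample(100, 5)

set.seed(2024)
second_draw <- sample(100, 5)

# Resetting the same seed reproduces the same draw.
identical(first_draw, second_draw)
```

In practice the `set.seed()` call would go immediately before `collate_columns()`, and noting the value of `has_stringdist` alongside the results makes differences between environments easier to explain.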

Examples


library(tidycells)

# Load the example cell data shipped with the package
d <- readRDS(system.file("extdata", "marks_cells.rds", package = "tidycells", mustWork = TRUE))
d <- numeric_values_classifier(d)
da <- analyze_cells(d)

dc <- compose_cells(da, print_attribute_overview = TRUE)
#> data_block = 1
#>   minor_col_top_1_1
#>      School A
#>   major_col_top_1_1
#>      Score
#>   minor_corner_topLeft_1_1
#>      Student Name
#>   major_row_left_2_1
#>      Nakshatra Gayen, Titas Gupta, Ujjaini Gayen, Utsyo Roy
#>   major_row_left_1_1
#>      Female, Male
#> data_block = 2
#>   minor_corner_topLeft_1_1
#>      School B
#>   major_col_bottom_1_1
#>      Indranil Gayen, S Gayen, Sarmistha Senapati, Shtuti Roy
#>   major_col_bottom_2_1
#>      Female, Male
#>   minor_corner_bottomLeft_1_1
#>      Student
#>   major_row_left_1_1
#>      Score
#> data_block = 3
#>   major_col_top_1_1
#>      Score
#>   minor_corner_topLeft_1_1
#>      Name
#>   major_row_left_2_1
#>      I Roy, S Ghosh, S Senapati, U Gupta
#>   major_row_left_1_1
#>      School C
#>   minor_row_right_1_1
#>      Female, Male

collate_columns(dc)
#> # A tibble: 12 x 6
#>    collated_1 collated_2 collated_3 collated_4   collated_5         value
#>    <chr>      <chr>      <chr>      <chr>        <chr>              <chr>
#>  1 Score      Male       School A   Student Name Utsyo Roy          95   
#>  2 Score      Male       School A   Student Name Nakshatra Gayen    99   
#>  3 Score      Female     School A   Student Name Titas Gupta        89   
#>  4 Score      Female     School A   Student Name Ujjaini Gayen      100  
#>  5 Score      Male       School B   Student      Indranil Gayen     70   
#>  6 Score      Male       School B   Student      S Gayen            75   
#>  7 Score      Female     School B   Student      Sarmistha Senapati 81   
#>  8 Score      Female     School B   Student      Shtuti Roy         90   
#>  9 Score      Male       School C   Name         I Roy              50   
#> 10 Score      Male       School C   Name         S Ghosh            59   
#> 11 Score      Female     School C   Name         S Senapati         61   
#> 12 Score      Female     School C   Name         U Gupta            38
