After compose_cells, this function rearranges and renames attribute columns so that
they are properly aligned, based on the content of the columns.
collate_columns( composed_data, combine_threshold = 1, rest_cols = Inf, retain_other_cols = FALSE, retain_cell_address = FALSE )
composed_data | output of |
---|---|
combine_threshold | a numerical threshold (between 0-1) for content-based collation of columns. (Default 1) |
rest_cols | number of rest columns (beyond |
retain_other_cols | whether to keep other intermediate (and possibly not so important) columns. (Default |
retain_cell_address | whether to keep columns like ( |
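As a rough sketch of how the arguments above might be passed, the call below lowers `combine_threshold` from its default of 1 and keeps the cell-address columns. The value 0.8 is an arbitrary illustration, not a recommendation, and the effect on the output is not shown here because it depends on the data. The sketch assumes tidycells may not be installed, so it is guarded with `requireNamespace`.

```r
# Hypothetical illustration: non-default arguments to collate_columns().
# The pipeline mirrors the package example; 0.8 is an arbitrary threshold.
collated <- NULL
if (requireNamespace("tidycells", quietly = TRUE)) {
  path <- system.file("extdata", "marks_cells.rds",
                      package = "tidycells", mustWork = TRUE)
  d  <- readRDS(path)
  d  <- tidycells::numeric_values_classifier(d)
  da <- tidycells::analyze_cells(d)
  dc <- tidycells::compose_cells(da)

  collated <- tidycells::collate_columns(
    dc,
    combine_threshold   = 0.8,   # must lie between 0 and 1
    retain_cell_address = TRUE   # keep the cell-address columns
  )
}
```

If tidycells is not available, `collated` simply stays `NULL`.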
A column-collated data.frame.
Dependency on stringdist: If you have stringdist installed, the approximate string matching will be enhanced.
The outcome may differ depending on whether stringdist is installed.
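Because results can differ with and without stringdist, it may help to check which case applies before comparing outputs across machines. The following is a minimal sketch using base R's `requireNamespace`; the variable name `has_stringdist` and the messages are illustrative only.

```r
# Check whether the optional stringdist package is available.
# requireNamespace() returns TRUE/FALSE without attaching the package.
has_stringdist <- requireNamespace("stringdist", quietly = TRUE)

if (has_stringdist) {
  message("stringdist is installed: enhanced approximate matching is available.")
} else {
  message("stringdist is not installed: results may differ from a setup that has it.")
}
```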
Possibility of randomness: If an attribute column contains many distinct values, a representative sample of that column is drawn.
It is therefore recommended to call set.seed beforehand whenever reproducibility matters.
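The effect of fixing the seed can be illustrated with base R alone. The sketch below uses a made-up vector of distinct values (`values` is hypothetical, not tidycells data) and shows that two draws made with the same seed are identical. The commented lines at the end show the same pattern applied to `collate_columns`, assuming a composed object `dc` already exists.

```r
# Hypothetical attribute column with many distinct values.
values <- paste0("name_", 1:1000)

set.seed(123)
first_draw <- sample(values, 10)

set.seed(123)
second_draw <- sample(values, 10)

# Same seed, same sample.
same_sample <- identical(first_draw, second_draw)

# Same pattern with tidycells (needs the package and a composed object `dc`):
# set.seed(123)
# collated <- collate_columns(dc)
```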
d <- system.file("extdata", "marks_cells.rds", package = "tidycells", mustWork = TRUE) %>%
  readRDS()
d <- numeric_values_classifier(d)
da <- analyze_cells(d)
dc <- compose_cells(da, print_attribute_overview = TRUE)
#> data_block = 1
#> minor_col_top_1_1
#> School A
#> major_col_top_1_1
#> Score
#> minor_corner_topLeft_1_1
#> Student Name
#> major_row_left_2_1
#> Nakshatra Gayen, Titas Gupta, Ujjaini Gayen, Utsyo Roy
#> major_row_left_1_1
#> Female, Male
#> data_block = 2
#> minor_corner_topLeft_1_1
#> School B
#> major_col_bottom_1_1
#> Indranil Gayen, S Gayen, Sarmistha Senapati, Shtuti Roy
#> major_col_bottom_2_1
#> Female, Male
#> minor_corner_bottomLeft_1_1
#> Student
#> major_row_left_1_1
#> Score
#> data_block = 3
#> major_col_top_1_1
#> Score
#> minor_corner_topLeft_1_1
#> Name
#> major_row_left_2_1
#> I Roy, S Ghosh, S Senapati, U Gupta
#> major_row_left_1_1
#> School C
#> minor_row_right_1_1
#> Female, Male
collate_columns(dc)
#> # A tibble: 12 x 6
#>    collated_1 collated_2 collated_3 collated_4   collated_5         value
#>    <chr>      <chr>      <chr>      <chr>        <chr>              <chr>
#>  1 Score      Male       School A   Student Name Utsyo Roy          95
#>  2 Score      Male       School A   Student Name Nakshatra Gayen    99
#>  3 Score      Female     School A   Student Name Titas Gupta        89
#>  4 Score      Female     School A   Student Name Ujjaini Gayen      100
#>  5 Score      Male       School B   Student      Indranil Gayen     70
#>  6 Score      Male       School B   Student      S Gayen            75
#>  7 Score      Female     School B   Student      Sarmistha Senapati 81
#>  8 Score      Female     School B   Student      Shtuti Roy         90
#>  9 Score      Male       School C   Name         I Roy              50
#> 10 Score      Male       School C   Name         S Ghosh            59
#> 11 Score      Female     School C   Name         S Senapati         61
#> 12 Score      Female     School C   Name         U Gupta            38