Also Check CRAN Comments
If you build a continuous data-wrangling workflow that starts with functions from the
tidycells package, or include
tidycells in your own package, take proper precautions, because
tidycells functions are heuristic-based. One problem you may face is that a column such as
collated_2 gets renamed.
The package has two main functions that may raise dependability issues in the future. One of them is
tidycells::collate_columns; both are based on heuristics and internal statistical logic. The main cause of potential output variation across future CRAN releases of this package is likely to be changes in these functions. Because
tidycells::read_cells depends on these functions, it will be affected equally.
The package has been developed by observing certain types of oddly structured data available to the developer (me). If the package fails to understand the underlying structure of your data automatically, I expect to address that in a future release, provided you inform me of the issue. This may be referred to as the “Heuristic Maturation” process for these two functions. As and when the tests written for these functions require modification (which means that output column names or other major details have changed), I’ll bump the
After first CRAN release when next
Don’t worry: I’ll try my best not to break your code. This is just a notice so that you can depend on this package with appropriate care.
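One way to build such a careful dependency is to check, right after the tidycells step, that the columns your downstream code relies on are still present. The sketch below is only an illustration: `assert_columns` is a hypothetical helper (not part of tidycells), and the data frame is a hand-made stand-in whose column names, other than `collated_2`, are assumptions.

```r
# Hypothetical helper (not part of tidycells): fail early when columns that
# downstream code relies on are missing from a composed data frame.
assert_columns <- function(df, expected) {
  missing_cols <- setdiff(expected, names(df))
  if (length(missing_cols) > 0) {
    stop("Missing expected column(s): ",
         paste(missing_cols, collapse = ", "),
         call. = FALSE)
  }
  invisible(df)
}

# Hand-made stand-in for output obtained from tidycells; the column names
# here are illustrative assumptions, not a guaranteed output format.
composed <- data.frame(
  collated_1 = c("a", "b"),
  collated_2 = c("x", "y"),
  value = c("1", "2"),
  stringsAsFactors = FALSE
)

# Passes silently while the expected columns exist; stops with a clear
# message if a later release renames one of them.
assert_columns(composed, c("collated_1", "collated_2", "value"))

# Record the installed version so that a change in behaviour can be traced
# back to a specific release.
if (requireNamespace("tidycells", quietly = TRUE)) {
  message("tidycells version: ",
          as.character(utils::packageVersion("tidycells")))
}
```

A check like this turns a silent column rename into an immediate, readable error at the boundary between tidycells and your own code.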
You are most welcome to contribute to this project in any way.
Apart from opening an issue on GitHub (preferably with a reprex), you can contribute mainly to the “Heuristic Maturation” process for
tidycells::collate_columns. If an issue is specific to data that you would like to share with me, you can first mask the data with the helper function
tidycells:::mask_data (this function is not exported, to avoid possible name conflicts).
The package contains optional functionality written as shiny widgets. It is provided to the user as the
visual_* series of functions. Only limited tests have been developed for these functions, and they are run in a few testing environments. The tests are based on the shinytest package. The covr package (at least the CRAN version) cannot yet track code coverage in shinytest [Ref: r-lib/covr #277]. Also note that shinytest (at least the CRAN version) does not yet support widget-based functions [Ref: rstudio/shinytest #157]. That is why a set of functions is introduced to run the shiny tests.
On these grounds, codecov is used to report full coverage, without any exclusions. Ideally this figure should increase once covr support is introduced for shinytest and the “brush input fractional mismatch” issue (mentioned below) is resolved. Coveralls, in contrast, shows the coverage excluding the
visual_*.R files, that is, the coverage of the main functionality only.
The shiny tests are only carried out in selected testing environments because of several difficulties. The difficulties are listed below.
Difficulties during shiny test:
visual_* (which are based on shiny widgets). This is why a fairly complicated set of code is used during the tests.
The plot brush input changes by fractional values across operating systems. Most of the functionality is recorded with brush input, which therefore differs slightly. Since the JSON comparison is currently strict, these differences result in test failures.
LF will be replaced by CRLF warning.] (Full message: The file will have its original line endings in your working directory. warning: LF will be replaced by CRLF). This is solved by using tar files, which are untarred on the fly. That is why all recorded tests (including JSON) are compressed into tar files in tidycells/tests/testthat/testshiny/.
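The tar-based workaround described above can be sketched with base R as follows. This is a minimal illustration under assumptions, not the package's actual test code: the directory name `recorded`, the file `001.json`, and its contents are invented for the example.

```r
# Build a small fixture directory standing in for recorded shinytest output.
# (The names and the JSON content are illustrative assumptions.)
src <- file.path(tempdir(), "recorded")
dir.create(src, showWarnings = FALSE)
writeLines('{"input":{"brush":1.5}}', file.path(src, "001.json"))

# Pack the directory into a tar file. Changing the working directory keeps
# the paths stored inside the archive relative.
tar_path <- file.path(tempdir(), "recorded.tar")
old_wd <- setwd(tempdir())
utils::tar(tar_path, files = "recorded")
setwd(old_wd)

# At test time, unpack into a scratch directory and read the files from there.
scratch <- file.path(tempdir(), "scratch")
dir.create(scratch, showWarnings = FALSE)
utils::untar(tar_path, exdir = scratch)
restored <- readLines(file.path(scratch, "recorded", "001.json"))
```

Because the recorded files travel inside an archive, version control stores them as one binary file and has no opportunity to rewrite their line endings.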
Given these difficulties, the shiny tests are run in the following environments only.
| Test Environment | OS | R Version | Screenshot Tested |
|---|---|---|---|
| Local | Windows 10 x86 Build 9200 | R version 3.6.1 (2019-07-05) | yes |
| Local | Windows 10 x64 Build 17134 | R version 3.6.0 (2019-04-26) | yes |
| AppVeyor | Windows Server 2012 R2 x64 (build 9600) | R version 3.6.1 Patched (2019-07-24 r76894) | no |
| AppVeyor | Windows Server 2012 R2 x64 (build 9600) | R version 3.6.1 (2019-07-05) | no |
| AppVeyor | Windows Server 2012 R2 x64 (build 9600) | R version 3.5.3 (2019-03-11) | no |
Note: the screen-shots are also tested (apart from JSON test).
Check trackable version here.
collate_columns function to deal with similar columns in composed data.frame
purrr-like formula, e.g. ~ .x for
compatibility function for the “Heuristic Maturation” process (after CRAN)
See other successful builds in CRAN Comments
Result : NOTE
Reason : Packages suggested but not available for checking: ‘tabulizer’, ‘xlsx’
Result : NOTE
Reason : Packages suggested but not available for checking: ‘tidyxl’ ‘plotly’
Result : PREPERROR
Reason : xml2 and httr failed to install due to system dependencies (libxml2, libssl/openssl)
Note : None of these errors (or notes) is attributable to the package; they arise from an induced system dependency or an optional package dependency.
| macOS 10.11 El Capitan | (R-release) R version 3.6.0 (2019-04-26) | SUCCESS |
| Oracle Solaris 10, x86, 32 bit | (R-patched) R version 3.6.0 (2019-04-26) | SUCCESS |
| Windows Server 2008 R2 SP1 | (R-devel) R Under development (unstable) (2019-07-04 r76780) | SUCCESS |
| Windows Server 2008 R2 SP1 | (R-oldrel) R version 3.5.3 (2019-03-11) | SUCCESS |
| Windows Server 2008 R2 SP1 | (R-patched) R version 3.6.0 Patched (2019-06-21 r76731) | SUCCESS |
| Windows Server 2008 R2 SP1 | (R-release) R version 3.6.1 (2019-07-05) | SUCCESS |
| Windows Server 2012 | (R-devel, Rtools4.0, 32/64 bit) R version 3.6.0 Under development (Testing Rtools) (2019-02-27 r76167) | SUCCESS |
| Fedora Linux | (R-devel, GCC) R Under development (unstable) (2019-08-18 r77026) | SUCCESS |
| CentOS 6 with Redhat Developer Toolset | (R from EPEL) R version 3.5.2 (2018-12-20) | SUCCESS |
| NOTE | Reason : optional package dependency |
| Fedora Linux | R-devel, clang, gfortran | NOTE |
| CentOS 6 | stock R from EPEL | NOTE |
| PREPERROR | Reason : induced system dependency |
| Debian Linux | R-devel, clang, ISO-8859-15 locale | PREPERROR |
| Debian Linux | R-devel, GCC | PREPERROR |
| Debian Linux | R-devel, GCC, no long double | PREPERROR |
| Debian Linux | R-patched, GCC | PREPERROR |
| Debian Linux | R-release, GCC | PREPERROR |
| Debian Linux | R-devel, GCC ASAN/UBSAN | PREPERROR |
| Ubuntu Linux 16.04 LTS | R-devel, GCC | PREPERROR |
| Ubuntu Linux 16.04 LTS | R-release, GCC | PREPERROR |
| Ubuntu Linux 16.04 LTS | R-devel with rchk | PREPERROR |
tidycells (with bare minimum functionality) you need to do the following two things
The remaining packages are optional and can be installed as your requirements dictate.
Note that the package has “Induced System Dependency”, which sometimes causes builds to break on R-hub. The table below describes these dependencies; it is an indicative list and may not be complete.
| Package | Type | Reason | Implied Critical Dependency | Induced System Dependency |
|---|---|---|---|---|
| stats | Imports | tidycells::collate_columns -> tidycells:::similarity_score | | |
| docxtractr | Suggests | read doc and docx | | LibreOffice (Suggested Dependency) |
| DT | Suggests | for visual_traceback plots | | |
| miniUI | Suggests | for visual_* functions | | |
| plotly | Suggests | optional interactive ggplot2 in visual_* functions | httr, openssl | openssl / libssl |
| rstudioapi | Suggests | object selector in RStudio | | |
| shiny | Suggests | for visual_* functions | httr, openssl | openssl / libssl |
| shinytest | Suggests | shiny module tests | | |
| stringdist | Suggests | tidycells::collate_columns -> tidycells:::similarity_score (Enhance) | | |
| xlsx | Suggests | read xls (preferred option) | rJava | Java |
| XML | Suggests | read html like files | | |
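Since most of the packages in the table are optional, it can help to see which of them are present before relying on a feature. The following is a small sketch using base R only; the package vector simply repeats the Suggests rows listed above and may be incomplete, just as the table may be.

```r
# Suggested packages taken from the table above (indicative, not exhaustive).
optional_pkgs <- c("docxtractr", "DT", "miniUI", "plotly", "rstudioapi",
                   "shiny", "shinytest", "stringdist", "xlsx", "XML")

# system.file() returns "" when a package is not installed. Unlike
# requireNamespace(), it does not load the package, so it does not trigger
# system dependencies such as Java.
installed <- vapply(
  optional_pkgs,
  function(pkg) nzchar(system.file(package = pkg)),
  logical(1)
)

missing_pkgs <- optional_pkgs[!installed]
if (length(missing_pkgs) > 0) {
  message("Optional packages not installed: ",
          paste(missing_pkgs, collapse = ", "))
}
```

Any package reported as missing can then be installed with `install.packages()` if you need the corresponding functionality.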