Anna Quaglieri | R-Ladies Melbourne Meetup | 3 Aug 2021
Art by Danielle Navarro, Silhouette in Teal (2021) Random walk, flametree L-system
I earned my Bachelor's and Master's degrees in Statistics across the Universities of Bologna, Glasgow and Melbourne
I'm a Bioinformatics Data Scientist at the Melbourne-based startup Mass Dynamics
Mass Dynamics is on a mission to free humanity and society from the burden of disease by helping more life scientists transform proteomics data to knowledge - better, faster and easier.
Work in a fun team: work with a fun interdisciplinary team of scientists, developers and marketing-savvy colleagues.
Learn: learn the intricacies and amazingness of mass spectrometry (the most used technique to quantify proteins in a sample) and what life scientists need to make the best use of their experiments.
Code in R: assemble workflows in R to analyse mass spectrometry data.
Open Science: learn and strive for reproducibility and openness in what we produce.
In a nutshell: study and learn, think, build solutions (mainly in R packages), then debug, debug, debug, repeat.
Art by Will Chase, USA, Terrazzo, confetti (2021) Voronoi Tessellation, Poisson disc sampling
My highlights also correspond to the talks presented in our time zone
All talks and workshops will be made available online very soon! I'll keep you posted
{lterdatasampler}: LTER Data Sampler package (LTER = Long Term Ecological Research program Network)
A great way to learn how to build an R package is to create a data package (a package that only includes data)
I don't have super complex, cool new functions... I cannot write an R package
Free gift! Data packages are an enormously useful tool for teaching purposes (how many times have you used the {iris} dataset [1]??!!)
[1] R. A. Fisher (1936). "The use of multiple measurements in taxonomic problems". Annals of Eugenics. 7 (2): 179–188. doi:10.1111/j.1469-1809.1936.tb02137.x. hdl:2440/15227
{palmerpenguins} is the new {iris}
The {palmerpenguins} package provides a great dataset for data exploration & visualization, as an alternative to iris.
install.packages("palmerpenguins")
library(palmerpenguins)
Meet the penguins!
Artwork by @allison_horst
library(palmerpenguins)
library(dplyr)
library(DT)
penguins %>% head()
## # A tibble: 6 × 8
##   species island bill_length_mm bill_depth_mm flipper_length_… body_mass_g sex  
##   <fct>   <fct>           <dbl>         <dbl>            <int>       <int> <fct>
## 1 Adelie  Torge…           39.1          18.7              181        3750 male 
## 2 Adelie  Torge…           39.5          17.4              186        3800 fema…
## 3 Adelie  Torge…           40.3          18                195        3250 fema…
## 4 Adelie  Torge…           NA            NA                 NA          NA <NA> 
## 5 Adelie  Torge…           36.7          19.3              193        3450 fema…
## 6 Adelie  Torge…           39.3          20.6              190        3650 male 
## # … with 1 more variable: year <int>
Get started with R Packages by Hadley Wickham. Easy to read and very comprehensive.
What you need to get started:
Code (one function is enough) AND/OR data (it doesn't have to be large!)
An R project in a new folder /path/to/myPackage
Run usethis::create_package("/path/to/myPackage") (more info: https://r-pkgs.org/workflows101.html).
This will create the metadata and other files that you need to bundle everything up as a package!
You're set up!
usethis::create_package("/path/to/myPackage")
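# A small fake dataset: 200 rows, an integer column and a constant character column
data_fake <- data.frame(First = 1:200, Second = rep("A", 200))
# Save it as data/data_fake.rda inside the package (run from the package root)
usethis::use_data(data_fake)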
R Packages by Hadley Wickham
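For intuition, here is a rough base-R sketch of what storing a dataset in a package amounts to: the data frame ends up as an .rda file in the package's data/ folder. This is an illustration under assumptions, not what usethis::use_data() literally runs. The temporary folder standing in for /path/to/myPackage and the names pkg_dir, data_dir, rda_path and loaded are hypothetical.

```r
# Hypothetical package folder: a temporary directory stands in for /path/to/myPackage
pkg_dir <- file.path(tempdir(), "myPackage")
data_dir <- file.path(pkg_dir, "data")
dir.create(data_dir, recursive = TRUE, showWarnings = FALSE)

# The same fake dataset as on the slide
data_fake <- data.frame(First = 1:200, Second = rep("A", 200))

# Store the object as an .rda file under data/
rda_path <- file.path(data_dir, "data_fake.rda")
save(data_fake, file = rda_path)

# Load it back into a clean environment to check that it round-trips
loaded <- new.env()
load(rda_path, envir = loaded)
str(loaded$data_fake)
```

In a real package you would let usethis::use_data() handle the file naming and compression for you, and document the dataset alongside it.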
{fusen} package: Create a package from Rmd
If you know how to create an R Markdown file, then you know how to build a package.
{fusen} philosophy: you don't need to move functions and files around to create a package; you only need your Rmd with functions, documentation, tests and examples.
install.packages("fusen")
library(fusen)
Rmd-first approach to writing an R package
Write your Rmd using some prefixes to name code chunks, e.g. description, function, tests, examples
These prefixes will tell {fusen} how to create your package
Inflate!
{OpenIntro} depends on 3 other data packages. See the package DESCRIPTION
{bayesrules} package
The {bayesrules} package contains tools for teaching (and learning) Bayesian statistics
The package accompanies the open-access book Bayes Rules! An Introduction to Bayesian Modeling with R
Art by Ijeamaka Anyene, USA, Sunset (2021)
{grDevices} package
{grid} package is a low-level system for plotting within R ({ggplot2} is based on it)
library(grid)
You can build Illustrator-like visualisations!
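To give a flavour of what "low-level" means here, the following is a minimal sketch (not taken from the talk) that assembles a picture from primitive grid objects. The shapes, colours, positions and variable names are arbitrary choices for illustration. pdf(NULL) is used only so that the snippet also runs without a screen device.

```r
library(grid)

# Draw to a throwaway device so the sketch also runs non-interactively
pdf(NULL)

# Build the picture from primitive grobs (graphical objects)
background <- rectGrob(gp = gpar(fill = "grey95", col = NA))
sun <- circleGrob(x = 0.7, y = 0.7, r = 0.15, gp = gpar(fill = "orange", col = NA))
label <- textGrob("Hello grid", x = 0.3, y = 0.3, gp = gpar(fontsize = 20))

# Group the grobs into a single object and draw it on a fresh page
picture <- gTree(children = gList(background, sun, label))
grid.newpage()
grid.draw(picture)

invisible(dev.off())
```

Every element is placed explicitly, which is more work than a ggplot2 call but gives full control over the layout.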
{virgo} package
Lets you easily build interactive graphics for exploratory data analysis
Allows cross-plot interactivity without having to build a Shiny app
virgo plots also work within Shiny
{virgo} in action!
library(virgo)
library(palmerpenguins)
selection <- select_interval()
p <- penguins %>%
  vega() %>%
  mark_circle(
    enc(
      x = bill_length_mm,
      y = bill_depth_mm,
      color = encode_if(selection, species, "black")
    )
  )
p
p_right <- penguins %>%
  vega(enc(x = body_mass_g)) %>%
  mark_histogram(bin = list(maxbins = 20)) %>%
  mark_histogram(color = "purple", bin = list(maxbins = 20), selection = selection) %>%
  mark_rule(enc(x = vg_mean(body_mass_g)), color = "red", size = 4, selection = selection)
p_right
{microshades} package
Introduction to microshades
Provides custom color shading palettes that improve:
remotes::install_github("KarstensLab/microshades")
Two crafted colour palettes:
microshades_cvd_palettes
microshades_palettes
Total of 30 available colours per palette.
{microshades} in action!
{palmerpenguins} with {microshades}
Example code: https://karstenslab.github.io/microshades/articles/non-microbiome_data.html
Art by Antonio Sánchez, Spain, Jellyfish (2018), Sines and cosines
{autotest} package: Automatic testing for R packages
rOpenSci
Introduction to {autotest}
install.packages("autotest")
{autotest} goes into the examples of your R package functions and mutates (i.e. changes) the input parameters of the function calls.
This lets you check how robust the package is to a range of inputs
{autotest} in action
library(autotest)
y <- autotest_package(package = "stats", functions = "var", test = TRUE)
{tinytest} package
Introduction to {tinytest}
install.packages("tinytest")
Its purpose is to make it easier to develop unit tests for R packages
It gives you better statistics on, and a clearer idea of, where errors actually occurred
[1] M van der Loo (2017). tinytest: R package version 1.2.4. https://cran.r-project.org/package=tinytest
[2] MPJ van der Loo (2020) A method for deriving information from running R code. R-Journal (Accepted) https://arxiv.org/abs/2002.07472
{tinytest} in action!
Overview of {tinytest} functionalities
library(tinytest)
addOne <- function(x) x + 1
# subOne subtracts 2 rather than 1, which is what makes the second test below fail
subOne <- function(x) x - 2
# this test should pass
tinytest::expect_equal(addOne(1), 2)
## ----- PASSED : <-->
## call| tinytest::expect_equal(addOne(1), 2)
# this test will fail
tinytest::expect_equal(subOne(2), 1)
## ----- FAILED[data]: <-->
## call| tinytest::expect_equal(subOne(2), 1)
## diff| Expected '1', got '0'
{validate} package
Its purpose is to provide easy-to-use tools to check that your data is valid!
[1] MPJ van der Loo and E de Jonge (2020). Data Validation Infrastructure for R. Journal of Statistical Software, Accepted for publication. https://arxiv.org/abs/1912.09759
[2] MPJ van der Loo (2020) The Data Validation Cookbook version 1.0.1. https://data-cleaning.github.io/validate
library(palmerpenguins)
head(penguins)
## # A tibble: 6 × 8
##   species island bill_length_mm bill_depth_mm flipper_length_… body_mass_g sex  
##   <fct>   <fct>           <dbl>         <dbl>            <int>       <int> <fct>
## 1 Adelie  Torge…           39.1          18.7              181        3750 male 
## 2 Adelie  Torge…           39.5          17.4              186        3800 fema…
## 3 Adelie  Torge…           40.3          18                195        3250 fema…
## 4 Adelie  Torge…           NA            NA                 NA          NA <NA> 
## 5 Adelie  Torge…           36.7          19.3              193        3450 fema…
## 6 Adelie  Torge…           39.3          20.6              190        3650 male 
## # … with 1 more variable: year <int>
table(penguins$island, penguins$species)
##            
##             Adelie Chinstrap Gentoo
##   Biscoe        44         0    124
##   Dream         56        68      0
##   Torgersen     52         0      0
validator with rules
Separate multiple validations with a comma
The example shows multivariate validation, including completeness validation (is_complete) and conditional validations.
library(validate)
rules <- validator(
  flipper_length_mm > 0,
  is_complete(bill_depth_mm, flipper_length_mm, bill_depth_mm),
  if (island %in% "Biscoe") species %in% c("Adelie")
)
confront(penguins, rules) %>% summary()
##   name items passes fails nNA error warning
## 1   V1   344    342     0   2 FALSE   FALSE
## 2   V2   344    342     2   0 FALSE   FALSE
## 3   V3   344    220   124   0 FALSE   FALSE
##                                                     expression
## 1                                        flipper_length_mm > 0
## 2 is_complete(bill_depth_mm, flipper_length_mm, bill_depth_mm)
## 3       !(island %vin% "Biscoe") | (species %vin% c("Adelie"))
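As a sanity check on rule V3, the pass/fail counts can be recomputed by hand from the island-by-species table shown earlier. The summary rewrites the conditional rule "if Biscoe then Adelie" as "not Biscoe, or Adelie", so the only failing rows are Biscoe penguins that are not Adelie. The sketch below is a base-R illustration with the counts copied from that table; the variable names are my own.

```r
# Island-by-species counts, copied from the table() output above
counts <- matrix(
  c(44,  0, 124,
    56, 68,   0,
    52,  0,   0),
  nrow = 3, byrow = TRUE,
  dimnames = list(
    island  = c("Biscoe", "Dream", "Torgersen"),
    species = c("Adelie", "Chinstrap", "Gentoo")
  )
)

# Rule V3 fails only for Biscoe rows whose species is not Adelie
fails <- sum(counts["Biscoe", colnames(counts) != "Adelie"])

# Every other row passes: either it is not on Biscoe, or it is an Adelie
passes <- sum(counts) - fails

c(items = sum(counts), passes = passes, fails = fails)
```

The three numbers agree with the V3 row of the summary (344 items, 220 passes, 124 fails).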
Art by Will Chase, USA, Triangle disintegration (2019), Curl noise, trigonometry
R packages that Sevvandi developed to find anomalies in high-dimensional data:
Art by Ijeamaka Anyene, USA, Clouds (2021)
An RSE (Research Software Engineer) builds software for research
Generally writes code and teaches researchers about software
Consults with researchers on any kind of software problem
It's a bit too much to expect that the researcher would do all of those things + their research!
The RSE comes to help!
You can become part of the community!
Art by Will Chase, USA, Bubble strings (2021), Flow fields, circle packing, perlin noise
Example from my experience.
I'm a data scientist at Mass Dynamics and I build R packages to analyse mass spectrometry data
But Mass Dynamics also wants to make the functionality of the R packages easily available to life scientists, lowering the barrier of having to learn to code
The solution: the life scientist interacts with an easy user interface (UI, aka frontend) which runs my R packages in the background (aka backend)
Every time a scientist interacts with the UI, the R package is run -> This is R in production!
There is a bit of engineering setup and jargon to digest, and there are various ways of accomplishing this task!
Tricky aspects:
A lot of aspects are around the engineering setup (which is not my expertise!)
However, from my side I need to make sure that:
all dependencies needed by my R packages are available in production
all packages are put into production with a defined version to allow reproducibility
Managing dependencies can be really tricky!
You find them in the DESCRIPTION file of a package, hiding under: Depends, Imports, SystemRequirements (dependencies external to R)
Example from the {sf} package: https://github.com/r-spatial/sf/blob/master/DESCRIPTION
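To make this concrete, here is a small base-R sketch that reads those three fields out of a DESCRIPTION file with read.dcf(). The DESCRIPTION written below is a made-up example (it is not the real {sf} file), and the helper split_field() is a hypothetical name introduced only for this illustration.

```r
# Write a hypothetical DESCRIPTION file to a temporary location
desc_path <- tempfile("DESCRIPTION")
writeLines(c(
  "Package: myPackage",
  "Version: 0.1.0",
  "Depends: R (>= 3.5.0)",
  "Imports: dplyr (>= 1.0.0), ggplot2",
  "SystemRequirements: GDAL (>= 2.0.1)"
), desc_path)

# read.dcf() parses the "Field: value" format used by DESCRIPTION files
desc <- read.dcf(desc_path, fields = c("Depends", "Imports", "SystemRequirements"))

# Hypothetical helper: split a comma-separated field into individual entries
split_field <- function(x) trimws(strsplit(x, ",")[[1]])

imports <- split_field(desc[1, "Imports"])

# Drop version constraints such as "(>= 1.0.0)" to keep only package names
import_names <- unname(sub("\\s*\\(.*\\)$", "", imports))
system_reqs <- unname(desc[1, "SystemRequirements"])

import_names
system_reqs
```

The R-level dependencies (Depends, Imports) can be installed from within R, while anything under SystemRequirements has to be provided by the operating system or container image.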
How do you determine all the dependencies needed to reproduce an R package/project environment, to make it reproducible, open, shareable and safe for production?
My summary of suggestions after discussing with speakers at useR! 2021:
{renv} package: renv::snapshot() saves the state of the project library to the lockfile (called renv.lock)
Hard-code your dependencies, start minimal and grow: begin with the imports (looking at the DESCRIPTION file) and then look for system dependencies
r-hub/sysreqs package provides a database with an API to quickly find out which packages or other software need to be available to build and use R packages.
sysreqs::sysreq_commands(desc = "path/to/a/DESCRIPTION/file") returns the commands needed to install the necessary runtime system dependencies.
rstudio/r-system-requirements: a catalogue of dependencies independently maintained by RStudio, used to power the RStudio Package Manager
{maketools} package: to get runtime dependencies (only for Linux)
maketools::package_sysdeps("stringi")
## # A tibble: 1 × 6
##   shlib          package headers source version url  
##   <chr>          <chr>   <chr>   <chr>  <chr>   <chr>
## 1 libc++.1.dylib <NA>    <NA>    <NA>   <NA>    <NA> 
Suggestions from speakers at useR! 2021:
Other resources:
(Not from useR!) But I found it really useful!
Philosophy: wrapping everything up with Docker and using the {plumber} package to generate an API for R.
Yonder 1831, 2021 by Thomas Lin Pedersen (Denmark). Flow lines, nearest neighbour, texture blending
Meet Xaringan: Making slides in R Markdown, by Alison Hill: learn how to make beautiful slides using {xaringan}.
{xaringan} package by Yihui Xie (2021). xaringan: Presentation Ninja. R package version 0.22. https://CRAN.R-project.org/package=xaringan
Art by Ijeamaka Anyene, USA, Arcs IV (2020)
You can find me at: