background-image: url(images/gbg.png), url(images/R_logo.png)
background-position: 0% 100%, 100% 0%
background-size: 40%, 10%
class: title-page, center, middle

## RStudio, R packages, and R project

### A typical data science workflow in R
--- class: about-me-slide, inverse, middle, center ## About the trainer <img style="border-radius: 80%;" src="images/ezekiel.jpeg" width="180px"/> ### Ezekiel Adebayo Ogundepo #### Data Scientist, Statistician .fade[Virus Outbreak Data Network (VODAN Africa & Asia)<br> Nigeria Chapter] [<svg viewBox="0 0 512 512" style="height:1em;position:relative;display:inline-block;top:.1em;" xmlns="http://www.w3.org/2000/svg"> <path d="M326.612 185.391c59.747 59.809 58.927 155.698.36 214.59-.11.12-.24.25-.36.37l-67.2 67.2c-59.27 59.27-155.699 59.262-214.96 0-59.27-59.26-59.27-155.7 0-214.96l37.106-37.106c9.84-9.84 26.786-3.3 27.294 10.606.648 17.722 3.826 35.527 9.69 52.721 1.986 5.822.567 12.262-3.783 16.612l-13.087 13.087c-28.026 28.026-28.905 73.66-1.155 101.96 28.024 28.579 74.086 28.749 102.325.51l67.2-67.19c28.191-28.191 28.073-73.757 0-101.83-3.701-3.694-7.429-6.564-10.341-8.569a16.037 16.037 0 0 1-6.947-12.606c-.396-10.567 3.348-21.456 11.698-29.806l21.054-21.055c5.521-5.521 14.182-6.199 20.584-1.731a152.482 152.482 0 0 1 20.522 17.197zM467.547 44.449c-59.261-59.262-155.69-59.27-214.96 0l-67.2 67.2c-.12.12-.25.25-.36.37-58.566 58.892-59.387 154.781.36 214.59a152.454 152.454 0 0 0 20.521 17.196c6.402 4.468 15.064 3.789 20.584-1.731l21.054-21.055c8.35-8.35 12.094-19.239 11.698-29.806a16.037 16.037 0 0 0-6.947-12.606c-2.912-2.005-6.64-4.875-10.341-8.569-28.073-28.073-28.191-73.639 0-101.83l67.2-67.19c28.239-28.239 74.3-28.069 102.325.51 27.75 28.3 26.872 73.934-1.155 101.96l-13.087 13.087c-4.35 4.35-5.769 10.79-3.783 16.612 5.864 17.194 9.042 34.999 9.69 52.721.509 13.906 17.454 20.446 27.294 10.606l37.106-37.106c59.271-59.259 59.271-155.699.001-214.959z"></path></svg> https://bit.ly/gbganalyst](https://bit.ly/gbganalyst) [<svg viewBox="0 0 512 512" style="height:1em;position:relative;display:inline-block;top:.1em;" xmlns="http://www.w3.org/2000/svg"> <path d="M459.37 151.716c.325 4.548.325 9.097.325 13.645 0 138.72-105.583 298.558-298.558 298.558-59.452 
0-114.68-17.219-161.137-47.106 8.447.974 16.568 1.299 25.34 1.299 49.055 0 94.213-16.568 130.274-44.832-46.132-.975-84.792-31.188-98.112-72.772 6.498.974 12.995 1.624 19.818 1.624 9.421 0 18.843-1.3 27.614-3.573-48.081-9.747-84.143-51.98-84.143-102.985v-1.299c13.969 7.797 30.214 12.67 47.431 13.319-28.264-18.843-46.781-51.005-46.781-87.391 0-19.492 5.197-37.36 14.294-52.954 51.655 63.675 129.3 105.258 216.365 109.807-1.624-7.797-2.599-15.918-2.599-24.04 0-57.828 46.782-104.934 104.934-104.934 30.213 0 57.502 12.67 76.67 33.137 23.715-4.548 46.456-13.32 66.599-25.34-7.798 24.366-24.366 44.833-46.132 57.827 21.117-2.273 41.584-8.122 60.426-16.243-14.292 20.791-32.161 39.308-52.628 54.253z"></path></svg> @gbganalyst](https://twitter.com/gbganalyst) [<svg viewBox="0 0 496 512" style="position:relative;display:inline-block;top:.1em;height:1em;" xmlns="http://www.w3.org/2000/svg"> <path d="M165.9 397.4c0 2-2.3 3.6-5.2 3.6-3.3.3-5.6-1.3-5.6-3.6 0-2 2.3-3.6 5.2-3.6 3-.3 5.6 1.3 5.6 3.6zm-31.1-4.5c-.7 2 1.3 4.3 4.3 4.9 2.6 1 5.6 0 6.2-2s-1.3-4.3-4.3-5.2c-2.6-.7-5.5.3-6.2 2.3zm44.2-1.7c-2.9.7-4.9 2.6-4.6 4.9.3 2 2.9 3.3 5.9 2.6 2.9-.7 4.9-2.6 4.6-4.6-.3-1.9-3-3.2-5.9-2.9zM244.8 8C106.1 8 0 113.3 0 252c0 110.9 69.8 205.8 169.5 239.2 12.8 2.3 17.3-5.6 17.3-12.1 0-6.2-.3-40.4-.3-61.4 0 0-70 15-84.7-29.8 0 0-11.4-29.1-27.8-36.6 0 0-22.9-15.7 1.6-15.4 0 0 24.9 2 38.6 25.8 21.9 38.6 58.6 27.5 72.9 20.9 2.3-16 8.8-27.1 16-33.7-55.9-6.2-112.3-14.3-112.3-110.5 0-27.5 7.6-41.3 23.6-58.9-2.6-6.5-11.1-33.3 2.6-67.9 20.9-6.5 69 27 69 27 20-5.6 41.5-8.5 62.8-8.5s42.8 2.9 62.8 8.5c0 0 48.1-33.6 69-27 13.7 34.7 5.2 61.4 2.6 67.9 16 17.7 25.8 31.5 25.8 58.9 0 96.5-58.9 104.2-114.8 110.5 9.2 7.9 17 22.9 17 46.4 0 33.7-.3 75.4-.3 83.6 0 6.5 4.6 14.4 17.3 12.1C428.2 457.8 496 362.9 496 252 496 113.3 383.5 8 244.8 8zM97.2 352.9c-1.3 1-1 3.3.7 5.2 1.6 1.6 3.9 2.3 5.2 1 1.3-1 1-3.3-.7-5.2-1.6-1.6-3.9-2.3-5.2-1zm-10.8-8.1c-.7 1.3.3 2.9 2.3 3.9 1.6 1 3.6.7 
4.3-.7.7-1.3-.3-2.9-2.3-3.9-2-.6-3.6-.3-4.3.7zm32.4 35.6c-1.6 1.3-1 4.3 1.3 6.2 2.3 2.3 5.2 2.6 6.5 1 1.3-1.3.7-4.3-1.3-6.2-2.2-2.3-5.2-2.6-6.5-1zm-11.4-14.7c-1.6 1-1.6 3.6 0 5.9 1.6 2.3 4.3 3.3 5.6 2.3 1.6-1.3 1.6-3.9 0-6.2-1.4-2.3-4-3.3-5.6-2z"></path></svg> @gbganalyst](https://github.com/gbganalyst)

???

class: about-me-slide, inverse, middle, center

---
class: middle

.w-100.lh-copy[
Goal:

> The goal of this training is to help you learn how to work with R packages and how to import data into R.
]

---
class: inverse, middle
name: toc

# Table of contents

.w-100.lh-copy[
- [R and RStudio](#beg1)
- [R packages and library](#beg2)
- [RStudio project](#beg3)
- [Reading and writing data in R](#beg4)
]

---
class: middle, center, inverse
name: beg1

# R and RStudio

---

## What is R programming?

.w-100.lh-copy[
R is a statistical programming language for data cleaning, analysis, visualization, and modelling.
]

<img src="images/R.PNG" width="85%" height="87%" />

---

## What about RStudio?

.w-100.lh-copy[
RStudio is an integrated development environment (IDE) for R programming. RStudio makes programming in R easier and friendlier.
]

<img src="images/R_studio.PNG" width="85%" height="90%" />

---
class: middle, center, inverse
name: beg2

# R packages and library

<img src="images/packages.png" width="2667" />

---
layout: true

## R packages and library

---

.w-100.lh-copy[
A package is a collection of R functions that extends basic R functionality (`base::functions`).
]

--

.w-100.lh-copy[
A package can contain a set of functions relating to a specific topic or task.
]

--

.w-100.lh-copy[
For example, data wrangling packages include `tidyr`, `janitor`, etc.
]

--

.w-100.lh-copy[
The location where the packages are stored is called a **library**.
If there is a particular package that you need, you can install it from the Comprehensive R Archive Network (**CRAN**) by using:
]

--

```r
install.packages("pkg_name")
```

--

For example:

```r
install.packages("tidyverse")
```

--

.w-100.lh-copy[
Please note that the package name must be enclosed in double quotes (**" "**) or single quotes (**' '**).
]

---

.w-100.lh-copy[
Other packages that are not yet on `CRAN` can also be installed from an external repository such as **GitHub** or **GitLab** by using the `devtools` or `remotes` packages.
]

--

For example, the package `fakir` is not yet on `CRAN`.

--

To install `fakir` from the `GitHub` repository,

--

use

--

```r
devtools::install_github("ThinkR-open/fakir")
```

--

or

--

```r
remotes::install_github("ThinkR-open/fakir")
```

--

.w-100.lh-copy[
You can also use `devtools` or `remotes` to install the development version of a package.
]

--

```r
remotes::install_github("datalorax/equatiomatic")
```

---
layout: false

## Import or load a package

.w-100.lh-copy[
Before you can use an installed package, you need to import or load it by using the command:
]

--

```r
library(pkg_name)
```

--

.w-100.lh-copy[
which makes that package's functions available to you in the current R session.
]

--

For example:

--

```r
library(tidyverse)
library(janitor)
library(ralger)
```

---
background-image: url(images/package.png)
background-size: contain
background-position: 60% 60%

### Think of an R package like this:

.w-100.lh-copy[
You only need to install a package once, but you need to reload it every time you start a new session.
]

---
class: middle

## R Library

.w-100.lh-copy[
A library is a directory where packages are stored. You can have multiple libraries on your hard disk.
]

--

To see which libraries are available (which paths are searched for packages), use:

--

```r
.libPaths()
```

```
[1] "C:/Users/Ezekiel Adebayo/AppData/Local/R/win-library/4.2"
[2] "C:/Program Files/R/R-4.2.1/library"
```

---
class: middle

## Remove installed packages

`remove.packages()` removes installed packages and updates index information as necessary.

```r
remove.packages("pkg_name")
```

---

## Use a function from an external package without loading it

.w-100.lh-copy[
There are two ways to make use of a function in a package. You can load the package with `library(pkg_name)` and then use any of its functions. For example:

```r
library(install.load)
install_load(c("tidyverse", "janitor", "ralger"))
```
]

--

.w-100.lh-copy[
Or you can use the `::` operator to call a function directly from its package without loading the package first, i.e. `mypackage::myfunction()`. For example:

```r
install.load::install_load(c("tidyverse", "janitor", "ralger"))
```
]

--

.w-100.lh-copy[
It is common to see people using `mypackage::myfunction()` so that the reader of a script knows which package a function belongs to.
]

---
class: middle

### Example 1

```r
library(janitor)
first_5_iris <- head(iris, 5)
clean_names(first_5_iris)
```

<table>
 <thead>
  <tr> <th style="text-align:right;"> sepal_length </th> <th style="text-align:right;"> sepal_width </th> <th style="text-align:right;"> petal_length </th> <th style="text-align:right;"> petal_width </th> <th style="text-align:left;"> species </th> </tr>
 </thead>
 <tbody>
  <tr> <td style="text-align:right;"> 5.1 </td> <td style="text-align:right;"> 3.5 </td> <td style="text-align:right;"> 1.4 </td> <td style="text-align:right;"> 0.2 </td> <td style="text-align:left;"> setosa </td> </tr>
  <tr> <td style="text-align:right;"> 4.9 </td> <td style="text-align:right;"> 3.0 </td> <td style="text-align:right;"> 1.4 </td> <td style="text-align:right;"> 0.2 </td> <td style="text-align:left;"> setosa </td> </tr>
  <tr> <td style="text-align:right;"> 4.7 </td> <td style="text-align:right;"> 3.2 </td> <td style="text-align:right;"> 1.3 </td> <td style="text-align:right;"> 0.2 </td> <td style="text-align:left;"> setosa </td> </tr>
  <tr> <td style="text-align:right;"> 4.6 </td> <td style="text-align:right;"> 3.1 </td> <td style="text-align:right;"> 1.5 </td> <td style="text-align:right;"> 0.2 </td> <td style="text-align:left;"> setosa </td> </tr>
  <tr> <td style="text-align:right;"> 5.0 </td> <td style="text-align:right;"> 3.6 </td> <td style="text-align:right;"> 1.4 </td> <td style="text-align:right;"> 0.2 </td> <td style="text-align:left;"> setosa </td> </tr>
 </tbody>
</table>

---
class: middle

### Example 2

```r
first_5_iris <- head(iris, 5)
janitor::clean_names(first_5_iris)
```

<table>
 <thead>
  <tr> <th style="text-align:right;"> sepal_length </th> <th style="text-align:right;"> sepal_width </th> <th style="text-align:right;"> petal_length </th> <th style="text-align:right;"> petal_width </th> <th style="text-align:left;"> species </th> </tr>
 </thead>
 <tbody>
  <tr> <td style="text-align:right;"> 5.1 </td> <td style="text-align:right;"> 3.5 </td> <td style="text-align:right;"> 1.4 </td> <td style="text-align:right;"> 0.2 </td> <td style="text-align:left;"> setosa </td> </tr>
  <tr> <td style="text-align:right;"> 4.9 </td> <td style="text-align:right;"> 3.0 </td> <td style="text-align:right;"> 1.4 </td> <td style="text-align:right;"> 0.2 </td> <td style="text-align:left;"> setosa </td> </tr>
  <tr> <td style="text-align:right;"> 4.7 </td> <td style="text-align:right;"> 3.2 </td> <td style="text-align:right;"> 1.3 </td> <td style="text-align:right;"> 0.2 </td> <td style="text-align:left;"> setosa </td> </tr>
  <tr> <td style="text-align:right;"> 4.6 </td> <td style="text-align:right;"> 3.1 </td> <td style="text-align:right;"> 1.5 </td> <td style="text-align:right;"> 0.2 </td> <td style="text-align:left;"> setosa </td> </tr>
  <tr> <td style="text-align:right;"> 5.0 </td> <td style="text-align:right;"> 3.6 </td> <td style="text-align:right;"> 1.4 </td> <td style="text-align:right;"> 0.2 </td> <td style="text-align:left;"> setosa </td> </tr>
 </tbody>
</table>

---
class: middle, center, inverse
name: beg3

# RStudio project

.w-100.lh-copy[
Data Analysis Reproducibility with R and RStudio Project.
]

<img src="images/reproduce.png" width="1277" />

---
layout: true

## Where Does Your Analysis Live?

---

.w-100.lh-copy[
The working directory is where R looks for files that you ask it to load, and where it will put any files that you ask it to save.
]

--

RStudio shows your current working directory at the top of the console:

<img src="images/console.png" width="1563" />

--

<br>

and you can also print this out by using:

--

```r
getwd()
```

```
[1] "C:/Users/Ezekiel Adebayo/Desktop/R-training-modules/R-packages-R-project"
```

---
class: middle

.w-100.lh-copy[
If you have a specific directory that you want to use as your working directory, you can set it in `R` with the command `setwd()`, e.g.
`setwd("/path/to/my/data_analysis")`
]

--

.w-100.lh-copy[
or by using the keyboard shortcut `Ctrl+Shift+H` and choosing that specific directory (folder).
]

---
layout: false

## Paths and Directories

- .w-100.lh-copy[**Absolute paths**: These look different on every computer. On Windows they start with a drive letter (e.g., `C:`). In my R working directory, I have `C:/Users/OGUNDEPO EZEKIEL .A/Desktop/R-training-modules/R-packages-R-project/data/covid19.csv` as an absolute path.
]

--

.w-100.lh-copy[
You should never use *absolute paths* in your scripts, because they hinder sharing: no one else will have exactly the same directory configuration as you.
]

--

- .w-100.lh-copy[**Relative paths**: With the help of the function `here::here()` or an `R project`, we can use a relative path like `data/covid19.csv` that allows for file sharing and collaboration.
]

---

## RStudio Projects

.w-100.lh-copy[
For a typical data science workflow, you should use an RStudio project. R experts keep all the files associated with a project together, such as a data folder, an R scripts folder, an analytical results folder, and a figures folder. This is a wise and common practice.
]

--

<img src="images/rproj.png" width="2045" />

---

## Creating a new R project

Click `File → New Project`, then choose Existing Directory:

<img src="images/step1.PNG" width="704" />

---

Browse for that specific directory (folder).

--

<img src="images/step2.png" width="1389" />

---
class: middle

<img src="images/step3.png" width="100%" height="100%" />

--

Hurray! We are in the `RStudio project`.

---
class: middle

<img src="images/rproj.png" width="2045" />

Henceforth, you will click the `.Rproj` file to open the RStudio project.

---
class: middle, center, inverse
name: beg4

# Reading and writing data in R

<img src="images/export.png" width="60%" height="50%" />

---
layout: true

## Reading and writing data in R

---

.w-100.lh-copy[
Creating a dataframe from scratch is tedious.
In the data science world, data will usually be available to you in a file such as an MS Excel spreadsheet. Our job as data scientists is to import those datasets into R using data import packages such as `readr` (.csv), `readxl` (.xlsx), `haven` (.sav, .dta), `rio` (many data file formats), or `ralger` (web data).
]

--

.pull-left[
<img src="images/import.png" width="100%" height="100%" />
]

--

.pull-right[
.w-100.lh-copy[
Please note that `readr`, `readxl`, and `haven` are part of the `tidyverse` set of packages. You can see all the packages in the tidyverse by using:

```r
tidyverse::tidyverse_packages()
```
]
]

---

- `readr` package:
  - `read_csv()` imports a `.csv` file into R
  - `write_csv()` exports a dataframe from R as a `.csv` file

--

- `readxl` package:
  - `read_xlsx()` imports a `.xlsx` file into R

--

- `writexl` package:
  - `write_xlsx()` exports a dataframe from R as a `.xlsx` file

--

- `haven` package:
  - `read_sav()` imports a `.sav` file into R
  - `write_sav()` exports a dataframe from R as a `.sav` file
  - `read_dta()` imports a `.dta` file into R
  - `write_dta()` exports a dataframe from R as a `.dta` file

---

- `rio` package:
  - `import()` imports a file in any supported format into R
  - `export()` exports a dataframe from R in any supported format

For more information on the `rio` package, please visit this [resource](https://www.rdocumentation.org/packages/rio/versions/0.5.26).

--

### RStudio Project

.w-100.lh-copy[
To import and export data in R, we will use an `RStudio project` to set up the working directory automatically, and relative paths to refer to the data files.
]

---
layout: false

### Lab session

<img src="images/lab.png" width="2320" />

---
layout: false
class: middle

## Summary

.w-100.lh-copy[
A data science workflow can be organized in an RStudio project. This enables you to keep your data files, scripts, and outputs together and to refer to them using only relative paths.
]

--

.w-100.lh-copy[
Everything you need is in one place, and cleanly separated from all other projects you are working on.
]

--

.w-100.lh-copy[
You can comfortably install R packages, whether from `CRAN` or `GitHub`, and load them into your R session.
]

--

.w-100.lh-copy[
Now you can import data files in many formats into R and export them again.
]

---
class: center, middle, inverse

# The end

--

**Thank you**
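
---
class: middle

### Appendix: a CSV round trip with a relative path

.w-100.lh-copy[
As a recap of the import and export steps, here is a minimal sketch of writing a dataframe to a `.csv` file and reading it back through a relative path. It is only an illustration under a few assumptions: a temporary folder named `demo_project` stands in for an RStudio project, the file name `iris_head.csv` is made up, and base R's `write.csv()` and `read.csv()` are used in place of `readr` so that the code runs without installing anything. With `readr` loaded, `write_csv()` and `read_csv()` play the same roles.
]

```r
# A temporary folder stands in for an RStudio project (assumption for this sketch)
project_dir <- file.path(tempdir(), "demo_project")
dir.create(file.path(project_dir, "data"), recursive = TRUE, showWarnings = FALSE)

# Work from the project folder, which is what opening a .Rproj file does for you
old_wd <- setwd(project_dir)

# Relative path: it does not depend on where the project folder sits on disk
csv_path <- file.path("data", "iris_head.csv")

# Export the first five rows of iris, then import them again
first_5_iris <- head(iris, 5)
write.csv(first_5_iris, csv_path, row.names = FALSE)
iris_back <- read.csv(csv_path)

# Restore the previous working directory
setwd(old_wd)
```

.w-100.lh-copy[
Inside a real RStudio project you would not need the `setwd()` calls: the working directory is already the project folder, so `data/iris_head.csv` can be used directly.
]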