--- title: "Introduction to Trelliscope" output: rmarkdown::html_vignette: self_contained: false vignette: > %\VignetteIndexEntry{Introduction to Trelliscope} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{r, include = FALSE} knitr::opts_chunk$set( collapse = TRUE, comment = "#>" ) ``` Trelliscope provides a simple mechanism to make a collection of visualizations and display them as interactive [small multiples](https://en.wikipedia.org/wiki/Small_multiple). This is a useful general visualization technique for many scenarios, particularly when looking at a somewhat large dataset comprised of many natural subsets. However, where Trelliscope differentiates itself from traditional faceting is in layout and interactivity. Traditionally, facets appear in one large plot that becomes overwhelming with too many groups. Trelliscope allows you to generate a large number of plots in an interactive window where the plots can be filtered and sorted based on metadata and paged through with a limited number of plots at a time. The Trelliscope R package provides utilities to create the visualizations, specify metadata about the visualizations that can be used to interactively navigate them, and specify other aspects of how the viewer should behave. ## Data frames of visualizations The basic principle behind the design of the R package is that you specify a collection of visualizations as a data frame, with one or more columns representing the plots (either as a plot object such as ggplot or as a reference to an image such as a png, svg, or even html file), and the other columns representing metadata about each visualization. We refer to each plot (row) of a given visualization (column) as a **panel**, and hence will often refer to a visualization column as a collection of panels. This package provides utilities to help build these data frames and then explore them in an interactive viewer. ### Pre-generated images The simplest way to illustrate what is meant by a data frame of visualizations is to start with an example where the images have already been generated. An example dataset that comes with the package contains images captured by the Mars rover Curiosity. ```{r} library(trelliscope) mars_rover ``` This data frame has a column that references images on the web, `img_src`. The other columns contain metadata about these images. We can create a Trelliscope data frame from this with the following: ```{r mars0} d <- as_trelliscope_df(mars_rover, name = "mars rover") d ``` This is simply the same data frame but with additional information for how to render the Trelliscope viewer app. At a minimum we provide a `name` for the resulting display, but we can also specify additional information such as a `path` to where we want the output to be written (a temporary directory if not specified), and a `description` and `tags`. You can also specify the `key_cols` which are the columns that combined, uniquely identify each row of the data. This is inferred if not provided but sometimes might not be what you would like it to be, as there often are many possibilities. To see more information about trelliscope-specific settings, you can use `show_info()`: ```{r} show_info(d) ``` Now to view this in the Trelliscope viewer app: ```{r mars, out.width="100%", out.height="520px", scale=0.7} view_trelliscope(d) ``` You can use this viewer to interactively explore the images through filtering, sorting, and paging through the panels. Feel free to try out the example above! You can click the icon in the upper-right corner for a full-screen view of the viewer. The behavior of the Trelliscope viewer can be customized in many ways, either through augmenting the data frame with new data, or by specifying additional visualizations, default sorting and filtering, and many other options. We will cover these throughout this document. ### R-generated visualizations A likely more common use case when visualizing data during an analysis will be that the images do not yet exist and we will be generating them from subsets of the data we are analyzing. Trelliscope has utilities to make this convenient for visualization packages like ggplot2. As a simple example, let's consider the `gap` dataset which is a modified version of data that originally comes with the [`gapminder`](https://cran.r-project.org/web/packages/gapminder/index.html) package. This modified version contains some extra columns such as ISO country code and country centroid latitude and longitude. ```{r} gap ``` This data provides statistics such as life expectancy annually for 123 countries. Suppose we want to visualize life expectancy vs. year for each country. With ggplot2, you would do something like this: ```{r echo=FALSE} library(trelliscope) suppressPackageStartupMessages( library(ggplot2, warn.conflicts = FALSE) ) ``` ```{r} library(ggplot2) ggplot(gap, aes(year, life_exp)) + geom_point() + facet_wrap(vars(country, continent)) ``` There are too many panels to view on one page, making this a good candidate for Trelliscope. Trelliscope provides a function `facet_panels()` that is the first step in turning a ggplot object into a Trelliscope data frame. You can swap out `facet_wrap()` for this function. ```{r} p <- ggplot(gap, aes(year, life_exp)) + geom_point() + facet_panels(vars(country, continent)) class(p) ``` As you can see, `facet_panels()` simply modifies your ggplot object. If you print the resulting object `p`, a Trelliscope display will be written and displayed. To take this and turn it into a data frame of visualizations with one row for each country/continent, we can apply the function `as_panels_df()`. ```{r} p_df <- as_panels_df(p) p_df ``` Here, just as in the Mars rover example, we have a data frame of visualizations. However, in this case, the visualizations are ggplot objects instead of image references. Note that `as_panels_df()` has options such as `as_plotly = TRUE` that will convert the ggplot objects to plotly objects. You can view the plot for any one row by calling the panel column and the index number of the row you want to look at. For example, if you wanted to see the generated plot for the second row, Albania, you would run the following code. ```{r message=FALSE} p_df$panel[[2]] ``` Note that this nested data frame of visualizations can be a useful object to work with outside of using it with Trelliscope. Just as in the Mars rover example, we can view this data frame of visualizations and cast it as a Trelliscope data frame with `as_trelliscope_df()` and view it with `view_trelliscope()`. ```{r gap0, out.width="100%", out.height="520px", scale=0.7} tdf <- as_trelliscope_df(p_df, name = "gapminder life expectancy") view_trelliscope(tdf) ``` Note that there are several benefits to using `facet_panels()` and `as_panels_df()`. First, it fits more naturally into the ggplot2 paradigm, where you can build a Trelliscope visualization exactly as you would with building a ggplot2 visualization. Second, you can make use of the `scales` argument in `facet_panels()` (which behaves similarly to the same argument in `facet_wrap()`) to ensure that the x and y axis ranges of your plots behave the way you want. The default is for all plots to have the same `"fixed"` axis ranges. This is an important consideration in visualizing small multiples because if you are making visual comparisons you often want to be comparing things on the same scale. The remainder of this tutorial will cover customizations that you can apply to Trelliscope data frames to provide more powerful interactions when viewing them in the app. ## Customizing your Trelliscope app So far we have seen a few ways to get to a Trelliscope data frame that we can use to create a Trelliscope interactive visualization app. As we've seen, you can simply pass a Trelliscope data frame to `view_trelliscope()` to get to an immediate output. However, there are many other operations you can perform on a Trelliscope data frame to customize how the app behaves. Let's revisit the gapminder example. Here we re-build a data frame of ggplot panels in one code block. ```{r gap1} library(dplyr, warn.conflicts = FALSE) tdf <- ( ggplot(gap, aes(year, life_exp)) + geom_point() + facet_panels(vars(country, continent, iso_alpha2)) ) |> as_panels_df(panel_col = "lexp_time") |> as_trelliscope_df(name = "gapminder life expectancy") tdf ``` Here we also added `iso_alpha2` (country code) as a redundant facetting variable so that it is available in our data frame for later use. ### Adding panels Our data frame already has a panel column, but we can add more if we would like. The following functions are available to add panels to a Trelliscope data frame. - `panel_url()`: Add a panel column with URLs to images - `panel_local()`: Add a panel column with local image files - `panel_lazy()`: Add a panel column by specifying a plot function that will be used to generate panels Here we will show an example of using `panel_url()` to add a country flag images to our data frame. In another article we will [provide more examples of using these functions](panels.html). Note that a variation of `panel_lazy()` is used underneath the hood when you use `facet_panels()`. A database of country flags is available [here](https://raw.githubusercontent.com/hafen/countryflags/master/png/512/) and flag images can be referenced by their 2-letter country code. ```{r} flag_base_url <- "https://raw.githubusercontent.com/hafen/countryflags/master/png/512/" tdf <- mutate(tdf, flag_url = panel_url(paste0(flag_base_url, iso_alpha2, ".png")) ) tdf ``` We can view a flag for any country by looking at a single entry from the column, e.g. `tdf$flag_url[[1]]`. This will open up the image in a web browser. ### Adding variables One of the most useful things you can do to customize your Trelliscope app is to add additional variables to the data frame. These variables can be used to control how the panels are explored in the viewer through sorting, filtering, and labels. For example, suppose we want to be able to explore countries based on summary statistics such as their mean life expectancy, etc. We can do this by computing a summaries of the gapminder data and joining this with `tdf`. ```{r} gsumm <- gap |> mutate(pct_chg = 100 * (life_exp - lag(life_exp)) / lag(life_exp)) |> summarise( mean_lexp = mean(life_exp), mean_gdp = mean(gdp_percap), max_lexp_pct_chg = max(pct_chg, na.rm = TRUE), dt_lexp_max_pct_chg = as.Date(paste0(year[which.max(pct_chg)], "-01-01")), .by = country ) tdf <- left_join(tdf, gsumm, by = "country") tdf ``` Trelliscope makes use of variable types to determine how data is displayed in the viewer as well as how it can be interacted with. Built-in R types such as "character", "factor", "numeric", "Date", and "POSIXct", are all supported. Character and factor variables have a filter interaction that allows you to filter the data by the values of the variable (with factors, the natural order of these values is according to the factor levels). Numeric, date and POSIXct variables have a range filter interaction that allows you to filter the data by a range of numbers/dates/times. ### Special variable types Trelliscope provides some additional variable types that can be used to provide special functionality in the viewer. Currently, the following are provided: - `number()`: Specifies a numeric type that allows specification of number of digits to display and whether to show the variable on the log scale. - `currency()`: Specifies a numeric type that represents a currency and can have a currency symbol prepended to it. - `href()`: Specifies a character type that represents a URL to link to. These types make use of the [vctrs](https://vctrs.r-lib.org) package. You can create variables of these types by simply wrapping a vector with these functions and any additional paramters. For example, below we add an example of each of these variables to our gapminder data frame: ```{r} tdf <- tdf |> mutate( mean_lexp = number(mean_lexp, digits = 1), mean_gdp = currency(mean_gdp, code = "USD"), wiki_link = href(paste0("https://en.wikipedia.org/wiki/", country)), ) tdf ``` More special variable types will come in the future as supporting filter interactions are added for them in the viewer. Some types we anticipate include geographic coordinates, network graph links, and more. ### Updating display attributes with pipe functions A Trelliscope data frame is simply a data frame that also keeps track of attributes about the Trelliscope display. We can modify these attributes by applying pipe functions [pipe functions](https://r4ds.had.co.nz/pipes.html) that take a Trelliscope data frame as its primary argument and return a modified Trelliscope data frame. The following pipe functions are available: - Fine-tune how panels and variables are handled in the app - `set_panel_options()` - `set_var_labels()` - `set_tags()` - Set the default viewing state of the app - `set_default_panel()` - `set_default_filters()` - `set_default_labels()` - `set_default_layout()` - `set_default_sort()` - Additional features - `add_inputs()`: specify input variables that capture user feedback for each panel - `add_view()`: add a pre-defined "view" that allows users to navigate to specified states of the display - `set_info_html()`: specify HTML to display in the info panel of the viewer - `set_show_info_on_load()`: specify whether to show the info panel on load - `add_charm()`: simple password protection for the generated app - Writing and viewing - `write_trelliscope()` - `view_trelliscope()` We will show examples of several of these in the following sections. ### Setting variable labels and tags To help a user have a better understanding of what the variables represent and how they are associated, we can use variable labels and tags. Variable labels can be added to a Trelliscope data frame using `set_var_labels()`. This function takes a named set of parameters as input, with the names indicating the variable name and the values indicating the labels. For example: ```{r} tdf <- tdf |> set_var_labels( mean_lexp = "Mean life expectancy", mean_gdp = "Mean GDP per capita", max_lexp_pct_chg = "Max % year-to-year change in life expectancy", dt_lexp_max_pct_chg = "Date of max % year-to-year change in life expectancy", wiki_link = "Link to country Wikipedia entry" ) ``` Note that this function simply adds a "label" attribute to each specified column, which is a common practice in R for handling labels in data frames. If your data frame is already labeled or you have other means of adding these attributes, you do not need to use this function. When there are many variables in a display, it can be useful to add tags to variables that help the user investigate variables associated with concepts of interest. Tags can be added to a Trelliscope data frame using `set_tags()`. This function takes a named set of parameters as input, with the names indicating the tag name and the values indicating the variable names to associate with that tag. For example, below we have a tag indicating variables representing computed country "stats" and a tag that indicates variables containing "info" about a country. ```{r} tdf <- tdf |> set_tags( stats = c("mean_lexp", "mean_gdp", "max_lexp_pct_chg"), info = c("country", "continent") ) ``` ### Setting panel options Trelliscope has defaults for how it will write out and show panel columns in a data frame. In our example, if we were to write out our `tdf` data frame, it would write our "lexp_time" panel column as 500x500 pixel png files. Suppose we wish to render these as 600x400 pixel svg files instead. We can do this with `set_panel_options()`. This function takes a named set of parameters as input, with the names indicating a panel column name (there can be more than one) and the values a call to `panel_options()` which allows us to specify a `width`, `height`, and `format` for "lexp_time". We also set the aspect ratio for the "flag_url" panel by specifying a width and height ratio (5:3 being the most common aspect ratio for flags). Note that for panels that already exist as files, the units of width and height do not matter as the panels are dynamically sized in the viewer and the only thing that matters is the aspect ratio at which they are displayed. ```{r} tdf <- tdf |> set_panel_options( lexp_time = panel_options(width = 600, height = 400, format = "svg"), flag_url = panel_options(width = 5, height = 3) ) ``` ### Setting the default state of the app Trelliscope apps by default display all panels in the order as they appear in the data frame. Often it makes sense to start the user off at a specific point in the app, such as pre-defining a sorting or filtering state, or defining which panel labels you want the user to see initially. #### `set_default_labels()` By default, the "key columns" will be shown as labels. If we'd like to change what labels are shown when the display is opened, we can use `set_default_labels()`, e.g.: ```{r} tdf <- tdf |> set_default_labels(c("country", "continent", "wiki_link")) ``` #### `set_default_layout()` We can also set the default panel layout, for example that we wish to see 5 columns of panels on the initial view of the app (number of rows is computed based on the size of the user's browser window and the aspect ratio of the panels). ```{r} tdf <- tdf |> set_default_layout(ncol = 4) ``` #### `set_default_sort()` We can set the default sort order with `set_default_sort()`. For this, we provide a vector of variable names and a vector of "asc" or "desc" values inidicatingm an ascending or descending sort order. ```{r} tdf <- tdf |> set_default_sort(c("continent", "mean_lexp"), dir = c("asc", "desc")) ``` #### `set_default_filters()` We can set the default filter state with `set_default_filters()`. Currently there are two different kinds of filters: - `filter_range(varname, min = ..., max = ...)`: works with numeric, date, or datetime variables - `filter_string(varname, values = ...)`: works with factor or string variables ```{r} tdf <- tdf |> set_default_filters( filter_string("continent", values = "Africa"), filter_range("mean_lexp", max = 50) ) ``` More types of filters are planned in the future. ### Defining "views" Views are predefined sets of state that are made available in the viewer to help the user conveniently get to regions of the display that are interesting in different ways. You can add a view chaining the display through the `add_view()` function. `add_view()` takes a `name` as its first argument, and then any number of state specifications. The functions available to set the state are the following: - `state_layout()` - `state_labels()` - `state_sort()` - `filter_string()` - `filter_range()` The `state_*()` functions have the same parameters as and behave similarly to their `set_*()` counterparts except that unlike those, these do not receive a Trelliscope data frame and return a Trelliscope data frame, but instead just specify a state. The `filter_*()` functions we have seen already. For example, suppose we wish to add a view that only shows countries with median life expectancy greater than or equal to 60, sorted from highest to lowest median life expectancy: ```{r} tdf <- tdf |> add_view( name = "Countries with high life expectancy (mean >= 60)", filter_range("mean_lexp", min = 60), state_sort("mean_lexp", dir = "desc") ) ``` You can add as many views as you would like by chaining more calls to `add_view()`. ### Specifying user inputs You can add user inputs that are attached to each panel of the display using the `add_inputs()` function. This function takes any number of arguments created by any of the following functions: - `input_radio(name, label, options)` - `input_text(name, label, width, height)` - `input_checkbox(name, label, options)` - `input_select(name, label, options)` - `input_multiselect(name, label, options)` - `input_number(name, label)` These specify different input types. For example, if we want a free text input for comments as well as yes/no question asking if the data looks correct for the panel, we can do the following. ```{r} tdf <- tdf |> add_inputs( input_text(name = "comments", label = "Comments about this panel", height = 6), input_radio(name = "looks_correct", label = "Does the data look correct?", options = c("no", "yes")) ) ``` Since the Trelliscope app is not backed by a server, persistent storage of user inputs is currently not supported. If you need to get inputs back from a user, an optional `email` argument can be provided which will help the user know how to get these back to you. Let's see how all of these operations are reflected in our Trelliscope data frame: ```{r} show_info(tdf) ``` ### Writing and viewing the app Now that we have built up our Trelliscope data frame, we can write it out as specified before with `write_trelliscope()`. ```{r} tdf <- write_trelliscope(tdf) ``` This writes the panels if they haven't been written yet and then writes out a JSON representation of all of the other specifications we have made for the app to consume. Here is the final output. ```{r gap, out.width="100%", out.height="630px", scale=0.6} view_trelliscope(tdf) ``` Note that we can bypass `write_trelliscope()` by going straight to `view_trelliscope()` but `write_trelliscope()` allows us to do things like force already-written panels to re-render. ### Putting it all together This example illustrates most of the features available in Trelliscope. As seen in the initial examples, Trelliscope displays can be created with minimal code, but additional functionality can be added with more code. A good amount of this code is already natural to you as we are in many cases simply updating a data frame with new columns, etc. The rest of the code is simply specifying the desired state of the app. ```{r eval=FALSE} # create initial Trelliscope data frame tdf <- ( ggplot(gap, aes(year, life_exp)) + geom_point() + facet_panels(vars(country, continent)) ) |> as_panels_df(panel_col = "lexp_time") |> as_trelliscope_df(name = "gapminder life expectancy") # add variables gsumm <- gap |> mutate(pct_chg = 100 * (life_exp - lag(life_exp)) / lag(life_exp)) |> summarise( mean_lexp = number(mean(life_exp), digits = 1), mean_gdp = currency(mean(gdp_percap), code = "USD"), max_lexp_pct_chg = max(pct_chg, na.rm = TRUE), dt_lexp_max_pct_chg = as.Date(paste0(year[which.max(pct_chg)], "-01-01")), wiki_link = href(paste0("https://en.wikipedia.org/wiki/", country)), .by = country ) tdf <- left_join(tdf, gsumm, by = "country") # set variable labels tdf <- tdf |> set_var_labels( mean_lexp = "Mean life expectancy", mean_gdp = "Mean GDP per capita", max_lexp_pct_chg = "Max % year-to-year change in life expectancy", dt_lexp_max_pct_chg = "Date of max % year-to-year change in life expectancy", wiki_link = "Link to country Wikipedia entry" ) # set tags tdf <- tdf |> set_tags( stats = c("mean_lexp", "mean_gdp", "max_lexp_pct_chg"), info = c("country", "continent") ) # set panel options tdf <- tdf |> set_panel_options( lexp_time = panel_options_lazy(width = 600, height = 400, format = "svg") ) # set default state tdf <- tdf |> set_default_labels(c("country", "continent", "wiki_link")) |> set_default_layout(ncol = 4) |> set_default_sort(c("continent", "mean_lexp"), dir = c("asc", "desc")) |> set_default_filters( filter_string("continent", values = "Africa"), filter_range("mean_lexp", max = 50) ) # add a view tdf <- tdf |> add_view( name = "Countries with high life expectancy (mean >= 60)", filter_range("mean_lexp", min = 60), state_sort("mean_lexp", dir = "desc") ) # add user inputs tdf <- tdf |> add_inputs( input_text(name = "comments", label = "Comments about this panel", height = 6), input_radio(name = "looks_correct", label = "Does the data look correct?", options = c("no", "yes")) ) # view the display view_trelliscope(tdf) ``` A few additional features are available in Trelliscope that we have not covered here and can be found by visiting other articles in this documentation: - [A deeper dive into creating panel columns](panels.html) - [Sharing and embedding Trelliscope displays](embed.html) - [Visualizing very large datasets with Trelliscope](bigdata.html)