---
title: "Accessing the RESTful API"
description: >
  Accessing submission data via the REST pathway.
output:
  rmarkdown::html_vignette:
    self_contained: false
vignette: >
  %\VignetteIndexEntry{Accessing the RESTful API}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r setup, include = FALSE}
knitr::opts_chunk$set(collapse = TRUE, comment = "#>")
```

# Scope
This vignette provides a guided walk-through of the "getting data out"
functions of the RESTful API endpoints which list and view details.

`ruODK` users would mix and match parts of the demonstrated workflows to build
their own data pipelines, e.g.:

* to build a quick analysis from all data, freshly downloaded from a smaller
  project, or
* to build an interactive ETL pipeline to selectively download only new submissions
  for further processing and upload into downstream data warehouses.

A typical and more stream-lined workflow is provided in the RMarkdown template
"ODK Central via OData" which is supplied by `ruODK`.

## Three ways to happiness

ODK Central offers no less than three different ways to access data:

* viewing ODK Central data in MS PowerBI, MS Excel, Tableau, or `ruODK`
  through the OData service endpoints, or
* downloading all submissions including attachments as one (possibly gigantic)
  zip archive either through the "Export Submissions" button in the ODK Central
  form submissions page or through `ruODK`, or
* viewing ODK Central data through `ruODK`'s RESTful API functions.

While the `vignette("odata", package="ruODK")`
(online [here](https://docs.ropensci.org/ruODK/articles/odata-api.html))
illustrates the first option, this vignette demonstrates the remaining two.

Not implemented (yet) are the "managing ODK Central" functions which create,
update, and delete projects, forms, users, roles, and permissions.
We haven't yet found a strong use case to automate those functions -
ODK Central (driven by humans) does those jobs beautifully on an expected scale.

# Setup ruODK
See [`vignette("Setup", package = "ruODK")`](https://docs.ropensci.org/ruODK/articles/setup.html)
for detailed options to configure `ruODK`.

Here, we'll grab the OData service URL from the form used in this vignette,
plus username and password of a web user of ODK Central with access to that form.

`ruODK::ru_setup()` will populate the default url, project ID, and form ID which
are used by `ruODK`'s other functions (unless specified otherwise).

```{r ru_setup}
library(ruODK)
# ruODK::ru_setup(
#   svc = "Form OData Service URL",
#   un = Sys.getenv("ODKC_TEST_UN"),
#   pw = Sys.getenv("ODKC_TEST_PW"),
#   tz = "Australia/Perth",
#   verbose = TRUE
# )
t <- fs::dir_create("media")
```

```{r load_canned_data, echo=FALSE}
# We load canned data, so end users can build vignettes without authenticated
# calls to ODK Central
data("fq_project_list")
data("fq_project_detail")
data("fq_form_list")
data("fq_form_xml")
data("fq_form_schema")
data("fq_zip_data")
data("fq_zip_strata")
data("fq_zip_taxa")
data("fq_submission_list")
data("fq_submissions")
data("fq_attachments")
```

# Projects

List projects. We see the project ID, a name, the number of forms and app users,
dates of last form submissions plus project management timestamps (created,
updated).

The important bit here is the project ID.

```{r project_list, eval=F}
fq_project_list <- ruODK::project_list()
```

```{r project_list_show}
fq_project_list %>% knitr::kable(.)
```

Inspect a project using its ID. We receive a tibble with exactly one row,
all the columns of `ruODK::project_list` plus a column `verbs`, which contains all
available API actions for a project.

```{r project_details, eval=FALSE}
fq_project_detail <- ruODK::project_detail()
```

```{r project_detail_show}
# Project details (without verbs)
fq_project_detail %>%
  dplyr::select(-"verbs") %>%
  knitr::kable(.)

# Available verbs
fq_project_detail$verbs[[1]] %>% unlist(.)
```

Nothing apart from the verbs is new compared to the data returned by
`ruODK::project_list`.

To learn more about the functionality behind the verbs, refer to the interactive
[ODK Central API documentation](https://docs.getodk.org/central-api/).

To retrieve data from ODK Central, the functions shown below will suffice.

# Forms
## List forms for a project

To download form submissions, we need to know project ID and form ID.

There are several ways of retrieving the form ID:

* Browsing forms in the ODK Central's project overviews,
* Stealing the form ID from the OData service endpoint URL as shown on
  ODK Central's form submission page,
* Listing form metadata for a given project ID with `ruODK::form_list()`.

```{r form_list, eval=FALSE}
fq_form_list <- ruODK::form_list()
```

```{r form_list_show}
fq_form_list %>% knitr::kable(.)
```

Further to the metadata shown here, a column `xml` contains the entire XForms
definition (originally XML) as nested list.

If the original XML is needed rather than the R equivalent (nested list),
we can use `ruODK::form_xml` with parameter `parse=FALSE`:

```{r form_xml, eval=F}
fq_form_xml <- ruODK::form_xml(parse = FALSE)
```

```{r form_xml_show}
if (require(listviewer)) {
  listviewer::jsonedit(fq_form_xml)
} else {
  ru_msg_warn("Install package listviewer to browse the form XML.")
}
```

## Inspect form schema

The `form_schema` represents all form fields of the XForms definition.

See the
[ODK Central API docs](https://docs.getodk.org/central-api-form-management/#getting-form-schema-fields)
and the examples of `??ruODK::form_schema()` for more detail.


```{r form_schema, eval=FALSE}
fq_form_schema <- ruODK::form_schema()
```

```{r form_schema_view}
fq_form_schema %>% knitr::kable(.)
```

## Show details of one form

The details of a form are exactly the same as the output of `ruODK::form_list()`.

```{r form_detail, eval=F}
fq_form_detail <- ruODK::form_detail()
```

```{r form_detail_view}
fq_form_detail %>% knitr::kable(.)
```

# Submissions
We are getting closer to the actual data! This section shows two of the options
for data access: dump all submissions, or extract a subset.

## Get all submissions for one form
Smaller datasets lend themselves to be exported in one go.
ODK Central offers one giant zip file containing all submissions, any
repeating groups, and any attachments both on the form submission page, and as
API endpoint which is provided as `ruODK::submission_export()`.

The default behaviour of `ruODK::submission_export()` is to write the zip file
to the project root (`here::here()`), and to overwrite existing previous downloads.
See `?ruODK::submission_export()` for alternative download and retention options.

In the following chuck, we illustrate common tasks:

* Download the zip file.
* Unpack the zip file.
* Join repeating form group data `data_taxon` to main data `data_quadrat` to
  annotate `data_taxon` with data from `data_quadrat`.
* Sanitise the column names.
* Prepend all attachment filenames (e.g. `data_quadrat$location_quadrat_photo`,
  `data_taxon$photo_in_situ`) with `media/`.


```{r submission_export, eval=F}
# Predict filenames (with knowledge of form)
fid <- ruODK::get_test_fid()
fid_csv <- fs::path(t, glue::glue("{fid}.csv"))
fid_csv_veg <- fs::path(t, glue::glue("{fid}-vegetation_stratum.csv"))
fid_csv_tae <- fs::path(t, glue::glue("{fid}-taxon_encounter.csv"))

# Download the zip file
se <- ruODK::submission_export(local_dir = t, overwrite = FALSE, verbose = TRUE)

# Unpack the zip file
f <- unzip(se, exdir = t)
fs::dir_ls(t)

# Prepend attachments with media/ to turn into relative file paths
fq_zip_data <- fid_csv %>%
  readr::read_csv(na = c("", "NA", "na")) %>% # form uses "na" for NA
  janitor::clean_names(.) %>%
  dplyr::mutate(id = meta_instance_id) %>%
  ruODK::handle_ru_datetimes(fq_form_schema) %>%
  ruODK::handle_ru_geopoints(fq_form_schema) %>%
  ruODK::attachment_link(fq_form_schema)

fq_zip_strata <- fid_csv_veg %>%
  readr::read_csv(na = c("", "NA", "na")) %>%
  janitor::clean_names(.) %>%
  dplyr::mutate(id = parent_key) %>%
  # ruODK::handle_ru_datetimes(fq_form_schema) parent_key%>% # no dates
  # ruODK::handle_ru_geopoints(fq_form_schema) %>%  # no geopoints
  # ruODK::ruODK::attachment_link(fq_form_schema) %>% # no att.
  dplyr::left_join(fq_zip_data, by = c("parent_key" = "meta_instance_id"))

fq_zip_taxa <- fid_csv_tae %>%
  readr::read_csv(na = c("", "NA", "na")) %>%
  janitor::clean_names(.) %>%
  dplyr::mutate(id = parent_key) %>%
  # ruODK::handle_ru_datetimes(fq_form_schema) %>%
  # ruODK::handle_ru_geopoints(fq_form_schema) %>%
  # ruODK::ruODK::attachment_link(fq_form_schema) %>%
  dplyr::left_join(fq_zip_data, by = c("parent_key" = "meta_instance_id"))
```

```{r zip_view}
head(fq_zip_data)
head(fq_zip_strata)
head(fq_zip_taxa)
# Further: create map with popups, see vignette "odata"
```

## List submissions for one form
Not always is it appropriate to download all submissions and all attachments
at once.

If forms feed into downstream data warehouses, the typical ETL workflow is to

* List all submissions from ODK Central
* Select the subset of new submissions to download, e.g.
  * Submissions younger than the oldest submission date in the data warehouse.
  * Submissions whose `instance_id` is not already present in the data warehouse.
* Download only the selected submissions.
* Download attachments of only the selected submissions.

```{r submission_list, eval=F}
fq_submission_list <- ruODK::submission_list()
```

```{r submission_list_view}
fq_submission_list %>% knitr::kable(.)
```

The list of submissions critically contains each submission's unique ID in
`instance_id`. If the submissions shall be downloaded and uploaded into another
data warehouse, the `instance_id` can be used to determine whether a record
already exists in the downstream warehouse or not.
This workflow is preferable where the majority of submissions is already
imported into another downstream data warehouse, and we only want to add new
submissions, as in submissions which are not already imported into the data
warehouse.

Furthermore, the `instance_id`s can now be used to retrieve the actual
submissions.

## Get submission data

In order to import each submission, we need to retrieve the data by
`instance_id`.

```{r submission_data, eval=F}
# One submission
fq_one_submission <- ruODK::get_one_submission(
  fq_submission_list$instance_id[[1]]
)

# Multiple submissions
fq_submissions <- ruODK::submission_get(fq_submission_list$instance_id)
```

## Parse submissions
The data in `sub` is one row of the bulk downloaded submissions in
`data_quadrat`.
The data in `submissions` represents all (or let's pretend, the selected)
submissions in `data_quadrat`.
The field `xml` contains the actual submission data including repeating groups.

The structure is different to the output of `ruODK::odata_submission_get`,
therefore `ruODK::odata_submission_rectangle` does not work for those, as
here we might have repeating groups included in a submission.

This structure could be used for upload into data warehouses accepting nested
data as e.g. JSON.

```{r view_submission_data, fig.width=7}
if (requireNamespace("listviewer")) {
  listviewer::jsonedit(fq_submissions, mode = "code")
} else {
  ru_msg_info("Please install package listviewer!")
}
```

# Outlook
The approach shown here yields nested and stand-alone records, which is useful
if the subsequent use requires records in nested JSON or XML format.
Complex forms with repeating sub-groups will result in highly nested lists,
whose structure heavily depends on the completeness of the submissions.

The other approach shown in
[`vignette("odata-api", package="ruODK")`](https://docs.ropensci.org/ruODK/articles/odata-api.html)
yields rectangled data in several normalised tables, which is useful for
analysis and visualisation.