4 Application

Here we choose the Robust Adaptive MCMC (Vihola 2012) algorithm as the default method for estimating the parameters of the \(\text{CAViaR}\) family of models. The parameter vector estimate \(\boldsymbol{\hat{\theta}}_t\) is subsequently used to forecast the quantile \(q_{\alpha}\) at time \(t + 1\). In this section we explain how to set up a computing architecture to serve, and manage access to, a web application. This application obtains real-time prices from a cryptocurrency exchange through a WebSocket API and uses the shiny package (Chang et al. 2021) to visualize the data dynamically on the web. The same web application uses a standard HTTP API to estimate the \(\text{VaR}_\alpha\) of multiple countries using ETF data as a proxy.

Data

4.1 High frequency

The WebSocket protocol is a modern technology that allows two-way communication between a server and a client. It is designed to address important issues that arise when the HTTP protocol is abused with the repeated polling calls required when working with high-frequency data. Instead of the client making a new call every \(k\) seconds, the server notifies the client whenever new data arrives through the same communication channel.
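As an illustration, the snippet below builds the JSON subscription message that a WebSocket client would send to an exchange such as Bitstamp; the channel name live_trades_btcusd and the wss://ws.bitstamp.net endpoint are taken from Bitstamp's public documentation and should be verified against it, and the websocket package calls are shown only as a sketch:

```r
# Build the JSON subscription message for a live trades channel.
subscribe_msg <- jsonlite::toJSON(
  list(
    event = "bts:subscribe",
    data = list(channel = "live_trades_btcusd")
  ),
  auto_unbox = TRUE
)

# With the 'websocket' package, the message would be sent once the
# connection is open (not run here):
# ws <- websocket::WebSocket$new("wss://ws.bitstamp.net")
# ws$onOpen(function(event) ws$send(subscribe_msg))
# ws$onMessage(function(event) print(jsonlite::fromJSON(event$data)))
```

Once subscribed, every new trade arrives as a message on the same connection, removing the need for polling.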

We will estimate the parameters of the T-CAViaR model (2.13) using an input vector \(\mathbf{y}\) corresponding to the daily returns of the Bitcoin/USD cryptocurrency pair. However, our forecast quantile \(\hat{q}_\alpha\) will not be available in real time since it takes time to compute our estimate vector \(\hat{\theta}\). The Bitstamp API offers a maximum of \(1000\) data points of historical data through an HTTP API alongside the above-mentioned real-time WebSocket API. Historical data is obtained at a given frequency \(k\), the highest frequency available being \(60\) seconds. We will hence start by timing the estimation of \(\hat{\theta}\) using the microbenchmark package (Mersmann 2019).
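The text relies on microbenchmark for this timing step; as a dependency-free sketch of the same idea, base R's system.time can be repeated a few times over the estimation call. The fit_caviar function below is a hypothetical placeholder standing in for the actual MCMC estimation:

```r
# Hypothetical stand-in for the actual MCMC estimation step; replace
# with the real call, e.g. caviarma::caviar(y, nsim = 1e5).
fit_caviar <- function(y) {
  for (i in 1:1e5) sum(y)  # placeholder workload
}

y <- rnorm(1000)  # toy return series

# Time the estimation several times, as microbenchmark would:
elapsed <- replicate(5, system.time(fit_caviar(y))[["elapsed"]])
summary(elapsed)
```

The distribution of elapsed times tells us how stale \(\hat{\theta}\) will be relative to the incoming real-time prices.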

The specific alphanumeric pattern of the unique identifiers (symbols or tickers) depends on the data source. For instance, the cryptocurrency database follows a currency1currency2 pattern, while the ETF database uses additional symbols such as ^, = and . depending on the asset class. It is important to be aware of these differences when working with multiple data sources, since these identifiers are far from standardized.
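The difference between the two patterns can be checked mechanically; the symbols below are examples from the two data sources used in this chapter:

```r
crypto_pairs  <- c("btcusd", "ethusd")       # currency1currency2 pattern
yahoo_symbols <- c("BTC=F", "^GSPC", "EWL")  # futures, index and ETF tickers

# The crypto pattern is plain lowercase alphanumeric...
all(grepl("^[a-z0-9]+$", crypto_pairs))   # TRUE

# ...while Yahoo! Finance symbols may carry extra characters such as ^ or =
any(grepl("[\\^=.]", yahoo_symbols))      # TRUE
```

A validation step like this at the data-gathering boundary helps catch a symbol sent to the wrong endpoint early.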

Historical OHLC data is received in JSON format through the https://www.bitstamp.net/api/v2/ endpoint. The following R code downloads a JSON file with the latest 1000 daily prices for btcusd (the step parameter is the frequency in seconds, 86400 for daily data):

api_call <- sprintf(
  "https://www.bitstamp.net/api/v2/ohlc/%s/?step=%s&limit=%s",
  "btcusd", 86400, 1000
)
out <- jsonlite::fromJSON(api_call)$data

The JSON data out is parsed into an R list of two elements: pair (BTC/USD) and ohlc (the prices).

Table 4.1: The input data structure.
Length Class Mode
timestamp 1000 -none- character
open 1000 -none- character
high 1000 -none- character
low 1000 -none- character
close 1000 -none- character
volume 1000 -none- character

This data gathering step has been summarized into the get_price_hist function, which returns by default the same list of two elements. We can then compute the log-returns vector \(\mathbf{y}\) which will be passed as first argument to caviar_methods when estimating the parameters of the model.
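As Table 4.1 shows, the prices arrive as character strings, so they must be converted to numeric before the log-returns are computed. A minimal sketch with a toy close vector:

```r
# Toy sample of close prices as returned by the API (character type);
# with the real data this would be the close element of the ohlc list.
close <- c("41000.5", "41250.0", "40980.3", "41100.7")

# Log-returns: first differences of the log prices
y <- diff(log(as.numeric(close)))

length(y)  # one fewer observation than the price series
```

This y vector is what gets passed as the first argument to caviar_methods.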

4.2 Low frequency

The HTTP protocol typically opens a new TCP connection each time a GET request is made to the server. The highest frequency available through the Yahoo! Finance endpoint is one day.

Yahoo! Finance data can be easily obtained through their website. Alternatively, the following R code downloads the CSV file of daily data for BTC=F:

btc_futures <- as.data.frame(tseries::get.hist.quote(
  "BTC=F",
  start = "2017-12-18",
  end = Sys.Date(),
  quote = c("Open", "High", "Low", "Close", "Volume")
))
## Registered S3 method overwritten by 'quantmod':
##   method            from
##   as.zoo.data.frame zoo
## Warning: BTC=F contains missing values. Some functions will not work if objects
## contain missing values in the middle of the series. Consider using na.omit(),
## na.approx(), na.fill(), etc to remove or replace them.
## time series ends   2026-01-30
# Convert row names to a 'Date' column for consistency
btc_futures$Date <- as.Date(rownames(btc_futures))
summary.default(btc_futures)
##        Length Class  Mode   
## Open   2045   -none- numeric
## High   2045   -none- numeric
## Low    2045   -none- numeric
## Close  2045   -none- numeric
## Volume 2045   -none- numeric
## Date   2045   Date   numeric

The resulting data.frame has a total of 2045 rows and 6 columns as of 2026-02-01. However, several rows need to be ignored since they contain missing values rather than price data.

idx <- which(!is.na(btc_futures$Close))
btc_futures <- btc_futures[idx, ]
dim(btc_futures)
## [1] 2043    6

After removing the rows with missing data, we end up with a total of 2043 observations.

Figure 4.1 uses a plotly candlestick chart (Sievert et al. 2021) to help us better visualize the data.

Figure 4.1: Spot and futures prices of the BTC/USD currency pair.

The output of applying the summary function to the btc_spot dataset is shown in Table 4.2. Notice that we first convert the data from character to numeric type by applying the as.numeric function to each column of btc_spot, excluding the timestamp column.

Table 4.2: Summary of the BTC/USD spot prices.
      open             high             low              close           volume
 Min.   : 25125   Min.   : 25729   Min.   : 24756   Min.   : 25127   Min.   : 205
 1st Qu.: 43119   1st Qu.: 43843   1st Qu.: 42402   1st Qu.: 43170   1st Qu.:1060
 Median : 68363   Median : 69543   Median : 67100   Median : 68390   Median :1717
 Mean   : 71455   Mean   : 72693   Mean   : 70175   Mean   : 71506   Mean   :1929
 3rd Qu.: 96835   3rd Qu.: 98312   3rd Qu.: 95232   3rd Qu.: 96837   3rd Qu.:2463
 Max.   :124728   Max.   :126272   Max.   :123148   Max.   :124728   Max.   :9294

We can also collect additional information such as the mean and quantiles for different values of \(\alpha \in \{0.0, 0.1, 0.2, 0.3, 0.4, 0.5\}\), where the quantile at level \(\alpha = 0.5\) corresponds to the median of a given vector such as btc_spot$close. The mean value of this vector is approximately \(7.151 \times 10^{4}\), while the different quantile levels can be found in Table 4.3.

Table 4.3: Closing price quantiles, BTC/USD spot.
0% 10% 20% 30% 40% 50%
25127 27993 37671 55986 63316 68390
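The quantile levels above come directly from base R's quantile function; a self-contained sketch with a toy closing-price vector:

```r
# Toy closing-price vector; with the real data this would be
# as.numeric(btc_spot$close).
close <- c(25127, 27993, 37671, 55986, 63316, 68390, 96837, 124728)

# Quantiles for alpha in {0.0, 0.1, ..., 0.5};
# the 50% quantile coincides with the median
q <- quantile(close, probs = seq(0, 0.5, by = 0.1))

identical(unname(q[6]), median(close))  # TRUE
```

The 0% quantile is simply the minimum of the series, which is why the first entry of Table 4.3 matches the minimum closing price in Table 4.2.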

This information can also be represented graphically using a plotly histogram as shown in Figure 4.2.

btc_close <- as.numeric(btc_spot$close)
close_hist <- plotly::plot_ly(
  x = btc_close, type = "histogram", name = "BTC/USD Spot close"
) %>%
  add_segments(
    x = quantile(btc_close, probs = seq(0.1, 0.9, 0.1)),
    xend = quantile(btc_close, probs = seq(0.1, 0.9, 0.1)),
    y = -10, yend = 10, name = paste("BTC/USD - \U03B1:", seq(0.1, 0.9, 0.1)),
    line = list(width = 1)
  )
close_hist

Figure 4.2: Histogram of BTC/USD price quantiles.

We can also compare this histogram to the corresponding Close prices from the btc_futures table, as seen in Figure 4.3.

fig <- plotly::plot_ly(alpha = 0.5) %>%
  add_histogram(x = as.numeric(btc_spot$close),
                name = "BTC/USD Spot") %>%
  add_histogram(x = as.numeric(btc_futures$Close),
                name = "BTC/USD Futures") %>%
  layout(barmode = "overlay")

fig

Figure 4.3: Histogram of BTC/USD spot prices overlaying futures prices.

4.3 Modelling

For this step we start by computing the log-returns shown in Figure 4.4 as follows:

out <- get_price_hist("btcusd", crypto = TRUE)
y <- diff(log(as.numeric(out$ohlc$CLOSE)))

Figure 4.4: Log-returns for the BTC/USD pair, prices (close) from the Bitstamp API.

After which we pass y to the MCMC sampler:

out <- caviarma::caviar(y, nsim = 1e5)

This step is the bottleneck of the data analysis since it must be repeated multiple times as explained in the Simulation Study section.

4.4 Forecasting

After getting our estimate \(\boldsymbol{\hat{\theta}}_t\), we obtain our forecast for time \(t + 1\) using Equations (2.9) to (2.13):

quantile_forecast <- caviarma::get_forecast(y, out)

Figure 4.5: VaR (0.05) forecasts using different models.

Figure 4.5 shows sample VaR forecasts obtained using different CAViaR models. We also include the corresponding tGARCH and sGARCH forecasts for reference (Ardia et al. 2019). Finally, Table 4.4 shows the results of the VaR backtest obtained through the GAS::BacktestVaR() function.

Table 4.4: DQ Test statistic for different CAViaR models (GAS package).
Statistic P-value
Symmetric Absolute Value 1.79 0.971
Asymmetric Slope 6.10 0.528
GARCH 4.79 0.686
Adaptive 4.00 0.779
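Alongside the DQ statistic, a quick dependency-free sanity check is the empirical coverage of the forecasts: the fraction of days where the realized return falls below the VaR forecast should be close to the nominal level \(\alpha\). A minimal sketch with simulated returns and a constant forecast (the real check would use the series produced by get_forecast):

```r
set.seed(1)
alpha <- 0.05
y <- rnorm(1000)              # toy return series
var_forecast <- qnorm(alpha)  # constant VaR under a N(0,1) model

# A "hit" occurs whenever the realized return violates the VaR forecast;
# for a well-calibrated model the hit rate should be close to alpha.
hits <- y < var_forecast
mean(hits)
```

The DQ test reported in Table 4.4 goes further by also checking that the hits are serially independent, not merely that their frequency matches \(\alpha\).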

Web app

We make use of the shiny and shinyMobile R packages to build a progressive web app (PWA) that allows a user to gather, analyze and visualize the data introduced in the Data section using the models and methods studied in the Forecasting Methods section. To run this application locally, one must first install the simulr package (C. Sepulveda 2021) and then run the app:

remotes::install_gitlab("cacsfre/simulr")
simulr::run_simulr()

4.5 Reactivity

R is a powerful data-centred programming language offering a mathematically intuitive interactive environment which makes data analysis a joyful experience. The shiny package offers an intuitive framework for developing web applications using the R programming language. A shiny application seamlessly connects user interface (UI) inputs to the back end running R. This makes it straightforward to create a UI that allows users of our code to interact with it without requiring any programming.

This relationship between the input received from the browser (the client) and the output produced by the server is summarized in Figure 4.6. shiny handles all the required CSS/HTML/JavaScript to create a communication channel between the browser and the server through R.

Shiny offers a reactive programming model which makes it easy to re-run an R expression whenever its input objects change. The logic behind each element rendered in the UI is organized using shiny modules. For instance, the code below creates a plotly map (the UI element) whenever input$update_map changes, i.e., whenever it is clicked.

map_cardUI <- function(id) {
  uiOutput(NS(id, "map_card"))
}

map_cardServer <- function(id) {
  moduleServer(id, function(input, output, session) {
    ns <- session$ns  # namespace the IDs created inside renderUI
    values <- shiny::reactiveValues(map_data = NULL)
    output$map_plot <- plotly::renderPlotly(get_map(values$map_data))
    observeEvent(input$update_map, {values$map_data <- get_map_data()})

    output$map_card <- renderUI({
      shinyMobile::f7Card(
        title = shiny::actionButton(ns("update_map"), "Update map"),
        plotly::plotlyOutput(ns("map_plot"))
      )
    })
  })
}

Figure 4.6: Relationship between user input, server output and reactive values.

In the example above, the server recreates the output$map_plot session object whenever the user clicks on input$update_map. This relationship is due to the observeEvent(input$update_map, {values$map_data <- get_map_data()}) expression, which invalidates the reactive relationship between values$map_data and output$map_plot whenever input$update_map is clicked by the user.

4.6 Deployment

Deploying a shiny application can be pretty simple using the https://www.shinyapps.io/ service by RStudio. This works fine for demos consuming limited resources, but it requires an upgrade once we need to handle multiple users or greater computing resources. The two typical approaches to scaling a shiny app in an enterprise context are Shiny Server and ShinyProxy, where ShinyProxy tries to fill the gaps of the open-source version of Shiny Server while relying exclusively on the open-source shiny package. Another advantage of ShinyProxy is its use of Docker, offering each user an independent shiny app environment and R process. This isolation through Docker containers is important for both security and performance reasons. This architecture is outlined in Figure 4.7.

Figure 4.7: Outline of the shinyproxy architecture.
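A ShinyProxy deployment is driven by an application.yml file whose specs list declares one Docker image per app. The fragment below is a minimal, hypothetical sketch for the app of this chapter; the cacsfre/simulr image name is illustrative and the keys should be checked against the ShinyProxy configuration documentation:

```yaml
proxy:
  port: 8080
  specs:
    - id: simulr
      display-name: CAViaR forecasting app
      container-image: cacsfre/simulr:latest   # hypothetical image name
      container-cmd: ["R", "-e", "simulr::run_simulr()"]
```

Each user visiting the app gets a fresh container started from this image, which is precisely the per-user isolation outlined in Figure 4.7.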

This can be compared to the open-source shiny-server architecture shown in Figure 4.8.

Figure 4.8: Outline of the open-source shiny-server architecture.