
# Introduction and Background {#sec-introduction}

As a component of sustainable urban mobility, bike-sharing is on the rise in cities around the world. Empirical research, as exemplified by @teixeira2021empirical and @blazanin2022scooter, has been dedicated to highlighting the requirements and beneficial impacts of the related modal shifts in urban transport. As these references point out, careful planning is required to make so-called shared micromobility systems more attractive than less environmentally friendly alternatives, such as cars. While the concept of shared micromobility also includes e-scooters and dockless bike-shares, we focus here on systems that rely on a set of dedicated stations. For example, @luo2019comparative point out that, while dockless systems offer more convenience and equity to users, their CO2 footprint is significantly higher due to the reduced lifespan of vehicles. On the other hand, station-based sharing systems require carefully planned station locations and well-balanced inventory levels to ensure adequate service provision. When, for example, a station is mostly used to pick up bikes, rebalancing ensures a steady supply and avoids service denials.

Planning for bike-sharing operations means determining, for example, the best distribution of stations across the service area [@Ciancio2017], the best distribution of bikes across stations [@Zhu2021], and the best path for truck drivers to take when redistributing bikes each day [@Schuijbroek2017]. When taking a proactive approach to planning, optimisation procedures that determine stock levels per station rely on predicted demand. When taking a reactive approach, quick online decision-making is crucial to maintain a good service level. Since inefficient rebalancing operations are a major cost driver for operators [@Schuijbroek2017], identifying demand outliers to improve efficiency in bike-sharing systems is highly important. Unaccounted-for outliers can affect bike-sharing systems in two ways: (i) outliers in historic data contaminate the forecasts used in future inventory management, and (ii) demand levels observed on the day may indicate that the schedule is non-optimal for the current day and that drivers should be re-routed.

Therefore, identification of outlier demand has several potential benefits for bike-sharing forecasting and planning: 1) Detecting outliers early in the day, through online analysis as proposed in @Rennie2021, allows for rapid interventions to better re-allocate bikes on a given day; 2) Removing any detected outliers from training data for demand forecasting would improve results on predicting reference demand curves; 3) If outliers can be attributed to specific events previously unknown, extending future forecast models to include such events can improve forecasts; 4) Even when explanatory factors for outliers cannot be determined, if such outliers are concentrated spatio-temporally in certain stations, this knowledge can better support planning decisions; and 5) Identifying changes in the underlying reference model when patterns in the detected outliers are observed can trigger a review of the current forecasting method.

We define outlier demand as a short-term change in demand, resulting in usage levels which deviate from *regular* usage. Note that, to count as an outlier, a demand shift has to exceed the general degree of random variation observed in demand over time. In this paper, we focus on demand observed at bike-sharing stations, as these are the target of inventory rebalancing efforts. In contrast to other classical mobility problems, such as those related to buses or trains, bike-sharing capacity is located at the vertices of the transport network, rather than on the edges.
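The definition above can be illustrated with a minimal sketch. It assumes that *regular* usage is already available as a reference series and measures random variation with the median absolute deviation (MAD) of the residuals. The function name `flag_outliers`, the threshold of 3, and the example counts are illustrative assumptions, not the detection method developed later in this paper.

```python
from statistics import median


def flag_outliers(observed, regular, threshold=3.0):
    """Flag observations whose deviation from regular usage exceeds
    `threshold` times a robust estimate of the residual spread."""
    residuals = [o - r for o, r in zip(observed, regular)]
    centre = median(residuals)
    mad = median(abs(e - centre) for e in residuals)
    # Rescale the MAD so it is comparable to a standard deviation
    # under normally distributed residuals.
    scale = 1.4826 * mad
    if scale == 0:
        # No variation at all: nothing can be judged against it.
        return [False] * len(residuals)
    return [abs(e - centre) / scale > threshold for e in residuals]


# Hypothetical daily pick-ups at one station against a flat regular level.
regular = [100] * 8
observed = [98, 103, 101, 97, 160, 102, 99, 100]
flags = flag_outliers(observed, regular)
print(flags)
```

In this hypothetical series, only the day with 160 pick-ups is flagged; the remaining days fluctuate within the usual spread and therefore do not count as outliers.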

As an example of existing work in this area, @NeumannSaavedra2021 discuss the problem of variability in bike-sharing demand and propose a rule-based method to adjust the redistribution plan when demand differs from the forecast. In a simulation study, they show that service levels can be improved when adjustments are made to the optimal redistribution plan. Wider literature on outlier detection in transport planning is scarce; one example is @Rennie2021b, who consider identifying and correcting for outliers in revenue management systems in railways.

@Talvitie1978 find that outliers can have a substantial effect on the predictions of usage of different urban transport modes, but only apply a simplistic trimming method to identify outliers. In the road traffic domain, @Guo2015 suggest a procedure for identifying outliers in real time based on the conditional variance of predictions, and determine that incorporating information on such outliers into future predictions increases the system's performance.

Furthermore, as indicated, e.g., in @basole2021visualization, to account for demand outliers and adjust planning, experts require meaningful visualisations. Therefore, we propose a set of visualisations to help identify and analyse spatial and temporal patterns in the detected outliers. For example, given shifts in the availability of urban infrastructure, a subset of stations may be predisposed to outliers and as such, this area would be a good target for a temporary "pop-up" station. Throughout this paper, we assume that bike-sharing companies employ analysts who are in charge of strategic decisions, such as where to locate stations, tactical decisions, such as what number of bikes should be available on any given day at those stations, and operational decisions, such as on-the-fly rebalancing of bikes. In that, we follow previous research, such as @aswang2016modeling, who expect analysts to evaluate bike share programs and station locations, or @orma2021investigating, who consider analysts or dispatchers to be in charge of rebalancing operations.

To combine automated outlier detection, manual analysis, demand forecasts, and planning, we suggest the following process for analysing bike-sharing demand data (see @fig-process_map): First, a baseline demand forecast supports anticipative planning, e.g. of inventory levels. Second, this baseline can be used to normalise observed usage data. Using the resulting observations, analysts can cluster stations with similar usage patterns to support both planning adjustments and outlier detection. When detecting outliers in a cluster's usage patterns, these are visualised to enable manual outlier evaluation. Insights from this analysis can be used to both clean the data that underlies the baseline forecast and to extend the baseline forecasting model.

![Flowchart of process for analysing bike-sharing demand data. Figure adapted from @Rennie_thesis.](Images/Fig_01.pdf){#fig-process_map}
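The process above can be sketched end to end in a few lines. Every component here is a deliberately simple stand-in chosen for illustration: the baseline forecast is a per-station median, the clustering is a split on a hypothetical `morning_share` feature, and outliers are flagged with a MAD rule. None of these is the method developed in the later sections, and all station names and counts are made up.

```python
from statistics import mean, median


def baseline(counts):
    """Step 1 (stand-in forecast): median daily usage per station."""
    return {s: median(c) for s, c in counts.items()}


def normalise(counts, base):
    """Step 2: express observed usage relative to the baseline."""
    return {s: [x / base[s] for x in c] for s, c in counts.items()}


def cluster(morning_share, cutoff=0.5):
    """Step 3 (stand-in clustering): split stations by morning pick-up share."""
    groups = {"morning": [], "evening": []}
    for s, share in morning_share.items():
        groups["morning" if share >= cutoff else "evening"].append(s)
    return groups


def outlier_days(series, threshold=3.0):
    """Step 4: days deviating from the median by more than threshold * scaled MAD."""
    centre = median(series)
    scale = 1.4826 * median(abs(x - centre) for x in series)
    if scale == 0:
        return []
    return [d for d, x in enumerate(series) if abs(x - centre) / scale > threshold]


# Hypothetical daily pick-ups for three stations over one week.
counts = {
    "A": [40, 42, 38, 41, 90, 39, 40],
    "B": [20, 19, 21, 20, 44, 20, 22],
    "C": [60, 61, 59, 62, 60, 58, 60],
}
morning_share = {"A": 0.7, "B": 0.6, "C": 0.3}
n_days = len(next(iter(counts.values())))

base = baseline(counts)
norm = normalise(counts, base)
groups = cluster(morning_share)

# Detect outlier days on each cluster's average normalised usage.
flagged = {}
for name, stations in groups.items():
    cluster_series = [mean(norm[s][d] for s in stations) for d in range(n_days)]
    flagged[name] = outlier_days(cluster_series)

# Step 5: drop flagged days before re-estimating the baseline.
cleaned = {
    s: [x for d, x in enumerate(counts[s]) if d not in flagged[name]]
    for name, stations in groups.items()
    for s in stations
}
print(flagged)
print(baseline(cleaned))
```

In this toy example, the shared spike on the fifth day is flagged for the cluster containing stations A and B, and that day is removed from their histories before the baseline is re-estimated. In practice, the flagged days would first be visualised and evaluated by an analyst rather than removed automatically.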

In this paper, we analyse the Capital Bikeshare data set, which is publicly available at @CaBi. This data set is commonly used to test forecasting approaches for bike-sharing [@Ma2015; @Hamilton2018], yet these methods typically do not account for outliers. In Section @sec-data, we introduce the data set and perform an exploratory analysis. Section @sec-st_patterns and Section @sec-clustering_ch3 then model the temporal and spatial patterns in demand for bike-sharing. In Section @sec-outliers_method, we provide a methodology for identifying outlying demand for bike-sharing services. The results of applying the outlier detection method to the Capital Bikeshare data are then discussed in Section @sec-discussion.

In summary, this paper contributes (i) an in-depth analysis of temporal patterns in usage of Capital Bikeshare services; (ii) a method for spatial clustering of bike-sharing stations based on geographic proximity and similarity of usage patterns; (iii) an investigation of temporal trends in detected outliers and the factors that may cause them; and (iv) an analysis of spatial patterns of the outliers detected. Our methodology is data-driven and general by design rather than tailored to specifics of Washington, D.C., and can thus be readily applied to other station-based bike-sharing data sets.