% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/model_vpin.R
\name{vpin_measures}
\alias{vpin_measures}
\alias{vpin}
\alias{ivpin}
\title{Estimation of the Volume-Synchronized PIN model (VPIN) and the improved
Volume-Synchronized PIN model (IVPIN)}
\usage{
vpin(
data,
timebarsize = 60,
buckets = 50,
samplength = 50,
tradinghours = 24,
verbose = TRUE
)
ivpin(
data,
timebarsize = 60,
buckets = 50,
samplength = 50,
tradinghours = 24,
grid_size = 5,
verbose = TRUE
)
}
\arguments{
\item{data}{A dataframe with at least three variables; only the first
three are used, in the order \code{{timestamp, price, volume}}.}
\item{timebarsize}{An integer referring to the size of timebars
in seconds. The default value is \code{60}.}
\item{buckets}{An integer referring to the number of buckets per
average daily volume. The default value is \code{50}.}
\item{samplength}{An integer referring to the sample length
or the window size used to calculate the \code{VPIN} vector.
The default value is \code{50}.}
\item{tradinghours}{An integer referring to the length of daily
trading sessions in hours. The default value is \code{24}.}
\item{verbose}{A logical variable that determines whether detailed
information about the steps of the estimation of the VPIN (IVPIN) model is
displayed. No output is produced when \code{verbose} is set to \code{FALSE}.
The default value is \code{TRUE}.}
\item{grid_size}{An integer between \code{1} and \code{20}
representing the size of the grid used in the estimation of IVPIN. The
default value is \code{5}. See Details for more information.}
}
\value{
Returns an object of class \code{estimate.vpin}, which
contains the following slots:
\describe{
\item{\code{@improved}}{ A logical variable that takes the value \code{FALSE}
when the classical VPIN model is estimated (using \code{vpin()}), and \code{TRUE}
when the improved VPIN model is estimated (using \code{ivpin()}).}
\item{\code{@bucketdata}}{ A data frame created as in
\insertCite{abad2012;textual}{PINstimation}.}
\item{\code{@vpin}}{ A vector of VPIN values.}
\item{\code{@ivpin}}{ A vector of IVPIN values, which remains empty when
the function \code{vpin()} is called.}
}
}
\description{
Estimates the Volume-Synchronized Probability of Informed
Trading as developed in \insertCite{Easley2011;textual}{PINstimation}
and \insertCite{Easley2012;textual}{PINstimation}. \cr
Estimates the improved Volume-Synchronized Probability of Informed
Trading as developed in \insertCite{ke2017improved;textual}{PINstimation}.
}
\details{
The dataframe \code{data} should contain at least three variables. Only the
first three variables are considered, and they must appear in the following order:
\code{{timestamp, price, volume}}.
The argument \code{timebarsize} is expressed in seconds, which allows the user to use
intervals shorter than \code{1} minute. The default value is \code{1} minute
(\code{60} seconds), following Easley et al. (2011, 2012).
The argument \code{tradinghours} is used to correct the duration per
bucket if the market trading session does not cover a full day \code{(24 hours)}.
The duration of a given bucket is the difference between the
timestamp of the last trade \code{endtime} and the timestamp of the first trade
\code{stime} in the bucket. If the first and last trades in a bucket occur
on different days, and the market trading session is shorter than
\verb{24 hours}, the bucket's duration will be inflated. For example, if the daily
trading session is 8 hours \code{(tradinghours = 8)}, and the start time of a
bucket is \code{2018-10-12 17:06:40} and its end time is
\code{2018-10-13 09:36:00}, the straightforward calculation gives a duration
of \code{59,360 secs}. However, this duration includes 16 hours when the
market is closed. The corrected duration considers only the market activity
time: \code{duration = 59,360 - 16 * 3600 = 1,760 secs}, approximately
\verb{30 minutes}.
The argument \code{grid_size} determines the size of the grid for the variables
\code{alpha} and \code{delta}, used to generate the initial parameter sets
that prime the maximum-likelihood estimation step of the
algorithm by \insertCite{ke2017improved;textual}{PINstimation} for estimating
\code{IVPIN}. If \code{grid_size} is set to a value \code{m}, the algorithm creates a
sequence starting from \code{1 / (2m)} and ending at \code{1 - 1 / (2m)}, with a
step of \code{1 / m}. The default value of \code{5} corresponds to the grid size used by
\insertCite{Yan2012;textual}{PINstimation}, where the sequence starts at
\code{0.1 = 1 / (2 * 5)} and ends at \code{0.9 = 1 - 1 / (2 * 5)}
with a step of \code{0.2 = 1 / 5}. Increasing the value of \code{grid_size}
increases the running time and may marginally improve the accuracy of the
IVPIN estimates.
}
\examples{
# The package includes a preloaded dataset called 'hfdata'.
# This dataset contains artificially created high-frequency trading data
# with 100,000 trades and five variables: 'timestamp', 'price',
# 'volume', 'bid', and 'ask'. For more information, type ?hfdata.
xdata <- hfdata
### Estimation of the VPIN model ###
# Estimate the VPIN model using the following parameters:
# - timebarsize: 5 minutes (300 seconds)
# - buckets: 50 buckets per average daily volume
# - samplength: 250 for the VPIN calculation
estimate <- vpin(xdata, timebarsize = 300, buckets = 50, samplength = 250)
# Display a description of the VPIN estimate
show(estimate)
# Display the parameters of the VPIN estimates
show(estimate@parameters)
# Display the summary statistics of the VPIN vector
summary(estimate@vpin)
# Store the computed data of the different buckets in a dataframe 'buckets'
# and display the first 10 rows of the dataframe.
buckets <- estimate@bucketdata
show(head(buckets, 10))
# Display the first 10 rows of the dataframe 'dayvpin'.
dayvpin <- estimate@dailyvpin
show(head(dayvpin, 10))
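### Illustration of the duration correction (see Details) ###
# A minimal sketch in base R that reproduces the worked example from the
# Details section. It is not the package's internal code: the variable names
# below are hypothetical, and counting closed-market periods as the number of
# calendar days spanned by the bucket is an assumption made for illustration.
stime <- as.POSIXct("2018-10-12 17:06:40", tz = "UTC")
endtime <- as.POSIXct("2018-10-13 09:36:00", tz = "UTC")
session_hours <- 8
# Straightforward duration in seconds between the first and last trade.
raw_duration <- as.numeric(difftime(endtime, stime, units = "secs"))
# Number of overnight closures spanned by the bucket (assumed: one per day).
days_spanned <- as.numeric(as.Date(endtime) - as.Date(stime))
# Remove the hours during which the market is closed.
corrected_duration <- raw_duration - days_spanned * (24 - session_hours) * 3600
show(raw_duration)        # 59360
show(corrected_duration)  # 1760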
### Estimation of the IVPIN model ###
# Estimate the IVPIN model using the same parameters as above.
# The grid_size parameter is unspecified and will default to 5.
iestimate <- ivpin(xdata, timebarsize = 300, samplength = 250, verbose = FALSE)
# Display the summary statistics of the IVPIN vector
summary(iestimate@ivpin)
# The output of ivpin() also contains the VPIN vector in the @vpin slot.
# Plot the VPIN and IVPIN vectors in the same plot using the iestimate object.
# Define the range for the VPIN and IVPIN vectors, removing NAs.
vpin_range <- range(c(iestimate@vpin, iestimate@ivpin), na.rm = TRUE)
# Plot the VPIN vector in blue
plot(iestimate@vpin, type = "l", col = "blue", ylim = vpin_range,
     ylab = "Value", xlab = "Bucket", main = "Plot of VPIN and IVPIN")
# Add the IVPIN vector in red
lines(iestimate@ivpin, type = "l", col = "red")
# Add a legend to the plot
legend("topright", legend = c("VPIN", "IVPIN"), col = c("blue", "red"),
       lty = 1,
       cex = 0.6,              # Adjust the text size
       x.intersp = 1.2,        # Adjust the horizontal spacing
       y.intersp = 2,          # Adjust the vertical spacing
       inset = c(0.05, 0.05))  # Adjust the position slightly
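### Illustration of the grid implied by grid_size (see Details) ###
# A minimal sketch in base R of the sequence described in the Details section:
# for grid_size = m, the points run from 1 / (2m) to 1 - 1 / (2m) in steps of
# 1 / m. The names 'm', 'grid_points' and 'initial_grid' are hypothetical, and
# pairing every alpha with every delta via expand.grid() is an assumption made
# for illustration, not necessarily how ivpin() builds its initial sets.
m <- 5
# Midpoints of m equal-width intervals on (0, 1): 0.1, 0.3, 0.5, 0.7, 0.9.
grid_points <- (2 * seq_len(m) - 1) / (2 * m)
show(grid_points)
# Candidate (alpha, delta) pairs under the assumption above: m * m rows.
initial_grid <- expand.grid(alpha = grid_points, delta = grid_points)
show(nrow(initial_grid))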
}
\references{
\insertAllCited
}