Rohan Prabhune, Naman Goel
This vignette shows how to work with API. This will demonstrate how to
fetch data from multiple API endpoints and read it in a tibble(data
frame). This is followed by some basic exploratory data analysis (EDA)
to produce some plots to derive insights from the data fetched.
The API that we have chosen for this project is Financial Market
Data.The Polygon.io Stocks API provides
REST endpoints that let you query the latest market data from all US
stock exchanges.
We used the following packages in the creation of the vignette:
httr
: This is used to access the REST API endpoint.jsonlite
: This is used to parse the fetched data into a data frame.tidyverse
: This provides two important packagesdplyr
andggplot
which are used for data manipulation and plotting respectively.kableExtra
: This provides better printing properties for contingency tables in markdown.
To get started, install(if these are not installed already) and load the following packages:
library(httr)
library(jsonlite)
library(tidyverse)
library(kableExtra)
This section describes the functions created by us to interact with the API endpoints to fetch data as well as some metadata required for making the plots more descriptive.
Aggregates (Bars) Endpoint
Get financial data for a stock within a given time frame
This function has four modifications from the user. The user can provide the following inputs to the functions:
- ticker and company_name: The ticker symbol and registered
name of the company. If the user does not have this information,
this can be fetched using
get_ticker
function described ahead. - start_date: The start of the time window (A date with the format YYYY-MM-DD).
- end_date: The end of the time window (A date with the format YYYY-MM-DD).
- limit: Limits the number of entries queried to create the aggregate results.
This function returns a data frame with the close price, open price, highest price, lowest price etc for the stock over the given date range.
get_stocks_agg <- function(ticker,company_name,start_date="2022-01-01",
end_date="2022-08-31",limit=50){
url=paste0("https://api.polygon.io/v2/aggs/ticker/",ticker,
"/range/1/day/",start_date,"/",end_date,
"?adjusted=true&sort=asc&limit=",limit,"&apiKey=EdkA7_m2JhjS5POrGuXJbVlA4AjSl_4F")
response_obj <- GET(url)
parsed <- fromJSON(rawToChar(response_obj$content))
df <- as_tibble(parsed$results)
df <- df %>% rename(close_price=c,highest_price=h,lowest_price=l,num_transactions=n,
open_price=o,timestamp=t,vol=v,weighted_avg_price=vw)
df$ticker <- rep(ticker,limit)
df$company_name <- rep(company_name,limit)
df$start_date <- start_date
df$end_date <- end_date
df <- df %>% select(ticker,company_name,everything())
return(df)
}
Tickers Endpoint
Query all ticker symbols which are supported by Polygon.io. This API currently includes Stocks/Equities, Cryptocurrencies, and Currencies/Foreign Exchange.
This function searches only the stock market. The user can provide the
name of the company which they would like the ticker information for For
example: If a user wants the ticker, the ticker information for Apple,
the user can call the function as get_ticker(name="Apple")
. This
function will return the ticker symbol AAPL and the registered
company name Apple Inc. as a list. The user can pass on the contents
of this list to get_stocks_agg
function mentioned above to get the
aggregate bars over a date range. If there are multiple matches for a
given name, the function returns the first ticker information from the
list of matches.
get_ticker <- function(name){
url=paste0("https://api.polygon.io/v3/reference/tickers?market=stocks&search=",name,"&active=true&sort=ticker&order=asc&limit=1000&apiKey=EdkA7_m2JhjS5POrGuXJbVlA4AjSl_4F")
response_obj <- GET(url)
parsed<- fromJSON(rawToChar(response_obj$content))
df <- as_tibble(parsed$results)
return(list(df$ticker[[1]],df$name[[1]]))
}
This function provides details of the ticker for a given market. The user can provide the value for market to this function. The possible values of market can be stocks, crypto, fx or otc. This function returns the primary exchange and type of ticker information for each security in each market.
get_ticker_info <- function(market){
url=paste0("https://api.polygon.io/v3/reference/tickers?market=",market,"&active=true&sort=ticker&order=asc&limit=1000&apiKey=EdkA7_m2JhjS5POrGuXJbVlA4AjSl_4F")
response_obj <- GET(url)
parsed<- fromJSON(rawToChar(response_obj$content))
df <- as_tibble(parsed$results) %>% select(ticker,name,primary_exchange,type)
return(df)
}
Grouped Daily (Bars) Endpoint
Get the daily open, high, low, and close (OHLC) for the entire universe of stocks
This function takes in date as an input from the user and returns the open, high, low, and close (OHLC) for the entire stocks markets for that particular date.
get_grouped_daily <- function(date="2020-10-14"){
url=paste0("https://api.polygon.io/v2/aggs/grouped/locale/us/market/stocks/",date,"?adjusted=true&include_otc=true&apiKey=EdkA7_m2JhjS5POrGuXJbVlA4AjSl_4F")
response_obj <- GET(url)
parsed <- fromJSON(rawToChar(response_obj$content))
df <- as_tibble(parsed$results)
df <- df %>% rename(Ticker=T,volume=v,weighted_avg_price=vw,open_price=o,
close_price=c,highest_price=h,lowest_price=l,
num_transactions=n,timestamp=t)
df$date <- date
return(df)
}
Ticker Types Endpoint
This function is used to get the metadata information of all the ticker types that Polygon.io has data for.
get_ticker_type_details <- function(){
response_obj <- GET("https://api.polygon.io/v3/reference/tickers/types?apiKey=EdkA7_m2JhjS5POrGuXJbVlA4AjSl_4F")
parsed <- fromJSON(rawToChar(response_obj$content))
df <- as_tibble(parsed$results)
return(df)
}
Exchanges Endpoint
This function is used to get metadata information of all the stock exchanges that Polygon.io has data for.
get_exchanges_details <- function(){
response_obj <- GET("https://api.polygon.io/v3/reference/exchanges?asset_class=stocks&apiKey=EdkA7_m2JhjS5POrGuXJbVlA4AjSl_4F")
parsed <- fromJSON(rawToChar(response_obj$content))
df <- as_tibble(parsed$results)
return(df)
}
Here we have called the data of 3 stocks namely Apple, Tesla and Nvidia
from 1 Jan 2022 to 31 August 2022.
To find the stock information for Apple, we have passed “Apple” as an
input argument to get_ticker
function. This function returns a list
ticker_symbol1
which consists of ticker symbol “AAPL” and name of the
company which is “Apple Inc.”. The same was repeated for the other 2
symbols as well.
This information along with start date and end date is passed to the
function get_stocks_agg
. The limit argument is not passed, so the
function takes the default value of 50. This function fetches the stock
information for Apple in the given date range and returns a data frame
df1
. Similarly, this is done to get the stocks information for Tesla
and Nvidia in df2
and df3
respectively.
ticker_symbol1 <- get_ticker(name="Apple")
df1 <- get_stocks_agg(ticker_symbol1[[1]],ticker_symbol1[[2]],start_date="2022-01-01",end_date="2022-08-31")
df1
## # A tibble: 50 x 12
## ticker company_name vol weighted_avg_price open_p~1 close~2 highe~3 lowes~4 times~5 num_t~6 start~7 end_d~8
## <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <int> <chr> <chr>
## 1 AAPL Apple Inc. 104677470 181. 178. 182. 183. 178. 1.64e12 772691 2022-0~ 2022-0~
## 2 AAPL Apple Inc. 99110438 181. 183. 180. 183. 179. 1.64e12 831890 2022-0~ 2022-0~
## 3 AAPL Apple Inc. 94535602 177. 180. 175. 180. 175. 1.64e12 848513 2022-0~ 2022-0~
## 4 AAPL Apple Inc. 96882954 173. 173. 172 175. 172. 1.64e12 960340 2022-0~ 2022-0~
## 5 AAPL Apple Inc. 86709147 172. 173. 172. 174. 171. 1.64e12 716881 2022-0~ 2022-0~
## 6 AAPL Apple Inc. 106754551 170. 169. 172. 172. 168. 1.64e12 956337 2022-0~ 2022-0~
## 7 AAPL Apple Inc. 76138312 174. 172. 175. 175. 171. 1.64e12 649652 2022-0~ 2022-0~
## 8 AAPL Apple Inc. 74805173 176. 176. 176. 177. 175. 1.64e12 642756 2022-0~ 2022-0~
## 9 AAPL Apple Inc. 84405760 174. 176. 172. 177. 172. 1.64e12 692343 2022-0~ 2022-0~
## 10 AAPL Apple Inc. 80440778 172. 171. 173. 174. 171. 1.64e12 672552 2022-0~ 2022-0~
## # ... with 40 more rows, and abbreviated variable names 1: open_price, 2: close_price, 3: highest_price,
## # 4: lowest_price, 5: timestamp, 6: num_transactions, 7: start_date, 8: end_date
ticker_symbol2 <- get_ticker(name="Tesla")
df2 <- get_stocks_agg(ticker_symbol2[[1]],ticker_symbol2[[2]],start_date="2022-01-01",end_date="2022-08-31")
df2
## # A tibble: 50 x 12
## ticker company_name vol weigh~1 open_~2 close~3 highe~4 lowes~5 times~6 num_t~7 start~8 end_d~9
## <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <int> <chr> <chr>
## 1 TSLA Tesla, Inc. Common Stock 104686035 390. 383. 400. 400. 379. 1.64e12 1162844 2022-0~ 2022-0~
## 2 TSLA Tesla, Inc. Common Stock 99798258 387. 397. 383. 403. 374. 1.64e12 1051467 2022-0~ 2022-0~
## 3 TSLA Tesla, Inc. Common Stock 80119797 376. 382. 363. 390. 360. 1.64e12 811988 2022-0~ 2022-0~
## 4 TSLA Tesla, Inc. Common Stock 90324504 353. 359 355. 363. 340. 1.64e12 880974 2022-0~ 2022-0~
## 5 TSLA Tesla, Inc. Common Stock 83999748 346. 360. 342. 360. 337. 1.64e12 823560 2022-0~ 2022-0~
## 6 TSLA Tesla, Inc. Common Stock 91814877 339. 333. 353. 353. 327. 1.64e12 971558 2022-0~ 2022-0~
## 7 TSLA Tesla, Inc. Common Stock 66045210 353. 351. 355. 359. 346. 1.64e12 644108 2022-0~ 2022-0~
## 8 TSLA Tesla, Inc. Common Stock 83739015 365. 360. 369. 372. 358. 1.64e12 761538 2022-0~ 2022-0~
## 9 TSLA Tesla, Inc. Common Stock 96909162 356. 370. 344. 372. 342. 1.64e12 924351 2022-0~ 2022-0~
## 10 TSLA Tesla, Inc. Common Stock 72916011 345. 340. 350. 351. 338. 1.64e12 710334 2022-0~ 2022-0~
## # ... with 40 more rows, and abbreviated variable names 1: weighted_avg_price, 2: open_price, 3: close_price,
## # 4: highest_price, 5: lowest_price, 6: timestamp, 7: num_transactions, 8: start_date, 9: end_date
ticker_symbol3 <- get_ticker(name="Nvidia")
df3 <- get_stocks_agg(ticker_symbol3[[1]],ticker_symbol3[[2]],start_date="2022-01-01",end_date="2022-08-31")
df3
## # A tibble: 50 x 12
## ticker company_name vol weighted_avg_price open_pr~1 close~2 highe~3 lowes~4 times~5 num_t~6 start~7 end_d~8
## <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <int> <chr> <chr>
## 1 NVDA Nvidia Corp 39240294 302. 298. 301. 307. 298. 1.64e12 585469 2022-0~ 2022-0~
## 2 NVDA Nvidia Corp 52715440 291. 303. 293. 305. 283. 1.64e12 894297 2022-0~ 2022-0~
## 3 NVDA Nvidia Corp 49806388 283. 289. 276. 294. 275. 1.64e12 836624 2022-0~ 2022-0~
## 4 NVDA Nvidia Corp 45418636 280. 276. 282. 284. 271. 1.64e12 725603 2022-0~ 2022-0~
## 5 NVDA Nvidia Corp 40993851 275. 281. 272. 284. 271. 1.64e12 639610 2022-0~ 2022-0~
## 6 NVDA Nvidia Corp 59461560 264. 266. 274 275. 256. 1.64e12 991811 2022-0~ 2022-0~
## 7 NVDA Nvidia Corp 40408929 275. 273. 278. 281. 268. 1.64e12 572165 2022-0~ 2022-0~
## 8 NVDA Nvidia Corp 38341346 281. 281. 280. 286. 276. 1.64e12 562208 2022-0~ 2022-0~
## 9 NVDA Nvidia Corp 53857879 271. 284. 266. 285. 265. 1.64e12 845316 2022-0~ 2022-0~
## 10 NVDA Nvidia Corp 39583233 268. 263 269. 272. 262. 1.64e12 620045 2022-0~ 2022-0~
## # ... with 40 more rows, and abbreviated variable names 1: open_price, 2: close_price, 3: highest_price,
## # 4: lowest_price, 5: timestamp, 6: num_transactions, 7: start_date, 8: end_date
Here we have combined df1, df2 and df3 into vertically into a data frame
df_combined
. This gives us all the stock information for the 3
companies in a single data frame. This data frame is further used to
plot the close_price for the 3 companies in a given date range.
For this the time stamp on x-axis is in Unix Msec. We tried to convert
it into Human readable datetime format using multiple ways but we were
unable to do it due deadline for the project. We are sure we would have
gotten a breakthrough had we worked more on this.
df_combined <- bind_rows(df1, df2, df3)
start_date = as.character(unique(df_combined$start_date))
end_date = as.character(unique(df_combined$end_date))
#Plot
ggplot(df_combined,aes(x=timestamp,y=close_price)) +
geom_line(aes(color=company_name),size=1) +
labs(x="Time",y="Closing price",
title="Closing stock price over time for Apple, Nvidia and Tesla") +
scale_color_discrete(name = "Company Name")+
theme(plot.title = element_text(hjust = 0.5))
Here we have called get_grouped_daily
function to get the open, high,
low, and close (OHLC) for the entire stocks markets on 16 Nov 2020 (a
random date which user can select).
We have added a new variable percent_change which is the percent
rise/decline in the stock price throughout that day (Considered
open_price and close_price for calculating this). We have arranged the
data frame in the descending order of percent_change, and hence at the
top of the data frame we have tickers which have the maximum gain in
stock price and at the bottom we have tickers which have the maximum
loss.
We have also added percent_change_chr variable, where we have
coerced percent_change as character which used to make the plot ahead
more descriptive.
df_grouped <- get_grouped_daily("2020-11-16")
df_grouped <- df_grouped %>%
mutate(percent_change = round(((close_price-open_price)/open_price)*100,2)) %>%
arrange(desc(percent_change))
df_grouped$percent_change_chr <- paste(as.character(df_grouped$percent_change),'%')
df_grouped %>% select(Ticker,open_price,close_price,percent_change_chr,everything())
## # A tibble: 9,084 x 12
## Ticker open_price close_price percent_change_chr volume weight~1 highe~2 lowes~3 times~4 num_t~5 date perce~6
## <chr> <dbl> <dbl> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <int> <chr> <dbl>
## 1 ZXZZT 10.3 20.0 93.31 % 39298 13.3 20.0 10.3 1.61e12 332 2020~ 93.3
## 2 AIRTW 0.0261 0.0395 51.34 % 49231 0.0374 0.0399 0.0261 1.61e12 19 2020~ 51.3
## 3 WWR 4.31 6.27 45.48 % 36001534 5.43 6.3 4.22 1.61e12 117274 2020~ 45.5
## 4 CBAT 7.9 11.3 43.04 % 107542895 9.22 11.4 7.15 1.61e12 463824 2020~ 43.0
## 5 PPSI 3.02 4.29 42.05 % 12071799 4.02 4.48 3.02 1.61e12 43723 2020~ 42.0
## 6 ITACW 0.350 0.48 37.1 % 16290 0.401 0.48 0.350 1.61e12 10 2020~ 37.1
## 7 BLNKW 5.74 7.86 36.89 % 283685 7.35 8.23 5.74 1.61e12 1493 2020~ 36.9
## 8 NBACW 0.51 0.66 29.41 % 565422 0.587 0.680 0.51 1.61e12 1162 2020~ 29.4
## 9 SGOC 0.92 1.19 29.35 % 33103 1.06 1.19 0.92 1.61e12 98 2020~ 29.4
## 10 KLR.WS 1.06 1.36 28.29 % 3978 1.10 1.36 1.06 1.61e12 7 2020~ 28.3
## # ... with 9,074 more rows, and abbreviated variable names 1: weighted_avg_price, 2: highest_price,
## # 3: lowest_price, 4: timestamp, 5: num_transactions, 6: percent_change
Here we have used head()
and tail()
to get the stock information of
10 tickers having the highest percent gain in df_top10
and 10 tickers
having the highest percent loss in df_bottom10
. This information is
plotted using geom_col()
.
As mentioned above percent_change_chr is used to add text on top
of the columns using geom_text()
.
On the x-axis we could not map the ticker symbol to the company name
(which would have made the plot clearer) because there is a max limit of
1000 on the Ticker
Endpoint. So
we do not get all the data and hence there is a possibility that we have
a ticker symbol in df_grouped
for which we have not been able to fetch
the company name.
#Top-10 gains
df_top10 <- head(df_grouped, 10)
df_top10$Ticker <- factor(df_top10$Ticker,
levels=df_top10$Ticker[order(-df_top10$percent_change)])
# Plot
date <- unique(df_top10$date)
ggplot(df_top10, aes(x=Ticker, y=percent_change)) +
geom_col(width=0.3, color='steelblue', fill='steelblue') +
theme(axis.text.x=element_text(angle=90), text=element_text(size=12),
plot.title = element_text(hjust = 0.5)) +
labs(y="Percent increase", x ="Stock ticker",
title = paste0("Highest stock price increase on ",date)) +
geom_text(aes(label = percent_change_chr), vjust = -0.5, size=3)
#Top-10 losses
df_bottom10 <- tail(df_grouped, 10)
df_bottom10$percent_change <- abs(df_bottom10$percent_change)
df_bottom10$Ticker <- factor(df_bottom10$Ticker,
levels=df_bottom10$Ticker[order(-df_bottom10$percent_change)])
# Plot
date = unique(df_bottom10$date)
ggplot(df_bottom10, aes(x=Ticker, y=percent_change)) +
geom_col(width=0.3, color='red', fill='red') +
theme(axis.text.x = element_text(angle=90),text = element_text(size=12),
plot.title = element_text(hjust = 0.5)) +
labs(y="Percent decrease", x ="Stock ticker",
title = paste0("Highest stock price decrease on ",date)) +
geom_text(aes(label=percent_change_chr), vjust =-0.5, size=3)
From the plot above we can see that on 16th Nov 2020, the stock price
for ZXZZT(NASDAQ TEST STOCK) noticed maximum gain of 93.91%. After that,
the highest gain was noticed by AIRTW (Air T, Inc.) which was 51.34% and
WWR(Westwater Resources, Inc.) which was 45.48%.
On similar lines, the stock price for KTOVW(Kitov Pharma Ltd. Warrants)
noticed highest loss of -43.93%. Thus this plot allows the user to find
out the biggest gainers and losers on any given day.
Here we have used get_ticker_info
function to get ticker information
such as its type and its primary exchange for all the tickers supported
by Polygon.io. in the stock market. Hence the input argument to the
function call is “stocks”.
Here we have created a contingency table using ticker type information
for the stock market. When fetching data from the Ticker
Endpoint in
df_info
, we get the abbrevations of ticker types. In order to fetch
the descriptions of ticker type (For example: CS means Common
Stocks) we have fetched data from the Ticker Types
Endpoint
in df_tickertype_metadata
. Then we took the left_join
of the two
tibbles in order to get all the information in a single tibble which is
then used to create the contingency table.
df_info <- get_ticker_info("stocks")
Sys.sleep(5)
df_tickertype_metadata <- get_ticker_type_details() %>% select(code,description)
Sys.sleep(5)
df_tables <- left_join(df_info,df_tickertype_metadata,by=c("type"="code"))
tab1 <- table(df_tables$description,dnn=c("Ticker Types"))
tab1 %>%
kbl(caption="Table for Ticker Types") %>%
kable_classic(full_width = F)
Ticker.Types | Freq |
---|---|
American Depository Receipt Common | 40 |
Common Stock | 561 |
Exchange Traded Fund | 110 |
Exchange Traded Note | 5 |
Fund | 15 |
Preferred Stock | 56 |
Rights | 9 |
Structured Product | 16 |
Unit | 72 |
Warrant | 106 |
From the table we can see that the number of CS (Common Stock) is 561 and the number of ARDC (American Depository Receipt Common) is 40.
Similarly here we have created a two way contingency table for the
number of ticker types for each stock exchange. When fetching data from
the Ticker
Endpoint in
df_info
, we get the abbrevations stock exchanges. Here we tried to use
Exchange
Endpoint to
get the full name of the stock exchanges (For example: XNYS is
NYSE American, LLC) but there we multiple exchange information for
each exchange which made the table very complicated to understand. For
that reason we let the exchange abbrevations be.
tab2 <- table(df_tables$description,df_info$primary_exchange,dnn=c("Ticker Types","Exchanges"))
tab2 %>%
kbl(caption="Table for Ticker Types and Exchanges") %>%
kable_classic(full_width = F)
ARCX | BATS | XASE | XNAS | XNYS | |
---|---|---|---|---|---|
American Depository Receipt Common | 0 | 0 | 1 | 23 | 16 |
Common Stock | 0 | 0 | 19 | 359 | 183 |
Exchange Traded Fund | 71 | 17 | 0 | 22 | 0 |
Exchange Traded Note | 5 | 0 | 0 | 0 | 0 |
Fund | 0 | 0 | 1 | 0 | 14 |
Preferred Stock | 0 | 0 | 0 | 8 | 48 |
Rights | 0 | 0 | 0 | 9 | 0 |
Structured Product | 0 | 0 | 0 | 0 | 16 |
Unit | 0 | 0 | 2 | 51 | 19 |
Warrant | 0 | 0 | 3 | 78 | 25 |
From the table above we can see that there are 359 Common Stock type tickers in the XNYS (New York stock exchange). And similarly this table helps us to identify the total number of each ticker type in each exchange.
Here we have created summary Statistics for open_price and close_price for each company between 2022-01-01 and 2022-08-31. The summary statistics includes minimum, maximum, median, mean, quartile and standard deviation of prices for the 3 stocks.
# Open Price
df_combined_open <- df_combined %>%
group_by(company_name) %>%
summarise("Min." = min(open_price),
"1st Quartile" = quantile(open_price,0.25),
"Median." = median(open_price),
"Mean."=mean(open_price),
"3rd Quartile" = quantile(open_price,0.75),
"Max."= max(open_price),
"Std. Dev." = sd(open_price))
# Close Price
df_combined_close <- df_combined %>%
group_by(company_name) %>%
summarise("Min." = min(close_price),
"1st Quartile" = quantile(close_price,0.25),
"Median." = median(close_price),
"Mean."=mean(close_price),
"3rd Quartile" = quantile(close_price,0.75),
"Max."= max(close_price),
"Std. Dev." = sd(close_price))
df_combined_open %>%
kbl(caption=paste0("Summary Statistics for Open Price per Company between ",start_date," and ",end_date)) %>%
kable_classic()
company\_name | Min. | 1st Quartile | Median. | Mean. | 3rd Quartile | Max. | Std. Dev. |
---|---|---|---|---|---|---|---|
Apple Inc. | 150.9000 | 163.5850 | 169.4500 | 167.8854 | 172.6075 | 182.6300 | 7.00711 |
Nvidia Corp | 210.1500 | 231.9650 | 243.2500 | 247.4482 | 260.5800 | 302.7700 | 22.16117 |
Tesla, Inc. Common Stock | 233.4633 | 284.2208 | 302.3166 | 308.5580 | 333.0283 | 396.5167 | 35.48938 |
df_combined_close %>%
kbl(caption=paste0("Summary Statistics for Close Price per Company between ",start_date," and ",end_date)) %>%
kable_classic()
company\_name | Min. | 1st Quartile | Median. | Mean. | 3rd Quartile | Max. | Std. Dev. |
---|---|---|---|---|---|---|---|
Apple Inc. | 150.62 | 162.7925 | 168.7600 | 167.6362 | 172.7300 | 182.0100 | 6.987583 |
Nvidia Corp | 213.30 | 231.0350 | 242.4350 | 246.5286 | 263.4700 | 301.2100 | 21.101257 |
Tesla, Inc. Common Stock | 254.68 | 280.3517 | 302.1667 | 306.7592 | 327.5708 | 399.9267 | 34.140396 |
The above numerical summaries can be visualized using a box plot. Here we have created a box plot for highest stock price and **lowest stock *price** for each company between 2022-01-01 and 2022-08-31.
ggplot(df_combined, aes(x=company_name, y=highest_price)) +
geom_boxplot(color="blue",fill="grey") +
labs(y="Highest price", x ="Company Name",
title=paste0("Boxplot for highest stock price between ",start_date," and ",end_date))
ggplot(df_combined, aes(x=company_name, y=lowest_price)) +
geom_boxplot(color="red",fill="grey") +
labs(y="Lowest price", x ="Company Name",
title=paste0("Boxplot for lowest stock price between ",start_date," and ",end_date))
Here we have used the df_combined
to create a histogram for the
weighted average stock price for each company.
wrapper <- function(x, ...)
{
paste(strwrap(x, ...), collapse = "\n")
}
my_title <- paste0("Histogram of weighted average price per company between ",start_date," and ",end_date)
ggplot(df_combined, aes(x=weighted_avg_price)) +
geom_histogram(aes(fill=company_name),binwidth=8) +
labs(x ="Weighted average price") +
scale_fill_discrete(name = "Company Name") +
ggtitle(wrapper(my_title, width=80))
Here we have used the one way contingency table created in above sections to create a bar plot with the type of tickers on the x axis.
ggplot(df_tables, aes(x=description)) +
geom_bar(fill="steelblue") +
labs(x ="Type of tickers",
title="Bar plot for number of stock tickers for each type") +
theme(text=element_text(size=14), plot.title = element_text(hjust = 0.5),
axis.text.x = element_text(angle = 90)) +
geom_text(aes(label = ..count..), stat = "count", vjust = -0.5)
The scatter plot has been used to plot the percentage change in returns for the stocks Apple, Tesla and Nvidia for a date range and can be differentiated based on the different colours. From the plot, we can see how varied returns have been on a daily basis but if we look carefully we can see that there is a certain correlation in the returns and on most days the movement for all 3 technology stocks is in the same direction of either gaining or losing
df_comb <- df_combined %>% mutate(percent_change = round(((close_price-open_price)/open_price)*100,2))
ggplot(df_comb, aes(x=timestamp, y=percent_change,color=company_name)) +
geom_point()+
facet_grid(cols = vars(company_name)) +
scale_color_discrete(name = "Company Name")
This vignette mainly focuses on communicating with the REST API Endpoints, creating functions in R to fetch data from the endpoints and basic data exploration- including numerical summaries, contingency tables and plots.