README.Rmd

---
output: github_document
---

<!-- README.md is generated from README.Rmd. Please edit that file -->

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.path = "man/figures/README-",
  out.width = "100%"
)
```

# ptools

<!-- badges: start -->
<!-- badges: end -->

The library ptools is a set of helper functions I have used over time to help with analyzing count data, e.g. crime counts per month. 

## Installation

To install the most recent version from CRAN, it is simply:

    install.packages('ptools')

You can install the current version on github using devtools:

    library(devtools)
    install_github("apwheele/ptools", build_vignettes = TRUE)
    library(ptools) # Hopefully works!

## Examples

Here is checking the difference in two Poisson means using an e-test:

```{r example1}
library(ptools)
e_test(6,2)
```

Here is the Wheeler & Ratcliffe WDD test (see `help(wdd)` for academic references):

```{r example2}
wdd(c(20,20),c(20,10))
```

Here is a quick example applying a small sample Benford's analysis:

```{r example3}
# Null probs for Benfords law
f <- 1:9
p_fd <- log10(1 + (1/f)) #first digit probabilities
# Example 12 purchases on my credit card
purch <- c( 72.00,
           328.36,
            11.57,
            90.80,
            21.47,
             7.31,
             9.99,
             2.78,
            10.17,
             2.96,
            27.92,
            14.49)
#artificial numbers, 72.00 is parking at DFW, 9.99 is Netflix
fdP <- substr(format(purch,trim=TRUE),1,1)
totP <- table(factor(fdP, levels=paste(f)))
resG_P <- small_samptest(d=totP,p=p_fd,type="G")
print(resG_P) # I have a nice print function
```

Here is an example checking the Poisson fit for a set of data:

```{r example4}
x <- rpois(1000,0.5)
check_pois(x,0,max(x),mean(x))
```

Here is an example extracting out near repeat strings (this is improved version [from an old blog post](https://andrewpwheeler.com/2017/04/12/identifying-near-repeat-crime-strings-in-r-or-python/) using kdtrees):

```{r example5}
# Not quite 15k rows for burglaries from motor vehicles
bmv <- read.csv('https://dl.dropbox.com/s/bpfd3l4ueyhvp7z/TheftFromMV.csv?dl=0')
print(Sys.time()) 
BigStrings <- near_strings2(dat=bmv,id='incidentnu',x='xcoordinat',
                            y='ycoordinat',tim='DateInt',DistThresh=1000,TimeThresh=3)
print(Sys.time()) #very fast, only a few seconds on my machine
print(head(BigStrings))
```

## Contributing

Always feel free to contribute either directly on Github, or email me with thoughts/suggestions. For citations for functions used, feel free to cite the original papers I reference in the functions instead of the package directly. 

Things on the todo list:

 - Tests for spatial feature engineering
 - Figure out [no long doubles issues](https://andrewpwheeler.com/2022/07/22/my-journey-submitting-to-cran/) for small sample tests
 - Conversion so functions can take both sp/sf objects
 - Poisson z-score and weekly aggregation functions
 - Potential geo functions
    - HDR raster
    - Leaflet helpers