Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bibliography API and listing functionality #320

Merged
merged 13 commits into from
Jan 10, 2025
3 changes: 3 additions & 0 deletions CHANGELOG.md
Original file line number Diff line number Diff line change
@@ -1,3 +1,6 @@
- V 1.6.1.0:
- Added a feature to list bibliography information via `trident list --bibliography`.
- Added a new Server API `/bibliography` to serve bibliography information via HTTP.
- V 1.6.0.0:
- Added support to write gzipped EIGENSTRAT and PLINK files with `genoconvert` and `forge`. Both commands get a new option `-z` which creates gzipped output.
- V 1.5.7.4:
Expand Down
6 changes: 3 additions & 3 deletions poseidon-hs.cabal
Original file line number Diff line number Diff line change
@@ -1,5 +1,5 @@
name: poseidon-hs
version: 1.6.0.0
version: 1.6.1.0
synopsis: A package with tools for working with Poseidon genotype data
description: The tools in this package read and analyse Poseidon-formatted genotype databases, a modular system for storing genotype data from thousands of individuals.
license: MIT
Expand Down Expand Up @@ -28,11 +28,11 @@ library
build-depends: base >= 4.7 && < 5, sequence-formats>=1.6.1, text, time, pipes-safe,
exceptions, pipes, bytestring, filepath, yaml, aeson, directory, parsec,
vector, pipes-ordered-zip, table-layout<1.0.0.0, mtl, split, warp, warp-tls, wai-cors,
-- scotty is set to < 0.20.0, because the current stackage resolver 21.17 does not have the latest version, but cabal uses it in the release action, if this constraint is not set
scotty < 0.20.0, cassava, pureMD5, wai, githash,
http-conduit, conduit, http-types, zip-archive,
unordered-containers, network-uri, optparse-applicative, co-log, regex-tdfa,
scientific, country, generics-sop, containers, process, deepseq, template-haskell
scientific, country, generics-sop, containers, process, deepseq, template-haskell,
MissingH
default-language: Haskell2010

executable trident
Expand Down
30 changes: 27 additions & 3 deletions src/Poseidon/BibFile.hs
Original file line number Diff line number Diff line change
@@ -1,11 +1,16 @@
{-# LANGUAGE OverloadedStrings #-}
module Poseidon.BibFile (dummyBibEntry, readBibTeXFile, writeBibTeXFile, BibTeX, BibEntry(..)) where
module Poseidon.BibFile (dummyBibEntry, readBibTeXFile, writeBibTeXFile, BibTeX, BibEntry(..), parseAuthors, authorAbbrvString) where

import Poseidon.Utils (PoseidonException (..),
showParsecErr)

import Control.Exception (throwIO)
import Control.Monad (forM_, liftM2, liftM3)
import Control.Monad (forM, forM_, liftM2,
liftM3)
import Control.Monad.Catch (MonadThrow, throwM)
import Data.List (intercalate)
import Data.List.Split (splitOn)
import Data.String.Utils (strip)
import System.IO (IOMode (..), hPutStrLn,
withFile)
import Text.Parsec (between, char, many, many1,
Expand Down Expand Up @@ -53,7 +58,7 @@
readBibTeXFile bibPath = do
res <- parseFromFile bibFileParser bibPath
case res of
Left err -> throwIO $ PoseidonBibTeXException bibPath $ showParsecErr err
Left err -> throwIO . PoseidonBibTeXException $ "In file " ++ bibPath ++ ": " ++ showParsecErr err

Check warning on line 61 in src/Poseidon/BibFile.hs

View check run for this annotation

Codecov / codecov/patch

src/Poseidon/BibFile.hs#L61

Added line #L61 was not covered by tests
Right res_ -> return res_

{-
Expand Down Expand Up @@ -143,3 +148,22 @@
comma :: CharParser st String
comma = T.comma lexer

parseAuthors :: (MonadThrow m) => String -> m [(String, String)] -- parses a string of authors to a list of first names and last names
parseAuthors authorString = forM (splitOn " and " (intercalate " " . map strip . lines $ authorString)) $ \singleAuthorStr -> do
case splitOn ", " singleAuthorStr of
[lastName, firstName] -> return (firstName, lastName)
[firstName] -> return (firstName, "") -- in some cultures there are single first names, e.g. "Nini"
_ -> throwM . PoseidonBibTeXException $ "cannot parse bib-author " ++ singleAuthorStr

Check warning on line 156 in src/Poseidon/BibFile.hs

View check run for this annotation

Codecov / codecov/patch

src/Poseidon/BibFile.hs#L156

Added line #L156 was not covered by tests

authorAbbrvString :: [(String, String)] -> String
authorAbbrvString [] = ""
authorAbbrvString [firstAuthor] = renderAuthor True firstAuthor
authorAbbrvString [firstAuthor, secondAuthor] = renderAuthor False firstAuthor ++ " and " ++ renderAuthor False secondAuthor
authorAbbrvString (firstAuthor : _) = renderAuthor False firstAuthor ++ " et al."

-- first argument is whether to render an author in full, or abbreviated.
renderAuthor :: Bool -> (String, String) -> String
renderAuthor True (first, "") = first -- there are edge cases where only first names exist, like in Consortia
renderAuthor True (first, family) = first ++ " " ++ family
renderAuthor False (first, "") = first

Check warning on line 168 in src/Poseidon/BibFile.hs

View check run for this annotation

Codecov / codecov/patch

src/Poseidon/BibFile.hs#L168

Added line #L168 was not covered by tests
renderAuthor False (first, family) = [head first] ++ " " ++ family
77 changes: 68 additions & 9 deletions src/Poseidon/CLI/List.hs
Original file line number Diff line number Diff line change
Expand Up @@ -2,25 +2,27 @@

module Poseidon.CLI.List (runList, ListOptions(..), ListEntity(..), RepoLocationSpec(..)) where

import Poseidon.BibFile (authorAbbrvString, parseAuthors)
import Poseidon.EntityTypes (HasNameAndVersion (..))
import Poseidon.Package (PackageReadOptions (..),
defaultPackageReadOptions,
getAllGroupInfo,
getAllGroupInfo, getBibliographyInfo,
getExtendedIndividualInfo,
packagesToPackageInfos,
readPoseidonPackageCollection)
import Poseidon.ServerClient (AddJannoColSpec (..),
ApiReturnData (..),
import Poseidon.ServerClient (AddColSpec (..), ApiReturnData (..),
ArchiveEndpoint (..),
BibliographyInfo (..),
ExtendedIndividualInfo (..),
GroupInfo (..), PackageInfo (..),
processApiResponse, qDefault)
import Poseidon.Utils (PoseidonIO, logInfo, logWarning)

import Control.Monad (forM_, when)
import Control.Monad.IO.Class (liftIO)
import Data.List (intercalate, sortOn)
import Data.List (intercalate, nub, sortOn)
import Data.Maybe (catMaybes, fromMaybe)
import qualified Data.Text as T
import Data.Version (Version, showVersion)

import Text.Layout.Table (asciiRoundS, column, def, expandUntil,
Expand All @@ -39,7 +41,8 @@
-- | A datatype to represent the options what to list
data ListEntity = ListPackages
| ListGroups
| ListIndividuals AddJannoColSpec
| ListIndividuals AddColSpec
| ListBibliography AddColSpec

-- | The main function running the list command
runList :: ListOptions -> PoseidonIO ()
Expand Down Expand Up @@ -95,9 +98,9 @@
RepoRemote (ArchiveEndpoint remoteURL archive) -> do
logInfo "Downloading individual data from server"
let addJannoColFlag = case addJannoColSpec of
AddJannoColAll -> "&additionalJannoColumns=ALL"
AddJannoColList [] -> ""
AddJannoColList moreJannoColumns -> "&additionalJannoColumns=" ++ intercalate "," moreJannoColumns
AddColAll -> "&additionalJannoColumns=ALL"
AddColList [] -> ""

Check warning on line 102 in src/Poseidon/CLI/List.hs

View check run for this annotation

Codecov / codecov/patch

src/Poseidon/CLI/List.hs#L102

Added line #L102 was not covered by tests
AddColList moreJannoColumns -> "&additionalJannoColumns=" ++ intercalate "," moreJannoColumns
apiReturn <- processApiResponse (remoteURL ++ "/individuals" ++ qDefault archive ++ addJannoColFlag) False
case apiReturn of
ApiReturnExtIndividualInfo indInfo -> return indInfo
Expand All @@ -114,7 +117,7 @@
-- we only output this warning if the columns were requested explicitly. Not if
-- all columns were requested. We consider such an "all" request to mean "all columns that are present".
case addJannoColSpec of
AddJannoColList (_:_) -> do
AddColList (_:_) -> do
forM_ (zip [0..] addJannoCols) $ \(i, columnKey) -> do
-- check entries in all individuals for that key
let nonEmptyEntries = catMaybes [snd (entries !! i) | ExtendedIndividualInfo _ _ _ _ entries <- extIndInfos]
Expand All @@ -129,6 +132,57 @@
showMaybeVersion (getPacVersion i), show isLatest] ++
map (fromMaybe "n/a" . snd) addColumnEntries
return (tableH, tableB)
ListBibliography addColSpec -> do
bibInfos <- case repoLocation of
RepoRemote (ArchiveEndpoint remoteURL archive) -> do
logInfo "Downloading bibliography data from server"
let addJannoColFlag = case addColSpec of
AddColAll -> "&additionalBibColumns=ALL"
AddColList [] -> ""
AddColList moreBibFields -> "&additionalBibColumns=" ++ intercalate "," moreBibFields
apiReturn <- processApiResponse (remoteURL ++ "/bibliography" ++ qDefault archive ++ addJannoColFlag) False
case apiReturn of
ApiReturnBibInfo bibInfo -> return bibInfo
_ -> error "should not happen"

Check warning on line 146 in src/Poseidon/CLI/List.hs

View check run for this annotation

Codecov / codecov/patch

src/Poseidon/CLI/List.hs#L137-L146

Added lines #L137 - L146 were not covered by tests
RepoLocal baseDirs -> do
pacCollection <- readPoseidonPackageCollection pacReadOpts baseDirs
getBibliographyInfo pacCollection addColSpec

let addBibFieldNames = case addColSpec of
AddColAll -> nub . concatMap (map fst . bibInfoAddCols) $ bibInfos
AddColList names -> names

Check warning on line 153 in src/Poseidon/CLI/List.hs

View check run for this annotation

Codecov / codecov/patch

src/Poseidon/CLI/List.hs#L153

Added line #L153 was not covered by tests

-- warning in case the additional Columns do not exist in the entire janno dataset,
-- we only output this warning if the columns were requested explicitly. Not if
-- all columns were requested. We consider such an "all" request to mean "all columns that are present".
case addColSpec of
AddColList (_:_) -> do
forM_ addBibFieldNames $ \bibFieldKey -> do

Check warning on line 160 in src/Poseidon/CLI/List.hs

View check run for this annotation

Codecov / codecov/patch

src/Poseidon/CLI/List.hs#L159-L160

Added lines #L159 - L160 were not covered by tests
-- check entries in all individuals for that key
let nonEmptyEntries = do
bibInfo <- bibInfos
Just (Just _) <- return $ bibFieldKey `lookup` bibInfoAddCols bibInfo
return ()
when (null nonEmptyEntries) . logWarning $
"Bibliography field " ++ bibFieldKey ++ "is not present in any bibliography entry"

Check warning on line 167 in src/Poseidon/CLI/List.hs

View check run for this annotation

Codecov / codecov/patch

src/Poseidon/CLI/List.hs#L162-L167

Added lines #L162 - L167 were not covered by tests
_ -> return ()

let tableH = ["BibKey", "Title", "Author", "Year", "DOI",
"Nr of samples"] ++ addBibFieldNames
tableB = do
bibInfo <- bibInfos
let addBibFieldColumns = do
bibFieldName <- addBibFieldNames
case bibFieldName `lookup` bibInfoAddCols bibInfo of
Just (Just v) -> return v
_ -> return ""

Check warning on line 178 in src/Poseidon/CLI/List.hs

View check run for this annotation

Codecov / codecov/patch

src/Poseidon/CLI/List.hs#L174-L178

Added lines #L174 - L178 were not covered by tests
authors <- parseAuthors . curateBibField . bibInfoAuthor $ bibInfo
return $ [bibInfoKey bibInfo, curateBibField $ bibInfoTitle bibInfo, authorAbbrvString authors,
curateBibField $ bibInfoYear bibInfo,
curateBibField $ bibInfoDoi bibInfo, show (bibInfoNrSamples bibInfo)] ++
addBibFieldColumns

Check warning on line 183 in src/Poseidon/CLI/List.hs

View check run for this annotation

Codecov / codecov/patch

src/Poseidon/CLI/List.hs#L183

Added line #L183 was not covered by tests
return (tableH, tableB)

if rawOutput then
liftIO $ putStrLn $ intercalate "\n" [intercalate "\t" row | row <- tableH:tableB]
else do
Expand All @@ -139,3 +193,8 @@
showMaybe = fromMaybe "n/a"
showMaybeVersion :: Maybe Version -> String
showMaybeVersion = maybe "n/a" showVersion

-- this function is necessary, as BibTeX sometimes has arbitrary line breaks within fields,
-- which we need to get rid of to avoid down-stream problems
curateBibField :: Maybe String -> String
curateBibField = T.unpack . T.intercalate " " . map T.strip . T.lines . T.pack . fromMaybe ""
22 changes: 17 additions & 5 deletions src/Poseidon/CLI/OptparseApplicativeParsers.hs
Original file line number Diff line number Diff line change
Expand Up @@ -21,7 +21,7 @@
GenotypeDataSpec (..),
GenotypeFileSpec (..),
SNPSetSpec (..))
import Poseidon.ServerClient (AddJannoColSpec (..),
import Poseidon.ServerClient (AddColSpec (..),
ArchiveEndpoint (..))
import Poseidon.Utils (ErrorLength (..), LogMode (..),
TestMode (..),
Expand Down Expand Up @@ -636,7 +636,10 @@
OP.value Nothing)

parseListEntity :: OP.Parser ListEntity
parseListEntity = parseListPackages <|> parseListGroups <|> (parseListIndividualsDummy *> parseListIndividualsExtraCols)
parseListEntity = parseListPackages <|>
parseListGroups <|>
(parseListIndividualsDummy *> parseListIndividualsExtraCols) <|>
(parseListBibliographyDummy *> parseListBibliographyExtraFields)

Check warning on line 642 in src/Poseidon/CLI/OptparseApplicativeParsers.hs

View check run for this annotation

Codecov / codecov/patch

src/Poseidon/CLI/OptparseApplicativeParsers.hs#L639-L642

Added lines #L639 - L642 were not covered by tests
where
parseListPackages = OP.flag' ListPackages (
OP.long "packages" <>
Expand All @@ -648,14 +651,23 @@
parseListIndividualsDummy = OP.flag' () (
OP.long "individuals" <>
OP.help "List all individuals/samples.")
parseListIndividualsExtraCols = ListIndividuals <$> (parseAllJannoCols <|> (AddJannoColList <$> OP.many parseExtraCol))
parseAllJannoCols = OP.flag' AddJannoColAll (OP.long "fullJanno" <> OP.help "output all Janno Columns")
parseListIndividualsExtraCols = ListIndividuals <$> (parseAllJannoCols <|> (AddColList <$> OP.many parseExtraCol))
parseAllJannoCols = OP.flag' AddColAll (OP.long "fullJanno" <> OP.help "output all Janno Columns")

Check warning on line 655 in src/Poseidon/CLI/OptparseApplicativeParsers.hs

View check run for this annotation

Codecov / codecov/patch

src/Poseidon/CLI/OptparseApplicativeParsers.hs#L654-L655

Added lines #L654 - L655 were not covered by tests
parseExtraCol = OP.strOption (
OP.short 'j' <>
OP.long "jannoColumn" <>
OP.metavar "COLNAME" <>
OP.help "List additional fields from the janno files, using the .janno column heading name, such as \
\\"Country\", \"Site\", \"Date_C14_Uncal_BP\", etc..")
\\"Country\", \"Site\", \"Date_C14_Uncal_BP\", etc... Can be given multiple times")
parseListBibliographyDummy = OP.flag' () (
OP.long "bibliography" <> OP.help "output bibliography information for packages")
parseListBibliographyExtraFields = ListBibliography <$>
(parseAllBibFields <|> (AddColList <$> OP.many parseExtraBibFields))
parseAllBibFields = OP.flag' AddColAll (OP.long "fullBib" <> OP.help "output all bibliography fields found in any bibliography item")
parseExtraBibFields = OP.strOption (
OP.short 'b' <> OP.long "bibField" <> OP.metavar "BIB-FIELD" <>
OP.help "List information from the given bibliography field, for example \"abstract\" or \"publisher\". Can \

Check warning on line 669 in src/Poseidon/CLI/OptparseApplicativeParsers.hs

View check run for this annotation

Codecov / codecov/patch

src/Poseidon/CLI/OptparseApplicativeParsers.hs#L662-L669

Added lines #L662 - L669 were not covered by tests
\be given multiple times.")

parseRawOutput :: OP.Parser Bool
parseRawOutput = OP.switch (
Expand Down
22 changes: 18 additions & 4 deletions src/Poseidon/CLI/Serve.hs
Original file line number Diff line number Diff line change
Expand Up @@ -12,11 +12,12 @@
PoseidonPackage (..),
defaultPackageReadOptions,
getAllGroupInfo,
getBibliographyInfo,
getExtendedIndividualInfo,
packagesToPackageInfos,
readPoseidonPackageCollection)
import Poseidon.PoseidonVersion (minimalRequiredClientVersion)
import Poseidon.ServerClient (AddJannoColSpec (..),
import Poseidon.ServerClient (AddColSpec (..),
ApiReturnData (..),
ServerApiReturnType (..))
import Poseidon.Utils (LogA, PoseidonIO, envLogAction,
Expand Down Expand Up @@ -123,14 +124,27 @@
pacs <- getItemFromArchiveStore archiveStore
maybeAdditionalColumnsString <- (Just <$> param "additionalJannoColumns") `rescue` (\_ -> return Nothing)
indInfo <- case maybeAdditionalColumnsString of
Just "ALL" -> getExtendedIndividualInfo pacs AddJannoColAll -- Nothing means all Janno Columns
Just "ALL" -> getExtendedIndividualInfo pacs AddColAll -- Nothing means all Janno Columns
Just additionalColumnsString ->
let additionalColumnNames = splitOn "," additionalColumnsString
in getExtendedIndividualInfo pacs (AddJannoColList additionalColumnNames)
Nothing -> getExtendedIndividualInfo pacs (AddJannoColList [])
in getExtendedIndividualInfo pacs (AddColList additionalColumnNames)
Nothing -> getExtendedIndividualInfo pacs (AddColList [])
let retData = ApiReturnExtIndividualInfo indInfo
return $ ServerApiReturnType [] (Just retData)

get "/bibliography" . conditionOnClientVersion $ do
logRequest logA
pacs <- getItemFromArchiveStore archiveStore
maybeAdditionalBibFieldsString <- (Just <$> param "additionalBibColumns") `rescue` (\_ -> return Nothing)
bibInfo <- case maybeAdditionalBibFieldsString of
Just "ALL" -> getBibliographyInfo pacs AddColAll -- Nothing means all Janno Columns

Check warning on line 140 in src/Poseidon/CLI/Serve.hs

View check run for this annotation

Codecov / codecov/patch

src/Poseidon/CLI/Serve.hs#L136-L140

Added lines #L136 - L140 were not covered by tests
Just additionalBibFieldsString ->
let additionalBibFields = splitOn "," additionalBibFieldsString
in getBibliographyInfo pacs (AddColList additionalBibFields)
Nothing -> getBibliographyInfo pacs (AddColList [])
let retData = ApiReturnBibInfo bibInfo
return $ ServerApiReturnType [] (Just retData)

Check warning on line 146 in src/Poseidon/CLI/Serve.hs

View check run for this annotation

Codecov / codecov/patch

src/Poseidon/CLI/Serve.hs#L142-L146

Added lines #L142 - L146 were not covered by tests

-- API for retreiving package zip files
when (isJust maybeZipPath) . get "/zip_file/:package_name" $ do
logRequest logA
Expand Down
Loading
Loading