crawlee-one / Exports / CrawleeOneArgs
Args object passed to `crawleeOne`.
Name | Type |
---|---|
`TType` | extends `CrawlerType` |
`T` | extends `CrawleeOneCtx<CrawlerMeta<TType>["context"]>` |
- crawlerConfig
- crawlerConfigDefaults
- hooks
- input
- inputDefaults
- io
- mergeInput
- name
- proxy
- router
- routes
- telemetry
- type
• Optional crawlerConfig: `Omit<CrawlerMeta<TType, CrawlingContext<unknown, Dictionary>, Record<string, any>>["options"], "requestHandler">`

Crawlee crawler configuration that CANNOT be overridden via `input` and `crawlerConfigDefaults`.
• Optional crawlerConfigDefaults: `Omit<CrawlerMeta<TType, CrawlingContext<unknown, Dictionary>, Record<string, any>>["options"], "requestHandler">`

Crawlee crawler configuration that CAN be overridden via `input` and `crawlerConfig`.
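Though the exact merge logic lives inside `crawleeOne`, the precedence described above can be pictured as a plain object spread. The sketch below is a standalone illustration with assumed option names, not the library's implementation:

```typescript
// Simplified sketch of the precedence described above (assumed, not the library's code).
type Config = Record<string, unknown>;

function resolveCrawlerConfig(
  crawlerConfigDefaults: Config, // CAN be overridden by the other two
  fromInput: Config,             // config derived from the actor input
  crawlerConfig: Config,         // CANNOT be overridden
): Config {
  return { ...crawlerConfigDefaults, ...fromInput, ...crawlerConfig };
}

const resolved = resolveCrawlerConfig(
  { maxRequestRetries: 5, maxConcurrency: 10 }, // crawlerConfigDefaults
  { maxConcurrency: 2 },                        // from input
  { maxRequestRetries: 1 },                     // crawlerConfig
);
console.log(resolved); // { maxRequestRetries: 1, maxConcurrency: 2 }
```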
• Optional hooks: `Object`

Name | Type |
---|---|
`onAfterHandler?` | `CrawleeOneRouteHandler<T, CrawleeOneActorRouterCtx<T>>` |
`onBeforeHandler?` | `CrawleeOneRouteHandler<T, CrawleeOneActorRouterCtx<T>>` |
`onReady?` | `(actor: CrawleeOneActorInst<T>) => MaybePromise<void>` |
`validateInput?` | `(input: null \| AllActorInputs) => MaybePromise<void>` |
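For illustration, a `validateInput` hook might reject runs whose input lacks required fields. The sketch below uses local stand-in types (`AllActorInputsSketch` is a hypothetical subset, not the real `AllActorInputs`):

```typescript
// Standalone sketch of a validateInput hook. The types here are local stand-ins
// (AllActorInputsSketch is a hypothetical subset of the real AllActorInputs).
type MaybePromise<T> = T | Promise<T>;

interface AllActorInputsSketch {
  startUrls?: string[];
}

const hooks = {
  // Reject the run early if the combined actor input is unusable.
  validateInput: (input: AllActorInputsSketch | null): MaybePromise<void> => {
    if (!input || !input.startUrls || input.startUrls.length === 0) {
      throw new Error('Missing "startUrls" in the actor input');
    }
  },
};

hooks.validateInput({ startUrls: ['https://example.com'] }); // passes silently
```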
• Optional input: `Partial<AllActorInputs>`

Input configuration that CANNOT be overridden via `inputDefaults` and `io.getInput()`.
• Optional inputDefaults: `Partial<AllActorInputs>`

Input configuration that CAN be overridden via `input` and `io.getInput()`.
• Optional io: `T["io"]`

Provide an instance that is responsible for state management:
- Adding scraped data to datasets
- Adding and removing requests to/from queues
- Cache storage

This is an API based on Apify's Actor utility class, which is also the default. You don't need to override this in most cases.

By default, the data is saved and kept locally in the `./storage` directory. If the crawler runs on the Apify platform, it uses Apify's cloud storage instead.

See CrawleeOneIO
• Optional mergeInput: `boolean | (sources: { defaults: Partial<AllActorInputs>; env: Partial<AllActorInputs>; overrides: Partial<AllActorInputs> }) => MaybePromise<Partial<AllActorInputs>>`

If `mergeInput` is truthy, input settings are merged from `inputDefaults`, `input`, and `io.getInput()`:

```ts
{ ...inputDefaults, ...io.getInput(), ...input }
```

If `mergeInput` is falsy, `io.getInput()` is ignored when `input` is provided. So the input is either:

```ts
{ ...inputDefaults, ...io.getInput() } // If `input` is not defined
```

OR

```ts
{ ...inputDefaults, ...input } // If `input` is defined
```

Alternatively, you can supply your own function that merges the sources:

```ts
{
  // `mergeInput` can also be async
  mergeInput: ({ defaults, overrides, env }) => {
    // This is the same as `mergeInput: true`
    return { ...defaults, ...env, ...overrides };
  },
}
```
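The two merge modes described above can be reproduced in a standalone sketch. This mirrors the documented behavior; it is not the library's own code:

```typescript
// Standalone reproduction of the merge rules described above (a sketch, not library code).
type Input = Record<string, unknown>;

function mergeSources(
  mergeInput: boolean,
  sources: { defaults: Input; env: Input; overrides?: Input },
): Input {
  const { defaults, env, overrides } = sources;
  if (mergeInput) {
    // truthy => { ...inputDefaults, ...io.getInput(), ...input }
    return { ...defaults, ...env, ...(overrides ?? {}) };
  }
  // falsy => io.getInput() (env) is ignored when `input` (overrides) is provided
  return overrides ? { ...defaults, ...overrides } : { ...defaults, ...env };
}

console.log(mergeSources(true, { defaults: { a: 1 }, env: { a: 2, b: 2 }, overrides: { b: 3 } }));
// => { a: 2, b: 3 }
```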
• Optional
name: string
Unique name of the crawler instance. The name may be used in codegen and logging.
• Optional proxy: `MaybeAsyncFn<ProxyConfiguration, [CrawleeOneActorDefWithInput<T>]>`

Configure the Crawlee proxy.

See ProxyConfiguration
• Optional router: `MaybeAsyncFn<RouterHandler<T["context"]>, [CrawleeOneActorDefWithInput<T>]>`

Provide a custom router instance.

By default, the router is created as:

```ts
import { Router } from 'crawlee';
Router.create();
```

See Router
• routes: `Record<T["labels"], CrawleeOneRoute<T, CrawleeOneActorRouterCtx<T>>>`
• Optional telemetry: `MaybeAsyncFn<T["telemetry"], [CrawleeOneActorDefWithInput<T>]>`
Provide a telemetry instance that is used for tracking errors.
• type: `"basic" | "http" | "cheerio" | "jsdom" | "playwright" | "puppeteer"`
Type specifying the Crawlee crawler class, input options, and more.
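Putting the pieces together, the overall shape of the args object can be sketched with local stand-in types (`ArgsSketch` and its fields are illustrative simplifications, not the real `CrawleeOneArgs` type):

```typescript
// Minimal illustration of the overall args shape.
// ArgsSketch and its fields are local stand-ins, not the real CrawleeOneArgs type.
type CrawlerType = 'basic' | 'http' | 'cheerio' | 'jsdom' | 'playwright' | 'puppeteer';

interface ArgsSketch {
  type: CrawlerType;            // picks the Crawlee crawler class
  name?: string;                // used in codegen and logging
  mergeInput?: boolean;         // input merge strategy
  routes: Record<string, { handlerLabel: string }>; // simplified route map
}

const args: ArgsSketch = {
  type: 'cheerio',
  name: 'exampleCrawler', // hypothetical name
  mergeInput: true,
  routes: {
    mainPage: { handlerLabel: 'MAIN' },
  },
};

console.log(args.type); // 'cheerio'
```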