Crawler API (1.0.0)
API to configure and manage the Algolia Crawler.
List available Crawlers.
Authorizations:
query Parameters
itemsPerPage | integer [ 1 .. 100 ] Default: 20 Change the number of items per page. |
page | integer [ 1 .. 100 ] Default: 1 Change the page number. |
name | string Example: name=MyCrawlerName Filter by crawler name. |
appId | string Example: appId=XXXXXXX123 Filter by Application ID. |
Responses
Response samples
- 200
- 400
{
  "items": [
    {
      "id": "e0f6db8a-24f5-4092-83a4-1b2c6cb6d809",
      "name": "My Crawler"
    }
  ],
  "itemsPerPage": 20,
  "page": 1,
  "total": 100
}
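The query parameters listed above can be checked on the client before a request is sent. The following is a minimal sketch in Python; the helper name `build_list_query` is hypothetical and not part of the API, and only the parameter names and ranges are taken from the table above.

```python
from typing import Optional
from urllib.parse import urlencode


def build_list_query(items_per_page: int = 20, page: int = 1,
                     name: Optional[str] = None,
                     app_id: Optional[str] = None) -> str:
    """Return a URL-encoded query string for listing Crawlers.

    Hypothetical helper: it only enforces the ranges documented above.
    """
    if not 1 <= items_per_page <= 100:
        raise ValueError("itemsPerPage must be between 1 and 100")
    if not 1 <= page <= 100:
        raise ValueError("page must be between 1 and 100")
    params = {"itemsPerPage": items_per_page, "page": page}
    # Optional filters are only sent when the caller provides them.
    if name is not None:
        params["name"] = name
    if app_id is not None:
        params["appId"] = app_id
    return urlencode(params)
```

For example, `build_list_query(name="MyCrawlerName")` yields `itemsPerPage=20&page=1&name=MyCrawlerName`, which can be appended to the listing URL.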
Create a new Crawler with the given config.
Authorizations:
Request Body schema: application/json
name required | string (CrawlerName) <= 64 characters The name of the Crawler. |
config required | object (Configuration) A Crawler configuration object. See the Crawler documentation for more details. |
Responses
Request samples
- Payload
{
  "name": "My Crawler",
  "config": {
    "appId": "ABC9DEFGHI",
    "apiKey": "c69564c68bad256f8d11399bf2048f82",
    "indexPrefix": "crawler_",
    "rateLimit": 8,
    "actions": [
      {
        "indexName": "algolia_website",
        "selectorsToMatch": [
          ".products",
          "!.featured"
        ],
        "fileTypesToMatch": [
          "html",
          "pdf"
        ],
        "recordExtractor": {
          "__type": "function",
          "source": "() => {}"
        }
      }
    ]
  }
}
Response samples
- 200
- 400
{
  "id": "string"
}
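The request body constraints above (a `name` of at most 64 characters and a `config` object) can be enforced before the payload is serialized. This is a sketch only; `build_create_payload` is a hypothetical helper name, and it does not validate the contents of the configuration itself.

```python
import json
from typing import Any, Dict


def build_create_payload(name: str, config: Dict[str, Any]) -> str:
    """Serialize the body for creating a Crawler.

    Hypothetical helper: checks only the constraints stated in the
    request body schema above.
    """
    if len(name) > 64:
        raise ValueError("name must be at most 64 characters")
    if not isinstance(config, dict):
        raise TypeError("config must be a JSON object (dict)")
    return json.dumps({"name": name, "config": config})
```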
Get information about the specified Crawler and its configuration.
Authorizations:
path Parameters
id required | string The Id of the targeted Crawler. |
query Parameters
withConfig | boolean Whether or not the configuration should be returned in the response (in the 'config' field). |
Responses
Response samples
- 200
- 400
{
  "name": "My Crawler",
  "createdAt": "2019-05-10T07:58:41.146Z",
  "updatedAt": "2019-05-10T08:16:47.920Z",
  "running": true,
  "reindexing": true,
  "blocked": false,
  "blockingError": "Error: Failed to fetch external data for source 'testCSV': 404",
  "blockingTaskId": "string",
  "lastReindexStartedAt": "2019-05-10T08:16:47.920Z",
  "lastReindexEndedAt": null,
  "config": {
    "appId": "ABC9DEFGHI",
    "apiKey": "c69564c68bad256f8d11399bf2048f82",
    "indexPrefix": "crawler_",
    "rateLimit": 8,
    "actions": [
      {
        "indexName": "algolia_website",
        "selectorsToMatch": [
          ".products",
          "!.featured"
        ],
        "fileTypesToMatch": [
          "html",
          "pdf"
        ],
        "recordExtractor": {
          "__type": "function",
          "source": "() => {}"
        }
      }
    ]
  }
}
Update parts of the Crawler, either its name, its config, or both.
Authorizations:
path Parameters
id required | string The Id of the targeted Crawler. |
Request Body schema: application/json
name | string (CrawlerName) <= 64 characters The name of the Crawler. |
config | object (Configuration) A Crawler configuration object. See the Crawler documentation for more details. |
Responses
Request samples
- Payload
{
  "name": "My Crawler",
  "config": {
    "appId": "ABC9DEFGHI",
    "apiKey": "c69564c68bad256f8d11399bf2048f82",
    "indexPrefix": "crawler_",
    "rateLimit": 8,
    "actions": [
      {
        "indexName": "algolia_website",
        "selectorsToMatch": [
          ".products",
          "!.featured"
        ],
        "fileTypesToMatch": [
          "html",
          "pdf"
        ],
        "recordExtractor": {
          "__type": "function",
          "source": "() => {}"
        }
      }
    ]
  }
}
Response samples
- 200
- 400
{
  "taskId": "e0f6db8a-24f5-4092-83a4-1b2c6cb6d809"
}
Update parts of the Crawler configuration.
Authorizations:
path Parameters
id required | string The Id of the targeted Crawler. |
Request Body schema: application/json
A partial config object that will be injected into the current one.
Responses
Request samples
- Payload
{
  "rateLimit": 10
}
Response samples
- 200
- 400
{
  "taskId": "e0f6db8a-24f5-4092-83a4-1b2c6cb6d809"
}
Get a summary of the current status of crawled URLs for the specified Crawler.
Authorizations:
path Parameters
id required | string The Id of the targeted Crawler. |
Responses
Response samples
- 200
{
  "count": 0,
  "data": [
    {
      "status": "SKIPPED",
      "reason": "forbidden_by_robotstxt",
      "category": "fetch",
      "nbUrls": 3,
      "readable": "Forbidden by robots.txt"
    }
  ]
}
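A response shaped like the sample above can be condensed into a per-status total. The sketch below assumes only the `data`, `status`, and `nbUrls` fields shown in the sample; the function name `urls_by_status` is hypothetical.

```python
from collections import defaultdict
from typing import Any, Dict


def urls_by_status(stats: Dict[str, Any]) -> Dict[str, int]:
    """Sum nbUrls per status across all entries of a URL stats response."""
    totals: Dict[str, int] = defaultdict(int)
    for entry in stats.get("data", []):
        # Several entries can share a status but differ in reason.
        totals[entry["status"]] += entry["nbUrls"]
    return dict(totals)
```

Applied to the sample response, this returns `{"SKIPPED": 3}`.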
Test a URL against the crawler's config.
Test a URL against the given Crawler's config to see what will be processed. You can also override parts of the configuration to try out changes before saving them.
Authorizations:
path Parameters
id required | string The Id of the targeted Crawler. |
Request Body schema: application/json
url required | string The URL to test. |
config | object A partial configuration object that is merged with the saved configuration. This lets you test changes to a configuration before saving it. Note that this is not a deep merge: each top-level field you pass replaces the saved one entirely. |
Responses
Request samples
- Payload
{
  "config": { }
}
Response samples
- 200
- 400
{
  "startDate": "2019-05-21T09:04:33.742Z",
  "endDate": "2019-05-21T09:04:33.923Z",
  "logs": [
    [
      "Processing url 'https://www.algolia.com/blog'"
    ]
  ],
  "records": [
    {
      "indexName": "testIndex",
      "recordsPerExtractor": [ ]
    }
  ],
  "externalData": {
    "externalData1": {
      "data1": "val1",
      "data2": "val2"
    },
    "externalData2": {
      "data1": "val1",
      "data2": "val2"
    }
  },
  "error": { }
}
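The `config` override for URL tests is described as a merge that is not deep: top-level fields replace the saved ones. The following sketch illustrates that behavior locally; `merge_test_config` is a hypothetical name, and the actual merge happens on the server.

```python
from typing import Any, Dict


def merge_test_config(saved: Dict[str, Any],
                      override: Dict[str, Any]) -> Dict[str, Any]:
    """Shallow merge: every top-level key in override replaces the saved one."""
    return {**saved, **override}


saved = {
    "rateLimit": 8,
    "actions": [{"indexName": "algolia_website", "fileTypesToMatch": ["html", "pdf"]}],
}

# Overriding a scalar leaves the other top-level fields untouched.
only_rate = merge_test_config(saved, {"rateLimit": 10})

# Overriding "actions" replaces the whole list; nested fields are not merged.
new_actions = merge_test_config(saved, {"actions": [{"indexName": "testIndex"}]})
```

Because the merge is shallow, an override that passes `actions` must contain every action and every nested field that the test should use.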
Cancel a blocking action on your Crawler.
Authorizations:
path Parameters
id required | string The Id of the targeted Crawler. |
tid required | string The Id of the targeted Task. |
Responses
Response samples
- 400
{
  "error": {
    "code": "malformed_id"
  }
}
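The `tid` path parameter plausibly corresponds to the `blockingTaskId` field returned when fetching a Crawler, although the reference does not state this explicitly. Under that assumption, a small hypothetical helper (`blocking_task_id`) can pick the task to cancel from a Crawler response:

```python
from typing import Any, Dict, Optional


def blocking_task_id(crawler: Dict[str, Any]) -> Optional[str]:
    """Return the ID of the blocking task, or None if the Crawler is not blocked.

    Assumption: "blocked" and "blockingTaskId" are read as shown in the
    Crawler response sample; their exact semantics are not documented here.
    """
    if crawler.get("blocked"):
        return crawler.get("blockingTaskId")
    return None
```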
List crawler versions.
List crawler config versions.
Authorizations:
path Parameters
id required | string The Id of the targeted Crawler. |
query Parameters
itemsPerPage | integer [ 1 .. 100 ] Default: 20 Change the number of versions per page. |
page | integer [ 1 .. 5000 ] Default: 1 Change the page number. |
Responses
Response samples
- 200
{
  "items": [
    {
      "version": 1,
      "createdAt": "string",
      "authorId": "string"
    }
  ],
  "itemsPerPage": 20,
  "page": 1,
  "total": 100
}
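Paginated responses such as the one above can be walked page by page. The sketch below assumes that `total` is the overall number of items across all pages, which the reference does not spell out. The `fetch_page` callable is a hypothetical stand-in for whatever performs the HTTP request, so the example makes no assumption about URLs or authentication.

```python
from typing import Any, Callable, Dict, Iterator


def iter_items(fetch_page: Callable[[int, int], Dict[str, Any]],
               items_per_page: int = 20) -> Iterator[Dict[str, Any]]:
    """Yield every item from a paginated listing.

    fetch_page(page, items_per_page) must return a dict shaped like the
    sample response: {"items": [...], "itemsPerPage": ..., "page": ..., "total": ...}.
    """
    page = 1
    seen = 0
    while True:
        body = fetch_page(page, items_per_page)
        items = body.get("items", [])
        for item in items:
            yield item
        seen += len(items)
        # Stop on an empty page or once the reported total has been reached.
        if not items or seen >= body.get("total", 0):
            break
        page += 1
```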
Get a specific version of the configuration of a crawler.
Authorizations:
path Parameters
id required | string The Id of the targeted Crawler. |
version required | integer The version of the targeted Crawler revision. |
Responses
Response samples
- 200
{
  "version": 1,
  "config": {
    "appId": "ABC9DEFGHI",
    "apiKey": "c69564c68bad256f8d11399bf2048f82",
    "indexPrefix": "crawler_",
    "rateLimit": 8,
    "actions": [
      {
        "indexName": "algolia_website",
        "selectorsToMatch": [
          ".products",
          "!.featured"
        ],
        "fileTypesToMatch": [
          "html",
          "pdf"
        ],
        "recordExtractor": {
          "__type": "function",
          "source": "() => {}"
        }
      }
    ]
  },
  "createdAt": "string",
  "authorId": "string"
}
Immediately crawl some URLs and update the live index.
The passed URLs will be crawled immediately, and the generated records will be pushed to the live index if no reindex is currently running. If a reindex is running, the records will be pushed to the temporary index.
Authorizations:
path Parameters
id required | string The Id of the targeted Crawler. |
Request Body schema: application/json
urls required | Array of strings |
save | boolean If true, the given URLs will be added to the |
Responses
Request samples
- Payload
{
  "save": true
}
Response samples
- 200
- 400
{
  "taskId": "e0f6db8a-24f5-4092-83a4-1b2c6cb6d809"
}
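The request sample above shows only `save`, while the schema also marks `urls` as required. A body containing both fields could be assembled as in the following sketch; `build_crawl_payload` is a hypothetical helper, and it only checks that `urls` is a list of strings.

```python
import json
from typing import List, Optional


def build_crawl_payload(urls: List[str], save: Optional[bool] = None) -> str:
    """Serialize the body for an immediate crawl of the given URLs."""
    if not isinstance(urls, list) or not all(isinstance(u, str) for u in urls):
        raise TypeError("urls must be a list of strings")
    payload = {"urls": urls}
    # "save" is optional, so it is only included when set explicitly.
    if save is not None:
        payload["save"] = save
    return json.dumps(payload)
```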
List registered Domains.
Authorizations:
query Parameters
itemsPerPage | integer [ 1 .. 100 ] Default: 20 Change the number of items per page. |
page | integer [ 1 .. 100 ] Default: 1 Change the page number. |
appId | string Example: appId=XXXXXXX123 Filter by Application ID. |
Responses
Response samples
- 200
- 400
- 403
{
  "items": [
    {
      "appId": "string",
      "domain": "string",
      "validated": true
    }
  ],
  "itemsPerPage": 20,
  "page": 1,
  "total": 100
}