Tika - Clean Contents
Clean associated content binary when removing a file.

Apache Tika is an independant, open source, content extractor that supports a very wide range of file formats. It can even support OCR for extracting text from images (see the dedicated job for content extraction).
This job is here to clean textual content that is stored on the Cells-side when deleting a file.
Parameters
Name | Type | Default | Mandatory | Description |
---|---|---|---|---|
Extensions | text | pdf,doc,docx,html,xlsx,xls,pptx,key | false | Files extensions to consider. |
Trigger Type
Event-based
JSON Representation
{
"Label": "Tika - Clean Contents||Clean associated content binary when removing a file||mdi mdi-delete-forever",
"Owner": "pydio.system.user",
"Custom": true,
"EventNames": [
"NODE_CHANGE:5"
],
"Actions": [
{
"ID": "actions.script.anko",
"Parameters": {
"script": "output = input\noutput.Nodes[0].Path = \"pydio-binaries/tika-\" + input.Nodes[0].Uuid + \".gz\""
},
"ChainedActions": [
{
"ID": "actions.tree.delete",
"Parameters": {
"ignoreNonExisting": "true"
}
}
]
}
],
"MaxConcurrency": 20,
"NodeEventFilter": {
"Query": {
"SubQueries": [
{
"type_url": "type.googleapis.com/service.Query",
"value": "CiwKHnR5cGUuZ29vZ2xlYXBpcy5jb20vdHJlZS5RdWVyeRIKOgYucHlkaW9wARAB"
},
{
"type_url": "type.googleapis.com/service.Query",
"value": "CiQKHnR5cGUuZ29vZ2xlYXBpcy5jb20vdHJlZS5RdWVyeRICMAEQAQ=="
},
{
"type_url": "type.googleapis.com/tree.Query",
"value": "Uh17ey5Kb2JQYXJhbWV0ZXJzLkV4dGVuc2lvbnN9fQ=="
}
],
"Operation": 1
},
"Label": "Restricted Extensions",
"Description": "Keep only files, excluding .pydio hidden files"
},
"Parameters": [
{
"Name": "Extensions",
"Description": "Files extensions to consider.",
"Value": "pdf,doc,docx,html,xlsx,xls,pptx,key",
"Type": "text"
}
]
}
Back to top