Setting Up Recoll as a Local Full-Text Search Engine
Last updated: 2026-03-07 | Environment: Internal server | Access: [1]
| Property | Value |
|---|---|
| Recoll version | 1.43.13 + Xapian 1.4.22 |
| Server | [your-internal-server] |
| Search URL | [2] |
| Companion app URL | [3] (unchanged) |
| Files indexed | /var/projects/data (same folder as companion app) |
| Index database | /home/file-search/recoll_index |
| WebUI install path | /opt/recoll-webui |
| Config file | /root/.recoll/recoll.conf |
| SSL cert | [your-ssl-cert-path] |
| SSL key | [your-ssl-key-path] |
| Systemd service | /etc/systemd/system/recoll-webui.service |
| Nginx config | /etc/nginx/sites-available/your-app (appended) |
| Cron job | /etc/cron.d/recoll-index |
| Index log | /var/log/recoll-index.log |
If you've ever needed fast, reliable keyword search over a large collection of local files — PDFs, Word docs, spreadsheets, the works — Recoll is an excellent tool for the job. This guide walks through how we deployed it on an internal server alongside an existing application, giving us a clean search UI without disrupting anything that was already running.
Step 1 — Install Recoll
The version in Ubuntu's default repos tends to lag behind, so we add the official Recoll PPA to grab the latest release. We also install a handful of format helpers that let Recoll extract text from PDFs, DOCX files, RTF, and more:
```bash
# Add PPA for the latest Recoll version
sudo add-apt-repository ppa:recoll-backports/recoll-1.15-on
sudo apt update

# Install Recoll and the Python bindings used by the WebUI.
# Note: on Debian/Ubuntu the recoll package also pulls in the Qt GUI;
# recollcmd installs only the command-line tools if you want a lighter install.
sudo apt install recoll python3-recoll

# Install format helpers for PDF, DOC, RTF, etc.
sudo apt install poppler-utils antiword unrtf python3-mutagen \
    libimage-exiftool-perl catdoc python3-docx

# Verify the installation
recollindex --version
# Expected: Recoll 1.43.13 ...
```
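To confirm that the format helpers actually landed on the PATH, a small check like the one below can help. It is a sketch, not part of Recoll: the executable names and the package each one comes from (pdftotext from poppler-utils, exiftool from libimage-exiftool-perl, and so on) are our assumptions about these Debian/Ubuntu packages, and the Python library helpers (python3-mutagen, python3-docx) are not checked because they install no executables.

```python
"""Report which Recoll format-helper programs are missing from PATH (sketch)."""
import shutil

# Assumed mapping of helper executable -> Debian/Ubuntu package providing it.
HELPERS = {
    "pdftotext": "poppler-utils",
    "antiword": "antiword",
    "unrtf": "unrtf",
    "catdoc": "catdoc",
    "exiftool": "libimage-exiftool-perl",
}


def missing_helpers(helpers=HELPERS, which=shutil.which):
    """Return {executable: package} for every helper that `which` cannot find."""
    return {exe: pkg for exe, pkg in helpers.items() if which(exe) is None}


if __name__ == "__main__":
    missing = missing_helpers()
    if missing:
        print("Missing helpers:")
        for exe, pkg in sorted(missing.items()):
            print(f"  {exe} (package: {pkg})")
    else:
        print("All checked helpers found on PATH.")
```

Run it as a plain script after the apt installs; an empty report means every checked helper was found.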
Next, install the WebUI. Note that two WebUI repositories are floating around: the original koniu repository on GitHub, which is outdated, and the maintained fork on framagit. Use the framagit one:
```bash
# Install WebUI dependencies
sudo apt install git python3-waitress

# Clone the maintained fork (NOT the outdated koniu GitHub repo)
git clone https://framagit.org/medoc92/recollwebui.git /opt/recoll-webui
```
Step 2 — Configure to Use the Same Data Folder as Your Companion App
In our setup, Recoll indexes the same folder already used by another application. This keeps things simple — one source of truth, two ways to search it. Edit /root/.recoll/recoll.conf:
```ini
# /root/.recoll/recoll.conf
# Recoll only treats lines that start with '#' as comments, so keep
# comments on their own lines, never after a value.

# ── INDEXING PATHS ──────────────────────────────────────
# Same folder that the companion app ingests from
topdirs = /var/projects/data

# Separate index DB: does not interfere with the companion app's index
dbdir = /home/file-search/recoll_index

# Skip hidden/temp/junk files
skippedNames = .* *.tmp *.log *.bak ~* thumbs.db
skippedPaths = /var/projects/data/.trash /var/projects/data/thumbnails

# ── PERFORMANCE ─────────────────────────────────────────
# No disk-occupation limit
maxfsoccuppc = 0
# Virtual memory limit (MB) for external filter processes. This is not a
# per-file size cap; very low values can make some handlers fail.
filtermaxmbytes = 100
# Flush to disk every 40 MB of new index data
idxflushmb = 40
# Parallel indexing threads. Check this name against the documentation for
# your version: recent releases configure threading with thrQSizes/thrTCounts.
nthreads = 8

# ── SEARCH QUALITY ──────────────────────────────────────
# Store case and diacritics for precise searches
indexStripChars = 0

# Terms in 80%+ of documents are auto-suppressed in phrase queries.
# This is Recoll's built-in stop word equivalent.
# It works WITHOUT breaking phrase searches like 'installation of nessus'
# (unlike traditional noindex stop word lists, which would break them).
commontermspercent = 80

# Stemming: 'installing' also matches 'install', 'installation', etc.
indexstemminglanguages = english
```
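Before the first index run, it can be useful to preview what the skippedNames and skippedPaths lines will exclude. The sketch below approximates Recoll's matching with Python's case-sensitive fnmatch and a simple directory-prefix test; Recoll's own rules may differ in detail (for example, skippedPaths entries may also contain wildcards), so treat the output as a rough guide rather than what recollindex will actually do.

```python
"""Preview which entries a skippedNames / skippedPaths setup would exclude (sketch)."""
import fnmatch
import os

# Values copied from the recoll.conf example above.
SKIPPED_NAMES = ".* *.tmp *.log *.bak ~* thumbs.db".split()
SKIPPED_PATHS = ["/var/projects/data/.trash", "/var/projects/data/thumbnails"]


def is_skipped(path, names=SKIPPED_NAMES, paths=SKIPPED_PATHS):
    """Approximation: the name matches a wildcard, or the path is a skipped dir or below one."""
    base = os.path.basename(path)
    if any(fnmatch.fnmatchcase(base, pat) for pat in names):
        return True
    return any(path == p or path.startswith(p.rstrip("/") + "/") for p in paths)


def preview(topdir, names=SKIPPED_NAMES, paths=SKIPPED_PATHS):
    """Walk topdir; return sorted skipped files, plus skipped directories with a trailing '/'."""
    skipped = []
    for root, dirs, files in os.walk(topdir):
        kept = []
        for d in dirs:
            full = os.path.join(root, d)
            if is_skipped(full, names, paths):
                skipped.append(full + "/")  # the whole subtree is excluded
            else:
                kept.append(d)
        dirs[:] = kept  # prune skipped directories so os.walk does not descend
        skipped.extend(
            os.path.join(root, f) for f in files
            if is_skipped(os.path.join(root, f), names, paths)
        )
    return sorted(skipped)
```

For example, `for p in preview("/var/projects/data"): print(p)` lists what the configuration above would likely leave out.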
Create the index directory and run the first full index:
```bash
mkdir -p /home/file-search/recoll_index

# Validate config paths before running
HOME=/root recollindex -E

# First full index: this takes time depending on file count
HOME=/root recollindex -z
```
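If the first run is scripted (for example from a provisioning tool), the validation step can gate the full index. This sketch assumes that recollindex -E exits with a non-zero status when a path check fails; the runner argument is a hypothetical hook so the logic can be exercised without Recoll installed.

```python
"""Validate the Recoll config, then run the full index only if validation passed (sketch)."""
import os
import subprocess


def recoll_env(home="/root"):
    """Copy of the current environment with HOME set, like `HOME=/root recollindex`."""
    env = dict(os.environ)
    env["HOME"] = home
    return env


def validate_then_index(runner=subprocess.run, home="/root", full=True):
    """Run `recollindex -E`; only if it exits 0, run `recollindex -z` (or a plain run).

    Assumption: `recollindex -E` reports failed path checks with a non-zero exit.
    Returns the exit status of the last command that was run.
    """
    env = recoll_env(home)
    check = runner(["recollindex", "-E"], env=env)
    if check.returncode != 0:
        return check.returncode
    cmd = ["recollindex", "-z"] if full else ["recollindex"]
    return runner(cmd, env=env).returncode
```

Called as `validate_then_index()` by root, it mirrors the two shell commands above but stops before the (long) full index if the configuration check fails.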
Step 3 — Run WebUI as a Systemd Service
The WebUI binds to localhost only. Nginx handles external access and SSL termination — more on that in the next step. Create /etc/systemd/system/recoll-webui.service:
```ini
[Unit]
Description=Recoll WebUI - Keyword Search
After=network.target

[Service]
Type=simple
User=root
WorkingDirectory=/opt/recoll-webui
ExecStart=/usr/bin/python3 /opt/recoll-webui/webui-standalone.py -a 127.0.0.1 -p 8180
Restart=on-failure
RestartSec=5
Environment=HOME=/root

[Install]
WantedBy=multi-user.target
```
```bash
systemctl daemon-reload
systemctl enable recoll-webui
systemctl start recoll-webui
systemctl status recoll-webui
# Should show: Active: active (running)
```
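After starting the service, a scripted health check can confirm that something is answering on 127.0.0.1:8180 before Nginx is pointed at it. This is a minimal sketch using only the standard library; treating any status below 500 as "up" is our assumption about what counts as healthy.

```python
"""Check that the WebUI answers on its localhost port (sketch)."""
import urllib.error
import urllib.request


def webui_is_up(url="http://127.0.0.1:8180/", timeout=5.0):
    """Return True if the URL answers with an HTTP status below 500 (assumed 'healthy')."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status < 500
    except urllib.error.HTTPError as err:
        # The server answered, but with an error status (4xx/5xx).
        return err.code < 500
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, etc.: nothing usable is listening.
        return False
```

Pointing it at the public https://your-server.local:8443/ URL also exercises Nginx, but with an internal or self-signed certificate the default TLS verification fails and the function reports False.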
Step 4 — Configure Nginx for HTTPS on Port 8443
Port 8443 was previously used by a different app in our Nginx config. That server block was completely replaced with the Recoll block. The port 443 block for the companion app was left entirely untouched.
The Recoll block reuses the existing SSL certificate, so no new certificate is needed. Add (or replace) the following in your Nginx site config (in our case /etc/nginx/sites-available/your-app):
```nginx
# ── PORT 443: Companion App (UNCHANGED) ─────────────────
server {
    listen 443 ssl;
    server_name your-server.local;

    ssl_certificate [your-ssl-cert-path];
    ssl_certificate_key [your-ssl-key-path];
    ssl_protocols TLSv1.2 TLSv1.3;
    ssl_prefer_server_ciphers on;

    location / {
        proxy_pass http://127.0.0.1:8001;
        proxy_http_version 1.1;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto https;
    }
}

# ── PORT 8443: Recoll keyword search (REPLACES previous app) ─
server {
    listen 8443 ssl;
    server_name your-server.local;

    ssl_certificate [your-ssl-cert-path];
    ssl_certificate_key [your-ssl-key-path];
    ssl_protocols TLSv1.2 TLSv1.3;
    ssl_prefer_server_ciphers on;

    # Static files served directly (faster, reduces Python load)
    location /static/ {
        alias /opt/recoll-webui/static/;
        expires 1d;
    }

    # Proxy everything else to recoll-webui on localhost
    location / {
        proxy_pass http://127.0.0.1:8180;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;
        proxy_read_timeout 120s;
    }
}
```
```bash
# Test nginx config: must show 'syntax is ok' before reloading
nginx -t

# Reload nginx (zero downtime: the companion app stays up)
systemctl reload nginx

# Verify the port is listening
ss -tlnp | grep 8443
```
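nginx -t reports a "conflicting server name" warning, but in a long sites file it can be hard to see which blocks collide. The sketch below is a rough text scan, not a real nginx parser: it finds server { ... } blocks, reads their listen and server_name directives, and reports (port, name) pairs claimed by more than one block. It ignores listen addresses (IPv4 vs IPv6, specific IPs) and does not follow include files, so its output is only a hint.

```python
"""Spot nginx server blocks that share a port and server_name (rough sketch, not a parser)."""
import re
from collections import Counter


def server_blocks(conf):
    """Yield the body of each `server { ... }` block, matching braces by depth."""
    conf = re.sub(r"#[^\n]*", "", conf)  # drop comments so their text is ignored
    for match in re.finditer(r"\bserver\s*\{", conf):
        depth = 1
        for i in range(match.end(), len(conf)):
            if conf[i] == "{":
                depth += 1
            elif conf[i] == "}":
                depth -= 1
                if depth == 0:
                    yield conf[match.end():i]
                    break


def directive_values(block, name):
    """Values of `name ...;` statements starting a line or following ; { or }."""
    pattern = r"(?:^|(?<=[;{}]))\s*" + re.escape(name) + r"\s+([^;]+);"
    return re.findall(pattern, block, re.M)


def listen_keys(block):
    """(port, server_name) pairs declared by one server block."""
    ports = {v.split()[0].rsplit(":", 1)[-1] for v in directive_values(block, "listen")}
    names = [n for v in directive_values(block, "server_name") for n in v.split()]
    return {(port, name) for port in ports for name in (names or [""])}


def conflicts(conf):
    """Sorted (port, server_name) pairs claimed by more than one server block."""
    counts = Counter()
    for block in server_blocks(conf):
        counts.update(listen_keys(block))
    return sorted(key for key, n in counts.items() if n > 1)
```

For example, `conflicts(open("/etc/nginx/sites-available/your-app").read())` should return an empty list once the old 8443 block has been replaced. Feeding it the output of `nginx -T`, which prints the full loaded configuration, also covers included files.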
Step 5 — WebUI Customisations
Out of the box, Recoll's WebUI is functional but rough around the edges for our setup. We made three targeted customisations.
5a — webui.py: Path Translation
The default WebUI shows raw index paths like file:///var/projects/data/... in results. We added a helper function to strip the server prefix and display clean, user-friendly relative paths instead.
Added to webui.py just before the routes section:
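```python
#{{{ cloud path helper
# Strip the server prefix and replace it with a friendly label.
# _SERVER_PREFIX must match the 'topdirs' root in recoll.conf.
_SERVER_PREFIX = '/var/projects/data'
_LABEL = 'SharedFiles'

def _friendly_path(url):
    # Recoll result URLs look like file:///var/projects/data/...
    path = url[len('file://'):] if url.startswith('file://') else url
    # Only strip the prefix on a directory boundary, so that e.g.
    # /var/projects/data2/... is not mangled into 'SharedFiles/2/...'
    if path == _SERVER_PREFIX or path.startswith(_SERVER_PREFIX + '/'):
        path = path[len(_SERVER_PREFIX):]
    return _LABEL + '/' + path.lstrip('/')
#}}}
```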
And in the recoll_search() function, after the d['time'] line:
```python
d['friendly_path'] = _friendly_path(d['url'])
```
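The path helper only works if _SERVER_PREFIX matches topdirs, which is easy to forget when recoll.conf changes later. A small consistency check can catch that drift. In this sketch, read_topdirs and prefix_matches are hypothetical helpers; it assumes topdirs is a space-separated list with double quotes around paths that contain spaces, and it ignores Recoll features such as tilde expansion.

```python
"""Check that webui.py's _SERVER_PREFIX matches a topdirs entry in recoll.conf (sketch)."""
import shlex


def read_topdirs(conf_text):
    """Return the paths from the last `topdirs = ...` line, skipping comment lines."""
    topdirs = []
    for line in conf_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("#") or "=" not in stripped:
            continue
        name, value = stripped.split("=", 1)
        if name.strip() == "topdirs":
            # Assumption: space-separated list, double quotes around paths with spaces.
            topdirs = shlex.split(value)
    return topdirs


def prefix_matches(prefix, conf_text):
    """True if prefix equals one of the topdirs entries (trailing slashes ignored)."""
    wanted = prefix.rstrip("/")
    return any(d.rstrip("/") == wanted for d in read_topdirs(conf_text))
```

A call such as `prefix_matches("/var/projects/data", open("/root/.recoll/recoll.conf").read())` returning False points straight at the "Path shows full server path" symptom.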
5b — views/result.tpl: UI Changes
The default template shows Open/Download/Preview action buttons (not useful in our setup) and offers no easy way to copy a file's path. Here's what we changed:
- Removed Open, Download and Preview action links entirely
- Added a full-width path bar below each result title showing the clean friendly path
- Added a Copy button using `document.execCommand('copy')`, which works on plain HTTP, unlike `navigator.clipboard`, which requires a secure context (HTTPS). `execCommand` is deprecated, but major browsers still support it for this use.
- Copy logic lives in `static/extra.js` via a jQuery delegated event listener; this avoids inline onclick escaping issues with template-interpolated paths
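The escaping problem behind the last point is easy to reproduce. Assuming the template HTML-escapes interpolated values (Bottle's {{ }} does by default), the browser decodes the entities inside an onclick attribute before running it as JavaScript, so an apostrophe in a path ends the JS string early. In this sketch, copyPath is a hypothetical function name, and data_attribute shows the alternative of passing the path as data rather than as code.

```python
"""Why inline onclick breaks on some paths, and what a data attribute avoids (sketch)."""
import html


def inline_onclick_js(path):
    """JS the browser would run for onclick="copyPath('{{path}}')" (copyPath is hypothetical)."""
    attr = html.escape(path, quote=True)         # what an escaping template emits
    return html.unescape(f"copyPath('{attr}')")  # the browser decodes entities first


def data_attribute(path):
    """Attribute markup carrying the path as data, never as JavaScript source."""
    return f'data-path="{html.escape(path, quote=True)}"'
```

With a path like SharedFiles/O'Brien/report.pdf, the inline version yields JavaScript with three quote characters, which is a syntax error, while the data attribute round-trips the path unchanged. Reading the path from the element's text in a delegated handler follows the same principle: the path is only ever treated as data.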
5c — static/extra.js: Copy Button Handler
The Copy button handler uses jQuery's delegated event listener pattern, which avoids the inline onclick breakage that occurs when paths containing apostrophes, quotes or other special characters get interpolated into HTML attributes by the template engine:
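```javascript
// Delegated handler: bound once on document, so it also covers result
// elements rendered after page load.
$(document).on('click', '.ocp-copy-btn', function() {
  var btn = $(this);
  // The path text sits in the .ocp-path-bar element just before the button
  var path = btn.prev('.ocp-path-bar').find('.ocp-path-text').text().trim();

  // execCommand('copy') copies the current selection, so put the path in an
  // invisible, fixed-position textarea (avoids scrolling) and select it
  var ta = document.createElement('textarea');
  ta.value = path;
  ta.style.position = 'fixed';
  ta.style.opacity = '0';
  document.body.appendChild(ta);
  ta.focus();
  ta.select();
  document.execCommand('copy');
  document.body.removeChild(ta);

  // Brief visual confirmation, then restore the button's normal look
  btn.text('✓ Copied!');
  btn.css({'background': '#1a6640', 'color': '#7feba0', 'border-color': '#2d8a5a'});
  setTimeout(function() {
    btn.text('Copy');
    btn.css({'background': '#2a2a4a', 'color': '#888', 'border-color': '#3c3c5c'});
  }, 2000);
});
```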
Step 6 — Cron Job for Automatic Indexing
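Create /etc/cron.d/recoll-index to keep the index up to date automatically. The -z flag on the nightly job resets the index and rebuilds it from scratch: every file is re-indexed, which guarantees a clean index and applies settings that only take effect on a full re-index, but search results may be incomplete while the rebuild runs. The daytime runs omit -z and only process new or changed files; by default, these incremental runs also purge index entries for files that have been deleted.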
```
# /etc/cron.d/recoll-index
# Recoll automatic indexing schedule

# Full re-index at 2:00 AM daily
# -z: reset the index before starting (rebuild from scratch)
0 2 * * * root HOME=/root /usr/bin/recollindex -z >> /var/log/recoll-index.log 2>&1

# Quick incremental at 8am, 2pm, 8pm: picks up files changed during working hours
# No -z: only processes new/changed files, very fast
0 8,14,20 * * * root HOME=/root /usr/bin/recollindex >> /var/log/recoll-index.log 2>&1

# Trim log file every Sunday at 3am (keep last 1000 lines).
# cron does not support backslash line continuation, so this must stay on one line.
0 3 * * 0 root tail -n 1000 /var/log/recoll-index.log > /var/log/recoll-index.log.tmp && mv /var/log/recoll-index.log.tmp /var/log/recoll-index.log
```
```bash
# Deploy cron job
chmod 644 /etc/cron.d/recoll-index

# Test a manual run to confirm it works
HOME=/root recollindex >> /var/log/recoll-index.log 2>&1
tail -20 /var/log/recoll-index.log
```
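A malformed line in /etc/cron.d is easy to miss until a night of indexing has been skipped, so a quick lint before deploying can help. The sketch below checks only a few documented crontab pitfalls: backslash continuation (not supported), an unescaped % in the command (cron turns it into a newline), missing user or command fields, and a missing final newline. It is not a full cron parser, and the messages are our own wording.

```python
"""Lint an /etc/cron.d file for a few common mistakes (sketch, not a full cron parser)."""
import re

ENV_LINE = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*\s*=")  # e.g. MAILTO=... or SHELL=...
UNESCAPED_PERCENT = re.compile(r"(?<!\\)%")


def lint_cron_d(text):
    """Return (line_number, message) pairs for suspicious lines in a cron.d file."""
    problems = []
    lines = text.splitlines()
    for n, line in enumerate(lines, start=1):
        stripped = line.strip()
        if not stripped or stripped.startswith("#") or ENV_LINE.match(stripped):
            continue  # blank line, comment, or environment assignment
        if stripped.endswith("\\"):
            problems.append((n, "cron does not support backslash line continuation"))
        if stripped.startswith("@"):
            fields, expected = stripped.split(None, 2), 3  # @daily user command
        else:
            fields, expected = stripped.split(None, 6), 7  # 5 time fields, user, command
        if len(fields) < expected:
            problems.append((n, "expected schedule, user name and command"))
        elif UNESCAPED_PERCENT.search(fields[-1]):
            problems.append((n, "unescaped % in the command is turned into a newline by cron"))
    if text and not text.endswith("\n"):
        problems.append((len(lines), "file must end with a newline"))
    return problems
```

Running `lint_cron_d(open("/etc/cron.d/recoll-index").read())` on the file above should return an empty list; a trimming job split across two lines with a trailing backslash would be flagged on both lines.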
Quick Reference
| Task | Command |
|---|---|
| Force full re-index | HOME=/root recollindex -z |
| Quick incremental update | HOME=/root recollindex |
| Validate config paths | HOME=/root recollindex -E |
| View index log | tail -f /var/log/recoll-index.log |
| Restart WebUI | systemctl restart recoll-webui |
| Check WebUI status | systemctl status recoll-webui |
| WebUI logs | journalctl -u recoll-webui -f |
| Test nginx config | nginx -t |
| Reload nginx | systemctl reload nginx |
Troubleshooting
| Symptom | Cause & Fix |
|---|---|
| 500 NameError: name 'r' is not defined | The template uses 'r' but should use 'd'. Check views/result.tpl; you are likely using the wrong WebUI fork. |
| Path shows full server path | _SERVER_PREFIX in webui.py doesn't match topdirs in recoll.conf. |
| Copy button does nothing | Check that extra.js has the .ocp-copy-btn handler, and check the browser console for JS errors. |
| Path widget not showing | The widget is inside the %if len(d['ipath']) > 0: block; move it outside. |
| "conflicting server name" nginx warning | Two server blocks use the same port and server_name. Remove the old app's block. |
| Search returns no results after config change | Re-index with HOME=/root recollindex -z (new stop word settings, like changes to indexStripChars, require a full re-index). |
| Files not found in index | Check topdirs in recoll.conf. Run HOME=/root recollindex -E to validate paths. |
| nginx 502 Bad Gateway | The recoll-webui service is not running. Check: systemctl status recoll-webui |