Re-indexing the archives.json file in Wazuh
What Is archives.json?
- `archives.json` contains the **original raw logs** collected from Wazuh agents or syslog.
- These logs are stored **before any correlation, alerting, or rule evaluation** is applied.
- This is different from `alerts.json.gz`, which only contains logs that matched Wazuh rules and triggered alerts.
- The file is generated when JSON-format archiving is enabled in `/var/ossec/etc/ossec.conf`.
Why Does This Matter?
- `archives.json` is valuable for audit, forensics, and compliance, as it captures logs in their original form.
- However, it does not match Wazuh’s alert schema, so it cannot be viewed directly in Wazuh dashboards or `wazuh-alerts-*` indices without additional processing.
Reindexing archives.json into OpenSearch
Can You Do This?
Yes. However, Wazuh dashboards will not recognize this data because the data structure differs from Wazuh alerts. You need to create a separate index for this raw data and define a dedicated index pattern.
What You Can Do with archives.json

| Use Case | Supported? | Why or Why Not? |
|---|---|---|
| Recreate Wazuh alerts | No | Archived logs cannot re-trigger rules or alerts retroactively |
| View in OpenSearch / Kibana | Yes | Data can be reindexed into a custom index for visibility |
| Show in Wazuh dashboards | No | Wazuh dashboards expect alert-formatted indices only |
| Visualize in Discover tab | Yes | Using a custom index and pattern, logs can be searched and visualized |
How to Reindex archives.json into OpenSearch
Step 1: Prepare NDJSON (Bulk Format)
Why this step?
OpenSearch requires a specific **bulk format (NDJSON)** for importing large amounts of JSON data. Each JSON log must be paired with an `index` instruction to tell OpenSearch which index to write to.
Actions
Extract the archived file:

```shell
gunzip -c ossec-archive-03.json.gz > archive.json
```

This extracts the compressed logs to `archive.json`.
Split large files into manageable chunks (optional but recommended):

```shell
mkdir -p chunks
split -l 10000 archive.json chunks/raw_chunk_
```

This splits `archive.json` into smaller files of 10,000 lines each for safer and more reliable bulk uploads.
Convert chunks into OpenSearch bulk format (NDJSON):

```shell
for file in chunks/raw_chunk_*; do
  echo "Converting $file"
  jq -c '{ "index": { "_index": "restored-wazuh-archives" } }, .' "$file" > "${file}.ndjson"
done
```

What is happening here?

- `jq` emits a bulk API action line before each JSON log, instructing OpenSearch to write it to `restored-wazuh-archives`.
- Each action line is immediately followed by its document on the next line, which is exactly the two-line pairing the `_bulk` API requires.
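To make the action/document pairing concrete, here is a tiny self-contained sketch using plain shell and a made-up sample log line (not real archive data); it produces the same two-line layout the `jq` command above generates:

```shell
# Hypothetical single-line JSON log, standing in for one line of archive.json
printf '%s\n' '{"timestamp":"2024-01-01T00:00:00Z","full_log":"sample event"}' > sample.json

# For every document line, emit an action line naming the target index,
# then the document itself on the following line.
while IFS= read -r line; do
  printf '%s\n' '{ "index": { "_index": "restored-wazuh-archives" } }'
  printf '%s\n' "$line"
done < sample.json > sample.ndjson

wc -l < sample.ndjson   # 2 lines: one action line + one document line
```

One input line therefore becomes exactly two output lines, which is why a 10,000-line chunk turns into a 20,000-line `.ndjson` file.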
Step 2: Upload to OpenSearch
Why this step?
The Bulk API is the most efficient way to ingest many records into OpenSearch: it avoids the overhead of sending one request per log entry.
Actions

```shell
OS_URL="https://<INDEXER-IP>:9200"
AUTH="USERNAME:PASSWORD"

for f in chunks/*.ndjson; do
  echo "Uploading $f"
  curl -s -X POST "$OS_URL/_bulk" \
    -H "Content-Type: application/x-ndjson" \
    -u "$AUTH" \
    --data-binary @"$f" \
    -k | jq '.errors'
done
```
What is happening here?
- `curl` sends each `.ndjson` file to the OpenSearch bulk API.
- The `-k` flag allows connections to untrusted SSL certificates (optional, depending on your setup).
- `jq '.errors'` helps verify whether the bulk operation succeeded.
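Note that `_bulk` returns HTTP 200 even when individual items fail, so the top-level `errors` field is the signal worth checking: it is `false` only when every item was indexed. A minimal sketch of that check against a canned response (no `jq` required; a real run would capture `curl`'s output instead of the hypothetical `resp.json` built here):

```shell
# Hypothetical saved bulk response; in practice, redirect curl's output here.
echo '{"took":12,"errors":false,"items":[]}' > resp.json

# "errors":false means every item in the batch was indexed successfully.
if grep -q '"errors":false' resp.json; then
  echo "bulk upload OK"
else
  echo "bulk upload had failures" >&2
fi
```

If `errors` is `true`, inspect the `items` array in the response to see which documents were rejected and why.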
Step 3: Create Index Pattern in Wazuh Dashboard
Why this step?
To search and visualize the data in Wazuh (or Kibana), you must define an index pattern so the UI knows how to query the data.
Actions
- Go to Wazuh Dashboard → Dashboard Management → Index Patterns
- Click on Create index pattern
- Enter the index pattern: `restored-wazuh-archives*`
- Select the appropriate timestamp field, typically `timestamp` or `@timestamp`; if none exists, use the ingestion date
What is happening here?
- This tells OpenSearch Dashboards (Wazuh/Kibana) to recognize your newly ingested data.
- It enables search, filtering, and visualization in the Discover tab.
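For scripted setups, the same index pattern can be created without the UI via the Dashboards saved-objects API. A hedged sketch (the endpoint path, `osd-xsrf` header, and `<DASHBOARD-IP>` placeholder are assumptions based on OpenSearch Dashboards defaults; the `curl` call is left commented out because it needs a live instance):

```shell
# Build the saved-object payload for the index pattern.
cat > index-pattern.json <<'EOF'
{ "attributes": { "title": "restored-wazuh-archives*", "timeFieldName": "timestamp" } }
EOF

# Then POST it to OpenSearch Dashboards (uncomment and adjust to your setup):
# curl -s -k -u "USERNAME:PASSWORD" -X POST \
#   "https://<DASHBOARD-IP>/api/saved_objects/index-pattern/restored-wazuh-archives" \
#   -H "osd-xsrf: true" -H "Content-Type: application/json" \
#   --data-binary @index-pattern.json
```

Adjust `timeFieldName` to whichever timestamp field your documents actually carry.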
What Happens After Reindexing?
Once complete, you will be able to:
- Use the Discover tab to search through raw Wazuh logs.
- Apply filters for agent ID, IP addresses, log types, etc.
- Build custom visualizations for:
- Activity over time
- Sources of logs
- Distribution by agent, etc.
Important Limitations
- These logs are for visibility, not alerting. They will not appear in Wazuh’s normal dashboards or rules.
- Wazuh does not reprocess archived logs for alerts after reindexing.
- This is for compliance, audit, or forensic review purposes only.