Re-indexing the archives.json file in Wazuh

From Notes_Wiki

Home > Wazuh > Re-indexing the archives.json file in Wazuh

What Is archives.json?

  • archives.json contains the original raw logs collected from Wazuh agents or syslog, written under /var/ossec/logs/archives/ on the manager.
  • These logs are stored before any correlation, alerting, or rule evaluation is applied.
  • This is different from alerts.json.gz, which only contains logs that matched Wazuh rules and triggered alerts.
  • The file is generated when JSON-format archiving is enabled in /var/ossec/etc/ossec.conf.
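A minimal sketch of that configuration: the logall_json option under the global section is the relevant setting, and the rest of your ossec.conf stays as-is.

```xml
<ossec_config>
  <global>
    <!-- Write every received event, before rule evaluation, as JSON -->
    <logall_json>yes</logall_json>
  </global>
</ossec_config>
```

After restarting the wazuh-manager service, raw events accumulate under /var/ossec/logs/archives/ and are rotated and compressed over time.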

Why Does This Matter?

  • archives.json is valuable for audit, forensics, and compliance, as it captures logs in their original form.
  • However, it does not match Wazuh’s alert schema, so it cannot be viewed directly in Wazuh dashboards or the wazuh-alerts-* indices without additional processing.


Reindexing archives.json into OpenSearch

Can You Do This?

Yes. However, Wazuh dashboards will not recognize this data because the data structure differs from Wazuh alerts. You need to create a separate index for this raw data and define a dedicated index pattern.


What You Can Do with archives.json

Use Case                     | Supported? | Why or Why Not?
Recreate Wazuh alerts        | No         | Archived logs cannot re-trigger rules or alerts retroactively
View in OpenSearch / Kibana  | Yes        | Data can be reindexed into a custom index for visibility
Show in Wazuh dashboards     | No         | Wazuh dashboards expect alert-formatted indices only
Visualize in Discover tab    | Yes        | Using a custom index and pattern, logs can be searched and visualized


How to Reindex archives.json into OpenSearch

Step 1: Prepare NDJSON (Bulk Format)

Why this step?

OpenSearch's bulk API requires NDJSON (newline-delimited JSON) for importing large amounts of JSON data. Each JSON log must be paired with an index action line that tells OpenSearch which index to write to.
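As an illustration, one record in the bulk body is a pair of lines like the following. The document fields here are placeholders; only the action line's _index value matters for routing:

```
{ "index": { "_index": "restored-wazuh-archives" } }
{ "timestamp": "2025-07-14T10:00:00Z", "full_log": "example raw event" }
```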

Actions

Extract the archived file:

gunzip -c ossec-archive-03.json.gz > archive.json
  • This extracts the compressed logs to archive.json.

Split large files into manageable chunks (optional but recommended):

mkdir -p chunks
split -l 10000 archive.json chunks/raw_chunk_
  • This splits archive.json into smaller files of 10,000 lines each for safer and more reliable bulk uploads.

Convert chunks into OpenSearch bulk format (NDJSON):

for file in chunks/raw_chunk_*; do
  echo "Converting $file"
  # jq emits two compact lines per input record:
  # the bulk action line, then the document itself
  jq -c '{ "index": { "_index": "restored-wazuh-archives" } }, .' "$file" > "${file}.ndjson"
done

What is happening here?

  • jq adds a bulk API action line before each JSON log, instructing OpenSearch to write it to restored-wazuh-archives.
  • Because jq -c prints each output on its own line, the result already satisfies OpenSearch's requirement that every action line is immediately followed by its document line.
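A quick offline sanity check of the conversion, using a tiny sample file instead of a real chunk: each source line should become exactly two lines in the .ndjson output (one action line plus one document line).

```shell
# Build a 3-line sample "chunk" of JSON logs
printf '%s\n' '{"a":1}' '{"b":2}' '{"c":3}' > sample_chunk

# Same conversion as above: action line + document per record
jq -c '{ "index": { "_index": "restored-wazuh-archives" } }, .' sample_chunk > sample_chunk.ndjson

# 3 input lines should yield 6 output lines
echo "source lines: $(wc -l < sample_chunk), ndjson lines: $(wc -l < sample_chunk.ndjson)"
```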


Step 2: Upload to OpenSearch

Why this step?

Bulk API is the most efficient method for ingesting many records into OpenSearch. It avoids the overhead of sending one request per log entry.

Actions

OS_URL="https://<INDEXER-IP>:9200"
AUTH="USERNAME:PASSWORD"

for f in chunks/*.ndjson; do
  echo "Uploading $f"
  curl -s -X POST "$OS_URL/_bulk" \
    -H "Content-Type: application/x-ndjson" \
    -u "$AUTH" \
    --data-binary @"$f" \
    -k | jq '.errors'
done

What is happening here?

  • curl sends each .ndjson file to the OpenSearch bulk API.
  • The -k flag disables TLS certificate verification; use it only if the indexer presents a self-signed or otherwise untrusted certificate.
  • jq '.errors' helps verify whether the bulk operation succeeded.
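The bulk API response can be inspected more thoroughly than just printing .errors. The following offline demo parses a saved sample response (the structure shown is what the bulk endpoint returns; the values are made up) to check both the top-level errors flag and the per-item status codes:

```shell
# Save a sample bulk API response body
cat > sample_bulk_response.json <<'EOF'
{"took": 5, "errors": false, "items": [{"index": {"_index": "restored-wazuh-archives", "status": 201}}]}
EOF

# "errors": true would mean at least one item failed
jq '.errors' sample_bulk_response.json

# Check that every item's status code indicates success (< 300)
jq '[.items[].index.status] | all(. < 300)' sample_bulk_response.json
```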


Step 3: Create Index Pattern in Wazuh Dashboard

Why this step?

To search and visualize the data in Wazuh (or Kibana), you must define an index pattern so the UI knows how to query the data.

Actions

  1. Go to Wazuh Dashboard → Dashboard Management → Index Patterns
  2. Click Create index pattern
  3. Enter the pattern:
restored-wazuh-archives*
  4. Select the appropriate timestamp field, typically:
  • timestamp
  • @timestamp
  • Or use the ingestion date if none exists

What is happening here?

  • This tells OpenSearch Dashboards (Wazuh/Kibana) to recognize your newly ingested data.
  • It enables search, filtering, and visualization in the Discover tab.


What Happens After Reindexing?

Once complete, you will be able to:

  • Use the Discover tab to search through raw Wazuh logs.
  • Apply filters for agent ID, IP addresses, log types, etc.
  • Build custom visualizations for:
    • Activity over time
    • Sources of logs
    • Distribution by agent, etc.
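For example, a Discover filter query against the new index pattern might look like the line below. Field names such as agent.id and full_log are placeholders here; the fields actually available depend on the structure of your raw archived events.

```
agent.id: "001" AND full_log: *ssh*
```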

Important Limitations

  • These logs are for visibility, not alerting. They will not appear in Wazuh’s normal dashboards or rules.
  • Wazuh does not reprocess archived logs for alerts after reindexing.
  • This is for compliance, audit, or forensic review purposes only.