Splunk is a distributed system that aggregates, parses and analyses log data. In this article we'll help you understand how the Splunk big data pipeline works, how components like the forwarder, indexer and search head interact, and the different topologies you can use to scale your Splunk deployment.

How Splunk Works: Stages in the Data Pipeline

Splunk is a distributed system that ingests, processes and indexes log data. The pipeline has three stages:

Data Input – Splunk ingests the raw data stream from the source, breaks it into 64K blocks, and adds metadata keys, including hostname, source, character encoding, and the index the data should be stored in.

Data Storage – Splunk parses log data by breaking it into lines, identifying timestamps, creating individual events and annotating them with metadata keys. It then transforms event data using transformation rules defined by the operator. Finally, Splunk writes the parsed events to disk, pointing to them from an index file, which enables fast search across huge data volumes.

Data Search – at this stage Splunk enables users to query, view and use the event data. Based on the user's reporting needs, it creates objects like reports, dashboards and alerts.

Splunk Enterprise vs Splunk Light: How Does it Affect Your Architecture?

Splunk Cloud – provided as a service with subscription pricing.

Your selection of a Splunk edition will affect your architecture.

Supports single-site clustering and multi-site clustering for disaster recovery

The primary components in the Splunk architecture are the forwarder, the indexer, and the search head.

The forwarder is an agent you deploy on IT systems, which collects logs and sends them to the indexer.

Universal Forwarder – forwards the raw data without any prior treatment. This is faster, and requires fewer resources on the host, but results in huge quantities of data sent to the indexer.

Heavy Forwarder – performs parsing and indexing at the source, on the host machine, and sends only the parsed events to the indexer.

The indexer transforms data into events (unless it was received pre-processed from a heavy forwarder), stores it to disk and adds it to an index, enabling searchability. The files the indexer creates, which it separates into directories called buckets, include indexes pointing to raw data (.TSIDX files).

The indexer performs generic event processing on log data, such as applying a timestamp and adding the source, and can also execute user-defined transformation actions to extract specific information or apply special rules, such as filtering unwanted events.

In Splunk Enterprise, you can set up a cluster of indexers with replication between them, to avoid data loss and provide more system resources and storage space to handle large data volumes.

The search head provides the UI users can use to interact with Splunk. It allows users to search and query Splunk data, and interfaces with indexers to gain access to the specific data they request.

Splunk provides a distributed search architecture, which allows you to scale up to handle large data volumes, and better handle access control and geo-dispersed data. In a distributed search scenario, the search head sends search requests to a group of indexers, also called search peers. The indexers perform the search locally and return results to the search head, which merges the results and returns them to the user.

There are a few common topologies for distributed search in Splunk:

One or more independent search heads to search across indexers (each can be used for a different type of data).

Multiple search heads in a search head cluster – with all search heads sharing the same configuration and jobs.

Search heads as part of an indexer cluster – promotes data availability and data recovery.

Putting it All Together: Splunk Architecture

The following diagram illustrates the Splunk architecture as a whole.

Splunk gathers logs by monitoring files, detecting file changes, listening on ports or running scripts to collect log data. All of these are carried out by the Splunk forwarder.
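
To make the distributed search flow described in this article more concrete, here is a minimal Python sketch of the scatter-gather pattern: a search head sends the same request to several search peers, each peer searches its own events locally, and the search head merges the partial results. This is an in-memory illustration only, not Splunk's API or implementation. The names `Event`, `Indexer` and `SearchHead`, the substring match, and the newest-first ordering are all assumptions made for the example.

```python
import heapq
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    """Hypothetical event: a timestamp, a host metadata key, and the raw line."""
    timestamp: float
    host: str
    raw: str


class Indexer:
    """Hypothetical stand-in for a search peer that holds a slice of the events."""

    def __init__(self, name, events):
        self.name = name
        # Store events newest first so that each local result is already sorted.
        self._events = sorted(events, key=lambda e: e.timestamp, reverse=True)

    def search(self, term):
        # Local search: a plain substring match stands in for a real query.
        return [e for e in self._events if term in e.raw]


class SearchHead:
    """Hypothetical search head: fans a request out to peers and merges results."""

    def __init__(self, peers):
        self.peers = list(peers)

    def search(self, term):
        # Scatter: every peer runs the same search over its own data.
        with ThreadPoolExecutor(max_workers=max(1, len(self.peers))) as pool:
            partials = list(pool.map(lambda peer: peer.search(term), self.peers))
        # Gather: merge the newest-first partial results into one ordered list.
        return list(heapq.merge(*partials, key=lambda e: e.timestamp, reverse=True))


peers = [
    Indexer("idx1", [Event(1.0, "web1", "GET /index 200"),
                     Event(4.0, "web1", "GET /login 500")]),
    Indexer("idx2", [Event(2.0, "web2", "POST /login 500"),
                     Event(3.0, "web2", "GET /health 200")]),
]
results = SearchHead(peers).search("500")
for event in results:
    print(event.timestamp, event.host, event.raw)
```

Because each peer returns results that are already sorted, the search head only needs a streaming merge rather than a full re-sort. That is one plausible reason for pushing the search work down to the indexers, although a real deployment involves far more than this sketch shows.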