Being part of mission-critical services and managing our monitoring stacks has exposed me to many such tools. But this monitoring stack is the hero.

InfluxDB has been the highest rated time-series database since 2016.

So what makes it so popular? In a word: simplicity.

InfluxDB, an open-source time-series database, is used to store metrics for monitoring and alerting. "Time series" means every data point arrives with a timestamp, for example one reading per second, and we store the points without a predefined structure. The latter is what schema-less design means.
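To make "no predefined structure" concrete, here is a minimal sketch that formats points in InfluxDB's line protocol (`measurement,tags fields timestamp`). The helper name `to_line_protocol`, the host and region values, and the timestamps are made up for illustration, and the sketch skips the escaping and type suffixes the real protocol supports.

```python
def to_line_protocol(measurement, tags, fields, timestamp_ns):
    """Format one point as line protocol (numeric fields only, no escaping)."""
    tag_part = ",".join(f"{k}={v}" for k, v in sorted(tags.items()))
    field_part = ",".join(f"{k}={float(v)}" for k, v in sorted(fields.items()))
    head = f"{measurement},{tag_part}" if tag_part else measurement
    return f"{head} {field_part} {timestamp_ns}"


# Two points in the same measurement, one second apart.
# The second point carries an extra tag and an extra field;
# nothing had to be declared up front for that to be valid.
points = [
    to_line_protocol("cpu", {"host": "web-1"}, {"usage": 41.5},
                     1700000000000000000),
    to_line_protocol("cpu", {"host": "web-1", "region": "ap-south-1"},
                     {"usage": 43.0, "steal": 0.2},
                     1700000001000000000),
]

for line in points:
    print(line)
```

The second point adds a tag and a field that the first one never mentioned, which is the schema-less behavior described above.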

But what proves my strong claims?

An introduction to setting up metrics in Elasticsearch [Part 1 of 3].

Photo by Bogdan Kupriets on Unsplash

We talked about Elasticsearch going red in a recent post at

But we must act proactively on such possibilities and avoid downtime in ES altogether. So how do we read the Elasticsearch metrics to predict cluster behavior?

Monitoring the ES Cluster Health

To keep Elasticsearch green, we must keep an eye on the following metrics. They cover everything from search to indexing.

Number of Nodes and Shards

Regardless of the number of nodes in a cluster, an index must have a fixed…

Make decisions using your big data stored in S3 without running ETL jobs. Save cost by partitioning the data smartly.

Photo by Jan Antonin Kolar on Unsplash

AWS Athena Pricing

Athena is a powerful service built to query data in S3. It suits semi-structured and unstructured data. You pay only for the amount of data scanned during query execution.

Since we pay for the amount of data scanned, we can make a difference by compressing the data, partitioning it, or storing it in smart formats.

It goes without saying that if we are storing big data in S3 it must be…
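A rough worked example of how scanned bytes drive the bill. Every number here is an assumption for illustration: a price of 5 USD per TB scanned (check the current price for your region), a 3 TB dataset split into 30 equal daily partitions, and a compressed size of one quarter of the raw size.

```python
TB = 10**12  # assumption: decimal terabyte, used only for this illustration


def scan_cost_usd(bytes_scanned, price_per_tb=5.0):
    """Cost of one query, given how many bytes it scans (assumed price)."""
    return bytes_scanned / TB * price_per_tb


raw_bytes = 3 * TB          # hypothetical dataset size
daily_partitions = 30       # hypothetical: one partition per day
compression_ratio = 0.25    # hypothetical: compressed size / raw size

# Query with no partition filter: every byte is scanned.
full_scan = scan_cost_usd(raw_bytes)

# Query filtered to a single day: only that partition is scanned.
one_partition = scan_cost_usd(raw_bytes / daily_partitions)

# Same filtered query over compressed data.
one_partition_compressed = scan_cost_usd(
    raw_bytes / daily_partitions * compression_ratio
)

print(full_scan, one_partition, one_partition_compressed)
```

Under these assumptions the same question costs far less when the query can skip partitions and the remaining data is compressed.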

Shards are the heart of Elasticsearch. This blog takes the understanding of shards further to link it with performance.

Photo by Mat Reding on Unsplash

Why Does Shard Size Matter?

The search power of Elasticsearch revolves around the shard size. An index has many shards, and each shard is a “search engine” in itself. Whenever a query hits an index, it is sent to all the shards in that index. The results from each shard are then aggregated into the query response.
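The fan-out described above can be sketched in a few lines. This is a toy model, not Elasticsearch's implementation: shards are plain dictionaries, the "score" is just a term count, and the function names and documents are made up.

```python
import heapq


def search_shard(shard_docs, term, k):
    """Each shard scores only its own documents and returns its local top k."""
    hits = [
        (text.lower().split().count(term), doc_id)
        for doc_id, text in shard_docs.items()
    ]
    hits = [hit for hit in hits if hit[0] > 0]
    return heapq.nlargest(k, hits)


def search_index(shards, term, k=2):
    """Send the query to every shard, then merge the partial results."""
    partial = [hit for shard in shards for hit in search_shard(shard, term, k)]
    return [doc_id for _score, doc_id in heapq.nlargest(k, partial)]


# Three hypothetical shards of one index.
shards = [
    {"a": "disk full on node one", "b": "disk disk latency high"},
    {"c": "cpu spike"},
    {"d": "disk disk disk failure"},
]

top = search_index(shards, "disk")
print(top)
```

Every shard does work for every query, even the shard that holds no match, which is why the number and size of shards show up directly in search performance.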

What is a Red Cluster?

A red (or yellow) cluster refers to the health status that Elasticsearch reports for the cluster. Let’s understand why a red cluster is a big deal, then walk step by step through tackling it.

An Elasticsearch index is composed of many shards. A shard is a unit of data that is a little search engine in itself. By default, Elasticsearch versions before 7.0 create each index with 5 primary shards and a replication factor of 1 (since 7.0 the default is 1 primary shard).

That implies each index is divided into 5 units and each of the 5 units has one replica. In total, we will have 10 shards.
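The arithmetic generalizes to any shard and replica count. A small sketch, where `total_shards` is a made-up helper and the dictionary mirrors the `number_of_shards` and `number_of_replicas` index settings:

```python
def total_shards(primaries, replicas_per_primary):
    """Primaries plus one copy of each primary per replica."""
    return primaries * (1 + replicas_per_primary)


# Index settings matching the numbers in the text.
index_settings = {
    "settings": {
        "number_of_shards": 5,    # primary shards
        "number_of_replicas": 1,  # replica copies per primary
    }
}

settings = index_settings["settings"]
count = total_shards(settings["number_of_shards"],
                     settings["number_of_replicas"])
print(count)
```

Raising the replica count to 2 with the same 5 primaries would give 15 shards for the cluster to place.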

Ansible helps us set up or configure hundreds of machines in an agent-less manner. This guide will help you structure your Ansible playbooks so that they are more intuitive and the team will love to collaborate on them.

Photo by Hal Gatewood on Unsplash

Why Is Structure Important in Ansible?

Ansible is the one destination for all IT automation. With time, the complexity of a system increases and so does the number of components involved. It becomes important to isolate the components in your automation scripts. These isolated units, which work independently of each other, are called roles in Ansible.

The Single Responsibility principle when applied to an Ansible folder structure makes a lot of sense…

The default indexing strategy in Elasticsearch may not be the best option for you. Define custom analyzers to tailor your ES indexing strategy.

Why Might the Default Indexing Strategy Not Serve You Well?

During indexing, Elasticsearch extracts keywords from the raw data and stores them. This processing happens inside an analyzer. How we trim our data and create “keywords” determines how efficient our search will be. Analyzers are not very intuitive in Elasticsearch and most of the time go unnoticed. But they can make a big difference.
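To show what "extracting keywords" can look like, here is a toy analyzer in plain Python. It is only a sketch of the idea (tokenize, lowercase, drop stop words, then index the tokens), not how Elasticsearch implements analyzers. The stop-word list and the sample documents are made up.

```python
import re

STOP_WORDS = {"the", "is", "a", "on"}  # hypothetical stop list


def analyze(text, stop_words=STOP_WORDS):
    """Lowercase, split into alphanumeric tokens, drop stop words."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [token for token in tokens if token not in stop_words]


def build_inverted_index(docs):
    """Map each keyword to the set of document ids that contain it."""
    index = {}
    for doc_id, text in docs.items():
        for token in analyze(text):
            index.setdefault(token, set()).add(doc_id)
    return index


docs = {1: "The disk is FULL on node-7", 2: "Disk latency warning"}
index = build_inverted_index(docs)

print(analyze(docs[1]))
print(sorted(index["disk"]))
```

Changing the tokenizer pattern or the stop list changes which keywords end up in the index, and therefore which queries can match. That is the lever a custom analyzer gives you.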

Assume we are storing logs in one index and user data in…

Infra-as-code is the backbone for any cloud. But, why should one look beyond CloudFormation Templates?

Why Infrastructure as Code?

Infra-as-code means we provision infrastructure by writing code. In the case of AWS CloudFormation, we supply .yaml files and AWS resources are generated from them.

We are inclined towards using infra-as-code for two reasons.

Terraform was impressive.

I have been writing CloudFormation Templates (CFTs) for 4 years. But once I wrote my first Terraform configuration, I never looked back. The first impression of Terraform was.

Why use the CloudFormation Template?

One sure way to create an EC2 instance is via the AWS Console. You can click on “Create Instance” and enter subnets, security groups, and the instance type. Submit. It’s easy.

But can you repeat the same steps 20 times over to create a cluster of these instances?

The answer is no.

Amroj Sandhu

An Engineer based in Delhi, India.
