An introduction to setting up metrics in Elasticsearch [Part 1 of 3].
We talked about Elasticsearch going red in a recent post at https://medium.com/thecloudbee.
But we should act proactively to head off such failures and keep ES free of downtime. So how do we read the Elasticsearch metrics to predict cluster behavior?
To keep Elasticsearch green, we must keep an eye on the following metrics. They cover everything from search to indexing.
Regardless of the number of nodes in a cluster, an index must have a fixed…
Make decisions using your big data stored in S3 without running ETL jobs. Save cost by SMARTLY partitioning the data.
Athena is a powerful service built to query data in S3. It suits semi-structured or unstructured data. You pay only for the amount of data scanned during query execution.
Since we pay for the amount of data scanned, we can cut the bill by compressing the data, partitioning it, or storing it in a smart format.
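To get a feel for the numbers, here is a minimal Python sketch of the pay-per-scan arithmetic. The price of $5 per TB scanned, the even spread of data across partitions, and the function names are all assumptions made for illustration; check the current Athena pricing for your region before relying on it.

```python
# Assumed price per terabyte scanned (illustrative; verify for your region).
PRICE_PER_TB_USD = 5.0
BYTES_PER_TB = 1024 ** 4


def query_cost_usd(bytes_scanned, price_per_tb=PRICE_PER_TB_USD):
    """Cost of one query, given how many bytes it scanned."""
    return bytes_scanned / BYTES_PER_TB * price_per_tb


def scanned_after_partitioning(total_bytes, partitions_total, partitions_read):
    """Bytes scanned when a query touches only some partitions.

    Assumes the data is spread evenly across all partitions.
    """
    return total_bytes * partitions_read // partitions_total


if __name__ == "__main__":
    full_scan = BYTES_PER_TB  # a 1 TB dataset, scanned in full
    one_partition = scanned_after_partitioning(full_scan, 4, 1)
    print("full scan:", query_cost_usd(full_scan))
    print("one of four partitions:", query_cost_usd(one_partition))
```

Under these assumptions, a query that reads one of four equal partitions scans a quarter of the data and costs a quarter as much. Compression works the same way: fewer bytes on S3 means fewer bytes scanned.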
It goes without saying that if we are storing big data in S3, it must be…
Shards are the heart of Elasticsearch. This post takes the understanding of shards further and links it to performance.
The search power of Elasticsearch revolves around shard size. An index has many shards, and each shard is a “search engine” in itself. Whenever a query hits an index, it is sent to every shard in that index. The per-shard results are then aggregated into the query response.
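The fan-out described above can be sketched in a few lines of Python. This is a toy model, not how Elasticsearch is implemented: each shard is a plain dict of documents, scoring is a simple term count, and search_shard and search_index are hypothetical names.

```python
from heapq import nlargest


def search_shard(shard, term):
    """Score every document in one shard by how often the term occurs."""
    hits = []
    for doc_id, text in shard.items():
        score = text.lower().split().count(term.lower())
        if score > 0:
            hits.append((score, doc_id))
    return hits


def search_index(shards, term, size=10):
    """Send the query to every shard, then merge the per-shard hits."""
    all_hits = []
    for shard in shards:
        all_hits.extend(search_shard(shard, term))
    # Keep only the best `size` hits across all shards.
    return nlargest(size, all_hits)


if __name__ == "__main__":
    shards = [
        {"a": "red cluster red"},
        {"b": "green cluster"},
        {"c": "red"},
    ]
    print(search_index(shards, "red"))
```

Every shard does its share of the work on each query, which is why the number and size of shards affect search performance.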
Red (or yellow) describes the cluster health of Elasticsearch. Let’s understand why a red cluster is a big deal and walk through a step-by-step guide to tackling it.
An Elasticsearch index is composed of many shards. A shard is a unit of data that is a little search engine in itself. By default, Elasticsearch versions before 7.0 give each index 5 primary shards and a replication factor of 1 (from 7.0 onwards the default is 1 primary shard).
With the older default, each index is divided into 5 units and each of the 5 units has one replica. In total, we have 10 shards.
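The shard count is simple arithmetic, sketched below in Python. The helper name total_shards is hypothetical; it is not part of any Elasticsearch client.

```python
def total_shards(primaries, replicas):
    """Total shards for one index: each primary plus its replica copies."""
    return primaries * (1 + replicas)


if __name__ == "__main__":
    # 5 primary shards, each with 1 replica
    print(total_shards(5, 1))
```

The same formula shows how quickly the count grows: raising the replication factor from 1 to 2 on a 5-shard index adds 5 more shards for the cluster to place.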
Ansible helps us set up or configure hundreds of machines in an agentless manner. This guide will help you structure an Ansible playbook so that it is more intuitive and the team will love collaborating on it.
Ansible is the one destination for all IT automation. With time, the complexity of a system increases and so does the number of components involved. It becomes important to isolate the components in your automation scripts. These isolated scripts, which work independently of each other, are called roles in Ansible.
The Single Responsibility principle when applied to an Ansible folder structure makes a lot of sense…
The default indexing strategy in Elasticsearch is not the best option for you. Define custom analyzers to tailor your ES indexing strategy.
During indexing, Elasticsearch extracts keywords from the raw data and stores them. This processing happens inside an analyzer in Elasticsearch. How we trim our data and create “keywords” determines how efficient our search will be. Analyzers are not very intuitive in Elasticsearch and mostly go unnoticed, but they can make a big difference.
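As a rough illustration of what an analyzer does, the Python sketch below lowercases the text, splits it into tokens, and drops stop words. It is a simplified stand-in, not the Elasticsearch analyzer API; the stop word list and the analyze function are assumptions made for this example.

```python
import re

# A tiny, illustrative stop word list (an assumption, not Elasticsearch's list).
STOPWORDS = {"a", "an", "and", "is", "the", "of", "to", "in"}


def analyze(text, stopwords=STOPWORDS):
    """Turn raw text into the keywords that would be stored in the index."""
    # Lowercase, then split on anything that is not a letter or a digit.
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    # Drop words that carry little meaning for search.
    return [token for token in tokens if token not in stopwords]


if __name__ == "__main__":
    print(analyze("The Quick-Brown fox is IN the box"))
```

Changing any one step (how the text is split, which words are dropped) changes the keywords that get stored, and therefore which queries can match a document.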
com.cloud.bee ERROR java.lang.RuntimeException. UnsupportedOperationException The operation get is not supported.
Assume we are storing logs in one index and user data in…
Infra-as-code is the backbone for any cloud. But, why should one look beyond CloudFormation Templates?
Infra-as-code means we provision infrastructure by writing code. In the case of AWS CloudFormation, we input .yaml files and AWS resources are generated.
We are inclined towards using infra-as-code for two reasons.
I have been writing CloudFormation Templates (CFTs) for 4 years. But once I wrote my first Terraform configuration, I never looked back. My first impression of Terraform was.
One sure way to create an EC2 instance is via the AWS Console. You can click on “Create Instance” and enter subnets, security groups, and the instance type. Submit. It’s easy.
But can you repeat the same steps 20 times over to create a cluster of these instances?
The answer is no.
An Engineer based in Delhi, India.