christova

How Do C++, Java and Python Work?

December 10, 2024

Message Queue : Evolution

December 5, 2024

#MessageQueues #EvolutionOfMessageQueues #kafka #RabbitMQ #middleware

IBM MQ

IBM MQ was launched in 1993. It was originally called MQSeries and was renamed WebSphere MQ in 2002. It was renamed to IBM MQ in 2014. IBM MQ is a very successful product widely used in the financial sector. Its revenue still reached 1 billion dollars in 2020.

RabbitMQ

RabbitMQ architecture differs from IBM MQ and is more similar to Kafka concepts. The producer publishes a message to an exchange with a specified exchange type. It can be direct, topic, or fanout. The exchange then routes the message into the queues based on different message attributes and the exchange type. The consumers pick up the message accordingly.

Kafka

In early 2011, LinkedIn open sourced Kafka, which is a distributed event streaming platform. It was named after Franz Kafka. As the name suggested, Kafka is optimized for writing. It offers a high-throughput, low-latency platform for handling real-time data feeds. It provides a unified event log to enable event streaming and is widely used in internet companies.

Kafka defines producer, broker, topic, partition, and consumer. Its simplicity and fault tolerance allow it to replace previous products like AMQP-based message queues.

Pulsar

Pulsar, developed originally by Yahoo, is an all-in-one messaging and streaming platform. Compared with Kafka, Pulsar incorporates many useful features from other products and supports a wide range of capabilities. Also, Pulsar architecture is more cloud-native, providing better support for cluster scaling and partition migration, etc.

There are two layers in Pulsar architecture: the serving layer and the persistent layer. Pulsar natively supports tiered storage, where we can leverage cheaper object storage like AWS S3 to persist messages for a longer term.

ACID

November 29, 2024

#ACID #database

HTTP 1 / 2 / 3

November 28, 2024

#HTTP #HTTPS #HTTP1 #HTTP2 #HTTP3

How Caches Can Go Wrong

November 28, 2024

The diagram above shows 4 typical cases where caches can go wrong and their solutions.

1. Thunder herd problem

This happens when a large number of keys in the cache expire at the same time. Then the query requests directly hit the database, which overloads the database. There are two ways to mitigate this issue: one is to avoid setting the same expiry time for the keys, adding a random number in the configuration; the other is to allow only the core business data to hit the database and prevent non-core data to access the database until the cache is back up.

2. Cache penetration

This happens when the key doesn’t exist in the cache or the database. The application cannot retrieve relevant data from the database to update the cache. This problem creates a lot of pressure on both the cache and the database. To solve this, there are two suggestions. One is to cache a null value for non-existent keys, avoiding hitting the database. The other is to use a bloom filter to check the key existence first, and if the key doesn’t exist, we can avoid hitting the database.

3. Cache breakdown

This is similar to the thunder herd problem. It happens when a hot key expires. A large number of requests hit the database. Since the hot keys take up 80% of the queries, we do not set an expiration time for them.

4. Cache crash

This happens when the cache is down and all the requests go to the database. There are two ways to solve this problem. One is to set up a circuit breaker, and when the cache is down, the application services cannot visit the cache or the database. The other is to set up a cluster for the cache to improve cache availability.

#cache #caching #CachingProblems

AWS Networking

November 27, 2024

#AWS #AWSNetworking #networking #cloud

Docker CheatSheet

November 26, 2024

#docker #DockerCommands #DockerCheatSheet

REST API Design

November 26, 2024

#REST #API

Comparison of Cloud Services

November 25, 2024

#cloud #CloudServices #AWS #azure #GoogleCloud #GCP #oracle

Top 9 System Integrations

November 25, 2024

Top 9 Architectural Patterns for Data and Communication Flow

Peer-to-Peer

The Peer-to-Peer pattern involves direct communication between two components without the need for a central coordinator.

API Gateway

An API Gateway acts as a single entry point for all client requests to the backend services of an application.

Pub-Sub

The Pub-Sub pattern decouples the producers of messages (publishers) from the consumers of messages (subscribers) through a message broker.

Request-Response

This is one of the most fundamental integration patterns, where a client sends a request to a server and waits for a response.

Event Sourcing

Event Sourcing involves storing the state changes of an application as a sequence of events.

ETL

ETL is a data integration pattern used to gather data from multiple sources, transform it into a structured format, and load it into a destination database.

Batching

Batching involves accumulating data over a period or until a certain threshold is met before processing it as a single group.

Streaming Processing

Streaming Processing allows for the continuous ingestion, processing, and analysis of data streams in real-time.

Orchestration

Orchestration involves a central coordinator (an orchestrator) managing the interactions between distributed components or services to achieve a workflow or business process.

#SystemIntegrations #APIGateway #PubSub #Batching #ETL #RequestResponse