Tech Articles (please note these posts are collated from AmigosCode, Alex Xu and many others. Full copyright belongs to the owners of their material)
The diagram above shows 4 typical cases where caches can go wrong and their solutions.
1. Thundering herd problem
This happens when a large number of cached keys expire at the same time, so queries bypass the cache and hit the database directly, overloading it. There are two ways to mitigate this issue: one is to avoid setting the same expiry time for many keys by adding a random jitter to each key's TTL; the other is to allow only core business requests to hit the database and block non-core requests until the cache is repopulated.
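The jitter approach can be sketched as follows. This is a minimal illustration, not code from the post; the base TTL, jitter window, and the Redis call in the comment are assumptions chosen for the example.

```python
import random

BASE_TTL = 3600  # assumed base expiry of one hour

def ttl_with_jitter(base_ttl: int = BASE_TTL, max_jitter: int = 300) -> int:
    """Return the base TTL plus a random offset so keys written
    together do not all expire at the same moment."""
    return base_ttl + random.randint(0, max_jitter)

# Usage with a hypothetical Redis client `r`:
# r.set("user:42", payload, ex=ttl_with_jitter())
```

Spreading expiries across a window (here up to five minutes) turns one synchronized stampede into a trickle of individual cache misses.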
2. Cache penetration
This happens when a key exists in neither the cache nor the database, so the application cannot retrieve data to populate the cache, and every lookup for that key puts pressure on both the cache and the database. There are two common mitigations. One is to cache a null value for non-existent keys, so repeated lookups stop at the cache instead of hitting the database. The other is to put a Bloom filter in front of the cache to check key existence first; if the filter says the key doesn't exist, we can skip the database entirely.
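A Bloom filter never produces false negatives, which is what makes it safe as a gatekeeper here: if it says a key is absent, the key is definitely absent. Below is a toy sketch, assuming a fixed bit array and SHA-256-derived hash positions; production systems would use a tuned library instead.

```python
import hashlib

class BloomFilter:
    """Minimal Bloom filter: no false negatives, small false-positive rate."""
    def __init__(self, size: int = 1024, hashes: int = 3):
        self.size = size
        self.hashes = hashes
        self.bits = [False] * size

    def _positions(self, key: str):
        # Derive `hashes` independent positions from one key.
        for i in range(self.hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode()).hexdigest()
            yield int(digest, 16) % self.size

    def add(self, key: str) -> None:
        for pos in self._positions(key):
            self.bits[pos] = True

    def might_contain(self, key: str) -> bool:
        # False means "definitely not present"; True means "probably present".
        return all(self.bits[pos] for pos in self._positions(key))

bf = BloomFilter()
bf.add("user:42")
# A lookup for a key the filter has never seen can be rejected
# before touching the cache or the database.
```

The trade-off is a small false-positive rate (some absent keys still pass through), controlled by the bit-array size and hash count.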
3. Cache breakdown
This is similar to the thundering herd problem, but it happens when a single hot key expires and a large number of requests for that key hit the database at once. Since hot keys can account for around 80% of queries, one option is simply not to set an expiration time for them.
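A complementary safeguard, not mentioned in the post but commonly paired with it, is single-flight rebuilding: if a hot key ever does go missing, only one request recomputes it while the others wait. The dictionary cache and `load_from_db` callback below are stand-ins for illustration.

```python
import threading

cache = {}                    # stand-in for a shared cache
rebuild_lock = threading.Lock()

def get_hot_key(key: str, load_from_db):
    """Return the cached value; on a miss, let only one thread
    rebuild it while the others wait on the lock."""
    value = cache.get(key)
    if value is not None:
        return value
    with rebuild_lock:
        # Re-check after acquiring the lock: another thread may have
        # repopulated the cache while we were waiting.
        value = cache.get(key)
        if value is None:
            value = load_from_db(key)
            cache[key] = value   # stored without an expiry, per the post
        return value
```

The double-check inside the lock is what prevents N concurrent misses from becoming N database queries.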
4. Cache crash
This happens when the cache is down and all requests go straight to the database. There are two ways to handle this. One is to set up a circuit breaker, so that when the cache is down, application services fail fast instead of hammering the cache and the database. The other is to run the cache as a cluster to improve its availability.
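The circuit-breaker idea can be sketched as below. This is a minimal illustration under assumed defaults (3 consecutive failures to open, 30 seconds before a retrial); real deployments typically use a library such as resilience4j or a service-mesh policy instead.

```python
import time

class CircuitBreaker:
    """Minimal circuit breaker: opens after `max_failures` consecutive
    failures and rejects calls for `reset_after` seconds."""
    def __init__(self, max_failures: int = 3, reset_after: float = 30.0):
        self.max_failures = max_failures
        self.reset_after = reset_after
        self.failures = 0
        self.opened_at = None

    def call(self, fn, *args, **kwargs):
        if self.opened_at is not None:
            if time.monotonic() - self.opened_at < self.reset_after:
                # Fail fast: don't touch the backend while the circuit is open.
                raise RuntimeError("circuit open: cache unavailable")
            self.opened_at = None   # half-open: allow one trial call
            self.failures = 0
        try:
            result = fn(*args, **kwargs)
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = time.monotonic()
            raise
        self.failures = 0
        return result
```

While the circuit is open, callers get an immediate error (or a fallback) instead of piling load onto the struggling backend.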
Top 9 Architectural Patterns for Data and Communication Flow
Peer-to-Peer
The Peer-to-Peer pattern involves direct communication between two components without the need for a central coordinator.
API Gateway
An API Gateway acts as a single entry point for all client requests to the backend services of an application.
Pub-Sub
The Pub-Sub pattern decouples the producers of messages (publishers) from the consumers of messages (subscribers) through a message broker.
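The decoupling can be shown with a toy in-process broker; this is an illustrative sketch, not a real message broker such as Kafka or RabbitMQ.

```python
from collections import defaultdict

class Broker:
    """Toy in-process message broker for the Pub-Sub pattern."""
    def __init__(self):
        self.subscribers = defaultdict(list)

    def subscribe(self, topic: str, handler) -> None:
        self.subscribers[topic].append(handler)

    def publish(self, topic: str, message) -> None:
        # The publisher knows only the topic, never the subscribers.
        for handler in self.subscribers[topic]:
            handler(message)

broker = Broker()
received = []
broker.subscribe("orders", received.append)
broker.publish("orders", {"id": 1})
```

Publishers and subscribers can be added or removed independently; neither side holds a reference to the other.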
Request-Response
This is one of the most fundamental integration patterns, where a client sends a request to a server and waits for a response.
Event Sourcing
Event Sourcing involves storing the state changes of an application as a sequence of events.
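The key property is that current state is derivable by replaying the event log. A minimal sketch, using a hypothetical bank account as the aggregate:

```python
class Account:
    """Account whose state is derived purely by replaying stored events."""
    def __init__(self):
        self.events = []      # the append-only event log
        self.balance = 0

    def apply(self, event) -> None:
        kind, amount = event
        if kind == "deposited":
            self.balance += amount
        elif kind == "withdrawn":
            self.balance -= amount

    def record(self, event) -> None:
        # Persist the event, then update the derived state.
        self.events.append(event)
        self.apply(event)

    @classmethod
    def replay(cls, events):
        account = cls()
        for event in events:
            account.record(event)
        return account

acct = Account()
acct.record(("deposited", 100))
acct.record(("withdrawn", 30))
rebuilt = Account.replay(acct.events)  # same state, rebuilt from the log alone
```

Because the log is the source of truth, you get an audit trail for free and can reconstruct state at any past point by replaying a prefix of the events.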
ETL
ETL is a data integration pattern used to gather data from multiple sources, transform it into a structured format, and load it into a destination database.
Batching
Batching involves accumulating data over a period or until a certain threshold is met before processing it as a single group.
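A size-threshold batcher can be sketched in a few lines (real systems usually also flush on a timer, which is omitted here for brevity):

```python
class Batcher:
    """Accumulate items and flush them as one group once a
    size threshold is reached."""
    def __init__(self, threshold: int, flush):
        self.threshold = threshold
        self.flush = flush      # callback invoked with each full batch
        self.buffer = []

    def add(self, item) -> None:
        self.buffer.append(item)
        if len(self.buffer) >= self.threshold:
            self.flush(list(self.buffer))
            self.buffer.clear()

batches = []
b = Batcher(threshold=3, flush=batches.append)
for i in range(7):
    b.add(i)
# Two full batches are flushed; the seventh item stays buffered.
```

Batching trades latency for throughput: items wait in the buffer, but each flush amortizes per-operation overhead (a network round trip, a database insert) over the whole group.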
Streaming Processing
Streaming Processing allows for the continuous ingestion, processing, and analysis of data streams in real-time.
Orchestration
Orchestration involves a central coordinator (an orchestrator) managing the interactions between distributed components or services to achieve a workflow or business process.
#SystemIntegrations #APIGateway #PubSub #Batching #ETL #RequestResponse
Things Every Developer Should Know:
Concurrency is NOT parallelism.
In system design, it is important to understand the difference between concurrency and parallelism.
As Rob Pike (one of the creators of Go) stated: "Concurrency is about dealing with lots of things at once. Parallelism is about doing lots of things at once." This distinction emphasizes that concurrency is more about the design of a program, while parallelism is about the execution.
Concurrency is about dealing with multiple things at once. It involves structuring a program to handle multiple tasks simultaneously, where the tasks can start, run, and complete in overlapping time periods, but not necessarily at the same instant.
Concurrency is about the composition of independently executing processes and describes a program's ability to manage multiple tasks by making progress on them without necessarily completing one before it starts another.
Parallelism, on the other hand, refers to the simultaneous execution of multiple computations. It is the technique of running two or more tasks or computations at the same time, utilizing multiple processors or cores within a computer to perform several operations concurrently. Parallelism requires hardware with multiple processing units, and its primary goal is to increase the throughput and computational speed of a system.
In practical terms, concurrency enables a program to remain responsive to input, perform background tasks, and handle multiple operations in a seemingly simultaneous manner, even on a single-core processor. It's particularly useful in I/O-bound and high-latency operations where programs need to wait for external events, such as file, network, or user interactions.
Parallelism, with its ability to perform multiple operations at the same time, is crucial in CPU-bound tasks where computational speed and throughput are the bottlenecks. Applications that require heavy mathematical computations, data analysis, image processing, and real-time processing can significantly benefit from parallel execution.
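The I/O-bound case can be illustrated with Python's thread pool, where five simulated waits overlap instead of running back to back. The `slow_io` function is a made-up stand-in for a network or disk call.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def slow_io(n: int) -> int:
    """Stand-in for an I/O-bound call (network, disk)."""
    time.sleep(0.2)
    return n * n

start = time.perf_counter()
with ThreadPoolExecutor(max_workers=5) as pool:
    results = list(pool.map(slow_io, range(5)))
elapsed = time.perf_counter() - start
# The five 0.2 s waits overlap, so total time is close to one call's
# latency rather than five sequential calls (~1.0 s). For CPU-bound
# work, ProcessPoolExecutor would give true parallelism across cores.
```

Threads give Python concurrency for waiting-heavy work even under the GIL; swapping in `ProcessPoolExecutor` is the parallelism counterpart for compute-heavy work.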
What is DevSecOps?
DevSecOps emerged as a natural evolution of DevOps practices with a focus on integrating security into the software development and deployment process. The term "DevSecOps" represents the convergence of Development (Dev), Security (Sec), and Operations (Ops) practices, emphasizing the importance of security throughout the software development lifecycle. The diagram below shows the important concepts in DevSecOps.
1. Automated Security Checks
2. Continuous Monitoring
3. CI/CD Automation
4. Infrastructure as Code (IaC)
5. Container Security
6. Secret Management
7. Threat Modeling
8. Quality Assurance (QA) Integration
9. Collaboration and Communication
10. Vulnerability Management