christova

Tech Articles – (please note these posts are collated from AmigosCode, Alex Xu and many others. Full copyright to the owners of their material)

How do live streaming platforms like YouTube Live, TikTok Live, or Twitch work? Live streaming is challenging because video content must be delivered over the internet in near real-time, video processing is compute-intensive, and sending a large volume of video content takes time. The diagram below explains what happens behind the scenes to make this possible.

Step 1: The streamer starts their stream. The source could be any video and audio source wired up to an encoder.

Step 2: To provide the best upload condition for the streamer, most live streaming platforms provide point-of-presence servers worldwide. The streamer connects to a point-of-presence server closest to them.

Step 3: The incoming video stream is transcoded to different resolutions, and divided into smaller video segments a few seconds in length.

Step 4: The video segments are packaged into different live streaming formats that video players can understand. The most common live-streaming format is HLS, or HTTP Live Streaming.

Step 5: The resulting HLS manifest and video chunks from the packaging step are cached by the CDN.

Step 6: Finally, the video starts to arrive at the viewer’s video player.

Steps 7 and 8: To support replay, videos can optionally be stored in storage such as Amazon S3.
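The HLS packaging in steps 3 to 5 produces plain-text playlists that video players poll. As a rough sketch (segment names, durations, and sequence numbers here are made up, not a platform's real output), a live media playlist can be generated like this:

```python
# Minimal sketch: build an HLS media playlist (.m3u8) for a live stream,
# assuming 4-second segments with hypothetical names segment_<n>.ts.
def build_hls_playlist(first_seq: int, num_segments: int, duration: float = 4.0) -> str:
    lines = [
        "#EXTM3U",
        "#EXT-X-VERSION:3",
        f"#EXT-X-TARGETDURATION:{int(duration)}",
        f"#EXT-X-MEDIA-SEQUENCE:{first_seq}",
    ]
    for n in range(first_seq, first_seq + num_segments):
        lines.append(f"#EXTINF:{duration:.1f},")  # duration of the next segment
        lines.append(f"segment_{n}.ts")
    # A live playlist has no #EXT-X-ENDLIST tag; the player keeps re-fetching
    # the playlist to discover newly packaged segments.
    return "\n".join(lines)

print(build_hls_playlist(first_seq=100, num_segments=3))
```

The CDN in step 5 simply caches this playlist text plus the referenced `.ts` (or fMP4) chunks.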

#streaming

Microservices architecture has become popular for building complex, scalable software systems.

This architectural style structures an application as a collection of loosely coupled, independently deployable services. Each microservice is focused on a specific business capability and can be developed, deployed, and scaled independently.

While microservices offer numerous benefits, such as improved scalability, flexibility, and faster time to market, they also introduce significant challenges in terms of data management.

One of the fundamental principles of microservices architecture is that each service should own and manage its data. This principle is often expressed as “don't share databases between services” and it aims to ensure loose coupling and autonomy among services, allowing them to evolve independently.

However, it's crucial to distinguish between sharing a data source and sharing data itself. While sharing a data source (e.g., a database) between services is discouraged, sharing data between services is often necessary and acceptable.

In this post, we’ll look at different ways of sharing data between microservices and the various advantages and disadvantages of specific approaches.

Sharing Data Source vs Sharing Data

As mentioned earlier, it's crucial to understand the difference between sharing a data source and sharing data itself when working with microservices.

This distinction has significant implications for service coupling, independence, and overall system design.

Sharing a data source means multiple services directly access the same database, potentially leading to tight coupling and dependencies. It violates the principle of service data ownership in microservices.

On the other hand, sharing data involves services exchanging data through well-defined APIs or messaging patterns, maintaining their local copies of the data they need. This helps preserve service independence and loose coupling.

Let's consider an example scenario involving an Order service and a Product service to illustrate the data-sharing dilemma.

  • Order Service:
    • Manages order information.
    • Stores order details in its database.
  • Product Service:
    • Handles product information.
    • Maintains a separate database for product details.
  • Relationship:
    • Each order is associated with a set of products.

The challenge arises when the Order service needs to display product information alongside order details.

In a monolithic architecture, this would be straightforward since all the data resides in a single database. However, in a microservices architecture, the Order service should not directly access the Product service database. Instead, it should fetch the data via the Product Service.

This scenario highlights the need for effective data-sharing strategies in microservices:

  • Data Independence: Each service owns its data, preventing direct database access between services.
  • Data Retrieval: The Order service must find a way to retrieve product information without violating service boundaries.
  • Performance Considerations: The chosen data-sharing approach should not significantly impact system performance or introduce excessive latency.
  • Consistency: Ensuring data consistency between services becomes crucial when sharing data across service boundaries.

Let us now look at different ways of sharing data.

Synchronous Data Sharing in Microservices

Synchronous data sharing in microservices involves real-time communication between services, where the requesting service waits for a response before proceeding. This approach ensures immediate consistency but introduces challenges in terms of scalability and performance.

Request/Response Model

The request/response model is the most straightforward synchronous approach:

  • Service A sends a request to Service B for data.
  • Service B processes the request and sends back a response.
  • Service A waits for the response before continuing its operation.

For example, whenever the Order Service is called to provide order details (which include product information), it has to fetch product data by calling the Product Service.
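A minimal in-process sketch of this request/response flow (class names, fields, and data are illustrative; a real system would make an HTTP or gRPC call instead of a direct method call):

```python
# Synchronous request/response between two services, simulated in-process.
class ProductService:
    def __init__(self):
        self._products = {42: {"id": 42, "name": "Laptop"}}

    def get_product(self, product_id):
        return self._products[product_id]

class OrderService:
    def __init__(self, product_service):
        self.product_service = product_service  # direct dependency: tight coupling
        self._orders = {1: {"id": 1, "product_id": 42, "quantity": 2}}

    def get_order_details(self, order_id):
        order = self._orders[order_id]
        # Blocking call: the Order service waits for the Product service to respond.
        product = self.product_service.get_product(order["product_id"])
        return {**order, "product_name": product["name"]}

details = OrderService(ProductService()).get_order_details(1)
print(details["product_name"])  # Laptop
```

Every order lookup pays the cost of the extra call, which is exactly where the challenges below come from.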

As expected, the synchronous data-sharing approaches face several challenges:

  • Increased Response Time: Each synchronous call adds to the overall response time of the system.
  • Cascading Failures: If one service is slow or down, it can affect the entire chain of dependent services.
  • Resource Utilization: Services may need to keep connections open while waiting for responses, potentially leading to resource exhaustion.
  • Network Congestion: As the number of inter-service calls increases, network traffic can become a bottleneck.

Due to these challenges, services can instead share data through duplication. For example, the order table can store the product name alongside the order details, so the Order Service does not have to contact the Product Service to fetch the product’s name.

However, this creates consistency issues during data updates. For example, the product name is now duplicated across two different services. If the product name changes, the update has to be made in both the product table and the order table to maintain consistency.

Note that this is just a simple example for explanation purposes. Even more critical consistency requirements can exist in a typical application.
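The duplication idea can be sketched with two in-memory stand-ins for the service tables (table and field names are illustrative):

```python
# Denormalized/duplicated data: the order row carries a copy of the product
# name, so reads stay local to the Order service, but renames must touch both.
product_table = {42: {"name": "Laptop"}}
order_table = {1: {"product_id": 42, "product_name": "Laptop"}}  # duplicated copy

# A read no longer needs a cross-service call:
assert order_table[1]["product_name"] == "Laptop"

# But a rename must now update both stores to stay consistent:
product_table[42]["name"] = "Laptop Pro"
order_table[1]["product_name"] = "Laptop Pro"  # easy to forget -> inconsistency
```

Forgetting the second update is precisely the consistency problem the next sections address.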

The Gateway Approach

The Gateway approach is a simple way to ensure data consistency while sharing data across multiple services.

Here’s an example scenario of how it can work:

  • A coordinator that acts like a gateway initiates the update process.
  • The coordinator sends update requests to all participating services.
  • Each service performs the update and sends back a success or failure response.
  • If all services succeed, the transaction is considered complete. If any service fails, the coordinator must initiate a rollback on all services.

See the diagram below, where a change to the product name requires an update in the order service, which holds a copy of the product name. The coordinator ensures that both updates succeed before informing the user.
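A hedged sketch of the coordinator logic, assuming each participating service exposes hypothetical update() and rollback() operations:

```python
# Gateway/coordinator update: apply the change everywhere, or roll it back.
class Coordinator:
    def update_all(self, services, new_name):
        updated = []
        for svc in services:
            try:
                svc.update(new_name)
                updated.append(svc)
            except Exception:
                # On any failure, roll back every service that already succeeded.
                for done in updated:
                    done.rollback()
                return False
        return True

class FakeService:
    """In-memory stand-in for a service holding a copy of the product name."""
    def __init__(self, name, fail=False):
        self.name, self._previous, self._fail = name, name, fail

    def update(self, new_name):
        if self._fail:
            raise RuntimeError("update failed")
        self._previous, self.name = self.name, new_name

    def rollback(self):
        self.name = self._previous

product, order = FakeService("Laptop"), FakeService("Laptop", fail=True)
ok = Coordinator().update_all([product, order], "Laptop Pro")
print(ok, product.name)  # False Laptop  (the successful update was rolled back)
```

Note how a failure in the second service forces a rollback of the first, which is the "difficult to handle partial failures" drawback listed below.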

Some advantages of the gateway approach are as follows:

  • Simpler to implement and reason about
  • Maintains consistency

However, there are also disadvantages:

  • Less reliable in failure scenarios.
  • Poor experience for the users in case one of the services fails.
  • Difficult to handle partial failures.

As evident, while synchronous approaches offer strong consistency and simplicity, they come with significant trade-offs in terms of scalability, performance, and resilience.

This is also the reason why asynchronous approaches are usually more popular.

Asynchronous Data Sharing in Microservices

Asynchronous data sharing in microservices architectures enables services to exchange data without waiting for immediate responses. This approach promotes service independence and enhances overall system scalability and resilience.

Let's explore the key components and concepts of asynchronous data sharing.

Event-Driven Architecture

In an event-driven architecture, services communicate through events:

  • When a service performs an action or experiences a state change, it publishes an event to a message broker.
  • Other services interested in that event can subscribe to it and react accordingly.
  • This loosely coupled approach allows services to evolve independently.

The example below describes such an event-driven approach:

  • The Order Service publishes an “OrderCreated” event when a new order is placed.
  • The Inventory Service subscribes to the “OrderCreated” event and updates stock levels.
  • The Shipping Service also subscribes to the event and initiates the shipping process.
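The flow above can be sketched with a minimal in-memory event bus (a real system would use a broker such as Kafka or RabbitMQ; the handlers here are illustrative stand-ins for the Inventory and Shipping services):

```python
# Minimal publish/subscribe event bus: publishers and subscribers only share
# the event name, not direct references to each other.
from collections import defaultdict

class EventBus:
    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, event_type, handler):
        self._subscribers[event_type].append(handler)

    def publish(self, event_type, payload):
        for handler in self._subscribers[event_type]:
            handler(payload)

bus = EventBus()
log = []
bus.subscribe("OrderCreated", lambda e: log.append(f"inventory: reserve stock for order {e['order_id']}"))
bus.subscribe("OrderCreated", lambda e: log.append(f"shipping: schedule order {e['order_id']}"))

# The Order service publishes once; every subscriber reacts independently.
bus.publish("OrderCreated", {"order_id": 1})
print(log)
```

The Order service never learns who consumed the event, which is what lets new subscribers be added without changing the publisher.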

### Message Queues

Message queues, such as RabbitMQ, serve as the backbone of asynchronous communication in microservices:

  • They provide a reliable and scalable way to exchange messages between services.
  • Services can publish messages to specific topics or queues.
  • Other services can consume those messages at their own pace.
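As a rough in-process stand-in for this (Python's `queue.Queue` here substitutes for a real broker such as RabbitMQ; event shapes are illustrative):

```python
# The producer publishes and moves on; the consumer drains messages at its
# own pace, independently of the producer.
import queue

q = queue.Queue()

# Producer side: publish and continue without waiting for a consumer.
for order_id in (1, 2, 3):
    q.put({"event": "OrderCreated", "order_id": order_id})

# Consumer side: process whenever ready, in arrival (FIFO) order.
processed = []
while not q.empty():
    processed.append(q.get()["order_id"])
print(processed)  # [1, 2, 3]
```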

### Eventual Consistency

Asynchronous data sharing often leads to eventual consistency.

Services maintain their local copies of data and update them based on received events. It means that data across services may be temporarily inconsistent but will eventually reach a consistent state.

This approach allows services to operate independently and improves performance by reducing synchronous communication.

Here’s an example of the impact of eventual consistency on the earlier example about data sharing between the Product and Order Services:

  • The Order Service maintains a local copy of the product data.
  • When a product’s details (such as the name) are updated, the Product Service publishes a “ProductUpdated” event.
  • The Order Service consumes the event and updates its local copy of the product data asynchronously.
  • There may be a short period where product data is inconsistent across services, but it will eventually become consistent.
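A small sketch of that temporary inconsistency window (event and field names are illustrative):

```python
# The Order service keeps a local copy of the product name and applies
# ProductUpdated events with a lag, so reads in between see stale data.
pending_events = []

product_service = {"42": "Laptop"}       # owning service's data
order_service_copy = {"42": "Laptop"}    # Order service's local copy

# The product is renamed; the event is published but not yet consumed.
product_service["42"] = "Laptop Pro"
pending_events.append({"event": "ProductUpdated", "id": "42", "name": "Laptop Pro"})

stale = order_service_copy["42"]  # "Laptop": temporarily inconsistent

# Later, the Order service consumes the event and the copies converge.
for e in pending_events:
    order_service_copy[e["id"]] = e["name"]
assert order_service_copy["42"] == product_service["42"]
```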

See the diagram below:

The advantages of asynchronous data sharing are as follows:

  • Loose Coupling: Services can evolve independently without tight dependencies on each other.
  • Scalability: Services can process messages at their own pace, allowing for better scalability and resource utilization.
  • Resilience: If a service is temporarily unavailable, messages can be buffered in the queue and processed later, improving system resilience.
  • Improved Performance: Asynchronous communication reduces the need for synchronous requests, resulting in faster response times and improved overall performance.

However, there are also some disadvantages:

  • Eventual Consistency: Data may be temporarily inconsistent across services, which can be challenging to handle in certain scenarios.
  • Increased Complexity: Implementing event-driven architectures and handling message queues adds complexity to the system.
  • Message Ordering: Ensuring the correct order of message processing can be challenging, especially in scenarios where message ordering is critical.
  • Error Handling: Dealing with failures and errors in asynchronous communication requires careful design and implementation of retry mechanisms and compensating actions.

## Hybrid Approach

A hybrid approach to data sharing in microservices combines elements of both synchronous and asynchronous methods. This approach aims to balance the benefits of local data availability with the need for up-to-date and detailed information from other services.

Let's consider a ride-sharing application with two microservices: Driver and Ride.

  • The Ride service stores basic driver information (ID, name) locally.
  • For detailed information (e.g., driver rating, car details), the Ride service makes a synchronous call to the Driver service.

This hybrid approach allows the Ride service to have quick access to essential driver data while still retrieving more comprehensive information when needed.
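A minimal sketch of this split, with illustrative names, fields, and data:

```python
# Hybrid data sharing: essential driver fields live locally in the Ride
# service; detailed fields are fetched synchronously only when needed.
class DriverService:
    def __init__(self):
        self._details = {7: {"rating": 4.9, "car": "Blue sedan"}}

    def get_details(self, driver_id):
        return self._details[driver_id]

class RideService:
    def __init__(self, driver_service):
        self.driver_service = driver_service
        # Local copy of essential driver data only (id and name).
        self._drivers = {7: {"id": 7, "name": "Alice"}}

    def ride_summary(self, driver_id):
        return self._drivers[driver_id]["name"]  # no remote call needed

    def ride_details(self, driver_id):
        basic = self._drivers[driver_id]
        detailed = self.driver_service.get_details(driver_id)  # synchronous call
        return {**basic, **detailed}

rides = RideService(DriverService())
print(rides.ride_summary(7))  # Alice
```

Common reads (`ride_summary`) stay local; only the detail view pays the cost of a cross-service call.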

See the diagram below:

The hybrid approach offers a couple of benefits:

  • Reduces Redundant Calls: By storing basic information locally, the Ride service can reduce the frequency of service-to-service calls. This improves performance and minimizes the load on the network.
  • Smaller Database Footprint: Only essential data is duplicated across services, resulting in a smaller database footprint. This saves storage space and reduces the overhead of managing redundant data.

However, it also has some drawbacks:

  • Still Affects Response Time: Although basic information is available locally, retrieving detailed information through synchronous calls to the Driver service still impacts the overall response time of the Ride service.
  • Some Data Duplication: Even though only essential data is duplicated, there is still some level of data duplication across services. This requires careful management to ensure data consistency and avoid discrepancies.

When implementing a hybrid approach, consider the following:

  • Data Consistency: Ensure that the duplicated data remains consistent across services. Implement mechanisms to update the local copy of data when changes occur in the source service.
  • Caching Strategies: Employ caching strategies to store frequently accessed detailed information locally, reducing the need for repetitive synchronous calls to other services.
  • Asynchronous Updates: Consider using asynchronous methods, such as event-driven architecture or message queues, to propagate updates to the duplicated data asynchronously, minimizing the impact on response times.

## Trade-offs and Decision Making

When designing a microservices architecture that involves data sharing, it's crucial to carefully evaluate the trade-offs between consistency, performance, and scalability.

This decision-making process requires a deep understanding of your system's requirements and constraints.

### 1 – Evaluating Consistency Requirements

One of the key factors in choosing an appropriate solution is the consistency requirement.

#### Strong Consistency

Strong consistency ensures that all services have the most up-to-date data at all times. It is suitable for systems where data accuracy is critical, such as financial transactions or medical records.

The main characteristics are as follows:

  • Often implemented using synchronous communication patterns like the gateway pattern or the request/response approach.
  • Ensures immediate data consistency across all services.
  • Drawbacks include higher latency and reduced availability.

#### Eventual Consistency

Eventual consistency allows for temporary data inconsistencies but guarantees that data will become consistent over time. It is appropriate for systems that can tolerate temporary inconsistencies, such as social media posts or product reviews.

The primary characteristics of a system based on eventual consistency are as follows:

  • Implemented using asynchronous communication patterns like event-driven architecture.
  • Allows for faster response times and higher availability.
  • Requires careful design to handle conflict resolution and compensating transactions.

### 2 – Performance Considerations

The key performance-related considerations are as follows:

  • Latency:
    • Strong consistency often leads to higher latency due to synchronous communication.
    • Eventual consistency can provide lower latency but may serve stale data.
  • Throughput:
    • Eventual consistency generally allows for higher throughput as services can operate more independently.
    • Strong consistency may limit throughput due to the need for coordination between services.
  • Resource Utilization:
    • Strong consistency may require more computational and network resources to maintain data integrity.
    • Eventual consistency can be more efficient in terms of resource usage but requires additional storage for local data copies.

### 3 – Scalability

The data-sharing solution has a significant impact on the application's scalability. Some of the key aspects are as follows:

  • Horizontal Scaling: Eventual consistency models often scale better horizontally, as services can be added without significantly impacting overall system performance. Strong consistency models may face challenges in horizontal scaling due to increased coordination overhead.
  • Data Volume: As data volume grows, maintaining strong consistency across all services becomes challenging. Eventual consistency can handle large data volumes more gracefully but requires careful management of data synchronization.
  • Service Independence: Eventual consistency allows services to evolve more independently, facilitating easier scaling. Strong consistency models create tighter coupling between services.

## Summary

In this article, we’ve taken a detailed look at the important topic of sharing data between microservices. This is a critical requirement when it comes to building robust and high-performance microservices.

Let’s summarize the key learnings:

  • The principle of each service owning its data aims to ensure loose coupling and autonomy among services, allowing them to evolve independently.
  • However, it's crucial to distinguish between sharing a data source and sharing data itself.
  • Synchronous data sharing involves real-time communication between services, where the requesting service waits for a response before proceeding.
  • Some approaches for synchronous data sharing are the request/response model and the gateway approach.
  • Asynchronous data sharing enables services to exchange data without waiting for immediate responses.
  • The key components of the asynchronous data-sharing approach are event-driven architecture, message queues, and the concept of eventual consistency.
  • A hybrid approach combines elements of both synchronous and asynchronous methods, balancing the benefits of local data availability with the need for up-to-date and detailed information from other services.
  • Multiple trade-offs need to be considered when choosing the appropriate data-sharing approach; key factors are consistency requirements, performance needs, and the desired scalability of the application.

#MicroServices #data #DataSharing

In today's world of rapidly evolving cybersecurity threats, protecting your application from unauthorized access is paramount.

Spring Security, a powerful and flexible framework, plays a critical role in securing Spring Boot applications.

Whether you're dealing with traditional username/password authentication, JWT tokens, or other custom mechanisms, Spring Security provides the necessary tools to handle authentication and authorization seamlessly.

In this blog, we'll dive into the Spring Security architecture, exploring how various components like the Security Filter Chain, AuthenticationManager, and Authentication Providers work together to secure your application.

1. Client Request

  • A user or client makes a request to the application, usually to access a protected resource.
  • This request is processed by a chain of Security Filters.

2. Security Filter Chain

  • Security Filter A, B, ... N: These are different filters that apply to the incoming request, where each filter can handle specific types of security logic (like CORS, CSRF protection, and more).
  • One of these filters is responsible for authenticating the user.
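Spring Security's filter chain is Java, but the underlying pattern is easy to see in a short, framework-agnostic sketch (filter names and the token check are purely illustrative): each filter can short-circuit the request or pass it along to the next one.

```python
# Chain-of-responsibility sketch of a security filter chain. Each filter
# receives the request plus a `chain` callable that invokes the rest of
# the chain, mirroring doFilter(request, response, chain).
def cors_filter(request, chain):
    request["headers"]["Access-Control-Allow-Origin"] = "*"  # add CORS header
    return chain(request)

def auth_filter(request, chain):
    if request.get("token") != "valid-token":  # hypothetical credential check
        return {"status": 401}                 # short-circuit: reject request
    return chain(request)

def run_chain(filters, handler, request):
    # Build the chain back-to-front so each filter calls the next one.
    chain = handler
    for f in reversed(filters):
        chain = (lambda flt, nxt: lambda req: flt(req, nxt))(f, chain)
    return chain(request)

handler = lambda req: {"status": 200, "body": "protected resource"}
response = run_chain([cors_filter, auth_filter], handler,
                     {"headers": {}, "token": "valid-token"})
print(response["status"])
```

A request with a bad token never reaches the handler; the authentication filter answers 401 on its own, just as a Spring Security filter would.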

3. Authentication Flow

  • UsernamePasswordAuthenticationToken: This token represents the user's credentials (username and password) and is passed to the authentication logic.
  • The authentication logic checks if the user credentials are valid. This typically involves checking a database or other identity source.

4. AuthenticationManager / ProviderManager

  • The AuthenticationManager or ProviderManager manages the overall authentication process. It delegates the authentication request to different Authentication Providers based on the type of authentication required.

5. Authentication Providers

  • JWT Authentication Provider: If you're using JWT (JSON Web Token) for authentication, this provider handles the verification of JWT tokens.
  • DaoAuthenticationProvider: This provider handles traditional authentication using a database, checking user credentials against stored data (e.g., in a relational database).
  • Other Providers (Authentication Provider N): You can define multiple custom authentication providers if your application supports multiple methods of authentication (e.g., OAuth2, LDAP).
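The delegation described above can be sketched in Python pseudocode (the real classes are Java; the token shapes, class names, and verification logic here are illustrative stand-ins, not Spring's API): each provider declares which token types it supports, and the manager tries them in order.

```python
# ProviderManager-style delegation: route each authentication token to the
# first provider that supports its type.
class DaoProvider:
    def supports(self, token):
        return token["type"] == "username_password"

    def authenticate(self, token):
        # Stand-in for a UserDetailsService lookup plus PasswordEncoder check.
        users = {"alice": "s3cret"}
        if users.get(token["user"]) == token["password"]:
            return {"principal": token["user"], "authenticated": True}
        raise ValueError("bad credentials")

class JwtProvider:
    def supports(self, token):
        return token["type"] == "jwt"

    def authenticate(self, token):
        # Stand-in for real JWT signature verification.
        if token["jwt"].endswith(".signed"):
            return {"principal": token["jwt"].split(".")[0], "authenticated": True}
        raise ValueError("invalid token")

class ProviderManager:
    def __init__(self, providers):
        self.providers = providers

    def authenticate(self, token):
        for provider in self.providers:
            if provider.supports(token):  # delegate by token type
                return provider.authenticate(token)
        raise ValueError("no provider supports this authentication type")

manager = ProviderManager([DaoProvider(), JwtProvider()])
print(manager.authenticate({"type": "jwt", "jwt": "bob.signed"}))
```

Adding a new authentication method (OAuth2, LDAP) then means adding one more provider, without touching the manager.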

6. UserDetailsService

  • The UserDetailsService is responsible for loading user-specific data, typically by looking up the user’s details from a database (using the DaoAuthenticationProvider).
  • The PasswordEncoder ensures that passwords are securely encoded (hashed) before they are compared during the authentication process.

7. SecurityContext & JWT Authentication Filter

  • If the user is successfully authenticated, the SecurityContext is updated to store the user’s authentication status.
  • The JWT Authentication Filter is responsible for handling JWT tokens, ensuring that valid tokens allow access to protected resources.

8. Authentication Request/Response

  • Once the authentication is performed by the filters and providers, an Authentication Request is sent to the backend.
  • After validation, an Authentication Response is returned, which could contain the authentication token (such as a JWT), allowing the client to access secure resources in subsequent requests.

9. SecurityContextHeader

  • The SecurityContextHeader encapsulates important security information like the user’s Principal (authenticated user), Credentials (such as the password or token), and Authorities (permissions or roles).
  • These fields include:
    • getAuthorities(): Fetches the roles or permissions granted to the user.
    • getPassword(), getUsername(): Standard user details.
    • isAccountNonExpired(), isAccountNonLocked(), isCredentialsNonExpired(), isEnabled(): These are checks to ensure the user account is in good standing (not expired, locked, etc.).

#spring #SpringSecurity #security

Git Merge vs. Rebase vs. Squash Commit! What are the differences?

When we **merge changes** from one Git branch to another, we can use ‘git merge’ or ‘git rebase’. The diagram above shows how the two commands work.

**Git Merge**

This creates a new commit G’ in the main branch. G’ ties the histories of both main and feature branches. Git merge is **non-destructive**. Neither the main nor the feature branch is changed.

**Git Rebase**

Git rebase moves the feature branch histories to the head of the main branch. It creates new commits E’, F’, and G’ for each commit in the feature branch.

The benefit of rebase is that it produces a **linear commit history**.

Rebase can be dangerous if “the golden rule of git rebase” is not followed.

**The Golden Rule of Git Rebase**

Never use it on public branches!

#Git #git #gitmerge #gitrebase

What is k8s (Kubernetes)? k8s is a container orchestration system used for container deployment and management. Its design was greatly influenced by Google’s internal system Borg.

A k8s cluster consists of a set of worker machines, called nodes, that run containerized applications. Every cluster has at least one worker node. The worker node(s) host the Pods that are the components of the application workload. The control plane manages the worker nodes and the Pods in the cluster. In production environments, the control plane usually runs across multiple computers and a cluster usually runs multiple nodes, providing fault tolerance and high availability.

Control Plane Components

1. API Server

The API server talks to all the components in the k8s cluster. All the operations on pods are executed by talking to the API server.

2. Scheduler

The scheduler watches for newly created pods that have no assigned node and selects a node for them to run on.

3. Controller Manager

The controller manager runs the controllers, including the Node Controller, Job Controller, EndpointSlice Controller, and ServiceAccount Controller.

4. etcd

etcd is a key-value store used as Kubernetes' backing store for all cluster data.

Nodes

1. Pods

A pod is a group of containers and is the smallest unit that k8s administers. Each pod has a single IP address that is shared by every container within it.

2. Kubelet

An agent that runs on each node in the cluster. It ensures containers are running in a Pod.

3. Kube Proxy

kube-proxy is a network proxy that runs on each node in your cluster. It routes Service traffic coming into the node and forwards requests to the correct backend containers.

#kubernetes #k8s
