RESTful API, XML, IDL
- Registry API
- Register API
- Deregister API
- Heartbeat Report API
- Service Subscribe API
- Service Update API
- Service Get API
- Cluster Deploy
- Service Health Status Detection
- Service Status Change Notification
- Whitelist
-
Monitor Object
- Client Monitor
- Interface Monitor
- Resource Monitor
- Infrastructure Monitor
-
Monitor Indicator
- QPS (Queries Per Second)
- Response Time
- Failure Rate
-
Data Collection
- Service Actively Reports
- Agent Collection
- Sampling Rate
-
Data Transmission
- UDP
- Kafka
- Binary Format: PB
- Text Format: JSON, XML
-
Data Process
- Aggregate by interface/machine
- Index DB, time series DB
-
Data Demonstration
-
Practice
- Beats, LogStash: Collect & Transform
- ElasticSearch: Search & Analyze
- Kibana: Visualize & Manage
Dapper, a Large-Scale Distributed Systems Tracing Infrastructure
-
Nodes Management
- Remove by registry
- Remove by service consumer
-
Load balance
- Round-robin
- Weighted round-robin
- Least Active
- Consistent Hash
- Random
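As one illustration of the strategies listed above, here is a minimal consistent-hash sketch. The class name `ConsistentHashRing`, the `replicas` parameter, and the use of MD5 for point placement are assumptions made for this example, not something the notes prescribe.

```python
import bisect
import hashlib


class ConsistentHashRing:
    """Maps keys to nodes; each node owns several virtual points on the ring."""

    def __init__(self, nodes, replicas=100):
        self.replicas = replicas
        self._points = []  # sorted hash values of all virtual points
        self._owner = {}   # hash value -> node name
        for node in nodes:
            self.add(node)

    @staticmethod
    def _hash(value):
        # MD5 is used only to spread points evenly, not for security.
        return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)

    def add(self, node):
        for i in range(self.replicas):
            point = self._hash(f"{node}#{i}")
            self._owner[point] = node
            bisect.insort(self._points, point)

    def remove(self, node):
        for i in range(self.replicas):
            point = self._hash(f"{node}#{i}")
            del self._owner[point]
            self._points.remove(point)

    def lookup(self, key):
        # First point clockwise from the key's hash, wrapping at the end.
        idx = bisect.bisect(self._points, self._hash(key)) % len(self._points)
        return self._owner[self._points[idx]]
```

When a node is removed, only the keys that were mapped to it move to another node; all other keys keep their assignment, which is the main reason to prefer this over `hash(key) % node_count`.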
-
Service Routing
- Use case:
- Nearby access
- Grayscale release
- Traffic switching
- Read write separation
- Routing Rules:
- Condition Routing:
"condition://0.0.0.0/dubbo.test.interfaces.TestService?category=routers&dynamic=true&priority=2&enabled=true&rule=" + URL.encode("host = 10.20.153.10 => host = 10.20.153.11")
- Exclude some service nodes
- Whitelist and blacklist
- IDC separation
- Read write separation
- Script Routing:
- Get Rule files:
- Local configuration files
- Configuration center
- Dynamic delivery
-
Service Fault Tolerance
-
Fail Over: Automatic switching on failure
-
Active-passive
With active-passive fail-over, heartbeats are sent between the active and the passive server on standby. If the heartbeat is interrupted, the passive server takes over the active's IP address and resumes service.
The length of downtime is determined by whether the passive server is already running in 'hot' standby or whether it needs to start up from 'cold' standby. Only the active server handles traffic.
Active-passive failover can also be referred to as master-slave failover.
-
Active-active
In active-active, both servers are managing traffic, spreading the load between them.
If the servers are public-facing, the DNS would need to know about the public IPs of both servers. If the servers are internal-facing, application logic would need to know about both servers.
Active-active failover can also be referred to as master-master failover.
-
-
Fail Back: Notification on failure
-
Fail Cache: Cache on failure
-
Fail Fast
-
-
Service Health Management
- Heartbeat switch protection mechanism
- Protect the registry center against a flood of client requests when the network is unstable
- Service node removal protection mechanism
- Protect the registry center against removing all nodes when the network is unstable
-
Configuration Center
- Application:
- Resource Servicing
- Business dynamic degradation
- Packet traffic switching
- Service Granularity
- Split by business logic
- Split by extensibility: stable and unstable
- Split by availability
- Split by performance
- Cons: 🔗 Ref
- One problem is the mismatch between the needs of the client and the fine‑grained APIs exposed by each of the microservices.
- Some services might use protocols that are not web‑friendly. One service might use Thrift binary RPC while another service might use the AMQP messaging protocol. Neither protocol is particularly browser‑ or firewall‑friendly and is best used internally.
- It makes it difficult to refactor the microservices.
API Gateway is the single point of entry for all client requests. It acts like a reverse proxy that serves all the client traffic to the microservices in the cluster.
An API gateway takes all API calls from clients, then routes them to the appropriate microservice with request routing, composition, and protocol translation. Typically it handles a request by invoking multiple microservices and aggregating the results, to determine the best path. It can translate between web protocols and web‑unfriendly protocols that are used internally.
-
Functionalities of API Gateway: 🔗 Ref
-
Routing
Encapsulating the underlying system and decoupling from the clients, the gateway provides a single entry point for the client to communicate with the microservice system.
-
Offloading
The API gateway consolidates the edge functionalities rather than making every microservice implement them. Some of the functionalities are:
- Identity Provider, Authentication and Authorization
- Service discovery integration
- Response caching
- Retry policies, circuit breaker, and QoS
- Rate limiting and throttling
- Load balancing
- Logging, tracing, correlation
- Headers, query strings, and claims transformation
- IP whitelisting
- IAM
-
Centralized Logging (transaction ID across the servers, error logging)
-
-
Advantages of the modified architecture with API Gateway: 🔗 Ref
- Clients are not required to be aware of every microservice and endpoint they need to talk to. This gives the application team the flexibility to eventually migrate out of a microservice, modify existing services, or create a new microservice.
- We can offload any cross cutting concerns like Authentication, Logging and Caching to this gateway layer. For instance, by allowing only Authenticated and Trusted client traffic to flow through the gateway to microservices. Also internal communication between the services can happen over a trusted private network without worrying about handling additional overhead like Authenticating the request and securing communications over SSL.
- It can also help with API Composition by querying multiple microservices and joining on the results to produce the final aggregated response.
- It can also act as a rate limiter by throttling requests from a client that has gone into a bad state, which helps make the cluster more fault tolerant.
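The rate-limiting role mentioned in the last item could be sketched with a token bucket. This is a minimal sketch under stated assumptions: `TokenBucket`, `rate`, `capacity`, and the injectable `clock` are hypothetical names chosen for the example, and a real gateway would typically keep one bucket per client.

```python
import time


class TokenBucket:
    """Allows bursts up to `capacity` and refills at `rate` tokens per second."""

    def __init__(self, rate, capacity, clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.clock = clock  # injectable so the behavior can be tested
        self.tokens = float(capacity)
        self.updated = clock()

    def allow(self):
        now = self.clock()
        # Refill according to elapsed time, never exceeding the capacity.
        elapsed = now - self.updated
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.updated = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False  # caller should reject or delay the request
```

A request that gets `False` would usually be answered with HTTP 429 rather than forwarded to the microservice.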
-
Drawbacks of API Gateway 🔗 Ref
- It is yet another highly available component that must be developed, deployed, and managed.
- There is also a risk that the API Gateway becomes a development bottleneck.
- Developers must update the API Gateway in order to expose each microservice’s endpoints.
- It is important that the process for updating the API Gateway be as lightweight as possible. Otherwise, developers will be forced to wait in line in order to update the gateway.
-
Service Mesh and API Gateway
- It appears as though API gateways and service meshes solve the same problem and are therefore redundant. They do solve the same problem but in different contexts.
- An API gateway is deployed as part of a business solution that is discoverable by external clients, and it handles north-south traffic (facing external clients).
- Service mesh handles east-west traffic (among different microservices).
- Advantages:
- Enables decoupling of the request producer from the request consumer giving them the flexibility to process requests at their own scale.
- Gives each microservice the flexibility to scale up and down based on bursts in traffic. Neither the producer nor the consumer needs to worry about request throttling.
- Improves the overall availability and fault tolerance of the system, as the producer is less concerned with handling failures from the consumer. Consumers can rest assured that as long as the proper message payload is published to the message bus, it will be processed eventually.
REST is an architectural style enforcing a client/server model where the client acts on a set of resources managed by the server. The server provides a representation of resources and actions that can either manipulate or get a new representation of resources. All communication must be stateless and cacheable.
There are four qualities of a RESTful interface:
- Identify resources (URI in HTTP) - use the same URI regardless of any operation.
- Change with representations (Verbs in HTTP) - use verbs, headers, and body.
- Self-descriptive error message (status response in HTTP) - Use status codes, don't reinvent the wheel.
- HATEOAS (HTML interface for HTTP) - your web service should be fully accessible in a browser.
Sample REST calls:
GET /someresources/anId
PUT /someresources/anId
{"anotherdata": "another value"}
REST is focused on exposing data. It minimizes the coupling between client/server and is often used for public HTTP APIs. REST uses a more generic and uniform method of exposing resources through URIs, representation through headers, and actions through verbs such as GET, POST, PUT, DELETE, and PATCH. Being stateless, REST is great for horizontal scaling and partitioning.
- With REST being focused on exposing data, it might not be a good fit if resources are not naturally organized or accessed in a simple hierarchy. For example, returning all updated records from the past hour matching a particular set of events is not easily expressed as a path. With REST, it is likely to be implemented with a combination of URI path, query parameters, and possibly the request body.
- REST typically relies on a few verbs (GET, POST, PUT, DELETE, and PATCH) which sometimes doesn't fit your use case. For example, moving expired documents to the archive folder might not cleanly fit within these verbs.
- Fetching complicated resources with nested hierarchies requires multiple round trips between the client and server to render single views, e.g. fetching content of a blog entry and the comments on that entry. For mobile applications operating in variable network conditions, these multiple roundtrips are highly undesirable.
- Over time, more fields might be added to an API response, and older clients will receive all the new data fields, even those they do not need. This bloats the payload size and leads to higher latencies.
- MQTT: Message Queuing Telemetry Transport (MQTT) is an ISO-standard, lightweight publish-subscribe messaging protocol widely used in the Internet of Things.
- AMQP: Advanced Message Queuing Protocol (AMQP) is an open-standard application-layer protocol for message-oriented middleware.
- STOMP: Simple Text Oriented Messaging Protocol (STOMP) is a text-based protocol modeled on HTTP for interchanging data between services.
A saga is a sequence of local transactions, each of which updates one service and publishes a message or event to trigger the next local transaction. If any local transaction fails, the saga executes a series of compensating transactions that undo the changes made by the preceding local transactions, thereby preserving atomicity.
- Choreography Based saga — participants exchange events without a centralized point of control.
- Orchestration Based saga — a centralized controller tells the saga participants what local transactions to execute.
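An orchestration-based saga can be sketched as a loop that runs each local transaction and, on failure, runs the compensations of the completed steps in reverse order. The function `run_saga`, the `SagaFailed` exception, and the `(action, compensation)` pair format are assumptions made for this illustration.

```python
class SagaFailed(Exception):
    """Raised after the completed steps have been compensated."""


def run_saga(steps):
    """Run (action, compensation) pairs in order; undo completed ones on failure."""
    completed = []
    for action, compensation in steps:
        try:
            action()
        except Exception as exc:
            # Undo in reverse order, newest change first.
            for undo in reversed(completed):
                undo()
            raise SagaFailed(str(exc)) from exc
        completed.append(compensation)
```

In a real system each action and compensation would be a call or message to a separate service, and compensations themselves need to be idempotent because they may be retried.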
- Microsoft Saga
- Idempotent Transactions
- Eventual Consistency
- Distributed Tracing
- Service Mesh
- The Token Validation Microservice introspects and validates OAuth 2.0 access_tokens that adhere to either of the following IETF specifications:
- The Token Validation Microservice uses the introspection endpoint defined in RFC-7662, OAuth 2.0 Token Introspection.
- A client requests access to Secured Microservice A, providing a stateful OAuth 2.0 access_token as credentials.
- Secured Microservice A passes the access_token for validation to the Token Validation Microservice, using the /introspect endpoint.
- The Token Validation Microservice requests the Authorization Server to validate the token.
- The Authorization Server introspects the token, and sends the introspection result to the Token Validation Microservice.
- The Token Validation Microservice caches the introspection result, and sends it to Secured Microservice A.
- Secured Microservice A uses the introspection result to decide how to process the request. In this case it continues processing the request. Secured Microservice A asks for additional information from Secured Microservice B, providing the validated token as credentials.
- Secured Microservice B passes the access_token to the Token Validation Microservice for validation, using the /introspect endpoint.
- The Token Validation Microservice retrieves the introspection result from the cache, and sends it to Secured Microservice B.
- Secured Microservice B uses the introspection result to decide how to process the request. In this case it passes its response to Secured Microservice A.
- Secured Microservice A passes its response to the client.
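The caching behavior in the flow above can be sketched as follows. `TokenValidationService`, `ttl_seconds`, and the `introspect` callable (standing in for the HTTP call to the Authorization Server) are hypothetical; the only detail taken from RFC 7662 is that an introspection response carries a boolean `active` field.

```python
import time


class TokenValidationService:
    """Caches introspection results so repeated checks skip the Authorization Server."""

    def __init__(self, introspect, ttl_seconds=60, clock=time.monotonic):
        self._introspect = introspect  # callable: token -> dict (assumed stand-in)
        self._ttl = ttl_seconds
        self._clock = clock
        self._cache = {}  # token -> (expires_at, introspection result)

    def validate(self, access_token):
        entry = self._cache.get(access_token)
        if entry is not None and entry[0] > self._clock():
            return entry[1]  # served from cache, as for Microservice B above
        result = self._introspect(access_token)
        self._cache[access_token] = (self._clock() + self._ttl, result)
        return result
```

The cache lifetime is a trade-off: a longer TTL saves round trips but delays the effect of token revocation.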
-
Read and write separation
- Replication Delay
- After a write, route subsequent reads to the master server
- If a read from the slave server fails, read from the master
- Use read/write separation only for non-critical business
- Allocation Mechanism
- Code encapsulation
- Middleware encapsulation
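One way to handle the delay between master and slave in application code is to pin a session's reads to the master for a short window after it writes. This is a minimal sketch; `ReadWriteRouter`, `pin_seconds`, and the session-based bookkeeping are assumptions for the example, and it expects at least one replica.

```python
import itertools
import time


class ReadWriteRouter:
    """Sends writes to the master and reads to replicas, except right after a write."""

    def __init__(self, master, replicas, pin_seconds=1.0, clock=time.monotonic):
        self.master = master
        self._replicas = itertools.cycle(replicas)  # simple round-robin
        self._pin_seconds = pin_seconds
        self._clock = clock
        self._last_write = {}  # session id -> time of its last write

    def for_write(self, session_id):
        self._last_write[session_id] = self._clock()
        return self.master

    def for_read(self, session_id):
        last = self._last_write.get(session_id)
        if last is not None and self._clock() - last < self._pin_seconds:
            return self.master  # the replicas may not have this write yet
        return next(self._replicas)
```

The same decision can live in middleware instead of application code, which is the trade-off the two encapsulation options in the list describe.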
-
Database and table separation
- Business database separation
- Introduces join, transaction, and cost issues
- Table separation
- Vertical separation
- Split out columns that are rarely used or take up a lot of space
- The complexity comes from the increased number of operations
- Horizontal separation
- Suitable for very big tables, e.g., more than 50,000,000 rows
- Complexity: data routing (range, hash, configuration), join, count, order by
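The hash and range routing options for horizontal separation could look like the sketch below. The function names and the shard boundaries in the usage note are illustrative assumptions.

```python
import bisect
import zlib


def hash_shard(row_key, shard_count):
    """Hash routing: spreads rows evenly but makes range scans hit every shard."""
    # crc32 is stable across processes, unlike Python's built-in hash() for str.
    return zlib.crc32(str(row_key).encode("utf-8")) % shard_count


def range_shard(row_id, upper_bounds):
    """Range routing: upper_bounds holds one sorted, exclusive upper bound per shard."""
    idx = bisect.bisect_right(upper_bounds, row_id)
    if idx >= len(upper_bounds):
        raise ValueError(f"id {row_id} is outside every configured range")
    return idx
```

For example, with `upper_bounds = [10_000_000, 20_000_000]`, ids below 10,000,000 go to shard 0 and the rest to shard 1. Range routing makes adding a shard easy but can concentrate load on the newest range; hash routing balances load but requires moving data when `shard_count` changes.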
-
Cache
-
-
Cache Penetration: the requested data doesn't exist in the DB, and the empty result is not cached either, so every lookup for that key ends up hitting the DB.
- Cache empty/null result
- Bloom filter
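The first mitigation, caching the empty result, can be sketched with a sentinel value that distinguishes "known to be missing" from "not cached yet". `NullCachingReader` and `load_from_db` are hypothetical names, and an in-process dict stands in for the real cache.

```python
_MISSING = object()  # sentinel meaning "the database has no such row"


class NullCachingReader:
    """Remembers misses so repeated lookups of absent keys skip the database."""

    def __init__(self, load_from_db):
        self._load = load_from_db  # callable: key -> value, or None if absent
        self._cache = {}

    def get(self, key):
        if key in self._cache:
            value = self._cache[key]
            return None if value is _MISSING else value
        value = self._load(key)
        self._cache[key] = _MISSING if value is None else value
        return value
```

In practice the negative entry should get a short expiration time, so that a row created later becomes visible and the cache is not filled with junk keys.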
-
Cache breakdown: a cached entry expires while many requests are looking it up, so those requests suddenly hit the DB directly and dramatically increase the load on the DB layer.
- Use lock
- Asynchronous update
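The lock-based approach can be sketched with double-checked locking: only the first thread that finds the entry missing reloads it, and the others reuse the result. `SingleFlightCache` is a hypothetical name, and the single global lock is a simplification (a per-key lock would avoid blocking unrelated keys).

```python
import threading


class SingleFlightCache:
    """Lets only one thread rebuild a missing entry while the others wait."""

    def __init__(self, load_from_db):
        self._load = load_from_db  # callable: key -> value
        self._cache = {}
        self._lock = threading.Lock()

    def get(self, key):
        if key in self._cache:
            return self._cache[key]
        with self._lock:
            # Re-check: another thread may have rebuilt the entry while we waited.
            if key not in self._cache:
                self._cache[key] = self._load(key)
            return self._cache[key]
```

Across multiple application instances the same idea needs a distributed lock, since an in-process lock only protects one process.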
-
Cache avalanche: a large amount of cached data expires at the same time, or the cache service goes down, so all lookups for that data suddenly hit the DB, putting high load on the DB layer and degrading performance.
- Use clusters to ensure that some cache server instance is in service at any point in time.
- Other approaches, such as a Hystrix circuit breaker and rate limiting, can be configured so that the underlying system can still serve traffic and avoid high load.
- Adjust the expiration time for different keys so that they do not all expire at the same time.
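Staggering expiration times is often done by adding random jitter to a base TTL. A minimal sketch, where `ttl_with_jitter` and the `spread` parameter are names chosen for this example:

```python
import random


def ttl_with_jitter(base_seconds, spread=0.1, rng=random):
    """Return base_seconds shifted by up to +/- spread (a fraction of the base)."""
    return base_seconds * (1 + rng.uniform(-spread, spread))
```

With `base_seconds=600` and `spread=0.1`, keys written in the same batch expire somewhere between roughly 540 and 660 seconds later instead of all at once.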
-
Cache hotspot
-
-
Policies
- Cache Aside
- Miss: The application first tries to fetch the data from the cache; if it is not there, it fetches the data from the database and puts it in the cache after success.
- Hit: The application fetches the data from the cache and returns it.
- Update: First save the data in the database, and then invalidate the cache after success.
- Read/Write Through
- Write Back
- E.g.,
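    import json

    def get_user(self, user_id):
        # Build the key once so the read and the write use the same key.
        key = "user.{0}".format(user_id)
        cached = cache.get(key)
        if cached is not None:
            return json.loads(cached)  # hit: decode the stored JSON
        # Miss: load from the database, then populate the cache.
        user = db.query("SELECT * FROM users WHERE user_id = {0}", user_id)
        if user is not None:
            cache.set(key, json.dumps(user))
        return user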
-
Indexes
- Hit rate
- Live time
- Be careful with crawlers
-
-
Load Balance
- DNS Load Balance: geographic-level
- Hardware Load Balance: cluster-level
- Software Load Balance: machine-level
-
Async Handling
- High availability Storage Architecture
- Double Machine
- How the master copies data to the slave
- How the slave detects the health status of the master
- How to fail over when the master goes down
- Cluster and Partition
- Need to consider the data's balance, fault tolerance, and scalability
- Data partition rules: by location, e.g., state or country
- Data duplication: centralized, mutual backup, independent
- Multi-site active-active (live in multiple locations)
- Key business first
- Guarantee BASE (basically available, soft state, eventual consistency) for key data
- Multiple sync methods: message queue (MQ), re-read, sync by storage system, read back from the source, regenerate data
- Guarantee service for most users
- Practice: multi-site active-active
- Classify business: highly visited business, core business, profitable business
- Classify data: amount, uniqueness, lossability, recoverability
- Data sync: storage system sync, MQ, regenerate
- Handle exceptions: multi-channel sync, log records, compensate users
- Handle interface-level failure
- System degradation
- Circuit breaking
- Rate limiting
- Queue
- Split System
- Process-oriented split: split the entire business process into several stages, with each stage as a part.
- Service-oriented split: split the services provided by the system, each as a part.
- Function-oriented split: split the functions provided by the system, and each function as a part.
- Architecture:
- Process-oriented split: layered architecture.
- Service-oriented split: SOA, microservices.
- Function-oriented split: microkernel architecture.
Resilience in microservices is the ability to recover from failures and return to a fully functional state. 🔗 Ref It’s not about avoiding failures but responding to failures in a way that avoids downtime and data loss.
- Source of fault
- Fault Split Policies
- Split by service
- Split by user
- Key points design
- Well-defined granularity
- Making trade-offs between system complexity, cost, performance, and resource usage.
- Requires design patterns such as high availability, retry, asynchrony, message middleware, flow control, and circuit breaking
- Complexity of operation and maintenance
- Key points of design
- Decouple the dependencies between services to isolate them better
- Better throughput, and the performance of each service is relatively independent without interference.
- Using a broker or queue can also smooth bursty throughput into uniform throughput. This is so-called "peak clipping", which protects the back-end system well.
- The services are relatively independent, and they can be independent from other services in terms of deployment, expansion, and operation and maintenance.
- Key points of the design
- The retry time and number of retries
- Consider the idempotence of the callees
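The retry time and retry count from the list above could be combined as in this minimal sketch with exponential backoff. `call_with_retry`, `max_attempts`, `base_delay`, and the injectable `sleep` are assumptions made for the example; it is only safe to use around calls whose callee is idempotent.

```python
import time


def call_with_retry(func, max_attempts=3, base_delay=0.1, sleep=time.sleep):
    """Call func(); after a failure wait base_delay * 2**attempt and try again."""
    for attempt in range(max_attempts):
        try:
            return func()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            sleep(base_delay * 2 ** attempt)
```

A production version would usually retry only on errors known to be transient (timeouts, connection resets) and add random jitter to the delay so that many clients do not retry in lockstep.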
The initial state of a circuit breaker (CB) is Closed, which means that information is flowing from one service to another.
After a specific event occurs (usually a certain number of attempts that result in errors), the circuit goes to the Open state, which means that the information flow is interrupted. It stays there for a certain amount of time or until another criterion is met. During this period, every call to the function returns a circuit-break error.
After this, the circuit goes to the Half Open state: when the function is called again, it tries to contact the other service one more time. If it succeeds, the CB goes back to the Closed state; otherwise it goes back to Open.
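The three states described above can be sketched as a small state machine. This is a minimal, single-threaded sketch; `CircuitBreaker`, `CircuitOpenError`, `failure_threshold`, and `reset_timeout` are hypothetical names, and counting consecutive failures is one possible choice of trip condition.

```python
import time


class CircuitOpenError(Exception):
    """Raised instead of calling the remote service while the circuit is open."""


class CircuitBreaker:
    def __init__(self, failure_threshold=3, reset_timeout=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.clock = clock
        self.state = "closed"
        self.failures = 0  # consecutive failures since the last success
        self.opened_at = 0.0

    def call(self, func):
        if self.state == "open":
            if self.clock() - self.opened_at < self.reset_timeout:
                raise CircuitOpenError("circuit is open")
            self.state = "half_open"  # timeout elapsed: allow one trial call
        try:
            result = func()
        except Exception:
            self.failures += 1
            if self.state == "half_open" or self.failures >= self.failure_threshold:
                self.state = "open"
                self.opened_at = self.clock()
            raise
        # Success closes the circuit and resets the failure count.
        self.state = "closed"
        self.failures = 0
        return result
```

Usage would wrap each remote call, for example `breaker.call(lambda: client.get_user(42))`, where `client` is whatever object performs the request.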
- Key points of the design
- The type of error
- Log monitoring
- Service health checking: the circuit breaker can periodically ping the remote services
- Manual reset
- Concurrency issues
- Resource partition
- Policy
- Denial of Service
- Service Degradation
- Privilege Request
- Delay Processing
- Elastic Scaling
- Degradation design is about sacrifice:
- Reduce consistency. From strong consistency to eventual consistency.
- We should clearly realize that most systems in the world do not need strong consistency.
- Stop secondary functions. Stop accessing unimportant functions, thus releasing more resources.
- Simplify functions, for example by simplifying business processes or returning only partial data instead of the full data.
- An API will have two versions: one returns the full data, and the other returns only partial or the minimum usable data