Data Centers / Zones / Regions
Data Centers -> Zones -> Regions (geographical locations)
Low availability -----------------------------------> High availability
High latency to other regions -----------------> Low latency
Limited customer groups ------------------------> Many customer groups
Sensitive data / government regulations ------> Can store region-specific sensitive data
Supports multiple government regulations and audits
Google Compute Engine Features
1. Virtual Machines (cost optimized / memory optimized / compute optimized / GPU)
2. Persistent Disks (attached disks or network disks)
3. Load balancing with Auto Scaling
4. Network configurations
VM Families
Cost Optimized Family - E2, N2, N2D, N1
Good for testing, small scale application deployments
Memory Optimized Family - M1, M2
Good for in-memory databases and in-memory analytics
Compute Optimized Family - C2
Good for Gaming
GPU based Family
Good for machine learning
Machine Types
e2-standard-2, e2-standard-4, e2-standard-8, e2-standard-16, e2-standard-32
Memory, disk and network capabilities increase along with the number of vCPUs.
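The naming pattern above (series, type, vCPU count) can be sketched as a small parser. This is an illustrative sketch, not a GCP API: it assumes three-part names such as e2-standard-4, and for memory it assumes the 4 GB-per-vCPU ratio of the e2-standard types. Highmem, highcpu and shared-core types (e.g. e2-micro) use other ratios and are rejected here.

```python
from typing import NamedTuple

# Assumption: e2-standard machine types provide 4 GB of memory per vCPU.
E2_STANDARD_GB_PER_VCPU = 4


class MachineType(NamedTuple):
    series: str  # e.g. "e2"
    kind: str    # e.g. "standard"
    vcpus: int   # e.g. 4


def parse_machine_type(name: str) -> MachineType:
    """Split a three-part predefined machine type name like 'e2-standard-4'."""
    parts = name.lower().split("-")
    if len(parts) != 3 or not parts[2].isdigit():
        raise ValueError(f"not a <series>-<type>-<vCPUs> name: {name!r}")
    return MachineType(parts[0], parts[1], int(parts[2]))


def e2_standard_memory_gb(name: str) -> int:
    """Memory grows linearly with the vCPU count for e2-standard types."""
    mt = parse_machine_type(name)
    if (mt.series, mt.kind) != ("e2", "standard"):
        raise ValueError(f"only e2-standard types are modelled: {name!r}")
    return mt.vcpus * E2_STANDARD_GB_PER_VCPU


for n in (2, 4, 8, 16, 32):
    print(f"e2-standard-{n}: {e2_standard_memory_gb(f'e2-standard-{n}')} GB")
```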
Virtual Machine IPs
Internal IP - VM is accessible from the internal network
External IP - VM is accessible over the internet via this IP. Ephemeral (non-static) external IPs change every time the VM is stopped and restarted
- Remove external IPs if the VM does not need to be accessed over the internet
Static IP - A constant external IP assigned to a VM
Static IPs in VMs
VPC Network -> External IP addresses -> Reserve static address -> Select the scope (regional/global) -> Attach to VM -> The ephemeral IP is released automatically.
A reserved static IP that is not attached to a VM is charged at a higher hourly rate than one that is in use by a VM
A static IP can be moved from one VM to another VM in the same project
Static IP addresses must be released manually when no longer needed.
Simplifying VM startups
1. Bootstrapping with startup scripts
Create VM -> Automation -> Startup script -> Paste the script into the startup script field.
2. Instance templates
Preconfigured VM templates; bootstrap time can still be high because OS patches and software are installed at startup
3. Custom images
Can be created from a VM instance, snapshot, persistent disk, or a file in Cloud Storage
Can be shared across projects and stored in multiple regions
Old images can be deprecated
Prerequisites to Create a VM Instance
1. Project
2. Billing Account
3. Enable Compute Engine APIs
Sole-Tenant Nodes
Dedicated Compute Engine servers that host only your VMs. Very expensive
VM Manager
Can manage thousands of VMs, including OS patching, software updates, etc.
Managed Instance Groups
Group of identical VM instances (created from an instance template) managed as a single entity
Can be zonal or regional
New versions of apps can be deployed with managed instance groups
Update types - Rolling updates, Canary updates, Rolling restart and Rolling replace, controlled by Max surge and Max unavailable
Managed instance groups can be used for REST API services, autoscaling and load balancing
Unmanaged Instance Groups
VMs can have different configurations. Does not support autoscaling, autohealing or rolling updates (it can still be used as a load balancer backend)
Features: health checks, autoscaling, single anycast IP
Communication protocols
Application Layer protocols - HTTP , HTTPS, SMTP
Transport Layer (reliable or fast delivery on top of the network layer) protocols - TCP, TLS, UDP
Network Layer (routes packets between hosts) protocols - IP
Cloud Load Balancing
HTTP/S Load balancer - Layer 7 load balancing, good for REST APIs. Supports multi-regional (global) traffic
TCP Load balancer - Layer 4 load balancing, good for gaming systems. Supports multi-regional traffic, can be used as an SSL proxy, high performance
UDP Load balancer - Layer 4 load balancing, good for gaming systems. Supports a single region only, high performance but UDP does not guarantee delivery
Create Load Balancer
Network services -> Create Load Balancer -> HTTP/S / TCP / UDP -> Create instance groups for each backend service -> Routing rules -> Client integration with LB port and IP -> Health check configuration -> Cloud CDN for static content -> Cloud Armor for security -> Select protocol.
SSL Termination ( Layer 7 to Layer 4 )
Client to Load balancer ( TLS / HTTPS )
Load balancer to internal services ( TCP / HTTP )
GCP Load balancer concepts
Backend (Managed instance group)
Backend Service ( Group of backends )
URL mapping ( Routing requests )
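URL mapping can be pictured as a lookup from request path to backend service. The sketch below is a simplified model that assumes longest-prefix matching; real GCP URL maps use host rules and path matchers with their own matching rules (e.g. /api/* wildcards). The service names are hypothetical.

```python
from typing import Dict


def route(path: str, path_rules: Dict[str, str], default_service: str) -> str:
    """Return the backend service whose path prefix is the longest match."""
    best_prefix = ""
    best_service = default_service  # used when no rule matches
    for prefix, service in path_rules.items():
        if path.startswith(prefix) and len(prefix) > len(best_prefix):
            best_prefix, best_service = prefix, service
    return best_service


# Hypothetical backend services, each fronting its own managed instance group.
PATH_RULES = {
    "/api/": "api-backend-service",
    "/api/admin/": "admin-backend-service",
    "/static/": "static-backend-service",
}

print(route("/api/admin/users", PATH_RULES, "web-backend-service"))
```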
Choosing Correct Load Balancer
For internal traffic - Internal Load Balancer (internal HTTP/S, internal TCP/UDP load balancers). Supports regional traffic
For external traffic - External Load Balancer (external HTTP/S; SSL proxy for TCP traffic with SSL offload; TCP proxy if global load balancing is required). Supports global traffic
If backend services need to see the exact client IP, use a pass-through load balancer (e.g. a network TCP/UDP load balancer) rather than a proxy
Premium network tier enables global routing, sending traffic to the nearest region
Architectural Concerns
Resiliency: Providing the right functionality even when one or more parts of the system fail.
Use managed instance groups behind a global load balancer
Use cloud monitoring
Use cloud logging
Use health checks
Use hardened images
Availability: Applications available when users need them
- Create instance groups in multiple regions
- Implement Global HTTPS load balancer
- Implement health checks for instance groups and load balancers
- Enable live migration for VM instances using availability policy
Live migration supports VMs with local SSDs; it is not supported for GPU and preemptible instances.
Scalability: Handling growth in users, traffic and data size by adding resources proportionally
- Vertical Scaling - Increase CPU and memory. It is a costly solution
- Horizontal Scaling - Deploy more instances behind a load balancer. The preferred solution.
- VM level vertical scaling - Use a bigger machine family, more CPU cores, more memory
- VM level horizontal scaling - Distribute VMs in a single zone, multiple zones in a single region, or multiple zones across multiple regions
Performance
Use correct machine family type
Use GPU for machine learning or math intensive workload
Use TPUs for matrix operations
Use hardened images to reduce startup time
Security
Enforce firewall rules to restrict traffic
Use internal IPs as much as possible
Use sole-tenant nodes
Use hardened images
Cost
Automatic discounts - enabled by default
Discounts increase with usage (sustained use discounts) - enabled by default
Committed use discounts
GCP Billing
Billed for every second, after a 1-minute minimum
Once a VM is stopped, you still pay for its storage
Enable budget alerts
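The per-second billing rule can be worked through with a small calculation. This is a sketch under assumptions: the hourly rate is a made-up parameter, partial seconds are rounded up, and discounts, storage and network charges are ignored.

```python
import math

MINIMUM_BILLED_SECONDS = 60  # the 1-minute minimum


def billed_seconds(runtime_seconds: float) -> int:
    """Seconds charged for one VM run: per second, with a 1-minute minimum."""
    if runtime_seconds <= 0:
        return 0
    # Assumption: a partial second is rounded up to a whole second.
    return max(MINIMUM_BILLED_SECONDS, math.ceil(runtime_seconds))


def compute_cost(runtime_seconds: float, hourly_rate: float) -> float:
    """Convert the billed seconds into cost at a (hypothetical) hourly rate."""
    return billed_seconds(runtime_seconds) * hourly_rate / 3600


# A 10-second run is billed as 60 seconds; a 90-second run as 90 seconds.
print(compute_cost(10, hourly_rate=0.36), compute_cost(90, hourly_rate=0.36))
```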
Cost Efficiency
Enable VM level auto scaling
Understand sustained use discounts
Committed use discounts
Use preemptible VMs when possible
GPU ( Graphics Processing Units )
For math / video intensive processing
Must use images with GPU libraries (drivers)
Cannot be used with memory-optimized or shared-core machine types
Preemptible VMs
Short-lived, very cheap VMs
Run for at most 24 hours and can be terminated at any time before that, with a 30-second warning
Good for batch jobs
gcloud
Big query - bq
Cloud Big table - cbt
Cloud Storage - gsutil
Kubernetes - kubectl
Cloud Terminals
Cloud shell from web console : backed by 16 GB disk with small VM
Cloud SDK terminal
gcloud commands
gcloud --version
gcloud init
gcloud config list
gcloud command structure
gcloud <group> <subgroup> <action>
Ex: gcloud compute instances list
gcloud compute zones list
gcloud compute regions list
gcloud compute machine-types list
gcloud compute machine-types list --filter zone:asia-southeast2-b
gcloud compute instances create test-vm-1
gcloud compute instances describe test-vm-1
gcloud compute instances delete test-vm-1
Google Managed Services
IaaS - Infrastructure As a Service
Cloud provider is responsible up to the virtualization layer
User is responsible for the OS, OS patches, runtimes, autoscaling and availability
Ex : Compute Engine
PaaS - Platform As a Service ( CaaS, Serverless )
Cloud provider is responsible for autoscaling, load balancing, OS patches and availability
User has to configure the application.
Ex: App Engine
FaaS - Function As a Service ( Serverless )
Functions instead of Apps
Ex: Cloud Functions
CaaS - Container As a Service
Containers instead of Apps
Ex: Google Kubernetes Engine (GKE), Cloud Run
Serverless
It doesn't mean no servers
No visibility into the infrastructure, e.g. which OS is running
It provides autoscaling
It provides service discovery
It provides load balancing
It provides zero-downtime deployment facilities
The user only needs to focus on the code (functions).
Ex: AWS Lambda (GCP equivalent: Cloud Functions)
Containerizing
Create a Docker image of your application that bundles all its dependencies and can run in any environment.
Compared to VMs, containers don't include a full guest OS (they share the host kernel), which makes them lighter. That is an advantage
Docker containers are isolated from each other
Container Orchestration
Kubernetes is a Container Orchestration Framework.
It provides autoscaling
It provides service discovery
It provides load balancing
It provides zero downtime deployment facilities
App Engine
Simplest way to deploy applications in GCP
Provides end to end application management
Support many languages.
Provides load balancing, auto scaling, OS patch updates, Health check monitoring, Application versioning
Provides two environments - language-specific sandbox (Standard) and Docker containers (Flexible)
App Engine component hierarchy - Project (Application) -> Multiple Services -> Multiple Versions (multiple versions can coexist) -> Multiple Instances
Compute Engine vs App Engine
Compute Engine - IaaS, user has to choose images, network configuration, availability, CPU, memory, etc.
App Engine - PaaS , Serverless, Cannot update CPU, Memory and etc.
App Engine Feature Configurations -
Standard Configuration
Flexible Configuration
App Engine Scaling
- Automatic (based on CPU utilization, throughput, concurrent requests. Good for continuous workloads)
- Basic (instances are created when requests are received. Good for ad hoc workloads)
- Manual (fixed number of instances)
App Engine Commands
cd default-service
gcloud app deploy
gcloud app services list
gcloud app versions list
gcloud app instances list
gcloud app deploy --version=v2
gcloud app versions list
gcloud app browse
gcloud app browse --version 20210215t072907
gcloud app deploy --version=v3 --no-promote
gcloud app browse --version v3
gcloud app services set-traffic --splits=v3=.5,v2=.5
watch curl https://melodic-furnace-304906.uc.r.appspot.com/
gcloud app services set-traffic --splits=v3=.5,v2=.5 --split-by=random
cd ../my-first-service/
gcloud app deploy
gcloud app browse --service=my-first-service
gcloud app services list
gcloud app regions list
Application Deployment with App Engine
Create GCP project
Enable the App Engine Admin API
Create the App Engine application: select region, language, etc.
Go to the project editor
Create the project hierarchy below
my-first-project
|--> app.yaml
|--> Test.java
app.yaml
runtime: java11
entrypoint: java -jar test.jar
Test.java
Your application code
How to run
gcloud config set project stately-equinox-342317
gcloud app deploy
Browse the target URL to view the result.
App Engine versioning and traffic allocation
gcloud app services list
gcloud app versions list
gcloud app instances list
gcloud app deploy --version=v2
gcloud app versions list
App Engine traffic distribution
gcloud app services set-traffic --splits=v3=0.5,v2=0.5 ( IP-based splitting, the default )
gcloud app services set-traffic --splits=v3=0.5,v2=0.5 --split-by=random
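The difference between IP-based and random splitting can be sketched as follows. This does not reproduce App Engine's internal algorithm; it is a model that assumes a hash of the client IP picks the version (so one client keeps seeing the same version), while random splitting draws a new number for every request. The version names match the commands above; the IP addresses are documentation examples.

```python
import hashlib
import random
from typing import Dict


def _pick(weights: Dict[str, float], point: float) -> str:
    """Map a number in [0, 1) onto a version using cumulative weights."""
    cumulative = 0.0
    versions = sorted(weights)  # fixed order so the mapping is deterministic
    for version in versions:
        cumulative += weights[version]
        if point < cumulative:
            return version
    return versions[-1]  # guards against floating-point rounding at the top end


def split_by_ip(client_ip: str, weights: Dict[str, float]) -> str:
    """Sticky: the same client IP always hashes to the same point."""
    digest = hashlib.sha256(client_ip.encode()).digest()
    point = int.from_bytes(digest[:8], "big") / 2**64
    return _pick(weights, point)


def split_by_random(weights: Dict[str, float], rng: random.Random) -> str:
    """Not sticky: every request draws a fresh random number."""
    return _pick(weights, rng.random())


weights = {"v2": 0.5, "v3": 0.5}
print(split_by_ip("203.0.113.7", weights), split_by_random(weights, random.Random(1)))
```

IP-based splitting keeps a user on one version, which matters when versions differ in behaviour; random splitting spreads load more evenly but a user may bounce between versions.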
Deploy App Engine App as a service
app.yaml
| runtime: java11
| service: my-first-service
gcloud app deploy
Accessible via the generated service URL
Delete App Engine Apps
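Stop traffic to a version before deleting it (a version that is serving traffic cannot be deleted). To deploy a version without sending it traffic: gcloud app deploy --version=v3 --no-promote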
Google Kubernetes Engine (GKE)
------------------
Container orchestration platform that provides autoscaling, service discovery, health checks and self-healing
Cluster autoscaling
A cluster can have multiple VM nodes
Pod autoscaling
A pod is an instance of a microservice
Integrates with Cloud Logging and Cloud Monitoring
Create the cluster in Autopilot mode to optimize cost
Set up a separate node pool if workloads need a different machine type (e.g. a different CPU level)
A Kubernetes Ingress defines routing rules to route traffic
A Kubernetes cluster has a master (control plane) and worker nodes
A pod can have multiple containers
A pod has an ephemeral IP
For a low-cost solution - use preemptible VMs, use E2 rather than N1 machine types, use committed use discounts
For an efficient solution - use the horizontal pod autoscaler and the cluster autoscaler
Pod status Pending - usually a resource problem (not enough resources to schedule the pod)
Pod status Waiting - e.g. failure to pull the image
GKE cluster types
-----------------
Zonal cluster (single-zone and multi-zonal - the master runs in a single zone in both)
Regional cluster (master and worker nodes replicated across multiple zones in a region)
Private cluster - used in VPC setups; nodes have internal IPs only
Google Kubernetes deployments can be done with YAML or with commands
---------------------------------------------------------------
YAML files can be exported after creating resources manually, then edited and applied later
Change deployment.yaml and service.yaml accordingly, then apply them - kubectl apply -f deployment.yaml
Kubernetes Services are responsible for exposing deployments to external users
gcloud config set project my-kubernetes-project-304910
gcloud container clusters get-credentials my-cluster --zone us-central1-c --project my-kubernetes-project-304910
kubectl create deployment hello-world-rest-api --image=in28min/hello-world-rest-api:0.0.1.RELEASE
kubectl get deployment
kubectl expose deployment hello-world-rest-api --type=LoadBalancer --port=8080
kubectl get services
kubectl get services --watch
curl 35.184.204.214:8080/hello-world
kubectl scale deployment hello-world-rest-api --replicas=3
gcloud container clusters resize my-cluster --node-pool default-pool --num-nodes=2 --zone=us-central1-c
kubectl autoscale deployment hello-world-rest-api --max=4 --cpu-percent=70
kubectl get hpa ( horizontal pod autoscaling details )
kubectl create configmap hello-world-config --from-literal=RDS_DB_NAME=todos ( This is for microservice configurations )
kubectl get configmap
kubectl describe configmap hello-world-config
kubectl create secret generic hello-world-secrets-1 --from-literal=RDS_PASSWORD=dummytodos
kubectl get secret
kubectl describe secret hello-world-secrets-1
kubectl apply -f deployment.yaml
gcloud container node-pools list --zone=us-central1-c --cluster=my-cluster
kubectl get pods -o wide
kubectl set image deployment hello-world-rest-api hello-world-rest-api=in28min/hello-world-rest-api:0.0.2.RELEASE
kubectl get services
kubectl get replicasets ( show deployment versions and ensure number of pods to be run )
kubectl get pods
kubectl delete pod hello-world-rest-api-58dc9d7fcc-8pv7r
kubectl scale deployment hello-world-rest-api --replicas=1
kubectl get replicasets
gcloud projects list
kubectl delete service hello-world-rest-api
kubectl delete deployment hello-world-rest-api
gcloud container clusters delete my-cluster --zone us-central1-c
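The kubectl autoscale command above (--max=4 --cpu-percent=70) asks the Horizontal Pod Autoscaler to keep average CPU near 70%. The core rule documented for the HPA is desiredReplicas = ceil(currentReplicas * currentMetric / targetMetric); the sketch below applies it with the default 10% tolerance and clamps to min/max. It is a simplification: the real controller also applies stabilization windows and treats pods that are not ready specially.

```python
import math


def desired_replicas(current_replicas: int, current_cpu_percent: float,
                     target_cpu_percent: float, min_replicas: int = 1,
                     max_replicas: int = 4, tolerance: float = 0.1) -> int:
    """Simplified HPA calculation for a CPU utilization target."""
    ratio = current_cpu_percent / target_cpu_percent
    if abs(ratio - 1.0) <= tolerance:
        desired = current_replicas  # close enough to target: do not scale
    else:
        desired = math.ceil(current_replicas * ratio)
    return max(min_replicas, min(max_replicas, desired))


# 2 pods at 140% CPU against a 70% target would need 4 pods.
print(desired_replicas(2, 140, 70))
```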
Ingress
-------
Provides load balancing and routing rules
SSL termination
Only one load balancer is required for many microservices with Ingress
Container Registry
------------------
Docker Hub is a public container registry
Google has its own container registry to store docker images
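Dockerfile main content
------------------------
FROM - base image
WORKDIR - working directory
COPY - copy files from the local machine into the image
RUN - run commands while building the image (e.g. install dependencies)
EXPOSE - document the port the container listens on
CMD - default command when the container starts, e.g. CMD node index.js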
Cloud Functions
---------------
1st gen Cloud Functions (problems - cold starts, small instance sizes, low timeouts, one request per instance)
2nd gen Cloud Functions (1 hour timeout, larger instance sizes, traffic splitting between revisions, supports 90+ events, one instance can handle many concurrent requests)
Runs code when a cloud event triggers it.
Pay for execution time and the CPU / memory used
Max timeout - 1 hour (2nd gen)
Supports multiple languages
It uses Cloud Build - Google's CI/CD tool
Cloud Run
---------
Google serverless platform
Easiest way to deploy containerized applications
Pay per use for cpu / memory
gcloud run deploy service-name --image=IMAGE_URL --revision-suffix=REVISION_SUFFIX
gcloud run revisions list
gcloud run services update-traffic my-service --to-revisions=v2=10,v1=10
Anthos
--------
Run Kubernetes clusters anywhere - AWS, Azure, etc.
Google KMS
----------
Supports encryption and decryption mechanisms
Create a key ring - it holds multiple keys
Encryption keys (customer-managed) can be selected when creating VM instances
Cloud Storage
###############
Block Storage
-------------
Block storage is fast and efficient
(Persistent disks - good for durability, performance scales with size; Local SSD - good for high performance, keeps temporary data only)
A block storage device can be attached read-only to multiple servers
Can be accessed as DAS or SAN - Direct Attached Storage or Storage Area Network
Zonal Replication - only in one zone
Regional Replication - multiple zones
File Storage
------------
Large files that need to be shared, e.g. video editing files
Object Storage
--------------
Create buckets first
Very inexpensive, can store very large objects
Max object size is 5 TB
Can store unlimited objects
Choose storage classes
Support Object versioning
Ramp up request rate gradually
Cloud Storage command - gsutil
Use Storage Transfer Service for more than 1 TB
Use the physical Transfer Appliance if an online transfer would take more than several days
Bucket Lock (retention policy) to prevent changes
Storage classes - standard, nearline, coldline, archive
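Choosing a storage class is mostly about how often data is read. A rule-of-thumb sketch, assuming the commonly cited guidance (Nearline for data read less than once a month, Coldline less than once a quarter, Archive less than once a year, matching their 30/90/365-day minimum storage durations); it ignores retrieval fees and location choices.

```python
# (minimum days between reads, class), checked from the coldest class down.
STORAGE_CLASS_THRESHOLDS = [
    (365, "ARCHIVE"),
    (90, "COLDLINE"),
    (30, "NEARLINE"),
]


def choose_storage_class(days_between_reads: float) -> str:
    """Pick the coldest class whose access pattern still fits the data."""
    for min_days, storage_class in STORAGE_CLASS_THRESHOLDS:
        if days_between_reads >= min_days:
            return storage_class
    return "STANDARD"  # frequently accessed ("hot") data


print(choose_storage_class(7), choose_storage_class(180))
```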
Cloud IAM
##########
Authentication - Is this the right user?
Authorization - Do they have access?
Identities - a GCP user / group of users / application (service account) / unauthenticated users
Roles - Set of permissions on specific resources
Policy - Assigns (binds) roles to identities
Basic roles (Viewer / Editor / Owner) are not recommended for production
Use predefined roles or custom roles defined by ourselves
Command line - gcloud
Policy Troubleshooter to troubleshoot access issues
Use service accounts to give applications access.
Service accounts have no passwords. They use public / private RSA key pairs
Cloud -> Cloud permissions (use service accounts, keys managed by Google Cloud)
On-prem -> Cloud permissions (use service accounts with user-managed keys)
Credential types -> OAuth 2.0 access tokens (for elevated permissions), OpenID Connect ID tokens (service-to-service authentication)
Access Control Lists (ACLs) -> fine-grained per-object access instead of the uniform access given by service accounts
Bucket content can be exposed as a simple static website. The bucket name should match the DNS name
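Roles, policies and identities fit together as below: a policy binds roles to members, and a role is a set of permissions. The policy dictionary follows the shape of an IAM policy (bindings with a role and members), but the role-to-permission table is only a small, illustrative subset of the real predefined roles, and the member names are hypothetical.

```python
from typing import Dict, Set

# Illustrative subset only: real predefined roles contain more permissions.
ROLE_PERMISSIONS: Dict[str, Set[str]] = {
    "roles/storage.objectViewer": {"storage.objects.get", "storage.objects.list"},
    "roles/storage.objectAdmin": {
        "storage.objects.get", "storage.objects.list",
        "storage.objects.create", "storage.objects.delete",
    },
}

# Hypothetical policy attached to a bucket.
POLICY = {
    "bindings": [
        {"role": "roles/storage.objectViewer",
         "members": ["group:developers@example.com"]},
        {"role": "roles/storage.objectAdmin",
         "members": ["serviceAccount:uploader@my-project.iam.gserviceaccount.com"]},
    ]
}


def has_permission(policy: dict, member: str, permission: str) -> bool:
    """True if any role bound to the member includes the permission.

    Simplification: group membership is not expanded, so the member string
    must appear in a binding exactly.
    """
    for binding in policy["bindings"]:
        if member in binding["members"]:
            if permission in ROLE_PERMISSIONS.get(binding["role"], set()):
                return True
    return False
```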
Cloud DB
##########
Keep 2 or more data centers
Synchronous replication (master DB, standby DB; take snapshots from the standby DB)
11 nines (99.999999999%) data durability
4 nines (99.99%) data availability
Increase availability (distribute data across zones, regions)
Recovery Point Objective (RPO) - maximum acceptable period of data loss
Recovery Time Objective (RTO) - maximum acceptable downtime
Reducing RTO and RPO is key
Hot standby (master to standby failover to minimize RTO and RPO)
Warm standby (RPO 1 min, RTO 15 min, with minimum infrastructure that can be scaled up)
Cold standby (RPO 1 min, RTO a few hours; regular data snapshots with transaction logs stored in Cloud Storage)
Enhance DB performance (more memory and CPU, costly distributed databases, read replicas for read operations)
Strong consistency - replicate to all nodes synchronously; impacts performance; good for banking transactions
Eventual consistency - replicate with a small lag; good for social media
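The availability and durability figures above translate into concrete numbers. A small worked sketch, assuming a 365-day year and treating the durability figure as an annual, independent probability of losing each object (a simplification).

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600


def max_downtime_minutes_per_year(availability_percent: float) -> float:
    """Downtime allowed per year by an availability target such as 99.99."""
    return (1 - availability_percent / 100) * MINUTES_PER_YEAR


def expected_objects_lost_per_year(objects: int, durability_nines: int) -> float:
    """Expected annual object loss for a durability such as 11 nines."""
    annual_loss_probability = 10 ** -durability_nines
    return objects * annual_loss_probability


for target in (99.9, 99.99, 99.999):
    print(f"{target}% -> {max_downtime_minutes_per_year(target):.2f} min/year")
# 10 million objects at 11 nines: about 0.0001 expected losses per year,
# i.e. on average one lost object every 10,000 years.
print(expected_objects_lost_per_year(10_000_000, 11))
```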
Relational DB for OLTP ( Based on row storage )
-----------------------------------------------
Cloud SQL - Good for OLTP since it supports transactions (atomicity, consistency). Good for a few TB
- Supports MySQL, PostgreSQL databases
- HA can be configured with synchronous replication
- Use the Cloud SQL proxy
Cloud Spanner - To store petabyte-scale relational databases
Relational DB for OLAP (based on columnar storage, supports high compression)
---------------------------------------------------------------------------------
BigQuery
Non-Relational DB
-----------------
Cloud Firestore - Serverless document DB with strong consistency, scalability and ACID transactions; for a few terabytes
Cloud Bigtable -
Good for more than 10 TB up to petabytes, NoSQL DB, good for apps writing more than 30 GB of data per hour, HBase API compatible, good for IoT streaming. Automatically shards data and distributes reads and writes among cluster nodes, supports HDD and SSD, create multiple clusters with replication for high availability.
Cloud Datastore can have multiple indexes, but Cloud Bigtable supports only a single index (the row key) per table.
In memory DB
----------------------------
Memorystore ( Caching, sessions management )
Cloud VPC
-----------
Create all resources inside a VPC.
A VPC network is global and can contain resources from all regions.
VPC Subnet
-----------
Controls access to internal services. Public subnets and private subnets
Shared VPC
----------
Lets multiple projects work together on a shared VPC network
VPC Peering
-----------
Connect a VPC to a different network, e.g. one belonging to another organization
All communication happens internally (over internal IPs)
CIDR Blocks
------------
Subnet IP ranges, A CIDR block is a collection of IP addresses that share the same network prefix and number of bits.
192.168.1.0/24 (Available IPs: 2^(32-24) = 256; GCP reserves 4 addresses in every subnet primary range, leaving 252 usable)
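The CIDR arithmetic above can be checked with Python's standard ipaddress module. The only GCP-specific assumption is the 4 reserved addresses per subnet primary range (network, default gateway, second-to-last and broadcast addresses).

```python
import ipaddress

GCP_RESERVED_ADDRESSES = 4  # network, gateway, second-to-last, broadcast


def subnet_summary(cidr: str) -> dict:
    """Size of a subnet range and how many addresses VMs can use in GCP."""
    network = ipaddress.ip_network(cidr)
    total = network.num_addresses  # 2 ** (32 - prefix length) for IPv4
    return {
        "total": total,
        "usable_in_gcp": total - GCP_RESERVED_ADDRESSES,
        "first": str(network.network_address),
        "last": str(network.broadcast_address),
    }


print(subnet_summary("192.168.1.0/24"))
print(ipaddress.ip_address("192.168.1.77") in ipaddress.ip_network("192.168.1.0/24"))
```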
Firewalls
---------
Firewall rules are stateful: if incoming traffic is allowed, the response traffic is automatically allowed
Default network: SSH (port 22) allowed
Default network: all internal traffic allowed
Default network: RDP (port 3389) allowed
Default network: ICMP allowed
Implied rule: deny all ingress
Implied rule: allow all egress
Create VM instances with network tags and target firewall rules at those tags
Allow traffic from the load balancer only
Remove 0.0.0.0/0 as a source range
Allow health checks from the load balancer
Set priorities on rules if required
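How priorities and the implied rules interact can be sketched as below. This is a simplified model that assumes one source range and one port per rule: among the matching ingress rules the lowest priority number wins, a deny beats an allow at the same priority, and if nothing matches the implied deny-all ingress rule applies. The rule names are hypothetical; 35.191.0.0/16 is one of Google's health-check source ranges.

```python
import ipaddress
from dataclasses import dataclass
from typing import List, Optional, Set


@dataclass
class IngressRule:
    name: str
    priority: int                     # 0 is highest, 65535 lowest
    action: str                       # "allow" or "deny"
    source_range: str                 # CIDR, e.g. "35.191.0.0/16"
    port: Optional[int] = None        # None = all ports
    target_tag: Optional[str] = None  # None = all instances in the network


def evaluate_ingress(rules: List[IngressRule], source_ip: str, port: int,
                     vm_tags: Set[str]) -> str:
    src = ipaddress.ip_address(source_ip)
    matching = [
        r for r in rules
        if src in ipaddress.ip_network(r.source_range)
        and (r.port is None or r.port == port)
        and (r.target_tag is None or r.target_tag in vm_tags)
    ]
    if not matching:
        return "deny"  # implied rule: deny all ingress
    # Lowest priority number first; at equal priority, deny sorts before allow.
    winner = min(matching, key=lambda r: (r.priority, r.action != "deny"))
    return winner.action


RULES = [
    IngressRule("allow-health-checks", 1000, "allow", "35.191.0.0/16", 80, "web"),
    IngressRule("deny-http-from-anywhere", 2000, "deny", "0.0.0.0/0", 80),
]
```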
Cloud Operations
----------------
Cloud Monitoring
----------------
Cloud Monitoring with workspaces
Can monitor AWS accounts as well
Can create alerting policies
Install cloud monitoring agents
Cloud Logging
--------------
Real-time logging; analyze and store massive amounts of log data
Log Explorer, Log Dashboard, Log Metrics, Log Router
Audit Logs (Admin Activity logs, Data Access logs, System Event logs, Policy Denied logs)
Cloud Debugger is deprecated
Cloud Profiler
--------------
Identify performance bottlenecks
Profiling agent
Profile interface
Profile the source code
Cloud Trace
-----------
Trace requests
Error Reporting
---------------
Errors can be reported by writing them to Cloud Logging
Errors can be sent by calling the Error Reporting API
Supports real-time error reporting
Stackdriver was the former name of this suite (Cloud Monitoring, Error Reporting, Cloud Logging, Cloud Trace)
GCP Resource Hierarchy
------------
Organization -> Folder -> Project -> Resources
Make sure to use a good naming convention
Billing account
---------------
Mandatory (every project needs one)
One billing account can be associated with multiple projects
An organization can have multiple billing accounts
2 types - self-serve, and invoiced (for large enterprises)
Budgets and alerts can be configured
IAM best practices
------------------
Principle of least privilege
Super Admin can do anything
Use Google Workspace to manage users and link it to GCP
Use GCP with your own identity provider
Corporate directory federation - integrate an external identity provider with Cloud Identity or Google Workspace
IAM members (Google accounts, service accounts)
Enable SSO (users are redirected to the external identity provider)
When users are authenticated, a SAML assertion is sent to GCP
Access Control Lists (give permanent access to a subset of objects)
Google IAM (Ex: give permanent access to an entire bucket)
Signed URL (Ex: give time-limited access to specific objects)
Roles and Groups (Ex: give certain access to the development team)
Google Cloud Directory Sync to sync Active Directory
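The idea behind a signed URL (time-limited access without an IAM grant) can be sketched with an HMAC over the object path and an expiry time. This is a generic illustration, not Cloud Storage's V4 signing process, which signs a canonical request with a service account key; the key, host and object names are hypothetical.

```python
import hashlib
import hmac
from urllib.parse import parse_qs, urlencode, urlparse

SIGNING_KEY = b"hypothetical-secret-key"  # stands in for a real signing key


def _signature(path: str, expires_at: int, key: bytes) -> str:
    message = f"{path}:{expires_at}".encode()
    return hmac.new(key, message, hashlib.sha256).hexdigest()


def sign_url(host: str, path: str, expires_at: int, key: bytes = SIGNING_KEY) -> str:
    """Build a URL that carries its own expiry time and signature."""
    query = urlencode({"expires": expires_at,
                       "signature": _signature(path, expires_at, key)})
    return f"https://{host}{path}?{query}"


def verify_url(url: str, now: int, key: bytes = SIGNING_KEY) -> bool:
    """Reject tampered paths, tampered expiry times and expired links."""
    parsed = urlparse(url)
    params = parse_qs(parsed.query)
    expires_at = int(params["expires"][0])
    expected = _signature(parsed.path, expires_at, key)
    return hmac.compare_digest(params["signature"][0], expected) and now < expires_at


url = sign_url("storage.example.com", "/my-bucket/report.pdf", expires_at=1_000)
print(verify_url(url, now=999), verify_url(url, now=1_001))
```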
VM SSH
-------
2 options to manage keys (create manually and configure, or OS Login)
2 ways to connect ( Use SSH button or use gcloud compute ssh )
Troubleshoot VM Startups
-----------------------
Check quota errors
Boot disk failures
Serial port errors
VMs that have local SSDs or are in TERMINATED status cannot be moved between zones/regions
Create snapshots and copy persistent disks to the required zones
Pub/Sub ( has topics and subscribers )
--------------------------------------
Asynchronous communication via topics
Can handle up to 1BN messages per day
Autoscaling
Good for streaming, analytics
Supports both push and pull
Pull - the subscriber pulls messages when ready
Push - the subscriber provides a webhook endpoint and Pub/Sub sends messages to it
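The pull model with acknowledgements can be sketched in memory: each subscription gets its own copy of every message, pulled messages stay outstanding until acknowledged, and unacknowledged ones are delivered again. This is a toy model, not the google-cloud-pubsub client library; the ack-deadline expiry is simulated by an explicit method call, and the subscription names are hypothetical.

```python
import itertools
from collections import deque
from typing import Deque, Dict, List, Tuple

Message = Tuple[int, str]  # (message id, data)


class Topic:
    def __init__(self) -> None:
        self._ids = itertools.count(1)
        self._pending: Dict[str, Deque[Message]] = {}
        self._outstanding: Dict[str, Dict[int, str]] = {}

    def subscribe(self, subscription: str) -> None:
        self._pending[subscription] = deque()
        self._outstanding[subscription] = {}

    def publish(self, data: str) -> int:
        """Fan out: every subscription receives its own copy."""
        msg_id = next(self._ids)
        for queue in self._pending.values():
            queue.append((msg_id, data))
        return msg_id

    def pull(self, subscription: str, max_messages: int = 10) -> List[Message]:
        """Subscriber pulls when it is ready; messages become outstanding."""
        queue = self._pending[subscription]
        batch: List[Message] = []
        while queue and len(batch) < max_messages:
            msg_id, data = queue.popleft()
            self._outstanding[subscription][msg_id] = data
            batch.append((msg_id, data))
        return batch

    def ack(self, subscription: str, msg_id: int) -> None:
        self._outstanding[subscription].pop(msg_id, None)

    def expire_ack_deadline(self, subscription: str) -> None:
        """Simulate the ack deadline passing: redeliver unacked messages."""
        outstanding = self._outstanding[subscription]
        for msg_id in sorted(outstanding):
            self._pending[subscription].append((msg_id, outstanding[msg_id]))
        outstanding.clear()
```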
Dataflow
---------
Can be used for streaming and batch processing
Use Cloud Dataflow to deduplicate messages
pub/sub -> Dataflow -> BigQuery ( Streaming )
pub/sub -> Dataflow -> Cloud Storage ( Files )
Cloud Storage -> Dataflow -> Bigtable/Spanner/Datastore/BigQuery ( Batch processing )
Hybrid Cloud
------------
Cloud VPN (connects on-premises networks to GCP over the internet using IPsec VPN tunnels and Internet Key Exchange)
VPN gateways are regional
HA VPN (provides two tunnels, supports dynamic routing, needs a Cloud Router)
Classic VPN (provides only one tunnel, supports static routing with manual work)
Make sure there is a failover mechanism (e.g. Dedicated Interconnect as primary, with VPN or Direct Peering as backup)
Cloud Interconnect
------------------
High speed low latency private network connect google services to private network
Dedicated Interconnect - for high bandwidth
Partner Interconnect - for low bandwidth
Data Warehouse - BigQuery
------------------------
BigQuery is a relational (SQL) data warehouse
Import and export data
Supports streaming data
Automatically expire data
Can query data from external sources (Cloud Storage, Cloud SQL, Bigtable, Google Drive) without storing it
Access data via the Cloud console, bq command, BigQuery REST API, client libraries
BigQuery can be very expensive, so estimate the cost with the pricing calculator and a dry run first
Cost is based on the amount of data scanned, so table partitioning and clustering (grouping) are important
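Because on-demand queries are charged by bytes scanned, partition pruning directly cuts cost. The sketch assumes a hypothetical price per TiB (check the current pricing page) and made-up partition sizes; in practice the bytes a query would scan come from a dry run, e.g. bq query --dry_run.

```python
from typing import Dict, Iterable, Optional

TIB = 2 ** 40


def bytes_scanned(partition_bytes: Dict[str, int],
                  partitions: Optional[Iterable[str]] = None) -> int:
    """Bytes read from the queried columns; None means a full table scan."""
    if partitions is None:
        return sum(partition_bytes.values())
    return sum(partition_bytes[p] for p in partitions)


def on_demand_cost(scanned: int, price_per_tib: float) -> float:
    return scanned / TIB * price_per_tib


# Hypothetical daily partitions of 1 TiB each and a hypothetical price.
daily = {"2024-01-01": TIB, "2024-01-02": TIB, "2024-01-03": TIB}
PRICE_PER_TIB = 5.0

full = on_demand_cost(bytes_scanned(daily), PRICE_PER_TIB)
pruned = on_demand_cost(bytes_scanned(daily, ["2024-01-03"]), PRICE_PER_TIB)
print(full, pruned)  # filtering on the partition column reads one partition
```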
Make sure to configure expiration for datasets and partitions
Importing batches of data is free. Streaming is expensive
Cloud Dataflow and Dataproc can import data after processing
Federated queries can be used to query external data
BigQuery is very fast for very complex queries
Cloud Dataproc
--------------
Data analytics service
Managed Spark and Hadoop service
Data Life Cycle
----------------
Ingest -> Store -> Process and Analyze -> Explore and Visualize
Cloud Data Loss Prevention
--------------------------
Control sensitive data
Ingest
-----
Streaming with Pub/Sub
Batch with Storage Transfer Service, BigQuery Data Transfer Service, Transfer Appliance, gsutil, DB migrations
Store
------
Cloud Storage -> Unstructured object store
Cloud SQL -> Managed MySQL and other relational DBs
Cloud Spanner -> Horizontally scalable relational DB. Good for transactions, can scale globally
Cloud Firestore -> NoSQL DB
Cloud Bigtable -> NoSQL DB for massive data operations
BigQuery -> Data analytics
Process and Analyze
--------------------
Dataprep -> clean and transform data for ML work
Dataflow -> ETL pipelines
Dataproc -> complex processing using Spark and Hadoop
Explore and Visualize
----------------------
BigQuery
ML prebuilt (Vision API, Speech-to-Text, Natural Language API, Video Intelligence API)
ML custom built (based on TensorFlow)
Cloud Datalab -> web-based explorer, runs Jupyter notebooks
Cloud Data Studio -> Dashboards
Cloud IoT
---------
Cloud IoT Core (register, authentication, authorization)
Uses Pub/Sub
Data Lake
---------
Since Big Data solutions are very complex, a data lake is a centralized repository designed to store, process, and secure large amounts of structured, semi-structured, and unstructured data
Provides flexibility in a cost-effective way
Caching Use Cases
1. Backend data is not changing frequently
2. User sessions
Memorystore
============
Fully managed in-memory data store
Monitoring can be set up
Supports both Redis and Memcached
Memcached for pure caching (reference data, DB query results, session stores)
Redis for low latency, high availability and persistence (good for games)
Accessible from Compute Engine, App Engine, Google Kubernetes Engine, Cloud Functions
Create a set of nodes (replicas) to handle failures
Memcache in App Engine
======================
Legacy in-memory cache. Good for DB query caching, session data, user preferences
No persistence
2 service levels (free shared memcache, paid dedicated memcache)
Use a key (e.g. a hash of the query) to check whether a value is cached
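The caching notes above (a key from a hash of the query, a TTL for expiry, loading from the backend on a miss) combine into the cache-aside pattern. A minimal in-process sketch with an injectable clock for testing; with Memorystore the dictionary would be replaced by Redis or Memcached calls, and run_query stands for a hypothetical database call.

```python
import hashlib
import time
from typing import Any, Callable, Dict, Optional, Tuple


class TTLCache:
    def __init__(self, ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.ttl = ttl_seconds
        self.clock = clock
        self._entries: Dict[str, Tuple[Any, float]] = {}

    def get(self, key: str) -> Optional[Any]:
        entry = self._entries.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self.clock() >= expires_at:  # expired: behave like a miss
            del self._entries[key]
            return None
        return value

    def set(self, key: str, value: Any) -> None:
        self._entries[key] = (value, self.clock() + self.ttl)


def cache_key(query: str) -> str:
    return hashlib.sha256(query.encode()).hexdigest()


def cached_query(cache: TTLCache, query: str, run_query: Callable[[str], Any]) -> Any:
    """Cache-aside: try the cache first, fall back to the backend on a miss."""
    key = cache_key(query)
    value = cache.get(key)
    if value is None:  # note: None results are never cached in this sketch
        value = run_query(query)
        cache.set(key, value)
    return value
```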
Cloud CDN
==========
Serve global content with low latency
Uses Google edge servers together with the external HTTP(S) load balancer (proxy)
The first request loads content from the backend; later requests are served from the cache
TTL to configure cache expiry
Process Evolution
-----------------
Agile -> Devops -> Site Reliability Engineer
Agile (many iterations, focus on working software rather than documentation, customer involvement, respond to change)
Cloud Source Repositories (fully featured private Git repositories)
Container Registry
Jenkins
Cloud Build to build JARs and Docker images
Spinnaker - multi-cloud continuous delivery
Infrastructure as Code
--------------------------
Create infrastructure using code; reduces mistakes when creating similar environments, e.g. for QA
Use Terraform
Use Google Cloud Deployment Manager (create VPCs, subnets, load balancers using a script)
Configuration Management
------------------------
Install the right software and tools
Use Chef, Puppet, Ansible
Cloud Marketplace (Cloud Launcher)
------------------------------------
Central repo of deployable apps. Ex: WordPress
Site Reliability Engineering
----------------------------------
Availability, Latency, Performance, Efficiency, Change Management, Monitoring, Emergency Response, Capacity Planning
Service Level Objectives (SLOs) (internal objectives), e.g. 99.999% availability, 99.99999% durability
Service Level Agreements (SLAs) (external commitments to customers) - signed contract
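An SLO implies an error budget: the share of requests that may fail before the objective is missed. A worked sketch with made-up request counts, assuming a request-based SLO.

```python
from typing import NamedTuple


class ErrorBudget(NamedTuple):
    allowed_failures: float
    remaining: float

    @property
    def exhausted(self) -> bool:
        return self.remaining < 0


def error_budget(slo_percent: float, total_requests: int,
                 failed_requests: int) -> ErrorBudget:
    """Failures the SLO tolerates, and how many of them are still left."""
    allowed = total_requests * (1 - slo_percent / 100)
    return ErrorBudget(allowed, allowed - failed_requests)


# A 99.9% SLO over 1,000,000 requests allows about 1,000 failures.
budget = error_budget(99.9, 1_000_000, 250)
print(round(budget.allowed_failures), round(budget.remaining), budget.exhausted)
```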
Site Reliability Engineering Best Practices
----------------------------------------------------
Handle excess load with load shedding (API limits for different customer bases)
Avoid cascading failures (circuit breakers)
Penetration testing
Load testing
Resilience testing ( Stress testing with internal failures )
Enable test automation
Frequent and small feature releases
Reduce cost of failures
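The circuit breaker mentioned above stops calling a failing dependency so its failures do not cascade. A minimal single-threaded sketch, assuming a consecutive-failure threshold and a fixed reset timeout; production libraries add thread safety, metrics and sliding windows.

```python
import time
from typing import Any, Callable, Optional


class CircuitOpenError(RuntimeError):
    """Raised instead of calling the dependency while the circuit is open."""


class CircuitBreaker:
    def __init__(self, failure_threshold: int = 3, reset_timeout: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.clock = clock
        self.failures = 0
        self.opened_at: Optional[float] = None

    @property
    def state(self) -> str:
        if self.opened_at is None:
            return "closed"
        if self.clock() - self.opened_at >= self.reset_timeout:
            return "half-open"  # allow a trial call through
        return "open"

    def call(self, func: Callable[..., Any], *args: Any, **kwargs: Any) -> Any:
        if self.state == "open":
            raise CircuitOpenError("dependency unavailable, failing fast")
        try:
            result = func(*args, **kwargs)
        except Exception:
            self.failures += 1
            if self.state == "half-open" or self.failures >= self.failure_threshold:
                self.opened_at = self.clock()  # trip (or re-trip) the breaker
            raise
        self.failures = 0
        self.opened_at = None  # a success closes the circuit
        return result
```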
Release Management in Google Cloud
Some deployments need zero downtime
Only one version is live at a given time
Expose one instance to production traffic before going live
Small incremental changes
Automate release management
Deployment Approaches
Recreate - Shut down V1, then deploy V2
Canary - Deploy a few instances of V2, test, then deploy the rest of the V2 instances. Make sure data is backward compatible in case of failures
A/B testing - Check whether users like a new feature
Rolling - First deploy 5% of instances, then increase gradually within a given time window
Rolling with Batches - First deploy new v2 instances, then roll out gradually
Blue Green - Create a parallel V2 environment. Once testing is done, move traffic to V2; no downtime. This enables shadow testing
Kubernetes: rolling update is the default deployment strategy, controlled by maxSurge and maxUnavailable. No downtime, unlike the recreate approach
Managed Instance Group - gcloud commands for rolling releases. Rolling percentages can be specified
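How maxSurge and maxUnavailable shape a rolling update can be simulated step by step. This is a simplified model, not the Kubernetes controller: it assumes new pods become ready immediately, and that each step first adds new pods (up to replicas + maxSurge in total) and then removes old pods (keeping at least replicas - maxUnavailable running).

```python
from typing import List, Tuple


def rolling_update_steps(replicas: int, max_surge: int,
                         max_unavailable: int) -> List[Tuple[int, int]]:
    """Return (old_pods, new_pods) after each step of the rollout."""
    if max_surge == 0 and max_unavailable == 0:
        raise ValueError("maxSurge and maxUnavailable cannot both be 0")
    old, new = replicas, 0
    steps: List[Tuple[int, int]] = []
    while old > 0 or new < replicas:
        # 1. Surge: start new-version pods without exceeding replicas + max_surge.
        create = min(replicas - new, replicas + max_surge - (old + new))
        new += max(create, 0)
        # 2. Scale down: remove old pods while keeping replicas - max_unavailable.
        remove = min(old, (old + new) - (replicas - max_unavailable))
        old -= max(remove, 0)
        steps.append((old, new))
    return steps


# 3 replicas, maxSurge=1, maxUnavailable=0: one pod is replaced at a time.
print(rolling_update_steps(3, 1, 0))
```

With maxUnavailable=0 capacity never drops below the desired count, at the price of needing spare room for the surge pods.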
App Engine Commands
Shift all traffic to V2: gcloud app deploy
Deploy without shifting traffic: --no-promote
Shift traffic to V2 at once: gcloud app services set-traffic s1 --splits=V2=1
A/B testing: gcloud app services set-traffic s1 --splits=V2=0.5,V1=0.5
Compliance and Regulations
ISO 27001 certified for information security
ISO 27017 certified for cloud services
ISO 27018 certified for protecting PII in the cloud (cloud privacy)
ISO 27701 certified for global privacy
PCI DSS certified for the payment card industry data security standard: Ex: payment solutions
-- Allow outbound traffic to external payment systems. Only Compute Engine and GKE are recommended, as App Engine and Cloud Functions do not allow egress traffic
-- Use an HTTPS load balancer with signed certificates
-- Review access rules and resource inventory over time
SOC1 and SOC2 - audit standards
COPPA for children's privacy - Ex: children's web sites
HIPAA for health insurance - Ex: health solutions for the USA
Cloud Migrations
Rehosting - Simply take the application from the data center and deploy it to the cloud
Replatforming - Make a few adjustments, like containerizing
Repurchase - Move to a new (often cloud-native / SaaS) product
Refactoring - Re-architect for cloud-native; costly
Cloud Architecture Decisions
Reduce cost (license cost, computing cost, storage cost, ingress and egress data cost, personnel cost, penalties)
Use managed services to reduce cost (autoscaling and availability are handled by the service)
When the project starts, identify business requirements and define KPIs
Define Technical Requirements
Functional Requirements
Private networks, flexible schema, large volume data stores, container orchestration
Non-Functional Requirements
Availability (geographical distribution, managed instance groups with load balancing, multi-master clusters using regional clusters, cluster autoscaling, enable live migration for Compute Engine)
Use managed services (App Engine, Cloud Functions, Cloud Storage, Cloud Filestore, Cloud Datastore, BigQuery; live resizing with persistent disks; maintain Bigtable clusters; multi-region Cloud Datastore; HA for Cloud SQL; premium network tier; hybrid networking for availability)
Scalability (highly scalable compute instance groups; pod and cluster autoscaling; use resources that can scale fast; Cloud SQL does not support horizontal scaling; persistent disks can be scaled horizontally and vertically; Cloud Storage, App Engine, Cloud Functions, Pub/Sub, BigQuery and Cloud Datastore are autoscaling serverless services; Bigtable, Cloud Spanner, Cloud SQL and Dataproc are not serverless)
Security - CIA (Confidentiality: only the right people have the right access. Integrity: IAM best practices, encryption, hash verification, digital signatures. Availability: available at the right time to the right people via firewalls, redundancy, failovers, DDoS protection, private networks with firewall rules, private IPs)
Cloud KMS - Can generate private and public keys and digital signatures to verify integrity, e.g. of log files and build files
Cloud Armor - Protection against OWASP Top 10 vulnerabilities and DDoS
Secret Manager - Manages DB passwords and API key secrets so they are not kept in config files; supports auditing and password rotation
Change Management
Business Continuity Planning - Disaster recovery
Incident Management - Alerts
Data Management - storage scaling, archiving, data size
Other Cloud Services
Cloud Scheduler - Fully managed scheduler for batch and big data jobs, Pub/Sub, HTTP calls. Requires an App Engine app
Cloud emulators - Develop GCP apps on a local machine without connecting to GCP. Can emulate Cloud Bigtable, Datastore, Firestore, Pub/Sub, Spanner
Cloud DNS (global domain name system) - Set up a website with a domain name: obtain a domain name (abc.com) from a domain registrar, host the website, and route requests for abc.com to the website using an abc.com-to-IP mapping. Also supports email routing. Provides public and private zones: public DNS zones are exposed to the internet, private managed DNS zones are accessible from private subnets. Once a DNS zone is created, record sets can be added that map addresses to IP addresses. A DNS zone is a container of records
Cloud Pricing Calculator - Estimates the cost of a solution
Anthos - Move cloud workloads to other clouds or on-prem systems; has centralized config management; provides multi-cluster management
Machine Learning - Provides prebuilt ML APIs, no need to be an ML expert; Cloud AutoML to build custom ML models; AI Platform for data scientists; BigQuery ML
Apigee API management - API authentication and authorization, rate limiting, API monitoring, caching, scaling; integrate legacy services with new services
Identity Platform - Customer identity and access management. Authentication and authorization for web and mobile apps. Provides multiple authentication options. Cloud IAM is different: it is about users, roles, etc.
Cloud events - Microservices react to changes in event state; loosely coupled, flexible orchestration, resilience, asynchronous
Eventarc - Simplifies event triggering in GCP, aligned with the CloudEvents specification (cloudevents.io)
Observability and Telemetry - Measuring the internal state from outputs: logs, metrics, traces
Service directory - Help microservices to find other services