Tuesday, December 21, 2021

Google Cloud Terminologies for 2022

Data Centers / Zones / Regions

Data Centers -> Zones -> Regions ( geographical locations )

Moving from a single data center to multiple zones and regions changes:

Low availability -> High availability

High latency to other regions -> Low latency

Limited customer groups -> Many customer groups

Sensitive data / government regulations -> Can store region-specific sensitive data and support multiple government regulations and audits


Google Compute Engine Features

1. Virtual Machines ( Cost optimized / Memory Optimized /  Compute Optimized/ GPU )

2. Persistent Disks ( attached disks or network disks )

3. Load balancing with Auto Scaling

4. Network configurations


VM Families

General Purpose Family - E2 ( cost optimized ), N2, N2D, N1

Good for testing, small scale application deployments

Memory Optimized Family - M1, M2

Good for in memory DB, in memory analytics

Compute Optimized Family - C2

Good for Gaming 

GPU based Family

Good for machine learning


Machine Types

e2-standard-2, e2-standard-4, e2-standard-8, e2-standard-16, e2-standard-32

Memory, disk and network capabilities increase along with the number of vCPUs.
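
To make the scaling concrete, here is a small sketch of how memory grows with the vCPU count. It assumes the e2-standard ratio of 4 GB of memory per vCPU; the helper name is made up for illustration and is not part of any Google library.

```python
# Sketch: memory for e2-standard machine types, assuming 4 GB per vCPU.
E2_STANDARD_VCPUS = [2, 4, 8, 16, 32]
GB_PER_VCPU = 4  # assumption: e2-standard memory-to-vCPU ratio


def e2_standard_memory_gb(vcpus):
    """Return memory in GB for e2-standard-<vcpus> (hypothetical helper)."""
    if vcpus not in E2_STANDARD_VCPUS:
        raise ValueError(f"not an e2-standard size: {vcpus}")
    return vcpus * GB_PER_VCPU


for n in E2_STANDARD_VCPUS:
    print(f"e2-standard-{n}: {n} vCPUs, {e2_standard_memory_gb(n)} GB")
```

Check the current machine type tables before relying on these numbers for sizing.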


Virtual Machines IPs

Internal IP - VM is accessible from the internal network

External IP - VM is accessible from the internet via this IP. An ephemeral ( non-static ) external IP can change when the VM is stopped and started

                   - Remove the external IP if the VM does not need to be reached over the internet

Static IP - Assigns a constant external IP to the VM


Static IPs in VMs

VPC Network -> External IPs -> Reserve Static IP -> Select the scope -> Attach to VM -> The ephemeral IP is released automatically.

A static IP that is reserved but not attached to a VM is charged at a higher hourly rate than one that is in use

A static IP can be moved from one VM to another VM in the same project

Static IP addresses have to be released manually.


Simplifying VM startups

1. Bootstrapping with startup scripts 

    Create VM -> Automation -> Startup Script -> Paste the script into the startup script field.

2. Instance templates

    Pre-configured VM templates. Bootstrap time can still be high due to OS patches and software installations

3. Custom images

    Can be created from a VM instance, snapshot, persistent disk, or a file in Cloud Storage

    Can be shared across projects and stored multi-regionally

    Old images can be deprecated


Prerequisites to Create a VM Instance

1. Project

2. Billing Account

3. Enable Compute Engine APIs

Sole Tenant Nodes

Dedicated Compute Engine servers to run your VMs. Very expensive

VM Manager

Can manage thousands of VMs ( software updates, etc. )

Managed Instance Groups

Group of VM instances managed as a single entity

All instances have the same configuration. Supports auto scaling, auto healing and load balancing

Can be zonal or regional

New versions of apps can be deployed with managed instance groups

Update types - rolling updates, canary updates, rolling restart and rolling replace, controlled by max surge and max unavailable

Stateful managed instance groups can be used for DB systems, with auto healing and load balancing

Stateless managed instance groups can be used for REST API services, with auto scaling and load balancing

Unmanaged Instance Groups

Instances can have different configurations. Does not support auto scaling, auto healing or rolling updates, but can still be used as a load balancer backend

Cloud Load Balancing

Features : health checks, auto scaling, single anycast IP

Communication protocols

Application Layer protocols - HTTP , HTTPS, SMTP

Transport Layer ( Ensure Network layer communication ) protocols - TCP , TLS, UDP

Network Layer ( bit and bytes ) protocols - IP

Cloud Load Balancing

HTTP/S Load balancer - Layer 7 load balancing, good for REST APIs. Supports multi-regional

TCP Load balancer - Layer 4 load balancing, good for gaming systems, supports multi-regional, can be used as SSL proxy, high performance

UDP Load balancer - Layer 4 load balancing, good for gaming systems, single-regional, high performance but does not consider server availability


Create Load Balancer

Network services -> Create Load Balancer -> HTTP/S / TCP / UDP -> Create instance groups for each backend service -> Routing rules -> Client integration with LB port and IP -> Health check configuration -> Cloud CDN for static content -> Cloud Armor for security -> Select protocol.

SSL Termination ( Layer 7 to Layer 4 )

Client to Load balancer ( TLS / HTTPS )

Load balancer to internal services ( TCP / HTTP )

GCP Load balancer concepts

Backend ( managed instance group )

Backend Service ( Group of backends )

URL mapping ( Routing requests )


Choosing Correct Load Balancer

For internal traffic - Internal Load Balancer ( internal HTTP/S, internal TCP/UDP load balancers ). Supports regional traffic

For external traffic - External Load Balancer ( external HTTP/S, SSL proxy for TCP traffic with SSL offload, TCP proxy if global load balancing is required ). Supports global traffic

If back-end services need the exact client IP, use a pass-through network load balancer rather than a proxy. Proxy load balancers such as HTTP/S pass the client IP in the X-Forwarded-For header instead

Global routing under the premium network tier routes traffic to the nearest region


Architectural Concerns


Resiliency : Provide the right functionality even when one or more parts of the system fail.

Use managed instance groups behind a global load balancer

Use cloud monitoring 

Use cloud logging

Use health checks

Use hardened images

Availability: Applications available when users need them

- Create instance groups in multiple regions

- Implement Global HTTPS load balancer

- Implement health checks for instance groups and load balancers

- Enable live migration for VM instances using the availability policy

  Live migration supports VMs with local SSD. It is not supported for GPU and preemptible instances.

Scalability : Handle growth in users, traffic and data size with a proportional increase in resources

- Vertical Scaling - Increase CPU and memory. It is a costly solution

- Horizontal Scaling - Deploy more instances behind a load balancer. Most preferable solution.

- VM level vertical scaling - Change machine family, number of CPU cores, memory

- VM level horizontal scaling - Distribute VMs in a single zone, multiple zones in a single region, or multiple zones across multiple regions

Performance

Use correct machine family type

Use GPU for machine learning or math intensive workload 

TPU for matrix operations

Use hardened images to reduce startup time

Security

Enforce firewall rules to restrict traffic

Use internal IPs as much as possible

Use sole-tenant nodes

Use hardened images


Cost 

Sustained use discounts - automatic, enabled by default

Discount increases with usage - enabled by default

Committed use discounts


GCP Billing

Pay per second after the first minute ( one minute minimum )

Once a VM is shut down, you still have to pay for its storage

Enable budget alerts
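
The per-second rule can be sketched as below. This is only an illustration of "one minute minimum, then per second"; the hourly rate passed in is a made-up number, not a real GCP price.

```python
def billed_seconds(runtime_seconds, minimum_seconds=60):
    """Seconds charged for one VM run: per-second billing with a one minute minimum."""
    if runtime_seconds <= 0:
        return 0
    return max(runtime_seconds, minimum_seconds)


def run_cost(runtime_seconds, hourly_rate):
    """Cost of one run; hourly_rate is a hypothetical price per hour."""
    return billed_seconds(runtime_seconds) * hourly_rate / 3600


print(billed_seconds(30))    # a 30 second run is billed as a full minute
print(billed_seconds(61))    # after the first minute, billing is per second
print(run_cost(1800, 0.10))  # 30 minutes at a hypothetical 0.10 per hour
```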

 

Cost Efficiency

Enable VM level auto scaling

Understand sustained use discounts

Committed use discounts

Use preemptible VMs when possible


GPU ( Graphics Processing Units )

For math / video intensive processing

Have to use images with GPU libraries

Cannot be used with memory optimized or shared-core machine types



Preemptible VMs

Short lived, very cheap VMs

Can be terminated at any time and always within 24 hours, with a 30 second warning

Good for batch jobs


Gcloud and related tools

gcloud

BigQuery - bq

Cloud Bigtable - cbt

Cloud Storage - gsutil

Kubernetes - kubectl

 

Cloud Terminals

Cloud Shell from web console : a small VM backed by a 5 GB persistent home directory

Cloud SDK terminal

 

Gcloud command

gcloud --version

gcloud init

gcloud config list


Gcloud command structure

gcloud <group> <subgroup> <action>

Ex: gcloud compute instances list

      gcloud compute zones list

      gcloud compute regions list

      gcloud compute machine-types list

      gcloud compute machine-types list --filter="zone:asia-southeast2-b"

      gcloud compute instances create test-vm-1

      gcloud compute instances describe test-vm-1

      gcloud compute instances delete test-vm-1


Google Managed Services

IaaS - Infrastructure As a Service

Cloud provider is responsible up to the virtualization layer

User has to choose the OS and manage OS patches, runtimes, auto scaling and availability

Ex : Compute Engine


PaaS - Platform As a Service ( CaaS, Serverless )

Cloud provider is responsible for auto scaling, load balancing, OS patches and availability

User has to configure the application.

Ex: App Engine

FaaS - Function As a Service ( Serverless )

Functions instead of Apps 

Cloud Functions

CaaS - Container As a Service

Containers instead of Apps

Ex: Google Kubernetes Engine, Cloud Run

Serverless

It doesn't mean no servers

No visibility into the infrastructure, including which OS is running, etc.

It provides auto scaling

It provides discovery services

It provides load balancing

It provides zero downtime deployment facilities

User only needs to think about the function code.

Ex : Cloud Functions ( AWS Lambda is the equivalent on AWS )


Containerizing

Create a Docker image of your application that contains all its dependencies and can run in any environment.

Compared to VMs, containers do not carry a full guest OS; they share the host kernel. That is an advantage

Docker containers are isolated from each other

Container Orchestration

Kubernetes is a Container Orchestration Framework. 

It provides auto scaling

It provides discovery services, 

It provides load balancing

It provides zero downtime deployment facilities


App Engine

Simplest way to deploy applications in GCP

Provides end to end application management

Supports many languages.

Provides load balancing, auto scaling, OS patch updates, health check monitoring, application versioning

Provides two environments - language specific sandbox ( standard ) and Docker containers ( flexible )

App Engine Component Hierarchy - Project ( Application ) -> Multiple Services -> Multiple Versions ( multiple versions can coexist ) -> Multiple Instances

Compute Engine vs App Engine

Compute Engine - IaaS, User has to choose images, Network configuration, Availability, CPU, Memory and Etc

App Engine - PaaS , Serverless, Cannot update CPU, Memory and etc.

App Engine Feature Configurations

Standard Configuration

Flexible Configuration

App Engine Scaling

- Automatic   ( Based on CPU, throughput, concurrent requests. Good for continuous workloads )

- Basic   ( Instances are created when requests are received. Good for ad hoc workloads )

- Manual 


App Engine Commands

    cd default-service
    gcloud app deploy
    gcloud app services list
    gcloud app versions list
    gcloud app instances list
    gcloud app deploy --version=v2
    gcloud app versions list
    gcloud app browse
    gcloud app browse --version 20210215t072907
    gcloud app deploy --version=v3 --no-promote
    gcloud app browse --version v3
    gcloud app services set-traffic --splits=v3=.5,v2=.5
    watch curl https://melodic-furnace-304906.uc.r.appspot.com/
    gcloud app services set-traffic --splits=v3=.5,v2=.5 --split-by=random
     
    cd ../my-first-service/
    gcloud app deploy
    gcloud app browse --service=my-first-service
     
    gcloud app services list
    gcloud app regions list


Application Deployment with App Engine

Create GCP project

Enable the App Engine Admin API

Create the App Engine application, select region, select language, etc.

Go to the project editor

Create the project hierarchy below

my-first-project

|--> app.yaml

|--> Test.java


app.yaml

runtime: java11

entrypoint: java -jar test.jar

Test.java

Your application code


How to run

gcloud config set project stately-equinox-342317

gcloud app deploy

Browse the target URL to view the result.


App Engine versioning and traffic allocation

    gcloud app services list
    gcloud app versions list
    gcloud app instances list
    gcloud app deploy --version=v2
    gcloud app versions list


App Engine traffic distribution

gcloud app services set-traffic --splits=v3=0.5,v2=0.5 ( IP based splitting, the default )

gcloud app services set-traffic --splits=v3=0.5,v2=0.5 --split-by=random
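
The difference between IP based and random splitting can be sketched in plain Python. This is only an illustration of the idea (same IP always lands on the same version, random picks per request); it is not App Engine's actual algorithm, and all function names are hypothetical.

```python
import hashlib
import random


def pick_version(splits, point):
    """Map a point in [0, 1) onto a version using cumulative weights."""
    total = sum(splits.values())
    cumulative = 0.0
    for version, weight in splits.items():
        cumulative += weight / total
        if point < cumulative:
            return version
    return version  # guard against float rounding at the upper edge


def split_by_ip(splits, client_ip):
    """Sticky choice: hash the client IP so one IP always gets one version."""
    digest = hashlib.sha256(client_ip.encode()).digest()
    point = int.from_bytes(digest[:8], "big") / 2**64
    return pick_version(splits, point)


def split_by_random(splits, rng):
    """Non-sticky choice: every request is assigned independently."""
    return pick_version(splits, rng.random())


splits = {"v3": 0.5, "v2": 0.5}
print(split_by_ip(splits, "203.0.113.7"))
print(split_by_random(splits, random.Random()))
```

IP based splitting keeps a user on one version as long as their IP is stable, while random splitting spreads traffic more evenly but can move a user between versions.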


Deploy App Engine App as a service

yaml file

|  runtime : java11

|  service : my-first-service

gcloud app deploy

Can be accessed via the generated service URL


Delete App Engine Apps

Stop traffic first manually : gcloud app deploy --version=v3 --no-promote



Google Kubernetes Engine

------------------------

Container orchestration platform that provides auto scaling, service discovery, health checks and self healing

Cluster auto scaling

A cluster can have multiple VM nodes

Pod auto scaling

A pod is an instance of a microservice

Integrates with Cloud Logging and Cloud Monitoring

Create the cluster in Autopilot mode to optimize cost

Set up a separate node pool if a new setup with a different CPU level is required

Kubernetes Ingress defines routing rules to route traffic

A Kubernetes cluster has a control plane ( master ) and worker nodes

A pod can have multiple containers

A pod has an ephemeral IP

For a low cost solution - use preemptible VMs, E2 rather than N1 machine types, committed use discounts

For an efficient solution - use the horizontal pod autoscaler and cluster autoscaler

Pod status pending - due to a resource problem

Pod status waiting - due to failure on image fetch
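
As a rough sketch of how the horizontal pod autoscaler decides on a replica count, the documented rule is desired = ceil( current replicas x current metric / target metric ), clamped to the min and max. The function below is a simplified illustration (it ignores the tolerance band and stabilization windows of the real controller), and the default limits are assumptions for the example.

```python
import math


def desired_replicas(current_replicas, current_cpu_percent, target_cpu_percent,
                     min_replicas=1, max_replicas=4):
    """Simplified HPA rule: scale in proportion to metric / target, then clamp."""
    raw = math.ceil(current_replicas * current_cpu_percent / target_cpu_percent)
    return max(min_replicas, min(max_replicas, raw))


# Target of 70% CPU with at most 4 replicas
print(desired_replicas(2, 105, 70))  # pods are over target, so scale out
print(desired_replicas(3, 140, 70))  # demand exceeds the max, so clamp to 4
print(desired_replicas(2, 20, 70))   # pods are mostly idle, so scale in
```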


GKE cluster types

-----------------

zonal cluster ( single zone and multi zone - the control plane runs in a single zone in both )

regional cluster ( control plane and worker nodes are replicated across multiple zones in one region )

private cluster - Used in VPC setups, nodes have internal IPs only



Google Kubernetes deployments can be done by YML or by Commands

---------------------------------------------------------------


YAML files can be exported once the resources have been created manually, then edited and applied later

deployment.yaml and service.yaml need to be changed accordingly - kubectl apply -f deployment.yaml

GKE services are responsible for exposing workloads to external users





gcloud config set project my-kubernetes-project-304910

gcloud container clusters get-credentials my-cluster --zone us-central1-c --project my-kubernetes-project-304910

kubectl create deployment hello-world-rest-api --image=in28min/hello-world-rest-api:0.0.1.RELEASE

kubectl get deployment

kubectl expose deployment hello-world-rest-api --type=LoadBalancer --port=8080

kubectl get services

kubectl get services --watch

curl 35.184.204.214:8080/hello-world

kubectl scale deployment hello-world-rest-api --replicas=3

gcloud container clusters resize my-cluster --node-pool default-pool --num-nodes=2 --zone=us-central1-c

kubectl autoscale deployment hello-world-rest-api --max=4 --cpu-percent=70

kubectl get hpa ( horizontal pod autoscaling details )

kubectl create configmap hello-world-config --from-literal=RDS_DB_NAME=todos  ( This is for microservice configurations )

kubectl get configmap

kubectl describe configmap hello-world-config

kubectl create secret generic hello-world-secrets-1 --from-literal=RDS_PASSWORD=dummytodos

kubectl get secret

kubectl describe secret hello-world-secrets-1

kubectl apply -f deployment.yaml

gcloud container node-pools list --zone=us-central1-c --cluster=my-cluster

kubectl get pods -o wide

 

kubectl set image deployment hello-world-rest-api hello-world-rest-api=in28min/hello-world-rest-api:0.0.2.RELEASE

kubectl get services

kubectl get replicasets ( shows deployment versions; a replica set ensures the required number of pods is running )

kubectl get pods

kubectl delete pod hello-world-rest-api-58dc9d7fcc-8pv7r

 

kubectl scale deployment hello-world-rest-api --replicas=1

kubectl get replicasets

gcloud projects list

 

kubectl delete service hello-world-rest-api

kubectl delete deployment hello-world-rest-api

gcloud container clusters delete my-cluster --zone us-central1-c



Ingress

-------

Provides load balancing and routing rules

SSL terminations

Only require one load balancer for many microservices with Ingress



Container Registry

------------------

Docker Hub is a public container registry

Google has its own container registry to store docker images



Docker file main content

------------------------

FROM    - base image

WORKDIR - working directory

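COPY    - copy files from the local machine into the image

RUN     - run a command while building the image

EXPOSE  - document the port the container listens on

CMD     - default command when the container starts, ex: CMD node index.js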




Cloud Functions

---------------

1st gen cloud functions ( problems - cold start, small instance sizes, low timeouts, one request per instance )

2nd gen cloud functions ( 1 hour timeout, larger instance sizes, traffic splitting, supports 90+ event types, can handle many requests per instance )


Runs code when a cloud event triggers it.

Pay for the execution time and the CPU / memory used

Max time bound - 1 hour

Supports multiple languages

It uses Cloud Build - Google CI/CD tool



Cloud Run

---------


Google serverless platform

Easiest way to deploy containerized applications

Pay per use for CPU / memory

gcloud run deploy service-name --image IMAGE_URL --revision-suffix SUFFIX

gcloud run revisions list

gcloud run services update-traffic my-service --to-revisions=v2=10,v1=90



Anthos

--------

Run Kubernetes clusters anywhere - AWS, Azure, etc.


Google KMS

----------

Supports encryption and decryption mechanisms

Create a key ring - it holds multiple keys

Add the encryption mechanism when creating VM instances



Cloud Storages

###############


Block Storage

-------------

Block storage is fast and efficient 

( Persistent disks - Good for durability, performance scales with size. Local SSD - Good for high performance, keeps temporary data )

Can attach read-only block storage to multiple servers

Can be accessed as DAS or NAS - Direct Attached Storage or Network Attached Storage

Zonal Replication - only in one zone

Regional Replication - multiple zones



File Storage

------------

Large files, Support sharing ex: Video editing files



Object Storage

--------------

Create buckets first

Very inexpensive, can store very large objects

Max object size is 5 TB

Can store unlimited items

Choose storage classes

Supports object versioning

Ramp up request rate gradually

Cloud Storage command - gsutil

Use Storage Transfer Service for more than 1 TB

Use the physical Transfer Appliance if the transfer would take more than several days

Bucket lock ( retention policy ) to avoid changes

Storage classes - standard, nearline, coldline, archive



Cloud IAM

##########


Authentication - Is this the right user ?

Authorization - Do they have access ?

Identities - GCP user / group of users / application / unauthenticated users


Roles - Set of permissions on specific resources

Policy - Assigns roles to users

Basic roles ( Viewer / Editor / Owner ) are not recommended in production

Use predefined roles or custom roles defined by ourselves

Command line - gcloud

Policy Troubleshooter to troubleshoot access issues

Use service accounts, which are generated by GCP, to provide access.

Service accounts have no passwords. They are based on public / private RSA key pairs


Permission from Cloud -> Cloud ( use service accounts with Google-managed keys )

Permission from OnPrem -> Cloud ( use a service account with user-managed keys )

Credential types -> OAuth 2 ( for elevated permissions ), OpenID ( service to service authentication )


Access Control List -> for object-level access, as opposed to uniform bucket-level access

Can expose bucket content as a simple static web site. The bucket name should match the DNS name





Cloud DB

##########


Keep 2 or more data centers

Synchronous replication ( master DB, standby DB, take snapshots from the standby DB )

11 9's ( 99.999999999% ) data durability

4 9's ( 99.99% ) data availability

Increase availability ( distribute data across zones, regions )

Recovery Point Objective ( RPO ) - Acceptable period of data loss

Recovery Time Objective ( RTO ) - Acceptable downtime

Reducing RTO and RPO is key

Hot standby ( master to standby failover to minimize RTO and RPO )

Warm standby ( RPO is 1 min, RTO 15 min, with minimum infrastructure that scales up )

Cold standby ( RPO is 1 min, RTO a few hours, regular data snapshots with transaction logs to Cloud Storage )

Enhance DB performance ( more memory and CPU, costly distributed databases, read replicas for read operations )

Strong consistency - Replicates to all nodes synchronously, impacts performance, good for banking transactions

Eventual consistency - Replicates with a little lag, good for social media


Relational DB for OLTP ( Based on row storage )

-----------------------------------------------

Cloud SQL  - Good for OLTP as it is good for transactions ( atomicity, consistency ), good for a few TB

           - Supports MySQL, PostgreSQL and SQL Server databases

           - Can configure HA with synchronous replication

           - Use Cloud SQL proxy


Cloud Spanner - To store petabyte level DBs


Relational DB for OLAP ( based on column storage, supports high compression )

---------------------------------------------------------------------------------

Cloud Big Query



Non Relations DB 

-----------------

Cloud Firestore - Serverless document DB with consistency, scalability, supports ACID transactions, for a few terabytes



Cloud Bigtable -

Good for more than 10 TB up to petabytes, NoSQL DB, good for apps ingesting more than 30 GB of data per hour, compatible with the HBase API, good for IoT streaming. Automatically shards data, reads and writes among cluster nodes, supports HDD and SSD, can create multiple clusters with replication for high availability.

Cloud Datastore can create multiple indexes but Cloud Bigtable supports only a single index ( the row key ) per table.

In memory DB 

----------------------------

Memorystore ( Caching, sessions management )



Cloud VPC

-----------

Create all resources under VPC.

Can attach global resources. 



VPC Subnet

-----------


Control the access to internal services. Public subnet and private subnets


Shared VPC

----------

Lets multiple projects work together on a common network


VPC Peering

-----------

Connects a VPC to a different VPC network, even one in another organization

All communication happens over internal IPs only


CIDR Blocks

------------

Subnet IP ranges. A CIDR block is a collection of IP addresses that share the same network prefix and prefix length.

192.168.1.0/24 ( Available IPs : 2 pow (32-24) = 256 )
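
The same arithmetic can be checked with Python's standard ipaddress module. This is a small sketch; block_size is a hypothetical helper, and in practice a few addresses in each subnet are reserved, so not every address can be given to a VM.

```python
import ipaddress


def block_size(prefix_length):
    """Number of IPv4 addresses in a block: 2 pow (32 - prefix length)."""
    return 2 ** (32 - prefix_length)


network = ipaddress.ip_network("192.168.1.0/24")
print(block_size(24))                                    # size from the formula
print(network.num_addresses)                             # size from the library
print(network.network_address, network.broadcast_address)
print(ipaddress.ip_address("192.168.1.42") in network)   # inside the block
print(ipaddress.ip_address("192.168.2.1") in network)    # outside the block
```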



Firewalls

---------


Firewall rules are stateful: if ingress traffic is allowed, the response traffic is allowed automatically

Default SSH port 22 allowed

Allow all internal traffic

Default RDP port 3389 allowed

Default ICMP allowed

Implied rule: deny all ingress

Implied rule: allow all egress

Create VM instances with tags and attach the tags to firewall rules

Allow traffic from the load balancer only

Remove the 0.0.0.0/0 source IP range

Allow health checks from the load balancer

Set priority on rules if required ( lower number = higher priority )
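
A simplified sketch of how rule priority works for ingress traffic: rules are checked from the lowest priority number upward, the first match wins, and if nothing matches the implied deny applies. The Rule class, rule names and IP ranges below are made up for illustration; real firewall rules also match on protocols, tags and service accounts.

```python
import ipaddress
from dataclasses import dataclass


@dataclass
class Rule:
    name: str
    priority: int      # 0-65535, lower number wins
    action: str        # "allow" or "deny"
    source_range: str  # CIDR block
    port: int


def evaluate_ingress(rules, source_ip, port):
    """Return the action of the first matching rule, else the implied deny."""
    ip = ipaddress.ip_address(source_ip)
    # Sort by priority; at equal priority a deny rule is checked before an allow rule.
    ordered = sorted(rules, key=lambda r: (r.priority, r.action != "deny"))
    for rule in ordered:
        if rule.port == port and ip in ipaddress.ip_network(rule.source_range):
            return rule.action
    return "deny"  # implied rule: deny all ingress


rules = [
    Rule("allow-from-lb", 1000, "allow", "10.0.0.0/8", 80),  # hypothetical LB range
    Rule("deny-everyone-else", 2000, "deny", "0.0.0.0/0", 80),
]
print(evaluate_ingress(rules, "10.1.2.3", 80))
print(evaluate_ingress(rules, "8.8.8.8", 80))
print(evaluate_ingress(rules, "10.1.2.3", 22))
```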



Cloud Operations

----------------


Cloud Monitoring

----------------

Cloud Monitoring with workspaces

Can monitor AWS accounts as well

Can create alerting policies

Install cloud monitoring agents


Cloud Loggings

--------------

Realtime logging, analyze and store massive data

Log Explorer, Log Dashboard, Log Metrics, Log Router

Audit Logs ( Admin Activity logs, Data Access logs, System Event logs, Policy Denied logs )

Cloud Debugger is deprecated



Cloud Profiler

--------------

Identify performance bottlenecks

Profiling agent

Profile interface

Profile the source code


Cloud Trace

-----------

Trace requests



Error Reporting

---------------

Errors can be sent via Cloud Logging

Errors can be sent by calling the Error Reporting API

Supports realtime error reporting


Stackdriver is now Cloud Operations ( Cloud Monitoring, Error Reporting, Cloud Logging, Cloud Trace )





GCP Resource

------------

Organization -> Folder -> Project -> Resources

Use a consistent naming convention


Billing account

---------------

Mandatory

Can associate with multiple projects

Can have multiple billing accounts in the organization

2 types - self serve, invoiced ( for large organizations )

Can configure budget and alerts


IAM best practices

------------------

Principle of least privilege

Super Admin can do anything

Use Google Workspace to manage users. Link it to GCP

Use GCP with your identity provider

Corporate Directory Federation - integrate an external identity provider with Cloud Identity or Google Workspace

IAM members ( Google accounts, service accounts )


Enable SSO ( users are redirected to the external identity provider )

When users are authenticated, a SAML assertion is sent to GCP

Google Access Control List ( give permanent access to a subset of objects )

Google IAM ( Ex : give permanent access to an entire bucket )

Signed URL ( Ex: give time limited access to an object in a bucket )

Roles and Groups ( Ex: give certain access to the development team )

Google Cloud Directory Sync to sync Active Directory users





VM SSH

-------

2 options to create keys ( created and configured manually, or managed via OS Login )

2 ways to connect ( Use SSH button or use gcloud compute ssh )


Troubleshoot VM Startups

-----------------------

Check quota errors

Boot disk failures

Serial port errors

VMs with local SSDs or in terminated status cannot be moved between zones / regions

Create snapshots and copy persistent disks to the required zones




Pub/Sub ( has topics and subscribers )

--------------------------------------

Asynchronous communication via topics

Can handle up to 1 billion messages per day

Auto scaling

Good for streaming, analytics

Supports both push and pull

Pull messages - subscriber pulls messages when ready

Push messages - subscriber provides a webhook and Pub/Sub sends data to it


Data Flow

---------

Can be used for streaming and batch processing

Use Cloud Dataflow for deduplication of messages

Pub/Sub -> Dataflow -> BigQuery ( streaming )

Pub/Sub -> Dataflow -> Cloud Storage ( files )

Cloud Storage -> Dataflow -> Bigtable / Spanner / Datastore / BigQuery ( batch processing )
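
The deduplication idea can be sketched in plain Python as below. This is not the Dataflow or Apache Beam API, just the underlying technique: remember the IDs already seen and drop repeats. The "id" field name is an assumption for the example.

```python
def deduplicate(messages):
    """Yield each message once, keyed by its "id" field (assumed to be unique per event)."""
    seen = set()
    for message in messages:
        if message["id"] in seen:
            continue  # duplicate delivery, skip it
        seen.add(message["id"])
        yield message


incoming = [
    {"id": "a", "body": "order created"},
    {"id": "b", "body": "order paid"},
    {"id": "a", "body": "order created"},  # redelivered by the publisher
]
for message in deduplicate(incoming):
    print(message["id"], message["body"])
```

A real streaming pipeline would also bound how long IDs are remembered, since an unbounded set grows forever.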



Hybrid Cloud

------------

Cloud VPN ( connects an on-premises network to GCP over the internet, uses an IPSec VPN tunnel, uses Internet Key Exchange )

VPN gateways are regional

HA VPN ( two interfaces / tunnels, supports dynamic routing, needs Cloud Router )

Classic VPN ( single interface / tunnel, supports static routing with manual work )

Ensure a failover mechanism ( Dedicated Interconnect as primary, with VPN or Direct Peering as backup )


Cloud Interconnect

------------------

High speed, low latency private connection between Google services and your private network

Dedicated Interconnect - for high bandwidth

Partner Interconnect - for lower bandwidth



Datawarehouse - BigQuery

------------------------

BigQuery is a relational ( SQL ) data warehouse

Importing and exporting data

Supports streaming data

Can automatically expire data

Can query external data ( Cloud Storage, Cloud SQL, Bigtable, Google Drive ) without storing it

Access data via Cloud Console, bq command, BigQuery REST API

BigQuery queries can be very expensive. Estimate the cost with the pricing calculator and a dry run

Cost is estimated based on the amount of data scanned, so table partitioning and clustering ( grouping related rows ) are important

Make sure to configure expiration for data sets and partitions

Importing a batch of data is free. Streaming is expensive.

Can use Cloud Dataflow and Dataproc to import after processing

Can use federated queries to read external data

BigQuery is very fast for very complex queries


Cloud Dataproc

--------------

It is a data analytics service

Managed Spark and Hadoop service


Data Life cycles

----------------

Ingest -> Store -> Process and Analyze -> Explore and Visualize


Cloud Data Loss Prevention

--------------------------

Control sensitive data


Ingest

------

Streaming with Pub/Sub

Batch by Storage Transfer Service, BigQuery Transfer Service, Transfer Appliance, gsutil

DB migrations


Store

-----

Cloud Storage -> Unstructured object store

Cloud SQL -> Managed MySQL and other relational DBs

Cloud Spanner -> Horizontally scalable relational DB. Good for transactions, can scale globally.

Cloud Firestore -> NoSQL DB

Cloud Bigtable -> NoSQL DB for massive data operations

Cloud BigQuery -> Data analytics


Process and Analysis

--------------------

Dataprep -> clean and transform for ML work

Dataflow -> ETL pipeline

Dataproc -> Complex processing using Spark and Hadoop


Explore and Visualize

---------------------

Cloud BigQuery

ML Prebuilt ( Vision API, Speech to Text, NLP API, Video Intelligence API )

ML Custom Built ( based on TensorFlow )

Cloud Datalab -> Web based explorer, can run Jupyter notebooks

Cloud Data Studio -> Dashboards


Cloud IOT

---------

Cloud IoT Core ( registration, authentication, authorization )

Use Pub/Sub


Data Lake

---------

Since Big Data solutions are very complex, a data lake is a centralized repository designed to store, process, and secure large amounts of structured, semi-structured, and unstructured data

Provides flexibility and cost effectiveness



Caching Usecases

1. Backend data is not changing frequently

2. User sessions

Memorystore

============

Fully managed in memory data store

Can setup monitoring

Supports both Redis and Memcached

Memcached for pure caching ( reference data, DB queries, session stores )

Redis for low latency, high availability and persistence ( good for games )

Can be accessed from Google Cloud services such as App Engine, Google Kubernetes Engine and Cloud Functions

Make sure to create a set of nodes in case of failures



Memcache in App Engine

======================

Legacy in-memory cache. Good for DB query cache, session data cache, user preferences

No persistence

2 levels of cache providers ( free shared memcache, dedicated memcache )

Use a hash key to check cache availability



Cloud CDN

==========

Serve global content with low latency

Google uses edge servers with HTTPS load balancing via proxy

The first request loads content from the backend

TTL to configure cache expiry
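
The "first load from backend, then serve from cache until the TTL expires" behaviour can be sketched with a tiny in-memory cache. This is a generic illustration, not how Cloud CDN is implemented; the class and parameter names are hypothetical.

```python
import time


class TTLCache:
    """Minimal cache: entries expire ttl_seconds after they were loaded."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self.entries = {}  # key -> (value, expiry time)

    def get(self, key, load_from_backend):
        entry = self.entries.get(key)
        now = self.clock()
        if entry is not None and now < entry[1]:
            return entry[0]                 # cache hit
        value = load_from_backend(key)      # miss or expired: go to the backend
        self.entries[key] = (value, now + self.ttl)
        return value


def origin(path):
    print("backend hit for", path)
    return "content of " + path


cache = TTLCache(ttl_seconds=3600)
cache.get("/logo.png", origin)  # first request goes to the backend
cache.get("/logo.png", origin)  # second request is served from the cache
```

A short TTL keeps content fresh at the cost of more backend traffic; a long TTL does the opposite.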



Process Evolution

-----------------

Agile -> DevOps -> Site Reliability Engineering


Agile ( many integrations, focus on software rather than documents, customer involvement, respond to change )

Cloud Source Repositories ( fully featured private Git repository )

Container Registry

Jenkins

Cloud Build to build jars, Docker images

Spinnaker - multicloud continuous delivery


Infrastructure as a code

--------------------------

Create infrastructure using code. Reduces mistakes when creating similar environments, ex: for QA

Use Terraform

Use Google Cloud Deployment Manager ( create VPC, subnets, load balancers using a script )


Configuration management

------------------------

Install the right software and tools

Use Chef, Puppet, Ansible



Cloud Marketplace ( Cloud launcher )

------------------------------------

Central repo of deployable apps. Ex: WordPress


Site Reliability Engineering

----------------------------------

Availability, Latency, Performance, Efficiency, Change Management, Monitoring, Emergency Response, Capacity Planning

Service level objectives ( internal objectives ). Ex: 99.999% availability, 99.99999% durability

Service level agreements ( external objectives for customers ) - signed contract
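
An availability target translates directly into a downtime budget. The sketch below does that arithmetic for a 365 day year; the function name is made up for illustration.

```python
def allowed_downtime_minutes(availability_percent, days=365):
    """Minutes of downtime allowed over the period for a given availability target."""
    return (1 - availability_percent / 100) * days * 24 * 60


for target in (99.9, 99.99, 99.999):
    minutes = allowed_downtime_minutes(target)
    print(f"{target}% availability allows about {minutes:.2f} minutes of downtime per year")
```

Each extra 9 cuts the budget by a factor of ten, which is why five 9's usually needs multi-region redundancy and automated failover.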


Site Reliability Engineering Best Practices

----------------------------------------------------

Handle excess load by load shedding ( API limits for different customer bases )

Avoid cascading failures ( circuit breakers )

Penetration testing

Load testing

Resilience testing ( Stress testing with internal failures )

Enable test automation

Frequent and small feature releases

Reduce cost of failures
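
A minimal circuit breaker sketch, to show how cascading failures are avoided: after a few consecutive failures the breaker opens and calls are rejected immediately, and after a cool-down one trial call is allowed through. The class, thresholds and error message are assumptions for illustration, not a specific library.

```python
import time


class CircuitBreaker:
    def __init__(self, max_failures=3, reset_seconds=30, clock=time.monotonic):
        self.max_failures = max_failures
        self.reset_seconds = reset_seconds
        self.clock = clock
        self.failures = 0
        self.opened_at = None  # None means the circuit is closed

    def call(self, func, *args):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_seconds:
                raise RuntimeError("circuit open: call skipped")
            # Cool-down is over: half-open, one more failure reopens the circuit.
            self.opened_at = None
            self.failures = self.max_failures - 1
        try:
            result = func(*args)
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = self.clock()
            raise
        self.failures = 0  # a success closes the circuit fully
        return result


breaker = CircuitBreaker(max_failures=3, reset_seconds=30)
print(breaker.call(lambda: "backend response"))
```

Failing fast while the circuit is open gives the struggling backend time to recover instead of piling more requests onto it.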


Release Management in Google Cloud

Some deployment needs 0 downtime

Only one version live at a given time

Expose one instance to production traffic before going live

Small incremental changes

Automate release management


Deployment Approaches 

Recreate - Down V1 and Deploy V2

Canary - Deploy few instances of V2, test and deploy rest of V2 instances, Make sure data are backword compatible in case of failrues

A/B testing - Check users are like new feature or not

Rolling - First deploy 5% of instances, increasing gradually in given time window

Rolling with Batches - First deploy new instance v2, then rolling gradulally 

Blue Green - Create parellel envirnemnt V2. Once testing done, move traffic to V2, no downtime. This enable shadow testing

Kubernetes : rolling update is the default deployment strategy, controlled by maxSurge and maxUnavailable. No downtime, unlike the Recreate approach

Managed Instance Group - gcloud commands for rolling releases. Can specify rollout percentages
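
The percentage-based rolling approach can be illustrated with a small planning helper. This is a hedged sketch: `plan_rollout` and its step values are hypothetical and do not correspond to any gcloud or Kubernetes API. It only shows how rollout percentages translate into instance counts.

```python
def plan_rollout(total_instances: int, percent_steps: list[int]) -> list[int]:
    """Return how many instances should run V2 after each rollout step."""
    if total_instances <= 0:
        raise ValueError("total_instances must be positive")
    plan = []
    previous = 0
    for pct in percent_steps:
        if not previous < pct <= 100:
            raise ValueError("percent steps must increase and stay within 1..100")
        # Ceiling division, so even a small percentage updates at least one instance.
        plan.append(-(-total_instances * pct // 100))
        previous = pct
    return plan


if __name__ == "__main__":
    # Hypothetical rollout: 5% first, then widen gradually to all 20 instances.
    print(plan_rollout(20, [5, 25, 50, 100]))  # [1, 5, 10, 20]
```

Between steps, a real rollout would pause to watch error rates and latency before widening, and roll back if V2 misbehaves.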

App Engine Commands

Deploy V2 and shift all traffic to it : gcloud app deploy

Deploy without shifting traffic : gcloud app deploy --no-promote

Shift all traffic to V2 at once : gcloud app services set-traffic s1 --splits V2=1

A/B Testing : gcloud app services set-traffic s1 --splits V2=0.5,V1=0.5
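
To show what a weighted split such as V2=0.5,V1=0.5 means in practice, the sketch below assigns each user to a version by hashing a user identifier. This is an illustration of the general idea only and is not how App Engine implements splitting internally. The function name and the hashing scheme are assumptions.

```python
import hashlib


def pick_version(user_id: str, splits: dict[str, float]) -> str:
    """Deterministically map a user to a version according to split weights."""
    if not splits:
        raise ValueError("splits must not be empty")
    total = sum(splits.values())
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    # Turn the first 8 bytes of the hash into a point in [0, total).
    point = int.from_bytes(digest[:8], "big") / 2**64 * total
    cumulative = 0.0
    for version, weight in splits.items():
        cumulative += weight
        if point < cumulative:
            return version
    return version  # guard against floating point rounding at the upper edge


if __name__ == "__main__":
    splits = {"V2": 0.5, "V1": 0.5}
    for user in ("alice", "bob", "carol"):
        # The same user always lands on the same version.
        print(user, pick_version(user, splits))
```

Keeping the assignment deterministic matters for A/B testing, because a user who bounces between versions would blur the comparison.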


Compliance and Regulations

ISO 27001 certified for information security

ISO 27017 certified for cloud services

ISO 27018 certified for cloud privacy for PII data

ISO 27701 certified for global privacy

PCI DSS certified (Payment Card Industry Data Security Standard). Ex: payment solutions

-- Allow outbound traffic to external payment systems. Only Compute Engine and GKE are recommended, as App Engine and Cloud Functions do not support egress firewall rules

-- Use an HTTPS load balancer with signed certificates

-- Review access rules and resource inventory over time

SOC 1 and SOC 2 - Audit standards

COPPA for children's privacy - Ex : children's web sites

HIPAA for health information privacy (a US regulation). Ex: health solutions for the USA


Cloud Migrations

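Rehosting - Simply take the application from the data center and deploy it to the cloud (lift and shift)

Replatforming - Make a few adjustments, like containerizing

Repurchasing - Replace the application with a new cloud native product

Refactoring - Re-architect the application to be cloud native; the most costly approach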


Cloud Architecture Decisions

Reduce cost (license cost, computing cost, storage cost, ingress and egress data cost, personnel cost, penalties)

Use managed services to reduce cost (auto scaling and availability are handled for you)

When a project starts, identify business requirements and define KPIs

Define Technical Requirements 

Functional Requirements

Private networks, flexible schema, large-volume data stores, container orchestration

Non Functional Requirements

Availability ( geographical distribution, managed instance groups with load balancing, multi-master clusters using regional clusters, cluster auto scaling, enable live migration for Compute Engine )

Use managed services ( App Engine, Cloud Functions, Cloud Storage, Cloud Filestore, Cloud Datastore, BigQuery, live resizing with persistent disks, maintain Bigtable clusters, multi-region for Cloud Datastore, use HA for Cloud SQL, use the Premium network tier, use hybrid networking for availability )

Scalability ( highly scalable compute instance groups; use pod and cluster autoscaling; make sure to use resources that can scale fast; Cloud SQL does not support horizontal scaling; persistent disks can be scaled horizontally and vertically; Cloud Storage, App Engine, Cloud Functions, Pub/Sub, BigQuery, and Cloud Datastore are auto scaling serverless services; Bigtable, Cloud Spanner, Cloud SQL, and Dataproc are not serverless )

Security - CIA ( Confidentiality : only the right people have the right access. Integrity : IAM best practices, encryption, hash verification, digital signatures. Availability : available at the right time to the right people through firewalls, redundancy, failovers; protect from DDoS attacks, use private networks with firewall rules, use private IPs )

 

Cloud KMS - Can generate private and public keys and digital signatures to verify integrity, for example of log files and build files

Cloud Armor - Protection against OWASP Top 10 vulnerabilities and DDoS attacks

Secret Manager - Manages DB passwords and API key secrets, so passwords are not kept in config files; can audit and rotate passwords
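
The hash verification mentioned under Integrity can be sketched with the Python standard library. Note the assumption: this uses a plain SHA-256 checksum as a simpler stand-in, not Cloud KMS and not a digital signature, and the helper names are made up for the example.

```python
import hashlib
import hmac


def sha256_of(data: bytes) -> str:
    """Return the hex SHA-256 digest of the given bytes."""
    return hashlib.sha256(data).hexdigest()


def verify(data: bytes, expected_hex: str) -> bool:
    """Check that data still matches a previously recorded digest."""
    # compare_digest avoids leaking information through timing differences.
    return hmac.compare_digest(sha256_of(data), expected_hex.lower())


if __name__ == "__main__":
    build_artifact = b"contents of a build file"  # hypothetical payload
    recorded = sha256_of(build_artifact)          # stored when the build was produced
    print(verify(build_artifact, recorded))               # True
    print(verify(build_artifact + b" tampered", recorded))  # False
```

A checksum alone only detects changes. A digital signature, such as one produced with an asymmetric key, additionally proves who produced the file.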

Change Management

Business Continuity Planning - Disaster recovery

Incident Management - Alerts

Data Management - Storage scaling, Archive ?, Data size ?


Other Cloud Services

Cloud Scheduler - Fully managed scheduler for batch and big data jobs, Pub/Sub messages, and HTTP calls. Requires creating an App Engine app

Cloud emulators - Develop GCP apps on a local machine without connecting to GCP. Can emulate Cloud Bigtable, Datastore, Firestore, Pub/Sub, and Spanner

Cloud DNS ( Global domain name system ) - Set up a website with a domain name. Obtain the domain name ( abc.com ) from a domain registrar, host the website, then route requests for abc.com to the website using an abc.com to IP address mapping. Also supports email routing. Provides public and private zones: public DNS zones are exposed to the internet, while private managed zones are accessible from private subnets. Once a DNS zone is created, we can add record sets that manage the mapping from domain names to IP addresses. A DNS zone is a container of records
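
The idea that a DNS zone is a container of record sets can be modeled with a toy data structure. This is purely illustrative and is not the Cloud DNS API. The `Zone` class, its methods, and the IP address (taken from the documentation-only range 203.0.113.0/24) are assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class Zone:
    """Toy model of a DNS zone: a named container of record sets."""
    dns_name: str
    private: bool = False  # private zones are only visible inside private networks
    records: dict = field(default_factory=dict)

    def add_record_set(self, name: str, rtype: str, values: list) -> None:
        # A record set is keyed by (name, type), e.g. ("abc.com.", "A").
        self.records[(name.lower(), rtype.upper())] = list(values)

    def resolve(self, name: str, rtype: str = "A") -> list:
        # Return the values for the record set, or an empty list if none exists.
        return self.records.get((name.lower(), rtype.upper()), [])


if __name__ == "__main__":
    zone = Zone(dns_name="abc.com.")
    zone.add_record_set("abc.com.", "A", ["203.0.113.10"])          # website
    zone.add_record_set("abc.com.", "MX", ["10 mail.abc.com."])     # email routing
    print(zone.resolve("abc.com."))        # ['203.0.113.10']
    print(zone.resolve("abc.com.", "MX"))  # ['10 mail.abc.com.']
```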

Cloud Pricing Calculator - Estimates the cost of a solution

Anthos - Run workloads across Google Cloud, other clouds, and on-prem systems; has centralized config management and provides multi-cluster management

Machine Learning - Provides prebuilt ML APIs, so there is no need to be an ML expert; Cloud AutoML to build custom ML models; AI Platform for data scientists; BigQuery ML

Apigee API management - API authentication and authorization, rate limiting, API monitoring, caching, scaling; integrates legacy services with new services

Identity Platform - Customer identity and access management. Authentication and authorization for web and mobile apps. Provides multiple authentication options. Cloud IAM is different: it deals with users, roles, etc.

Cloud events - Microservices react to changes in event state; loosely coupled, flexible orchestration, resilient, asynchronous

Eventarc - Simplifies event triggering in GCP, aligned with the CloudEvents specification ( cloudevents.io )
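
For a feel of what an event in the CloudEvents format looks like, the sketch below builds a JSON envelope with the attributes the 1.0 specification requires (specversion, id, source, type) plus a few optional ones. The event type, source, and payload are hypothetical examples, and the helper is not part of Eventarc or any SDK.

```python
import json
import uuid
from datetime import datetime, timezone


def make_cloud_event(event_type: str, source: str, data: dict) -> dict:
    """Build a CloudEvents 1.0 style envelope as a plain dictionary."""
    return {
        "specversion": "1.0",                            # required
        "id": str(uuid.uuid4()),                         # required, unique per event
        "source": source,                                # required, identifies the producer
        "type": event_type,                              # required, describes what happened
        "time": datetime.now(timezone.utc).isoformat(),  # optional timestamp
        "datacontenttype": "application/json",           # optional
        "data": data,                                    # optional payload
    }


if __name__ == "__main__":
    # Hypothetical event emitted by an order service.
    event = make_cloud_event("com.example.order.created", "/orders/service", {"order_id": 42})
    print(json.dumps(event, indent=2))
```

Because every producer uses the same envelope, consumers can route on `type` and `source` without knowing anything else about the sender, which keeps services loosely coupled.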

Observability and Telemetry - Measure the internal state of a system from its outputs: logs, metrics, traces

Service Directory - Helps microservices find other services