Thursday, June 1, 2017

DynamoDB

DynamoDB: 

  • DynamoDB is a fast and flexible NoSQL database service for all applications that need consistent, single-digit-millisecond latency at any scale. It is a fully managed database and supports both document and key-value data models. Its flexible data model and reliable performance make it a great fit for mobile, web, gaming, ad tech, IoT and many other applications.
  • Stored on SSD storage.
  • Spread across three geographically distinct data centers.
  • If any data is written to one of the tables in DynamoDB, the data is then written to the other two locations. There are two different data consistency models:
  1. Eventually Consistent Reads (Default):
    Consistency across all copies of data is usually reached within a second. Repeating a read after a short period of time should return the updated data. (Best read performance.)
  2. Strongly Consistent Reads:
    A strongly consistent read returns a result that reflects all writes that received a successful response prior to the read.

The Basics of DynamoDB:

  • Tables
  • Items (think of a row of data in a table)
  • Attributes (think of a column of data in a table)
  • DynamoDB allows nested attributes up to 32 levels deep.

Pricing :

  • Provisioned throughput capacity:
    • Write throughput: $0.0065 per hour for every 10 units.
    • Read throughput: $0.0065 per hour for every 50 units.
  • Storage:
    • First 25GB stored per month is free.
    • $0.25 per GB per month after that.

    • For example, our application needs 1M writes and 1M reads per day.
      -- 1,000,000/(24 hrs)/(60 mins)/(60 secs) --> 11.6 writes per second
      One write capacity unit handles 1 write per second, so we'll need 12 write capacity units throughout the day.
      Per-unit write cost is $0.0065/10 = $0.00065 per hour --> per day: $0.00065 × 12 × 24 hrs = $0.1872
      Per-unit read cost is $0.0065/50 = $0.00013 per hour --> per day: $0.00013 × 12 × 24 hrs ≈ $0.0374
    • If the total storage for our application is 30GB, since the first 25GB is free, we pay for the extra 5GB: 5 × $0.25 = $1.25 per month.
    • The monthly total is $1.25 + ($0.1872 + $0.0374) × 30 ≈ $7.99.
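The arithmetic above can be sketched in a few lines of Python. The unit prices are the 2017 figures quoted in these notes; `math.ceil` rounds the per-second request rate up to whole capacity units.

```python
import math

# 2017 prices quoted in these notes: $0.0065/hr per 10 write units, per 50 read units
WRITE_PRICE_PER_UNIT_HOUR = 0.0065 / 10   # $0.00065
READ_PRICE_PER_UNIT_HOUR = 0.0065 / 50    # $0.00013
STORAGE_PRICE_PER_GB = 0.25               # per month, after the first 25 GB free

def capacity_units(requests_per_day):
    """One capacity unit handles 1 request/second; round up to whole units."""
    return math.ceil(requests_per_day / (24 * 60 * 60))

def monthly_cost(writes_per_day, reads_per_day, storage_gb, days=30):
    """Throughput cost per day times days in the month, plus storage overage."""
    write_per_day = capacity_units(writes_per_day) * WRITE_PRICE_PER_UNIT_HOUR * 24
    read_per_day = capacity_units(reads_per_day) * READ_PRICE_PER_UNIT_HOUR * 24
    storage = max(0, storage_gb - 25) * STORAGE_PRICE_PER_GB
    return storage + (write_per_day + read_per_day) * days

print(capacity_units(1_000_000))                          # 12 units for 1M requests/day
print(round(monthly_cost(1_000_000, 1_000_000, 30), 4))   # 7.9892
```

This reproduces the worked example: 12 write and 12 read units, $0.1872 + $0.0374 per day, plus $1.25 for the 5GB of storage over the free tier.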

Wednesday, May 31, 2017

Amazon S3

S3: Simple Storage Service
    S3 provides developers and IT teams with secure, durable, highly scalable object storage. S3 is easy to use, with a simple web services interface to store and retrieve any amount of information from anywhere on the web.
    It is object-based storage (flat files).
    It is spread across multiple devices and facilities.
    Files can be from 0 bytes to 5TB.
    There is unlimited storage.
    Files are stored in buckets.
    S3 uses a universal namespace, i.e., bucket names must be unique throughout the world.
    The S3 URL is something like
     https://s3-region-name.amazonaws.com/name_of_the_bucket
    When you upload a file to S3 you always get an HTTP 200 status code if the upload was successful.
   
    Data Consistency Model for S3:
    If we write a new object to S3 it is available immediately, but if we overwrite or delete an existing object, it can take some time for the change to propagate.
    The reason behind this is that AWS never lets you see corrupt data whatsoever.
    Read-after-write consistency for PUTs of new objects -- can read immediately after writing.
    Eventual consistency for overwrite PUTs and DELETEs (can take some time to propagate)
    -- updates and deletes can take some time.
    S3 is a simple key-value store.
    S3 is object based. Objects consist of the following:
  •         Key(This is simply the name of the object)
  •         Value(This is simply the data and is made up of a sequence of bytes)
  •         Version ID(Important for versioning)
  •         Metadata(Data about the data that is being stored)
        Subresources:
            Access Control Lists.
            Torrent.
  1.     Built for 99.99% availability for the S3 platform.
  2.     Amazon guarantees 99.9% availability in the SLA.
  3.     Amazon guarantees 99.999999999% durability for S3 information.
        (eleven 9s)
  4.     Tiered storage available.
  5.     Lifecycle Management.
  6.     Versioning.
  7.     Encryption.
  8.     Secure the data in a couple of different ways using Access Control Lists and Bucket Policies.
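As a sketch of the bucket-policy approach mentioned above, a policy that grants public read access to every object in a bucket looks like the following (the bucket name `example-bucket` is a placeholder):

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "PublicReadGetObject",
      "Effect": "Allow",
      "Principal": "*",
      "Action": "s3:GetObject",
      "Resource": "arn:aws:s3:::example-bucket/*"
    }
  ]
}
```

Bucket policies apply to the whole bucket (or a prefix within it), while ACLs can be set per object.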
   
    S3 Storage Classes/Tiers:
  •     S3 Standard: 99.99% availability, eleven 9s durability, stored across multiple devices in multiple facilities, and designed to sustain the loss of 2 facilities concurrently.
  •     S3-IA (Infrequently Accessed): for data that is accessed less frequently but requires rapid access when needed. Lower fee than S3, but charged for retrieval.
  •     Reduced Redundancy Storage: 99.99% durability and 99.99% availability of objects over a given year.
  •     Glacier: very cheap, but used for archival only. It takes 3-5 hours to restore from Glacier.
        Glacier:
  •         Extremely low cost.
  •         Used only for data archival.
  •         $0.01 per GB per month.
  •         Retrieval times of 3-5 hours.
       
    S3- Charges:
  •         Storage
  •         Requests
  •         Storage Management Pricing
  •         Data Transfer Pricing
  •         Transfer Acceleration (fast, easy and secure transfer of files over long distances between end users and an S3 bucket. It takes advantage of CloudFront's globally distributed edge locations; as the data arrives at an edge location, it is routed to Amazon S3 over an optimized network path)
       
    S3 FAQs IMP



S3 falls under the storage services of AWS.
It is object-based storage and can be used to store files, docs, multimedia, audio files, video files etc., but not to install software or operating systems.

As soon as we login to the console we can see S3 under storage.
The first step under the S3 is we need to create a bucket.

Under the bucket created we have some more options available under the properties:

Versioning : 

To see the different versions of the same object in the bucket.

Static Web Hosting :

Allows you to host a static website without any server-side technologies (plain HTML).
This has tremendous advantages.
No worries about load balancing, auto scaling, virtual machines.
It scales automatically and extremely low cost.

Logging :

We can set up log reports.

Under the Advanced Settings:

Tags :

Using tags for cost controls.

Cross Region Replication :

Used for replicating the objects in different regions.

Events :

Under events we can set up notifications for when specific events occur in your bucket.
For example: someone uploads a file, and we want to invoke a Lambda function that converts it into a thumbnail and saves the output in another bucket.
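The thumbnail scenario above can be wired up with a notification configuration on the bucket, roughly shaped like the following (the Lambda ARN is a made-up placeholder; the Lambda function must also grant S3 permission to invoke it):

```json
{
  "LambdaFunctionConfigurations": [
    {
      "Id": "thumbnail-on-upload",
      "LambdaFunctionArn": "arn:aws:lambda:us-east-1:123456789012:function:CreateThumbnail",
      "Events": ["s3:ObjectCreated:*"]
    }
  ]
}
```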

Under the Lifecycle :

We can configure rules such as: if an object has not been used for a certain number of days, move it to the less frequently used tier; if it is older than 120 days, move it to Glacier; and so on.
This specifies transitions between the different storage tiers.

Under Permissions :

ACL (Access Control Lists) :

We can specify the access control lists. By default all the buckets created are private/inaccessible.

Under Management :

Analytics : 

We can do analytics for the different storage classes.

Metrics and Inventory:

Uploading some data in S3:

The first time we upload some data to S3, no public permissions are set on it (it is private, so no one else can access it).
The URL for the object will be something like this:
https://s3.amazonaws.com/bucketName/objectName

As soon as you upload an object and try to access it, there will be an error message saying access is denied.
If we want the object to be accessed by someone, we need to grant access to a specific email ID, to everyone, or to an AWS-verified user.

Under the object permissions there are two things:
1. Giving the user permission to read/write the object.
2. Giving the user permission to read/write the object's permissions (the authority to actually authorize other users).

Once versioning is enabled on a bucket, we cannot remove it; we can only suspend it.
To check the versions of an object, we can view that on the drop down beside the file.

From an architecture point of view, we should not enable versioning on large files without planning to archive old versions, because each version is stored at its full original size.

When we delete a file in the bucket it disappears from the listing, but versioning is essentially hiding the file rather than actually deleting it. To restore the file after a delete, go to 'show versions' and delete the 'Delete Marker', which removes the invisible state set on the object.
It is a great back up tool.

An additional layer of security can be added to S3 buckets by enabling MFA Delete, which helps avoid accidental deletion of S3 objects.

Creating a static website using amazon S3:
  • Create a bucket in S3, upload objects into the bucket. 
  • Under the website hosting section of the bucket we can enable website with an index.html and error.html as the landing and error pages for our static website.
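The website-hosting settings described above correspond to a configuration of roughly this shape (as used by the S3 PutBucketWebsite API):

```json
{
  "IndexDocument": { "Suffix": "index.html" },
  "ErrorDocument": { "Key": "error.html" }
}
```

Once enabled, the site is served from a region-specific website endpoint rather than the normal REST endpoint.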
Cross Origin Resource Sharing (CORS):
  •  This is a way of allowing code in one S3 bucket to be referenced by JavaScript served from another S3 bucket.
  •  This allows the buckets to talk to each other.
  • Under the CORS section, simply specify the URL of the bucket that should be allowed to access this one.
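A minimal CORS configuration for the bucket being accessed might look like this (the origin URL is a placeholder for the bucket hosting the JavaScript):

```xml
<CORSConfiguration>
  <CORSRule>
    <AllowedOrigin>http://example-site.s3-website-us-east-1.amazonaws.com</AllowedOrigin>
    <AllowedMethod>GET</AllowedMethod>
    <MaxAgeSeconds>3000</MaxAgeSeconds>
    <AllowedHeader>*</AllowedHeader>
  </CORSRule>
</CORSConfiguration>
```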
Polly:
  • Polly is a text-to-speech service.
S3 - Versioning Lab
  •  Create a bucket and while creating it enable versioning.
  •  Versioning actually stores all the objects/versions separately.
  •  Once Versioning is enabled it cannot be removed but only be disabled. 
  •  If versioning has to be removed, we have to create a new bucket and transfer the objects to that bucket.
  •  If we delete a version we cannot restore it back, but if we delete an object itself we can restore it back.
Cross Region Replication 
  •  Versioning must be enabled on both the source and the destination buckets.
  • The source and destination buckets must be in different regions.
  • Files already in an existing bucket are not replicated automatically; all subsequently updated files will be replicated automatically.
  • You cannot replicate to multiple buckets or use daisy chaining.
  • Delete markers are replicated.
  • Deleting individual versions or delete markers is not replicated.
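A cross-region replication configuration has roughly this shape (the role and bucket ARNs are placeholders; the IAM role must allow S3 to replicate on your behalf):

```json
{
  "Role": "arn:aws:iam::123456789012:role/s3-replication-role",
  "Rules": [
    {
      "ID": "replicate-everything",
      "Status": "Enabled",
      "Prefix": "",
      "Destination": { "Bucket": "arn:aws:s3:::destination-bucket" }
    }
  ]
}
```

An empty prefix replicates every new object in the bucket; a non-empty prefix restricts replication to that key prefix.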

Lifecycle Management, IA S3 and Glacier - Hands On

Lifecycle Management:

This helps maintain the lifecycle of an object: under the Management section, we write a rule for the object by adding a lifecycle rule.

Lifecycle rules: these rules help manage lifecycle cost by transitioning objects from S3 into IA S3 after a certain time and then to Glacier, archiving the least frequently used files and thereby reducing storage cost drastically.

Under the life cycle transition rules we can configure either on the current version or on the previous versions.
Object Created -- 30days later --> Transitioned to IA -- 30days later --> Transitioned to Glacier
-- 425 days later --> Expires
Glacier is designed to store an object for a minimum of 90 days. If we want to expire an object before it has spent 90 days in Glacier, we must give an extra authorization confirming we want to delete the object even though we are still charged for the full 90 days (which logically doesn't make sense).

Minimums: 30 days after creation before transition to IA S3, 60 days before Glacier, and 61 days before expiration.
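The transitions described above can be expressed as a lifecycle configuration roughly like this (the rule ID is a placeholder; `Days` values are measured from object creation):

```json
{
  "Rules": [
    {
      "ID": "tier-then-expire",
      "Status": "Enabled",
      "Prefix": "",
      "Transitions": [
        { "Days": 30, "StorageClass": "STANDARD_IA" },
        { "Days": 60, "StorageClass": "GLACIER" }
      ],
      "Expiration": { "Days": 425 }
    }
  ]
}
```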

 

Wednesday, May 24, 2017

AWS Cloud Formation

CloudFormation allows us to quickly and easily deploy our infrastructure resources and applications on AWS.
We can either 'Create New Stack' (if we do not have any stacks) or 'Launch CloudFormer' if a stack is already available.
The use of services such as CloudFormation, Elastic Beanstalk and Auto Scaling is free; we actually pay for the resources these services create.

To retrieve an attribute of a resource, we use the intrinsic function Fn::GetAtt; using it for a value in the Outputs section returns the specified attribute of that resource.
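As a sketch, an Outputs entry using Fn::GetAtt looks like this (the resource name and AMI ID are placeholders):

```json
{
  "Resources": {
    "MyInstance": {
      "Type": "AWS::EC2::Instance",
      "Properties": { "ImageId": "ami-12345678", "InstanceType": "t2.micro" }
    }
  },
  "Outputs": {
    "InstancePublicDns": {
      "Description": "Public DNS name of the instance",
      "Value": { "Fn::GetAtt": ["MyInstance", "PublicDnsName"] }
    }
  }
}
```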
If there is an error in the script CloudFormation will automatically roll back all the resources that were created.
Rollback is enabled by default.


Amazon SWF Service: Simple WorkFlow Service

  • SWF is a web service that makes it easy to coordinate work across distributed application components.
  •     SWF enables applications for a wide range of use cases -- including media processing, web application back ends, business process workflows, and analytics pipelines -- to be designed as a coordination of tasks.
  •     Tasks represent invocations of various processing steps in an application, which can be performed by executable code, web service calls, human actions, and scripts.
   
    SWF Workers & Deciders:
  •     The workers and deciders can run on cloud infrastructure, such as Amazon EC2, or on machines behind firewalls. SWF brokers the interaction between the workers and deciders.
    It allows the deciders to get consistent views into the progress of tasks and to initiate new tasks in an ongoing manner.
  •     At the same time, Amazon SWF stores tasks, assigns them to workers when they are ready, and monitors their progress. It ensures that a task is assigned only once and is never duplicated. Since SWF maintains the application's state durably, workers and deciders don't have to keep track of execution state. They can run independently and scale quickly.
   
    SQS vs. SWF -- an SWF task is assigned only once, whereas SQS can deliver duplicate messages.
   
    SWF Domain:
    The workflow and activity types and the workflow execution itself are all scoped to a domain. Domains isolate a set of types, executions and task lists from others within the same account.
   
    We can register a domain using amazon console or by RegisterDomain action in the Amazon SWF API.
   
    SQS - maximum visibility timeout of 12 hours; SWF - a workflow execution can last up to 1 year. The value is always measured in seconds.
   
    SWF -- task-oriented API; each task is assigned only once and never duplicated; keeps track of all the events and tasks; supports human interaction.
    SQS -- message-oriented API; duplicate messages need to be handled; application-level tracking; no human interaction.
   

SNS : Simple Notification Service


It is a web service that makes it easy to set up, operate and send notifications from the cloud.
It provides developers with a highly scalable, flexible and cost effective capability to publish messages from an application and immediately deliver them to subscribers or other applications.


SNS follows the publish-subscribe messaging paradigm, with notifications being delivered to clients using a "push" mechanism that eliminates the need to periodically check or poll for new information and updates.

With simple APIs requiring minimal up-front development effort and no maintenance, SNS gives developers an easy mechanism to incorporate a powerful notification system into their applications.
 SNS -- PUSH ; SQS -- PULL/POLL;

 SNS can deliver notifications by SMS text message or email, to SQS queues, or to an HTTP endpoint.
 To prevent messages from being lost, SNS messages are stored across multiple availability zones.
 SNS allows you to group multiple recipients using topics. A topic is an access point that allows recipients to dynamically subscribe to identical copies of the same notification.

 One Topic -- Multiple end points
        -- IOS devices
        -- Android devices
        -- SMS recipients etc.,
        When we publish once to a topic SNS will deliver appropriately formatted copies of your message to each subscriber.
       
 SNS Benefits:
    -- Instantaneous, push-based delivery (no polling)
    -- Simple APIs and easy integration with applications
    -- Flexible message delivery over multiple transport protocols.
    -- Inexpensive, pay-as-you-go model with no upfront costs.
    -- Web-based AWS Management Console offers simplicity of a point and click interface.
   
 SNS Vs. SQS
  -- Both messaging services.
  -- SNS -- Push
  -- SQS -- Poll/Pulls
 
 SNS Pricing
  -- $0.50 per 1 million SNS requests.
  -- $0.06 per 100,000 notification deliveries over HTTP.
  -- $0.75 per 100 notification deliveries over SMS.
  -- $2.00 per 100,000 notification deliveries over email.

SQS - Simple Queue Service

   
  •     It is a web service that gives you access to a message queue that can be used to store messages while they wait for a computer to process them.
  •     It is a distributed queue system that enables web service applications to quickly and reliably queue messages that one component of the application generates to be consumed by another component. A queue is a temporary repository for messages that are awaiting processing.
  •     SQS decouples the components of an application so they can run independently, easing message management between components. Any component of a distributed application can store messages in a fail-safe queue.
  •     Messages can contain up to 256KB of text in any format. Any component can later retrieve messages programmatically using the Amazon SQS API.
   
    Issue SQS resolves: the producer is producing work faster than the consumer can process it, or the producer and consumer are only intermittently connected to the network.
   
    SQS ensures delivery of each message at least once, and supports multiple readers and writers interacting with the same queue.
   
    A single queue can be used simultaneously by many distributed application components, with no need for the components to coordinate with each other to share the queue.
   
    It is engineered to be always available and deliver messages.
   
    SQS trade-off: SQS doesn't guarantee first-in-first-out delivery of messages. For many distributed applications each message can stand on its own; as long as all messages are delivered, the order is not important.

    If the system requires order to be preserved, we can place sequencing information in each message, so that we can reorder the messages when the queue returns them.
   
    The visibility timeout clock starts only when an application server has picked up the message. If the server goes offline, the visibility timeout expires and the message becomes available again for another application server.
   
    Only when the message is deleted from the queue is its processing considered complete.
   
    If the number of queued messages approaches a threshold, more application servers can be spun up to process them. SQS with Auto Scaling is VERY POWERFUL.
   
    Exam Q's :
  1.     No first in, first out.
  2.     12 hours maximum visibility timeout.
  3.     SQS is engineered to deliver each message at least once. As developers, we need to make sure that receiving the same message multiple times will not cause errors or inconsistencies.
  4.     256KB messages now available.
  5.     Billed in 64KB chunks.
  6.     A 256KB message is 4 chunks (4 × 64KB).
   
    Pricing:
  1.     First 1 million SQS requests per month are free.
  2.     $0.50 per million SQS requests per month thereafter ($0.00000050 per SQS request).
  3.     A single request can have 1 to 10 messages, up to a maximum total payload of 256KB.
  4.     Each 64KB chunk of a payload is billed as one request, so a single API call with a 256KB payload is billed as 4 requests.
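The chunk billing above can be sketched in Python (the prices are the figures quoted in these notes):

```python
import math

FREE_REQUESTS_PER_MONTH = 1_000_000
PRICE_PER_MILLION = 0.50  # after the free tier

def billed_chunks(payload_kb):
    """Each request payload is billed in 64 KB chunks, rounded up."""
    return math.ceil(payload_kb / 64)

def monthly_request_cost(billed_requests):
    """First 1M billed requests per month are free, then $0.50 per million."""
    over = max(0, billed_requests - FREE_REQUESTS_PER_MONTH)
    return over * PRICE_PER_MILLION / 1_000_000

print(billed_chunks(256))               # a 256 KB call counts as 4 requests
print(monthly_request_cost(3_000_000))  # 2M billable requests -> $1.00
```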

Tuesday, May 23, 2017

AWS Databases

Services Available under AWS Databases:
  1. RDS
  2. DynamoDB
  3. ElastiCache
  4. Redshift
  5. DMS
 RDS Types Available :
  • SQL Server
  • Oracle
  • MySQL Server
  • PostgreSQL
  • Aurora
  • MariaDB
NRDB :