S3 provides developers and IT teams with secure, durable, highly scalable object storage. S3 is easy to use, with a simple web services interface to store and retrieve any amount of data from anywhere on the web.
It is object-based storage (flat files).
It is spread across multiple devices and facilities.
Files can be from 0 bytes to 5 TB.
There is unlimited storage.
Files are stored in buckets.
S3 uses a universal namespace, i.e., bucket names must be unique throughout the world.
The S3 URL looks something like:
https://s3-region-name.amazonaws.com/name_of_the_bucket
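As a sketch, a small helper can build both the path-style URL shown above and the virtual-hosted-style variant (the bucket, key, and region names below are made-up examples):

```python
def path_style_url(bucket: str, key: str, region: str = "us-east-1") -> str:
    """Path-style: https://s3-<region>.amazonaws.com/<bucket>/<key>"""
    return f"https://s3-{region}.amazonaws.com/{bucket}/{key}"

def virtual_hosted_url(bucket: str, key: str, region: str = "us-east-1") -> str:
    """Virtual-hosted-style: https://<bucket>.s3-<region>.amazonaws.com/<key>"""
    return f"https://{bucket}.s3-{region}.amazonaws.com/{key}"

print(path_style_url("my-unique-bucket", "photo.jpg", "eu-west-1"))
# https://s3-eu-west-1.amazonaws.com/my-unique-bucket/photo.jpg
```

Because the namespace is universal, the bucket name alone identifies the bucket worldwide in either URL style.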
When you upload a file to S3, you receive an HTTP 200 status code if the upload was successful.
Data Consistency Model for S3:
If we write a new object to S3, it is available to read immediately; but if we overwrite an existing object or delete it, the change can take some time to propagate.
The reason behind this is that AWS doesn't let you see any corrupt or partial data.
- Read-after-write consistency for PUTs of new objects -- can read immediately after writing.
- Eventual consistency for overwrite PUTs and DELETEs -- updates and deletes can take some time to propagate.
S3 is a simple key-value store.
S3 is object-based. Objects consist of the following:
- Key (the name of the object)
- Value (the data itself, made up of a sequence of bytes)
- Version ID (important for versioning)
- Metadata (data about the data being stored)
- Subresources, such as Access Control Lists and Torrent.
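A minimal sketch of these components as a Python data structure (the class and field names are illustrative, not an AWS API):

```python
from dataclasses import dataclass, field

@dataclass
class S3Object:
    key: str                        # Key: the name of the object
    value: bytes                    # Value: the data, a sequence of bytes
    version_id: str = "null"        # Version ID: "null" until versioning is enabled
    metadata: dict = field(default_factory=dict)  # Metadata: data about the data

obj = S3Object(
    key="reports/2019.pdf",
    value=b"%PDF-1.4 ...",
    metadata={"Content-Type": "application/pdf"},
)
print(obj.key, len(obj.value))
```

The key acts like a file path, but the store itself is flat: there are no real directories, only key names that may contain slashes.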
- Built for 99.99% availability for the S3 platform.
- Amazon guarantees 99.9% availability in its SLA.
- Amazon guarantees 99.999999999% (11 nines) durability for S3 information.
- Tiered storage available.
- Lifecycle Management.
- Versioning.
- Encryption.
- Secure your data in a couple of different ways, using Access Control Lists and Bucket Policies.
S3 Storage Classes/Tiers:
- S3 Standard: 99.99% availability, 11 nines durability, stored redundantly across multiple devices in multiple facilities, designed to sustain the loss of two facilities concurrently.
- S3-IA (Infrequent Access): for data that is accessed less frequently but requires rapid access when needed. Lower storage fee than S3 Standard, but you are charged a retrieval fee.
- Reduced Redundancy Storage: 99.99% durability and 99.99% availability of objects over a given year.
- Glacier: very cheap, but used for data archival only.
  - Extremely low cost: around $0.01 per GB per month.
  - Retrieval times of 3-5 hours.
S3 Charges:
- Storage
- Requests
- Storage Management Pricing
- Data Transfer Pricing
- Transfer Acceleration (fast, easy, and secure transfer of files over long distances between end users and an S3 bucket. It takes advantage of CloudFront's globally distributed edge locations; as data arrives at an edge location, it is routed to Amazon S3 over an optimized network path).
S3 FAQs IMP
S3 is object-based storage and can be used to store flat files (documents, multimedia, audio files, video files, etc.), but not for software installations.
As soon as we log in to the console, we can see S3 under Storage.
The first step in S3 is to create a bucket.
For the created bucket, more options are available under Properties:
Versioning:
To see the different versions of the same object in the bucket.
Static Web Hosting:
Allows hosting a static web site without any server-side technologies (plain HTML). This has tremendous advantages:
No worries about load balancing, auto scaling, or virtual machines.
It scales automatically and is extremely low cost.
Logging:
We can set up log reports.
Under the Advanced Settings:
Tags:
Use tags for cost control.
Cross-Region Replication:
Used for replicating objects to a bucket in a different region.
Events:
Under events we can receive notifications when specific events occur in the bucket. For example: someone uploads a file and we want to invoke a Lambda function that converts it into a thumbnail and saves the output in another bucket.
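The upload-triggers-Lambda example can be expressed as a bucket notification configuration. This is a sketch; the function ARN, filter suffix, and account ID are made-up placeholders:

```json
{
  "LambdaFunctionConfigurations": [
    {
      "Id": "thumbnail-on-upload",
      "LambdaFunctionArn": "arn:aws:lambda:us-east-1:123456789012:function:make-thumbnail",
      "Events": ["s3:ObjectCreated:*"],
      "Filter": {
        "Key": {
          "FilterRules": [{"Name": "suffix", "Value": ".jpg"}]
        }
      }
    }
  ]
}
```

The `Events` list selects which bucket events fire the notification; the suffix filter here restricts it to JPEG uploads so the Lambda function only runs for images.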
Under Lifecycle:
We can configure rules such as: if an object has not been used in a certain number of days, move it to the Infrequent Access tier; or if it is more than 120 days old, move it to Glacier.
This is where the different storage tiers are specified.
Under Permissions:
ACL (Access Control Lists):
We can specify access control lists. By default, all buckets created are private/inaccessible.
Under Management:
Analytics:
We can do analytics for the different storage classes.
Metrics and Inventory.
Uploading some data in S3:
The first time we upload data to S3, no permissions are set on it (it is in private mode, so no one can access it). The format for the object URL will be something like:
https://s3.amazonaws.com/bucketName/objectName
As soon as you upload an object and try to access it, there will be an error message saying access is denied.
If we want the object to be accessible by someone, we need to grant access to a specific account (by email ID), to everyone, or to any authenticated AWS user.
Under the object permissions there are two things:
1. Giving the user permission to read/write the object.
2. Giving the user permission to read/write the object's permissions (the authority to authorize other users).
Once versioning is enabled on a bucket, it cannot be removed; it can only be suspended.
To check the versions of an object, we can view that on the drop down beside the file.
From an architecture point of view, we should not version large files without planning archival after a certain number of versions, because each version is stored at its full original size.
When we delete a file in the bucket it disappears, but versioning is essentially hiding it rather than actually deleting it: a 'Delete Marker' is placed on top of the object. To restore the file after a delete, go to 'Show versions' and delete the Delete Marker, which removes the hidden state and makes the object visible again.
Versioning is a great backup tool.
An additional layer of security can be added to S3 buckets by enabling MFA Delete, which helps avoid accidental deletion of S3 objects.
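The delete-marker behavior described above can be sketched as a toy model. This is a simulation for intuition only, not the AWS API:

```python
# Toy model of S3 versioning: each key maps to a stack of versions, newest last.
DELETE_MARKER = "DELETE_MARKER"

class VersionedBucket:
    def __init__(self):
        self.versions = {}  # key -> list of versions

    def put(self, key, data):
        self.versions.setdefault(key, []).append(data)

    def delete(self, key):
        # A plain DELETE only adds a delete marker on top; older versions remain.
        self.versions.setdefault(key, []).append(DELETE_MARKER)

    def get(self, key):
        stack = self.versions.get(key, [])
        if not stack or stack[-1] == DELETE_MARKER:
            return None  # looks deleted: latest version is a delete marker
        return stack[-1]

    def remove_delete_marker(self, key):
        # Removing the delete marker makes the previous version visible again.
        stack = self.versions.get(key, [])
        if stack and stack[-1] == DELETE_MARKER:
            stack.pop()

b = VersionedBucket()
b.put("notes.txt", "v1")
b.put("notes.txt", "v2")
b.delete("notes.txt")
print(b.get("notes.txt"))           # None: the object appears deleted
b.remove_delete_marker("notes.txt")
print(b.get("notes.txt"))           # v2: the object is restored
```

The point of the model: a delete never destroys data while versioning is on, so "restore" is just removing the marker that hides the latest real version.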
Creating a static website using amazon S3:
- Create a bucket in S3, upload objects into the bucket.
- Under the website hosting section of the bucket we can enable website with an index.html and error.html as the landing and error pages for our static website.
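For the website's objects to be publicly readable, a bucket policy like the following is typically attached (the bucket name in the ARN is a placeholder):

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "PublicReadGetObject",
      "Effect": "Allow",
      "Principal": "*",
      "Action": "s3:GetObject",
      "Resource": "arn:aws:s3:::my-static-site-bucket/*"
    }
  ]
}
```

The `/*` at the end of the Resource ARN applies the policy to every object in the bucket, while `Principal: "*"` grants the read to everyone.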
CORS (Cross-Origin Resource Sharing):
- This is a way of referencing code in one S3 bucket using JavaScript in another S3 bucket.
- It allows buckets to talk to each other.
- Under the CORS section, simply specify the URL of the bucket that should be allowed access.
- Polly is a text-to-speech service.
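A sketch of what that CORS configuration can look like in the JSON form the console accepts (the origin URL is a made-up website-endpoint placeholder):

```json
[
  {
    "AllowedOrigins": ["http://my-other-bucket.s3-website-us-east-1.amazonaws.com"],
    "AllowedMethods": ["GET"],
    "AllowedHeaders": ["*"]
  }
]
```

`AllowedOrigins` lists the sites whose JavaScript may fetch objects from this bucket; restricting `AllowedMethods` to GET keeps the cross-origin access read-only.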
- Create a bucket and while creating it enable versioning.
- Versioning stores all objects/versions separately.
- Once Versioning is enabled it cannot be removed but only be disabled.
- If versioning has to be removed, we have to create a new bucket and transfer the objects to that bucket.
- If we delete a version we cannot restore it, but if we delete an object itself we can restore it.
- Versioning must be enabled on both the source and the destination buckets.
- The source and destination buckets must be in different (unique) regions.
- Files in an existing bucket are not replicated automatically. All the subsequent updated files will be replicated automatically.
- You cannot replicate to multiple buckets or use daisy chaining.
- Delete markers are replicated.
- Deleting individual versions or delete markers will not be replicated.
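Cross-region replication is configured on the source bucket with a replication configuration like the sketch below. The IAM role ARN and destination bucket ARN are made-up placeholders; the role must allow S3 to read from the source and write to the destination:

```json
{
  "Role": "arn:aws:iam::123456789012:role/s3-replication-role",
  "Rules": [
    {
      "ID": "replicate-everything",
      "Prefix": "",
      "Status": "Enabled",
      "Destination": {
        "Bucket": "arn:aws:s3:::my-destination-bucket"
      }
    }
  ]
}
```

The empty `Prefix` replicates every object; a non-empty prefix such as `"logs/"` would replicate only that subset, and versioning must already be enabled on both buckets for S3 to accept this configuration.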
Lifecycle Management, S3-IA and Glacier - Hands On
Lifecycle Management:
This helps in maintaining the lifecycle of objects: under the Management section, add a lifecycle rule for the bucket.
Lifecycle rules: these rules help manage lifecycle cost by transitioning objects from S3 Standard, after a certain time, to S3-IA and then to Glacier, archiving the least frequently used files and thereby reducing storage cost drastically.
Under the life cycle transition rules we can configure either on the current version or on the previous versions.
Object created -- 30 days later --> transitioned to S3-IA -- 30 days later --> transitioned to Glacier -- 425 days later --> expires
Glacier is designed to store an object for a minimum of 90 days. If an object has been in Glacier for fewer than 90 days and we want to expire it early, we are required to give extra authorization confirming that we want to delete the object even though we are still charged for the full 90 days (which logically doesn't make sense otherwise).
Minimum of 30 days after creation before transitioning to S3-IA, 60 days before transitioning to Glacier, and 61 days before expiration.
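The timeline above can be written as a lifecycle configuration. This is a sketch matching the 30/60/425-day example; the rule ID is a made-up name, and `Days` values are counted from object creation:

```json
{
  "Rules": [
    {
      "ID": "archive-then-expire",
      "Status": "Enabled",
      "Filter": {"Prefix": ""},
      "Transitions": [
        {"Days": 30, "StorageClass": "STANDARD_IA"},
        {"Days": 60, "StorageClass": "GLACIER"}
      ],
      "Expiration": {"Days": 425}
    }
  ]
}
```

Each `Days` value is cumulative from creation (not from the previous transition), which is why the Glacier transition reads 60 rather than 30.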