Software Engineer, Daemyeong Kang (강대명): Class Description
With English Subtitles! A back-end guide for High Capacity Service with Redis Exercises
"Didn't you ever wonder how
giant IT companies with huge numbers of users,
like Kakao and Naver, are able to provide
stable services to 100,000 or even 1 million people without failure?"
Outside of large companies that have built high-capacity services,
there's no place to learn how to build one.
So I've brought together all the experience I gained developing
Naver Mail at Naver and KakaoStory at Kakao.
by a developer from Naver, Kakao, and Udemy
Essential back-end concepts covered in this class:
* Scalability: When a server is added, service performance should increase accordingly.
* Fault resilience: Service must be continuously available even if some of the service's API servers, DB servers, etc. fail.
* Automation: Service deployment or addition/removal of servers should be automated by scripts and not done manually.
* Monitoring: Monitoring is crucial because the most important thing is knowing whether the service is working properly.
* nGrinder: It's important to know how much load the service can handle and where bottlenecks occur. nGrinder is a load-testing tool.
* Prometheus: Exporters expose service metrics, and Prometheus collects, stores, and displays them.
* Grafana: Grafana is used to visualize the metrics collected by Prometheus.
* Load Balancing-Server Side: In server-side load balancing, the client sees only the load balancer in front of the actual service, and the load balancer distributes the load across the service servers behind it.
* Load Balancing-Client Side: In client-side load balancing, the client knows the list of actual service hosts and distributes the load across them itself.
* Service Discovery-ZooKeeper: A way to manage and look up the types of services and the list of servers that belong to each service.
* Circuit Breaker: A method that makes calls fail fast when an external service is failing and calls to it are slowing down responses.
* Replication: Duplication of data. Replication is required for service availability.
* Failover: Handing over services to another server to replace it when the server fails. Failover is required for service availability.
* Consistent Hashing: In a large-scale service, cache misses can hurt response times. Whenever a cache server is added or removed, cached data can be lost, and consistent hashing minimizes that loss.
* Sharding-Index: A method of splitting large amounts of data across servers. The index method is complex, but because it records which server each piece of data is assigned to, it minimizes data movement.
* Sharding-Range: The easiest way to apply sharding. Allocates servers according to the key range.
* Sharding-Modular: Shards data more evenly than the Range method. However, server scalability is limited in the end, because changing the number of servers changes the assigned server for most keys.
* GUID: In a large-scale service, generating a unique key for sharding is very important. A GUID generator creates unique keys efficiently.
* Async Queue: If the API servers write data directly to the DB, whatever load piles up on the API servers is passed straight on to the DB server. If you use a queue to process data asynchronously while controlling the load, you can provide a more stable service.
* Deploy-BlueGreen: With a Rolling Update, deployment and rollback take longer as the number of servers grows. Blue-Green deployment solves this problem.
* Deploy-Canary: Even with Blue-Green deployment, if a new version is deployed to a large number of servers at once, many users will hit any bugs it contains. Canary deployment prevents this by deploying the new version to only a portion of the servers first.
* Timeout: If you set the wrong timeout value when calling an external service, problems such as duplicate data may occur. This topic covers how to prevent them.
* Caching-Look Aside: Usually, when data is not in the cache, it is read from the DB and then written to the cache. This is called Look-Aside caching.
* Caching-Write Back: Because writing to the DB causes load, the Write-Back method writes data to the cache first and stores it in the actual storage later. Depending on the situation, there is a risk of data loss.
* Hot Key Issue: A hot key causes problems when requests concentrate on one particular key beyond the performance capacity of the cache or DB server. Ways to solve the hot key issue include a local cache, read replicas, and Multi-Write Read One.
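The Circuit Breaker idea from the list above can be sketched in a few lines. This is a minimal illustration, assuming a simple three-state design (CLOSED, OPEN, HALF_OPEN) driven by a count of consecutive failures; the class and parameter names (`CircuitBreaker`, `failure_threshold`, `reset_timeout`) are hypothetical and are not taken from the class or from any specific library.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class CircuitOpenError(Exception):
    """Raised when a call is rejected because the circuit is open."""


class CircuitBreaker:
    """Minimal circuit breaker: CLOSED -> OPEN -> HALF_OPEN -> CLOSED/OPEN."""

    def __init__(self, failure_threshold: int = 3, reset_timeout: float = 5.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.failure_threshold = failure_threshold  # consecutive failures before opening
        self.reset_timeout = reset_timeout          # seconds to stay open before a trial call
        self.clock = clock
        self.failures = 0
        self.opened_at: Optional[float] = None

    @property
    def state(self) -> str:
        if self.opened_at is None:
            return "CLOSED"
        if self.clock() - self.opened_at >= self.reset_timeout:
            return "HALF_OPEN"  # allow one trial call through
        return "OPEN"

    def call(self, fn: Callable[[], T]) -> T:
        if self.state == "OPEN":
            # Fail fast instead of waiting on a service that is known to be failing.
            raise CircuitOpenError("circuit is open")
        try:
            result = fn()
        except Exception:
            self.failures += 1
            # A failed trial call in HALF_OPEN, or too many failures, (re)opens the circuit.
            if self.state == "HALF_OPEN" or self.failures >= self.failure_threshold:
                self.opened_at = self.clock()
            raise
        # Any success closes the circuit and resets the consecutive-failure count.
        self.failures = 0
        self.opened_at = None
        return result
```

The injectable `clock` makes the state changes easy to check with a fake time source; in real use the default `time.monotonic` applies, and the wrapped `fn` would be the outbound call (with its own timeout) to the external service.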
Daemyeong Kang was the first Korean to give a presentation at RedisConf in 2016, and he currently ranks 22nd worldwide among Redis contributors (*contributors to the open source project)!
In this class, you'll learn from Daemyeong Kang the essential knowledge for building a high-capacity service, along with Redis technology.
Daemyeong Kang, ranked 22nd from 2009 to 2021 among Redis contributors worldwide (*As of July 2021)
What is Redis, and why is it important?
Redis is a distributed cache technology used by IT giants such as Facebook, Naver, and Kakao, and by unicorn startups such as BaeMin and Coupang, to handle large-scale traffic from users. Redis reduces developers' workload by being fast while also providing data structures that developers can use easily.
Length: 39 videos
Requirements:
- Linux or Mac
- Python 3(3.8.6)
- AWS Account, Access Key
4 Class Exercises
Practice 1. Service Discovery
(#Scalability #FaultResilience #Automation) By repeatedly adding and removing a callee,
we can see that the caller automatically detects the change and handles it.
Through this process, you'll learn that with Service Discovery, when a server is added or removed,
the caller reflects the change automatically without any other configuration changes.
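The caller/callee behavior in Practice 1 can be sketched with an in-memory registry. In ZooKeeper, this pattern is commonly built from ephemeral znodes (which disappear when the registering server's session ends) and watches (notifications when the watched data changes); the `ServiceRegistry` and `Caller` classes below only imitate that behavior in plain Python. They are hypothetical and are not the ZooKeeper or kazoo client API.

```python
import random
from typing import Callable, Dict, List, Set


class ServiceRegistry:
    """In-memory stand-in for a discovery store such as ZooKeeper (hypothetical API)."""

    def __init__(self) -> None:
        self._services: Dict[str, Set[str]] = {}
        self._watchers: Dict[str, List[Callable[[List[str]], None]]] = {}

    def register(self, service: str, host: str) -> None:
        self._services.setdefault(service, set()).add(host)
        self._notify(service)

    def unregister(self, service: str, host: str) -> None:
        # With ZooKeeper ephemeral nodes this also happens when a server dies.
        self._services.get(service, set()).discard(host)
        self._notify(service)

    def watch(self, service: str, callback: Callable[[List[str]], None]) -> None:
        self._watchers.setdefault(service, []).append(callback)
        callback(self.hosts(service))  # deliver the current host list immediately

    def hosts(self, service: str) -> List[str]:
        return sorted(self._services.get(service, set()))

    def _notify(self, service: str) -> None:
        for callback in self._watchers.get(service, []):
            callback(self.hosts(service))


class Caller:
    """Client that keeps its callee list in sync through a registry watch."""

    def __init__(self, registry: ServiceRegistry, service: str) -> None:
        self.callees: List[str] = []
        registry.watch(service, self._on_change)

    def _on_change(self, hosts: List[str]) -> None:
        self.callees = hosts

    def pick(self) -> str:
        if not self.callees:
            raise RuntimeError("no callee available")
        return random.choice(self.callees)  # simple client-side load balancing
```

Because the caller only reacts to change notifications, adding or removing a callee needs no change to the caller's own settings, which is the point the exercise demonstrates.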
Practice 2. Consistent hashing
(#FaultResilience) Through exercises where you add and delete a Redis server acting as a cache, you can see for yourself that consistent hashing minimizes data redistribution.
In a large-scale service, performance degrades even when the lost data is only cache data. With consistent hashing, you can see that cache data loss is reduced even when a cache server is added or deleted, and you can apply this in your own work later!
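To make the effect in Practice 2 concrete, here is a minimal sketch of a consistent hash ring with virtual nodes, compared against modulo placement (`nodes[hash(key) % N]`). The node names, the MD5-based hash, and the 100 virtual nodes per server are assumptions for illustration, not the class's exact setup. When going from 3 to 4 servers, expect consistent hashing to move roughly a quarter of the keys (only those the new server takes over), while modulo placement moves roughly three quarters.

```python
import bisect
import hashlib
from typing import Dict, Iterable, List


def _hash(key: str) -> int:
    # MD5 is used only to spread keys evenly, not for security.
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)


class ConsistentHashRing:
    """Hash ring with virtual nodes; each key goes to the next node clockwise."""

    def __init__(self, nodes: Iterable[str] = (), vnodes: int = 100) -> None:
        self.vnodes = vnodes
        self._ring: List[int] = []         # sorted virtual-node positions
        self._owners: Dict[int, str] = {}  # position -> physical node name
        for node in nodes:
            self.add(node)

    def add(self, node: str) -> None:
        for i in range(self.vnodes):
            pos = _hash(f"{node}#{i}")
            bisect.insort(self._ring, pos)
            self._owners[pos] = node

    def remove(self, node: str) -> None:
        for i in range(self.vnodes):
            pos = _hash(f"{node}#{i}")
            self._ring.remove(pos)
            del self._owners[pos]

    def get(self, key: str) -> str:
        # First virtual node after the key's position, wrapping around to the start.
        idx = bisect.bisect(self._ring, _hash(key)) % len(self._ring)
        return self._owners[self._ring[idx]]


def moved_fraction(before: Dict[str, str], after: Dict[str, str]) -> float:
    return sum(before[k] != after[k] for k in before) / len(before)


keys = [f"user:{i}" for i in range(10_000)]
old_nodes = ["redis-1", "redis-2", "redis-3"]
new_nodes = old_nodes + ["redis-4"]

ring = ConsistentHashRing(old_nodes)
ring_before = {k: ring.get(k) for k in keys}
ring.add("redis-4")
ring_after = {k: ring.get(k) for k in keys}

# Modulo placement for comparison: node = nodes[hash(key) % N]
mod_before = {k: old_nodes[_hash(k) % len(old_nodes)] for k in keys}
mod_after = {k: new_nodes[_hash(k) % len(new_nodes)] for k in keys}

print(f"consistent hashing: {moved_fraction(ring_before, ring_after):.1%} of keys moved")
print(f"modulo hashing:     {moved_fraction(mod_before, mod_after):.1%} of keys moved")
```

Every key that moves under consistent hashing moves to the newly added server, so the other cache servers keep their data; this is also why Sharding-Modular is hard to scale out.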
Practice 3. Automation
I'll show you the process of deploying a simple GeoIP service using Ansible and Terraform.
You'll deploy a GeoIP server by creating one AWS instance.
This process shows that the same infrastructure can always be built and deployed easily and automatically through IaC (Infrastructure as Code) rather than by hand.
Learning about IaC is essential for building a large-scale service.
Practice 4. Monitoring
Set up a Grafana dashboard to monitor basic node metrics,
deploy a simple GeoIP service, and monitor the results through Prometheus.
If an error exceeds the configured threshold, a notification is sent through Slack. Monitoring is essential for finding out whether a failure has occurred. In this class, you'll learn what information should be monitored.
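The threshold-and-notify idea in Practice 4 can be sketched in plain Python. In a Prometheus setup this is usually handled by alerting rules (whose `for` clause plays a role similar to the `consecutive` parameter here) and Alertmanager, which can route notifications to Slack; this sketch does not reproduce that configuration. The function names, the `geoip_error_rate` metric, and the sample values are hypothetical. The only external detail relied on is that a Slack incoming webhook accepts a JSON body with a `"text"` field.

```python
import json
import urllib.request
from typing import List, Optional


def check_threshold(metric: str, samples: List[float], threshold: float,
                    consecutive: int = 3) -> Optional[str]:
    """Return an alert message if the last `consecutive` samples all exceed `threshold`."""
    recent = samples[-consecutive:]
    if len(recent) == consecutive and all(v > threshold for v in recent):
        return (f"[ALERT] {metric} = {recent[-1]} has exceeded {threshold} "
                f"for {consecutive} consecutive checks")
    return None


def slack_payload(message: str) -> bytes:
    # Slack incoming webhooks accept a JSON body with a "text" field.
    return json.dumps({"text": message}).encode("utf-8")


def send_to_slack(webhook_url: str, message: str) -> None:
    request = urllib.request.Request(
        webhook_url,
        data=slack_payload(message),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=5) as response:
        response.read()


error_rates = [0.01, 0.02, 0.12, 0.15, 0.18]  # hypothetical error-rate samples
alert = check_threshold("geoip_error_rate", error_rates, threshold=0.1)
if alert:
    print(slack_payload(alert).decode("utf-8"))
    # send_to_slack(WEBHOOK_URL, alert)  # WEBHOOK_URL: your own Slack webhook URL
```

Requiring several consecutive breaches before alerting avoids paging on a single noisy sample, at the cost of a slightly slower alert.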
Faults happen anytime, anywhere.
So the key is not preventing failures in a service,
but how quickly you can find the cause of a failure when it occurs
and how quickly you can recover from it.
Large-scale service development ultimately
comes down to how easily the service can be scaled
and building a structure that can respond easily to failures.
However, it's difficult to apply this
just by listening to a class once.
So in this class, I'll show you the basic
knowledge necessary for large-scale service development,
along with the parts that actually affect performance, through exercises,
to help you deepen your understanding
and apply it to practical work.
LemonTree / Software Engineer
- Tech Lead: Managing AWS Infrastructure, Designing and Implementing Service Architecture
• 2020 ~ 2021
Weverse Company / Software Engineer
- Data Pipeline Management (EMR, Databricks, Snowflake, Structured Streaming, Kafka)
- Airflow Management and DAG Development (k8s)
- Change Data Capture (CDC) Processing through Databricks Delta Lake
- Backend API Server Design and Implementation for BI Services
• 2017 ~ 2020
Udemy / Software Engineer
- Data Pipeline Management
- Data Pipeline Migration from On-Premises to AWS
- GDPR related processing
• 2013 ~ 2017
Kakao / Software Engineer
- Kakao Story Development
- Kakao Home Development
• 2008 ~ 2012
Naver / Software Engineer
- Naver Mail development
- Naver Windows mobile app development
• 2002 ~ 2008
FINALDATA / Software Engineer
- Final Forensics development
Projects & Awards
• 2002 ~ 2008
- Redis Operation Management (Hanbit Media)
- Memcached and Redis for Building High Capacity Servers (Hanbit Media)
- Windows Network Programming (Daelim)
• 2019 ~ 2020
- Pusan National University SW Education Center
- Devground Junior
- Woowa Redis (Woowa Brothers)
- Andong National University Open SW Camp
- Domestic Open SW Classes for the Currently Employed
- Kyungpook National University
• 2017 ~ 2018
- Seoul National University of Science and Technology
- SW Maestro
- Zum internet
• 2013 ~ 2016
- Redis Conf 2016(San Francisco)
- Naver D2SF
- Open SW Day
- Deview 2013
- NDC 2013
Let's take a sneak peek at Daemyeong Kang's Redis exercise
on the "Multi-Write Read One" method.
When a hot key occurs, even a high-performance cache server
cannot handle requests beyond its capacity.
To deal with this, did you know that you can use the Multi-Write Read One method, which writes cache data to several locations and reads it from one of them at random?
The problem is solved through the following process.
- Set up Redis.
- Cache writes store the data in 2 locations.
- Cache reads read the data from 1 location.
Through this process and exercise, you will learn the following:
In a large-scale service, hot keys that receive more requests than the cache can handle may occur and degrade performance. You'll learn how to handle hot keys to prevent this degradation.
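The Multi-Write Read One steps can be sketched as follows. To keep the example self-contained, the `CacheNode` class is an in-memory stand-in for a Redis server; with the redis-py library you would call `set` and `get` on several `redis.Redis` clients instead. The placement rule (start at `crc32(key) % N` and take the next `copies` nodes) and all class names are hypothetical choices for illustration, not the method used in the class.

```python
import random
import zlib
from typing import Dict, List, Optional


class CacheNode:
    """In-memory stand-in for one Redis server (SET/GET-like behavior only)."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.data: Dict[str, str] = {}
        self.reads = 0  # counts GETs so we can see how reads are spread

    def set(self, key: str, value: str) -> None:
        self.data[key] = value

    def get(self, key: str) -> Optional[str]:
        self.reads += 1
        return self.data.get(key)


class MultiWriteReadOne:
    """Write each key to `copies` nodes; read it back from one of them at random."""

    def __init__(self, nodes: List[CacheNode], copies: int = 2,
                 rng: Optional[random.Random] = None) -> None:
        if not 1 <= copies <= len(nodes):
            raise ValueError("copies must be between 1 and the number of nodes")
        self.nodes = nodes
        self.copies = copies
        self.rng = rng or random.Random()

    def _replicas(self, key: str) -> List[CacheNode]:
        # Hypothetical placement: start at crc32(key) % N and take the next `copies` nodes.
        start = zlib.crc32(key.encode("utf-8")) % len(self.nodes)
        return [self.nodes[(start + i) % len(self.nodes)] for i in range(self.copies)]

    def write(self, key: str, value: str) -> None:
        for node in self._replicas(key):  # multi-write
            node.set(key, value)

    def read(self, key: str) -> Optional[str]:
        return self.rng.choice(self._replicas(key)).get(key)  # read one


nodes = [CacheNode(f"redis-{i}") for i in range(3)]
cache = MultiWriteReadOne(nodes, copies=2, rng=random.Random(7))
cache.write("hot:item", "v1")
for _ in range(1000):
    cache.read("hot:item")
print({n.name: n.reads for n in nodes})
```

Reads of the hot key are split roughly evenly between the two copies, so each server sees about half the load. The trade-off is that every write costs one operation per copy, and the copies can briefly disagree if a write to one of them fails.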
Buy now, get unlimited access.
The special offer ends this Friday
with Daemyeong Kang
"I'm participating in a development project for global users at Weverse Company in the globally competitive entertainment industry."
At Weverse Company, I'm in charge of building a data pipeline that collects data from Weverse and Weverse Shop, global community and service platforms that provide communication between artists and fans, content, and services. The collected data is processed so that data analysts can analyze it. The pipeline reliably processes log data on how many users visit Weverse and Weverse Shop and what actions they take.
"I participated in the development projects for the Mobile App/Mail at Naver, KakaoHome/KakaoStory at Kakao."
I worked with many cache servers at Naver and Kakao as well, including Memcached servers with over 3TB of memory and Redis servers with over 5TB. I also worked on converting a service with more than 10 million DAUs from Ruby to Java, which is often described as changing the wheels of a moving train. In a large-scale service, it's important to solve problems well, but I've learned that to operate a service, I have to automate as much as possible, from the simple parts to the hard ones, and monitor as many parts as possible.
"In my 20 years of being a developer, I'm always experiencing a new kind of failure."
In a service with over 10 million MAUs, several servers fail every day for no apparent reason. There were failures where I had to dig into the Linux kernel source to find out why kernel memory ran out when using a RAM disk on a 32-bit kernel, and why crashes could occur in certain port ranges while the server was running. I've also experienced DNS caching issues, internal issues that came up while upgrading from Java 1.7 to 1.8, latency issues between the eastern and western US, and issues caused by latency between regions in Korea. At Udemy, I ran into many problems while migrating the service from on-premises to AWS.
"Redis is my secret weapon."
High-capacity services are growing along with various social media and web services, and interest in high-capacity data processing technology is growing with them. Distributed cache technologies, which spread load across servers to withstand heavy traffic, play a key role in building high-capacity servers. That's why I think Redis can be a secret weapon not just for me but for all back-end developers. Through this class, I'd like to help many back-end developers work with more ease.
Who would you recommend this class to?
I think this class will be most helpful for those who want to become back-end developers, or for junior back-end developers with 1 to 5 years of experience.
More specifically, Part 1 is for beginners, and Part 2 teaches juniors practical knowledge about large-scale services. If you have practical work experience, I recommend focusing on Part 3, which covers things to consider in practical work.
The only background knowledge required for this class is a basic understanding of what the back end is. The class will be easier to follow if you have built a simple web service before.
If you take this class, I believe you will gain all the essential knowledge for building a large-scale service.
This course will use Linux or Mac, Python 3(3.8.6), Ansible, Terraform, AWS Account, and Access Key.
Please purchase and install these program(s) for an optimized lecture experience.
*These programs and/or materials will not be provided with the course.
※Regarding Mac and Windows: the class was filmed on Windows (Windows 10, 64-bit), but the actual exercises were done on Mac or Linux.
(I'll connect to a remote Linux server and demonstrate from there.)
※Some exercises use AWS servers; these require an AWS account, which is a paid service. However, most exercises can be done locally (only a few need to be done on AWS).
This course will open on January 17, 2023 (PST)
*The duration of the class discount may change without prior notice.
*Please make sure to enter your email address correctly, as payment and class information will be sent to the registered email address.
[How do refunds work?]
If you would like to request a refund because a Class did not meet your expectations, please contact us (email@example.com). For more detailed information, please review our Refund Policy.
1. Earlybird Class
If you purchase an Earlybird Class and request a refund before the class videos are available, you are eligible to receive a complete refund of the amount you paid through the Coloso Platform.
2. Purchasing a "Now Available" Class
Up to 14 days after purchase: If you purchase a "Now Available" class and request a refund, you may receive a complete or near-complete refund depending on refund eligibility. Please refer to the Refund Eligibility section below to see if you are eligible for a refund. Please refer to our Refund Policy for more information on the refundable amount.
3. Refund Eligibility
To submit a valid refund request and receive reimbursement for your purchase, you are required to meet each and every one of the following conditions:
(a) you must be a registered User on the Coloso Platform;
(b) you must be the User that enrolled in the Class;
(c) you must request the refund in writing to our support center within 14 days of purchase, and you must provide us the requested information, including but not limited to the information about your Account, Class, and the circumstances of the refund request;
(d) you must have consumed fewer than three clips of the Class; and
(e) you must not have downloaded any of our class materials.
4. Additional reasons for refund denial
You may not be eligible for any refund in cases where we believe there is refund abuse or fraudulent behavior, including but not limited to the following circumstances where:
(a) a user has requested multiple refunds for a single course;
(b) a user has asked for excessive refunds;
(c) we detected fraudulent behavior(s) from a User; or
(d) an account has been reported, banned, or deactivated due to a violation of our Terms.
5. We limit the number of devices that can access each account to 3.
Device registration occurs when you access a class video. If you wish to change a device registered to your account (e.g., you are using a new mobile device), please contact us at firstname.lastname@example.org.
Changes to device registration can happen only once a year. (Your device is registered to your account after you sign in to the account with your device)
Questions about refunds?
Please email us here: email@example.com
Would you like to request a refund?
Please email us here: firstname.lastname@example.org