Building a real-time clickstream personalization engine with Amazon Personalize
Our customer, a media company, publishes blog posts daily and wants to increase both the time users spend engaging with its content and the number of impressions. To achieve this, they want to target each user individually with personalized content in real time. To meet this demand, the solution must scale quickly, be cost-effective, and be secure.
We took a serverless approach to delivering this solution. Because our customer wanted to validate the business value of personalization, we aimed to deliver a working solution as quickly as possible, and the serverless pattern supports that kind of fast iteration. We also relied on fully managed serverless AWS services such as Amazon API Gateway, AWS Lambda, and Amazon Kinesis. The result is a secure, highly available, pay-for-use architecture that scales automatically with load.
Before the solution can provide recommendations, it needs initial training data. The data collection API is integrated with the customer's platform and starts aggregating data into central storage in Amazon S3. Data streams directly from API Gateway to Amazon Kinesis Data Streams and then through Amazon Kinesis Data Firehose into S3. The API Gateway to Kinesis integration is a native service integration implemented with mapping templates, so no Lambda function sits in the ingestion path. Native API Gateway integrations with other services reduce future maintenance and operational costs because there is no glue code to maintain.
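In the deployed API, the request transformation happens in a VTL mapping template rather than in code. To show what that template has to produce, here is a Python sketch of the equivalent request body for the Kinesis `PutRecord` action. It assumes (not stated in the original) that clients post a JSON clickstream event containing a `userId` field, which is used as the partition key; the stream name `clickstream` and the event field names are hypothetical.

```python
import base64
import json


def build_put_record_request(stream_name: str, raw_body: str) -> dict:
    """Build the Kinesis PutRecord request that a mapping template would emit.

    Hypothetical sketch: in the deployed API this happens in a VTL mapping
    template (for example with $util.base64Encode($input.body)), not in Python.
    """
    event = json.loads(raw_body)  # malformed JSON raises json.JSONDecodeError
    if not isinstance(event, dict) or not event.get("userId"):
        raise ValueError("clickstream event must be a JSON object with a userId")
    return {
        "StreamName": stream_name,
        # The Kinesis JSON API expects the record payload as base64-encoded bytes.
        "Data": base64.b64encode(raw_body.encode("utf-8")).decode("ascii"),
        # Records with the same partition key land on the same shard, which
        # keeps each user's clicks in order.
        "PartitionKey": str(event["userId"]),
    }


if __name__ == "__main__":
    body = json.dumps({"userId": "u-42", "itemId": "post-123", "eventType": "click"})
    print(build_put_record_request("clickstream", body)["PartitionKey"])  # prints u-42
```

Partitioning by user is one reasonable choice; a session-based or random key would spread load more evenly if a small number of users generated most of the traffic.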
Once we have collected sufficient data, we can process it for Amazon Personalize. First, we start a series of AWS Step Functions state machines that orchestrate the initial training process. After the Amazon Personalize setup and training are complete, we expose an API for ML inference. A Lambda function forwards new clickstream events to Amazon Personalize, keeping the recommendation model up to date and enabling real-time recommendations for users.
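The real-time path can be sketched in Python with boto3-style clients passed in as parameters, so the logic can be exercised without AWS. The sketch assumes (none of this is stated in the original) that the Lambda function is triggered by the Kinesis data stream, that each record is a JSON object with `userId`, `itemId`, and optional `sessionId`, `eventType`, and `timestamp` fields, and that the event tracker ID is stored in a hypothetical `TRACKING_ID` environment variable. `put_events` and `get_recommendations` are the boto3 methods of the `personalize-events` and `personalize-runtime` clients; the campaign ARN is a placeholder.

```python
import base64
import json
import os
from datetime import datetime, timezone

_events_client = None  # created once per Lambda execution environment


def to_put_events_args(record: dict, tracking_id: str) -> dict:
    """Turn one Kinesis record from a Lambda event into PutEvents arguments."""
    click = json.loads(base64.b64decode(record["kinesis"]["data"]))
    # Assumed payload fields: userId, itemId, optional sessionId, eventType, timestamp.
    sent_at = (
        datetime.fromtimestamp(click["timestamp"], tz=timezone.utc)
        if "timestamp" in click
        else datetime.now(timezone.utc)
    )
    return {
        "trackingId": tracking_id,
        "userId": click["userId"],
        # sessionId is required by PutEvents; falling back to userId is a hypothetical choice.
        "sessionId": click.get("sessionId", click["userId"]),
        "eventList": [
            {
                "eventType": click.get("eventType", "click"),
                "itemId": click["itemId"],
                "sentAt": sent_at,
            }
        ],
    }


def forward_clicks(lambda_event: dict, events_client, tracking_id: str) -> int:
    """Send each clickstream record to Amazon Personalize; return how many were sent."""
    records = lambda_event.get("Records", [])
    for record in records:
        events_client.put_events(**to_put_events_args(record, tracking_id))
    return len(records)


def recommend(runtime_client, campaign_arn: str, user_id: str, num_results: int = 10) -> list:
    """Fetch recommended item IDs for one user from a Personalize campaign."""
    response = runtime_client.get_recommendations(
        campaignArn=campaign_arn, userId=user_id, numResults=num_results
    )
    return [item["itemId"] for item in response["itemList"]]


def handler(event, context):
    """Lambda entry point, assuming a Kinesis Data Streams trigger."""
    global _events_client
    if _events_client is None:
        import boto3  # imported here so the helpers above can be tested without AWS

        _events_client = boto3.client("personalize-events")
    sent = forward_clicks(event, _events_client, os.environ["TRACKING_ID"])
    return {"forwarded": sent}
```

PutEvents accepts up to 10 events per call for a single session, so a production version might group records by user and session before sending them. If the inference API were backed by a Lambda function, it could call `recommend` with a `boto3.client("personalize-runtime")` client.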
We adopted Infrastructure as Code: all of the infrastructure is defined and deployed with the AWS Cloud Development Kit (AWS CDK). This keeps multiple environments consistent and lets us recreate the whole infrastructure from scratch in other AWS Regions.
Data encryption is enabled both in transit and at rest for every service that handles the data stream. In addition, AWS WAF is attached to API Gateway to filter web traffic, helping to protect against application-layer DDoS attacks and applying managed rules that cover the most common vulnerabilities in the OWASP Top Ten.
Observability is based on Amazon CloudWatch:
Logging for all services is centralized in CloudWatch Logs;
CloudWatch dashboards provide up-to-date visibility into service metrics;
AWS X-Ray provides tracing, giving end-to-end traceability of requests across the solution.
A dead-letter Amazon SQS queue captures failed Lambda function invocations so they can be inspected and debugged.
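One convenient way to connect centralized logs with dashboards is for a Lambda function to publish custom metrics by writing them to its log in the CloudWatch Embedded Metric Format (EMF). The sketch below is one option, not necessarily what this solution used; the namespace `Clickstream`, the metric `EventsForwarded`, and the `Function` dimension are hypothetical.

```python
import json
import time


def emf_log_line(namespace: str, metric_name: str, value: float, unit: str = "Count", **dimensions: str) -> str:
    """Return one CloudWatch Embedded Metric Format (EMF) log line.

    Printed to stdout from Lambda, the line lands in CloudWatch Logs and
    CloudWatch extracts the metric from it, so a dashboard can chart it.
    """
    if not dimensions:
        raise ValueError("pass at least one dimension, e.g. Function='click-forwarder'")
    return json.dumps(
        {
            "_aws": {
                "Timestamp": int(time.time() * 1000),  # milliseconds since the epoch
                "CloudWatchMetrics": [
                    {
                        "Namespace": namespace,
                        "Dimensions": [sorted(dimensions)],  # a single dimension set
                        "Metrics": [{"Name": metric_name, "Unit": unit}],
                    }
                ],
            },
            metric_name: value,  # the metric value lives at the top level
            **dimensions,  # and so do the dimension values
        }
    )


if __name__ == "__main__":
    print(emf_log_line("Clickstream", "EventsForwarded", 3, Function="click-forwarder"))
```

Printing such a line from the click-forwarding function would let a CloudWatch dashboard chart forwarded events without separate `PutMetricData` API calls.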
As part of our CI/CD pipeline, and following the DevSecOps principles of fast feedback loops and shifting security left, we use:
Amazon CodeGuru for code quality and security vulnerability scans;
AWS CloudFormation Guard to validate that the infrastructure complies with predefined policy rules.
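Guard rules are written in Guard's own domain-specific language. To illustrate the kind of check such a rule performs, here is a plain-Python stand-in (a swapped-in technique, not a Guard rule and not this pipeline's actual configuration) that scans a CDK-synthesized CloudFormation template and flags Kinesis data streams without KMS encryption, echoing the encryption-at-rest requirement described earlier. `StreamEncryption` and `EncryptionType` are real `AWS::Kinesis::Stream` properties; the function names are hypothetical.

```python
import json


def unencrypted_kinesis_streams(template: dict) -> list:
    """Return logical IDs of Kinesis streams that lack KMS server-side encryption."""
    offenders = []
    for logical_id, resource in template.get("Resources", {}).items():
        if resource.get("Type") != "AWS::Kinesis::Stream":
            continue
        encryption = resource.get("Properties", {}).get("StreamEncryption", {})
        # Intrinsic functions (for example Fn::If) are not evaluated here, so a
        # conditionally encrypted stream is reported for manual review.
        if encryption.get("EncryptionType") != "KMS":
            offenders.append(logical_id)
    return sorted(offenders)


def check_template_file(path: str) -> bool:
    """Print any offending streams in a synthesized template; return True if it passes."""
    with open(path, encoding="utf-8") as handle:
        offenders = unencrypted_kinesis_streams(json.load(handle))
    for logical_id in offenders:
        print(f"{path}: Kinesis stream {logical_id} is not KMS-encrypted")
    return not offenders
```

A pipeline step could run `check_template_file` on each template that `cdk synth` writes to `cdk.out/` and fail the build when it returns False.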
Amazon Personalize allows us to quickly build and deploy curated recommendations and intelligent user segmentation at scale. Using the serverless pattern and fully managed AWS services helped us deliver a working solution, and its business value, quickly. Because there are no servers to manage, the serverless approach requires little maintenance and support, which lowers long-term operational overhead. It gives our customer an architecture that is secure, cost-effective, highly available, and scales automatically.