SEEK Asia modernizes search with CI/CD and Amazon OpenSearch Service


This post was written in collaboration with Abdulsalam Alshallah (Salam), Software Architect, and Hans Roessler, Principal Software Engineer at SEEK Asia.

SEEK is a market leader in online employment marketplaces, with deep and rich insights into the future of work. As a global business, SEEK has a presence in Australia, New Zealand, Hong Kong, Southeast Asia, Brazil and Mexico, and its websites attract over 400 million visits per year. SEEK Asia's business operates across seven countries. It includes leading portal brands such as jobsdb.com and jobstreet.com, and it uses data and technology to create innovative solutions for candidates and hirers.

In this post, we share how SEEK Asia modernized their search-based system with a continuous integration and continuous delivery (CI/CD) pipeline and Amazon OpenSearch Service (successor to Amazon Elasticsearch Service).

Challenges associated with a self-managed search system

SEEK Asia provides a search-based system that enables employers to manage interactions between hirers and candidates. Although the system was already on AWS, it was a self-managed system running on Amazon Elastic Compute Cloud (Amazon EC2) with limited automation.

The self-managed system posed several challenges:

  • Slower release cycles – Deploying new configurations or new field mappings into the Elasticsearch cluster was a high-risk activity because changes affected the stability of the system. The limited automation of both the self-managed cluster and its workflows led to slower release cycles.
  • Higher operational overhead – The other challenge was sizing the cluster to deliver better performance while keeping costs under control. As with any other distributed system, even with sizing guidance, identifying the appropriate number of shards per node and the number of nodes needed to meet performance requirements still took some trial and error. This made sizing a tedious and time-consuming activity, which also slowed release cycles. To overcome this, the team often oversized the cluster because it was the quickest way to achieve the desired time to market, at the expense of cost.

Self-managing their own Elasticsearch cluster also meant the team had to keep up with new security patches and with minor and major platform upgrades.

Automating search delivery with Amazon OpenSearch Service

SEEK Asia knew that automation would be the key to solving the challenges of their existing search service. Automating the undifferentiated heavy lifting would enable them to deliver more value to their customers quickly and improve staff productivity.

With the problems defined, the team set out to solve them by automating the following:

  • Search infrastructure deployment
  • Search A/B testing infrastructure deployment
  • Redeployment of search infrastructure for any new infrastructure configuration (such as security patches or platform upgrades) and for index mapping updates

The key enablers of this automation would be Amazon OpenSearch Service and a search infrastructure CI/CD pipeline.

Architecture overview

The following diagram illustrates the architecture of the SEEK infrastructure and CI/CD pipeline with Amazon OpenSearch Service.

The workflow includes the following steps:

  1. Before the workflow starts, a live feeder hydrates the existing Amazon OpenSearch Service cluster. The live feeder is a serverless application built with Amazon Simple Queue Service (Amazon SQS), Amazon Simple Notification Service (Amazon SNS), and AWS Lambda. Amazon SQS queues documents for processing, and Amazon SNS enables data fanout (if required). A Lambda function processes the messages in the SQS queue and imports the data into Amazon OpenSearch Service. The feeder receives live updates for changes that must be reflected in the cluster. Write concurrency to Amazon OpenSearch Service is controlled by limiting the number of concurrent Lambda function invocations.
  2. The Amazon OpenSearch Service index mapping is version controlled in SEEK's Git repository. Whenever an update to the index mapping is committed, the CI/CD pipeline starts a new Amazon OpenSearch Service cluster provisioning workflow.
  3. As part of the workflow, a new data hydration initialization feeder is deployed. The initialization feeder is constructed like the live feeder, with one additional component: a script that runs within the CI/CD pipeline to calculate the number of batches required to hydrate the newly provisioned Amazon OpenSearch Service cluster up to a specific timestamp. The feeder systems were designed for idempotent processing. Each document reuses its unique identifier (UID) from the source data stores, so a duplicated document overwrites the existing document with exactly the same values.
  4. Concurrently with step 3, a new Amazon OpenSearch Service cluster is deployed. To accelerate the initial data hydration, the new cluster may be sized two or three times larger than the sizing guidance suggests. Shard replicas and the index refresh interval stay disabled until hydration is complete. The existing Amazon OpenSearch Service cluster remains as is, so two clusters run concurrently.
  5. The script inspects the number of documents in the source data store and groups them into batches. After running numerous experiments, SEEK found that 1,000 documents per batch gave the best ingestion time.
  6. Each batch is represented as one message and is queued into Amazon SQS via Amazon SNS. Every message that lands in Amazon SQS invokes a Lambda function. The Lambda function queries a separate data store, builds the documents, and loads them into Amazon OpenSearch Service. The more messages there are in the queue, the more functions are invoked. To create baselines for further indexing optimization, the team considered the following configurations and iterated on them to achieve higher ingestion performance:
    1. Memory of the Lambda function
    2. Batch size
    3. Size of each document in the batch
    4. Cluster size (memory, vCPU, and number of primary shards)
  7. While the initialization feeder runs, new documents are streamed to the cluster until it is in sync with the data source. Eventually, the newly provisioned Amazon OpenSearch Service cluster catches up and reaches the same state as the existing cluster. Hydration is complete when no messages remain in the SQS queue.
  8. To complete the deployment workflow, the initialization feeder is deleted and the Amazon OpenSearch Service cluster is downsized automatically. Replica shards are created and the index refresh interval is configured.
  9. When A/B testing is enabled, live search traffic is routed to the newly provisioned cluster through the API layer, which is built on Application Load Balancer, Amazon Elastic Container Service (Amazon ECS), and Amazon CloudFront. The API layer decouples the user interface from the backend implementation that runs on Amazon OpenSearch Service.
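In step 2, a committed mapping change triggers a new cluster. The post does not describe how SEEK's pipeline detects such a change. The Python sketch below shows one hypothetical approach: canonicalize the mapping JSON and compare its SHA-256 digest with a digest recorded for the currently deployed cluster. The function names and the idea of storing a digest are assumptions made for illustration.

```python
import hashlib
import json
from typing import Optional


def mapping_digest(mapping: dict) -> str:
    """Stable SHA-256 digest of an index mapping.

    sort_keys (applied recursively) and compact separators make the digest
    independent of key order and whitespace, so only real changes alter it.
    """
    canonical = json.dumps(mapping, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def needs_new_cluster(committed_mapping: dict, deployed_digest: Optional[str]) -> bool:
    """Hypothetical pipeline gate: provision a new cluster only if the mapping changed."""
    return mapping_digest(committed_mapping) != deployed_digest


# Usage with illustrative mappings.
deployed_mapping = {"properties": {"title": {"type": "text"}, "location": {"type": "keyword"}}}
deployed = mapping_digest(deployed_mapping)
reordered = {"properties": {"location": {"type": "keyword"}, "title": {"type": "text"}}}
updated = {"properties": {"title": {"type": "text"}, "location": {"type": "keyword"},
                          "salary": {"type": "integer"}}}
```

A reordered but otherwise identical mapping does not trigger a rebuild. Adding a field such as `salary` does.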
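The idempotency in step 3 comes from reusing each source UID as the document `_id`. The Python sketch below builds a newline-delimited `_bulk` request body with `index` actions keyed by UID. It then uses a plain dict in place of the index to show that replaying a batch leaves the same state. Replays matter because standard SQS queues deliver messages at least once. The index name `jobs`, the `uid` field, and the document fields are hypothetical. Sending the body to the cluster with an OpenSearch client is omitted.

```python
import json
from typing import Dict, Iterable


def build_bulk_body(index: str, documents: Iterable[dict]) -> str:
    """Build an NDJSON _bulk payload in which each document's source UID is its _id.

    An `index` action creates the document or fully replaces an existing one
    with the same _id, so re-sending the same batch is idempotent.
    """
    lines = []
    for doc in documents:
        lines.append(json.dumps({"index": {"_index": index, "_id": doc["uid"]}}))
        lines.append(json.dumps(doc))
    return "\n".join(lines) + "\n"  # the bulk API expects a trailing newline


def apply_bulk_locally(store: Dict[str, dict], body: str) -> None:
    """Simulate `index` action semantics against an in-memory dict (testing only)."""
    rows = body.strip("\n").split("\n")
    for action_line, source_line in zip(rows[::2], rows[1::2]):
        doc_id = json.loads(action_line)["index"]["_id"]
        store[doc_id] = json.loads(source_line)


# Usage: processing the same batch twice yields the same two documents.
batch = [{"uid": "job-1", "title": "Data Engineer"},
         {"uid": "job-2", "title": "Site Reliability Engineer"}]
body = build_bulk_body("jobs", batch)
index_state: Dict[str, dict] = {}
apply_bulk_locally(index_state, body)
apply_bulk_locally(index_state, body)  # replay, e.g. an SQS redelivery
```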
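Steps 4 and 8 disable replicas and refresh during the bulk load and then restore them. In OpenSearch, these are the dynamic index settings `number_of_replicas` and `refresh_interval`. A value of `"-1"` disables refresh, and the settings are applied with `PUT /<index>/_settings`. The Python sketch below only builds the two request bodies. The serving values (one replica and a `1s` refresh interval) are assumptions, not SEEK's configuration.

```python
import json


def hydration_settings() -> dict:
    """Settings applied before bulk loading: no replicas and refresh disabled."""
    return {"index": {"number_of_replicas": 0, "refresh_interval": "-1"}}


def serving_settings(replicas: int = 1, refresh_interval: str = "1s") -> dict:
    """Settings restored after hydration; the defaults here are illustrative assumptions."""
    return {"index": {"number_of_replicas": replicas, "refresh_interval": refresh_interval}}


if __name__ == "__main__":
    # Each body would be sent as the payload of: PUT /<index>/_settings
    print(json.dumps(hydration_settings()))
    print(json.dumps(serving_settings()))
```

Creating replicas after the load means each document is indexed once during hydration and then copied at the shard level, rather than indexed on every copy.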
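In steps 5 and 6, the source documents are split into batches of 1,000 and one message is published per batch. The Python sketch below shows the batch planning. It assumes the source store can be read by numeric offset and that each message carries the hydration cutoff timestamp from step 3. Both are assumptions: the post does not say how batches are addressed. The `BatchMessage` fields are hypothetical.

```python
import json
import math
from dataclasses import asdict, dataclass
from typing import List

BATCH_SIZE = 1_000  # batch size SEEK found optimal after experimentation


@dataclass(frozen=True)
class BatchMessage:
    """Hypothetical message body describing one batch to hydrate."""
    batch_number: int
    offset: int            # assumed: the source store supports offset-based reads
    limit: int
    cutoff_timestamp: str  # hydrate documents up to this point in time


def plan_batches(document_count: int, cutoff_timestamp: str,
                 batch_size: int = BATCH_SIZE) -> List[BatchMessage]:
    """Split document_count documents into batches of at most batch_size."""
    batch_count = math.ceil(document_count / batch_size)
    return [
        BatchMessage(
            batch_number=i,
            offset=i * batch_size,
            limit=min(batch_size, document_count - i * batch_size),
            cutoff_timestamp=cutoff_timestamp,
        )
        for i in range(batch_count)
    ]


# Usage: 2,500 documents become three messages (1,000 + 1,000 + 500).
messages = plan_batches(2_500, "2021-09-01T00:00:00Z")
bodies = [json.dumps(asdict(m)) for m in messages]  # one message body per batch
```

Each body could then be published to the SNS topic, for example with boto3's `sns.publish(TopicArn=..., Message=body)`, and fanned out to the SQS queue that invokes the Lambda function.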
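In step 7, an empty SQS queue signals that hydration is complete. The Python sketch below polls the `ApproximateNumberOfMessages` and `ApproximateNumberOfMessagesNotVisible` queue attributes. It uses an injected client whose `get_queue_attributes` call matches the boto3 SQS client. Both counts are approximate, so the sketch requires several consecutive empty readings before declaring the queue drained. That threshold and the polling interval are our assumptions.

```python
import time
from typing import Any, Callable


def queue_is_drained(sqs_client: Any, queue_url: str) -> bool:
    """True when no messages are waiting or in flight (counts are approximate)."""
    attrs = sqs_client.get_queue_attributes(
        QueueUrl=queue_url,
        AttributeNames=[
            "ApproximateNumberOfMessages",            # visible, waiting to be processed
            "ApproximateNumberOfMessagesNotVisible",  # received, still being processed
        ],
    )["Attributes"]
    return all(int(value) == 0 for value in attrs.values())


def wait_for_hydration(sqs_client: Any, queue_url: str,
                       required_empty_checks: int = 3,
                       poll_seconds: float = 30.0,
                       sleep: Callable[[float], None] = time.sleep) -> None:
    """Block until the queue reads empty on several consecutive polls."""
    consecutive = 0
    while consecutive < required_empty_checks:
        consecutive = consecutive + 1 if queue_is_drained(sqs_client, queue_url) else 0
        if consecutive < required_empty_checks:
            sleep(poll_seconds)
```

In a pipeline, `sqs_client` would be `boto3.client("sqs")`. Injecting the client and `sleep` keeps the logic testable without AWS access.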
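In step 9, the API layer routes live traffic to the new cluster when A/B testing is enabled. The post does not say how users are assigned to each cluster. A common approach, sketched below in Python, hashes a stable user or session ID into one of 100 buckets. Each user then consistently sees the same cluster, while a configurable share of traffic goes to the candidate. The function and cluster names are hypothetical.

```python
import hashlib


def choose_cluster(user_id: str, candidate_percent: int,
                   current: str = "search-current",
                   candidate: str = "search-candidate") -> str:
    """Deterministically route a user to the current or candidate cluster.

    The same user_id always hashes to the same bucket (0-99), so a user does
    not switch between clusters from one request to the next.
    """
    if not 0 <= candidate_percent <= 100:
        raise ValueError("candidate_percent must be between 0 and 100")
    bucket = int(hashlib.sha256(user_id.encode("utf-8")).hexdigest(), 16) % 100
    return candidate if bucket < candidate_percent else current
```

Raising `candidate_percent` step by step (for example 10, 50, then 100) gradually shifts traffic to the new cluster, and setting it to 0 rolls traffic back.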

Improved time to market and other outcomes

With Amazon OpenSearch Service, SEEK was able to automate an entire cluster, complete with Kibana, in a secure, managed environment. If testing didn't produce the desired results, the team could scale the cluster horizontally or vertically within minutes, using different instance options. This let them run stress tests quickly to find the sweet spot between performance and cost for the workload.
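The goal of these stress tests is to find the cheapest cluster configuration that still meets performance targets. The Python sketch below shows that final selection step. It assumes each test run records an hourly cost and a p99 latency. The configuration labels and numbers in the usage example are made up for illustration and are not SEEK's results.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class LoadTestResult:
    config: str             # hypothetical label, e.g. instance type and node count
    hourly_cost_usd: float
    p99_latency_ms: float


def cheapest_meeting_target(results: Iterable[LoadTestResult],
                            max_p99_ms: float) -> Optional[LoadTestResult]:
    """Return the lowest-cost configuration whose p99 latency meets the target."""
    passing = [r for r in results if r.p99_latency_ms <= max_p99_ms]
    return min(passing, key=lambda r: r.hourly_cost_usd, default=None)


# Usage with made-up results from three stress-test runs.
results = [
    LoadTestResult("config-a (6 nodes)", 4.80, 85.0),
    LoadTestResult("config-b (3 nodes)", 2.40, 140.0),
    LoadTestResult("config-c (4 nodes)", 3.20, 110.0),
]
best = cheapest_meeting_target(results, max_p99_ms=120)
```

Returning `None` when no configuration qualifies makes it explicit that the test matrix needs larger or different instances.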

“By integrating Amazon OpenSearch Service with our existing CI/CD tools, we're able to fully automate our search function deployments, which accelerated software delivery time,” says Abdulsalam Alshallah, APAC Software Architect. “The newfound confidence in the modern stack, alongside improved engineering practices, allowed us to mitigate the risk of changes, improving our time to market by 89% with zero impact to uptime.”

With the adoption of Amazon OpenSearch Service, other teams also saw improvements, including the following:

  • Common Vulnerabilities and Exposures (CVEs) dropped to zero because Amazon OpenSearch Service handles the underlying hardware security updates on SEEK's behalf, improving their security posture
  • Improved availability with the Amazon OpenSearch Service Availability Zone awareness feature

Conclusion

The managed capabilities of Amazon OpenSearch Service have helped SEEK Asia improve customer experience through speed and automation. With the undifferentiated heavy lifting removed, teams can deploy changes to their search engines quickly. Customers get the latest search features faster, which ultimately contributes to SEEK's purpose of helping people live more productive working lives and helping organisations succeed.

To learn more about Amazon OpenSearch Service, see Amazon OpenSearch Service features, the Developer Guide, or Introducing OpenSearch.


About the Authors

Fabian Tan is a Principal Solutions Architect at Amazon Web Services. He has a strong passion for software development, databases, data analytics and machine learning. He works closely with the Malaysian developer community to help them bring their ideas to life.

Hans Roessler is a Principal Software Architect at SEEK Asia. He is passionate about new technologies and about upgrading legacy systems to newer stacks. Staying in touch with the latest technologies is one of his passions.

Abdulsalam Alshallah (Salam) is a Software Architect at SEEK and was previously a Lead Cloud Architect for SEEK Asia. Salam has always been passionate about new technologies, Cloud, Serverless and DevOps, and about eliminating wasted time, effort and resources. He is also one of the leaders of the AWS User Group Malaysia.
