This page was created programmatically; to read the article in its original location you can visit the link below:
https://aws.amazon.com/blogs/big-data/building-a-modern-lakehouse-architecture-yggdrasil-gamings-journey-from-bigquery-to-aws/
and if you wish to remove this article from our site please contact us
This is a guest post by Edijs Drezovs, CEO and Founder of GOStack, Viesturs Kols, Data Architect at GOStack, and Krisjanis Beitans, Senior Data Engineer at GOStack, in partnership with AWS.
Yggdrasil Gaming develops and publishes online casino games globally, processing large amounts of real-time gaming data for game performance analytics, player behavior insights, and industry intelligence. As Yggdrasil's system grew, managing dual-cloud environments created operational overhead and limited their ability to implement advanced analytics initiatives. This challenge became critical ahead of the launch of the Game in a Box solution on AWS Marketplace, which drives increases in data volume and complexity.
Yggdrasil Gaming reduced multi-cloud complexity and built a scalable analytics foundation by migrating from Google BigQuery to AWS analytics services. In this post, you'll discover how Yggdrasil Gaming transformed their data architecture to meet growing business demands. You will learn practical strategies for migrating from proprietary systems to open table formats such as Apache Iceberg while maintaining business continuity.
Yggdrasil worked with GOStack, an AWS Partner, to migrate to an Apache Iceberg-based lakehouse architecture. The migration helped reduce operational complexity and enabled real-time gaming analytics and machine learning.
Yggdrasil faced several critical challenges that prompted their migration to AWS:
Yggdrasil worked with GOStack, an AWS Partner Network (APN) partner, to design their new lakehouse architecture. The following diagram shows a high-level overview of this architecture.
Yggdrasil successfully migrated from Google BigQuery to a data lakehouse architecture using Amazon Athena, Amazon EMR, Amazon Simple Storage Service (Amazon S3), AWS Glue Data Catalog, AWS Lake Formation, Amazon Elastic Kubernetes Service (Amazon EKS), and AWS Lambda. Their strategic approach aims to reduce multi-cloud complexity while building a scalable foundation for their Game in a Box solution and specific AI/ML initiatives like personalized game recommendations and fraud detection.
The combination of Amazon S3, Apache Iceberg, and Amazon Athena allowed Yggdrasil to move away from provisioned, always-on compute models. Amazon Athena's pay-per-query pricing charges only for data scanned, eliminating idle compute costs during off-peak periods. Internal cost modeling performed during the evaluation phase indicated that this architecture could reduce analytics system costs by 30–50% compared to the compute-based warehouse pricing models of other solutions, particularly for bursty workloads driven by game launches, tournaments, and seasonal traffic. By adopting AWS-native analytics services, Yggdrasil reduced operational complexity through native integration with AWS Identity and Access Management (IAM), Amazon EKS, and AWS Lambda, helping simplify security, governance, and automation across the analytics system.
The solution centers on a modern lakehouse architecture built on Amazon S3, which provides durable and cost-efficient storage for Iceberg tables in Apache Parquet format. The Apache Iceberg table format provides ACID transactions, schema evolution, and time travel capabilities while maintaining an open standard. AWS Glue Data Catalog serves as the central technical metadata repository, while Amazon Athena acts as the serverless query engine used by dbt-athena and for ad hoc data exploration. Amazon EMR runs Yggdrasil's legacy Apache Spark application in a fully managed environment, and AWS Lake Formation provides centralized security and governance for data lakes, allowing fine-grained access control at the database, table, column, and row levels.
The migration followed a phased approach:
The first phase of the migration focused on building a solid foundation for the new data lakehouse architecture on AWS. The goal was to create a scalable, secure, and cost-efficient environment that could support analytical workloads with open data formats and serverless query capabilities.
GOStack provisioned an Amazon S3-based data lake as the central storage layer, providing virtually unlimited scalability and fine-grained cost control. This storage-compute separation enables teams to decouple ingestion, transformation, and analytics processes, with each component scaling independently using the most appropriate compute engine.
To establish dataset interoperability and discoverability, the team adopted AWS Glue Data Catalog as the unified metadata repository. The catalog stores Iceberg table definitions and makes schemas accessible across services such as Amazon Athena and Apache Spark workloads on Amazon EMR. Most datasets, both batch and streaming, are registered here, enabling consistent metadata visibility across the lakehouse.
The data is stored in Apache Iceberg tables on Amazon S3, chosen for its open table format, ACID transaction support, and powerful schema evolution features. Yggdrasil required ACID transactions for consistent financial reporting and fraud detection, schema evolution to accommodate rapidly changing gaming data models, and time travel queries to align with regulatory audit requirements.
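To make the time travel capability concrete, the sketch below issues an Iceberg time-travel query through Amazon Athena. The database, table, and result-bucket names are hypothetical placeholders, not Yggdrasil's actual schema; the `FOR TIMESTAMP AS OF` clause is Athena's documented syntax for reading an Iceberg table as of a past point in time.

```python
# Sketch: querying an Iceberg table as of a past timestamp via Amazon Athena.
# All names (gaming_raw.bets, the S3 results bucket) are illustrative placeholders.

def build_time_travel_sql(database: str, table: str, timestamp_iso: str) -> str:
    """Return an Athena SQL statement that reads an Iceberg table at a past instant."""
    return (
        f"SELECT * FROM {database}.{table} "
        f"FOR TIMESTAMP AS OF TIMESTAMP '{timestamp_iso}'"
    )

def submit_query(sql: str, output_s3: str) -> str:
    """Submit the query and return its execution ID (needs AWS credentials at runtime)."""
    import boto3  # imported lazily so the SQL builder stays dependency-free
    athena = boto3.client("athena")
    response = athena.start_query_execution(
        QueryString=sql,
        ResultConfiguration={"OutputLocation": output_s3},
    )
    return response["QueryExecutionId"]
```

Queries like this let auditors reproduce exactly what a report saw at a given moment, which is how time travel supports the regulatory audit requirements mentioned above.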
GOStack built a custom schema conversion and table registration service. This internal tool converts source-system Avro schemas into Iceberg table definitions and manages the creation and evolution of raw-layer tables. By controlling schema translation and table registration directly, the team ensures that metadata stays consistent with the source systems and provides predictable, versioned schema evolution aligned with ingestion needs.
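The core idea behind such a conversion service can be sketched as a mapping from Avro field types to Iceberg/Athena column types. This is a simplified assumption of how the internal tool might work, not its actual implementation; the type table and the handling of nullable unions are deliberately minimal.

```python
# Minimal sketch of Avro-to-Iceberg schema translation. Real Avro schemas also
# carry logical types, nested records, arrays, and maps, which are omitted here.
import json

AVRO_TO_ICEBERG = {
    "string": "string",
    "boolean": "boolean",
    "int": "int",
    "long": "bigint",
    "float": "float",
    "double": "double",
    "bytes": "binary",
}

def avro_type_to_iceberg(avro_type) -> str:
    # Nullable Avro fields arrive as unions like ["null", "double"];
    # take the first non-null branch as the column type.
    if isinstance(avro_type, list):
        avro_type = [t for t in avro_type if t != "null"][0]
    return AVRO_TO_ICEBERG.get(avro_type, "string")

def avro_schema_to_columns(avro_schema_json: str):
    """Turn an Avro record schema into (column_name, iceberg_type) pairs."""
    schema = json.loads(avro_schema_json)
    return [(f["name"], avro_type_to_iceberg(f["type"])) for f in schema["fields"]]
```

From pairs like these, the service can emit `CREATE TABLE` or `ALTER TABLE ADD COLUMN` statements, which is what makes schema evolution predictable and versioned.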
The initial setup delivered the following components:
The redesign of the previous GCP setup helped deliver price-performance improvements. Yggdrasil reduced ingestion and processing costs by roughly 60% while also lowering operational overhead through a more direct, event-driven pipeline.
After establishing the lakehouse architecture, the next step focused on enabling real-time data ingestion from Yggdrasil's operational databases into the raw data layer of the lakehouse. The objective was to capture and deliver transactional changes as they occur, ensuring that downstream analytics and reporting reflect the most up-to-date information.
To achieve this, GOStack deployed Debezium Server Iceberg, an open-source project that integrates change data capture (CDC) directly with Apache Iceberg tables. It was deployed as Argo CD applications on Amazon EKS, using Argo's GitOps-based model for reproducibility, scalability, and seamless rollouts.
This architecture provides an efficient ingestion pathway, streaming data changes directly from the source system's outbox tables into the Apache Iceberg tables registered in the AWS Glue Data Catalog and physically stored on Amazon S3, bypassing the need for intermediate brokers or staging services. By writing data in the Iceberg table format, the ingestion layer maintained transactional guarantees and fast query availability through Amazon Athena.
Because Yggdrasil's source systems emitted outbox events containing Avro data, the team implemented a custom outbox-to-Avro transformation within Debezium. The outbox table stored two key components:
The custom transformation module combined these components into valid Avro records before persisting them into the target Iceberg tables. This approach preserved schema fidelity and verified compatibility with downstream processing tools.
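The combination step can be illustrated with a simplified sketch: each outbox row carries the writer's Avro schema alongside the event payload, and the transformation merges them into one self-describing record. JSON encoding and the field names here are assumptions for readability; the production path works with binary Avro inside Debezium.

```python
# Simplified sketch of the outbox-to-Avro idea. Column names ("avro_schema",
# "payload") are illustrative placeholders, and payloads are JSON-encoded here
# rather than binary Avro.
import json

def assemble_event(outbox_row: dict) -> dict:
    """Merge the stored schema and payload into one record for the Iceberg sink."""
    schema = json.loads(outbox_row["avro_schema"])
    payload = json.loads(outbox_row["payload"])
    # Validate the payload against the declared schema before persisting,
    # which is what preserves schema fidelity downstream.
    expected = {f["name"] for f in schema["fields"]}
    missing = expected - payload.keys()
    if missing:
        raise ValueError(f"payload missing fields declared in schema: {missing}")
    return {"schema_name": schema["name"], "record": payload}
```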
To dynamically route incoming change events, the team leveraged Debezium's event router configuration. Each record was routed to the appropriate Apache Iceberg table (backed by Amazon S3) based on topic and metadata rules, while table schemas and partitioning were governed on the AWS Glue side to maintain stability and alignment with the lakehouse's data organization standards.
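Conceptually, the routing rule reduces to a small function from topic and metadata to a destination table name. The naming convention below (one raw-layer table per aggregate type) is an assumption for illustration, mirroring what Debezium's event router expresses declaratively in configuration.

```python
# Sketch of topic/metadata-based routing to target Iceberg tables. The
# "aggregate_type" metadata key and the "raw.<name>" convention are
# hypothetical, chosen to illustrate the mechanism.

def route_event(topic: str, metadata: dict) -> str:
    """Map a change event to its destination Glue/Iceberg table name."""
    # Prefer an explicit aggregate type from the outbox metadata; otherwise
    # fall back to the last segment of the source topic name.
    aggregate = metadata.get("aggregate_type", topic.rsplit(".", 1)[-1])
    return f"raw.{aggregate.lower()}"
```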
This setup helped deliver low-latency ingestion, with end-to-end streaming from database outbox to S3-based Iceberg tables in near real time. The team managed operations end to end on Amazon EKS using Helm charts deployed via Argo CD in a GitOps model for fully declarative, version-controlled operations. ACID-compliant Iceberg writes ensured that partially written data could not corrupt downstream analytics. The modular transformation logic allowed future expansion to new source systems or event formats without rearchitecting the ingestion pipeline.
This Debezium Server solution provides fast, real-time data ingestion. GOStack considers it an interim architecture. In the long run, the ingestion pipeline will evolve to use Amazon Managed Streaming for Apache Kafka (Amazon MSK) as the central event backbone. Debezium connectors will act as producers, publishing change events to Apache Kafka topics, while Apache Flink applications will consume, process, and write data into Iceberg tables.
This planned evolution toward a Kafka-based streaming architecture ensures Yggdrasil's lakehouse remains not only scalable and cost-efficient today, but also future-ready, capable of supporting richer streaming analytics and broader data integration scenarios as the organization grows.
Once real-time data ingestion was established, GOStack turned its focus to modernizing the data transformation layer. The goal was to simplify the transformation logic, reduce operational overhead, and unify the orchestration of analytical workloads across the new AWS-based lakehouse.
GOStack adopted a lift-and-shift approach for some of Yggdrasil's data pipelines to support a fast and low-risk transition away from GCP. The lightweight Cloud Run applications that previously handled extraction tasks, pulling data from file shares, SharePoint, Google Sheets, and various third-party APIs, were re-implemented using AWS Lambda. These Lambda functions now integrate with the same external systems and write data directly into Iceberg tables.
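One way such a Lambda can append rows to an Iceberg table is through an Athena `INSERT INTO` statement, since Athena supports DML on Iceberg tables. The sketch below is hypothetical: the table, column, and bucket names are placeholders, and the extraction from the external system is elided.

```python
# Hypothetical sketch of a re-implemented extraction Lambda: receive rows and
# append them to an Iceberg table via Athena DML. Names are placeholders.

def build_insert_sql(table: str, rows: list) -> str:
    """Render an Athena INSERT INTO statement for an Iceberg table."""
    values = ", ".join(
        f"('{r['game_id']}', {r['sessions']}, DATE '{r['day']}')" for r in rows
    )
    return f"INSERT INTO {table} (game_id, sessions, day) VALUES {values}"

def handler(event, context):
    """Lambda entry point (requires boto3 and AWS credentials at runtime)."""
    import boto3  # available in the Lambda runtime; imported lazily here
    rows = event["rows"]  # extraction from the external system elided
    boto3.client("athena").start_query_execution(
        QueryString=build_insert_sql("curated.game_daily_stats", rows),
        ResultConfiguration={"OutputLocation": "s3://example-athena-results/"},
    )
```

For larger batches, writing Parquet to S3 and committing through the Iceberg APIs would avoid per-row SQL, but the Athena path keeps a lightweight extraction function serverless end to end.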
For more complex processing, previous Apache Spark applications running on Dataproc were migrated to Amazon EMR with minimal code changes. This preserved the existing transformation logic while benefiting from the managed scaling capabilities of EMR and improved cost control on AWS.
Over time, these processes will be gradually refactored and consolidated into containerized workflows on the EKS cluster, fully orchestrated by Argo Workflows. This phased migration allows Yggdrasil to move workloads to AWS quickly and decommission GCP resources sooner, while still leaving room for continuous improvement and modernization of the data system over time.
Finally, several analytical transformations that previously lived as BigQuery stored procedures and scheduled queries were rebuilt as modular dbt models executed with dbt-athena. This shift made transformation logic more transparent, maintainable, and version-controlled, improving both developer experience and long-term governance.
With the ingestion pipelines migrated to AWS, GOStack turned its focus to simplifying and modernizing Yggdrasil's analytical transformations. Rather than replicating the previous stored-procedure-driven approach, the team rebuilt the transformation layer using dbt to help improve maintainability, lineage visibility, orchestration, and long-term governance.

As part of this redesign, several data models were reshaped to fit the new lakehouse architecture. The most significant effort involved rewriting a critical Spark-based financial transformation into a set of SQL-driven dbt models. This shift not only aligned the logic with the lakehouse design but also removed the need for long-running Spark clusters, helping generate operational and cost savings.

For the curated data layers, replacing the legacy warehouse, GOStack consolidated numerous scheduled queries and stored procedures into structured dbt models. This provides standardized, version-controlled transformations and clear lineage across the analytical stack.
Orchestration was simplified as well. Previously, coordination was split between Apache Airflow for Spark workloads and scheduled queries for analytical transformations, creating operational friction and dependency risks. In the new architecture, Argo Workflows on Amazon EKS orchestrates dbt models centrally, consolidating the transformation logic within a single workflow engine. While most transformations still run on time-based schedules today, the system now supports event-driven execution through Argo Events, providing the opportunity to progressively adopt trigger-based workflows as the transformation layer evolves.
This unified orchestration framework can bring several benefits:
Through this transformation, Yggdrasil successfully unified its data lakes and warehouses into a modern lakehouse architecture, powered by open data formats, serverless query engines, and modular transformation logic. The move to dbt and Athena not only simplified operations but also helped pave the way for faster iteration, simpler governance, and better developer productivity across the data environment.
While performance tuning is an ongoing journey, as part of the transformation redesign GOStack made a few performance-oriented tweaks to ensure Athena queries can be fast and cost-efficient. The Apache Iceberg tables were stored in Parquet with ZSTD compression, providing strong read performance and reducing the amount of data scanned by Athena.
Partitioning strategies were also aligned to actual access patterns using Iceberg's native partitioning. Raw data zones were partitioned by ingestion timestamp, enabling efficient incremental processing. Curated data used business-driven partition keys, such as player or game identifiers and date dimensions, to help optimize analytical queries. These designs ensured that Athena could prune unneeded data and consistently scan only the relevant partitions.
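As an illustration of how such strategies translate into Athena DDL for Iceberg tables, the sketch below uses Iceberg partition transforms (`day`, `bucket`) rather than Hive-style partition columns. The table, column, and location names are assumptions, not Yggdrasil's real schema.

```python
# Sketch of Athena CREATE TABLE DDL for a partitioned Iceberg table. The `day`
# and `bucket` transforms are Iceberg-native; names are illustrative.

def create_table_ddl(table: str, location: str) -> str:
    return (
        f"CREATE TABLE {table} ("
        "event_id string, player_id bigint, game_id string, event_ts timestamp"
        ") "
        "PARTITIONED BY (day(event_ts), bucket(16, player_id)) "
        f"LOCATION '{location}' "
        "TBLPROPERTIES ('table_type'='ICEBERG', 'format'='parquet', "
        "'write_compression'='zstd')"
    )
```

Because the transforms are declared in table metadata, queries filtering on `event_ts` or `player_id` prune partitions automatically, without the query author referencing any partition column.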
Iceberg's native partitioning features, including transforms such as bucketing and time slicing, replace traditional Hive partitioning patterns. Because Iceberg manages partitions internally in its metadata layer, not all Glue or Athena partition constructs apply. Relying on Iceberg's native partitioning helps provide predictable pruning and consistent performance across the lakehouse without introducing legacy Hive behaviors.
To handle the high volume of small files produced by real-time ingestion, GOStack enabled AWS Glue Iceberg compaction. This automatically merges small Parquet files into larger segments, helping improve query performance and reduce metadata overhead without manual intervention.
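Glue exposes this as a per-table optimizer; a minimal sketch of enabling it via boto3 is shown below. The account ID, role ARN, and table names are placeholders, and the role must be granted the Glue optimizer permissions documented by AWS.

```python
# Sketch: enabling AWS Glue automatic compaction on one Iceberg table via the
# table-optimizer API. Account ID, role ARN, and names are placeholders.

def compaction_request(database: str, table: str, role_arn: str) -> dict:
    """Build the parameters for glue.create_table_optimizer."""
    return {
        "CatalogId": "111122223333",  # placeholder AWS account ID
        "DatabaseName": database,
        "TableName": table,
        "Type": "compaction",
        "TableOptimizerConfiguration": {"roleArn": role_arn, "enabled": True},
    }

def enable_compaction(database: str, table: str, role_arn: str) -> None:
    import boto3  # requires AWS credentials at runtime
    boto3.client("glue").create_table_optimizer(
        **compaction_request(database, table, role_arn)
    )
```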
The team adopted AWS Lake Formation as the primary governance layer for the curated zone of the lakehouse, leveraging Lake Formation hybrid access mode to manage fine-grained permissions alongside existing IAM-based access patterns. This hybrid mode provides an incremental and flexible pathway to adopt Lake Formation without forcing a full migration of legacy permissions or internal pipeline roles, making it an ideal fit for Yggdrasil's phased modernization strategy.
Lake Formation offers centralized authorization, supporting database-, table-, column-, and, critically for Yggdrasil, row-level permissions. These capabilities are essential because of the company's multi-tenant operating model:
With Lake Formation hybrid access mode, tenant-specific row-level access policies are consistently enforced across Amazon Athena, AWS Glue, and Amazon EMR, without introducing breaking changes to existing IAM-based workloads. This allowed Yggdrasil to implement strong governance for external clients while keeping internal operations stable and predictable.
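Row-level policies of this kind are expressed in Lake Formation as data cells filters. The sketch below builds a tenant-scoped filter; the database, table, and `tenant_id` column are illustrative assumptions about how such a multi-tenant schema might look.

```python
# Sketch of a tenant-scoped row filter using Lake Formation data cells filters.
# Account ID, database, table, and the tenant column are placeholders.

def tenant_row_filter(tenant_id: str) -> dict:
    """Build the TableData payload for lakeformation.create_data_cells_filter."""
    return {
        "TableCatalogId": "111122223333",  # placeholder AWS account ID
        "DatabaseName": "curated",
        "TableName": "game_rounds",
        "Name": f"tenant-{tenant_id}",
        # Only rows matching the expression are visible to principals
        # granted this filter.
        "RowFilter": {"FilterExpression": f"tenant_id = '{tenant_id}'"},
        "ColumnWildcard": {},  # expose all columns; restrict here if needed
    }

def apply_filter(tenant_id: str) -> None:
    import boto3  # requires AWS credentials at runtime
    boto3.client("lakeformation").create_data_cells_filter(
        TableData=tenant_row_filter(tenant_id)
    )
```

Once a filter is granted to a tenant-facing principal, Athena, Glue, and EMR all enforce the same row predicate, which is what makes the policy consistent across engines.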
Internally, Lake Formation is also used to grant the Analytics team and BI tools targeted access to curated datasets, kept simple but centrally managed to maintain consistency and reduce administrative overhead.
For ingestion and transformation workloads, the team continues to rely on IAM roles and policies. Services such as Debezium, dbt, and Argo Workflows require broad but managed access to raw and intermediate storage layers, and IAM provides a straightforward, least-privilege mechanism for granting these permissions without involving Lake Formation in the internal pipeline path.
By adopting Lake Formation in hybrid access mode and combining it with IAM for internal services, Yggdrasil established a governance model that balances strong security with operational flexibility, enabling the lakehouse to scale securely as the business grows.
The new lakehouse, built on Amazon Athena, Amazon S3, and AWS Glue Data Catalog, now underpins advanced analytics and AI/ML use cases such as player behavior modeling, predictive game recommendations, and fraud detection.
The optimized lakehouse design allows Yggdrasil to rapidly onboard new analytics workloads and business use cases, helping deliver measurable outcomes:
Yggdrasil Gaming's migration journey illustrates how organizations can successfully transition from proprietary analytics systems to an open, flexible lakehouse architecture. By following a phased approach guided by AWS Well-Architected Framework principles, Yggdrasil maintained business continuity while establishing a modern foundation for their data needs.
Based on this experience, several lessons emerged to help guide your own move to an AWS-based lakehouse:
To learn more about building modern lakehouse architectures, refer to "The lakehouse architecture of Amazon SageMaker".