Big Data | Cloud Academy Blog
https://cloudacademy.com/blog/category/big-data/

How Cloud Academy Is Using Cube to Win the Data Challenge
https://cloudacademy.com/blog/the-headless-bi-tool-cloud-academy-uses/ (Mon, 07 Nov 2022)

Discover how Cloud Academy presents data to both internal and external stakeholders through the use of a Headless BI tool.

At Cloud Academy, we manage a lot of data every day. We pull data from different sources, such as feedback, events, and platform usage; we need to collect it, apply transformations, and finally present it to our internal stakeholders and our customers.

Because of the variety of data we provide, we recently implemented Cube, a Headless BI solution. It allows us to handle, model, and present data through our BI tools smoothly.

What is a Headless BI Tool?

A Headless BI tool is a set of components that acts as middleware between your data warehouse and your business intelligence applications. It provides four main data-related components without the need to design and implement custom solutions, and it lets us work with data without hitting the data warehouse directly, instead going through the abstraction layer the tool provides.

The name "Headless" comes from the fact that the tool lets us work with the data but deliberately delegates the task of displaying and visualizing it. That is the responsibility of the BI tool.

A Headless BI tool offers the following four components:

  • Modeling – It allows us to take the data in the data warehouse and model it by defining dimensions and measures, usable by the BI tool
  • Security – It allows us to declare who can access the data, and to restrict what is shown if needed
  • Caching – It provides us with a caching layer that stores the results of recent queries and speeds up subsequent ones
  • APIs – It provides us with one or multiple APIs (such as RESTful and SQL) to query the data

Why does Cloud Academy use Cube?

Data Modeling

We have a lot of data coming from multiple sources, both internal and external to Cloud Academy. We work with structured, semi-structured, and unstructured data. So, before querying the data from our BI tool, we need an approach to prepare and model it effectively.

Cube allows us to create final entities composed of dimensions (attributes) and measures (aggregations of a particular numeric column), exposed through the API.

This way, the complete collected data remains in our data warehouse, where we can query it anytime for analysis purposes, while specific modeled data is made usable by the BI tool through the APIs.
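As a rough illustration of dimensions and measures, here is the idea sketched in plain Python. Cube itself defines these in its own schema files; the rows, field names, and numbers below are invented for the example.

```python
from collections import defaultdict

# Sample platform-usage rows; field names are illustrative only.
rows = [
    {"company": "A", "course": "AWS Basics", "minutes": 30},
    {"company": "A", "course": "AWS Basics", "minutes": 45},
    {"company": "B", "course": "Python 101", "minutes": 20},
]

def aggregate(rows, dimension, measure):
    """Group rows by a dimension (attribute) and sum a numeric measure."""
    totals = defaultdict(int)
    for row in rows:
        totals[row[dimension]] += row[measure]
    return dict(totals)

print(aggregate(rows, "company", "minutes"))  # {'A': 75, 'B': 20}
```

The BI tool only ever sees the `company` dimension and the `minutes` measure, never the underlying rows.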

Security

We handle data that can be publicly accessible, data related to specific customers, and data containing PII (Personally Identifiable Information). Because of this, data access security is one of the most important components that Cube offers us.

By using Cube, we have been able to implement the following security patterns:

  • Row Level Security – Depending on the user or entity accessing the data, some rows are filtered out. Suppose you are company A and want data about your users' platform usage. You should not be able to access the usage data of company B, so rows related to company B are excluded when you explore the data.
  • Data Masking – Depending on the user or entity that is accessing the data, some attributes could be masked because of permissions assigned to the user or entity. This mainly happens when the attribute contains personal information such as a name, an email, or a phone number.
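Cube configures these patterns declaratively through its security context; purely for illustration, both can be sketched in plain Python over invented records:

```python
records = [
    {"company": "A", "user": "alice@a.com", "minutes": 30},
    {"company": "B", "user": "bob@b.com", "minutes": 20},
]

def row_level_security(records, company):
    # Only return rows belonging to the requesting company.
    return [r for r in records if r["company"] == company]

def mask_email(record):
    # Mask the local part of an email address, keeping the domain.
    local, _, domain = record["user"].partition("@")
    masked = dict(record)
    masked["user"] = "***@" + domain
    return masked

# What company A's viewer would see: only its own rows, with PII masked.
visible = [mask_email(r) for r in row_level_security(records, "A")]
print(visible)  # [{'company': 'A', 'user': '***@a.com', 'minutes': 30}]
```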

Caching

We provide a lot of insights and answers through the data to our internal stakeholders and Cloud Academy customers.

Every hour, a lot of queries are performed on our data, and most of them require the data warehouse to process millions of rows to succeed. Because of that, having a caching layer is crucial for us to avoid overloading the warehouse with common queries.

Cube provides us with a caching layer that temporarily stores the results of recently executed queries, so the same query won't hit the data warehouse again if it is re-executed within a short time.

Leveraging the caching layer allows us to get the result of the query faster than hitting the data warehouse, and this translates into faster loading of the charts that our users visit. The caching layer provided us with a performance boost of about 70% when hit.
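The idea can be sketched with a tiny time-based query cache in Python. This is a simplification for illustration only; Cube's actual caching layer is far more sophisticated.

```python
import time

class QueryCache:
    """Tiny TTL cache, in the spirit of a Headless BI caching layer."""
    def __init__(self, ttl_seconds):
        self.ttl = ttl_seconds
        self.store = {}

    def get(self, query, run):
        now = time.monotonic()
        hit = self.store.get(query)
        if hit and now - hit[1] < self.ttl:
            return hit[0], True           # served from cache
        result = run(query)               # hit the warehouse
        self.store[query] = (result, now)
        return result, False

calls = []
def run_query(q):
    calls.append(q)  # stands in for an expensive warehouse query
    return f"rows for {q}"

cache = QueryCache(ttl_seconds=60)
first, cached1 = cache.get("SELECT 1", run_query)
second, cached2 = cache.get("SELECT 1", run_query)
# The second identical query never reaches the warehouse.
```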

Data Access through APIs

Last but not least, we need to access our data quickly and through standard interfaces. Cube provides APIs to achieve this goal.

Depending on the tool you use, you may have multiple options, such as a RESTful API or a SQL interface.
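For illustration, here is the kind of JSON query a Cube-style REST endpoint accepts, built client-side in Python. The cube and member names (`Usage.totalMinutes`, `Usage.company`) are invented for this sketch; consult Cube's documentation for the exact query format.

```python
import json

# Illustrative Cube-style REST query; the cube/member names are made up.
query = {
    "measures": ["Usage.totalMinutes"],
    "dimensions": ["Usage.company"],
    "filters": [
        {"member": "Usage.company", "operator": "equals", "values": ["A"]}
    ],
    "limit": 100,
}

# The JSON payload a client would send to a load endpoint.
payload = json.dumps(query)
print(payload)
```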

In our scenario, we have two access points to the data:

  • Internal BI – It’s represented by all the data that we show to our internal stakeholders by using our internal BI tool: Superset.
  • Customer Facing BI – It’s represented by the dashboards we provide to Cloud Academy enterprise customers. They are built by using the Recharts library and served through a React.js front-end application. Through these dashboards, they get insights about the Cloud Academy platform usage of their employees.

Cloud Academy is a data-driven company, and Cube has really helped us work with and manage the data that is crucial for our business.

It allowed us to model data in order to define dimensions and measures to be exposed to the BI tools. This way we’ve been able to mask all the underlying tables and logic.

By using Cube, we have been able to implement a strong security layer composed of data masking and row-level security, both for our internal usage (Superset) and for enterprise customer usage (React application).

Cube allowed us to significantly increase query speed, and so reduce the time needed to load a dashboard. It also provides pre-aggregations, a useful caching layer that we can manage to keep our queries fast (unlike the native caching layer, which evicts cached data after a certain period).

We always try to stay up to date with new approaches and tools that let us offer our stakeholders a better experience with the available data. The ideal scenario is one where stakeholders can easily access the data they need, in the way they need it, while we remain compliant with security and privacy constraints.

How Do We Transform and Model Data at Cloud Academy?
https://cloudacademy.com/blog/how-do-we-transform-and-model-data-at-cloud-academy/ (Tue, 07 Jun 2022)


“Data is the new gold”: a common phrase over the last few years. For all organizations, data and information have become crucial to making good decisions for the future and having a clear understanding of how they’re making progress — or otherwise.

At Cloud Academy, we strive to make data-informed decisions. That’s why we decided to invest in building a stack that can help us be as data-driven as possible.

Business users constantly leverage our data in order to monitor platform performance, build and extract reports, and see how our customers use the Cloud Academy platform. We also provide data to our enterprise customers, which lets them keep track of their platform usage.

Where do we get data?

Before modeling and using data, we need to extract useful data in order to create valuable information. We have two primary data sources:

  • Internal: Provided by our operational systems that expose the core data of the Cloud Academy platform, like labs and course sessions.
  • External: Provided by external services and platforms that we use to collect data about events not strictly related to the Cloud Academy platform, like currency exchange rates. 

The data extraction process

The first step of modeling and using data begins with extracting it from different sources and putting it in a central repository where it can be accessed: the Data Warehouse.

Once information’s stored in this analytical database, we can perform queries and retrieve helpful information for our internal users and customers alike.

The Data Warehouse is logically split into two main parts:

  • Staging Area: This is where raw data (extracted from sources as-is) and pre-processed data (extracted and then lightly processed, for example to apply common formats) are stored. This data is not in its final form, so it is not used by end users.
  • Curated Area: This is where the transformed and modeled data is stored. We model data using the dimensional modeling technique, following the Kimball approach. This data is pulled directly by end users through SQL queries or through Business Intelligence tools.

To extract data from sources, our team built data transformation pipelines. We use the programming language Python to create pipeline logic, then we use Prefect to orchestrate the pipelines.

In some cases, raw data is extracted and placed in the staging area; in others, a few transformations need to be performed. This process produces pre-processed data.
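As a rough sketch of that pre-processing step in plain Python (the record shape, field names, and formats below are invented; this is not our actual pipeline code, which runs inside Prefect-orchestrated flows):

```python
import datetime

# Raw event from an external source (shape is illustrative).
raw = {"user": "  Alice ", "ts": "07/06/2022 18:08", "event": "LAB_STARTED"}

def preprocess(record):
    """Apply common formats before landing in the staging area."""
    ts = datetime.datetime.strptime(record["ts"], "%d/%m/%Y %H:%M")
    return {
        "user": record["user"].strip().lower(),  # trim and lowercase
        "ts": ts.isoformat(),                    # normalize to ISO 8601
        "event": record["event"].lower(),
    }

staged = preprocess(raw)
print(staged)  # {'user': 'alice', 'ts': '2022-06-07T18:08:00', 'event': 'lab_started'}
```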


What’s next in the data transformation process?

As soon as raw and pre-processed data are in the staging area of the Data Warehouse, we can apply data transformations and data modeling. To do so, we use dbt (data build tool), a powerful resource that allows engineers to work and model data in the Data Warehouse.

dbt lets us declare desired transformations by defining SQL scripts, known as dbt models. Each model represents a new step in the transformation of data from the staging area to the curated area.

How are dbt models organized?

While performing the transformations, we consider a few different model categories:

  • Staging Models: Initial models that represent raw and pre-processed data extracted through Prefect pipelines.
  • Intermediate Models: Models that take data from staging models or from intermediate models (if multiple levels are defined).
  • Marts Models: Models that take data from the intermediate models and that represent the tables in the curated area. The marts models are usually dimensions (containing partially denormalized data about entities) and facts (containing normalized data about events that occurred).
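In dbt, each of these layers is a SQL model; the flow can be sketched in plain Python purely to show the layering (the table and column names here are invented):

```python
# Staging model: raw session rows as loaded by the pipelines.
stg_sessions = [
    {"user_id": 1, "course": "AWS Basics", "minutes": 30},
    {"user_id": 1, "course": "AWS Basics", "minutes": 45},
    {"user_id": 2, "course": "Python 101", "minutes": 20},
]

def int_sessions(rows):
    # Intermediate model: filter out zero-length sessions.
    return [r for r in rows if r["minutes"] > 0]

def fct_course_usage(rows):
    # Mart (fact) model: one row per (user, course) with total minutes.
    out = {}
    for r in rows:
        key = (r["user_id"], r["course"])
        out[key] = out.get(key, 0) + r["minutes"]
    return out

mart = fct_course_usage(int_sessions(stg_sessions))
print(mart)  # {(1, 'AWS Basics'): 75, (2, 'Python 101'): 20}
```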

Data quality with dbt

Data quality describes how consistent data is and how well it is structured to solve problems or serve specific purposes.

dbt is a great tool for this because it also lets us run tests on both source data and the data we produce. Several native tests are available (you can find the available tests here) and can easily be used and integrated where you define the data structure. Of course, you can also define custom tests if none of the built-in ones fits your testing scenario.

Here’s an example of dbt test usage:

dbt test usage

By leveraging dbt tests, we ensure that the data we provide to our business users always passes quality and consistency checks.
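For illustration, here is what dbt's built-in `not_null` and `unique` tests check, sketched as plain Python over sample rows. dbt itself declares these tests in YAML alongside the model; the rows and column names below are made up.

```python
rows = [
    {"user_id": 1, "email": "a@example.com"},
    {"user_id": 2, "email": "b@example.com"},
]

def check_not_null(rows, column):
    # Equivalent in spirit to dbt's not_null test on a column.
    return all(r.get(column) is not None for r in rows)

def check_unique(rows, column):
    # Equivalent in spirit to dbt's unique test on a column.
    values = [r[column] for r in rows]
    return len(values) == len(set(values))

print(check_not_null(rows, "email"), check_unique(rows, "user_id"))
```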


Finally, using data!

After the data extraction and transformation process is completed, data is available in the curated area of the Data Warehouse.

Business users can leverage this data to get reports and insights. The most common scenario involves leveraging data through a Business Intelligence tool to build dashboards. In some situations, reports need to be extracted, so we provide them by performing custom queries on the Data Warehouse.

What is Data Engineering? Skills, Tools, and Certifications
https://cloudacademy.com/blog/what-is-data-engineering-skills-tools-and-certifications/ (Fri, 28 Jan 2022)

Data engineering is the process of designing and implementing solutions to collect, store, and analyze large amounts of data. This process is generally called "Extract, Transform, Load," or ETL.

The data then gets prepared in formats to be used by people such as business analysts, data analysts, and data scientists. The format of the data will be different depending on the intended audience. Part of the Data Engineer’s role is to figure out how to best present huge amounts of different data sets in a way that an analyst, scientist, or product manager can analyze.

What does a data engineer do?

A data engineer is an engineer who creates solutions from raw data. A data engineer develops, constructs, tests, and maintains data architectures.

Let’s review some of the big picture concepts as well finer details about being a data engineer.

What does a data engineer do – the big picture

Data engineers will often be dealing with raw data. Many of them are already familiar with SQL or have experience working with databases, whether they’re relational or non-relational. They need to understand common data formats and interfaces, and the pros and cons of different storage options.

Data engineers are responsible for transforming data into an easily accessible format, identifying trends in data sets, and creating algorithms to make the raw data more useful for business units.

Data engineers have the ability to convert raw data into useful insights. Data scientists are very grateful for the work done by data engineers to prepare data so that they can turn it into insights. 

What does a data engineer do – details

The architecture that a data engineer will be working on can include many components. The architecture can include relational or non-relational data sources, as well as proprietary systems and processing tools. The data engineer will often add services and tools to the architecture in order to make sure that data scientists have access to it at all times.

Earlier we mentioned ETL, or extract, transform, load. Data engineers use the data architecture they create to extract, transform, and load raw data. Raw data often contains errors and anomalies such as duplicates, incompatibilities, and mismatches. Data engineers review the data and suggest ways to improve its quality and reliability.
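A toy sketch of that kind of cleanup in Python, removing duplicates and reconciling mismatched values (the rows and the fix-up mapping are invented):

```python
raw = [
    {"id": 1, "country": "US"},
    {"id": 1, "country": "US"},   # duplicate row
    {"id": 2, "country": "usa"},  # inconsistent encoding of the same value
]

COUNTRY_FIXES = {"usa": "US"}     # illustrative reconciliation mapping

def clean(rows):
    seen, out = set(), []
    for r in rows:
        if r["id"] in seen:
            continue              # drop duplicates by primary key
        seen.add(r["id"])
        c = r["country"]
        out.append({**r, "country": COUNTRY_FIXES.get(c.lower(), c)})
    return out

print(clean(raw))  # [{'id': 1, 'country': 'US'}, {'id': 2, 'country': 'US'}]
```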

How data engineers use tools – a basic example

An import tool that can handle data could be used to import only the rows that meet certain criteria, ignoring the rest. A criterion might be that a field is a string, a number, or a particular length.

You could use a Python script to convert or replace specific characters within those fields. Creative data engineers will be able to identify problems in data quickly and will be able to find the best solutions.
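A minimal sketch of both ideas in Python; the field name and the length criterion are illustrative:

```python
def normalize_phone(value):
    # Strip formatting characters so only digits remain.
    return "".join(ch for ch in value if ch.isdigit())

def keep_row(row):
    # Import only rows whose phone field has a plausible length.
    return 7 <= len(row["phone"]) <= 15

rows = [{"phone": normalize_phone(p)} for p in ["(555) 123-4567", "n/a"]]
rows = [r for r in rows if keep_row(r)]
print(rows)  # [{'phone': '5551234567'}]
```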

How to become a data engineer

Here’s a 6-step process to become a data engineer:

  1. Understand data fundamentals
  2. Get a basic understanding of SQL
  3. Have knowledge of regular expressions (RegEx)
  4. Have experience with the JSON format
  5. Understand the theory and practice of machine learning (ML)
  6. Have experience with programming languages

1. Understand data fundamentals

Understanding how data is stored and structured by machines is a foundation. For example, it's good to be familiar with the different data types in the field, including:

  • variables
  • varchar and char (character types)
  • int and other numeric types

Also, name-value pairs and how they are stored in SQL structures are important concepts. These fundamentals will give you a solid foundation in data and datasets.

2. Get a basic understanding of SQL

A second requirement is to have a basic understanding of SQL. Knowing SQL means you are familiar with the different relational databases available, their functions, and the syntax they use.
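As a minimal hands-on example, Python's built-in `sqlite3` module lets you practice the basics without installing a database server (the table and column names here are invented):

```python
import sqlite3

# In-memory database: create a table with typed columns and query it.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name VARCHAR(50))")
conn.executemany(
    "INSERT INTO users (id, name) VALUES (?, ?)",
    [(1, "Alice"), (2, "Bob")],
)
count = conn.execute("SELECT COUNT(*) FROM users").fetchone()[0]
print(count)  # 2
conn.close()
```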

3. Have knowledge of regular expressions (RegEx)

It is essential to be able to use regular expressions to manipulate data. Regular expressions can be used in all data formats and platforms.
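For example, Python's `re` module can pull structured fields out of a semi-structured log line (the log format below is made up):

```python
import re

log_line = "2022-01-28 user=alice action=login status=200"

# Extract key=value pairs from the line.
pairs = dict(re.findall(r"(\w+)=(\w+)", log_line))
print(pairs)  # {'user': 'alice', 'action': 'login', 'status': '200'}
```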

4. Have experience with the JSON format

It’s good to have a working knowledge of JSON. For example, you can learn about how JSONs are integral to non-relational databases – especially data schemas, and how to write queries using JSON.

5. Understand the theory and practice of machine learning (ML)

A good understanding of the theory and practice of machine learning will be helpful as you architect solutions for data scientists. This is important even if working with ML models may not be part of your daily routine. 

6. Have experience with programming languages

Having programming knowledge is more of an option than a necessity but it’s definitely a huge plus. Some good options are Python (because of its flexibility and being able to handle many data types), as well as Java, Scala, and Go.

Soft skills for data engineering

Problem solving using data-driven methods

It’s key to have a data-driven approach to problem-solving. Rely on the real information to guide you.

Ability to communicate complex concepts and visualize them

Data engineers will need to collaborate with customers, integration partners, and internal technology teams. Sharing your insights with people of various backgrounds and understanding what they are trying to convey is always helpful.

Strong sense of ownership

Take initiative to solve complex problems, because that’s what this job is about. You will be given a framework and a job goal – it’s up to you to figure out the rest.

Tools and resources for data engineering

The following are tools that are important in data engineering, along with courses that explain how to use them and where they fit in the job role.

Databases, relational and non-relational

It’s good to understand database architectures. Some basic real-world examples are:

The Basics of Data Management, Data Manipulation and Data Modeling

This learning path focuses on common data formats and interfaces. The path will help you understand common data formats you might encounter as a data engineer, starting with SQL.

MongoDB Configuration and Setup

Watch an example of deploying MongoDB to understand its benefits as a database system.

Apache Kafka

Amazon MSK and Kafka Under the Hood

Apache Kafka is an open-source streaming platform. Learn about the AWS-managed Kafka offering in this course to see how it can be more quickly deployed.

Apache Spark

Apache Spark

In this lecture, you’ll learn about Spark – an open-source analytics engine for data processing. You learn how to set up a cluster of machines, allowing you to create a distributed computing engine that can process large amounts of data. 

Apache Hadoop

Introduction to Google Cloud Dataproc

Hadoop allows for distributed processing of large datasets. In this course, get the real-world context of Hadoop as a managed service as part of Google Cloud Dataproc, used for big data processing and machine learning. 

Python

Introduction to Python for Programmers

Python is a powerful and flexible scripting language that can handle many data types. This course is a quick summary of the theory and practice of Python for users who already have a programming background.

Java

Introduction to Java

Java is a robust, complicated, but proven language that forms the base of much data engineering work. This learning path covers the basics of Java, including syntax, functions, and modules. These courses teach you how to write Java applications and functions using object-oriented principles.

Data Engineering Certifications

There’s probably no better way to both educate yourself in data engineering and prove to employers what you know than through certifications from the big cloud providers. 

The following certification learning paths provide updated, proven, detailed methods to learn everything you need about data engineering.

AWS Data Engineering

AWS Certified Data Analytics Specialty (DAS-C01) Certification Preparation

This learning path covers the five domains of the exam. This includes understanding the AWS data analysis services and how they interact with one another. It also explains how AWS data services fit into the data lifecycle of collection, storage, processing, and visualization.

Azure Data Engineering

Foundational Certification

DP-900 Exam Preparation: Microsoft Azure Data Fundamentals

This certification path is for technical as well as non-technical individuals who wish to show their knowledge about core data concepts and how these are implemented using Azure data services.

You’ll learn about the basics of data concepts, relational and non-relational Azure data, and how to describe an Azure analytics workload.

Associate Certifications 

DP-203 Exam Preparation: Data Engineering on Microsoft Azure

This certification learning path will teach you how to manage and deploy a range of Azure data solutions. This exam will test your knowledge in four areas: designing and building data storage; designing, developing and managing data processing; designing and monitoring security; and optimizing data storage.

Google Cloud Data Engineering

Google Data Engineer Exam – Professional Certification Preparation

This certification learning path helps you understand and work with BigQuery, Google's managed cloud data warehouse. You'll learn how to load, query, and process your data. You'll learn how to use machine learning for analysis, build data pipelines, and use Bigtable for big data applications.

What is Big Data Engineering?

You can call it a buzzword, but big data engineering is the umbrella term for everything in the data engineering world. Typically in big data engineering, you have to interface with huge data processing systems and databases in large-scale computing environments. These environments are often cloud-based to take advantage of the distributed, scalable nature of cloud solutions, as well as turnkey set up in order to speed up development and deployment.

What’s the difference between a data engineer and a data scientist?

These roles are sometimes combined, but they work best as two distinct functions: data scientists and data engineers require different skills and have distinct tasks.

Data engineers design, build, test, and maintain data architectures. Data scientists organize and manipulate data in order to gain insight. In short, data engineers are responsible for preparing the data that scientists use.

Although things aren’t always perfectly separated in the real world, think of the data engineer as the controller of the data and its infrastructure, and the data scientist as the specialist who gathers insights from the curated data.

Both roles are important and need cooperation and respect to work well together and achieve a successful outcome.

How much do data engineers make?

As of early 2022, some of the top salary sites online show the following numbers for an average base salary for a data engineering role in the United States:

  • Glassdoor: $112,000
  • Payscale: $93,000
  • Indeed: $116,000

FAQ 

Is data engineering easy?

It’s not easy, and it’s not the easiest role to get into, but it’s definitely interesting and rewarding. Some industry experts complain that there is a huge gap between self-educated and actual-world data engineers. This is due to a lack of relevant college or university programs that prepare you for data engineering.

Do you need math for data engineering?

In general, data engineering is not math-heavy. It would be helpful to be familiar with statistics and probability to get a sense of what data scientists in your team will do. A good understanding of problem solving from a software engineering and cloud architect point of view will help for daily issues.

Are data engineers in demand?

Yes, data engineers are in demand, especially as companies realize that the hype of data science is built on the foundation of work from data engineers. The most marketable data engineers have multi-cloud experience to help them make an impact in any environment.

Do data engineers code?

Yes, data engineers can expect to do a lot of data pipeline coding so they should be comfortable with programming languages and debugging issues. It’s helpful to be fluent in SQL, Python, and R.

Cloud Migration Series (Step 4 of 5): Adopt a Cloud-First Mindset
https://cloudacademy.com/blog/cloud-migration-series-step-4-of-5-adopt-a-cloud-first-mindset/ (Thu, 20 May 2021)

This is part 4 of a 5-part series on best practices for enterprise cloud migration. Released weekly from the end of April to the end of May 2021, each article will cover a new phase of a business’s transition to the cloud, what to be on the lookout for, and how to ensure the journey is a success.

Be sure to subscribe to our blog to be notified when new content goes live!

Adopt a Cloud-First Mindset

Why should you adopt a cloud-first mindset? As you start on your cloud migration goals, it’s going to be key to repeatedly come back to the fundamentals as people lose focus or enthusiasm. We’re talking about the phenomenon known as the “trough of sorrow” where a dip in initiative and output occurs, usually when novelty wears off and learning becomes harder.

To be prepared, it’s important to refer back to your fundamental goals. The new mentality at your organization will consist of the following key understandings, which will eventually become part of the healthy baseline culture:

  • Being aware of cloud tools and services — this is the what in terms of the cloud: understanding the landscape, the providers, and the language so you can have a conversation with stakeholders, vendors, and consultants.
  • Being able to use cloud services effectively, economically, and safely — this is the how, part 1: Safety and economy first. You don’t want to cause any more problems or spend budget unwisely.
  • Understanding how to apply cloud tools to solve customer problems — this is the how, part 2: The other side of being able to use cloud tools safely is being able to apply them to real-world problem-solving.
  • Being able to use your cloud services together to create new products and solutions — this is the why: This piggybacks on the previous point…whether it’s a service or a product, you’re going to want to take advantage of the opportunity to make something new first in the market.

Getting through these steps is a cycle, an ongoing process. As mentioned at the beginning, one part of the journey that always happens as a large group progresses on a big change is the “trough of sorrow.” 

There are always going to be peaks and valleys in your progress, and the faster you can get back to your goals, the better — so how do you do that? There are a few ways to build back momentum.

Certification campaigns

Our practitioners and instructors at Cloud Academy have had lots of opportunities to interact with enterprises at various parts of their cloud transition. What we’ve seen in other engagements is that internally commenced certification campaigns can provide personal motivation to individual team members. These cert campaigns help team members commit to gaining new domain knowledge.

Further, when leaders can incentivize people to get core certifications for a desired specialty — i.e., AI certifications on Azure in order to develop solutions — this helps the employees’ own professional development while at the same time putting the overall team in a strong position to tackle new product initiatives.

Product teams brought up to speed

Remember that it’s not just IT and engineering that needs to be educated and fluent in cloud. For your team to gain maximum benefit from the full offering of cloud technologies, you’ll need all product-oriented roles to be aware of how cloud-based services such as artificial intelligence tools and services can be applied to solve business issues. As a starting point, evaluate areas where you may be struggling with data. Can a turnkey AI solution help here? If not, what types of changes would need to take place in order to leverage some of the positives of a managed service (and later down the road, a custom-built service)?

Unsure of some of your staff’s levels? Get back to basics

We’re going to continue beating the drum on this, but it’s helpful: you will need to assess your entire team’s skill levels, and continually monitor as time goes on. This sounds like a lot of work, which is why a programmatic approach that can scale with your organization is the ideal way to keep learning momentum moving across the board.

Just make sure that the learnings are outcome-oriented: whether it’s certifications, specific job roles, or specific technical tasks, the learning paths that your employees take should have a clear goal.

The last challenge: working together

Once you take all these steps you’ll get back on track, and all teams will become aware of how to solve business issues. But let’s be honest: there will always be the challenge of getting people to work together.

To put it bluntly, the big challenge is how do we get people in cross-functional teams to work together with all these new services and practices?  

The answer is to run practical exercises. These need to be cross-functional projects that are engaging, quick to start, and quick to yield results.

Engagement + Collaboration = Progress

What would be a practical exercise and why would it help? Find a partner with domain expertise both in cloud and upskilling employees, and you’ll be able to get guidance to create blueprint-like exercises that can be applied to projects.

These blueprints can work across teams, with contributions across IT, Engineering, Product, and collaboration between all managers. Further, the bar for the learning experiences gets higher every year, with learners wanting the ability to have very little friction when learning (think about coding labs starting in 30 seconds vs. 3 hours of installing and troubleshooting software). It also makes more and more sense to learn together with coworkers, as opposed to the current single-player experience. This shows that the area of collaboration is set to be huge and will drive a better user experience and faster, more communal learning.

Conclusion

This type of engagement at the individual and communal level, with real-time tracking and modification of progress and goals, is going to be key to helping any team stay focused as they go through their cloud transformation. Understand that no matter the size of the organization, there will need to be attention paid to learners and managers as their attention naturally wavers. Having a concrete plan before addressing this makes it infinitely easier to address natural speed bumps and challenges as they occur in the learning and cloud migration process.

Ready, Set…Cloud!

If you’d like a preview of what our blog series will cover in a more in-depth fashion, this guide is a great start. We share some best practices and insights gained from our experience helping many organizations on their journey to cloud success. Use it as a helpful reminder to stay on track.


The post Cloud Migration Series (Step 4 of 5): Adopt a Cloud-First Mindset appeared first on Cloud Academy.

]]>
0
Cloud Migration Series (Step 3 of 5): Assess Readiness https://cloudacademy.com/blog/cloud-migration-series-step-3-of-5-assess-readiness/ https://cloudacademy.com/blog/cloud-migration-series-step-3-of-5-assess-readiness/#respond Thu, 13 May 2021 12:31:50 +0000 https://cloudacademy.com/?p=46323 Be sure to subscribe to our blog to be notified when new content goes live! Assessing your ready state Last time, we talked about detailed planning that forms the foundation of your cloud migration effort. Now it’s time to really understand what your team can do, and how you can...

The post Cloud Migration Series (Step 3 of 5): Assess Readiness appeared first on Cloud Academy.

]]>
This is part 3 of a 5-part series on best practices for enterprise cloud migration. Released weekly from the end of April to the end of May 2021, each article will cover a new phase of a business’s transition to the cloud, what to be on the lookout for, and how to ensure the journey is a success.

Be sure to subscribe to our blog to be notified when new content goes live!

Assessing your ready state

Last time, we talked about detailed planning that forms the foundation of your cloud migration effort. Now it’s time to really understand what your team can do, and how you can help them get to a place where they’re empowered to enact a big organizational shift.

Making a big change like this is a huge undertaking when you work in a large organization. There’s little guidance out there for executive teams on just how much learning effort is required to turn the ship around. Have your Learning & Development team set a clear direction on the training goals; the result will go a long way toward ensuring that things run smoothly.

Why is buy-in from L&D important? Some of the biggest mistakes happen in the early stages of cloud adoption, when enthusiasm for new tasks and appetite for experimentation are high but knowledge of best practices is low or nonexistent. Your staff doesn’t necessarily need to pass a cloud certification exam before they can spin up instances, which means it’s easy for people to make expensive mistakes early on if they don’t know or understand the proper operating procedures. L&D needs to be the first line of defense at this stage.

So our first goal needs to be understanding where your team stands with regard to technical skills, followed by a plan to get them to a ready state to accomplish the technical goals that you laid out in the planning phase.

Readiness is both a state and a process

You’ll start by accurately pinpointing your team’s baseline skills. Once they’re on track, you’ll want to keep upskilling them so they stay up to date with constantly changing cloud technology.

This brings us to a main point. As you assess your team, grow their skills, and make sure they’re on track, you’ll realize that this is actually a process of continuous development. Just like agile software teams that work in sprints and constantly deploy code updates to certain parts of an app in order to stay current and update products, the readiness stage of your cloud migration is the same. It will be a culture change for your organization, one with positive impacts because your teams will be ready, and will be motivated by individual and group growth and success.

Here are some highlights of what a readiness program should entail:

Pre-assessment

To get started, you’ll need to determine your team’s current skills and capabilities. This means creating a breakdown for each individual member that can be updated as they progress through their learnings.

Skill Assessment

You don’t want to be in the dark about your team’s abilities, so you’ll need a full view in order to build on each member’s development. This helps you predictably upskill talent so you know exactly when they’re ready to tackle that new project. Useful metrics will include strengths, weaknesses, and areas of opportunity.

When you think about it, the ROI of your tech stack is only as strong as the team members operating it. You’ll need to use the insights from multiple skill assessments to paint a picture of broader skill coverage, ideally at the individual, team, and organization levels.

Dashboards

Yes, dashboards are all the rage and will continue to be so, for good reason. A well-designed dashboard helps you cleanly separate signal from noise.

Ideally your dashboard to monitor organization readiness will contain the following items:

  • The ability to assign assessments
  • Current skill levels
  • A team structure that mirrors your internal organization

These seem like simple things, but they often get overlooked. It’s key to focus on these tenets because as multiple programs roll out in tandem, it’s easy to get overwhelmed with too much information.

Cloud-first: four key goals to keep in mind

It bears repeating that there will be a lot going on in your organization. This readiness and learning stage of a cloud migration is going to set you up for success as you adopt a cloud-first mindset. 

The new mentality at your organization will consist of the following key understandings, which will eventually become part of the healthy baseline culture.

  1. Being aware of cloud tools and services
  2. Being able to use cloud services effectively, economically, and safely
  3. Understanding how to apply cloud tools to solve customer problems
  4. Being able to use your cloud services together to create new products and solutions

Conclusion

This is exciting stuff — changes, new technology, opportunity for growth. In part 4 of our series, we’ll delve into maintaining that cloud-first mentality, especially when the going gets tough and real-life challenges invariably pop up to get in your way.

Ready, Set…Cloud!

If you’d like a preview of what our blog series will cover in a more in-depth fashion, this guide is a great start. We share some best practices and insights gained from our experience helping many organizations on their journey to cloud success. Use it as a helpful reminder to stay on track.


The post Cloud Migration Series (Step 3 of 5): Assess Readiness appeared first on Cloud Academy.

]]>
0
Cloud Migration Series (Step 2 of 5): Start Planning https://cloudacademy.com/blog/cloud-migration-series-step-2-of-5-start-planning/ https://cloudacademy.com/blog/cloud-migration-series-step-2-of-5-start-planning/#respond Thu, 06 May 2021 14:14:16 +0000 https://cloudacademy.com/?p=46292 Be sure to subscribe to our blog to be notified when new content goes live! Start planning your cloud migration You’ve defined your cloud strategy. You understand why you want to take a journey to the cloud. Now that the high-level strategy is done, let’s focus on the nitty-gritty and...

The post Cloud Migration Series (Step 2 of 5): Start Planning appeared first on Cloud Academy.

]]>
This is part 2 of a 5-part series on best practices for enterprise cloud migration. Released weekly from the end of April to the end of May 2021, each article will cover a new phase of a business’s transition to the cloud, what to be on the lookout for, and how to ensure the journey is a success.

Be sure to subscribe to our blog to be notified when new content goes live!

Start planning your cloud migration

You’ve defined your cloud strategy. You understand why you want to take a journey to the cloud. Now that the high-level strategy is done, let’s focus on the nitty-gritty and the details. It’s time to make a plan for making the big change. Don’t get discouraged, as it’s going to be a lot of work. That’s why we’ve created this series to help you.

Budget for the journey

You’re going to have to sit down and have a realistic conversation with your technical leads and executives about how much a cloud migration is going to cost. Let’s be clear, a migration from legacy systems is not like flipping a switch, nor is it a one-time affair. Rather, like most change that lasts, it’s a process with milestones. 

As you go through the continuum of change, you’ll eventually leave some or all of your legacy systems behind (depending on your views on hybrid cloud). But as that is happening, you’ll need to maintain old systems, scale new systems, and make sure you have the right employee talent to keep things progressing forward. 

A few key facets to consider:

  • Current budget and fiscal year considerations
  • Ideal timeframe for transition
  • Product roadmap, short- and long-term

Check your infrastructure

Before you dive in and take the whole organization with you, let’s take a look at what’s being developed in the various groups in your business. Some of these might be more translatable to the cloud, such as lightweight mobile apps. But remember, there are so many cool technologies and buzzwords out there (Kubernetes, agile, data lakes, real-time everything) that you have to think hard about whether there’s really a business case for jumping in.

For example, maybe you run a monolithic app that’s been your bread and butter for years. Maybe it’s worth it to host the app in the cloud, but not change to microservices. Instead, since you’ve taken a good look at your product roadmap you might spin off a new product that can then benefit from some quick cloud-based solutions, such as turnkey managed services like AI libraries or data analysis.

Initial organizational alignment

The last thing you want is for people to leave meetings about your migration and have nothing happen. Fast forward six months: progress has been scattershot, morale is low, and momentum is zero. But let’s not focus on the negative…what are actionable steps you can take to guarantee forward movement?

In part one we talked about organizational buy-in. A good way to maintain internal accountability is to create a multi-disciplinary “Tiger Team” — a group of individuals who can meet to maintain focus and ensure that separate groups don’t become too siloed.

Modify this to your own needs and don’t bog people down with fluff, but do hold them accountable, whether it’s by unintrusive meetings or reports to leadership. Remember, this effort has to be supported from above, as a positive culture starting from the top can be contagious. 

Assess your team

How are you going to get your team from point A to point B? Will they be ready to not just take steps toward a migration, but toward creating products in a new way?

Your product, engineering, and IT teams are experienced in designing, creating, testing, and deploying monolithic applications. They probably have some experience with cloud technology already, and it’s almost certain that they have a deep curiosity to learn more: that’s part of the creator’s mindset, and that’s why they’re in this field.

Now you need to establish a baseline for where their skills are, how that aligns with your strategy and planning, and how to raise their skills to meet your strategy.

Develop a skills readiness plan

If you want to create an effective plan for your employees’ technical growth, you need a good way to assess their skills and develop them at scale. We’ll review this more in part three of this series, but here’s an overview of how it’s done.

Start by accurately pinpointing baseline skills

  • Test competence across multiple cloud platforms and technologies and track skill improvement
  • Test practical, hands-on tech skills
  • Streamline the assessment process with automated reminders
  • Understand where your team stands and how fast they’re growing

Quickly increase technical capabilities

  • Drive skill growth with hands-on cloud training programs built to master AWS, Azure, Google Cloud, and DevOps
  • Build and assign training plans with 10,000+ hours of up-to-date cloud training
  • Keep your team accountable with built-in reminders and weekly reports
  • Track progress and completion on a real-time dashboard

Confidently know when your team is ready

  • Measure practical expertise through skill reports based on hands-on assessments
  • Challenge your team with lab scenarios using actual AWS, Azure, and Google Cloud accounts
  • Establish a data-driven approach to learning and skills management
  • Understand your team’s strengths and identify skill gaps

Cloud adoption plan

Migrating to the cloud isn’t just about technology. The mindset and the repeated, tactful reminders to stay on course help to make all the difference.

You’ll find that one of the main challenges with transformation projects is keeping a clear sense of direction. Often there isn’t a dedicated project resource to run the transformation, which makes it easy for people on the ground to lose focus and end up working in silos or vacuums. Confusion can set in and the wheels can quickly fall off, with everyone losing interest and momentum.

What can make a significant difference is when learning and development have a clear program structure to drive the behavioral outcomes that leadership wants to see. This can be the backbone to build your cloud adoption plan on.

On top of this framework, you can start to build the basics of how to use cloud services both securely and efficiently. Then you will layer the most important factor on top: your people. Your people will use the framework to both increase their skills and collaborate with new (and sometimes scary) tools and technologies.

Conclusion

Now you have a plan for how to get your arms around this whole digital transformation. Next, we’ll dig deeper into your people and how to assess readiness for your team. You’ll learn about what to look for and how to know you’re all set for your cloud migration and for whatever technical projects you choose in the future.

Ready, Set…Cloud!

If you’d like a preview of what our blog series will cover in a more in-depth fashion, this guide is a great start. We share some best practices and insights gained from our experience helping many organizations on their journey to cloud success. Use it as a helpful reminder to stay on track.


The post Cloud Migration Series (Step 2 of 5): Start Planning appeared first on Cloud Academy.

]]>
0
Cloud Migration Series (Step 1 of 5): Define Your Strategy https://cloudacademy.com/blog/cloud-migration-1-define-your-strategy/ https://cloudacademy.com/blog/cloud-migration-1-define-your-strategy/#respond Thu, 29 Apr 2021 04:00:18 +0000 https://cloudacademy.com/?p=46267 If you’ve already locked in your strategy, have a look at what you should do next. Cloud Migration Series (Step 2 of 5): Start Planning Cloud Migration Series (Step 3 of 5): Assess Readiness Cloud Migration Series (Step 4 of 5): Adopt a Cloud-First Mindset Cloud Migration Series (Step 5...

The post Cloud Migration Series (Step 1 of 5): Define Your Strategy appeared first on Cloud Academy.

]]>
This is part 1 of a 5-part series on best practices for enterprise cloud migration. Released weekly from the end of April to the end of May 2021, each article will cover a new phase of a business’s transition to the cloud, what to be on the lookout for, and how to ensure the journey is a success.

If you’ve already locked in your strategy, have a look at what you should do next.

Be sure to subscribe to our blog to be notified when new content goes live!

Getting Started

Cloud migration is the process of moving IT components (data, applications, systems) from on-premises to the cloud, or from one cloud platform to another. Modern enterprises have embraced cloud computing for its superior speed and agility, cost savings, and always up-to-date, automated software releases. In fact, according to a survey conducted by Statista, about 50% of all corporate data is now stored in the cloud.

The evolution of technologies such as big data, machine learning, artificial intelligence, and the Internet of Things (IoT) has played a major role in enterprises making the shift to the cloud. At the same time, external factors like the COVID-19 pandemic, which forced companies to operate and onboard new employees in remote environments, have further accelerated cloud adoption. Gartner predicts worldwide public cloud spending to grow by 18.4% this year.

For those just getting started, you may be further along than you think. There’s a good chance that you already use the cloud in day-to-day operations without even realizing it. We’d bet that your email provider, file storage, and CRM are cloud-hosted applications — to name a few. But the cloud offers a lot more than that. Think about the possibilities of auto-scaling to meet any customer demand across the globe, or leveraging containers and microservices to modularize your applications, keeping your products running with high availability. These are just some of the benefits you’ll be able to take advantage of once you’re well on your journey.

Like any business transformation, getting started with cloud migration is often the most difficult and daunting challenge. There’s a lot to consider, which is why defining your strategy is a critical, yet often overlooked or underdeveloped, first step. Let’s dive in.

Identify Cloud Migration Goals

Most people are familiar with the generic benefits of cloud computing, but envisioning (and executing upon) them for your own organization is an entirely different story. Every business’s IT infrastructure, processes, and regulations are unique. And cloud value is perceived differently depending on industry and operating model.   

Before any steps are taken toward migrating to the cloud, tech teams must first understand how such a move fits into the business’s overall strategy. Are there existing problems that could be fixed through cloud adoption? Would moving certain processes to the cloud save costs? How can the cloud further enable innovation?

Defining concrete goals based on KPIs that are relevant to business objectives will lay the foundation for all future initiatives. After all, if you can’t measure success, what’s the point of investing in the first place? Here are some topline ideas to mull over when thinking about your goals:

  • Reinforcing business continuity plans
  • Reducing costs and avoiding vendor lock-in
  • Improving execution on your product roadmap
  • Delivering better customer support or user experience
  • Increasing revenues as a result of improved customer retention

Gaining Organizational Buy-in

Earlier, we intentionally said “business transformation” instead of “IT transformation.” In many cases, leadership leaves these kinds of decisions to technology teams. But when it comes to a cloud migration effort, an all-hands-on-deck approach is required — and that starts at the top.

Let’s think for a moment about the benefits associated with moving to the cloud. From improving flexibility to saving costs and streamlining the customer experience, it’s logical to connect your business goals with your digital transformation activities. Leverage the expertise of enterprise architects to analyze applications, identify potential quick wins, and develop a best-case proposal for migrating on a larger scale. 

By performing an analysis of the application portfolio and communicating anticipated benefits to the business, technology leaders can structure a measured approach to cloud adoption that is more likely to get backing from executives. From there, you can communicate next steps and value to affected parts of the organization. Strategy = defined!

Part 2 of this blog series will discuss what you must do during the planning phase.

Ready, Set…Cloud!

If you’d like a preview of what our blog series will cover in a more in-depth fashion, this guide is a great start. We share some best practices and insights gained from our experience helping many organizations on their journey to cloud success. Use it as a helpful reminder to stay on track.


The post Cloud Migration Series (Step 1 of 5): Define Your Strategy appeared first on Cloud Academy.

]]>
0
Introduction to Streaming Data https://cloudacademy.com/blog/introduction-to-streaming-data/ https://cloudacademy.com/blog/introduction-to-streaming-data/#respond Tue, 16 Jul 2019 12:50:40 +0000 https://cloudacademy.com/?p=36166 Designing a streaming data pipeline presents many challenges, particularly around specific technology requirements. When designing a cloud-based solution, an architect is no longer faced with the question, “How do I get this job done with the technology we have?” but rather, “What is the right technology to support my use...

The post Introduction to Streaming Data appeared first on Cloud Academy.

]]>
Designing a streaming data pipeline presents many challenges, particularly around specific technology requirements. When designing a cloud-based solution, an architect is no longer faced with the question, “How do I get this job done with the technology we have?” but rather, “What is the right technology to support my use case?”

In this blog post, we will walk through some initial scoping steps and walk through an example. To learn more about data streaming, check out Building a Solution Using Artificial Intelligence and IOT.

Streaming Data Course: Building a Solution Using Artificial Intelligence and IOT

The first questions

Regardless of exact details, four initial questions can be asked to start narrowing down what needs to be done:

1. What are the 3Vs of data?

This question is key to understanding the data itself. The 3Vs are Volume, Velocity, and Variety. Basically, the point of this question is to understand how much data there is (Volume), how fast it is coming in (Velocity), and how standardized it is (Variety). With this question, a vague statement such as “We have 10 GB/day of data” is refined into “We have 1,000,000 10 KB web-app logs a day that come in from our front-end web app.”
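To see why the refined statement is more useful, it helps to turn it into concrete numbers. The back-of-the-envelope sketch below (plain Python, decimal units, purely illustrative) converts “1,000,000 10 KB logs a day” into an aggregate volume and a sustained ingest rate:

```python
# Back-of-the-envelope sizing from the refined 3Vs statement:
# 1,000,000 web-app logs per day, ~10 KB each (decimal units assumed).
LOGS_PER_DAY = 1_000_000
LOG_SIZE_KB = 10
SECONDS_PER_DAY = 24 * 60 * 60

volume_gb_per_day = LOGS_PER_DAY * LOG_SIZE_KB / 1_000_000   # KB -> GB
logs_per_second = LOGS_PER_DAY / SECONDS_PER_DAY
throughput_kb_per_second = logs_per_second * LOG_SIZE_KB

print(f"Volume:   {volume_gb_per_day:.0f} GB/day")
print(f"Velocity: ~{logs_per_second:.1f} logs/s (~{throughput_kb_per_second:.0f} KB/s sustained)")
```

The same 10 GB/day arriving as a steady trickle of roughly a dozen small messages per second points toward very different technology choices than, say, a single 10 GB file landing at midnight.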

2. Where does the data need to go?

Data movement, especially between locations, can be a difficult challenge. This question is particularly relevant in the IoT field, where data is often generated at a remote site rather than in the home data center. Even for non-IoT use cases, understanding whether the data is on an application server, a core database, or even an FTP server is key. Knowing where the pipeline starts and stops will allow you to hook into the existing systems with ease.

3. What format and condition is the data in?

This question overlaps somewhat with the first one (the 3Vs of data), but it allows us to define with more granularity how much massaging is needed. Is the data in Avro, CSV, or already in a SQL database? If it is CSV data, are there improper commas in the free-text fields? This question lets us understand the requirements at the start of the pipe.
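As an illustration of the CSV pitfall mentioned above, the short Python sketch below uses a made-up log line whose free-text field contains a comma. A naive string split miscounts the columns, while the standard library’s csv module parses the quoted field correctly (assuming the producer quoted it in the first place):

```python
import csv
import io

# Hypothetical log line: the free-text "comment" field contains a comma.
raw = 'id,comment,status\n42,"slow load, retried twice",ok\n'

# Naive split: the comma inside the comment breaks the row into 4 fields.
naive_fields = raw.splitlines()[1].split(",")
print(len(naive_fields))  # 4

# csv module: the quoted field is parsed as one value, giving 3 fields.
rows = list(csv.reader(io.StringIO(raw)))
print(rows[1])  # ['42', 'slow load, retried twice', 'ok']
```

If the producer doesn’t quote its free-text fields, no parser can reliably recover the column boundaries, which is exactly why this question needs answering before the pipeline is designed.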

4. What do we need to do with the data?

Very rarely does a streaming pipeline simply need to pick up and drop off data. Often it needs to be enriched against a database, transformed, and cleaned before being dropped off. This question is potentially the most complicated to plan for, as it could involve deep introspection into the contents and syntax of the data itself.

Streaming Data Pipeline

Streaming data vs batch processing

The distinction between batch processing and streaming is contentious and increasingly outdated. Although there are academic differences between batch and stream processing, from a practical perspective architects should ask themselves, “How much lag is acceptable between data availability and output?”

Being able to define a requirement as “three seconds between data landing on the FTP server and being processed into MySQL” is much more descriptive and useful than “micro-batching.” Some technologies lend themselves to lower latencies (e.g., NiFi, Spark Streaming, Beam) and others to higher latencies (e.g., MapReduce), but they should be thought of as points on a sliding scale rather than distinctly different approaches.

Walking through an example

Recently, one use case we had to work through was helping a manufacturing company connect its machines to the cloud for centralized monitoring. At first, this can seem like a relatively large task, but it becomes much more manageable when broken down with scoping questions. (Disclaimer: Details and numbers have been changed for privacy reasons).

Scoping questions

What are the 3Vs of the data?

Production logs will be made available every 4-5 minutes and will average 500 KB in size. They are extremely standardized, as they are created by the instrument’s control software. Across all of the instruments, the expected data rate is below 200 MB/hour.

This gives us insight into what our system will need to handle. An incoming message size of 500 KB may be a problem for some systems, but the overall data flow rate is manageable. Standardized messages are generally easier to handle than non-standard messages.

Where does the data need to go?

Files are being generated on machines that sit on the factory floor. These instruments have an SFTP server on them that allows remote access. Ultimately, the data needs to be put into the central quality tracking system. This system has an RDBMS at its core and exposes a JDBC interface for integration.  

Both the pick-up and drop-off point are easily accessible with common, easy-to-use interfaces.

What format and condition is the data in?

The data is in a standardized XML format, generated consistently for each run.

What do we need to do with the data?

There is a lot of junk data in the machine logs. We need to strip out everything except a few fields pertaining to quality and throughput. This needs to be made available as quickly as possible to detect any errors or problems on the manufacturing floor.
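To make the stripping step concrete, here is a minimal Python sketch. The payload and field names below are hypothetical (real machine logs will differ), but the pattern of keeping a handful of quality and throughput fields from a much larger XML document is the same:

```python
import xml.etree.ElementTree as ET

# Hypothetical machine-log payload; real logs carry far more fields.
MACHINE_LOG = """\
<run id="R-1042">
  <diagnostics>hundreds of fields we do not need</diagnostics>
  <quality><defect_rate>0.02</defect_rate></quality>
  <throughput><units_per_hour>480</units_per_hour></throughput>
</run>
"""

def extract_quality_fields(xml_text):
    """Keep only the quality- and throughput-related fields."""
    root = ET.fromstring(xml_text)
    return {
        "run_id": root.get("id"),
        "defect_rate": float(root.findtext("quality/defect_rate")),
        "units_per_hour": int(root.findtext("throughput/units_per_hour")),
    }

print(extract_quality_fields(MACHINE_LOG))
```

In a real deployment, a step like this would run inside whatever streaming tool is chosen, with the extracted records written to the quality tracking system over its JDBC interface.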

Next steps

This blog post should be considered a primer to thinking about streaming data problems. For a deeper dive into case studies and a practical example, check out Calculated System’s Streaming Data Example or Automated Manufacturing Case Study.

Amazon Kinesis makes it easy to collect, process, and analyze real-time, streaming data. Cloud Academy offers a Getting Started with Amazon Kinesis Learning Path — objective-driven learning modules made up of Courses, Quizzes, Hands-on Labs, and Exams. In this Learning Path, you will learn the Amazon Kinesis components and use cases, and understand how to create solutions and solve business problems.

Cloud Academy: Getting Started with Amazon Kinesis

The post Introduction to Streaming Data appeared first on Cloud Academy.

]]>
0
New on Cloud Academy: Machine Learning on Google Cloud and AWS, Big Data Analytics, Terraform, and more https://cloudacademy.com/blog/new-on-cloud-academy-machine-learning-on-google-cloud-and-aws-big-data-analytics-terraform-and-more/ https://cloudacademy.com/blog/new-on-cloud-academy-machine-learning-on-google-cloud-and-aws-big-data-analytics-terraform-and-more/#respond Thu, 03 May 2018 03:00:05 +0000 https://cloudacademy.com/blog/?p=23411 A 2017 IDC White Paper “recommend[s] that organizations that want to get the most out of cloud should train a wide range of stakeholders on cloud fundamentals and provide deep training to key technical teams” (emphasis ours). Regular readers of the Cloud Academy blog know we’ve been talking about this for...

The post New on Cloud Academy: Machine Learning on Google Cloud and AWS, Big Data Analytics, Terraform, and more appeared first on Cloud Academy.

]]>
A 2017 IDC White Paper “recommend[s] that organizations that want to get the most out of cloud should train a wide range of stakeholders on cloud fundamentals and provide deep training to key technical teams” (emphasis ours). Regular readers of the Cloud Academy blog know we’ve been talking about this for a long time. Future-proofing your organization requires technical excellence, collective experience, business context, and shared understanding. In a word, culture.

Cloud Academy’s latest Learning Paths go broad and deep—covering CI/CD, machine learning, AI, big data, and even preparation for the first AWS certification designed for non-technical staff.

Here’s what’s new on Cloud Academy:

Solving Infrastructure Challenges with Terraform
DevOps and IT professionals managing infrastructure across public, private, and hybrid clouds can use this learning path to get started with Terraform. You’ll learn when to use it, the ins and outs of configurations, and how to work with providers and resources.
Solving Infrastructure Challenges with Terraform

Applying Machine Learning and AI Services on AWS
This learning path will help those with some machine learning experience to begin applying core Machine Learning and Artificial Intelligence services available on the AWS platform including Amazon Rekognition, Amazon Lex, Amazon EMR, and more.
Applying Machine Learning and AI Services on AWS

Machine Learning on Google Cloud Platform
Neural networks are the hottest approach to machine learning, and TensorFlow is the most popular toolkit for building them. Both come together on Google Cloud Machine Learning Engine. This learning path will help you get started building and training neural networks with TensorFlow and Google Cloud Machine Learning Engine.
Machine Learning on Google Cloud Platform

AWS Developer Services for CI/CD
AWS developer tools and services take much of the undifferentiated heavy lifting out of building a DevOps practice with a structured CI/CD process. This learning path focuses on the practical and hands-on knowledge teams need to understand, operate, and master running DevOps processes on AWS Developer Services.
AWS Developer Services for CI/CD

Cloud Practitioner Certification Preparation for AWS
For sales, marketing, finance, and other non-tech roles, this learning path trains the cloud basics and the most important AWS products from a business perspective, including hands-on experience in compute, storage, databases, and networking.
Cloud Practitioner Certification Preparation for AWS

Big Data Analytics on Azure
Azure’s robust services make it easy and cost-effective to incorporate big data analysis into your cloud applications. This learning path focuses on getting your team up to speed using two key Azure services: Data Lake Analytics and Stream Analytics.
Big Data Analytics on Azure

What’s Next?

Assign these Learning Paths with a Training Plan, customize them with Content Engine, or stop by our booth at AWS Summit London on May 9-10 to discuss how you can use all three to power your teams up the cloud capability curve.

Explore all of our learning paths, courses, and hands-on labs in the Cloud Academy Content Library.

The post New on Cloud Academy: Machine Learning on Google Cloud and AWS, Big Data Analytics, Terraform, and more appeared first on Cloud Academy.

]]>
0