Why Cloud Must Be a Part of Your Data/Analytics Strategy

Vikram Marathe

Vice President Data Platforms at WarnerMedia

Learning Objectives

Data is exploding, and every organization needs to make sense of and monetize this data in a timely fashion. Cloud makes it possible to try out and iterate quickly with many different solutions and allows access to highly scalable compute and storage on demand. This presentation will highlight the main advantages of the cloud in the Data and Analytics space.


Key Takeaways:

  • Scalability
  • Number of solutions that are available quickly
  • Separation of Compute and Storage


*This video was originally filmed 18/03/2021 for the US Enterprise Tech sister event, CIO VISIONS VIRTUAL Summit.


"Some of the top benefits of cloud computing are the ability to scale quickly and access to software from various vendors without a long acquisition process."

Vikram Marathe

Vice President Data Platforms at WarnerMedia

Transcript

Hi, welcome to today’s webinar. The topic that we’ll be discussing today is why cloud must be a part of your data analytics strategy. My name is Vikram Marathe. I work as a vice president of a data platforms team at WarnerMedia. This team includes data engineers, architects and DevOps teams. I’ve been working in the data field for close to 30 years, and I’ve seen many changes happen during this time. There was a time when databases could only hold on the order of gigabytes of data, whereas today we have platforms that can scale up to multiple petabytes of data. If there is one thing that I have learned over the years, it is that you need to be nimble. You always need to be on the lookout for the next thing that’s going to revolutionize your field, and you need to embrace it wholeheartedly.

But I have to be honest: in spite of knowing that, when the cloud started becoming really popular, I was extremely skeptical about its importance for a large enterprise like ours. I thought a large company like ours does not need a public cloud. We have our own data centers, and we have staff that manage everything within those data centers, including hardware, software, and networking gear. I felt cloud was probably more important for small and medium-sized businesses, those that could not afford to employ many engineers to manage their computing needs and invest in all the hardware and software resources upfront. Though, around 2013, it started dawning on me that we did not have enough compute or storage resources to accommodate all the data that we were receiving. That’s when I started looking into the cloud as a real option for our company’s data processing. I’m sure most of you can relate to that, and you must be in various stages of this journey. Some of you may still be thinking about it. Others may have decided to migrate. And a lot of you may already be following the cloud-first mantra.
In today’s webinar, I’m going to talk about why the cloud should be an important part of every organization’s data and analytics strategy, and why I feel that if you’re not already there, you should start thinking about your strategy to get there. We’ll go through some of the reasons why being in the cloud makes sense. If you are already there, you can still benefit from some of the best practices and architectural tenets that we’ll go through today.

When you decide to utilize cloud computing, you have many different options. Let’s now look at the typical cloud offerings that are available out there. The most talked about cloud offering type is the public cloud. This is where most of the big players, such as Amazon, Microsoft, and Google, have their popular offerings. These cloud environments typically include compute, storage, and many other specialized applications and offerings. In a public cloud, multiple customers use the same resources, maybe not at the same time, but over time. There are a lot of security procedures that ensure that one customer’s data is not available to another customer, and that one customer does not have access to another’s compute resources or applications. Because you’re sharing the underlying resources, providers can offer you tremendous scalability, and your costs are reduced. This allows you as a customer to spin up resources on an on-demand basis.

The next type of offering is the private cloud. Here, typically, a service provider creates a separate cloud environment that is isolated from other customers. An enterprise may also have its own private cloud that it creates by itself. But this is still different from having your own data centers, because data centers typically only make hardware, networking, and some basic software available; they do not make any service-type offerings available to you. Some of the advantages of a private cloud include single tenancy and, to some extent, more security, or at least the perception of security. I’m saying that because I strongly believe that with proper care and monitoring, you can secure your resources in the public cloud environment.

A compromise solution is the hybrid cloud. This has the advantages of both the private and the public cloud. You have a bit more security, and some amount of compute and storage that is always available to serve your normal needs. At the same time, you get full access to the scale of the public cloud. You will also see that those who are migrating to a cloud sometimes have some applications in the public cloud while others are still on prem, so one could technically say that they are using a hybrid cloud environment as well. Another commonly used term you will hear is multi-cloud. This allows companies to utilize best-of-breed applications from different providers, while keeping their long-term options open and reducing lock-in with a single provider.

Which of the clouds to embrace is not the only decision point that you need to figure out. There are more terms you need to understand in order to figure out how to proceed with your cloud journey. These terms are about the level of services you can take advantage of. Infrastructure as a service, or IaaS, essentially offers you basic compute, storage, and networking. Higher up is platform as a service, or PaaS. This provides additional services such as development tools, database systems, BI tools, and middleware components. Software as a service, or SaaS, provides complete software applications.

I would say infrastructure as a service is similar to someone putting a water line all the way up to your home and then providing you with all the required pipes, valves, and taps, as well as filtering equipment. You still have to connect them, ensure filters are replaced regularly, and check the water quality before you consume that water. PaaS is where everything is connected and available.
But you’re still on the hook to ensure that the filters are the latest version, and you need to replace them regularly. This is similar to having to ensure all the OS patches are there when you bring up a compute resource. Software as a service, on the other hand, is similar to opening a tap and having water start flowing. You do not need to worry whether the right filters are being used and replaced on time, and you do not need to test the quality of the water. You just consume it when needed. SaaS offerings provide complete software applications.

Now you might say that this is all good information, but why do I need to use the cloud for my data and analytics? So let’s look at what is different about today’s data and analytics landscape, and particularly today’s big data landscape. Let’s look into the five V’s of big data. Of course, depending on the person you’re speaking with, you may hear three V’s or seven V’s, but the message is really to convey how much the overall data landscape has changed. We all see that digital platforms and IoT devices are producing data fast and furious, and it is coming to our data platforms in large volumes, in a variety of formats, and at very high velocity. If you want to store all this data, and cleanse and transform it to improve its veracity and ultimately its value for your business, you need lots of compute and storage capacity. Your compute requirements may change over time due to daily or seasonal fluctuations. If you’re trying to do this processing on prem, you will need to provision capacity for the highest peak that you expect. Outside that peak, most of that capacity will remain idle. And in case you receive more data than you planned for, you may end up not being able to process this data on time, which is not something you would want to tell your business users.

So remember, having the data and using it appropriately is what separates successful companies from the also-rans. Take a look at Google, Facebook, Netflix, or countless other companies. They don’t just deal in bits and bytes for their businesses, but without utilizing their data assets, they would not be where they are today. It is the need of the hour: you need to be data driven in your business decisions.

Let’s look at how the cloud helps you solve some of these big challenges. I’m sure most of you have spent months trying to get that larger server, or servers, that you needed to run your databases. First, you need to get all the financial approvals. Then comes signing MSAs and SOWs for the servers, networking equipment, SAN, and many, many other things, then making sure all of it is delivered, racked, connected, and prepped, and then that the database software is installed, and on and on. Now compare that with a cloud database. You can set it up in minutes with a few keystrokes and then connect it to a BI tool. The entire process of getting up and running with your own data can be done in a day or less, maybe not with all of it, but at least with some of your data. And if you have already been doing this, you could set up a new environment and quickly test drive it in far less time than even a day. Of course, the big job of doing ETL and ELT still remains, but here too you can scale up and scale down as needed.

You can also utilize marketplace or cloud providers’ solutions without having to sign individual contracts. You have the ability to try multiple available solutions to perform your PoCs, and there is no need to sign multiple PoC agreements with multiple vendors. Also, the setup required to bring up the necessary hardware and software is very easy. This gives you the ability to focus on your PoC and spend quality time there.
This means that if things don’t work out, you can fail fast and get to the solution that is the most optimal one for your use case much faster than by trying to do this on prem. Similarly, to ensure quick turnaround for security or any other software patches, you can spin up a brand new test environment with the required patches and test it there before you move it to production. This is possible to do for on-prem environments, but typically you need more hardware, and that is even if you use virtual machines or containers.

There are a lot of different types of compute nodes available, which can have different compute characteristics. Some of them may be CPU-based, GPU-based, or ARM-based servers. You can also use various types of storage, such as magnetic or SSD. You can pick whatever you want, whenever you want it, and create an optimal solution. I, for one, definitely get the feeling that a young child would get in a candy store. Imagine having on-demand access to so many things with just a few keystrokes. Try doing that in an on-prem environment.

The final point in this area is that you have the ability, depending of course on the software you choose, to separate compute from storage. To me, this is one of the most important things you should look for in your cloud software, whether it is a database or an analytical tool. This gives you the ability to utilize your data with multiple independent tools at the same time. You do not need to copy the same data multiple times to be used by these tools. The side benefit here is that it is easier to comply with various privacy regulations. If there is only one copy of the data, you need to erase it only from there. This makes compliance with individual erasure requests easy. You can also do rapid prototyping, using multiple software tools against the same data.
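To make the single-copy point concrete, here is a minimal sketch in Python. It is not any particular vendor's API: ObjectStore, ReportingTool, and SegmentationTool are hypothetical names, the records are made up, and an in-memory dictionary stands in for cloud object storage. It shows two independent tools reading the same copy of the data, so an erasure request only has to be handled in one place.

```python
from dataclasses import dataclass, field


@dataclass
class ObjectStore:
    """Hypothetical stand-in for cloud object storage: the single copy of the data."""
    records: dict = field(default_factory=dict)

    def put(self, user_id, record):
        self.records[user_id] = record

    def erase(self, user_id):
        """Handle an erasure request. Returns True if a record was removed."""
        return self.records.pop(user_id, None) is not None


class ReportingTool:
    """One independent compute tool; it reads the shared store and keeps no copy."""

    def __init__(self, store):
        self.store = store

    def total_minutes(self):
        return sum(r["minutes_watched"] for r in self.store.records.values())


class SegmentationTool:
    """A second independent tool working against the same stored data."""

    def __init__(self, store):
        self.store = store

    def heavy_users(self, threshold):
        return sorted(
            user_id
            for user_id, r in self.store.records.items()
            if r["minutes_watched"] >= threshold
        )


# Illustrative data only.
store = ObjectStore()
store.put("user-a", {"minutes_watched": 120})
store.put("user-b", {"minutes_watched": 45})
store.put("user-c", {"minutes_watched": 300})

reporting = ReportingTool(store)
segmentation = SegmentationTool(store)

total_before = reporting.total_minutes()

# One erasure against the single copy is immediately visible to every tool.
erased = store.erase("user-c")
total_after = reporting.total_minutes()
heavy_after = segmentation.heavy_users(100)

print(total_before, total_after, heavy_after)
```

If each tool kept its own copy instead, the same erasure request would have to be tracked and repeated for every copy.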


Another very important benefit is the scalability it affords. The separation of compute from storage allows you to have multiple clusters run against the same data, allowing you to provide elastic compute that can scale with user demand. The ability to scale compute as needed is one of the most important reasons why your data and analytics workloads should be in the cloud. This elastic nature allows you to spin up multiple environments for dev, QA, or even perf as needed, and when they are not needed, you can tear them down very quickly.

Just as compute is elastic, cloud providers allow you access to a very large pool of storage, and multiple tiers allow you to separate your hot objects from your cold objects, thus spending far less money. When you utilize either the object stores or the elastic file systems that are available with your cloud provider, you can, for all practical purposes, have unlimited storage. These storage systems typically have very good performance and high availability and durability. The data is stored redundantly across multiple availability zones, and you can also do cross-region replication for disaster recovery or to solve user latency issues.

The cloud definitely reduces the staff needed to take care of infrastructure, as well as database and app management, but you need to focus more on security. So I would say you can deploy those resources to focus on security and privacy compliance. And finally, you get easier disaster recovery. You need to design for it from the ground up, of course, but in a cloud environment, with a few clicks of your mouse, you can bring back any data that you may have lost from your application but that is still stored in an object store.

Let’s now look at some basic tenets you should follow in your architectural patterns for the cloud.
When migrating your on-prem applications and databases to the cloud, you may be tempted to take things the way they are and just put them in the cloud. This lift and shift may be good for the short term, but it does not allow you to take advantage of most of the cloud features that we have discussed so far. So you should choose applications and databases that allow you to separate storage from compute. If not, your applications will not be able to scale easily, and you will keep resources that are not being used on forever, which could actually make your spend higher than in your on-prem environment.

Also, utilize an infrastructure-as-code approach. This will ensure that you can automatically scale compute when necessary, or replicate code to create new resources. If your admins create new compute, storage, or any other resources, or add users and grant various permissions through a UI, then every time you want to replicate that environment, you will need manual intervention to figure out everything within your environment. In that case, you may not be able to bring back all of your infrastructure in another region, or even in the same region, in case of a disaster.

Whenever possible, utilize serverless architectures. These allow you to scale up and down very easily, and normally they require you to separate storage from compute anyway, so that’s an additional bonus. Of course, you may not always have serverless options, but with each passing day, cloud providers are adding more and more of them.

When you are in a public cloud environment, it is important to focus on security, because on prem may afford you some security out of the box just because it is a single-tenant solution. Of course, I’m not saying that security should not be at the center of whichever environment you’re operating in. However, we continue to hear of incidents where somebody left their storage buckets open with public access and a lot of information was leaked. Some of the areas that you should focus on include securing your compute and storage, isolating your environments with VPCs, encrypting data at rest and in transit, using multi-factor authentication for logins, utilizing the principle of least privilege, and whitelisting only those IP addresses that should have access to your environment.

Finally, paying close attention to cost is extremely important. The whole point of going to the cloud is that you can flex resources as necessary. So you should never reserve or commit for all the resources you need. Always shut down resources that you do not need, and use storage tiers, from hot to cold storage. This will allow you to optimize storage costs.

I always like to remind people that the difference between the cloud and your on-prem environment is similar to the difference between buying and maintaining a car and renting a Zipcar, which allows you to pick up a car in certain parking lots, pay by the hour, and then drop it off at certain locations. This is like platform as a service. Using an Uber is similar to software as a service. Your private car can only carry a limited number of people at any given time, but you can get as many Zipcars or Ubers as you need for the number of passengers you have. And as the number of passengers increases, you can go to as many different destinations as you want, with all these different passengers going to different places at the same time. However, if you always keep the maximum number of Zipcars, that’s going to cost you a whole lot. So don’t keep the Zipcars with you 24 by 7, 365 days a year. This is not the way you would use a Zipcar or an Uber, and you need to remember that it is not the way to use the cloud either. So shut down any resources that you are not using. That is the only way to have a good cloud environment.
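The cost argument above can be sketched with simple arithmetic. The following Python snippet is illustrative only: the hourly demand profile and the flat price per node-hour are made-up assumptions, and it ignores the fact that on-demand cloud capacity and owned or reserved capacity are usually priced differently. It compares keeping peak capacity switched on all day with paying only for what each hour actually needs.

```python
# Hypothetical number of compute nodes needed in each hour of one day.
# 8 quiet hours, 8 busy hours, a 4-hour peak, then 4 hours of wind-down.
hourly_demand = [2] * 8 + [6] * 8 + [12] * 4 + [4] * 4

# Assumed flat price per node-hour, the same for both models (a simplification).
PRICE_PER_NODE_HOUR = 1.0


def always_on_cost(demand, price):
    """Cost when capacity is sized for the highest peak and never shut down."""
    return max(demand) * len(demand) * price


def elastic_cost(demand, price):
    """Cost when capacity follows demand and idle nodes are shut down each hour."""
    return sum(demand) * price


def idle_fraction(demand):
    """Share of peak-sized capacity that sits unused over the whole period."""
    return 1 - sum(demand) / (max(demand) * len(demand))


always_on = always_on_cost(hourly_demand, PRICE_PER_NODE_HOUR)
elastic = elastic_cost(hourly_demand, PRICE_PER_NODE_HOUR)
idle = idle_fraction(hourly_demand)

print(f"always on: {always_on:.0f} node-hours paid")
print(f"elastic:   {elastic:.0f} node-hours paid")
print(f"idle share of peak-sized capacity: {idle:.0%}")
```

With this made-up profile, more than half of the peak-sized capacity is idle over the day, which is the Zipcar point in numbers: keeping the maximum fleet around the clock is what drives the bill up.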
Now, finally, let’s summarize and look at some key takeaways for those who will be deciding whether to migrate to the cloud or, in case you already are in the cloud, how to optimize your current cloud data and analytics deployment. Let’s first look at some of the top benefits of cloud computing. The first is the ability to scale quickly. Next is access to software from various vendors without a long acquisition process. And then there is the ability to separate compute from storage, which gives you many different benefits, like scalability or shutting things down when not needed.

Finally, as we wrap up, here are some final thoughts. The cloud is no longer optional, whether you are a small or medium business or a large enterprise. Cost management is extremely important in an all-you-can-eat, pay-as-you-go cloud environment, where one can easily spin up resources and neglect to shut them down when not needed, and that can end up costing you a lot. Make sure you utilize one of the many available tools to identify such wastage. To comply with privacy regulations, avoid keeping multiple copies of your data, which is very easy with the cloud-based storage options that you have, like object storage, for example. Focus on security: redeploy the savings in infrastructure human resources into security. That will pay you huge dividends down the line. And then focus on democratizing your data with bring-your-own-tool options. Being in the cloud, it’s very easy to bring in many different tools. You don’t need to go through the long process that we talked about before, so allow people to have many different options. And in case you’re not there yet, start small. You don’t have to move everything at the same time.

With that, I would like to thank all of you for your time today. I hope that this was useful for you and gave you some important considerations to focus on.


If you have any questions, you can reach out to me either via the Q/N network or via LinkedIn. Thank you and have a great day.

