cseferlis

Key Terminology in Azure Databricks

In a previous post, I talked about Azure Databricks and what it is. To recap, Azure Databricks is a managed platform for running Apache Spark jobs. Because it’s managed, you don’t have to worry about managing the cluster…

Business Continuity Strategies in Azure

Keeping a business online and operational is a key concern, whatever the cause of the downtime. Most companies either don’t focus on business continuity until it’s too late or have incomplete, untested, bare-bones recovery plans. High Availability, Disaster Recovery and Backup…

Overview of HDInsight Kafka

Continuing with my HDInsight series, today I’ll be talking about Kafka. HDInsight Kafka will sound much like Storm, but as I get into the nuts and bolts you’ll see the differences. Kafka is an open-source distributed streaming platform that…

Overview of HDInsight Storm

Next in my series on HDInsight, today I’ll be talking about Storm. HDInsight Storm is a distributed stream-processing computation framework. It uses spouts, which define information sources, and bolts, which manipulate the data during processing, to allow batch distributed processing…

Overview of HDInsight HBase

Continuing my series on HDInsight and the different clusters within it, today I’ll cover HBase. HBase is a NoSQL database that provides random access and strong consistency for structured, unstructured and semi-structured data. It’s a schema-less (or organized…

Overview of HDInsight Spark

Today I’m continuing my series on HDInsight with a focus on Spark clusters. HDInsight Spark clusters provide the required baseline for in-memory cluster computing. This technology has gained momentum over the last few years as the required levels of memory…