How are Big Companies using Apache Spark

The big data marketplace is growing every day, and competition has reached a whole new level. That is why open source technologies like Hadoop, Spark, and Flink must find valuable use cases to stay on top of it. New approaches to tackling problems are always needed, and every open source project is trying to offer one. Offering something its rivals do not is a must, and this is what Apache Spark is all about!

Early adopters concentrate on technical features and capabilities. Application development picks up once companies gain confidence in a platform's reliability and its scalability to larger data volumes. Spark, with over 90 contributors from 25 companies, meets these needs. So let us look at how big companies are using Apache Spark and how successful it has been to date.

Spark’s Common Use Cases

Companies rely heavily on a wide variety of data sources for their analytical products. Their data processing workflows include cleaning, transforming, and fusing unstructured external data with internal data sources. Spark is proving especially useful to successful startups. Some companies have also built simple user interfaces that open up batch data processing tasks to non-programmers.

  • Stream Processing

Spark and Shark are the best-known components of BDAS (the Berkeley Data Analytics Stack), but Spark Streaming for real-time processing and the PySpark Python API are also in the running! The key feature of Spark Streaming is that code written for batch processing can be reused for real-time computations with minor tweaks, which is a real boost to programmer productivity. Because of this, many companies have started using Spark Streaming for applications such as stream mining, real-time scoring of analytic models, and network optimization. CloudPhysics, for example, uses Spark Streaming to detect patterns and anomalies. Reportedly, 52% of companies prefer Apache Spark for real-time streaming.
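To make the "same code for batch and streaming" idea concrete, here is a minimal sketch in plain Python. It does not use the Spark API; the function names (`count_events`, `run_batch`, `run_streaming`) and the sample events are assumptions made up for illustration. The point is only that one piece of business logic is applied either once to the whole dataset or repeatedly to small batches as they arrive.

```python
from collections import Counter
from typing import Iterable, Iterator, List


def count_events(records: Iterable[str]) -> Counter:
    """Shared logic: normalize each record and count event types."""
    cleaned = (r.strip().lower() for r in records)
    return Counter(r for r in cleaned if r)


def run_batch(records: List[str]) -> Counter:
    """Batch mode: apply the shared logic once to the full dataset."""
    return count_events(records)


def run_streaming(batches: Iterable[List[str]]) -> Iterator[Counter]:
    """Streaming mode: apply the same logic to each micro-batch and
    yield a snapshot of the running total after every batch."""
    total: Counter = Counter()
    for batch in batches:
        total.update(count_events(batch))
        yield Counter(total)


# Hypothetical sample data, split into two micro-batches for the streaming run.
events = ["login", "click", "Click ", "", "purchase", "click"]
batch_result = run_batch(events)
stream_results = list(run_streaming([events[:3], events[3:]]))
```

Once every micro-batch has been processed, the streaming total matches the batch result, because both paths call the same `count_events` function.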

  • Advanced Analytics

Spark's advantages have always helped it attract users. Its speed and its suitability for iterative computations, which are the backbone of advanced analytics, are far better than those of Hadoop MapReduce. From early on, companies started writing their own Spark libraries for regression, classification, and clustering. Modern problems in online advertising and marketing, fraud detection, and scientific research are being solved with Spark tools and libraries. The good news is that it is becoming easier to develop such libraries for graph and machine-learning analytics. Approximately 64% of companies use Apache Spark for advanced analytics.
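Why do iterative computations matter so much? Many analytics algorithms make repeated passes over the same dataset, so keeping that dataset in memory between passes pays off. The sketch below is plain Python rather than Spark code, and the function name `fit_line`, the learning rate, and the sample points are assumptions chosen for illustration. It fits a straight line by gradient descent and reads the same data on every iteration.

```python
from typing import List, Tuple


def fit_line(
    points: List[Tuple[float, float]],
    learning_rate: float = 0.05,
    iterations: int = 2000,
) -> Tuple[float, float]:
    """Fit y = w * x + b by gradient descent on the mean squared error.

    Every iteration scans the full dataset again, which is the access
    pattern that benefits from data held in memory.
    """
    w, b = 0.0, 0.0
    n = len(points)
    for _ in range(iterations):
        # Gradients of the mean squared error with respect to w and b.
        grad_w = sum(2 * (w * x + b - y) * x for x, y in points) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in points) / n
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b


# Hypothetical data generated from y = 2x + 1.
data = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0), (3.0, 7.0)]
w, b = fit_line(data)
```

With these settings the fit converges to a slope close to 2 and an intercept close to 1. In this sketch `data` is simply a list in memory; in a cluster setting the same loop would re-read a distributed dataset on each pass, which is where in-memory processing makes the difference.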

  • Business Intelligence & Visual Analytics

This is one of the most important aspects of any company. While MPP databases and open source SQL-on-Hadoop solutions such as Shark and Impala are gaining traction, companies have also started using Shark and BlinkDB for interactive SQL analysis. Many companies follow this general approach, while some have developed custom interactive dashboards powered by Spark and Shark. Companies also pair visual analysis tools like Tableau with Shark, which works better than relying on static reports and query analysis alone. More than 91% of companies use Apache Spark because of its performance gains.

Why are big companies switching over to Apache Spark?

  • Yahoo: Advanced Analytics using Apache Spark

Yahoo already runs successful projects on Apache Spark. Yahoo, itself a web search engine, has one such project known as personalization, which serves the right content to the right visitor. At the heart of this project are machine learning algorithms that identify individual visitors and their interests, which helps Yahoo serve the news they love to read or watch. So when users visit Yahoo, they are shown what they love. Such a precise level of personalization needs real-time processing power and high speed, and Apache Spark delivers both!

  • ClearStory: Multiple Data Sources

A startup called ClearStory recently built a platform that lets users fuse data from multiple sources in no time! It also produces interactive visualizations.


In the finance industry, banks are using Spark as an alternative to Hadoop. They use it to access and analyze social media profiles, call recordings, emails, and more. This helps them make the right business decisions for targeted advertising, customer segmentation, and credit risk assessment.

  • Financial Institution 1: Retail Banking & Brokerage Operations

A financial institution with retail banking and brokerage operations has been using Apache Spark and has cut its customer churn by a whopping 25%. Its platform is divided into retail, banking, trading, and investment. For a 360-degree view of each customer, the bank uses Apache Spark as a unifying layer and now automates analytics with machine learning. Data from each repository is accessed and correlated into a single customer file, which is then forwarded to the marketing department.
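The idea of correlating separate repositories into one customer file can be sketched in a few lines of plain Python. This is not the bank's actual pipeline and not Spark code; the function name `build_customer_view`, the `customer_id` key, and the sample records are all assumptions made up for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, Mapping


def build_customer_view(sources: Mapping[str, Iterable[dict]]) -> Dict[str, dict]:
    """Merge records from several repositories into one entry per customer.

    Each record is assumed to carry a 'customer_id' field. Fields are
    grouped under their source name so that values from different
    systems never overwrite each other.
    """
    view: Dict[str, dict] = defaultdict(dict)
    for source_name, records in sources.items():
        for record in records:
            customer_id = record["customer_id"]
            view[customer_id][source_name] = {
                key: value for key, value in record.items() if key != "customer_id"
            }
    return dict(view)


# Hypothetical records from two of the bank's divisions.
sources = {
    "retail": [{"customer_id": "c1", "balance": 1200}],
    "trading": [
        {"customer_id": "c1", "open_positions": 3},
        {"customer_id": "c2", "open_positions": 1},
    ],
}
customer_file = build_customer_view(sources)
```

Here `customer_file["c1"]` holds both the retail and the trading data for the same customer, which is the kind of unified record a marketing team could work from.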

  • Financial Institution 2: Analyzing Filings and Reports

Another financial institution uses Apache Spark to analyze the text of regulatory filings as well as its competitors' reports. This helps it discover patterns in what is happening in the market and among its competition.

  • Financial Institution 3: Real-Time Monitoring

Another multinational financial institution has implemented a real-time monitoring application that runs on Apache Spark and the MongoDB NoSQL database. The application helps the bank monitor clients' activity and identify issues. With this kind of risk-based assessment, Apache Spark works well for financial institutions.

As we all know, the e-commerce industry is growing fast, and real-time information is immensely important to it. This information can be fed into streaming clustering algorithms such as k-means. The results are then combined with sources like social media profiles, comments, product reviews, and recent searches.
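A streaming clustering algorithm updates its clusters as each new data point arrives instead of re-reading the whole history. The following is a minimal sketch of sequential (online) k-means in plain Python. It is not Spark's implementation; the class name `OnlineKMeans`, the starting centroids, and the sample points are assumptions for illustration.

```python
from typing import List, Sequence


class OnlineKMeans:
    """Sequential k-means: each arriving point moves its nearest centroid."""

    def __init__(self, initial_centroids: Sequence[Sequence[float]]) -> None:
        self.centroids: List[List[float]] = [list(c) for c in initial_centroids]
        # Points absorbed per cluster; initial centroids carry no weight.
        self.counts: List[int] = [0] * len(self.centroids)

    def nearest(self, point: Sequence[float]) -> int:
        """Return the index of the centroid closest to the point."""
        def sq_dist(centroid: List[float]) -> float:
            return sum((c - p) ** 2 for c, p in zip(centroid, point))

        return min(range(len(self.centroids)), key=lambda i: sq_dist(self.centroids[i]))

    def update(self, point: Sequence[float]) -> int:
        """Assign the point to a cluster and shift that centroid toward it,
        so the centroid stays the running mean of its assigned points."""
        i = self.nearest(point)
        self.counts[i] += 1
        rate = 1.0 / self.counts[i]
        self.centroids[i] = [c + rate * (p - c) for c, p in zip(self.centroids[i], point)]
        return i


# Hypothetical stream of two-dimensional data points.
model = OnlineKMeans([(0.0, 0.0), (10.0, 10.0)])
stream = [(1.0, 1.0), (9.0, 9.0), (1.0, 3.0), (9.0, 7.0)]
labels = [model.update(p) for p in stream]
```

After the four points have been processed, the two centroids sit at the mean of the points assigned to each cluster. A production system would also need to handle things this sketch ignores, such as choosing starting centroids and letting old data fade out.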

  • Alibaba: Apache Spark

As most of us know, Alibaba is one of the largest e-commerce platforms in the world. Surprisingly, it also runs some of the largest Apache Spark jobs in the world! Some of these jobs analyze thousands of petabytes of data, while others perform extraction on image data. Every user interaction at Alibaba is represented in a large graph, and Apache Spark is used to process it quickly and derive precise results.

  • eBay: Apache Spark

Another well-known e-commerce giant, eBay, uses Spark to target specific offers in its marketing and to enhance customer experiences. At eBay, Apache Spark runs on Hadoop YARN, which manages all cluster resources and makes it possible to run generic tasks. Through YARN, eBay's Spark users leverage Hadoop clusters in the range of 2,000 nodes, 20,000 cores, and 100 TB of RAM.


With such progressive companies using Apache Spark to support business development and deliver the best client services, Apache Spark clearly has a bright future!

Tao is a passionate software engineer who works at a leading big data analysis company in Silicon Valley. Previously, Tao worked at big IT companies such as IBM and Cisco. Tao has an MS degree in Computer Science from McGill University and many years of experience as a teaching assistant for various computer science classes.


  1. Ashwani says:

    Quite informative! Spark is definitively winning the big data war!

  2. Harry says:

    Interesting read, thanks!

  3. andy chang says:

    I didn’t know spark is widely used in financial industry as well….

  4. manoj says:

    what is shark?

