Big Data Use Cases
1. Credit Card Fraud Detection
Millions of people now use credit cards, so protecting them from fraud has become essential. The challenge for credit card companies is to decide whether a requested transaction is fraudulent, and a credit card transaction takes barely 3-4 seconds to complete. Companies therefore need a way to flag suspicious transactions within that short window and protect their customers from becoming victims.
Click fraud on online advertisements shows similar tell-tale patterns. Its most obvious and most easily identified form is an abnormal number of clicks from the same IP address, or a pattern in the access times, yet it is surprising how many fraudsters still use this method, particularly for quick attacks. They may choose to strike over a long weekend, when they figure you may not be watching your log files carefully, and click on your ad repeatedly so that when you return to work on Tuesday your account is significantly depleted. Some of this activity may be unintentional, for example when a user reloads a page.
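The "abnormal number of clicks from the same IP address" signal can be sketched in a few lines of Python. This is only an illustration: the log format (pairs of IP address and timestamp), the function name `suspicious_ips` and the threshold of 20 clicks are assumptions made for the example, not part of any real ad platform.

```python
from collections import Counter


def suspicious_ips(click_log, max_clicks=20):
    """Return {ip: click_count} for IPs with more than max_clicks clicks.

    click_log is an iterable of (ip_address, timestamp) pairs.
    The default threshold of 20 is an arbitrary illustrative value.
    """
    counts = Counter(ip for ip, _ in click_log)
    return {ip: n for ip, n in counts.items() if n > max_clicks}


# Hypothetical log: one address clicks 50 times, another only twice.
log = [("203.0.113.7", t) for t in range(50)]
log += [("198.51.100.2", 0), ("198.51.100.2", 5)]

flagged = suspicious_ips(log)
print(flagged)
```

A count alone cannot separate deliberate fraud from a user who reloads a page a few times, which is why the threshold should sit well above normal reload behaviour.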
Similarly, if you make a transaction in Delhi today and a minute later your card is used in Dubai, the second transaction is probably fraudulent and not made by you. Companies therefore need to process the data in real time (data-in-motion analytics, DIM), compare each transaction against the individual's history within a very short span of time, and decide whether it is actually fraud. They can then accept or decline the transaction according to the severity. Processing such data streams requires a streaming engine such as Apache Flink, which can consume real-time streams at high throughput and process them with low latency (minimal delay).
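The Delhi/Dubai check can be expressed as a simple "impossible travel" rule: if the distance between two consecutive transactions would require an implausible travel speed, flag the later one. The sketch below is plain Python rather than a Flink job; the transaction fields (`lat`, `lon`, `time`), the approximate city coordinates and the 900 km/h speed limit are all assumptions chosen for illustration.

```python
import math
from datetime import datetime

EARTH_RADIUS_KM = 6371.0
MAX_SPEED_KMH = 900.0  # assumed limit, roughly an airliner's cruising speed


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in kilometres."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def is_impossible_travel(prev, curr, max_speed_kmh=MAX_SPEED_KMH):
    """True if moving from prev to curr would exceed max_speed_kmh."""
    distance = haversine_km(prev["lat"], prev["lon"], curr["lat"], curr["lon"])
    hours = (curr["time"] - prev["time"]).total_seconds() / 3600.0
    if hours <= 0:
        # Same instant (or out-of-order events): any distance is suspicious.
        return distance > 0
    return distance / hours > max_speed_kmh


# Approximate coordinates, one minute apart.
delhi = {"lat": 28.61, "lon": 77.21, "time": datetime(2024, 1, 1, 12, 0)}
dubai = {"lat": 25.20, "lon": 55.27, "time": datetime(2024, 1, 1, 12, 1)}

print(is_impossible_travel(delhi, dubai))
```

In a streaming engine, a rule like this would typically be evaluated per card number, with the previous transaction kept as per-key state.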
2. Sentiment Analysis
Sentiment analysis is the process of determining whether a piece of writing is positive, negative or neutral. It is also known as opinion mining, that is, deriving the opinion or attitude of a speaker. In sentiment analysis, language is processed to identify and understand consumer feelings and attitudes towards brands or topics in online conversations: what people think about a particular product or service, whether they are happy with it, and so on.
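As a minimal sketch of the positive/negative/neutral decision, the following Python function scores a text against two small word lists. Real systems usually rely on trained models; the word lists and the function name `classify` here are illustrative assumptions only.

```python
import re

# Tiny hand-picked lexicons, for illustration only.
POSITIVE = {"good", "great", "love", "happy", "excellent", "amazing"}
NEGATIVE = {"bad", "poor", "hate", "unhappy", "terrible", "lost"}


def classify(text):
    """Label text as 'positive', 'negative' or 'neutral' by word counts."""
    words = re.findall(r"[a-z']+", text.lower())
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


for message in [
    "I love the new phone, great battery",
    "Terrible service, they lost my luggage",
    "The flight departs at noon",
]:
    print(classify(message), "-", message)
```

A lexicon approach like this misses negation and sarcasm ("not good" still counts as positive), which is one reason production systems use learned classifiers instead.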
For example, when a company launches a new product, sentiment analysis can reveal what customers think of it: whether they are satisfied or would like some modifications. The company can then modify or improve the product to increase sales and keep customers happy. Sentiment analysis also lets a business make decisions early rather than wait for sales reports: for a product launched today, the company can know by the end of the same day whether people are talking about it positively or negatively.
Real example of sentiment analysis
A large airline company started monitoring tweets about its flights to see how customers felt about upgrades, new planes, entertainment, and so on. Nothing special there, except that it began feeding this information to its customer support platform and resolving the issues in real time.
One memorable instance occurred when a customer tweeted negatively about lost luggage before boarding his connecting flight. The airline picked up the tweet and offered him a free first-class upgrade on the way back. It also tracked the luggage and told him where it was and where it would be delivered. Needless to say, he was pretty shocked and tweeted like a happy camper throughout the rest of his trip.
With Hadoop, we can mine Twitter, Facebook and other social media conversations for sentiment data about you and your competition, and use it to make targeted, real-time decisions that increase market share. By quickly analyzing customer sentiment on social media, a company can decide and act immediately instead of waiting, as before, for sales reports (which might take six months or more).
3. Retail – Data Processing
Let us now look at an application for a leading retail client in India. The client received about 100 GB of invoice data daily, in XML format. Generating a report from this data with the conventional method took about 10 hours, and the client had to wait that long for each report.
The conventional solution was written in C/Perl, and its long running time made it infeasible; the client was not happy with it. Because the invoice data arrived as XML, it had to be transformed into a structured format before the report could be generated. This involved validating and verifying the data and applying complex business rules.
In a world where results are expected to be available whenever they are needed, waiting 10 hours was not acceptable. So the client approached the Big Data team of one company with the problem, hoping for a better solution. The client would have been satisfied even if the time had only dropped from 10 hours to 5 hours or a little more.
When the Big Data team came back with its solution, the client was amazed: the report that used to take 10 hours could now be produced in just 10 minutes using Big Data and Hadoop. The team used a 10-node cluster to process the incoming data. This shows the speed and efficiency that Big Data processing can deliver.
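The transformation step (XML to structured rows, with validation) can be illustrated on a single machine as follows. The invoice schema used here (`invoice`, `store`, `amount`) and the validation rule are hypothetical, since the client's actual format and business rules are not described; on a cluster, the same per-record logic would run in parallel over chunks of the input.

```python
import xml.etree.ElementTree as ET

# Hypothetical invoice format, invented for this sketch.
SAMPLE = """<invoices>
  <invoice id="INV-1"><store>S01</store><amount>250.00</amount></invoice>
  <invoice id="INV-2"><store>S02</store><amount>-5</amount></invoice>
  <invoice id="INV-3"><store>S01</store><amount>100.50</amount></invoice>
</invoices>"""


def parse_invoices(xml_text):
    """Convert invoice XML into (valid_rows, rejected_ids)."""
    rows, rejected = [], []
    for node in ET.fromstring(xml_text).iter("invoice"):
        invoice_id = node.get("id")
        store = node.findtext("store")
        try:
            amount = float(node.findtext("amount", default=""))
        except ValueError:
            rejected.append(invoice_id)
            continue
        # Illustrative validation rule: a store code and a non-negative amount.
        if not store or amount < 0:
            rejected.append(invoice_id)
            continue
        rows.append({"id": invoice_id, "store": store, "amount": amount})
    return rows, rejected


def totals_by_store(rows):
    """Aggregate the structured rows into a small per-store report."""
    totals = {}
    for row in rows:
        totals[row["store"]] = totals.get(row["store"], 0.0) + row["amount"]
    return totals


rows, rejected = parse_invoices(SAMPLE)
print(totals_by_store(rows), rejected)
```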
4. Sears Holding
Sears has 4,000 stores with millions of products and 100 million customers, and had collected over 2 PB of data so far.
Its legacy systems were incapable of analyzing such large amounts of data to personalize loyalty campaigns.
Conventional approach for analyzing data
- Analyzed just 10% of customer data for personalizing loyalty campaigns on mainframes, Teradata and SAS
- Processing time to analyze 10% of data: 6 weeks
Big Data Approach
- Shifted to Hadoop with 300 nodes of commodity servers
- Time taken to process 100% of customer data now: 1 week
- Interactive reports can be developed in 3 days instead of 6 to 12 weeks
- Saved millions of dollars in mainframe and RDBMS cost and got 5000% better performance
- Increased revenues through better analysis of customer data
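As a rough check on the processing times listed above, the two approaches can be compared by throughput (share of customer data analyzed per week). This is back-of-the-envelope arithmetic that assumes analysis time scales linearly with data volume; the function name `throughput` is introduced only for this sketch.

```python
def throughput(fraction_of_data, weeks):
    """Share of the customer data analyzed per week."""
    return fraction_of_data / weeks


legacy = throughput(0.10, 6)   # 10% of the data in 6 weeks
hadoop = throughput(1.00, 1)   # 100% of the data in 1 week

speedup = hadoop / legacy
print(f"Approximate throughput gain: {speedup:.0f}x")
```

Under that linearity assumption, the gain is of the same order of magnitude as the performance improvement quoted in the list.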
5. Orbitz
Orbitz is a leading travel company that uses the latest technologies to transform the way clients around the world plan their travel. It operates the customer travel planning sites Orbitz, Ebookers and Cheap Tickets. These sites generate 1.5 million flight searches and 1 million hotel searches daily, and the log data produced by this activity is approximately 500 GB in size. The raw logs were stored for only a few days because data warehousing was costly. Handling and storing this much data with conventional data warehouse storage and analysis infrastructure was becoming more expensive and more time-consuming.
For example, searching for hotels in the database with the conventional approach, which was written in Perl/Bash, required extraction to be done serially. Processing and sorting hotels based on just the last 3 months of data took 2 hours, which is not acceptable when customers expect results at a click.
This was a serious problem, and it needed a solution if the company was not to lose customers. Orbitz needed an effective way to store and process this data, and it also needed to improve its hotel rankings. It turned to Big Data and Hadoop: HDFS, MapReduce and Hive were used to solve the problem, with excellent results. A Hadoop cluster provided a very cost-effective way to store vast amounts of raw logs, which are then cleaned, analyzed and fed to machine learning algorithms.
Generating search results from the last 3 months of hotel data, which earlier took about 2 hours, now took just 26 minutes with Big Data. Hotel and flight search trends could be predicted much faster, more efficiently and more cheaply than with the conventional approach.
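To give a feel for the MapReduce model mentioned here, the sketch below counts hotel searches per hotel from raw log lines in plain Python. It is not Orbitz's actual job: the tab-separated log format, the event name `hotel_search` and the hotel identifiers are invented for the example, and a real Hadoop job would run the map and reduce phases in parallel across the cluster.

```python
from collections import defaultdict


def map_phase(log_lines):
    """Map step: emit (hotel_id, 1) for every well-formed hotel search line."""
    for line in log_lines:
        fields = line.rstrip("\n").split("\t")
        # Data cleaning: skip malformed lines and non-hotel events.
        if len(fields) == 3 and fields[1] == "hotel_search":
            yield fields[2], 1


def reduce_phase(pairs):
    """Reduce step: sum the counts for each hotel_id."""
    counts = defaultdict(int)
    for hotel_id, n in pairs:
        counts[hotel_id] += n
    return dict(counts)


# Hypothetical raw log lines: timestamp, event type, subject.
raw_logs = [
    "2024-01-01T10:00\thotel_search\tH42",
    "2024-01-01T10:01\tflight_search\tDEL-DXB",
    "2024-01-01T10:02\thotel_search\tH7",
    "2024-01-01T10:03\thotel_search\tH42",
    "malformed line",
]

search_counts = reduce_phase(map_phase(raw_logs))
# Rank hotels by search count, breaking ties by hotel id.
ranking = sorted(search_counts.items(), key=lambda kv: (-kv[1], kv[0]))
print(ranking)
```

Because each log line is mapped independently, the work splits naturally across many nodes, in contrast to the serial extraction of the earlier Perl/Bash approach.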