Stream Processing

As The Cluetrain Manifesto points out[i], “Real-time marketing is the execution of a thoughtful and strategic plan specifically designed to engage customers on their terms via digital social technologies.” Adding to that description, Wikipedia notes that real-time marketing is[ii]:

“Marketing performed ‘on-the-fly’ to determine an appropriate or optimal approach to a particular customer at a particular time and place. It is a form of market research inbound marketing that seeks the most appropriate offer for a given customer sales opportunity, reversing the traditional outbound marketing (or interruption marketing) which aims to acquire appropriate customers for a given 'pre-defined' offer.”

Real-time marketing can be inexpensive compared to the cost of traditional paid media. “Expensive research, focus groups, and awareness campaigns can be replaced with online surveys, blog comments, and tweets by anyone or any business,” add Macy and Thompson in their book The Power of Real-Time Social Media Marketing.[iii] To be clear, the expense of real-time marketing might be low compared to running campaigns through traditional media channels, but setting up an IT operation capable of a level of personalization that will wow a customer is anything but cheap.

In his article How Real-time Marketing Technology Can Transform Your Business[iv], Dan Woods draws an amusing and instructive comparison between the environment marketers face today and the one their 1980s counterparts faced: in his figurative first-person-shooter game, today’s marketing executives simply don’t have time for a market research study. “The data arrives too late and isn’t connected to the modern weapons of marketing. The world is now bursting with data from social media, web traffic, mobile devices, and tripwires of all kinds,” Woods warns.242

Today, most large companies have massive amounts of data pertaining to consumer behavior coming at them constantly, from all angles. The challenge is to make sense of the data in time to matter: to understand how consumer attitudes and behaviors are changing and how they are being changed by marketing and advertising efforts, and to grab the treasure without opening a Pandora’s box of furies.

The challenge in understanding the modern consumer is making sense of all of the customer data coming in from these vast unstructured sources.242 Some of this information explains broad fluctuations in mass opinion, while other evidence clarifies what consumers might be doing on a company website.242 Other data explains what consumers have done, en masse or as individuals.242 Still more can be collected after a customer trip in the form of surveys, whether mobile or physical.

In his article When do you need an Event Stream Processing platform?[v], Roy Schulte states that:

“An event is anything that happens. An event object (or ‘event,’ event message, or event tuple) is an object that represents, encodes, or records an event, generally for the purpose of computer processing. Event objects usually include data about the type of activity, when the activity happened (e.g., a time and date stamp), and sometimes the location of the activity, its cause, and other information. An event stream is a sequence of event objects, typically in order by time of arrival.”
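To make the terminology concrete, here is a minimal sketch, in Python, of an event object and an event stream as Schulte describes them. The field names and the casino-flavored example event are assumptions made for illustration, not a standard schema.

    from dataclasses import dataclass, field
    from datetime import datetime, timezone
    from typing import Any, Dict, Iterator, List, Optional


    @dataclass
    class EventObject:
        event_type: str                      # type of activity, e.g. "coupon_redemption"
        occurred_at: datetime                # time and date stamp
        location: Optional[str] = None       # optional location of the activity
        payload: Dict[str, Any] = field(default_factory=dict)  # cause and other information


    def event_stream(events: List[EventObject]) -> Iterator[EventObject]:
        # An event stream: a sequence of event objects, ordered by time.
        yield from sorted(events, key=lambda e: e.occurred_at)


    if __name__ == "__main__":
        now = datetime.now(timezone.utc)
        sample = EventObject("coupon_redemption", now, "Macau",
                             {"patron_id": "P-001", "value": 25.0})
        for event in event_stream([sample]):
            print(event.event_type, event.occurred_at.isoformat())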

Large casino companies typically have three kinds of event streams:

  • Copies of business transactions, such as customer orders, insurance claims, bank deposits or withdrawals, customer address changes, call data records, advance shipping notices, or invoices.243 These are generated mostly internally and reflect the operational activities of the company.243
  • “The second are information reports, such as tweets, news feed articles, market data, weather reports, and social media updates, including Facebook and LinkedIn posts.”243 According to Schulte, “most of these sources are external to the company, but may contain information that is relevant to a decision within the company.”243
  • “The third, and fastest growing, kind of event stream contains sensor data coming from physical assets.”243 Generally known as IoT data, this includes “GPS-based location data from vehicles or smart phones, temperature or accelerometer data from sensors, RFID tag readings, heart beats from patient monitors, and signals from supervisory control and data acquisition (SCADA) systems on machines.”243

The reason for performing analytics on one or more event streams is to obtain information value from the data.243 As Schulte explains, “A stream analytics application converts the raw input data (base events) into a form, derived events, that is better suited for making decisions. The derived events are complex events, which means that they are events that are abstracted from one or more other events.”243
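As a rough illustration of that distinction, the sketch below abstracts several raw base events into a single derived (complex) event that is better suited for making a decision. The event names, fields, and threshold are invented for the example.

    from collections import defaultdict
    from typing import Dict, Iterable, List


    def derive_complex_events(base_events: Iterable[dict], threshold: int = 3) -> List[dict]:
        # Abstract many base events into fewer, decision-ready derived events.
        per_patron: Dict[str, int] = defaultdict(int)
        for event in base_events:
            if event.get("event_type") == "coupon_redemption":
                per_patron[event["patron_id"]] += 1

        return [
            {"event_type": "rapid_redemption_pattern", "patron_id": patron, "count": count}
            for patron, count in per_patron.items()
            if count >= threshold
        ]


    if __name__ == "__main__":
        base = [{"event_type": "coupon_redemption", "patron_id": "P-001"} for _ in range(4)]
        print(derive_complex_events(base))  # one derived event summarizing four base events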

Stream analytics are executed in one of two ways: in push-based, continuous intelligence systems, which recalculate as new data arrives without being asked, or in pull-based systems, which run when a person enters a request or a timer sends a signal to produce a batch report. Event Stream Processing (ESP) platforms are mostly relevant in highly demanding, push-based systems, but they are occasionally used for pull-based analytics on historical data.243
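The toy code below contrasts the two styles under these assumptions: a push-based engine invokes registered handlers the moment each event arrives, while a pull-based store only computes an answer when a query is issued. Neither stands in for any particular product.

    from typing import Callable, List


    class PushEngine:
        # Push-based: recalculates/notifies as soon as new data arrives, unasked.
        def __init__(self) -> None:
            self.handlers: List[Callable[[dict], None]] = []

        def subscribe(self, handler: Callable[[dict], None]) -> None:
            self.handlers.append(handler)

        def ingest(self, event: dict) -> None:
            for handler in self.handlers:
                handler(event)


    class PullStore:
        # Pull-based: stores events and answers only when a query is run.
        def __init__(self) -> None:
            self.events: List[dict] = []

        def ingest(self, event: dict) -> None:
            self.events.append(event)

        def query(self, event_type: str) -> int:
            return sum(1 for e in self.events if e["event_type"] == event_type)


    if __name__ == "__main__":
        push = PushEngine()
        push.subscribe(lambda e: print("alert:", e["event_type"]))
        push.ingest({"event_type": "jackpot"})        # handled immediately

        pull = PullStore()
        pull.ingest({"event_type": "jackpot"})
        print("batch count:", pull.query("jackpot"))  # computed on demand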

When people think of ESP, they usually think of push-based continuous intelligence systems, which ingest ongoing flows of event data and provide situation awareness, while also supporting near-real-time, sense-and-respond business processes.243 “Continuous intelligence systems typically refresh dashboards every second or minute, send alerts, or implement hands-free decision automation scenarios,” Schulte explains. “They may be used to monitor a data source, such as Twitter, or a business operation, such as a customer contact center, supply chain, water utility, telecommunication network, truck fleet, or payment process.”243

Schulte explains that:

“ESP platforms are software subsystems that process data in motion, as each event arrives. The query is pre-loaded, so the data comes to the query rather than the query coming to the data. ESP platforms retain a relatively small working set of stream data in memory for the duration of a limited time window, typically seconds to hours—just long enough to detect patterns or compute queries. The platforms are more flexible than hardwired applications because the query can be adjusted to handle different kinds of input data, different time windows (e.g., one minute or one hour instead of ten minutes) and different search terms.”
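A minimal sketch of that idea, with assumed names throughout: the query (a keyword plus a time window) is loaded first, and only a small working set of recent matching events is kept in memory, just long enough to answer it. Both the window length and the search term are parameters that can be adjusted, mirroring Schulte’s point.

    from collections import deque
    from typing import Deque, Tuple


    class PreloadedWindowQuery:
        def __init__(self, keyword: str, window_seconds: int) -> None:
            self.keyword = keyword
            self.window_seconds = window_seconds
            self.working_set: Deque[Tuple[float, str]] = deque()  # (timestamp, text)

        def on_event(self, timestamp: float, text: str) -> int:
            # Called as each event arrives; the data comes to the pre-loaded query.
            if self.keyword in text:
                self.working_set.append((timestamp, text))
            cutoff = timestamp - self.window_seconds
            while self.working_set and self.working_set[0][0] < cutoff:
                self.working_set.popleft()        # expire events outside the window
            return len(self.working_set)


    if __name__ == "__main__":
        query = PreloadedWindowQuery(keyword="inflation", window_seconds=600)
        print(query.on_event(0.0, "inflation is rising"))   # -> 1
        print(query.on_event(700.0, "no match here"))       # -> 0, first event expired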

According to Schulte, continuous intelligence applications are best implemented on ESP platforms if:

  • The application processes a high volume of data (thousands or millions of events per second).
  • Results are recalculated frequently (every millisecond or every few seconds).
  • Multiple simultaneous queries are applied to the same input event stream.243

Schulte gives the example of Twitter’s ESP platforms, Storm and Heron. These platforms are “used to monitor Twitter, which averages about 6,000 tweets per second. A simple query might report the number of tweets that included the word ‘inflation’ in the past ten minutes. However, at any one time, there may be thousands of simultaneous queries in effect against Twitter, each looking for different key words or different time windows.”243
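To illustrate the multi-query point, the toy sketch below applies several simultaneous keyword queries, each with its own time window, to the same incoming tweet stream. The query terms and window lengths are invented, and the code is not based on Storm or Heron.

    from collections import deque
    from typing import Deque, Dict, Tuple

    # keyword -> (window in seconds, timestamps of matching tweets)
    queries: Dict[str, Tuple[int, Deque[float]]] = {
        "inflation": (600, deque()),
        "world cup": (60, deque()),
        "jackpot": (300, deque()),
    }


    def on_tweet(timestamp: float, text: str) -> Dict[str, int]:
        # Feed one tweet to every registered query and return the current counts.
        counts: Dict[str, int] = {}
        for keyword, (window, hits) in queries.items():
            if keyword in text.lower():
                hits.append(timestamp)
            while hits and hits[0] < timestamp - window:
                hits.popleft()
            counts[keyword] = len(hits)
        return counts


    if __name__ == "__main__":
        print(on_tweet(0.0, "Inflation worries ahead of the World Cup"))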

“In high volume scenarios, ESP platform applications can scale out vertically (multiple engines working in parallel on the same step in a processing flow) and/or horizontally (split the work up in a sequence or pipeline where work is handed from one engine to the next while working on the same multistep event processing query, i.e., an event processing network),” explains Schulte.243
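A hedged sketch of the pipeline (event processing network) idea: the work is split into a sequence of steps (filter, enrich, count), each handing its output to the next. A real ESP platform would run such steps on separate engines, possibly with several parallel instances per step; here they are simply chained generators in one process, and the event fields are invented.

    from typing import Dict, Iterable, Iterator


    def filter_step(events: Iterable[dict]) -> Iterator[dict]:
        for event in events:
            if event.get("amount", 0) > 0:        # drop malformed or empty events
                yield event


    def enrich_step(events: Iterable[dict]) -> Iterator[dict]:
        for event in events:
            yield {**event, "large": event["amount"] >= 1000}   # add a derived flag


    def count_step(events: Iterable[dict]) -> Dict[bool, int]:
        counts: Dict[bool, int] = {True: 0, False: 0}
        for event in events:
            counts[event["large"]] += 1
        return counts


    if __name__ == "__main__":
        raw = [{"amount": 50}, {"amount": 2500}, {"amount": 0}]
        print(count_step(enrich_step(filter_step(raw))))   # {True: 1, False: 1}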

Schulte notes that “On-demand analytics are pull-based applications that support ad hoc data exploration, visualization and analysis of data.”243 On-demand analytics can be used with historical event data to build analytical models.243 In this context, “historical means stored event streams that are hours, weeks or years old.”243 Schulte explains that the “analytical models can be used for either of two purposes:

  • To design rules and algorithms to be used in real-time continuous intelligence applications (see above), or
  • To make one-time, strategic, tactical and long-term operational decisions.”243

The most common tool for on-demand analytics with historical data is a data discovery product such as Qlik, Tableau, SAS, or TIBCO; however, “companies occasionally use ESP platforms to run analytics on historical event streams by re-streaming the old event data through the ESP engine.”243 “This is particularly relevant when developing models for subsequent use in real-time, continuous intelligence ESP applications.”243
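For illustration, here is a minimal sketch of the re-streaming idea: the same simple rule that would run in a live application is replayed over stored events to see how often it would have fired before it is deployed. The stored amounts and candidate thresholds are invented for the example.

    from typing import Iterable, List


    def replay(events: Iterable[dict], threshold: float) -> int:
        # Run the "live" rule over historical events and count the alerts it would fire.
        return sum(1 for event in events if event["amount"] >= threshold)


    if __name__ == "__main__":
        historical: List[dict] = [{"amount": a} for a in (120, 950, 4000, 15, 2200)]
        for threshold in (500, 1000, 2000):
            print(threshold, "->", replay(historical, threshold), "alerts")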

ESP platforms are not the only type of software optimized for high performance analytics on event stream data. Some stream analytics products, such as First Derivatives’ KDB+, the Interana Platform, the Logtrust Platform, One Market Data’s OneTick, Quartet’s ActivePivot, and Splunk Enterprise, combine analytics and longer-term data storage in one product.243 “These products typically provide on-demand, pull-based analytics, but some are also used for continuous, push-based continuous intelligence. They ingest and store high volume event streams very quickly, making the ‘at rest’ data immediately available for interactive queries, exploration and visualization,” explains Schulte.243

For a real-time platform to work, data must be gathered from multiple and disparate sources, which can include Enterprise Resource Planning (ERP), Customer Relationship Management (CRM), and Social CRM (SCRM) platforms; geofencing applications (like Jiepang and Foursquare); Over-The-Top services (like WhatsApp and WeChat); mobile apps; augmented reality apps; and other mobile and social media systems. This data must be collected and then seamlessly integrated into a data warehouse that can cleanse it and make it ready for consumption.64 As the authors state in Mobile Advertising:

“The analytical system must have the capability to digest all the user data, summarize it, and update the master user profile. This functionality is essential to provide the rich user segmentation that is at the heart of recommendations, campaign and offer management, and advertisements. The segmentation engine can cluster users into affinities and different groups based on geographic, demographic or socio-economic, psychographic, and behavioral characteristics."19
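A minimal sketch of the segmentation idea in the quote above, assuming scikit-learn is available: a handful of numeric user characteristics are clustered into groups. The features (age, visits per month, average spend) and the number of clusters are illustrative assumptions, not details from the source.

    import numpy as np
    from sklearn.cluster import KMeans

    # Each row is one user: [age, visits_per_month, avg_spend]
    users = np.array([
        [23, 1, 40.0],
        [25, 2, 55.0],
        [58, 8, 900.0],
        [61, 10, 1200.0],
    ])

    # Cluster users into two segments based on their characteristics.
    segments = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(users)
    print(segments)   # e.g. [0 0 1 1]: casual visitors vs. high-value regulars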

Perhaps the future of real-time marketing was on display during the 2014 World Cup. “On eight different occasions during the 2014 World Cup, Nike and Google cranked out online display and mobile ads in 15 different countries across the globe. The campaigns ran in real time—meaning they went live during the games, and concluded once those games were over.”244

For example, “on June 23, during a match between Brazil and Cameroon, Nike pumped out an ad featuring Brazil’s star, Neymar da Silva Santos Júnior, who scored two goals that day.”[vi] “Within seconds, an ad featuring the star was featured throughout the Google Display Network, pushing it out to thousands of sites and mobile apps across the web, the search giant says.”244

“Besides being super timely, the Neymar Jr. ad featured some unique 3D technology that utilized the gyroscope found in most smartphones. Mobile users could rotate their phones and see images of the Nike star in the ad at different angles.”244 Gimmicky, yes, but probably effective, as fans could interact with these 3D ads as well as add personal touches. Once viewed, users could share the ads via Twitter, Facebook, and/or Google+.244 The eight different World Cup real-time campaigns generated two million fan interactions across 200 different countries.244

In his article Real-Time Stream Processing as Game Changer in a Big Data World with Hadoop and Data Warehouse16, Kai Wähner states that:

“Stream processing is designed to analyze and act on real-time streaming data, using ‘continuous queries’ (i.e. SQL-type queries that operate over time and buffer windows). Essential to stream processing is Streaming Analytics, or the ability to continuously calculate mathematical or statistical analytics on the fly within the stream. Stream processing solutions are designed to handle high volume in real time with a scalable, highly available and fault tolerant architecture. This enables analysis of data in motion.”
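As a rough illustration of calculating statistics “on the fly within the stream,” the sketch below updates a mean and variance incrementally as each value arrives, using Welford’s online algorithm rather than recomputing over the full data set. The bet-amount stream is invented for the example.

    class RunningStats:
        def __init__(self) -> None:
            self.count = 0
            self.mean = 0.0
            self.m2 = 0.0       # running sum of squared deviations from the mean

        def update(self, value: float) -> None:
            # Welford's online update: one pass, constant memory.
            self.count += 1
            delta = value - self.mean
            self.mean += delta / self.count
            self.m2 += delta * (value - self.mean)

        @property
        def variance(self) -> float:
            return self.m2 / (self.count - 1) if self.count > 1 else 0.0


    if __name__ == "__main__":
        stats = RunningStats()
        for bet in (10.0, 25.0, 5.0, 250.0):
            stats.update(bet)
            print(f"n={stats.count} mean={stats.mean:.2f} var={stats.variance:.2f}")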

As a batch-processing framework, Hadoop can’t handle the needs of real-time analytics. Though Hadoop has garnered a great deal of attention as an open source distributed computing environment, it is not necessarily the best platform for real-time analytics of dynamic information.[vii]

One recent development in stream processing methods is the “live data mart”, which “provides end-user, ad-hoc continuous query access to this streaming data that’s aggregated in memory,” explains Wähner.16 “Business user-oriented analytics tools access the data mart for a continuously live view of streaming data,”16 and a “live analytics front end slices, dices, and aggregates data dynamically in response to business users’ actions, all in real time,” adds Wähner.16
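The sketch below is a toy version of the live data mart idea under these assumptions: streaming events update in-memory aggregates as they arrive, and an ad-hoc query function lets a front end slice the current totals at any moment. The dimensions (property, game) and field names are invented for illustration.

    from collections import defaultdict
    from typing import Dict, Optional, Tuple

    # (property, game) -> running revenue total, held in memory
    live_mart: Dict[Tuple[str, str], float] = defaultdict(float)


    def ingest(event: dict) -> None:
        # Update the in-memory aggregate as each event streams in.
        live_mart[(event["property"], event["game"])] += event["revenue"]


    def query(property_name: Optional[str] = None) -> Dict[Tuple[str, str], float]:
        # Ad-hoc continuous query: slice the current aggregates on demand.
        if property_name is None:
            return dict(live_mart)
        return {k: v for k, v in live_mart.items() if k[0] == property_name}


    if __name__ == "__main__":
        ingest({"property": "Macau", "game": "baccarat", "revenue": 1200.0})
        ingest({"property": "Macau", "game": "slots", "revenue": 300.0})
        print(query("Macau"))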

For a casino company, streaming data could be coming in from facial recognition and geo-location software, fraud or anti-money laundering solutions, slot and table games systems, patron card and campaign management databases, redemption systems, social media feeds, IoT data, as well as wearables and employee/labor data sets.

Stream processing excels when data has to be processed quickly and/or continuously. Many different frameworks and products are already available on the market; however, the number of mature solutions with good tools and commercial support is still quite small.

Apache Storm is a good, open source framework, but it suffers from its open source nature: developer tools are limited, so custom coding is required. The typical “commercial solution vs. open source” question must be answered: do I want a pre-built product that will require limited (and sometimes not-so-limited) implementation costs, or do I want to start with an open source foundation and be required to customize everything?

As Wähner explains, a stream processing solution has to solve different challenges, including:

  • “Processing massive amounts of streaming events (filter, aggregate, rule, automate, predict, act, monitor, alert).
  • Real-time responsiveness to changing market conditions.
  • Performance and scalability as data volumes increase in size and complexity.
  • Rapid integration with existing infrastructure and data sources: Input (e.g. market data, user inputs, files, history data from a DWH) and output (e.g. trades, email alerts, dashboards, automated reactions).
  • Fast time-to-market for application development and deployment due to quickly changing landscape and requirements.
  • Developer productivity throughout all stages of the application development lifecycle by offering good tool support and agile development.
  • Analytics: Live data discovery and monitoring, continuous query processing, automated alerts and reactions.
  • Community (component/connector exchange, education/discussion, training/certification).
  • End-user ad-hoc continuous query access.
  • Push-based visualization.”16

For a casino or a sports book, events could include the following:

  • Real-time gaming systems information.
  • Coupon and/or comp redemptions.
  • Real-time streaming game odds.
  • Table games revenue management.
  • Social media data feeds, including:
    • Twitter
    • Facebook
    • Weibo
    • WeChat
    • Live streaming apps such as YouKu, PandaTV, Periscope, etc.
  • Predictive asset maintenance data streams.

 

[i] www.cluetrain.com

[ii] https://en.wikipedia.org/wiki/Real-time_marketing

[iii] Macy, B. a. (2011). The Power of Real-Time Social Media Marketing. New York: McGraw Hill, 2011.

[iv] Woods, D. (2011, May 6). How Real-time Marketing Technology Can Transform Your Business. Retrieved from Forbes.com: http://www.forbes.com/sites/ciocentral/2011/05/06/how-real-time-marketing-technology-can-transform-your-business/

[v] Schulte, Roy. (May 23, 2017). When do you need an Event Stream Processing Platform? Logtrust.com. https://www.logtrust.com/need-event-stream-processing-platform/ (accessed 30 August 2017).

[vi] Shields, M. (2014, July 14). Inside Google's World Cup Real-Time Marketing Experiment with Nike. Retrieved from Wall Street Journal Blog: http://blogs.wsj.com/cmo/2014/07/14/inside-googles-world-cup-real-time-marketing-experiment-with-nike/

[vii] Deng, Lei, Gao, Jerry, Vuppalapati, Chandrasekar. March 2015. Building a Big Data Analytics Service Framework for Mobile Advertising and Marketing. Online: https://www.researchgate.net/profile/Jerry_Gao/publication/273635443_Building_a_Big_Data_Analytics_Service_Framework_for_Mobile_Advertising_and_Marketing/links/5508de220cf26ff55f840c31.pdf
