The financial services industry has always used data to inform its investing decisions. After all, data-backed decision making mitigates risk and gives clients confidence in the firms investing on their behalf. Until recently, however, firms gathered that data from “traditional” sources like press releases, SEC filings, earnings reports, and credit scores.
Technology now makes it possible to ingest, process, and analyze large amounts of structured and unstructured data from new, never-before-used sources. As a result, firms can harness these alternative data insights to inform their investment decisions, make more accurate investments, and ultimately generate alpha.
What you’ll learn in this article:
This article explains everything you need to know about alternative data. Specifically, you’ll learn:
- What is alternative data?
- How is alternative data generated?
- Benefits of using alternative data
- How alternative data can be used
- Alternative data providers
- Key requirements of an alternative data program
- Challenges in implementing an alternative data program
- Architecting for alternative data
Alternative data is data that does not come from “traditional” financial data sources. It is used in the investment process to inform investing decisions. Taking forms that range from scraped web content to social media sentiment analysis to satellite imagery, it can provide unique and timely insights into investment opportunities that investors can’t get from traditional data sources.
Is it worth the hype?
Short answer: yes. Spending on this data is expected to surpass $1 billion by 2020. In fact, more than 400 alternative data suppliers exist on the market today, so demand and use will only increase.
The number of employees dedicated to alternative data full time has grown by 450% in the past five years. In fact, according to a 2018 study, almost 80% of investors turn to such data to inform their investing decisions. Given that investors want access to the most granular insights available to make the best possible investment decisions, it is here to stay.
Over the past two years alone, we have generated 90% of the data in the world. Technology connects humans and businesses more than ever before. By the year 2022, 29.5 billion networked devices will exist and 4.8 billion people will use the internet, according to Cisco.
There is no comprehensive list of what this data is. Technology today can track social media, media, sentiment, IoT, geolocation, eCommerce buying habits, airline bookings, retail inventory data, mortgage data, entertainment events data, hotel bookings, satellite images, and more.
Experts divide alternative data roughly into three categories: data generated by individuals, data generated through business processes, and data generated by sensors.
Individuals generate unstructured data. Coming primarily from web traffic, app usage, and social media, this data is valuable for detecting sentiment and consumer behavior. With 1.56 billion daily active users on Facebook and 126 million daily active users on Twitter, scraping social media for sentiment and feedback has become commonplace. For example, understanding sentiment helps brands to understand brand performance, reduce churn, and increase lifetime customer value.
Businesses also generate data in the form of banking records, credit card transaction records, commercial transactions, supply chain data, and government and corporation data. This data is usually structured and is a good overall indicator of business performance. In addition, it is a good predictor of company sales.
Data generated by sensors is the third major source of data. This data comes from satellite images, weather forecasts, and geolocation data gathered through wifi signals. Sensor datasets are usually the largest and are unstructured. Among other things, businesses use sensor data to track foot traffic and detect the health of stores.
You know what it is and how it’s generated, but what are the benefits of using this data?
Alternative data can save traditional money managers time. By using programs that sift through news and data on their behalf, businesses produce more accurate insights and unbiased decisions. In addition, analysts can get a better signal about what’s going on in the market in real time by using these larger and more dynamic alternative datasets.
Alternative viewpoints and unforeseen insight
Today, analysts have access to thousands of data sources. Using this data exposes them to viewpoints that traditional data sources would not normally provide. As a result, they can generate new investment ideas, discover unforeseen insights, and even predict future market moves.
Transparency into company performance
Traditionally, data used by portfolio managers only gave historical insight into company performance. Having to wait for quarterly earnings reports and financial statements meant being reactive with investment strategies. Now, integrating alternative data allows portfolio managers and investors to get real-time signals into company performance.
Using alternative data gives you a competitive edge over other firms in your industry. Leveraging such data means your firm can learn as much as possible about a potential deal or investment beforehand. Although almost 80% of firms today use some form of alternative data to inform their investment strategy, financial firms can still take advantage of what it has to offer.
Firms can use alternative data in several different ways. Below are just a few examples of how firms have leveraged it.
A hedge fund wanted to predict which industries would be affected by Brexit. It built bots to scrape Bloomberg, the Financial Times, and other publications, searching for “Brexit” alongside SEC-delineated industries to see what was being covered and to narrow its attention to specific industries.
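As a rough illustration of the idea, the sketch below counts how many articles mention a topic together with an industry term. It is not the hedge fund's actual method: the `INDUSTRY_TERMS` mapping, the `count_industry_mentions` function, and the sample headlines are all made up for this example, and the scraping step itself is left out.

```python
import re
from collections import Counter

# Hypothetical keyword lists; a real program would map terms to an
# official industry classification instead.
INDUSTRY_TERMS = {
    "banking": ["bank", "lender"],
    "automotive": ["carmaker", "automotive"],
    "agriculture": ["farm", "agriculture"],
}

def count_industry_mentions(articles, topic="brexit"):
    """Count, per industry, the articles that mention both the topic and an industry term."""
    counts = Counter()
    for text in articles:
        # Exact whole-word matching on lowercase words (so "banks" will not match "bank").
        words = set(re.findall(r"[a-z]+", text.lower()))
        if topic not in words:
            continue
        for industry, terms in INDUSTRY_TERMS.items():
            if any(term in words for term in terms):
                counts[industry] += 1
    return counts

# Made-up sample text standing in for scraped articles.
articles = [
    "Brexit uncertainty pushes one bank to move staff to Frankfurt.",
    "A carmaker warns that Brexit tariffs could halt production.",
    "Another lender reviews its Brexit contingency plans.",
    "Farm subsidies debated in parliament.",
]
mentions = count_industry_mentions(articles)
print(mentions.most_common())
```

Industries that keep showing up next to the topic are candidates for a closer look. A production version would likely need stemming and a much richer vocabulary.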
Using publicly available flight information, a company tracked the flight patterns of corporate jets. It trained a machine learning model on repeated flight patterns for certain companies to predict M&A deals.
For example, executives from Cisco were flying repeatedly to Carlsbad, CA, where Luxtera is headquartered. Cisco later acquired Luxtera. HCA Health executives were often visiting Asheville, NC, and HCA later acquired Mission Health, which is headquartered in Asheville.
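The company in this example trained a machine learning model. The sketch below swaps in a much simpler counting heuristic, just to show what a "repeated flight pattern" signal might look like. The `repeated_destinations` function, the `min_visits` threshold, and the flight log are invented for illustration.

```python
from collections import Counter

def repeated_destinations(flights, min_visits=3):
    """Return {(company, destination): visits} for pairs seen at least min_visits times."""
    visits = Counter((company, dest) for company, dest in flights)
    return {pair: n for pair, n in visits.items() if n >= min_visits}

# Made-up flight log: (company operating the jet, destination city).
flights = [
    ("AcmeCorp", "Springfield"),
    ("AcmeCorp", "Springfield"),
    ("AcmeCorp", "Springfield"),
    ("AcmeCorp", "Chicago"),
    ("Globex", "Springfield"),
]
signals = repeated_destinations(flights)
print(signals)
```

A flagged pair is only a prompt for further research. An analyst would still need to match the destination against companies headquartered there.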
Testing new ideas and identifying new opportunities
A leading long/short equity hedge fund created a flexible and scalable alternative data program. Their program continuously and rapidly digests any number, size, or format of data sources, converts unstructured data into machine-readable formats, and incorporates data quality checks.
Analysts at the hedge fund now onboard new data in less than one hour. With access to high-powered and quickly integrated alternative data insights through its custom-made data platform, the hedge fund has completely changed how it processes and investigates data. Next-gen insights gathered from alternative data now enhance its investment strategies. As a result, analysts now make better-informed investment decisions.
What to look for in a provider:
When looking for an alternative data provider, it is important to find those that have the data that you need, and enough of it to inform your investing decisions. Other important questions to ask when looking at vendors:
- Can you integrate the source with your existing system?
- How much will the data cost?
- Will the alternative dataset deliver a good ROI, or will it just add noise to your analysis?
- How long will it take to integrate the data into your system?
Alternative data provider examples:
Today, over 400 vendors supply alternative data in the marketplace. This list is not exhaustive, but it highlights some of the top data providers in the market that feature different data sources and purposes.
Quandl: Quandl is a data aggregator that leverages relationships to source alternative data from IoT, consumers, natural resources, logistics, B2B, and more to enhance trading strategies for its users.
YipitData: YipitData is a data aggregator that sources web data, [anonymized] consumer receipts, and survey data weekly and monthly from over 70 companies across multiple industries and locations.
Dataminr: Dataminr scrapes public tweets and turns them into real-time alerts and sentiment analysis to assist its clients across several industries in trading, market awareness, client advisory, and thesis generation.
UBS Evidence Lab: UBS Evidence Lab sells insight-ready datasets of quality and vetted data for other financial services firms to integrate with their own data.
1010data: 1010data aggregates credit card data from third-party providers.
S&P Global Market Intelligence: S&P Global Market Intelligence uses natural language processing to understand sentiment from earnings calls. It has data on 8,300 companies dating back to 2004.
AppAnnie: AppAnnie collects mobile app usage data and trends to enable its users to make more informed decisions on consumer behavior.
An effective program needs to do more than just bring in data. Your alternative data platform needs to be able to:
- continuously ingest any number of raw data sources of any size, volume, or structure, including structured, semi-structured, and unstructured data; large quantities of small data files; large quantities of large data files, etc.
- rapidly onboard new data sources of any kind
- enable discovery and hypothesis testing on ingested data in order to create and refine use cases based on a subset of the available data
- productionalize the preparation of newly ingested data towards all established use cases and rapidly deploy new use cases
- update a visualization layer that enables analysis and real-time tracking of use cases
Using alternative data to enhance and inform investing decisions might sound great, but establishing and implementing a program comes with unique challenges. When creating one, companies should ensure their data programs can handle the following.
Onboarding and enabling new data sources with minimal effort
In order to take advantage of the potential of all data, your system needs to be able to onboard and enable new data sources with minimal effort. You’ll want to avoid the need for additional development or code refactoring to accommodate something unforeseen in a new data source that you want to onboard.
The creation of a universal data extraction process
Flexibility is key when using any big data. An extraction process has to be capable of extracting all the data points that are contained within a given file format. Relying on a hard-coded schema for each individual data source will lead to problems down the line.
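One way to avoid a hard-coded schema is to walk whatever structure arrives and keep every leaf value. The sketch below does this for JSON only. The `flatten` helper and the sample record are assumptions made for illustration, and a truly universal process would need an equivalent for each file format.

```python
import json

def flatten(value, prefix=""):
    """Recursively flatten nested dicts and lists into {dotted.path: leaf_value}.

    Empty dicts and lists produce no entries.
    """
    items = {}
    if isinstance(value, dict):
        for key, child in value.items():
            items.update(flatten(child, f"{prefix}{key}."))
    elif isinstance(value, list):
        for index, child in enumerate(value):
            items.update(flatten(child, f"{prefix}{index}."))
    else:
        # Leaf value: drop the trailing dot from the accumulated path.
        items[prefix.rstrip(".")] = value
    return items

# Made-up record; no field names are known to the code in advance.
raw = '{"store": {"id": 7, "visits": [120, 95]}, "source": "sensor"}'
record = flatten(json.loads(raw))
print(record)
```

Because the code never names a field, a new attribute in the source shows up as a new key instead of being silently dropped.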
Accommodating schema shifts in the source data
Accommodating schema shifts is particularly important with unstructured or self-gathered datasets, whose fields and formats can change without notice. Your pipeline needs to absorb those changes rather than break on them.
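A minimal sketch of one piece of this: comparing an incoming record against the last known set of fields, so that a shift is reported instead of breaking the load. The `schema_diff` function and the field names are hypothetical.

```python
def schema_diff(expected_fields, record):
    """Compare a record's fields with the last known field set."""
    seen = set(record)
    expected = set(expected_fields)
    return {
        "added": sorted(seen - expected),    # fields the source started sending
        "missing": sorted(expected - seen),  # fields the source stopped sending
    }

# Made-up example: the vendor appears to have renamed "visits" to "footfall".
known = ["date", "store_id", "visits"]
incoming = {"date": "2019-06-01", "store_id": 7, "footfall": 120}
drift = schema_diff(known, incoming)
print(drift)
```

What to do with the result is a design choice. One option is to load the new fields anyway and alert an analyst about the missing ones.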
Scalability and avoidance of bottlenecks
You want to be able to digest and process each data source in a timely manner, and to do so in parallel with other data sources that land at the same time. If your system suddenly needs to ingest a new, large dataset while simultaneously updating datasets it already holds, it must be flexible and scalable enough to avoid bottlenecks and possible downtime.
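As a small-scale illustration of processing sources in parallel, the sketch below uses a thread pool from Python's standard library. The `ingest` function is a stand-in that only counts rows, and the source names are invented. A real platform would more likely rely on a distributed engine than on a single machine.

```python
from concurrent.futures import ThreadPoolExecutor

def ingest(source):
    """Stand-in for a real loader: it only counts the rows it was handed."""
    name, rows = source
    return name, len(rows)

def ingest_all(sources, max_workers=4):
    """Ingest every source concurrently and return {name: row_count}."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map yields results in input order; consume it before the pool closes.
        return dict(pool.map(ingest, sources))

# Made-up sources landing at the same time.
sources = [
    ("web_scrape", ["a", "b", "c"]),
    ("card_transactions", ["x"]),
    ("satellite", []),
]
results = ingest_all(sources)
print(results)
```

Threads suit I/O-bound loading such as downloads and file reads. CPU-heavy parsing would need processes or a cluster instead.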
Privacy, sensitivity, and confidentiality
Your system needs effective data governance in place to be used safely and efficiently. Companies should also keep laws like GDPR and the California Consumer Privacy Act in mind when collecting, ingesting, and using data.
Data governance is essential to ensuring your program is an effective part of your overall enterprise. An effective strategy prevents organizational issues and conflicts resulting from the mismanagement of data. In addition, it allows the appropriate people to access certain levels of the data.
Furthermore, you should keep the following best practices in mind:
- Create an alternative data framework
- Automate mundane tasks
- Embed data cleansing and integration
- Use a big data ecosystem
- Embrace change
- Leverage consultants
- Sift through the noise and find data that will give your firm an edge over the competition
Caserta’s Alternative Data Supply Chain
At Caserta, our alternative data engagements all follow the same process, modeled on a supply chain. The alternative data supply chain ensures that a cohesive, effective, alpha-generating alternative data program is created with the shortest possible time-to-value.
Plan: First, we identify the business initiatives and plan the alternative data architecture, including the data platform, data pipelines, ML/AI tools, and data visualization.
Source: Second, we define an implementation plan and success criteria. Next, we source the data, make decisions such as build vs. buy and which change data capture (CDC) techniques to use, and persist raw data in the internal data lake.
Construct: Third, we integrate the data through testing. Next, we transform it into usable formats and structures. Finally, we integrate new data assets with enterprise data, train and refine models, and create new data assets for delivery.
Deliver & Consume: Finally, we deliver the alternative data. With new data assets effectively governed and available to business users, data usage patterns and dissemination methods refined, and data sharing coordinated with external partners, the alternative data program is ready to use.