Big Data Doesn’t Always Mean Big Company
BIG Data does not necessarily mean BIG Company. Although these technologies were created by large tech companies to deal with extraordinary data challenges, applications do exist for small and mid-size companies. This is not a blanket statement: whether these technologies are appropriate depends largely on your use cases. Below are several scenarios where Big Data may help:
- Data archival and retrieval – Hadoop is a cost-effective system for storing archived data. The cost of ownership can be orders of magnitude lower than traditional backup methodologies or scaling existing infrastructure to keep more data online. Most importantly, the data remains available and can still be queried or processed. Using languages such as Pig or Hive, or even Solr search, you can retrieve or search the archived data without recalling backup media or tapes.
- Machine learning – If you have a classification, clustering, or recommendation requirement, projects such as Mahout offer a rich set of libraries to address it. Even on mid-size data sets these algorithms can be extremely resource intensive. With the MapReduce framework, the processing can be spread across commodity machines. Additionally, much of the code is open source, so it is available for modification if needed.
- Don’t forget NoSQL – Although typically pegged for large-scale applications, these technologies offer a number of functional benefits as well. If you come from the relational world, writing an application against a NoSQL database (such as MongoDB) will seem like cheating. These platforms easily handle sparse data and data with inconsistent or volatile schemas. They are typically modeled around query usage, so there is very little abstraction between the database and the application. And as the application grows, these databases are designed to scale with it.
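To make the sparse-data point concrete, here is a minimal sketch in plain Python. It does not use a real NoSQL driver: the records are ordinary dictionaries standing in for documents, and the `products` data, the field names, and the `find` helper are all hypothetical, chosen only for illustration. The idea is that each record carries only the fields it actually has, and the query is written against the shape the application uses.

```python
# Hypothetical product records standing in for documents in a document store.
# Each record has only the fields that apply to it; there is no shared schema.
products = [
    {"sku": "A100", "name": "Desk lamp", "watts": 40},
    {"sku": "B200", "name": "Notebook", "pages": 120, "ruled": True},
    {"sku": "C300", "name": "Gift card"},
]


def find(documents, **criteria):
    """Return the documents that contain every given field with the given value."""
    return [
        doc
        for doc in documents
        if all(key in doc and doc[key] == value for key, value in criteria.items())
    ]


# Query on a field that only some records have; the others are simply skipped.
ruled_items = find(products, ruled=True)

# Collect every field name that appears anywhere in the collection.
fields_seen = sorted({key for doc in products for key in doc})

print(ruled_items)
print(fields_seen)
```

In a relational table, the same three records would typically need nullable columns or separate tables for `watts`, `pages`, and `ruled`. A document database such as MongoDB stores records in roughly this shape, though its actual query API differs from the toy helper above.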
Big Data is not a one-size-fits-all solution. It is important to understand your use case and choose the right technology to support it. Contact us to learn more.