Use Case: RapidsDB Helps Sentient Integrate Data Seamlessly and Reduce TCO by 80%


INDUSTRY: Technology and Software

SOLUTION PACKAGE: RapidsDB, Rapids Federation, AIworkflow

CLOUD: Google Cloud

Sentient is an AI API microservice provider located in Singapore that helps companies across the world build and deploy AI solutions. TeamSpaces, a workspace embedded within the native platform, promotes secure and collaborative development work within a community of 10,000+ developers and business users. A unified view of all datasets, regardless of source, and a highly reliable, trusted data infrastructure for API-based team collaboration are crucial to customers’ success in AI development. In the past, Sentient took a traditional siloed approach and adopted the database technologies of Redis, Elasticsearch and BigQuery in Google Cloud.

Special-purpose databases created data silos and increased Total Cost of Ownership (TCO)

Redis is an open-source, in-memory key-value database. Because it keeps data in memory, Redis largely avoids frequent disk I/O and runs much faster than traditional disk-based databases. However, while Redis is ideal for building a caching system, it is not designed for analytical workloads and lacks many key qualities of a general-purpose database. Fundamentally, it does not support Structured Query Language (SQL). In a key-value store, keys serve as unique identifiers for their associated values. Complex queries often require multi-table joins, conditional aggregation, sub-queries, etc., whereas Redis’ own set of commands for managing and accessing data is only suited to retrieving very simple results based on key-value pairs. As a result, Sentient used Redis only to store session data for session caching.
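The contrast between key-based lookup and SQL-style aggregation can be sketched in a few lines. This is only an illustration: a Python dict stands in for a key-value store and the standard-library sqlite3 module stands in for a SQL engine. The session records, field names and table name are hypothetical and are not taken from Sentient's systems.

```python
import sqlite3

# A dict stands in for a key-value store: lookups by key are direct,
# which is what makes this model a good fit for session caching.
# All records below are hypothetical sample data.
sessions = {
    "session:1": {"user_name": "alice", "team": "vision", "api_calls": 12},
    "session:2": {"user_name": "bob", "team": "nlp", "api_calls": 7},
    "session:3": {"user_name": "carol", "team": "vision", "api_calls": 5},
}
print(sessions["session:2"]["user_name"])

# An aggregate question ("API calls per team") has no key to look up,
# so the application has to scan every value and aggregate by hand.
calls_by_team = {}
for record in sessions.values():
    team = record["team"]
    calls_by_team[team] = calls_by_team.get(team, 0) + record["api_calls"]

# The same question in a SQL engine is a single declarative statement.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sessions "
    "(id TEXT PRIMARY KEY, user_name TEXT, team TEXT, api_calls INTEGER)"
)
conn.executemany(
    "INSERT INTO sessions VALUES (?, ?, ?, ?)",
    [(k, v["user_name"], v["team"], v["api_calls"]) for k, v in sessions.items()],
)
sql_result = dict(
    conn.execute("SELECT team, SUM(api_calls) FROM sessions GROUP BY team")
)
print(calls_by_team, sql_result)
```

Both approaches produce the same totals here, but the hand-written scan has to be rewritten for every new question, while the SQL version only needs a different statement.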

Due to the technical limitations of Redis, Sentient had installed Elasticsearch to conduct full-text search. Elasticsearch is an open-source, distributed search and analytics engine with an HTTP web interface and schema-free JSON documents. It is good at searching any kind of document and is very useful for working with unstructured data. To guarantee the continuous availability of data stored in Redis and Elasticsearch, Sentient had invested in seven servers for high availability (HA): two dedicated to Redis and the remaining five to Elasticsearch.

Moreover, the company had also added BigQuery to the data stack to perform complex business analytics. BigQuery is a fully managed, serverless data warehouse with built-in machine learning capabilities and ANSI SQL support. Sentient mainly leveraged BigQuery’s location intelligence for geospatial data analysis and storage. BigQuery worked well with extremely large datasets. However, with the ever-growing data volume and level of performance required by Sentient’s customers, the cluster size continued to expand. As BigQuery pricing is mainly based on data storage, query processing and streaming inserts, queries that had not been tuned for performance or that returned large amounts of redundant data quickly drove costs up dramatically.
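The effect of redundant data on query cost can be illustrated with simple arithmetic. The sketch below assumes that the query-processing charge scales with the bytes a query reads from a column-oriented table. The column sizes, the price per terabyte and the function name are all hypothetical placeholders, not actual BigQuery rates or Sentient figures.

```python
def estimate_scan_cost(column_sizes_gb, columns_read, price_per_tb):
    """Estimate one query's charge from the columns it reads (hypothetical model)."""
    scanned_gb = sum(column_sizes_gb[name] for name in columns_read)
    return scanned_gb / 1024 * price_per_tb


# Hypothetical table: three small columns and one large payload column.
column_sizes_gb = {"id": 8, "lat": 8, "lon": 8, "payload": 1000}
PRICE_PER_TB = 5.0  # hypothetical rate, for illustration only

# A query that reads every column pays for the large payload column too.
select_star = estimate_scan_cost(column_sizes_gb, column_sizes_gb.keys(), PRICE_PER_TB)

# A query that reads only the columns it needs scans far less data.
narrow = estimate_scan_cost(column_sizes_gb, ["id", "lat", "lon"], PRICE_PER_TB)

print(f"all columns: {select_star:.2f}, needed columns only: {narrow:.2f}")
```

Under these assumed numbers the untuned query costs more than forty times as much as the narrow one, which is the kind of gap that compounds quickly across many users and frequent queries.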

These different technical solutions for different use cases ultimately created more and more data silos. Sentient had to connect and integrate the data on its own, which resulted in high costs for development, operations and maintenance (O&M) and operational staff training. The disparate data systems made it difficult for users to access and analyze all of the available data and to obtain unified query results quickly enough to support collaborative application development.

A unified approach to integrate data seamlessly and enable complex queries to be answered in real time

What Sentient needed was a unified data analytics platform that could break down data silos with high performance and at low cost. It had to handle terabyte-scale data volumes reliably while guaranteeing HA. The system had to provide seamless data integration to support real-time data access and complex data analytics. Moreover, a simplified data management architecture would be ideal for reducing the complexity of data security and governance.

RapidsDB is a fully parallel, distributed, in-memory federated query system designed to support complex analytical SQL queries running against a set of heterogeneous data stores. Based on advanced in-memory database technology independently developed by Borrui Data, the database provides exceptional performance for analyzing massive data with high concurrency and low latency.

The unique Rapids Federation, a logical grouping of one or more dedicated or generalized RapidsDB Connectors for accessing various data sources, is the most cost-efficient way to break down data silos and integrate data seamlessly. A single SQL statement can combine structured, unstructured and semi-structured data from a variety of sources, in lieu of expensive and time-consuming ETL tools. Query results are logically consolidated and presented to the user as a single, federated database without any data movement. With all available data accessible through a single logical database, data governance is much easier to achieve as well.
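The idea of one SQL statement spanning several sources can be sketched by analogy. The example below does not use RapidsDB or its Connectors, whose syntax is not shown in this document. Instead it uses SQLite's ATTACH DATABASE, via Python's standard sqlite3 module, to join tables that live in two separate databases. The schema names, tables and rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Each attached database stands in for an independent data source.
# In a federated system these would be reached through connectors;
# here they are simply two separate in-memory SQLite databases.
conn.execute("ATTACH DATABASE ':memory:' AS cache_src")
conn.execute("ATTACH DATABASE ':memory:' AS search_src")

# Hypothetical sample data in each source.
conn.execute("CREATE TABLE cache_src.sessions (user_id INTEGER, last_seen TEXT)")
conn.execute("CREATE TABLE search_src.documents (user_id INTEGER, title TEXT)")
conn.executemany(
    "INSERT INTO cache_src.sessions VALUES (?, ?)",
    [(1, "2024-01-05"), (2, "2024-01-06")],
)
conn.executemany(
    "INSERT INTO search_src.documents VALUES (?, ?)",
    [(1, "geo report"), (1, "api notes"), (2, "model card")],
)

# One statement joins across both sources; no rows are copied into a
# staging table first, which is the role ETL would otherwise play.
rows = conn.execute(
    """
    SELECT s.user_id, s.last_seen, COUNT(d.title) AS doc_count
    FROM cache_src.sessions AS s
    JOIN search_src.documents AS d ON d.user_id = s.user_id
    GROUP BY s.user_id, s.last_seen
    ORDER BY s.user_id
    """
).fetchall()
print(rows)
```

The analogy is limited: both sources here are SQLite databases in one process, whereas a federated query system has to plan and push work to heterogeneous stores. The user-facing shape, a single query over schema-qualified tables, is the part this sketch is meant to convey.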

In addition, RapidsDB provides complete backup and recovery solutions. Although hot data is usually stored in memory, copies of the data are maintained on server disk drives. When RapidsDB runs in fully persistent mode, transactions are committed to transaction logs on disk and then compressed into database snapshots, which take up much less disk space.
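The general pattern of an in-memory store backed by an on-disk transaction log and periodic snapshots can be sketched as follows. This is a minimal illustration of the technique, not RapidsDB's implementation: the class name, file names and JSON format are hypothetical, and the snapshot here compacts the log into the current state rather than applying a compression algorithm.

```python
import json
import os
import tempfile


class TinyStore:
    """In-memory dict backed by an append-only log and a snapshot file."""

    def __init__(self, directory):
        self.log_path = os.path.join(directory, "txn.log")
        self.snapshot_path = os.path.join(directory, "snapshot.json")
        self.data = {}
        self._recover()

    def _recover(self):
        # Load the last snapshot, then replay log entries written after it.
        if os.path.exists(self.snapshot_path):
            with open(self.snapshot_path) as f:
                self.data = json.load(f)
        if os.path.exists(self.log_path):
            with open(self.log_path) as f:
                for line in f:
                    key, value = json.loads(line)
                    self.data[key] = value

    def put(self, key, value):
        # Commit the change to the on-disk log before updating memory.
        with open(self.log_path, "a") as f:
            f.write(json.dumps([key, value]) + "\n")
            f.flush()
            os.fsync(f.fileno())
        self.data[key] = value

    def snapshot(self):
        # Write the full current state, swap it in atomically, then
        # truncate the log: repeated updates collapse to one value per key.
        tmp_path = self.snapshot_path + ".tmp"
        with open(tmp_path, "w") as f:
            json.dump(self.data, f)
        os.replace(tmp_path, self.snapshot_path)
        open(self.log_path, "w").close()


with tempfile.TemporaryDirectory() as directory:
    store = TinyStore(directory)
    store.put("a", 1)
    store.put("a", 2)   # overwrites "a"; both writes are in the log
    store.snapshot()    # log is compacted into a single snapshot
    store.put("b", 3)   # written to the log after the snapshot
    # A fresh instance simulates a restart: snapshot first, then log replay.
    recovered = TinyStore(directory).data
    print(recovered)
```

Reads are served from memory, while the log and snapshot files are what allow the state to be rebuilt after a restart.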

Faster insights increase workforce productivity and user experience while reducing TCO

With the RapidsDB unified big data analytics platform, Sentient customers are able to query large amounts of structured and unstructured data, including geospatial data and text files, in real time. This boosts workforce productivity, as different views of the data can be provisioned efficiently for different analytic purposes. With TeamSpaces, collaboration among application developers and business users has been strengthened, as the live integration of hot and cold data powered by Rapids Federation enables them to run highly sophisticated ad-hoc queries conveniently and concurrently. Results are returned within seconds, tremendously improving user experience and teamwork efficiency. Thanks to RapidsDB’s advanced data compression technology and very high performance, Sentient now uses only two servers, instead of the original seven, to achieve the results previously supported by the original data stack, reducing TCO by 80%!