Digital Nebula was recently contracted to enable the analysis of API logs from a Fortune 500 company’s primary product. Our client wanted the flexibility and ease of use of SQL, along with the ability to run complex queries on large data sets. Being able to visualize the data would be a huge win as well. From a security perspective, the data sat in a highly secured environment on a Microsoft SQL Server instance, and the options for extracting it into another platform were limited.
After consulting with Google’s partnership team and our client’s business teams, we decided on a solution that loads the data into BigQuery and uses Google Data Studio on top of it for visualizations. Our method of extraction would be a custom ETL framework that Digital Nebula has developed. This allowed us to pull the data from a server on premises within the client’s data center. Because the framework ran on premises and pushed the data out to BigQuery, the database never had to be reachable from the public internet.
Digital Nebula’s ETL framework ran on a single on-premises server. It first pulled the data from SQL Server, transformed it, and compressed it, then forwarded it to Google Cloud Storage. From there, we kicked off a BigQuery load job to automate loading the data from Google Cloud Storage into a BigQuery dataset. Using this strategy, we transferred over 2 billion rows of data at a rate of over 25,000 rows per second, using commodity hardware for the extract, transform, and load process. We then connected Google’s Data Studio product to the BigQuery table, which enabled rich visualizations and flexible reporting.
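To make the transform and compress stage more concrete, here is a minimal sketch in Python. It is not Digital Nebula’s framework: the function names, the lowercase-keys transform, the batch size, and the choice of gzip-compressed newline-delimited JSON (a format BigQuery load jobs accept) are all assumptions made for illustration. Reading from SQL Server, uploading to Google Cloud Storage, and starting the BigQuery load job are left out.

```python
import gzip
import io
import json
from typing import Iterable, Iterator, List


def transform(row: dict) -> dict:
    """Hypothetical transform: lowercase the column names and drop NULL values."""
    return {key.lower(): value for key, value in row.items() if value is not None}


def batches(rows: Iterable[dict], size: int) -> Iterator[List[dict]]:
    """Group a stream of rows into lists of at most `size` rows."""
    batch: List[dict] = []
    for row in rows:
        batch.append(row)
        if len(batch) >= size:
            yield batch
            batch = []
    if batch:
        # Emit the final, possibly smaller, batch.
        yield batch


def compress_batch(batch: List[dict]) -> bytes:
    """Transform each row and serialize the batch as gzip-compressed
    newline-delimited JSON, ready to be uploaded as a single object."""
    buffer = io.BytesIO()
    with gzip.GzipFile(fileobj=buffer, mode="wb") as gz:
        for row in batch:
            gz.write((json.dumps(transform(row)) + "\n").encode("utf-8"))
    return buffer.getvalue()


def estimated_hours(total_rows: int, rows_per_second: float) -> float:
    """Wall-clock hours needed to move `total_rows` at a steady rate."""
    return total_rows / rows_per_second / 3600
```

Taking the figures above as round numbers, `estimated_hours(2_000_000_000, 25_000)` comes to a little over 22 hours of sustained transfer. Batching rows before compression keeps memory use bounded on a single server and produces files that can be uploaded and loaded independently.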
Our client can now analyze API log data, which will allow them to make informed decisions about their product roadmap. We were able to do this while maintaining data fidelity, scalability, and cost efficiency. The project took one week from the start of implementation to the end, and the ETL framework is still in place, so the client’s reports continue to be updated with new data.