At the start of the year, we launched an effort to build an internal dashboard for employees that provides a full view of employee personas, performance, skills growth, machine log monitoring, and more. The machine logs are a notable case: our servers produce 1 terabyte of log data every week, from which we extract the warning and error entries and analyze the most frequent root causes and other metrics. We selected several big data technologies and platforms to transform and store the structured and unstructured data.
The major components of the architecture are the Java Spring framework, MongoDB, and Spark Core/SQL/MapReduce, with ECharts and Vue.js used to visualize the data. We ran the entire design and development process with Scrum, supported by tools such as GitLab, Git, and Docker. This setup enables continuous integration and continuous delivery, which helps ensure code quality.
Data Sources. We primarily extract and handle structured data from MySQL and Oracle, and streaming data from REST APIs and machine logs. Images and videos will be searched for and crawled from social networks and websites of interest, such as WeChat and Weibo (China's Twitter).
Spring Framework. We built the Java program mainly on Spring Boot to construct a configurable ETL engine that picks up rows from MySQL/Oracle, as well as data from external REST APIs, and consumes them. The data is then loaded into MongoDB as the staging area.
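To make the extract, transform, and load flow concrete, here is a minimal sketch in plain Java. It is not our production engine: the names EtlSketch, RowSource, RowSink, InMemorySink, and run are hypothetical, the source is a hard-coded row standing in for MySQL/Oracle or a REST API, and the sink is an in-memory list standing in for the MongoDB staging area.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

public class EtlSketch {

    /** Hypothetical source of rows; a real one would wrap MySQL/Oracle or a REST API. */
    public interface RowSource {
        List<Map<String, Object>> read();
    }

    /** Hypothetical sink; a real one would write to a MongoDB staging collection. */
    public interface RowSink {
        void write(List<Map<String, Object>> rows);
    }

    /** In-memory stand-in for the staging store, used only for illustration. */
    public static class InMemorySink implements RowSink {
        public final List<Map<String, Object>> stored = new ArrayList<>();

        @Override
        public void write(List<Map<String, Object>> rows) {
            stored.addAll(rows);
        }
    }

    /** Runs one extract-transform-load pass and returns the number of rows loaded. */
    public static int run(RowSource source,
                          Function<Map<String, Object>, Map<String, Object>> transform,
                          RowSink sink) {
        List<Map<String, Object>> out = new ArrayList<>();
        for (Map<String, Object> row : source.read()) {
            out.add(transform.apply(row));
        }
        sink.write(out);
        return out.size();
    }

    public static void main(String[] args) {
        // Made-up sample row; the column names are assumptions for the example.
        RowSource employees = () -> {
            Map<String, Object> row = new LinkedHashMap<>();
            row.put("id", 1);
            row.put("name", " alice ");
            return List.of(row);
        };
        InMemorySink staging = new InMemorySink();

        // The transform trims the name column and leaves other columns unchanged.
        int loaded = run(employees, row -> {
            Map<String, Object> copy = new LinkedHashMap<>(row);
            copy.put("name", ((String) row.get("name")).trim());
            return copy;
        }, staging);

        System.out.println(loaded + " row(s) staged: " + staging.stored);
    }
}
```

Keeping the source, transform, and sink behind small interfaces is one way to make such an engine configurable: a new data source only needs a new RowSource implementation, and the pipeline code stays the same.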
Spark Engine. The Spark cluster is set up with one master node and several slave (worker) nodes. Spark Core handles large data sets such as the server logs. Spark Streaming handles the real-time messages stored in Kafka.
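The server log analysis boils down to filtering for warning and error entries and counting the most frequent ones. The sketch below shows that logic with plain Java streams as a single-machine stand-in for the Spark job; it is not Spark code. The class name LogCauseCounter, the method topCauses, and the "timestamp LEVEL message" line layout are assumptions made for the example, and the sample lines are made up.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class LogCauseCounter {

    /**
     * Keeps WARN and ERROR lines, groups them by level and message,
     * and returns the topN groups ordered by descending count.
     * Assumed line layout (hypothetical): "<timestamp> <LEVEL> <message>".
     */
    public static Map<String, Long> topCauses(List<String> lines, int topN) {
        Map<String, Long> counts = lines.stream()
                .map(line -> line.split(" ", 3))
                .filter(parts -> parts.length == 3)
                .filter(parts -> parts[1].equals("WARN") || parts[1].equals("ERROR"))
                .collect(Collectors.groupingBy(
                        parts -> parts[1] + " " + parts[2],
                        Collectors.counting()));

        return counts.entrySet().stream()
                // Highest count first; ties are broken alphabetically by key.
                .sorted((a, b) -> {
                    int byCount = Long.compare(b.getValue(), a.getValue());
                    return byCount != 0 ? byCount : a.getKey().compareTo(b.getKey());
                })
                .limit(topN)
                .collect(Collectors.toMap(
                        Map.Entry::getKey,
                        Map.Entry::getValue,
                        (a, b) -> a,
                        LinkedHashMap::new)); // preserves the sorted order
    }

    public static void main(String[] args) {
        // Made-up sample lines for illustration only.
        List<String> sample = Arrays.asList(
                "t1 INFO service started",
                "t2 ERROR disk full",
                "t3 WARN slow response",
                "t4 ERROR disk full");

        topCauses(sample, 2)
                .forEach((cause, count) -> System.out.println(count + "  " + cause));
    }
}
```

On a cluster, the same filter, group, and count steps would be expressed against a distributed data set instead of an in-memory list, so that the weekly log volume can be processed in parallel across the nodes.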
MongoDB. MongoDB is widely used in enterprise application development, as well as in big data analytics. It has a mature community and many available APIs, which enable developers to implement ad hoc use cases.
Data Visualization. The REST APIs provided by MongoDB and Spring Boot are consumed by Vue.js and ECharts in the front-end JavaScript layer. We are also planning to introduce Power BI to consume the data from MongoDB.
I’d like to thank the team for their contributions and passion. Everyone on the team has taken on multiple roles; they were busy with other projects and sacrificed their personal time for this effort. My colleague will also be writing more blog posts to explain some of the technical details behind this dashboard, covering Java, Continuous Integration, DevOps, and the frontend framework. Please stay tuned!