Kent Jiang

I am currently working at Perficient China GDC in Hangzhou as a Lead Technical Consultant. I have 8 years of experience in the IT industry across Java, CRM, and BI technologies. My areas of technical interest include business analytics, project planning, MDM, and quality assurance.

Blogs from this Author

How to Run an Existing Liferay Application in Docker Container

As an instrumental containerization tool in DevOps, Docker is widely applied in continuous integration (CI), continuous delivery (CD), and automated deployment. The Perficient China portal team has just completed dockerizing a Liferay application. This implementation makes our entire application/instance lightweight to deploy and lays the foundation […]
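A minimal sketch of what such a dockerization might look like, assuming a pre-built Liferay 6.2 Tomcat bundle in the build context; the paths, base image, and port here are illustrative assumptions, not the team's actual setup:

```dockerfile
# Hypothetical Dockerfile for an existing Liferay 6.2 + Tomcat bundle.
# Directory names and versions are assumptions for illustration.
FROM openjdk:8-jdk

# Copy the pre-built Liferay bundle into the image.
COPY liferay-portal-6.2 /opt/liferay

# Liferay's bundled Tomcat listens on 8080 by default.
EXPOSE 8080

WORKDIR /opt/liferay/tomcat/bin
CMD ["./catalina.sh", "run"]
```

With an image like this, the whole portal can be started with a single `docker run -p 8080:8080` command instead of a manual server install.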

Configure Authorization and Authentication in Liferay 6.2

Among the many portal technologies and tools, Liferay, written in Java, is a free and open-source enterprise software product that provides web content management and application management. Authorization and authentication are essential components, almost always needed in every type of enterprise system. In the past few weeks, we completed some security […]

Druid – A Data Store to Support History and Real-time Analytics

As part of Perficient's big data practice, I have been working on identifying open source data stores and search frameworks that enable users to quickly query what they need and to process massive event/message streams. In addition to frameworks such as Spark, ELK, Hadoop, HBase, and Cassandra, I got to know about […]

Story Points Estimation on a Data Warehouse Project

It has been decades since people started developing data warehousing (DW) systems. In fact, most DW delivery strategies follow the traditional waterfall cycle: discovery, requirements, design, development, test, training, transition, and so on. With this pattern, only when the previous step is completed will the team start on […]

How to Load Log Data into HDFS using Flume

Data acquisition is a very important part of building a big data ecosystem. It allows you to extract various types of data, such as files, databases, streams, and web pages. If you are just setting up a local environment, rather than working in a real business scenario, you can handle data acquisition by making use […]
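A minimal sketch of a Flume agent that tails a log file into HDFS; the agent name, file paths, and namenode address are assumptions for illustration, not a production configuration:

```properties
# Hypothetical Flume agent: tail a local log file and sink it to HDFS.
agent.sources = tailSrc
agent.channels = memChannel
agent.sinks = hdfsSink

# Source: follow an application log (path is an assumption).
agent.sources.tailSrc.type = exec
agent.sources.tailSrc.command = tail -F /var/log/app.log
agent.sources.tailSrc.channels = memChannel

# Channel: buffer events in memory between source and sink.
agent.channels.memChannel.type = memory
agent.channels.memChannel.capacity = 10000

# Sink: write events into date-partitioned HDFS directories.
agent.sinks.hdfsSink.type = hdfs
agent.sinks.hdfsSink.channel = memChannel
agent.sinks.hdfsSink.hdfs.path = hdfs://namenode:8020/logs/%Y-%m-%d
agent.sinks.hdfsSink.hdfs.fileType = DataStream
agent.sinks.hdfsSink.hdfs.useLocalTimeStamp = true
```

Such an agent would be started with `flume-ng agent -n agent -f <this-file>`; in a local setup the HDFS path could point at a single-node cluster.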

Machine Learning in Local with Microsoft CNTK Package

In July 2016, an international joint conference on artificial intelligence was held in New York City, where many experts and professors met to share their recent research and commercial use cases. Machine learning (ML), deep learning (DL), and natural language processing (NLP) were the hot topics on the agenda. There are some interesting […]

Fog Computing – Next Buzzwords?

There is no doubt that cloud computing has been the buzzword for several years and will continue to dominate the IT and business world for quite a long time. In cloud computing, computation resources, storage, algorithms, applications, and big data analytics are centralized, and their services are provided to the consumer just […]

Hangzhou Spark Meetup 2016

Last weekend there was a meetup in Hangzhou for the Spark community, and about 100 Spark users and committers attended. It was great to meet so many Spark developers, users, and data scientists, and to learn about recent Spark community updates, roadmaps, and real use cases. The event organizer delivered the first presentation […]

Continuous Integration in the Analytics Project

Many people know that Continuous Integration (CI) and Continuous Delivery (CD) are a major part of agile practice. In Java-related projects, there are lots of open source tools, such as Hudson, Continuum, and Jenkins, to support this automation process. However, if you are going to look for tools to support the […]

How to Load Oracle Data into SparkR Dataframe

From Spark 1.4 onward, there are various ways to load external data sources, such as RDBMS tables, JSON, Parquet, and Hive files, into SparkR. When we talk about SparkR, we need to know something about R. The local data frame is a popular concept and data structure in R […]

SparkR for Data Scientists

Although the title Data Scientist is not mentioned as often as other IT job titles, it has been around the IT world for a while and is becoming more important with the popularity of the Internet and eCommerce. What kind of skills should a data scientist have? It could be a long list, but I […]

A Spark Example to MapReduce System Log File

In some respects, the Spark engine is similar to Hadoop, because both do Map & Reduce over multiple nodes. The important concept in Spark is the RDD (Resilient Distributed Dataset), with which we can operate over arrays, datasets, and text files. This example gives you some ideas on how to do map/reduce […]
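The map/reduce idea behind RDD operations can be sketched without a Spark installation; this is a plain-Python illustration of counting words in log lines (the sample lines are invented for the example), not the post's actual Spark code:

```python
from functools import reduce

# Invented sample log lines standing in for an RDD of text lines.
log_lines = [
    "ERROR disk full",
    "INFO startup complete",
    "ERROR disk full",
]

# Map phase: turn each line into (word, 1) pairs,
# mirroring Spark's flatMap + map over an RDD.
mapped = [(word, 1) for line in log_lines for word in line.split()]

# Reduce phase: sum the counts per word,
# mirroring Spark's reduceByKey.
def merge(acc, pair):
    word, count = pair
    acc[word] = acc.get(word, 0) + count
    return acc

counts = reduce(merge, mapped, {})
print(counts["ERROR"])  # 2
```

In real Spark the same shape appears as `sc.textFile(...).flatMap(...).map(lambda w: (w, 1)).reduceByKey(add)`, with the work distributed across nodes rather than run in one process.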
