Because of the popularity of mobile devices and other reasons, more and more companies now store far more data than before. Big Internet companies like Google or Alibaba measure their data in petabytes (PB) or even exabytes (EB). How to store and fetch big data efficiently has become a very interesting topic. There are three mainstream database technologies:
1) SQL: Relational databases, which use SQL, have dominated the database market since the 1970s. They beat other technologies because of the following advantages:
(1) A high-level, declarative (non-procedural) query language: SQL.
(2) A uniform language for different roles such as developers, DBAs and users. SQL covers both data definition (DDL) and data manipulation (DML).
(3) A standard language across different RDBMS products.
Relational databases also have their own disadvantages:
They are difficult to scale. Once the data size exceeds some limit, performance deteriorates sharply and database sharding becomes unavoidable. Sharding is not a trivial job and can't be done transparently to the application.
2) NoSQL (Not Only SQL): This technology has become very popular lately. The main products include Google Bigtable, HBase, Cassandra, MongoDB, Couchbase, etc. Their biggest advantage is that they are schema-free and easy to scale and shard. The price is that they typically don't provide full ACID transaction guarantees. They are built as distributed systems, and according to the CAP theorem it is impossible for a distributed system to simultaneously provide Consistency, Availability and Partition tolerance. Many of them only provide eventual consistency, which means a read may briefly return stale data after a write. This could be a big problem under certain circumstances.
3) NewSQL: This is a newer technology that seeks to provide the same scalable performance as NoSQL systems while still maintaining ACID guarantees. It includes general-purpose databases like Google Spanner and NuoDB, and in-memory databases like VoltDB.
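To make the sharding drawback listed under SQL more concrete, here is a minimal sketch of application-level shard routing. It assumes a naive "hash modulo shard count" rule; the names `shard_for` and `moved_fraction` are hypothetical and not taken from any particular database. It only illustrates why sharding leaks into the application and why adding a shard is not a trivial job.

```python
import hashlib

def shard_for(key: str, num_shards: int) -> int:
    """Map a key to a shard index using a stable hash (hypothetical routing rule)."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_shards

def moved_fraction(keys, old_shards: int, new_shards: int) -> float:
    """Fraction of keys whose shard changes when the shard count changes."""
    moved = sum(
        1 for k in keys
        if shard_for(k, old_shards) != shard_for(k, new_shards)
    )
    return moved / len(keys)

# The application, not the database, has to know which shard holds a row.
keys = [f"user-{i}" for i in range(10_000)]
fraction = moved_fraction(keys, old_shards=4, new_shards=5)
print(f"{fraction:.0%} of keys change shard when going from 4 to 5 shards")
```

With this naive rule most keys land on a different shard after adding a single shard, so the data has to be migrated. Real systems often use techniques such as consistent hashing to reduce that movement, but the routing logic still has to live somewhere.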
NewSQL seems very promising because it has the advantages of both SQL and NoSQL. The problem is that most NewSQL databases are currently proprietary software, and some others, like VoltDB, only apply to specific scenarios. So how do you choose between SQL and NoSQL? It depends on the scenario. If you need strong consistency and ACID transactions, as bank systems do, SQL is still the best choice. By using indexes, SQL optimization and other techniques, a SQL database like Oracle can handle millions of records efficiently. If your data, such as JSON documents, can't conform to a uniform schema and you don't need strong transaction guarantees, NoSQL could be a better choice. It will be very interesting to see what happens with NewSQL.
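As a small illustration of what ACID means in the bank scenario, here is a sketch using Python's built-in `sqlite3` module with an in-memory database. The `accounts` table, the `transfer` helper and the sample balances are made up for this example. It shows atomicity: either both updates of a transfer are applied, or neither is.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    " name TEXT PRIMARY KEY,"
    " balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.executemany(
    "INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)]
)
conn.commit()

def transfer(conn, src: str, dst: str, amount: int) -> bool:
    """Move money between two accounts as one atomic transaction."""
    try:
        # "with conn" commits if the block succeeds and rolls back on error.
        with conn:
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
            # This fails the CHECK constraint if src would go negative,
            # and the credit above is rolled back together with it.
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        return False

ok = transfer(conn, "alice", "bob", 30)       # enough funds
failed = transfer(conn, "alice", "bob", 500)  # would overdraw alice
balances = dict(conn.execute("SELECT name, balance FROM accounts"))
print(ok, failed, balances)
```

The second transfer credits bob first and then fails on the debit, yet bob's balance is unchanged afterwards. An eventually consistent store without transactions would leave it to the application to detect and repair that half-finished state.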