Wednesday, 15 August 2012

Celebrating 66th Independence Day


Happy Independence Day


Wishing all Indians a happy 66th Independence Day. A moment of celebration and pride, a moment to aim high, and a moment to wish for lasting independence from all darks and greys.


Jai Hind


Monday, 13 August 2012

NoSQL !! Not Only SQL

 

It was only when I started hearing murmurs of Big Data in database circles that I first heard of NoSQL databases. Almost all the Web 2.0 companies and big guns in the industry are turning their radar towards RDBMS alternatives, and NoSQL is an unplanned product of that research and exploration: the work started out looking for one thing and ended up finding something else in the form of NoSQL. I am not writing in a critical tone; if anything, I want to defend the capabilities of the RDBMS, which I believe will not be overshadowed by the scale-out features of NoSQL.

I thought it might be of help to the community to pen down my findings on the topic. They are not complete, and the exploration is still very much alive.

NoSQL: The Concept

The NoSQL database is a new infant in the database world. The concept took shape in 2009 as the outcome of brainstorming in the area of high-volume storage. NoSQL provides a database model which complies neither with the relational model nor with the ACID properties of traditional database transactions. By name it appears to be a counter to the SQL language, but that is a myth. SQL and NoSQL can coexist in a system without being tied to each other. NoSQL stands for Not Only SQL.

Because it gives up these basic database features, NoSQL cannot be called a database in the truest sense; it is closer to a data store or repository whose model is largely need oriented. To date, there are more than 160 NoSQL databases available in the market. Major users of the NoSQL database model include Facebook, LinkedIn, Twitter, Google and Amazon.

NoSQL offers a flexible database model which can be accessed and monitored from the middle tier. It has no specific language of its own (unless UnQL comes in). One of the best-known models is the key-value pair model. Other models are document-centric, graph, tabular, column-oriented and object databases.
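As a rough illustration of the key-value pair model mentioned above, here is a minimal in-memory sketch in Python. The class name `KeyValueStore`, its methods and the sample keys are made up for illustration; they are not the API of any particular NoSQL product.

```python
from typing import Any, Dict, Optional


class KeyValueStore:
    """Toy in-memory key-value store (hypothetical, for illustration only)."""

    def __init__(self) -> None:
        self._data: Dict[str, Any] = {}

    def put(self, key: str, value: Any) -> None:
        # The value is opaque to the store: no schema, no column definitions.
        self._data[key] = value

    def get(self, key: str, default: Optional[Any] = None) -> Any:
        # Lookup is by key only; there is no query language and no join.
        return self._data.get(key, default)

    def delete(self, key: str) -> bool:
        # Returns True if the key existed and was removed, False otherwise.
        if key in self._data:
            del self._data[key]
            return True
        return False


store = KeyValueStore()
store.put("user:1001", {"name": "Asha", "city": "Pune"})
store.put("user:1002", {"name": "Ravi", "followers": 250})  # different shape, same store
```

Note that the two values have different fields; nothing in the store objects to that, which is the "flexible model" part of the story.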

The Need

The basic idea behind the NoSQL evolution is to design a distributed data store for large-scale data flow. The Web 2.0 platform introduced new patterns of data access: web data is no longer read-only, and readers are allowed to interact with it. As a result, web data generates huge traffic and the data size increases steeply. This exponential growth of industrial data (mostly from social media and search engines) requires massive scalability, low latency and data on demand in a simplified database model.

The relational database worked well with traditional information storage philosophies but struggles to keep pace with the revolutionary growth of data in current times. In addition, a relational database system is typically a non-distributed, vertically scalable, schema-oriented and often licensed platform for data management activities. NoSQL, on the other hand, is typically an open source, non-relational, distributed and horizontally scalable database system which can withstand high volumes of mixed data with low latency and high availability.
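To make "horizontally scalable" a little more concrete, here is a small sketch of how keys could be spread across server nodes by hashing. It assumes simple modulo partitioning, and the node names and function names are hypothetical; it is one possible scheme, not how any specific NoSQL database does it.

```python
import hashlib
from typing import Dict, List


def node_for_key(key: str, nodes: List[str]) -> str:
    """Pick a node for a key by hashing it (simple modulo partitioning)."""
    digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
    return nodes[int(digest, 16) % len(nodes)]


def distribute(keys: List[str], nodes: List[str]) -> Dict[str, List[str]]:
    """Group keys by the node that would store them."""
    placement: Dict[str, List[str]] = {node: [] for node in nodes}
    for key in keys:
        placement[node_for_key(key, nodes)].append(key)
    return placement


keys = [f"user:{i}" for i in range(1000)]
three_nodes = distribute(keys, ["node-a", "node-b", "node-c"])
# Scaling out: add a fourth node and the same keys spread over more servers.
four_nodes = distribute(keys, ["node-a", "node-b", "node-c", "node-d"])
```

With plain modulo partitioning, adding a node moves many keys to a different server. Schemes such as consistent hashing are commonly used to reduce that reshuffling.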

NoSQL Features

The major accomplishments of NoSQL database are

  • Distributed architecture allows replication mechanisms to be implemented to ensure consistent and uninterrupted data flow

  • Horizontal scalability ensures that new server nodes can be added, if required, to enhance performance and efficiency. Note that a conventional RDBMS is usually scaled vertically and is much harder to scale horizontally.

  • Not schema dependent and non-relational. The data storage paradigm is flexible and left to the developer. In addition, there are no tables, constraints, joins or relations to deal with. It behaves purely as a data store.

  • Compliance with BASE (Basically Available, Soft state, Eventually consistent). The BASE model is mostly about data availability and the consistency of replicas. “Basically available” implies that the data must always be available, even if only partially, and progressively more so after a transaction. Data consistency in NoSQL is not as stringent as in an RDBMS, i.e. the data remains in a soft state: it may or may not be visible everywhere as soon as the transaction is over, and a small amount of latency is allowed. Thus the data is eventually consistent, not instantaneously consistent. This degree of consistency is well suited to social media but not to the finance sector.

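  • Trades off the guarantees of the CAP theorem. The CAP theorem states that a distributed data store can guarantee at most two of Consistency, Availability and Partition tolerance at the same time. NoSQL systems keep Partition tolerance and pair it with either Consistency or, in line with the BASE model, Availability. Note that a conventional RDBMS complies with Consistency and Availability.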

  • Ability to store and retrieve huge amounts of structured or semi-structured data

  • Not much technical expertise required

  • Less maintenance and administration overheads
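The "eventually consistent" behaviour described in the BASE point above can be sketched with a toy primary and replica. This is only an illustration under simplifying assumptions (a single replica, replication triggered by hand); the `ReplicatedStore` class and its method names are hypothetical.

```python
from collections import deque
from typing import Any, Deque, Dict, Optional, Tuple


class ReplicatedStore:
    """Toy primary/replica pair with asynchronous replication (hypothetical)."""

    def __init__(self) -> None:
        self.primary: Dict[str, Any] = {}
        self.replica: Dict[str, Any] = {}
        # Writes accepted by the primary but not yet shipped to the replica.
        self._pending: Deque[Tuple[str, Any]] = deque()

    def write(self, key: str, value: Any) -> None:
        # The write is acknowledged as soon as the primary has it.
        self.primary[key] = value
        self._pending.append((key, value))

    def read_from_replica(self, key: str) -> Optional[Any]:
        # May return stale or missing data until replication catches up.
        return self.replica.get(key)

    def replicate(self) -> None:
        # Apply queued writes in order; afterwards both copies agree.
        while self._pending:
            key, value = self._pending.popleft()
            self.replica[key] = value


store = ReplicatedStore()
store.write("status:1001", "Happy Independence Day")
before = store.read_from_replica("status:1001")  # replica has not caught up yet
store.replicate()
after = store.read_from_replica("status:1001")   # replica now agrees with primary
```

A reader hitting the replica between the write and the replication step sees old data for a short while. That is tolerable for a status update, which is why this trade-off suits social media better than a bank ledger.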


As a newcomer to the industry, the NoSQL database received a mixed response. Database users who were loyal to the relational model rejected its need-oriented approach. On the other hand, people who were facing capacity issues with RDBMS readily adopted it. It was in 2009 that NoSQL started competing with RDBMS.

NoSQL categories

  • Key-value stores - based on Amazon's Dynamo

  • Column family stores

  • Document database - inspired by Lotus Notes. Mainly for document centric and semi structured data

  • Graph database
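To give a feel for how the four categories differ, the sketch below expresses roughly the same made-up user record in the shape each category favours, using plain Python structures. The record, field names and layout are illustrative assumptions, not the storage format of any real product.

```python
import json

# Key-value: an opaque value looked up by a single key.
key_value = {"user:1001": json.dumps({"name": "Asha", "city": "Pune"})}

# Column family: row key -> column family -> column -> value.
column_family = {
    "user:1001": {
        "profile": {"name": "Asha", "city": "Pune"},
        "stats": {"followers": 250},
    }
}

# Document: a self-describing, nested document; fields can vary per document.
document = {
    "_id": "user:1001",
    "name": "Asha",
    "city": "Pune",
    "interests": ["databases", "cricket"],
}

# Graph: nodes plus labelled edges between them.
graph = {
    "nodes": {"user:1001": {"name": "Asha"}, "user:1002": {"name": "Ravi"}},
    "edges": [("user:1001", "FOLLOWS", "user:1002")],
}
```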


NoSQL examples

Here are some pioneer and famous NoSQL databases used by the companies dealing with huge volumes of data.

I shall be back with more findings as they find me.

 

 


 

Monday, 6 August 2012

Oracle Engineered Systems: Hardware and Software Engineered to Work Together

Oracle Engineered Systems are highly efficient integrated systems which combine hardware and software to provide a complete enterprise solution to customers and partners. Their focus is to deliver extreme performance, high scalability and maximum availability while reducing infrastructure complexity and setup cost. These pre-assembled systems greatly simplify the requirements of a data center. Among the most complete enterprise systems of current times, Oracle Engineered Systems aim to cover the full range of requirements. Each can be understood as a packaged box comprising application support, operating system, virtualization, hardware management, networking support and an optimized storage scheme, all assembled to offer a highly efficient and scalable solution to customers.

The advantages drawn from Oracle Engineered Systems are listed below
1. Integration of hardware and software components
2. Enhanced performance, claimed to be up to 10x faster than a normal database
3. Low risk during installation and upgrades; high security
4. Accelerated deployment
5. Reduced complexity, IT cost and TCO (Total Cost of Ownership)
6. Single vendor for purchase, deployment and support

Oracle Engineered Systems can be classified under the six solutions below, which we shall discuss briefly

1. Exadata
2. Exalogic
3. Exalytics
4. Oracle Database Appliance
5. Oracle Big Data Appliance
6. SPARC SuperCluster

1. Exadata
Exadata is one of the fastest database machines, and it works for both OLTP and data warehousing applications. It is a packaged integration of hardware and software comprising servers (Oracle 11g database servers), storage (Exadata Storage Server), networking (InfiniBand) and virtualization. Exadata machines are claimed to store up to 10 times more data and to deliver 10x-50x better execution performance. The Exadata machine runs the latest database version, i.e. Oracle 11g Release 2.

Currently, there are two versions of Oracle Exadata, namely X2-2 and X2-8. The X2-2 is the smaller version, with 2 to 8 twelve-core database servers. The X2-8 is best suited for huge requirements, with 2 eighty-core database servers. Depending upon the database size and performance requirements, these versions can be deployed in quarter rack, half rack and full rack configurations. Lower configurations can be upgraded to the next level with zero downtime, which makes it a scalable solution.

Key features of Exadata which contribute to the extreme performance are smart scan, hybrid columnar compression, smart flash cache, intelligent I/O resource management, smart flash logging, and storage indexes.

Oracle encourages its partners and customers (ISVs) to get hands-on with Exadata and Exalogic through Oracle’s Exastack program. OPN members can use Oracle resources to achieve Oracle Exastack Ready or Oracle Exastack Optimized status. Oracle Exastack Ready status recognizes an OPN member on the basis of their demonstrated support for Oracle products. A Gold partner with Oracle Exastack Optimized status has full access to technical resources and lab environments from Oracle.

2. Exalogic
Exalogic is the high-performance engineered system specifically designed for running Oracle Fusion Middleware, Oracle’s Fusion applications and Java-based applications. Besides enterprise applications, Exalogic works equally well for Linux or Solaris based applications. For Java-based applications hosted on Exalogic, performance is claimed to improve by up to 10x with 5x more active users. Oracle applications run 4x faster than on normal servers, with 3x more active users. Key software engineered with the Exalogic hardware includes WebLogic Server, Coherence, JRockit and HotSpot, Exalogic Elastic Cloud software, Oracle Linux, and Enterprise Manager for cloud monitoring and control. Exalogic is available in quarter rack, half rack, full rack and even multi-rack (2-8) versions. Upgrades are possible from a lower configuration to a higher one with zero downtime and negligible maintenance issues.

An Exalogic unit is configured with cloud capability too. The Exalogic Elastic Cloud can host an application on a secure private cloud with extreme performance and simple management. All types of applications, from small scale to large scale such as mainframe applications, can be based on Exalogic. The cloud capability associated with Exalogic contributes to enhanced application capacity and performance, reduced latency, and intensive database communication.

Oracle encourages its partners to take up the Exalogic EX-CITE program. The program aims to demonstrate the efficiency and effectiveness of Exalogic for the business prospects of customers and partners.

3. Exalytics
After the database and application engineered systems, Exalytics is the engineered system which focuses on Business Intelligence applications. The Exalytics machine enables speedy analysis of data using an in-memory processing engine (In-Memory Parallel Analytics).

The Exalytics architecture includes the BI Foundation Suite (OBIEE), In-Memory Parallel Essbase and the In-Memory Parallel TimesTen database for Exalytics, along with network components (InfiniBand). Oracle TimesTen is a relational in-memory database where tables are cached in memory under cache groups. Its existing capabilities have been enhanced for analytic processing by supporting columnar compression. Oracle Essbase is an OLAP server for analytic applications.

BI query reporting is claimed to become 18x faster when Exalytics works with an Oracle database. The combination of Exalytics and Exadata makes BI query reporting 23x faster.

The Oracle Exalytics machine is powered by four Intel Xeon E7-4800 processors, each providing 10 cores for computation.

4. Oracle Database Appliance (ODA)
The Oracle Database Appliance is the engineered system which serves lower-capacity database needs for OLTP and data warehousing applications. It is a shorter format (quarter rack) of the Exadata machine, which is expandable and promises higher capacity. In contrast to Exadata, ODA is affordable and offers easy implementation rather than a skilled and risky deployment.

Oracle Database Appliance comes as a 4 rack unit system (2 server nodes and 12TB storage capacity) running Oracle Linux with an 11gR2 RAC-supported database. A complete ODA system is engineered with Oracle Linux, Oracle 11gR2 database (Enterprise Edition), RAC, Grid Infrastructure, Enterprise Manager, Oracle Automatic Service Request and Appliance Manager. Automatic Service Request is an intelligent facility which can detect hardware failures and raise the corresponding service or replacement requests. The Appliance Manager is a self-sufficient tool which is started at the deployment stage for assembly, installation and configuration tasks. In the later stages of maintenance and support, the Appliance Manager can apply patches or report a fix for troubleshooting (if any).

It is best suited for non-expandable systems and lower-capacity customers. ODA is easy to implement and affordable, and it ensures high performance and serviceability.

5. Oracle Big Data Appliance (OBDA)
Oracle Big Data Appliance is the engineered system from Oracle to handle the growth of large-scale enterprise data across varied sections of the industry. The term Big Data refers to the techniques for handling large enterprise data, both structured and unstructured, which grows at an exponential rate, like web data from Twitter, LinkedIn, mapping sites etc. In a single rack, the Big Data Appliance has 216 CPU processing cores and 648TB of raw storage. Starting with a single rack, the appliance can be scaled up to eight racks.
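As a quick back-of-the-envelope check on those figures, the sketch below computes total cores and raw storage for a given number of racks. It assumes capacity grows linearly with each added rack, which is my simplification and not a published sizing rule; the function name is made up.

```python
CORES_PER_RACK = 216     # from the figures quoted above
RAW_TB_PER_RACK = 648    # from the figures quoted above
MAX_RACKS = 8            # from the figures quoted above


def capacity(racks: int) -> dict:
    """Total cores and raw storage, assuming linear scaling per rack."""
    if not 1 <= racks <= MAX_RACKS:
        raise ValueError(f"racks must be between 1 and {MAX_RACKS}")
    return {
        "racks": racks,
        "cores": racks * CORES_PER_RACK,
        "raw_tb": racks * RAW_TB_PER_RACK,
    }


full = capacity(MAX_RACKS)  # 8 racks: 1728 cores and 5184 TB raw
```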

The Big Data Appliance runs on Oracle Linux and Oracle JVM. Cloudera's distribution of Apache Hadoop is used for distributed storage and processing, while the NoSQL database (Oracle Berkeley DB) stores the data sets as key-value pairs. Using the MapReduce framework, the available data is organized and loaded into the Oracle Exadata database machine. Key components which operate in this stage are Oracle Loader and Oracle Data Integrator. Once the organized data is loaded, it is ready for analysis and business decision making. The analysis is done in the Oracle Exalytics In-Memory machine. The statistical environment R is also used for advanced analytics at the decision stage. All the major hardware components, i.e. Big Data Appliance, Exadata and Exalytics, share InfiniBand connectivity so as to boost the network speed. In addition, the Big Data connectors do the load balancing between the Big Data Appliance and the Oracle Exadata machine.

The large data is diversified, organized and then analyzed. The Big Data platform operates in three stages namely, Acquire, Organize and Analyze. The infrastructure required for the Big Data platform can be divided as per the three stages.

Acquire: The stage where all available data is pulled and kept. Important components in this stage are Hadoop Distributed File System, and NoSQL Database.

Hadoop is an open source Apache framework, originally created by Doug Cutting (who later joined Cloudera), for storing and processing very large volumes of data. It takes large batch jobs, breaks them into smaller tasks that run in parallel, and spreads the data across the nodes of its distributed file system (HDFS). The Cloudera Manager tool is used to manage Hadoop.

Organize: Mapping, reducing and organization of data. Components at this stage are Oracle Exadata database machine, Oracle loader, Oracle Data Integrator, and Hadoop Map Reduce framework.

Analyze: The analysis and decision-making stage. Oracle Exalytics does the job in this stage.

Decide and Visualize: Advanced analytics using R statistical environment
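The map and reduce steps named in the Organize stage can be sketched in a few lines of Python. This is a single-process word count, the usual teaching example, written with made-up function names; a real Hadoop job distributes the same phases across many nodes and is not written this way.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(record: str) -> Iterator[Tuple[str, int]]:
    """Map: emit a (word, 1) pair for every word in one input record."""
    for word in record.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle: group all emitted values by key."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce: collapse the values for one key into a single result."""
    return key, sum(values)


def run_job(records: List[str]) -> Dict[str, int]:
    """Run map, shuffle and reduce over all records in one process."""
    pairs = (pair for record in records for pair in map_phase(record))
    grouped = shuffle(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


counts = run_job(["big data big racks", "data on demand"])
```

Because each record is mapped independently and each key is reduced independently, both phases can be split across machines, which is what makes the approach fit a multi-rack appliance.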

6. SPARC SuperCluster
The SPARC SuperCluster is the engineered system from Oracle to fit the general-purpose requirements of customers. It runs all sorts of workloads. It integrates high-performance components like SPARC T4 compute nodes, Exadata storage cells, Exalogic Elastic Cloud, ZFS Storage Appliance, Solaris 11 and Enterprise Manager. The components share InfiniBand connectivity.