Saturday, October 6, 2012

Big Data and DSL (Domain-Specific Language)

Domain-specific languages (DSLs) have been around for a long time; SQL for relational databases is a great example. A DSL differs from a general-purpose language such as C, Java, or Python in that it is geared towards a specific domain.

In our space, log management, a DSL serves many purposes. Log files, especially multi-structured log files, contain very rich information, not only at the system level but also at the business and feature level. This information is “logged” not in one file but spread across many files of many file types. A simple search solves the specific problems that IT is interested in, but its usefulness on log files stops there. Extracting the higher value of business intelligence from log files requires a DSL. Let's see what the benefits of a DSL are.

  1. Describes arbitrary text layout for automated parsing

  2. Describes the context or meaning of the text in order to express more than that which is explicitly stated

  3. Automatically creates an efficient structured schema

  4. Defines semantics for search and browse of log files

  5. Defines application-specific tags, for example “trend-able” attributes, status, and configuration

Having this rich definition allows a wide range of applications to be built on top of log files. For example, knowing which attributes are status, configuration, and trends allows an app to treat them differently and use them appropriately.
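To make the idea concrete, here is a minimal sketch in Python of a declarative layout description that drives parsing, schema creation, and tagging. It is not SPL™ syntax and says nothing about how SPL™ is implemented. The log line format, the field names, the tag names, and the helper functions (compile_layout, schema, parse, by_tag) are all hypothetical, invented for illustration only.

```python
import re
from dataclasses import dataclass

# Hypothetical tag names, chosen for this sketch only.
STATUS, CONFIG, TREND = "status", "config", "trend"


@dataclass(frozen=True)
class Field:
    name: str     # attribute name as it appears in the log line
    pattern: str  # regex fragment that matches the attribute's value
    cast: type    # Python type used for the structured value
    tag: str      # application-specific tag


# Declarative description of one made-up log line layout, e.g.
#   "ap=AP-12 fw=6.1.3 state=up clients=42"
LAYOUT = [
    Field("ap", r"\S+", str, CONFIG),
    Field("fw", r"\S+", str, CONFIG),
    Field("state", r"up|down", str, STATUS),
    Field("clients", r"\d+", int, TREND),
]


def compile_layout(layout):
    """Turn the layout description into one regex with named groups."""
    parts = [rf"{f.name}=(?P<{f.name}>{f.pattern})" for f in layout]
    return re.compile(r"\s+".join(parts))


def schema(layout):
    """Derive a structured schema (column name -> type name) from the layout."""
    return {f.name: f.cast.__name__ for f in layout}


def parse(line, layout, regex):
    """Parse one log line into a typed record, or None if it does not match."""
    m = regex.search(line)
    if m is None:
        return None
    return {f.name: f.cast(m.group(f.name)) for f in layout}


def by_tag(record, layout, tag):
    """Select the attributes of a record that carry a given tag."""
    return {f.name: record[f.name] for f in layout if f.tag == tag}


REGEX = compile_layout(LAYOUT)
record = parse("ap=AP-12 fw=6.1.3 state=up clients=42", LAYOUT, REGEX)
print(schema(LAYOUT))
print(by_tag(record, LAYOUT, TREND))
```

Because the tags live in the layout description rather than in application code, an app could chart everything tagged as a trend or diff everything tagged as configuration without knowing the individual attribute names.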

SPL™ is a DSL for machine data. It enables companies like Aruba and IBM to mine their logs and lets a wide range of people inside the enterprise, from support and services to sales and product management, leverage this data.

More in upcoming posts on how SPL™ enables rapid development of Big Data applications for enterprise and IT.