EVL Data Generation Microservice
EVL Data Generation Microservice provides a fast, automated, and cost-effective method for data generation. A proper test environment is a must in many areas, such as application development, ETL implementation, and stress testing. Often the tests can’t be run on existing production data, so data simulation is the only way to achieve the goal. The simulated data must follow real-life data patterns, and data volumes should be close to the expected peaks.
EVL Data Generation advantages:
- Configuration via Excel or CSV files pre-filled by reading the existing data structures
- Automatic random data generation based on data types
- Customization of data patterns based on filled-in parameters like min-max intervals, null-value probability, string ranges …
- Ability to include custom-made data generation functions
- Extremely fast generation of vast amounts of data by using low-level I/O techniques
- Parallel running of jobs and workflow monitoring
- Low implementation and operating costs
EVL Microservices are built on top of the core EVL software and retain its flexibility, robustness, high productivity, and ability to read data from various sources, including CSV files, databases (Oracle, Teradata, SQL Server, etc.), and Hadoop streaming data like Kafka and Flume.
EVL Generation white paper. Printable function guide and examples.
EVL Data Generation Functions
Data type | Null | Min | Max | Pattern                  | Description |
String    |      |     |     | Pattern like [a-zA-Z0-9] |             |
Date      |      |     |     | Pattern                  | Values between 1970-01-01 and 2199-12-31 |
Number    | 30%  | Min | Max |                          | Intervals for integers and decimals, and probability of null values |
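To make the three function types concrete, here is a minimal Python sketch of how such generators might behave. It is not EVL code: the function names (`expand_char_class`, `gen_string`, `gen_date`, `gen_number`) are hypothetical, the string length bounds are an assumed reading of Min and Max for strings, and only integers are covered for the Number type.

```python
import random
from datetime import date, timedelta


def expand_char_class(pattern):
    """Expand a simple character class such as "[a-zA-Z0-9]" into a list of characters."""
    body = pattern.strip()[1:-1]  # drop the surrounding brackets
    chars = []
    i = 0
    while i < len(body):
        # "a-z" style range: needs a character on both sides of the hyphen
        if i + 2 < len(body) and body[i + 1] == "-":
            chars.extend(chr(c) for c in range(ord(body[i]), ord(body[i + 2]) + 1))
            i += 3
        else:
            chars.append(body[i])  # literal character, e.g. "," or a trailing "-"
            i += 1
    return chars


def gen_string(rng, pattern, min_len, max_len):
    """Random string whose characters all come from the given character class."""
    alphabet = expand_char_class(pattern)
    length = rng.randint(min_len, max_len)
    return "".join(rng.choice(alphabet) for _ in range(length))


def gen_date(rng, start=date(1970, 1, 1), end=date(2199, 12, 31)):
    """Random date in the inclusive range quoted in the table above."""
    span_days = (end - start).days
    return start + timedelta(days=rng.randint(0, span_days))


def gen_number(rng, lo, hi, null_prob=0.0):
    """Random integer in [lo, hi], or None with probability null_prob."""
    if rng.random() < null_prob:
        return None
    return rng.randint(lo, hi)


if __name__ == "__main__":
    rng = random.Random(42)  # fixed seed so repeated runs give the same data
    print(gen_string(rng, "[a-zA-Z0-9]", 3, 8))
    print(gen_date(rng))
    print([gen_number(rng, 1, 5000, null_prob=0.3) for _ in range(5)])
```

Passing an explicit `random.Random` instance keeps the generated data reproducible, which is often useful when a test run has to be repeated.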
EVL Data Generation Project
A data generation project consists of the following steps:
- unzipping the EVL distribution and defining a few variables and paths
- filling in an Excel or CSV file defining the source type (e.g. csv, Oracle ODBC …), entity and attribute names, and the generation parameters to be applied
- automatic generation of EVL jobs for each entity
- running EVL jobs in a batch or individually
- monitoring and tuning
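The step from one configuration file to one job per entity can be pictured with a short Python sketch. This only illustrates the grouping idea and is not how EVL itself generates jobs: the CSV column names (`source`, `entity`, `attribute`, and so on), the sample content, and the `group_by_entity` helper are all assumptions made for the example.

```python
import csv
import io

# Hypothetical CSV layout; the real configuration file may use other column names.
SAMPLE_CONFIG = """\
source,entity,attribute,data_type,null,min,max,pattern,description
FILE,TEST1,ID,int,,1,5000,,"Setting number interval, no null allowed"
FILE,TEST1,ACC,int,20%,,,,20% can be null
ORCL,TEST2,ID,int,,1,300000,"100,200,300","List of values: 100, 200, 300"
"""


def group_by_entity(config_text):
    """Return {(source, entity): [attribute rows]}, keeping the order of the file."""
    jobs = {}
    for row in csv.DictReader(io.StringIO(config_text)):
        key = (row["source"], row["entity"])
        jobs.setdefault(key, []).append(row)
    return jobs


jobs = group_by_entity(SAMPLE_CONFIG)
for (source, entity), attrs in jobs.items():
    # One generation job per entity, listing the attributes it has to produce
    names = ", ".join(a["attribute"] for a in attrs)
    print(f"{source}/{entity}: {names}")
```

Each key in `jobs` corresponds to one entity, so a job generator would loop over the keys and emit one job definition for each.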
Example
Set variables:
# project directory
CONFIG_FILE_DIRECTORY=$HOME/Project/Generation/
# configuration file name
CONFIG_FILE=generation-config.csv
# jobs directory
CONFIG_EVD=evd/config.generation.evd
# global default parameters
CONFIG_GEN_DEAFULTS=val/config.evl.gen
Data generation definition file TEST
Source | Entity | Attribute | Data type | Null | Min | Max    | Pattern       | Description |
FILE   | TEST1  | ID        | int       |      | 1   | 5000   |               | Setting number interval, no null allowed |
FILE   | TEST1  | ACC       | int       | 20%  |     |        |               | 20% can be null |
FILE   | TEST1  | NOTE      | string    | 70%  |     |        | [a-zA-Z0-9,-] | Allowed characters |
FILE   | TEST1  | Sex       | string    |      | 1   | 1      | "M","F"       | List of values |
ORCL   | TEST2  | ID        | int       |      | 1   | 300000 | 100,200,300   | List of values: 100, 200, 300 |
ORCL   | TEST2  | Postcode  | Number    | 8%   | 5   | 5      |               | Postcode must be 5 digits, 8% can be null |
ORCL   | TEST2  | Text      | string    |      | 3   | 80     |               | Mandatory minimal text |
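The Null and Pattern cells in the definition above mix several notations: percentages, character classes, and lists of values. The Python sketch below shows one way such cells could be interpreted. It is an assumption about the semantics, not EVL's actual parser: `parse_null` and `parse_pattern` are hypothetical helpers, and an empty Null cell is assumed to mean that nulls are not allowed.

```python
def parse_null(cell):
    """Turn a Null cell such as '20%' into a probability (0.2).

    An empty cell is assumed to mean that null values are not allowed.
    """
    cell = cell.strip()
    if not cell:
        return 0.0
    return float(cell.rstrip("%")) / 100.0


def parse_pattern(cell):
    """Classify a Pattern cell as ('none', None), ('char_class', str) or ('values', list)."""
    cell = cell.strip()
    if not cell:
        return ("none", None)
    if cell.startswith("[") and cell.endswith("]"):
        return ("char_class", cell)  # e.g. [a-zA-Z0-9,-]
    # Otherwise treat the cell as a comma-separated list, with optional double quotes
    values = [v.strip().strip('"') for v in cell.split(",")]
    return ("values", values)


if __name__ == "__main__":
    for null_cell in ["", "20%", "70%", "8%"]:
        print(repr(null_cell), "->", parse_null(null_cell))
    for pattern_cell in ["", "[a-zA-Z0-9,-]", '"M","F"', "100,200,300"]:
        print(repr(pattern_cell), "->", parse_pattern(pattern_cell))
```

List values are kept as strings here; converting them to the attribute's data type (for example `int` for the TEST2 ID column) would be a separate step.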
Run:
# generating evl jobs from the config file
evl run/generate_jobs.evl
# running the test generation job
evl run/generation.test.evl