Generating random data in ClickHouse

Tags: ClickHouse

Random functions

ClickHouse is a very high-performance OLAP database. Unlike traditional OLTP databases (such as MySQL or Oracle), it has no procedural stored functions, no stored procedures, and no loop statements, so random values have to be produced in a different way. This article walks through the options.

First, run /usr/bin/clickhouse-client --host localhost --port 9000 to enter command-line mode.

Integers

Type SELECT rand in the client and trigger completion (Tab in clickhouse-client); a list of suggestions appears.

There are 4 functions in total

  • rand
  • rand32
  • rand64
  • randConstant

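    Note that rand is pseudo-random. Within one query, identical calls such as rand() and rand() are treated as the same expression and return the same value in each row, so pass a different argument (rand(0), rand(1), ...) to each call that should produce its own value. The argument only distinguishes the calls; ClickHouse ignores its value.

    randConstant() returns the same value for every row of a single query.
    Test statement: SELECT rand(), rand32(), rand64(), randConstant() FROM numbers(10)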

Floating point

Since ClickHouse (in the version used here) does not provide a random floating-point function, one has to be simulated:

  1. Generate a random integer, then insert a decimal point at a random position
    WITH rand() AS value, length(toString(value)) AS len SELECT value / power(10, rand(0) % (len+1))
  • WITH <value> AS <variable>: placed at the front of the query to define a variable
  • length(<string>): returns the length of a string
  • toString(): converts a value to a string
  • FROM numbers(n): produces n rows, which serves as a substitute for a loop
  2. Divide one random integer by another and convert the result to a decimal: toDecimal64(rand(now()) / rand(0), 3)
    Syntax: toDecimal64(<floating-point number>, <number of decimal places>)
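
To make the first method easier to follow, here is a small Python sketch of the same arithmetic. It is only a model, not ClickHouse code: the function name random_float_by_point_shift is hypothetical, and Python's random module stands in for rand().

```python
import random


def random_float_by_point_shift(rng: random.Random) -> float:
    """Model of: value / power(10, rand(0) % (len + 1))."""
    value = rng.getrandbits(32)        # stands in for rand(), a 32-bit unsigned integer
    digits = len(str(value))           # stands in for length(toString(value))
    # The remainder lies in 0..digits, so the decimal point can land
    # anywhere from after the last digit to before the first digit.
    shift = rng.getrandbits(32) % (digits + 1)
    return value / 10 ** shift


if __name__ == "__main__":
    rng = random.Random()
    for _ in range(5):
        print(random_float_by_point_shift(rng))
```

A shift of 0 leaves the integer unchanged, and a shift equal to the number of digits yields a value below 1, so the results are spread over very different magnitudes rather than uniformly over one interval.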

String

Built-in functions

ClickHouse comes with 4 random string functions

  • randomFixedString
  • randomString
  • randomPrintableASCII
  • randomStringUTF8

    The following statement prints the output of these functions for 20 rows
    WITH 10 AS len SELECT randomFixedString(len), randomString(len), randomPrintableASCII(len), randomStringUTF8(len) FROM numbers(20)

    Because the command-line output is hard to align, view the formatted results in DBeaver. Every function except randomPrintableASCII() returns a lot of garbled, non-printable characters.

    Testing shows that length() returns the number of bytes a string occupies, not its number of characters. The statement below checks the minimum and maximum length of each kind of random string:
  • The first three functions return a string of n characters in which every character occupies exactly 1 byte
  • Each UTF-8 character occupies 1 to 4 bytes, so only randomStringUTF8 returns strings whose byte length varies
WITH 10 AS len 
SELECT 
	min(length(randomFixedString(len))),max(length(randomFixedString(len))),
	min(length(randomString(len))), max(length(randomString(len))), 
	min(length(randomPrintableASCII(len))), max(length(randomPrintableASCII(len))), 
	min(length(randomStringUTF8(len))), max(length(randomStringUTF8(len)))
FROM numbers(10000000)
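
The difference between byte length and character length can be seen outside ClickHouse as well. The following Python sketch is only an illustration of UTF-8 encoding; the helper name char_and_byte_length is hypothetical.

```python
def char_and_byte_length(text: str) -> tuple[int, int]:
    """Return (number of characters, number of UTF-8 bytes) for a string."""
    return len(text), len(text.encode("utf-8"))


if __name__ == "__main__":
    # ASCII letters take 1 byte each, accented Latin letters 2,
    # CJK characters 3, and most emoji 4.
    for sample in ["abc", "é", "数据", "😀"]:
        print(repr(sample), char_and_byte_length(sample))
```

ClickHouse's length() corresponds to the second number. To count characters rather than bytes, ClickHouse provides lengthUTF8().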

Custom methods

Because the output above is garbled and hard to read, you sometimes need custom methods.

  1. hex(int): converts an integer to a hexadecimal string, for example hex(rand())
  2. Base64-encode a random string, reverse it, and strip the "==" padding, which sits at the front after the reversal
    Note: unlike Java, positions in SQL start from 1, not 0

    substring(<string>, <start position>, <length>): the start position counts from 1 and may be negative; the length cannot be negative

    For example: substring(reverse(base64Encode(randomString(<string length>))), 3)
    Base64 output ends in exactly "==" only when the input length leaves a remainder of 1 when divided by 3, so choose the string length accordingly.
  3. Specify a character set and join random picks with concat
    Because there is no loop statement, you have to write one substring() term for every character you want
  • concat(str1, str2, str3, ...): concatenates the strings passed as arguments
  • arrayStringConcat([str1, str2, str3, ...], <separator>): joins the elements of a string array, with an optional custom separator
    WITH <value> AS <variable name>: placed at the front of the SQL statement to declare a variable
    The following statement picks a random character from the string six times and concatenates the picks
WITH 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz123456789' AS asc2,
length(asc2) AS asc2Len
SELECT concat(	
		substring(asc2,rand() % asc2Len + 1, 1),
		substring(asc2,rand(null) % asc2Len + 1, 1),
		substring(asc2,rand(0) % asc2Len + 1, 1),
		substring(asc2,rand(1) % asc2Len + 1, 1),
		substring(asc2,rand(2) % asc2Len + 1, 1),
		substring(asc2,rand(now()) % asc2Len + 1, 1)
) AS randStr FROM numbers(10)
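
Methods 2 and 3 can be modelled in a few lines of Python. This is a sketch under stated assumptions rather than a translation of the SQL: the function names are hypothetical, os.urandom stands in for randomString(), and the padding is removed with lstrip("=") instead of a fixed substring(..., 3) so that it works for any input length.

```python
import base64
import os
import random

ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz123456789"


def base64_random_string(n: int) -> str:
    """Model of method 2: base64-encode n random bytes, reverse, drop padding."""
    encoded = base64.b64encode(os.urandom(n)).decode("ascii")
    # After reversing, any '=' padding is at the front of the string.
    return encoded[::-1].lstrip("=")


def pick_from_alphabet(length: int, rng: random.Random) -> str:
    """Model of method 3: pick `length` random characters from ALPHABET."""
    # SQL uses rand() % asc2Len + 1 because substring() positions start at 1;
    # Python indexes from 0, so the "+ 1" is not needed here.
    return "".join(
        ALPHABET[rng.getrandbits(32) % len(ALPHABET)] for _ in range(length)
    )


if __name__ == "__main__":
    print(base64_random_string(10))
    print(pick_from_alphabet(6, random.Random()))
```

The Python version can loop over the desired length, which is exactly what the SQL statement has to spell out by hand with one substring() term per character.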

UUID

generateUUIDv4() generates a random UUID in the 8-4-4-4-12 format

Enum

An enum column is defined as <column name> Enum8('<key1>' = 0, '<key2>' = 1, ...)
An enum column can be assigned either by its key or by its numeric value

Array

  1. Random length
    range(<length>) returns the array [0, 1, ..., <length> - 1], so the length of the array equals its argument

    range(rand() % <maximum length>) gives an array of random length.
    Note: a % b is a remainder, so its largest possible value is b-1
  2. Random ascending or descending order
    In these expressions ClickHouse treats 0 and NULL as false and every non-zero value as true

    The first argument of arraySort is an anonymous (lambda) function
    SELECT arraySort((x) -> if(rand()>rand(0),1,-1)*x, [3,2,5,1,6,4,9,8,0]) FROM numbers(10)
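
Both array tricks can be modelled in Python to check the reasoning. This is a sketch, not ClickHouse code: the function names are hypothetical, and it assumes the sort direction is chosen once per array, which may differ from how ClickHouse evaluates rand() inside the lambda.

```python
import random


def random_length_range(max_len: int, rng: random.Random) -> list[int]:
    """Model of range(rand() % max_len)."""
    # The remainder is in 0..max_len-1, so the array never reaches max_len elements.
    return list(range(rng.getrandbits(32) % max_len))


def random_direction_sort(values: list[int], rng: random.Random) -> list[int]:
    """Model of arraySort(x -> if(rand() > rand(0), 1, -1) * x, values)."""
    # Multiplying the sort key by -1 turns an ascending sort into a descending one.
    sign = 1 if rng.getrandbits(32) > rng.getrandbits(32) else -1
    return sorted(values, key=lambda x: sign * x)


if __name__ == "__main__":
    rng = random.Random()
    print(random_length_range(5, rng))
    print(random_direction_sort([3, 2, 5, 1, 6, 4, 9, 8, 0], rng))
```

If arrays of up to n elements are wanted, the modulus has to be n + 1, because the remainder never reaches the divisor.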

Table engines

Temporary table with the Null engine

Usually used for testing: the null table function creates a temporary table with the Null engine, the statement operates on it, and the table is dropped automatically once the operation completes

Syntax: null('<table structure>')

INSERT INTO function null('x UInt64') SELECT * FROM numbers_mt(1000000000);

The statement above is equivalent to the following three statements

CREATE TABLE t (x UInt64) ENGINE = Null;
INSERT INTO t SELECT * FROM numbers_mt(1000000000);
DROP TABLE IF EXISTS t;

Test statements

INSERT INTO function null('x UInt64') SELECT * FROM numbers_mt(100000000);
INSERT INTO function null('x UInt64') SELECT power(number, 2) FROM numbers_mt(100000000);

Random data generation table engine

Official documentation: https://clickhouse.com/docs/en/sql-reference/table-functions/generate/

Syntax: ENGINE = GenerateRandom(random_seed, max_string_length, max_array_length)

  • random_seed: computer-generated random numbers are pseudo-random, so a seed is needed to fix the starting point. If you pass NULL, a random seed is used
  • max_string_length: maximum length of the generated random strings (inclusive). Default
  • max_array_length: maximum length of the generated random arrays (inclusive). Default

The engine does not support distributed use, and it does not support the following features

  • ALTER
  • SELECT … SAMPLE
  • INSERT
  • Indices
  • Replication

Create a "random data generation table"

CREATE DATABASE test_rand;
DROP TABLE IF EXISTS test_rand.generate_engine_table;
CREATE TABLE IF NOT EXISTS test_rand.generate_engine_table (id UInt32,
	name String,
	addr String,
	intValue UInt32,
	floatValue Float32,
	arr Array(UInt8),
	mydate DateTime
) ENGINE = GenerateRandom(null, 5, 3);


This is a special table that can only be queried. Its data is generated on the fly and returned when you query it, so every query returns different data. You must limit the number of rows with LIMIT, otherwise the query runs forever

For example: SELECT * FROM test_rand.generate_engine_table LIMIT 20

To load generated rows into a regular table (here test5.testMergeTree, which is assumed to exist already):

INSERT INTO test5.testMergeTree(id, name, value, date) SELECT id, name, id, now() FROM test_rand.generate_engine_table LIMIT 1000000000

SSB performance test

Introduction to SSB (Star Schema Benchmark):
https://www.cs.umb.edu/~poneil/StarSchemaB.PDF

Official documentation: https://clickhouse.com/docs/en/getting-started/example-datasets/star-schema/

On a freshly installed system, many commands are often reported as missing

  • If git is reported as missing, install it with yum install git
  • If make: command not found appears, install make with yum install -y gcc gcc-c++ automake autoconf libtool make

GitHub address of the SSB-DBGEN test tool: https://github.com/vadimtk/ssb-dbgen

Download and compile the test tool

git clone https://github.com/vadimtk/ssb-dbgen.git
cd ssb-dbgen
make

This produces the dbgen executable in the current directory

dbgen generates the data used by the TPC-H style test
TPC-H: a benchmark for measuring database query performance

Command                  Result
./dbgen -s 1 -T c        customer.tbl
./dbgen -s 1 -T p        part.tbl
./dbgen -s 1 -T s        supplier.tbl
./dbgen -s 1 -T d        date.tbl
./dbgen -s 1 -T l        lineorder.tbl
./dbgen -s 1 -T a        Generates all tables at once

The scale factor is abbreviated sf; sf 1 corresponds to roughly 1 GB of data.

[root@localhost dbgen]# ./dbgen -s 1   # generates 1 GB of data (-s 1 == 1 GB)

The drawback is that generating a very large data set this way can take a whole day, so I run several dbgen processes in parallel instead.

The method is as follows:

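#!/bin/sh
# Run eight dbgen processes in parallel; each -S value builds one chunk of the -C 16 split.
# With -C 16, steps 9 to 16 remain to be generated the same way.
./dbgen -vfF -s 1000 -S 1 -C 16 &
./dbgen -vfF -s 1000 -S 2 -C 16 &
./dbgen -vfF -s 1000 -S 3 -C 16 &
./dbgen -vfF -s 1000 -S 4 -C 16 &
./dbgen -vfF -s 1000 -S 5 -C 16 &
./dbgen -vfF -s 1000 -S 6 -C 16 &
./dbgen -vfF -s 1000 -S 7 -C 16 &
./dbgen -vfF -s 1000 -S 8 -C 16 &
# Wait for all background processes to finish.
wait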

Parameter details:

-v  verbose output
-s  scale factor: the amount of data to generate, in GB
-S  which step (chunk) of the split data to generate
-f  overwrite previously generated files

dss.ddl stores the table creation statements

Run cat dss.ddl to view them, then execute the CREATE TABLE statements one by one

CentOS 7 time zone settings

timedatectl # Show the current system time settings

timedatectl list-timezones # List the available time zones

timedatectl set-local-rtc 1 # Keep the hardware clock in local time; 0 keeps it in UTC
timedatectl set-timezone Asia/Shanghai # Set the system time zone to Shanghai
Regardless of the distribution, changing the time zone at the low level is simpler than expected:

cp /usr/share/zoneinfo/Asia/Shanghai /etc/localtime
