ClickHouse is a very high-performance OLAP database. Unlike traditional OLTP databases (such as MySQL or Oracle), it has no stored functions, no stored procedures, and no loop statements, so the way to create random values is also different. This post walks through how to do it.
First, run /usr/bin/clickhouse-client --host localhost --port 9000 to enter command-line mode.
After typing SELECT rand, press Enter and a prompt appears.
There are 4 random number functions in total: rand(), rand32(), rand64(), and randConstant().


randConstant() returns the same value for every row within a single SQL statement.
SELECT rand(), rand32(), rand64(), randConstant() FROM numbers(10)
Since ClickHouse does not provide a random floating-point function, one can only be simulated in other ways:
WITH rand() AS value, length(toString(value)) AS len SELECT value / power(10, rand(0) % (len+1))
WITH <value> AS <variable> is placed at the front of the query to define a variable.
length(<string>) is used to get the string length.
toString() converts an object to a string.
FROM numbers(n) can be used for looping.
toDecimal64(rand(now()) / rand(0), 3)
toDecimal64(<floating-point number>, <number of decimal places to keep>)
ClickHouse comes with 4 built-in random string functions:

WITH 10 AS len SELECT randomFixedString(len), randomString(len), randomPrintableASCII(len), randomStringUTF8(len) FROM numbers(20)
Apart from randomPrintableASCII(), the other functions return strings containing a lot of garbled (unprintable) characters.
Note that length(<string>) does not return the number of characters in the string, but the number of bytes the string occupies.
WITH 10 AS len
SELECT
min(length(randomFixedString(len))),max(length(randomFixedString(len))),
min(length(randomString(len))), max(length(randomString(len))),
min(length(randomPrintableASCII(len))), max(length(randomPrintableASCII(len))),
min(length(randomStringUTF8(len))), max(length(randomStringUTF8(len)))
FROM numbers(10000000)
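The byte-versus-character distinction can be checked outside ClickHouse. The Python sketch below is only an illustration and not ClickHouse code: `byte_length` and `char_length` are hypothetical helper names, and it assumes the strings are UTF-8 encoded. Inside ClickHouse, lengthUTF8() is the function that counts code points instead of bytes.

```python
def byte_length(s: str) -> int:
    """Bytes occupied by the UTF-8 encoding, which is what length() counts."""
    return len(s.encode("utf-8"))


def char_length(s: str) -> int:
    """Number of Unicode code points in the string."""
    return len(s)


# An ASCII string and a CJK string (each CJK character here takes 3 bytes).
for sample in ["clickhouse", "数据库"]:
    print(sample, "chars:", char_length(sample), "bytes:", byte_length(sample))
```

This is why randomStringUTF8(len) can report a length() larger than len: the argument counts code points, while length() counts bytes.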

Because the output above is garbled and inconvenient to read, you sometimes need to build your own methods.
hex(int) converts an integer into a hexadecimal string:
hex(rand())

substring(<string>, <start position>, <length>): the start position begins at 1 and can be negative; the length cannot be negative.
substring(reverse(substring(base64Encode(randomString(<string length>)), 3)), 3)
concat(str1, str2, str3, ...) concatenates multiple strings, each passed as a separate argument.
arrayStringConcat([str1, str2, str3, ...], <separator>) joins the elements of a string array and supports a custom separator.
WITH <variable value> AS <variable name> is placed at the very front of the SQL statement to declare variables.
WITH 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz123456789' AS asc2,
length(asc2) AS asc2Len
SELECT concat(
substring(asc2,rand() % asc2Len + 1, 1),
substring(asc2,rand(null) % asc2Len + 1, 1),
substring(asc2,rand(0) % asc2Len + 1, 1),
substring(asc2,rand(1) % asc2Len + 1, 1),
substring(asc2,rand(2) % asc2Len + 1, 1),
substring(asc2,rand(now()) % asc2Len + 1, 1)
) AS randStr FROM numbers(10)
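Writing one substring(...) line per character gets tedious for longer strings. As a rough sketch, the statement can be generated by a small Python helper; `build_random_string_query` and its parameters are hypothetical names introduced here, and the helper only builds the SQL text, it does not connect to ClickHouse. Each rand(i) call gets a different constant argument because that argument exists only to stop ClickHouse from evaluating identical calls once and reusing the value.

```python
ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz123456789"


def build_random_string_query(n_chars: int, n_rows: int = 10) -> str:
    """Build a SELECT that concatenates n_chars random characters per row."""
    if n_chars < 2:
        # Older ClickHouse versions require concat() to have two or more arguments.
        raise ValueError("n_chars must be at least 2")
    # One substring() pick per character, each with its own rand() argument.
    picks = ",\n".join(
        f"    substring(asc2, rand({i}) % asc2Len + 1, 1)" for i in range(n_chars)
    )
    return (
        f"WITH '{ALPHABET}' AS asc2,\n"
        "    length(asc2) AS asc2Len\n"
        f"SELECT concat(\n{picks}\n) AS randStr FROM numbers({n_rows})"
    )


print(build_random_string_query(6))
```

The printed statement can be pasted into clickhouse-client as-is.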

generateUUIDv4() generates a random UUID in the 8-4-4-4-12 format.

An enumeration field is defined as <field name> Enum8('<key1>' = 0, '<key2>' = 1, ...).
An enumeration value can be assigned either by its key or by its numeric value.
range(<array length>): note that the length of the resulting array equals the argument.
range(rand() % <array length>) returns an array of random length. Because a % b takes the remainder, the result can be at most b-1.
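The b-1 upper bound is easy to confirm with a quick simulation. This Python sketch is an illustration only: `random_length` is a hypothetical helper, and `random.getrandbits(32)` stands in for ClickHouse's rand(), which returns a UInt32.

```python
import random


def random_length(max_len: int) -> int:
    """Mimic rand() % max_len using a random 32-bit unsigned integer."""
    value = random.getrandbits(32)  # stand-in for ClickHouse rand()
    return value % max_len


# Draw many lengths with max_len = 5; none of them can reach 5 itself.
lengths = [random_length(5) for _ in range(10_000)]
print("smallest:", min(lengths), "largest:", max(lengths))

# Python's range(n) also yields n elements, so the array built from a
# random length has at most max_len - 1 elements.
print(list(range(random_length(5))))
```

If the full length should also be possible, take the remainder by <array length> + 1 instead.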
For true and false: 0, null, and empty values are treated as false, and every non-zero value is true.
SELECT arraySort((x) -> if(rand()>rand(0),1,-1)*x, [3,2,5,1,6,4,9,8,0]) FROM numbers(10)
The null table function is usually used for testing: it creates a temporary table, operates on it, and automatically drops the table once the operation is complete.
Syntax: null('<table structure>')
INSERT INTO function null('x UInt64') SELECT * FROM numbers_mt(1000000000);
The SQL statement above is equivalent to the three statements below:
CREATE TABLE t (x UInt64) ENGINE = Null;
INSERT INTO t SELECT * FROM numbers_mt(1000000000);
DROP TABLE IF EXISTS t;
Test statement
INSERT INTO function null('x UInt64') SELECT * FROM numbers_mt(100000000)
INSERT INTO function null('x UInt64') SELECT power(number, 2) FROM numbers_mt(100000000)

The official documentation is at: https://clickhouse.com/docs/en/sql-reference/table-functions/generate/
Syntax: ENGINE = GenerateRandom(random_seed, max_string_length, max_array_length)
The engine does not support distributed use, among other unsupported features.
Create a "random data generating table":
CREATE DATABASE test_rand;
DROP TABLE IF EXISTS test_rand.generate_engine_table;
CREATE TABLE IF NOT EXISTS test_rand.generate_engine_table (id UInt32,
name String,
addr String,
intValue UInt32,
floatValue Float32,
arr Array(UInt8),
mydate DateTime
) ENGINE = GenerateRandom(null, 5, 3)

This is a special table that can only be queried. Its data is generated on the fly and returned when you query it, so every query returns different data. You must limit the number of rows you query, otherwise the query will keep running indefinitely.
For example: SELECT * FROM test_rand.generate_engine_table LIMIT 20

With a LIMIT in place, the generated rows can also be inserted into an ordinary table:
INSERT INTO test5.testMergeTree(id, name, value, date) SELECT id, name, id, now() FROM test5.generate_engine_table LIMIT 1000000000
Introduction to SSB (Star Schema Benchmark): https://www.cs.umb.edu/~poneil/StarSchemaB.PDF

Official documentation: https://clickhouse.com/docs/en/getting-started/example-datasets/star-schema/
On a newly installed system, many commands are often reported as missing.
If git is missing, install it:
yum install git
If you see "make: command not found", install make with the following command:
yum install -y gcc gcc-c++ automake autoconf libtool make
The ssb-dbgen test tool is on GitHub: https://github.com/vadimtk/ssb-dbgen
Download and compile the test tool
git clone https://github.com/vadimtk/ssb-dbgen.git
cd ssb-dbgen
make
After that, the dbgen executable is generated in the current directory.

dbgen is used to generate the data that TPC-H needs.
TPC-H: a benchmark for stress-testing database query performance.
| Command | Result |
|---|---|
| ./dbgen -s 1 -T c | customer.tbl |
| ./dbgen -s 1 -T p | part.tbl |
| ./dbgen -s 1 -T s | supplier.tbl |
| ./dbgen -s 1 -T d | date.tbl |
| ./dbgen -s 1 -T l | lineorder.tbl |
| ./dbgen -s 1 -T a | Generates all tables at once |
The scale factor is abbreviated as sf; here 1 sf == 1 GB.
[root@localhost dbgen]# ./dbgen -s 1    # generates 1 GB of data (-s 1 == 1 GB)
The drawback is that when the amount of data to generate is very large, it can take a whole day, so I run several dbgen processes in parallel, as follows:
#!/bin/sh
./dbgen -vfF -s 1000 -S 1 -C 16 &
./dbgen -vfF -s 1000 -S 2 -C 16 &
./dbgen -vfF -s 1000 -S 3 -C 16 &
./dbgen -vfF -s 1000 -S 4 -C 16 &
./dbgen -vfF -s 1000 -S 5 -C 16 &
./dbgen -vfF -s 1000 -S 6 -C 16 &
./dbgen -vfF -s 1000 -S 7 -C 16 &
./dbgen -vfF -s 1000 -S 8 -C 16 &
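The per-chunk lines follow a fixed pattern, so they can be generated rather than typed. The Python sketch below is a hedged illustration: `dbgen_chunk_commands` is a hypothetical helper, and it assumes that, as in the TPC-H dbgen this tool derives from, -C is the total number of chunks and -S selects which chunk to build. Under that reading the script above starts 8 of the 16 chunks, so covering the whole data set would take steps 1 through 16.

```python
from typing import List


def dbgen_chunk_commands(scale: int, chunks: int) -> List[str]:
    """Return one background dbgen command per chunk, numbered 1..chunks."""
    if scale < 1 or chunks < 1:
        raise ValueError("scale and chunks must both be at least 1")
    return [
        f"./dbgen -vfF -s {scale} -S {step} -C {chunks} &"
        for step in range(1, chunks + 1)
    ]


# Assemble a shell script; the final `wait` blocks until every
# background dbgen process has finished.
script = "\n".join(["#!/bin/sh", *dbgen_chunk_commands(1000, 16), "wait"])
print(script)
```

Starting every chunk at once may overload a small machine, so the chunk count is best matched to the available CPU cores and disk throughput.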
Detailed parameter explanation:
-v verbose output
-s scale factor, meaning how many GB of data to generate
-S splits the data: selects which chunk to generate
-f overwrites previously generated files
#######
dss.ddl: this file stores the table creation statements.
Run cat dss.ddl and execute the table creation statements one by one.
timedatectl # view the various states of the system time
timedatectl set-local-rtc 1 # set the hardware clock to local time; 0 sets it to UTC
timedatectl set-timezone Asia/Shanghai # set the system time zone to Shanghai
In fact, leaving differences between distributions aside, changing the time zone at the lowest level is simpler than expected:
cp /usr/share/zoneinfo/Asia/Shanghai /etc/localtime