クエリの実行速度の実験などで100万件規模のレコードを操作したい場合の環境を構築する方法をまとめました。
環境構築
コンテナ作成
$ docker run --platform linux/x86_64 -p 3307:3306 -e MYSQL_ROOT_PASSWORD=my-secret-pw -d mysql:latest $ mysql -uroot -pmy-secret-pw -h127.0.0.1 -P3307
DB作成
$ CREATE DATABASE sample; $ USE sample;
データ挿入
考え方
単純にselectによる数値の表示を実行してみます。
この結果はカラム名が2で、レコードの個数が1で、そのレコードのカラム名2の値が2という結果が表示されます。
mysql> SELECT 2; +---+ | 2 | +---+ | 2 | +---+ 1 row in set (0.00 sec)
次に同様にSELECT 1したカラムと縦に結合します。
これでレコードが2つ存在し、そのカラム名2という値がそれぞれ2と1であるレコードであるテーブルが出来上がります。
UNIONでもUNION ALLでも結果は変わりません。
mysql> SELECT 2 UNION SELECT 1; +---+ | 2 | +---+ | 2 | | 1 | +---+ 2 rows in set (0.00 sec)
上記と同じビューをもう一つ用意しCROSS JOINするとレコード数 x レコード数の2x2=4レコードの結果が帰ってきます。
mysql> select * from (select 1 union all select 2) t1, (select 3 union all select 4) t2; +---+---+ | 1 | 3 | +---+---+ | 2 | 3 | | 1 | 3 | | 2 | 4 | | 1 | 4 | +---+---+ 4 rows in set (0.00 sec)
このようにCROSS JOINを重ねていくと指数的にレコード数が増加していくテーブルが得られます。
最初のテーブルのレコードが2でN回CROSS JOINすると2^N個のレコードがあるテーブルができます。
4回CROSS JOINすると下記のような結果になります。
select * from (select 1 union all select 2) t1, (select 3 union all select 4) t2, (select 5 union all select 6) t3, (select 7 union all select 8) t4; +---+---+---+---+ | 1 | 3 | 5 | 7 | +---+---+---+---+ | 2 | 3 | 6 | 7 | | 1 | 3 | 6 | 7 | | 2 | 4 | 6 | 7 | | 1 | 4 | 6 | 7 | | 2 | 3 | 5 | 7 | | 1 | 3 | 5 | 7 | | 2 | 4 | 5 | 7 | | 1 | 4 | 5 | 7 | | 2 | 3 | 6 | 8 | | 1 | 3 | 6 | 8 | | 2 | 4 | 6 | 8 | | 1 | 4 | 6 | 8 | | 2 | 3 | 5 | 8 | | 1 | 3 | 5 | 8 | | 2 | 4 | 5 | 8 | | 1 | 4 | 5 | 8 | +---+---+---+---+ 16 rows in set (0.01 sec)
16回繰り返すと65536個のレコードができます。
select COUNT(*) from (select 1 union all select 1) as t1, (select 1 union all select 1) as t2, (select 1 union all select 1) as t3, (select 1 union all select 1) as t4, (select 1 union all select 1) as t5, (select 1 union all select 1) as t6, (select 1 union all select 1) as t7, (select 1 union all select 1) as t8, (select 1 union all select 1) as t9, (select 1 union all select 1) as t10, (select 1 union all select 1) as t11, (select 1 union all select 1) as t12, (select 1 union all select 1) as t13, (select 1 union all select 1) as t14, (select 1 union all select 1) as t15, (select 1 union all select 1) as t16; +----------+ | COUNT(1) | +----------+ | 65536 | +----------+ 1 row in set (0.09 sec)
テーブルの作成
テーブルの構造はこちらの記事のSQLを使わせてもらいました。
qiita.com
CREATE TABLE `users` ( `id` bigint(20) unsigned NOT NULL AUTO_INCREMENT, `name` varchar(255) COLLATE utf8mb4_unicode_ci NOT NULL, `email` varchar(255) COLLATE utf8mb4_unicode_ci NOT NULL, `email_verified_at` timestamp NULL DEFAULT NULL, `password` varchar(255) COLLATE utf8mb4_unicode_ci NOT NULL, `remember_token` varchar(100) COLLATE utf8mb4_unicode_ci DEFAULT NULL, `created_at` timestamp NULL DEFAULT NULL, `updated_at` timestamp NULL DEFAULT NULL, PRIMARY KEY (`id`), UNIQUE KEY `users_email_unique` (`email`) ) ENGINE=InnoDB AUTO_INCREMENT=1048561 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci
MySQL8.0ではWindow関数が使えるため行数を取得する際にROW_NUMBER()を使えますので、挿入クエリは書き直しました。
下記のSQLでCROSS JOINしたテーブルに対して行数を取得しました。
Window関数のORDER BY句でカラムを指定する際にDuplicateErrorを起こさないように1,2,3...と増やしてみました。
mysql> SELECT ROW_NUMBER() OVER (ORDER BY `1`), t.* FROM (SELECT * FROM ((SELECT 1 UNION ALL SELECT 1) t1, (SELECT 2 UNION ALL SELECT 2) t2, (SELECT 3 UNION ALL SELECT 3) t3)) t; +----------------------------------+---+---+---+ | ROW_NUMBER() OVER (ORDER BY `1`) | 1 | 2 | 3 | +----------------------------------+---+---+---+ | 1 | 1 | 2 | 3 | | 2 | 1 | 2 | 3 | | 3 | 1 | 2 | 3 | | 4 | 1 | 2 | 3 | | 5 | 1 | 2 | 3 | | 6 | 1 | 2 | 3 | | 7 | 1 | 2 | 3 | | 8 | 1 | 2 | 3 | +----------------------------------+---+---+---+ 8 rows in set (0.00 sec)
行数と乱数を用いてダミーデータを作成していきます。
SELECT concat('名前', p.no) as name, concat('test', p.no, '@example.com') as email, now() as email_verified_at, SUBSTRING(MD5(RAND()), 1, 10) AS remeber_token, now() AS created_at, now() AS updated_at FROM ( SELECT ROW_NUMBER() OVER (ORDER BY `1`) AS no, t.* FROM ( SELECT * FROM ( (SELECT 1 UNION ALL SELECT 1) t1, (SELECT 2 UNION ALL SELECT 2) t2, (SELECT 3 UNION ALL SELECT 3) t3 ) ) t ) p; +---------+-------------------+---------------------+---------------+---------------------+---------------------+ | name | email | email_verified_at | remeber_token | created_at | updated_at | +---------+-------------------+---------------------+---------------+---------------------+---------------------+ | 名前1 | test1@example.com | 2023-03-23 15:41:29 | dce811782b | 2023-03-23 15:41:29 | 2023-03-23 15:41:29 | | 名前2 | test2@example.com | 2023-03-23 15:41:29 | df24dd546f | 2023-03-23 15:41:29 | 2023-03-23 15:41:29 | | 名前3 | test3@example.com | 2023-03-23 15:41:29 | 915b4a2d90 | 2023-03-23 15:41:29 | 2023-03-23 15:41:29 | | 名前4 | test4@example.com | 2023-03-23 15:41:29 | b327bbf9fd | 2023-03-23 15:41:29 | 2023-03-23 15:41:29 | | 名前5 | test5@example.com | 2023-03-23 15:41:29 | 313d8ad7cf | 2023-03-23 15:41:29 | 2023-03-23 15:41:29 | | 名前6 | test6@example.com | 2023-03-23 15:41:29 | 147874960a | 2023-03-23 15:41:29 | 2023-03-23 15:41:29 | | 名前7 | test7@example.com | 2023-03-23 15:41:29 | ff2806dd4b | 2023-03-23 15:41:29 | 2023-03-23 15:41:29 | | 名前8 | test8@example.com | 2023-03-23 15:41:29 | 5b4066130c | 2023-03-23 15:41:29 | 2023-03-23 15:41:29 | +---------+-------------------+---------------------+---------------+---------------------+---------------------+ 8 rows in set (0.01 sec)
この結果をテーブルに入れていけばダミーデータをダミーデータが入ったテーブルを作成できます。
SELECT文の結果をtableに挿れるには下記の構文でできます。
INSERT INTO tbl (col1, col2) SELECT col1, col2 FROM otherTbl;
したがって、ダミーデータを挿れる方法は
TRUNCATE users; INSERT INTO users ( name, email, email_verified_at, password, remember_token, created_at, updated_at ) SELECT concat('名前', p.no) as name, concat('test', p.no, '@example.com') as email, now() as email_verified_at, SHA1(p.no) AS password, SUBSTRING(MD5(RAND()), 1, 10) AS remeber_token, now() AS created_at, now() AS updated_at FROM ( SELECT ROW_NUMBER() OVER (ORDER BY `1`) AS no, t.* FROM ( SELECT * FROM ( (SELECT 1 UNION ALL SELECT 1) t1, (SELECT 2 UNION ALL SELECT 2) t2, (SELECT 3 UNION ALL SELECT 3) t3 ) ) t ) p; mysql> source e.sql Query OK, 0 rows affected (0.03 sec) Query OK, 8 rows affected (0.00 sec) Records: 8 Duplicates: 0 Warnings: 0 mysql> select * from users; +----+---------+-------------------+---------------------+------------------------------------------+----------------+---------------------+---------------------+ | id | name | email | email_verified_at | password | remember_token | created_at | updated_at | +----+---------+-------------------+---------------------+------------------------------------------+----------------+---------------------+---------------------+ | 1 | 名前1 | test1@example.com | 2023-03-23 15:47:38 | 356a192b7913b04c54574d18c28d46e6395428ab | c165105765 | 2023-03-23 15:47:38 | 2023-03-23 15:47:38 | | 2 | 名前2 | test2@example.com | 2023-03-23 15:47:38 | da4b9237bacccdf19c0760cab7aec4a8359010b0 | fbc26e91d1 | 2023-03-23 15:47:38 | 2023-03-23 15:47:38 | | 3 | 名前3 | test3@example.com | 2023-03-23 15:47:38 | 77de68daecd823babbb58edb1c8e14d7106e83bb | 9a7b62a7b5 | 2023-03-23 15:47:38 | 2023-03-23 15:47:38 | | 4 | 名前4 | test4@example.com | 2023-03-23 15:47:38 | 1b6453892473a467d07372d45eb05abc2031647a | 76cf562c32 | 2023-03-23 15:47:38 | 2023-03-23 15:47:38 | | 5 | 名前5 | test5@example.com | 2023-03-23 15:47:38 | ac3478d69a3c81fa62e60f5c3696165a4e5e6ac4 | 384c6cb8d4 | 2023-03-23 15:47:38 | 2023-03-23 15:47:38 | | 6 | 名前6 | test6@example.com | 2023-03-23 15:47:38 | c1dfd96eea8cc2b62785275bca38ac261256e278 | d987435aaf | 2023-03-23 15:47:38 | 2023-03-23 15:47:38 | | 7 | 名前7 | test7@example.com | 2023-03-23 15:47:38 | 902ba3cda1883801594b6e1b452790cc53948fda | e3b503ad65 | 2023-03-23 15:47:38 | 2023-03-23 15:47:38 | | 8 | 名前8 | test8@example.com | 2023-03-23 15:47:38 | fe5dbbcea5ce7e2988b8c69bcfdfde8904aabc1f | ce935f3e5c | 2023-03-23 15:47:38 | 2023-03-23 15:47:38 | +----+---------+-------------------+---------------------+------------------------------------------+----------------+---------------------+---------------------+ 8 rows in set (0.00 sec)
実際に100万件のダミーデータを挿れてみる
では、SQL内部のt1, t2, t3の部分を増やして100万件のデータを入れてみましょう。
TRUNCATE users; INSERT INTO users ( name, email, email_verified_at, password, remember_token, created_at, updated_at ) SELECT concat('名前', p.no) as name, concat('test', p.no, '@example.com') as email, now() as email_verified_at, SHA1(p.no) AS password, SUBSTRING(MD5(RAND()), 1, 10) AS remeber_token, now() AS created_at, now() AS updated_at FROM ( SELECT ROW_NUMBER() OVER (ORDER BY `1`) AS no, t.* FROM ( SELECT * FROM ( (SELECT 1 UNION ALL SELECT 1) t1, -- 2 (SELECT 2 UNION ALL SELECT 1) t2, -- 4 (SELECT 3 UNION ALL SELECT 1) t3, -- 8 (SELECT 4 UNION ALL SELECT 1) t4, -- 16 (SELECT 5 UNION ALL SELECT 1) t5, -- 32 (SELECT 6 UNION ALL SELECT 1) t6, -- 64 (SELECT 7 UNION ALL SELECT 1) t7, -- 128 (SELECT 8 UNION ALL SELECT 1) t8, -- 256 (SELECT 9 UNION ALL SELECT 1) t9, -- 512 (SELECT 10 UNION ALL SELECT 1) t10, -- 1024 (SELECT 11 UNION ALL SELECT 1) t11, (SELECT 12 UNION ALL SELECT 1) t12, (SELECT 13 UNION ALL SELECT 1) t13, (SELECT 14 UNION ALL SELECT 1) t14, (SELECT 15 UNION ALL SELECT 1) t15, (SELECT 16 UNION ALL SELECT 1) t16, (SELECT 17 UNION ALL SELECT 1) t17, (SELECT 18 UNION ALL SELECT 1) t18, (SELECT 19 UNION ALL SELECT 1) t19, (SELECT 20 UNION ALL SELECT 1) t20 ) ) t ) p;
このSQLを実行し行数を数えた結果が下記になります。
26秒で挿入を終えました。
mysql> source f.sql Query OK, 0 rows affected (0.07 sec) Query OK, 1048576 rows affected (25.56 sec) Records: 1048576 Duplicates: 0 Warnings: 0 mysql> SELECT COUNT(1) FROM users; +----------+ | COUNT(1) | +----------+ | 1048576 | +----------+ 1 row in set (0.08 sec)
データはこのようになっています。
mysql> SELECT * FROM users LIMIT 5; +----+---------+-------------------+---------------------+------------------------------------------+----------------+---------------------+---------------------+ | id | name | email | email_verified_at | password | remember_token | created_at | updated_at | +----+---------+-------------------+---------------------+------------------------------------------+----------------+---------------------+---------------------+ | 1 | 名前1 | test1@example.com | 2023-03-23 15:59:17 | 356a192b7913b04c54574d18c28d46e6395428ab | c76a49a442 | 2023-03-23 15:59:17 | 2023-03-23 15:59:17 | | 2 | 名前2 | test2@example.com | 2023-03-23 15:59:17 | da4b9237bacccdf19c0760cab7aec4a8359010b0 | f27a41226b | 2023-03-23 15:59:17 | 2023-03-23 15:59:17 | | 3 | 名前3 | test3@example.com | 2023-03-23 15:59:17 | 77de68daecd823babbb58edb1c8e14d7106e83bb | c778ad3ca9 | 2023-03-23 15:59:17 | 2023-03-23 15:59:17 | | 4 | 名前4 | test4@example.com | 2023-03-23 15:59:17 | 1b6453892473a467d07372d45eb05abc2031647a | 5c59a754ad | 2023-03-23 15:59:17 | 2023-03-23 15:59:17 | | 5 | 名前5 | test5@example.com | 2023-03-23 15:59:17 | ac3478d69a3c81fa62e60f5c3696165a4e5e6ac4 | 548f11ce1b | 2023-03-23 15:59:17 | 2023-03-23 15:59:17 | +----+---------+-------------------+---------------------+------------------------------------------+----------------+---------------------+---------------------+ 5 rows in set (0.00 sec)
追記
最初にテーブルに10件データを挿れておいておけば挿入クエリがもっと楽に書けそうです。
下記の例だと10x10x10x10=10000=1万件のデータが挿入できそうです。
select * from tbl, tbl tbl2, tbl tbl3, tbl tbl4; insert into sample (select * from tbl, tbl tbl2, tbl tbl3, tbl tbl4);