At my company we bought a brand new server:
AMD EPYC 7262 - eight cores
64 GB RAM
Intel SSD DC P4510 (2 TB) - dedicated solely to the database
When I benchmarked the disk, I got speeds of up to 2 GB/s for writes and 3 GB/s for reads.
Then I tested performance in Postgres.
I created a table (2 integer columns, 5 text columns, no index) and filled it with a loop; the average text length is around 500 characters. This is the speed for 10,000,000 rows: roughly 250 MB/s of write throughput.
Okay, so I guessed that maybe a loop is not the best way to write data in Postgres, and I tried (this time with 20,000,000 rows):
select * into table2 from table1;
then with two independent tables (at the same time = in parallel):
select * into table3 from table1;
select * into table4 from table2;
And then with four tables at the same time.
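For completeness, the four-way run looked roughly like this; the exact table names are my reconstruction, and each statement was started at the same time from its own session:
select * into table5 from table1;  -- session 1
select * into table6 from table2;  -- session 2
select * into table7 from table3;  -- session 3
select * into table8 from table4;  -- session 4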
And the results were disappointing: the time required to finish the operation doubled (comparing one insert vs. four), and while the total write speed went up, the scaling was poor, not even close to 2x or 4x.
I can understand that one insert (i.e. a single thread) may not be able to saturate the whole disk, but why do independent insert operations slow each other down when the hardware is clearly not the bottleneck and has plenty of resources left?
The following screenshots compare the select * into statements I ran, showing time on the X axis and speed on the Y axis, followed by disk utilization (well below what the disk can handle) and finally CPU utilization.
I asked a somewhat similar question here:
https://stackoverflow.com/questions/59846912/postgresql-drop-in-performance-with-multiple-running-functions-without-hardware
There @jjanes told me about Postgres creating "snapshots", about transaction isolation, and about using "repeatable read" to get better performance. That solution cannot be applied here, but I suppose the root of the problem is similar.
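For reference, that advice would look roughly like this in SQL (a sketch only; as said, it cannot be used here):
begin transaction isolation level repeatable read;
select * into table2 from table1;  -- the work runs inside the transaction
commit;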
I'd be glad if somebody with deeper knowledge could give me some hints on how to tweak performance, because we very often run multiple (independent) processes on the server with heavy reading and writing, and with this poor scaling they run slowly while the expensive hardware just sits idle.
PostgreSQL 12.2 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 8.3.1 20190507 (Red Hat 8.3.1-4), 64-bit
Monitoring software: Zabbix
EDITS:
Requests from the comment section
1) Code of the loop I used to fill the table. I used the first version, with md5 and random. I also decided to try it without them to see whether they were slowing the process down, but when I left them out the exact opposite happened: write speed dropped from the previous approx. 250 MB/s to around 6 MB/s and the time to finish the operation multiplied. I truncated the table and repeated the test several times (even on a different server with a different Postgres version) but the results were the same. No idea what causes this strange behaviour.
create or replace function f_zapis (p_pocet_radku int)
returns integer as $body$
declare
    v_exc_context text;
    p_cislo int := 123456789;
    p_text1 text := '';
    p_text2 text := '';
    p_text3 text := '';
    p_text4 text := '';
    p_text5 text := '';
    p_vata text := '';
begin
    p_vata =
$$
orem Ipsum is simply dummy text of the printing and typesetting industry. Lorem Ipsum has been the industry's standard dummy text ever since the 1500s, when an unknown printer took a galley of type and scrambled it to make a type specimen book. It has survived not only five centuries, but also the leap into electronic typesetting, remaining essentially unchanged. It was popularised in the 1960s with the release of Letraset sheets containing Lorem Ipsum passages, and more recently with desktop publishing software like Aldus PageMaker including versions of Lorem Ipsum.'
$$
    ;
    for i in 1..p_pocet_radku loop
        select (random() * 1000000)::int into p_cislo;
        SELECT md5(random()::text) into p_text1;
        SELECT md5(random()::text) into p_text2;
        SELECT md5(random()::text) into p_text3;
        SELECT md5(random()::text) into p_text4;
        SELECT md5(random()::text) into p_text5;
        p_text1 = p_text1 || p_vata;
        p_text2 = p_text2 || p_vata;
        p_text3 = p_text3 || p_vata;
        p_text4 = p_text4 || p_vata;
        p_text5 = p_text5 || p_vata;
        insert into test_2_zapis (id, cislo, pismena1, pismena2, pismena3, pismena4, pismena5)
        values (i, p_cislo, p_text1, p_text2, p_text3, p_text4, p_text5);
    end loop;
    return 1;
exception
    when others then
        get stacked diagnostics v_exc_context = pg_exception_context;
        perform main.ut_log('ERR', sqlerrm || ' [SQL State: ' || sqlstate || '] Context: ' || v_exc_context);
        raise;
end;
$body$ language plpgsql;

truncate table test_2_zapis;
select * from f_zapis(1000); -- finished in 0.084 sec (0.077s, 0.078s)
create or replace function f_zapis (p_pocet_radku int)
returns integer as $body$
declare
    v_exc_context text;
    p_cislo int := 123456789;
    p_text1 text := '';
    p_text2 text := '';
    p_text3 text := '';
    p_text4 text := '';
    p_text5 text := '';
    p_vata text := '';
begin
    p_vata =
$$
orem Ipsum is simply dummy text of the printing and typesetting industry. Lorem Ipsum has been the industry's standard dummy text ever since the 1500s, when an unknown printer took a galley of type and scrambled it to make a type specimen book. It has survived not only five centuries, but also the leap into electronic typesetting, remaining essentially unchanged. It was popularised in the 1960s with the release of Letraset sheets containing Lorem Ipsum passages, and more recently with desktop publishing software like Aldus PageMaker including versions of Lorem Ipsum.'
$$
    ;
    for i in 1..p_pocet_radku loop
        /*
        select (random() * 1000000)::int into p_cislo;
        SELECT md5(random()::text) into p_text1;
        SELECT md5(random()::text) into p_text2;
        SELECT md5(random()::text) into p_text3;
        SELECT md5(random()::text) into p_text4;
        SELECT md5(random()::text) into p_text5;
        */
        p_text1 = p_text1 || p_vata;
        p_text2 = p_text2 || p_vata;
        p_text3 = p_text3 || p_vata;
        p_text4 = p_text4 || p_vata;
        p_text5 = p_text5 || p_vata;
        insert into test_2_zapis (id, cislo, pismena1, pismena2, pismena3, pismena4, pismena5)
        values (i, p_cislo, p_text1, p_text2, p_text3, p_text4, p_text5);
    end loop;
    return 1;
exception
    when others then
        get stacked diagnostics v_exc_context = pg_exception_context;
        perform main.ut_log('ERR', sqlerrm || ' [SQL State: ' || sqlstate || '] Context: ' || v_exc_context);
        raise;
end;
$body$ language plpgsql;

truncate table test_2_zapis;
select * from f_zapis(1000); -- finished in 7.282 sec (7.158s, 7.187s)
------------------------------------------------------
create table test_2_zapis (
    id bigint,
    cislo int,
    pismena1 text,
    pismena2 text,
    pismena3 text,
    pismena4 text,
    pismena5 text
);
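For anyone checking the throughput numbers, the on-disk size of the test table can be read with the standard size functions (this query is my addition, it was not part of the original test):
select pg_size_pretty(pg_total_relation_size('test_2_zapis'));  -- table size including TOAST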
2) Table with time results for the select into statements; all conditions are the same as described above.
- Table size: 20,000,000 rows
- If several selects run at the same time, each has its own source and output table
- All tables contain the same data
- I was the only user of the database at the time
Notes:
- time is measured in seconds
- I ran the tests several times, see the column "Number of tests"
- the results were fairly consistent, as you can see from the MIN, MAX and AVG times
- results are calculated from the AVG values
- when running parallel instances I measured the time of each thread (as shown in the table) and then summed them
- ideal scaling is based on the number of threads
- for real scaling the baseline is the single thread, which takes 891 sec; for multiple threads it is how much work one thread does relative to that baseline, multiplied by the number of threads
Example: with four threads the time is 1381 seconds per thread, so one thread does 64.5% of the single-thread baseline; multiplied by 4 (the number of threads) that gives 258% of the work done, i.e. a scaling factor of 2.58.
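The same arithmetic expressed as a query, just to make the calculation explicit (891 s is the single-thread baseline, 1381 s the per-thread time with four threads):
select round(891.0 / 1381, 3)     as work_per_thread,  -- ~0.645
       round(891.0 / 1381 * 4, 2) as real_scaling;     -- ~2.58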
3) Wait events (for the select into statements)
Wait event timeline (single thread)
https://ibb.co/K9jWLRx
Wait event timeline, multi-thread (timeline for one out of four threads)
https://ibb.co/j8wxHYB
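The timelines above come from Zabbix; for anyone who wants to look at the same thing directly in Postgres, a snapshot of the current wait events can be taken like this (my own query, not what the monitoring uses):
select wait_event_type, wait_event, count(*)
from pg_stat_activity
where state = 'active'
group by wait_event_type, wait_event
order by count(*) desc;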