PostgreSQL vs. SQLite: read & write in multithreaded environment

The start was humble. I needed to cache some data, and I thought just push them to database table and give index, and the rest will be database's job. There were only 2 TEXT fields, and I needed to refer to only one field to search for specific row - which is some kind of key-value store -, so I thought whatever database engine should be fine.

And yes. It was a BIG mistake.

First I tried SQLite, and I found out that, in multithreaded environment some records are evaporated when trying to write to the table simultaneously, even with -DSQLITE_THREADSAFE=2 compile time option. I pushed the same data in same condition, and sometimes I have only 20 records, other times 40, and yet 26 for some others....... What drove me crazier was that the SQLite itself worked fine without any I/O problems. A good moment to shout "WHAT THE HELL?!" in real time.

So I changed the engine to PostgreSQL. Our trustworthy elephat friend saved all the records without any loss. I was satisfied with that, but...... Though I applied b-tree index to necessary field of the table, it took 100 milliseconds for just running SELECT field2 WHERE field1='something'. No, the table was small enough. There were only 680 records and data lengh was at most 30 characters for field 1 and only 4 characters for field 2. I configured the engine with some optimization, so it worked fine for bigger tables so I felt assured for its performance, but I didn't expect something like this, even in my dreams.

Elephant is tough, but as a side effect it's too slow.......

So, one last chance: I ran pg_dump to move data from PostgreSQL to SQLite, and with same condition(same index, same table structure, ......), I turned on at .timer SQLite shell and it took less than 0.001 second. Yohoo!

After some more experiments, SQLite can't fully resist from data loss by itself even with multithread support option enabled, and you need more external support like std::mutex. I guess that it's fread() call doesn't support full serialization in multithread environment, but I have neither time nor abilities to do the proper inspection. :P

Anyway, now I use the combination of SQLite + WAL mode + more SQLite internal cache + std::mutex. Still the write performance looks good, but if needed, I think I could use more files with load balancing via non-cryptographic hash.

No comments:

Post a Comment

블로그를 이전합니다

뭐, 이런 작은 변방의 블로그에 관심있으신 분들은 아무도 없으시리라 생각합니다만...... (웃음) 블로그 플랫폼을 블로거에서 dev.to로 옮겼습니다. 새 URL은 아래와 같습니다: https://dev.to/teminian 새로운 거처에서 뵙겠습니...

Popular in Code{nested}