Looking into the performance of the TransformationSystem, and its DB in particular, the hottest spot is the DataFiles table.
The aim of this table is to deduplicate LFNs in the DB: if multiple transformations are applied to the same file, the LFN is stored only once in DataFiles, and TransformationFiles just refers to it via a foreign key.
When a lot of transformations are running, the DataFiles table can get big (currently 80M rows in LHCb). Queries we are running against it are of this type:
SELECT LFN, FileID FROM DataFiles WHERE LFN IN ('a', 'b', 'c')
They can take up to half an hour in our case.
Effectively, the DataFiles table is:
- inefficient to query (which we do very often, even to insert new files)
- subject to race conditions (the code tries to protect against them in various places, but they can still occur)
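To make the two points above concrete, here is a minimal sketch of the current layout and the lookup-then-insert pattern it implies. It is not the actual DIRAC code: SQLite is used only so the example runs standalone, the column set is reduced to the bare minimum, and the helper `add_files_to_transformation` is a hypothetical name.

```python
import sqlite3

# In-memory SQLite stands in for the real DB; the schema is a simplified assumption.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE DataFiles (
        FileID INTEGER PRIMARY KEY AUTOINCREMENT,
        LFN    TEXT NOT NULL UNIQUE
    );
    CREATE TABLE TransformationFiles (
        TransformationID INTEGER NOT NULL,
        FileID           INTEGER NOT NULL REFERENCES DataFiles(FileID),
        PRIMARY KEY (TransformationID, FileID)
    );
""")


def add_files_to_transformation(conn, trans_id, lfns):
    """Hypothetical helper: attach LFNs to a transformation, reusing DataFiles rows."""
    lfns = list(dict.fromkeys(lfns))  # drop duplicates, keep order
    if not lfns:
        return {}
    placeholders = ",".join("?" for _ in lfns)
    # Step 1: the kind of query quoted in this issue, run even for a plain insert.
    known = dict(conn.execute(
        f"SELECT LFN, FileID FROM DataFiles WHERE LFN IN ({placeholders})", lfns
    ).fetchall())
    # Step 2: insert the LFNs that were not found. Another client could insert
    # the same LFN between step 1 and step 2: this is the race window.
    for lfn in lfns:
        if lfn not in known:
            cur = conn.execute("INSERT INTO DataFiles (LFN) VALUES (?)", (lfn,))
            known[lfn] = cur.lastrowid
    # Step 3: link the transformation to the FileIDs.
    conn.executemany(
        "INSERT INTO TransformationFiles (TransformationID, FileID) VALUES (?, ?)",
        [(trans_id, known[lfn]) for lfn in lfns],
    )
    conn.commit()
    return known
```

Every insertion needs at least one read of DataFiles before it can write, so the cost of that lookup is paid on the hot path.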
I propose to remove the DataFiles table, and add an indexed LFN column to the TransformationFiles table. It may make the DB slightly bigger in size, but the performance will be dramatically improved.
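A rough sketch of what the proposed layout could look like, again using SQLite only so that it runs standalone. The column set, the index name `idx_tf_lfn`, and the helpers `add_files` and `transformations_for` are illustrative assumptions, not the actual schema or API.

```python
import sqlite3

# In-memory SQLite stands in for the real DB; the schema is a simplified assumption.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE TransformationFiles (
        TransformationID INTEGER NOT NULL,
        LFN              TEXT    NOT NULL,
        PRIMARY KEY (TransformationID, LFN)
    );
    -- Secondary index so lookups by LFN alone do not scan the table.
    CREATE INDEX idx_tf_lfn ON TransformationFiles (LFN);
""")


def add_files(conn, trans_id, lfns):
    """Hypothetical helper: attach LFNs to a transformation in a single statement.

    No prior lookup is needed; the primary key rejects duplicates.
    INSERT OR IGNORE is SQLite syntax (MySQL spells it INSERT IGNORE).
    """
    conn.executemany(
        "INSERT OR IGNORE INTO TransformationFiles (TransformationID, LFN) VALUES (?, ?)",
        [(trans_id, lfn) for lfn in lfns],
    )
    conn.commit()


def transformations_for(conn, lfn):
    """Hypothetical helper: list the transformations that use a given LFN."""
    rows = conn.execute(
        "SELECT TransformationID FROM TransformationFiles "
        "WHERE LFN = ? ORDER BY TransformationID",
        (lfn,),
    )
    return [row[0] for row in rows]
```

The trade-off is the one stated above: each LFN string is repeated once per transformation that uses it, in exchange for dropping the separate lookup table and the read-before-write step.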