Large data import with event on match

Hi,
We’re building an import template to bring in a large amount of data, upwards of 100k rows. It has an ‘Event triggered by create’ set that relates each created row to an existing table (as it’s a complex match, requiring separate DB searches and API calls in a Code Studio event action).
When testing with dummy data, the above works fine for 20 or so rows, but with files of 50+ rows not every row is processed in the expected way. I assume this is to do with time-outs or race conditions on the server?
My question is, with larger data sets, would the right approach be to run the import with no event triggered, then run a signal to fire the matching event separately on those records that have no relation?
Thanks, James

Hi James, for files of that size I would recommend a staged approach. Doing the import and processing each record at the same time could take hours (even days), potentially tying up server resources for extended periods.

Get the data into the application as Stage 1, then process it as Stage 2.

If you have Parallel Processing switched on for the Import then records will be processed out of order. If records are dependent upon previous records in your Import file then this needs to be switched off so they are processed in import file order.

After Stage 1 you could give the User the ability to check that all imported data is correct, and you could also provide some validation checks (record already exists, data has changed for this record, etc). Then the User can click a button to process the data.
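The Stage 2 validation step could be sketched roughly like this. A minimal, self-contained illustration, not platform code; the field names (`ref`) and the lookup structure are assumptions for the example:

```javascript
// Sketch: pre-processing validation for staged rows (names are illustrative).
// existingByRef maps a reference field to the already-imported record, so each
// staged row can be flagged before the User clicks "process".
function validateStagedRow(row, existingByRef) {
  const existing = existingByRef[row.ref];
  if (!existing) {
    return { status: "new" }; // no matching record yet, safe to process
  }
  // Compare every field except the reference key itself.
  const changed = Object.keys(row).some(
    (k) => k !== "ref" && existing[k] !== row[k]
  );
  return changed
    ? { status: "changed" }   // record exists but the data differs
    : { status: "duplicate" }; // identical record already exists
}
```

For example, `validateStagedRow({ ref: "A1", name: "Jo" }, {})` flags the row as new, while passing an identical existing record flags it as a duplicate.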

So each record needs other events triggering. If you need a number of rules to run in a certain order on a record, make sure they have the same trigger and are all running in the background.

Other things to consider: is the API synchronous or asynchronous? Is further processing dependent upon the response?

Hi Ian,

There’s only one rule to run, but it’s an event action to create a relationship and it has a few things that need to happen in order. One is an API call to the OS to get a UPRN from a text address.
The API is (needs to be) synchronous, as the return value is used afterwards. As it’s in an event action, it uses a cs.curl:
let rsp = cs.curl(endpoint, JSON.stringify(data), …
I assume this is blocking code (so that it waits for rsp); I’m not aware of any other way of making a call within an event action?
What we want to check is that after import, a signal running this rule won’t be subject to the server timing out - we don’t mind how long it takes as long as each record gets fully processed (and the API call doesn’t create a race condition in the event action).
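Assuming cs.curl does block until the response arrives, the pattern would look roughly like this. A minimal sketch with cs.curl stubbed out so the snippet is self-contained; the endpoint URL, request shape, and response shape are all illustrative assumptions, not the real API:

```javascript
// Stub standing in for the real synchronous cs.curl, so this runs standalone.
// In Code Studio the real call would only return once the HTTP request
// completes, so the code after it runs with rsp already populated.
const cs = {
  curl: (endpoint, body) => JSON.stringify({ uprn: "100012345678" }),
};

function lookupUprn(address) {
  const endpoint = "https://api.example.test/places/find"; // illustrative URL
  const rsp = cs.curl(endpoint, JSON.stringify({ query: address }));
  const parsed = JSON.parse(rsp);
  if (!parsed.uprn) {
    // Fail loudly rather than creating a relationship with no UPRN.
    throw new Error("No UPRN returned for: " + address);
  }
  return parsed.uprn; // usable immediately because the call blocked
}
```

The point of the sketch is the control flow: because the call blocks, everything after `cs.curl` sees the response, so there is no race within a single event-action run.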

Hi James, just wondered what the need for an Event Action was here? Adding a relationship and triggering an API call can all be done with normal Build, unless there is some other processing you need to do. Assume this is your ‘complex match’? Not doable in normal Build?

In terms of your Event Action, is it the same data that is not processed as expected, or does it vary? That would give a clue as to what’s triggering/being processed out of the required order. You can always log things to Detective to follow the code processing per record when testing, if you are not already doing that.

If the rule is backgrounded I wouldn’t have thought timing out would be an issue with this.

It’s an event action as it’s a person match on a number of varying criteria with a scoring system, etc.
I’ll give a larger test file a go, move the rule to a signal, and switch on run in background.
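For anyone curious what the scoring match looks like, the shape is roughly this. A self-contained sketch only; the criteria, weights, and threshold here are made up for illustration, not our real matching rules:

```javascript
// Sketch of a weighted scoring match (criteria and weights are illustrative).
// Each matching criterion adds its weight to the score; the best candidate
// above the threshold wins, otherwise no relationship is created.
const WEIGHTS = { surname: 3, dob: 4, postcode: 2 };
const THRESHOLD = 6;

function scoreCandidate(row, candidate) {
  let score = 0;
  for (const field of Object.keys(WEIGHTS)) {
    if (row[field] && row[field] === candidate[field]) {
      score += WEIGHTS[field];
    }
  }
  return score;
}

function bestMatch(row, candidates) {
  let best = null;
  let bestScore = 0;
  for (const c of candidates) {
    const s = scoreCandidate(row, c);
    if (s > bestScore) {
      best = c;
      bestScore = s;
    }
  }
  // Returning null means "no confident match" - leave the row unrelated.
  return bestScore >= THRESHOLD ? best : null;
}
```

So a candidate matching on surname, DOB, and postcode scores 9 and is related, while a surname-only match scores 3 and is skipped.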

Thanks,

James