3 Best Practices for Optimizing Wave Analytics Dataflows

If you are already using Wave Analytics in a Production environment, you have hopefully taken a look at the Wave Data Monitor when scheduling your Dataflow.  If you’re running the Dataflow in a Full Sandbox or Production environment, you’re typically dealing with large data volumes.  That is when refresh times start to get long, so you really want to make sure your Dataflow is built correctly.

Optimize!  Only Import Records Once

As of Summer ’17, this will be much easier to do!  Importing your list of Accounts three times into Wave is silly.  Extract your Accounts once, and reuse the “Extract Account” node for anything that references the Account.  You can use a recipe to do any filtering that needs to be done on your Dataset.  With the new Dataflow builder, this is going to be much easier and won’t require any JSON code.  However, you still need to be aware of the potential issue and put the effort into reusing the sfdcDigest nodes that extract your records.
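To make the "extract once" idea concrete, here is a rough Python sketch that scans a dataflow definition for objects pulled in by more than one sfdcDigest node.  The node names and the sample definition are made up for illustration, and the layout (node name, then "action" and "parameters") is assumed to mirror the dataflow JSON, so treat it as a starting point rather than a finished tool.

```python
from collections import defaultdict

# Hypothetical dataflow definition (node name -> settings), shaped like the dataflow JSON.
dataflow = {
    "Extract_Account": {
        "action": "sfdcDigest",
        "parameters": {"object": "Account", "fields": [{"name": "Id"}, {"name": "Name"}]},
    },
    "Extract_Account_Again": {  # the wasteful second extract of the same object
        "action": "sfdcDigest",
        "parameters": {"object": "Account", "fields": [{"name": "Id"}, {"name": "Industry"}]},
    },
    "Extract_Opportunity": {
        "action": "sfdcDigest",
        "parameters": {"object": "Opportunity", "fields": [{"name": "Id"}, {"name": "AccountId"}]},
    },
    "Augment_Opportunity_Account": {  # reuses Extract_Account instead of a new extract
        "action": "augment",
        "parameters": {
            "left": "Extract_Opportunity",
            "left_key": ["AccountId"],
            "relationship": "Account",
            "right": "Extract_Account",
            "right_key": ["Id"],
            "right_select": ["Name"],
        },
    },
}


def duplicate_extracts(definition):
    """Return {object: [node names]} for objects extracted by more than one sfdcDigest node."""
    nodes_by_object = defaultdict(list)
    for node_name, node in definition.items():
        if node.get("action") == "sfdcDigest":
            nodes_by_object[node["parameters"]["object"]].append(node_name)
    return {obj: names for obj, names in nodes_by_object.items() if len(names) > 1}


print(duplicate_extracts(dataflow))
```

Any object that shows up in the result is a candidate for merging into a single extract node that the downstream nodes all point to.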

[Image: Dataflow editor with nodes on the canvas]

Use Incremental Loading Where Possible [Enable Replication (Winter ’17)]

Why would you want to bring in records that haven’t changed?  Incremental loading speeds up your Dataflow by extracting only the records that were modified since your Dataflow last ran, instead of pulling in every record again.  After Winter ’17, you get this by default if you have Replication enabled.  This is a huge boost for speed.  If you have millions or hundreds of millions of records, you’ll greatly benefit from this feature.
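Wave takes care of this for you once Replication is enabled, so there is nothing to code here.  Still, the idea is easy to picture with a small Python sketch.  The sample records and the use of LastModifiedDate as the change marker are assumptions for illustration only, not a description of how Wave implements it.

```python
from datetime import datetime, timezone

# Hypothetical sample records; the real extraction is handled by Wave itself.
accounts = [
    {"Id": "001A", "Name": "Acme", "LastModifiedDate": datetime(2017, 5, 1, tzinfo=timezone.utc)},
    {"Id": "001B", "Name": "Globex", "LastModifiedDate": datetime(2017, 6, 10, tzinfo=timezone.utc)},
    {"Id": "001C", "Name": "Initech", "LastModifiedDate": datetime(2017, 6, 12, tzinfo=timezone.utc)},
]


def changed_since(records, last_run):
    """Keep only the records modified after the previous run."""
    return [record for record in records if record["LastModifiedDate"] > last_run]


last_run = datetime(2017, 6, 1, tzinfo=timezone.utc)  # assumed time of the previous run
delta = changed_since(accounts, last_run)
print([record["Id"] for record in delta])  # ['001B', '001C']
```

Two of the three records come across instead of all three.  Scale that ratio up to millions of rows and the time savings are obvious.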

[Image: Data Manager replication settings]

Don’t Bring In Every Field

Every Account field you add to your dataset is more data that Wave has to grab.  While Wave is extremely fast, if you don’t need it… don’t bring it over!  Extra fields just make your Dataflow take longer to run and clutter your Lens with fields you don’t need.  From personal experience, Long Text Area fields are the worst.
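As a rough sketch of trimming an extract, the Python below cuts an sfdcDigest node’s field list down to the fields you actually use.  The node contents and the "needed" list are hypothetical, and the structure is assumed to mirror the dataflow JSON.

```python
import copy

# Hypothetical extract node that pulls more fields than the Lens needs.
extract_account = {
    "action": "sfdcDigest",
    "parameters": {
        "object": "Account",
        "fields": [
            {"name": "Id"},
            {"name": "Name"},
            {"name": "Industry"},
            {"name": "Description"},  # Long Text Area field, expensive to bring over
            {"name": "Fax"},
        ],
    },
}

# Assumed list of fields that your Lenses and Dashboards actually reference.
needed_fields = {"Id", "Name", "Industry"}


def keep_fields(digest_node, needed):
    """Return a copy of the node whose field list contains only the needed fields."""
    trimmed = copy.deepcopy(digest_node)
    trimmed["parameters"]["fields"] = [
        field for field in trimmed["parameters"]["fields"] if field["name"] in needed
    ]
    return trimmed


slim_extract = keep_fields(extract_account, needed_fields)
print([field["name"] for field in slim_extract["parameters"]["fields"]])  # ['Id', 'Name', 'Industry']
```

The original node is left untouched, so you can compare the before and after field lists before changing the real Dataflow.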

TLDR: Don’t bring in extra records or extra fields, and turn on Replication (Incremental Loading).
