OK, so I have a Bronze Data Lake and a Silver Data Lake.
In Silver I have a Parquet file of processed file names, e.g.
Proja.csv
Projb.csv
Projc.csv
Projd.csv
And in the dataflow I have a Get Metadata activity connected to the child items in my Bronze Data Lake. So it's finding the files
lookup.csv
Proja.csv
Projb.csv
Projc.csv
Projd.csv
Proje.csv (which is the new file)
I then have a Filter activity to remove the lookup.csv file.
Perhaps you can do something similar to the example below:
I have a Bronze Lakehouse and a Silver Lakehouse.
The files in my Bronze Lakehouse are as follows:
The files in my Silver Lakehouse are as follows:
I made a pipeline like this:
The Get Metadata activities get the Child items metadata from the File folder in Bronze lakehouse and Silver lakehouse, respectively.
The Filter activity removes the lookup.csv file from the output of the metadata activity from Bronze lakehouse:
Items: @activity('Get Metadata Bronze').output.childItems
Condition: @not(equals(item().name, 'lookup.csv'))
The Items in the ForEach activity is the output from the Filter activity:
The If Condition inside the ForEach activity:
Expression: @contains(activity('Get Metadata Silver').output.childItems, item())
The Copy activity runs if the If Condition is False:
After I run the pipeline, the Proje.csv file has been copied to Silver:
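The pipeline logic above (list Bronze, drop lookup.csv, copy only what Silver lacks) can be sketched in Python. This is just an illustration of the control flow, not pipeline code; the file names are the ones from this thread:

```python
# Output of 'Get Metadata Bronze' and 'Get Metadata Silver' (names only,
# illustrative -- the real childItems are objects with name and type).
bronze_items = ["lookup.csv", "Proja.csv", "Projb.csv",
                "Projc.csv", "Projd.csv", "Proje.csv"]
silver_items = ["Proja.csv", "Projb.csv", "Projc.csv", "Projd.csv"]

# Filter activity: @not(equals(item().name, 'lookup.csv'))
filtered = [f for f in bronze_items if f != "lookup.csv"]

# ForEach + If Condition: the Copy activity runs only for items
# that are not already present in Silver.
to_copy = [f for f in filtered if f not in silver_items]
print(to_copy)  # ['Proje.csv']
```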
I don't know if Fabric Data Pipeline has any limits (such as output size, number of items in a collection, or number of items in a ForEach activity) which need to be taken into consideration; exceeding them could result in pipeline failure or unexpected results if the number of files in any of the folders grows above the limits.
If there is a more efficient way to compare the two collections of child items from Get Metadata Silver and Get Metadata Bronze and return only the items that exist in Get Metadata Bronze, I would like to know.
(I am wondering whether some kind of anti-join functionality exists, or similar?
Perhaps some way to subtract one array from another, keeping only the items that appear in the first array alone?)
In my solution, I am using a ForEach activity with an If Condition inside to achieve a similar effect.
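I'm not aware of a built-in array-difference function in the pipeline expression language, but the anti-join idea is easy to express elsewhere, e.g. in Python as a set difference (the childItems objects below are illustrative). One subtlety: the pipeline's `@contains(...)` check compares whole objects, which works here because both Get Metadata outputs have the same `{name, type}` shape; comparing by name alone is slightly more robust:

```python
# childItems from Get Metadata look like {"name": ..., "type": "File"}.
bronze_children = [{"name": "Proja.csv", "type": "File"},
                   {"name": "Projb.csv", "type": "File"},
                   {"name": "Proje.csv", "type": "File"}]
silver_children = [{"name": "Proja.csv", "type": "File"},
                   {"name": "Projb.csv", "type": "File"}]

# Anti-join by name: keep items that are only in Bronze.
silver_names = {c["name"] for c in silver_children}
new_files = [c for c in bronze_children if c["name"] not in silver_names]
print([c["name"] for c in new_files])  # ['Proje.csv']
```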
If you want to use the lookup.csv file to look up which files don't need to be processed again (instead of using the file names in the Silver lakehouse directory for this purpose):
In my case, the lookup.csv file has the following content:
The 'ForEach LookupFileRow' activity:
Items: @activity('Get Lookup File Content').output.value
The 'Append varLookupFileNames' activity inside the 'ForEach LookupFileRow' activity:
The 'IF Condition' inside the 'ForEach' activity:
Expression: @contains(variables('varLookupFileNames'), item().name)
Otherwise, this is similar to the previous example pipeline.
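The lookup-file variant can be sketched the same way. The column name `name` and the file contents below are assumptions, since the actual lookup.csv layout isn't shown here:

```python
import csv
import io

# Hypothetical lookup.csv content (column name 'name' is an assumption).
lookup_csv = "name\nProja.csv\nProjb.csv\nProjc.csv\nProjd.csv\n"

# 'ForEach LookupFileRow' + 'Append varLookupFileNames' build this list:
lookup_names = [row["name"] for row in csv.DictReader(io.StringIO(lookup_csv))]

bronze_files = ["Proja.csv", "Projb.csv", "Projc.csv",
                "Projd.csv", "Proje.csv"]

# 'IF Condition': @contains(variables('varLookupFileNames'), item().name)
# The Copy activity runs when the expression is False:
to_process = [f for f in bronze_files if f not in lookup_names]
print(to_process)  # ['Proje.csv']
```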
I don't know if Fabric Data Pipeline has any size limits (such as output size, number of items in a collection, number of items in a ForEach activity, or result size in a Lookup activity) which need to be taken into consideration; exceeding them could result in pipeline failure or unexpected results if the number of files in any of the folders grows above the limits.
For example, the Lookup activity has some limitations.
Hi @DebbieE,
I think you need a template list or query result to compare against the current items; otherwise you can't determine which items don't exist and use that to filter.
Regards,
Xiaoxin Sheng
I would need some specific information to work with here on how I would go about that. This is all in a Fabric pipeline.
Hi @DebbieE,
Here is the documentation link about using a dataflow in a data pipeline; you can use the M query editor to operate on query table records:
Use a dataflow in a pipeline - Microsoft Fabric | Microsoft Learn
Regards,
Xiaoxin Sheng