DevinL
Helper I

How do I flatten columns into rows in the Copy Data activity?

I have a data pull that creates a new column for each row of data as follows:

[Screenshot: Copy Data column mapping, with one destination column per array element (data.results.0.amount, data.results.1.amount, ...)]

 

This is creating a column for each row, but what I need is a row for each numbered array element. Essentially, I need to remove the data.results.#. prefix and get one row per numbered element.

 

I tried changing the destination on the right from "data.results.0.amount" to "amount", "data.results.1.amount" to "amount", and so on, but then it throws an error saying that there are duplicate columns, so that's not the way to go. The closest thing I could find online was a "Flatten activity" with an "unroll" option, which I don't see available in Microsoft Fabric.


Any ideas on how to do this?

 

1 ACCEPTED SOLUTION

(The accepted reply is shown in full in the thread below.)

10 REPLIES
frithjof_v
Memorable Member

Have you tried using Dataflow Gen2 instead?

 

(Or notebook)

 

 

I don't have much experience with Azure Data Factory, but from what I read some time back, Dataflow Gen2 in Fabric is supposed to cover the role of Mapping Data Flows.

Yes, I've tried using Dataflow Gen2, but I can't seem to find a way to get these numbered array columns into rows. In Dataflow Gen2 I can import the JSON file and it will expand all of the individual arrays, but I don't see a way to put the data into rows instead of one row with many columns.

I tried to repost this in the Synapse forum, because I didn't mean to post in the Data Factory forum, but then a very "helpful" person merged the topics and stuck it back into the Data Factory forum. XD

[Screenshot: the expanded JSON in Dataflow Gen2, one row with a very large number of columns]

 



Perhaps, under the Transform pane, you could utilize the Transpose or Pivot/Unpivot functions.
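For illustration, here is a minimal sketch of the unpivot idea in M. The table and its column names are dummy stand-ins for the real source:

let
  // Dummy one-row table in the shape the copy activity produces
  Source = #table(
    {"data.results.0.amount", "data.results.1.amount"},
    {{"10", "20"}}
  ),
  // Unpivot every column into (Attribute, Value) pairs
  Unpivoted = Table.UnpivotOtherColumns(Source, {}, "Attribute", "Value")
in
  Unpivoted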

I am curious why the data source is creating a data structure like this.

 

Is it a JSON data source?

 

It seems like, instead of creating each record as a new JSON object, it creates a new duplicate set of properties for every record.

 

Is there a limit to how many columns this data source will create? Will it continue to increase the number of columns as time goes on?

 

I think I would try to transpose the columns into rows; you would then get two columns with many rows. The first column would contain the original column names, and the second column would contain the values.

(Before you do the transpose, you would need to use the "Use headers as first row" option).

 

Then I would split the content of the first column on the "." delimiter, which gives four columns:

- "data"

- "results"

- the number

- the attribute name

 

And you will also still have the column which contains the value.

 

Then I would remove the two columns which only contain the strings "data" and "results" in each cell.

 

I would do a pivot on the column which contains the attribute names.

 

I think that could work.

 

Something like this (the code contains some dummy data I entered; you can paste the entire code into the Advanced editor to see what I mean):

 

let
  Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45WMlXSUTI0MACSxkBsDsSmII6hUmwsAA==", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [data.results.0.account_id = _t, data.results.0.amount = _t, data.results.0.invoice_id = _t, data.results.1.account_id = _t, data.results.1.amount = _t, data.results.1.invoice_id = _t]),
  #"Changed column type" = Table.TransformColumnTypes(Source, {{"data.results.0.account_id", Int64.Type}, {"data.results.0.amount", Int64.Type}, {"data.results.0.invoice_id", Int64.Type}, {"data.results.1.account_id", Int64.Type}, {"data.results.1.amount", Int64.Type}, {"data.results.1.invoice_id", Int64.Type}}),
  #"Demoted headers" = Table.DemoteHeaders(#"Changed column type"),
  #"Transposed table" = Table.Transpose(#"Demoted headers"),
  #"Split column by delimiter" = Table.SplitColumn(Table.TransformColumnTypes(#"Transposed table", {{"Column1", type text}}), "Column1", Splitter.SplitTextByDelimiter("."), {"Column1.1", "Column1.2", "Column1.3", "Column1.4"}),
  #"Changed column type 1" = Table.TransformColumnTypes(#"Split column by delimiter", {{"Column1.1", type text}, {"Column1.2", type text}, {"Column1.3", Int64.Type}, {"Column1.4", type text}}),
  #"Removed columns" = Table.RemoveColumns(#"Changed column type 1", {"Column1.1", "Column1.2"}),
  #"Renamed columns" = Table.RenameColumns(#"Removed columns", {{"Column1.3", "RowID"}}),
  #"Pivoted column" = Table.Pivot(Table.TransformColumnTypes(#"Renamed columns", {{"Column1.4", type text}}), List.Distinct(Table.TransformColumnTypes(#"Renamed columns", {{"Column1.4", type text}})[Column1.4]), "Column1.4", "Column2"),
  #"Changed column type 2" = Table.TransformColumnTypes(#"Pivoted column", {{"RowID", Int64.Type}, {"account_id", Int64.Type}, {"amount", Int64.Type}, {"invoice_id", Int64.Type}})
in
  #"Changed column type 2"

 

 

However, I see now that your source data seems to have a very high number of columns (aka attributes or properties): 1120. Will this number grow?

I'm not sure how well it will work in Dataflow Gen2 with that many columns, or whether there is a limit, but I would give it a try.

 

Perhaps it is possible to manipulate the JSON file directly so that it gets structured in a format which is easier to use. I think each data.results.x entry should be a separate object in a JSON array; that is the usual way to structure JSON. Then each object would arrive as a new row inside Dataflow Gen2 (or a data pipeline), instead of as many columns.

I am thinking that manipulation of the JSON could perhaps be done in a notebook, using Python code. Or maybe the source system which produces the JSON could structure it differently, so you won't get this issue in the first place.

(Or perhaps that manipulation of the JSON file is even possible in a data pipeline or Dataflow Gen2, but I don't know how, and I haven't seen anything like that.)
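For illustration, a minimal M sketch of how that "usual format" could be consumed. The file path is hypothetical, and it assumes a structure like {"data": {"results": [{...}, {...}]}}:

let
  // Hypothetical file path; replace with your actual source location
  Source = Json.Document(File.Contents("C:\data\results.json")),
  // Drill into data.results, assumed here to be a JSON array of objects
  Results = Source[data][results],
  // Each object in the array becomes one row
  AsRows = Table.FromRecords(Results)
in
  AsRows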

That did it!

The code you gave was a perfect example. I'm going to take this and find a way to make it work for any size of table. Not sure how, but that's what's next.

The hope is that when the API pulls the data and writes it to a JSON file in the data lake, this Dataflow Gen2 can take that JSON (no matter how many rows) and do the same conversion, then put it into a data lake table.
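One possible way to generalize the accepted code so that no column names are hardcoded — a sketch, assuming every source column follows the data.results.<n>.<attribute> pattern, with dummy data standing in for the real source:

let
  // Dummy one-row table in the shape the copy activity produces;
  // replace with your actual source step
  Source = #table(
    {"data.results.0.amount", "data.results.0.invoice_id",
     "data.results.1.amount", "data.results.1.invoice_id"},
    {{"10", "A", "20", "B"}}
  ),
  // Turn the single wide row into (name, value) pairs
  Demoted = Table.DemoteHeaders(Source),
  Transposed = Table.Transpose(Demoted),
  // Split "data.results.<n>.<attribute>" into its four dot-separated parts
  Split = Table.SplitColumn(
    Table.TransformColumnTypes(Transposed, {{"Column1", type text}}),
    "Column1", Splitter.SplitTextByDelimiter("."),
    {"Prefix1", "Prefix2", "RowID", "Attribute"}
  ),
  Removed = Table.RemoveColumns(Split, {"Prefix1", "Prefix2"}),
  // List.Distinct makes the pivot work for any set of attribute names
  Pivoted = Table.Pivot(Removed, List.Distinct(Removed[Attribute]), "Attribute", "Column2")
in
  Pivoted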

 

Thanks for all your help!

Thanks for the responses!

The data shouldn't normally have that many rows, because the process would run daily and probably have fewer than 20 entries in a day. That being said, I'm not sure I could make this work if there were over 90 entries, because it stops creating columns at a certain point.

 

There was an option in the transform pane to "Extract" text before/after a delimiter, and you can also set it to ignore x number of delimiters. I did an extract with a delimiter of "." ignoring the first two, and was left with the column names I needed.
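In M, that same extraction can be written with Text.AfterDelimiter — a sketch using an example column name; the third argument is the zero-based index of the "." occurrence to extract after:

let
  // Example column name from the flattened output
  Name = "data.results.0.amount",
  // Index 2 skips the first two "." and extracts after the third
  Extracted = Text.AfterDelimiter(Name, ".", 2)
in
  Extracted  // returns "amount"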

The trouble now is transposing/pivoting to get the correct results in the correct columns.

It's getting a lot closer!


v-yilong-msft
Community Support

Hi @DevinL ,

I don't think you should have duplicate column names in the first place. Instead, you can follow the steps below.

1. If you are using Azure Data Factory or Synapse Analytics, you can use a mapping data flow to perform the flattening. The Flatten transformation allows you to expand an array into separate rows.

2. Select the array to unroll by; this creates a new row for each element in the array. If your array is nested within another array or object, you can also choose an unroll root.

3. After flattening, map the fields from the original structure to the new flattened structure.

[Screenshot: Flatten transformation settings in a mapping data flow]

Here is the official documentation: Flatten transformation in mapping data flow - Azure Data Factory & Azure Synapse | Microsoft Learn

 

 

 

Best Regards

Yilong Zhou

If this post helps, then please consider accepting it as the solution to help the other members find it more quickly.

Thanks for the reply, but this solution doesn't work in Synapse Data Engineering.
