How to draw a data manipulation pipeline

0 votes
asked Mar 22 in Question / help by viebel (120 points)

I would like to draw a data manipulation pipeline made of 2 steps:

  1. group by isbn
  2. aggregate by author_name
I tried to use nested rectangles and it almost works. The problem is that the rectangles are not aligned.
Here is my plantuml source code:

@startuml
skinparam componentStyle rectangle

[group by **isbn**] as groupby

rectangle "Input" as input {
  component book1 [
  **title** 7 Habits of Highly Effective People
  **isbn** 978-1982137274
  **author_name** Sean Covey
  ]
  component book2 [
  **title** 7 Habits of Highly Effective People
  **isbn** 978-1982137274
  **author_name** Stephen Covey
  ]

  component book3 [
  **title** The Power of Habit
  **isbn** 978-0812981605
  **author_name** Charles Duhigg
  ]
}


rectangle "Intermediate" as intermediate {
  rectangle "978-1982137274" as 7habits {
    component book11 [
    **title** 7 Habits of Highly Effective People
    **isbn** 978-1982137274
    **author_name** Sean Covey
    ]

    component book12 [
    **title** 7 Habits of Highly Effective People
    **isbn** 978-1982137274
    **author_name** Stephen Covey
    ]
    book11 -[hidden]d- book12
  }

  rectangle "978-0812981605" as power {
    component book13 [
    **title** The Power of Habit
    **isbn** 978-0812981605
    **author_name** Charles Duhigg
    ]
  }
}

[aggregate by **author_name**] as agg

rectangle "Output" as output {
  component book111 [
  **title** 7 Habits of Highly Effective People
  **isbn** 978-1982137274
  **authorNames** [Sean Covey, Stephen Covey]
  ]
  component book121 [
  **title** The Power of Habit
  **isbn** 978-0812981605
  **authorNames** [Charles Duhigg]
  ]
}

input -d-> groupby
groupby -d-> intermediate
intermediate -d-> agg
agg -d-> output

@enduml

An image of the result can be seen at this URL.

Please advise.

commented Mar 22 by Martin (3,880 points)
edited Mar 23 by Martin

Here is one idea using sub-diagrams:

commented Mar 22 by viebel (120 points)
Awesome!

Could you tell me what exactly a sub-diagram is, and why it works better than a rectangle for my use case?
commented Mar 23 by Martin (3,880 points)
edited Mar 23 by Martin

For some reason Containers (shapes that contain other shapes) prefer to connect at the corners instead of the centre of a side like a normal shape would.  I have no idea why.
A sub-diagram is a picture composed from an embedded PlantUML definition, and so it can be placed inside a shape without turning that shape into a container - it remains a regular shape that just happens to display a picture of another diagram.
To write a sub-diagram, enclose its text in double curly braces, with the opening and closing braces each on their own line:

{{
subdiagram text
}}

The disadvantage is that none of the elements in the sub-diagram can be referenced in the main diagram or vice versa - it has to be self-contained and independent.  But luckily you didn't want to draw arrows to or from any of the inner components.
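
As a minimal sketch of the idea (not the original example from this thread), the diagram below places a sub-diagram in the bracketed description of a component. The aliases `outer`, `inner1`, `inner2`, `before` and `after` are made up for illustration. Repeating the `skinparam` line inside the braces is an assumption: the sub-diagram is treated here as a fully separate diagram that may not inherit settings from the main one.

```plantuml
@startuml
skinparam componentStyle rectangle

' "outer" is an ordinary component, not a container.
' Its description happens to hold a picture of another diagram.
component outer [
**Outer**
{{
skinparam componentStyle rectangle
[inner1]
[inner2]
inner1 -[hidden]d- inner2
}}
]

' Arrows can attach to "outer" like any regular shape,
' but not to inner1 or inner2, which exist only inside the sub-diagram.
[before] -d-> outer
outer -d-> [after]
@enduml
```

Since `outer` is no longer a container, the arrows should attach to the middle of its sides rather than to its corners.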

commented Mar 23 by viebel (120 points)
Thank you for the clarification.

1 Answer

0 votes
answered Mar 24 by Martin (3,880 points)
selected Mar 25 by viebel
 
Best answer
Nested shapes are hard to align vertically.  However, when there's no interaction between the inside of the nested shape and the rest of the diagram, then a good solution is to make the contents of the shape an independent self-contained sub-diagram.  This means the shape is no longer nested and behaves like a standard shape, yet still looks like it is nested.  See my comment above for a clickable example.
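
For the question's diagram, the "Intermediate" step might be rewritten along these lines. This is an untested sketch, not the clickable example mentioned above. It assumes the sub-diagram is accepted inside a bracketed component description. The inner book cards use quoted one-line descriptions with `\n` so that no inner `]` can be mistaken for the end of the outer description, and the alias `habits` replaces `7habits` purely as a precaution.

```plantuml
@startuml
skinparam componentStyle rectangle

[group by **isbn**] as groupby
[aggregate by **author_name**] as agg

' A regular component whose description embeds an independent sub-diagram
component intermediate [
**Intermediate**
{{
skinparam componentStyle rectangle
rectangle "978-1982137274" as habits {
  component "**title** 7 Habits of Highly Effective People\n**isbn** 978-1982137274\n**author_name** Sean Covey" as book11
  component "**title** 7 Habits of Highly Effective People\n**isbn** 978-1982137274\n**author_name** Stephen Covey" as book12
  book11 -[hidden]d- book12
}
rectangle "978-0812981605" as power {
  component "**title** The Power of Habit\n**isbn** 978-0812981605\n**author_name** Charles Duhigg" as book13
}
}}
]

groupby -d-> intermediate
intermediate -d-> agg
@enduml
```

The "Input" and "Output" rectangles from the question could be converted the same way, since no arrows point at the book cards inside them.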
...