The table below lists output defects for several LLMs analyzing daily meeting transcripts, by prompt version and input.

| Prompt version | Input | GP | **G35** | G4 | CS | CO | Defects: +1 each [super outputs, in brackets: -1 each] |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 1 Short prompt | | 3 | 7 | 2 | 7 | 7 | |
| | #1 | 1 | 5 | 1 | 4 | 3 | GP: Advice not given for all low ratings.<br>G35: “After that, could you also share the criteria”. No names. Non-zero rating for the 1st criterion. Too high ratings. No actions.<br>G4: Too high ratings.<br>CS: Requests transcript and criteria at once. Non-zero rating for the 1st criterion. Names are only in 1 explanation. Advice is not related to the criteria.<br>CO: Requests transcript and criteria at once. Names are only in 1 explanation. Advice not given for all low ratings. |
| | #2 | 0 | 2 | 0 | 2 | 2 | GP: Advice is not structured by the criteria. [However, advice is very useful, with examples]<br>G35: No actions. Weird phrase at the end.<br>G4: -<br>CS: Names are only in 1 explanation. Did not respond in English.<br>CO: No list for advice. No actions. |
| | #3 | 1 | 0 | 1 | 1 | 2 | GP: Advice is not structured by the criteria.<br>G35: - (all ratings are high)<br>G4: Extra advice for high ratings.<br>CS: Did not respond in English.<br>CO: No advice. No actions. |
| 2 Unstructured long prompt | | **GP 6** | **G35 15** | **G4 3** | **CS 7** | **CO 3** | |
| | #1 | 2 | 6* | 1 | 3 | 2 | GP: Questions instead of advice. Not for all low ratings.<br>G35: Non-zero rating for the 1st criterion. Too high ratings. Names are only in 1 explanation. Extra advice for high ratings. No actions.<br>G4: Advice not given for all low ratings.<br>CS: Non-zero rating for the 1st criterion. Names are only in 1 explanation. No actions.<br>CO: Non-zero rating for the 1st criterion. Advice not given for all low ratings. |
| | #2 | 2 | 4* | 1 | 2 | 1 | GP: Questions instead of advice. Not for all low ratings.<br>G35: Names are only in 1 explanation. Extra advice for high ratings. No actions.<br>G4: No actions.<br>CS: No actions. Did not respond in English.<br>CO: Advice not given for all low ratings. |
| | #3 | 2 | 5* | 1 | 2 | 0 | GP: Questions instead of advice. Not for all low ratings.<br>G35: Summary of “who said what” instead of explanations for each criterion. !No ratings. No advice.<br>G4: No actions.<br>CS: No actions. Did not respond in English.<br>CO: - |
| 3 Structured long prompt | | **GP 3** | **G35 13** | **G4 4** | **CS 9** | **CO 2** | |
| | #1 | 1 | 5* | 2 | 4 | 1 | GP: No names. Advice not given for all low ratings. [Very actionable advice]<br>G35: Stopped before printing advice. No names. Non-zero rating for the 1st criterion. Advice not given for all low ratings.<br>G4: No names. Too high ratings.<br>CS: “The total duration is not provided”. Advice not given for all low ratings. No actions. Weird phrase at the end.<br>CO: Non-zero rating for the 1st criterion. |
| | #2 | 2 | 4* | 1 | 2 | 0 | GP: Too low ratings. No names. Advice not given for all low ratings. [Very actionable advice]<br>G35: Stopped before printing advice. Too low ratings. No names.<br>G4: Extra advice for 1 high rating.<br>CS: Weird phrase at the end. Did not respond in English.<br>CO: Too low ratings. [Great explanations with many examples] |
| | #3 | 0 | 4* | 1 | 3 | 1 | GP: -<br>G35: Stopped before printing advice. General advice, not related to the criteria. Mentioning names is not related to the explanations.<br>G4: Summary of “who said what” instead of explanations for each criterion.<br>CS: Advice is too short. Weird phrase at the end. Did not respond in English.<br>CO: Extra advice for high ratings. |
| 4 Step by step long prompt | | **GP 9** | **G35 14** | **G4 2** | **CS 12** | **CO 1** | |
| | #1 | 3* | 6* | 0 | 4* | 0 | GP: [Specific discussions that should have been avoided] No names. !No advice.<br>G35: All steps are described in the first message. Criteria are not requested before analysis. Too high ratings. No names. No advice for 1 low rating.<br>G4: -<br>CS: !No advice. “The meeting duration is not provided”.<br>CO: [Very actionable advice] |
| | #2 | 3* | 4* | 1 | 4* | 0 | GP: !No advice.<br>G35: Stopped before printing advice. No names. Extra advice for 1 high rating.<br>G4: Too low ratings.<br>CS: !No advice. Did not respond in English.<br>CO: - |
| | #3 | 3* | 4* | 1 | 4* | 1 | GP: !No advice.<br>G35: Output format is violated. Total ratings for criteria are unknown. [Useful info for each participant] Advice is not for the SM. No actions.<br>G4: Not enough actions are recommended.<br>CS: !No advice. Did not respond in English.<br>CO: Advice for low rating (8) is absent. |
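
The scoring rule in the table header (+1 for each defect, -1 for each super output) can be sketched in a few lines of Python. This is only an illustration of the stated rule: the function names `score` and `prompt_total` are hypothetical, and the sketch does not model the asterisked scores or the `!` marker, whose weighting the table does not explain.

```python
def score(defects, super_outputs):
    """Net score for one model on one input: +1 per defect, -1 per super output."""
    return len(defects) - len(super_outputs)


def prompt_total(per_input_scores):
    """Total for one model under one prompt version: sum over its inputs."""
    return sum(per_input_scores)


# Prompt 1, input #2, GP: one defect offset by one super output.
gp_p1_i2 = score(
    defects=["Advice is not structured by the criteria"],
    super_outputs=["Advice is very useful, with examples"],
)

# Prompt 3, input #2, CO: one defect offset by one super output.
co_p3_i2 = score(
    defects=["Too low ratings"],
    super_outputs=["Great explanations with many examples"],
)

# Prompt 2, G4: the per-input scores listed in the table are 1, 1, 1.
g4_p2_total = prompt_total([1, 1, 1])

print(gp_p1_i2, co_p3_i2, g4_p2_total)
```

Under this reading, a lower total means fewer net defects, so totals are only comparable between models within the same prompt version and input set.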