The table below lists defects in the outputs of several LLMs that analyzed daily meeting transcripts, grouped by prompt version and input.

| Prompt version | Input | GP | **G35** | G4 | CS | CO | Defects: +1 for each one<br>[Super outputs: -1 for each one] |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 1 Short prompt | | 3 | 7 | 2 | 7 | 7 | |
| | #1 | 1 | 5 | 1 | 4 | 3 | GP: Advice for not all low ratings.<br>G35: “After that, could you also share the criteria”. No names. Non-zero rating for the 1st criterion. Too high ratings. No actions<br>G4: Too high ratings.<br>CS: Requests for transcript and criteria at once. Non-zero rating for the 1st criterion. Names are only in 1 explanation. Advice is not related to the criteria<br>CO: Requests for transcript and criteria at once. Names are only in 1 explanation. Advice for not all low ratings. |
| | #2 | 0 | 2 | 0 | 2 | 2 | GP: Advice is not structured by the criteria. [However, advice is very useful, with examples]<br>G35: No actions. Weird phrase at the end.<br>G4: -<br>CS: Names are only in 1 explanation. Responded not in English.<br>CO: No list for advice. No actions |
| | #3 | 1 | 0 | 1 | 1 | 2 | GP: Advice is not structured by the criteria.<br>G35: - (all ratings are high)<br>G4: Extra advice for high ratings.<br>CS: Responded not in English.<br>CO: No advice. No actions |
| 2 Unstructured long prompt | | **GP 6** | **G35 15** | **G4 3** | **CS 7** | **CO 3** | |
| | #1 | 2 | 6* | 1 | 3 | 2 | GP: Questions instead of advice. Not for all low ratings.<br>G35: Non-zero rating for the 1st criterion. Too high ratings. Names are only in 1 explanation. Extra advice for high ratings. No actions<br>G4: Advice for not all low ratings.<br>CS: Non-zero rating for the 1st criterion. Names are only in 1 explanation. No actions<br>CO: Non-zero rating for the 1st criterion. Advice for not all low ratings. |
| | #2 | 2 | 4* | 1 | 2 | 1 | GP: Questions instead of advice. Not for all low ratings.<br>G35: Names are only in 1 explanation. Extra advice for high ratings. No actions.<br>G4: No actions.<br>CS: No actions. Responded not in English.<br>CO: Advice for not all low ratings |
| | #3 | 2 | 5* | 1 | 2 | 0 | GP: Questions instead of advice. Not for all low ratings.<br>G35: Summary “who said what”, instead of explanations for each criterion. !No ratings. No advice<br>G4: No actions.<br>CS: No actions. Responded not in English.<br>CO: - |
| 3 Structured long prompt | | **GP 3** | **G35 13** | **G4 4** | **CS 9** | **CO 2** | |
| | #1 | 1 | 5* | 2 | 4 | 1 | GP: No names. Advice for not all low ratings. [Very actionable advice]<br>G35: Stopped before printing advice. No names. Non-zero rating for the 1st criterion. Advice for not all low ratings.<br>G4: No names. Too high ratings.<br>CS: “The total duration is not provided”. Advice for not all low ratings. No actions. Weird phrase at the end.<br>CO: Non-zero rating for the 1st criterion. |
| | #2 | 2 | 4* | 1 | 2 | 0 | GP: Too low ratings. No names. Advice for not all low ratings. [Very actionable advice]<br>G35: Stopped before printing advice. Too low ratings. No names.<br>G4: Extra advice for 1 high rating.<br>CS: Weird phrase at the end. Responded not in English.<br>CO: Too low ratings. [Great explanations with many examples] |
| | #3 | 0 | 4* | 1 | 3 | 1 | GP: -<br>G35: Stopped before printing advice. General advice, not related to the criteria. Mentioning names is not related to the explanations.<br>G4: Summary “who said what”, instead of explanations for each criterion<br>CS: Too short advice. Weird phrase at the end. Responded not in English.<br>CO: Extra advice for high ratings |
| 4 Step by step long prompt | | **GP 9** | **G35 14** | **G4 2** | **CS 12** | **CO 1** | |
| | #1 | 3* | 6* | 0 | 4* | 0 | GP: [Specific discussions that should have been avoided] No names. !No advice.<br>G35: All steps are described in the first message. Criteria are not requested before analysis. Too high ratings. No names. No advice for 1 low rating<br>G4: -<br>CS: !No advice. “The meeting duration is not provided”<br>CO: [Very actionable advice] |
| | #2 | 3* | 4* | 1 | 4* | 0 | GP: !No advice<br>G35: Stopped before printing advice. No names. Extra advice for 1 high rating<br>G4: Too low ratings<br>CS: !No advice. Responded not in English<br>CO: - |
| | #3 | 3* | 4* | 1 | 4* | 1 | GP: !No advice<br>G35: Output format is violated. Total ratings for criteria are unknown. [Useful info for each participant] Advice is not for the SM. No actions.<br>G4: Not enough actions are recommended.<br>CS: !No advice. Responded not in English<br>CO: Advice for low rating (8) is absent |
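
The per-prompt totals in the table appear to be the column sums of the three input rows (#1 to #3). The sketch below cross-checks that reading. It is only an illustration: the `SCORES` layout and the helper names are hypothetical, the numbers are transcribed by hand from the table, and the `*` markers are dropped because the table does not explain them.

```python
# Cross-check: does each stated per-prompt total equal the sum of its input rows?
# Assumption: "*" markers are ignored and starred scores count as plain integers.
MODELS = ["GP", "G35", "G4", "CS", "CO"]

# Hypothetical layout: prompt name -> (stated totals, one row of scores per input).
SCORES = {
    "1 Short prompt": (
        [3, 7, 2, 7, 7],
        [[1, 5, 1, 4, 3], [0, 2, 0, 2, 2], [1, 0, 1, 1, 2]],
    ),
    "2 Unstructured long prompt": (
        [6, 15, 3, 7, 3],
        [[2, 6, 1, 3, 2], [2, 4, 1, 2, 1], [2, 5, 1, 2, 0]],
    ),
    "3 Structured long prompt": (
        [3, 13, 4, 9, 2],
        [[1, 5, 2, 4, 1], [2, 4, 1, 2, 0], [0, 4, 1, 3, 1]],
    ),
    "4 Step by step long prompt": (
        [9, 14, 2, 12, 1],
        [[3, 6, 0, 4, 0], [3, 4, 1, 4, 0], [3, 4, 1, 4, 1]],
    ),
}


def column_sums(rows):
    """Sum the scores of each model (column) over all input rows."""
    return [sum(column) for column in zip(*rows)]


def find_mismatches(scores):
    """Return (prompt, model, stated_total, computed_sum) where the two differ."""
    mismatches = []
    for prompt, (stated, rows) in scores.items():
        for model, total, computed in zip(MODELS, stated, column_sums(rows)):
            if total != computed:
                mismatches.append((prompt, model, total, computed))
    return mismatches


if __name__ == "__main__":
    for prompt, model, total, computed in find_mismatches(SCORES):
        print(f"{prompt} / {model}: stated {total}, rows sum to {computed}")
```

With the numbers as transcribed, every total matches its rows except one: for prompt 1, the GP rows sum to 2 while the stated total is 3. The table alone does not show whether the total or one of the row scores is off.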