Comment by raincole
> Another issue is that some of the words are segmented very unnaturally
I immediately noticed that too. Are the "gaps" generated by an LLM? I think the model might not understand Japanese very well.
The point is not that you can't cut みません into み and ません. The point is that it should be one single gap in the first place.
It's like cutting gaps out of an English sentence like this: I'm [go][ing] to beat the shit out of that guy. Sure, we know the logical way to break down 'going' is 'go' and '-ing', but it should be one single gap anyway.
+1, this definitely makes sense. Since you're gonna have a million verbs ending in "masen", just make it a separate word and understand that it's just part of the conjugation.
It's a bit like segmenting "don't see" into "don't" and "see." ません is the negative of the auxiliary ます just as "don't" is the negative of the auxiliary "do." If you have to split Japanese text into words and want to be principled about it, treating ません as a separate word is not a bad way to go about it.
But of course there are other ways, so a "fill in the blank" question with two gaps right next to each other is generally a bad idea.
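For what it's worth, the "no two gaps next to each other" rule is easy to enforce after segmentation, whichever tokenizer is used. A rough sketch, assuming the quiz is stored as a list of (text, is_gap) pairs; that representation and the `merge_adjacent_gaps` name are made up for illustration, not how the app in question actually works:

```python
def merge_adjacent_gaps(tokens):
    """Collapse runs of consecutive gap tokens into a single gap.

    tokens: list of (text, is_gap) pairs in sentence order (hypothetical format).
    """
    merged = []
    for text, is_gap in tokens:
        if is_gap and merged and merged[-1][1]:
            # Previous token is also a gap: extend it instead of adding a new one.
            prev_text, _ = merged[-1]
            merged[-1] = (prev_text + text, True)
        else:
            merged.append((text, is_gap))
    return merged


# "I'm [go][ing] to the store." becomes "I'm [going] to the store."
english = [("I'm ", False), ("go", True), ("ing", True), (" to the store.", False)]
print(merge_adjacent_gaps(english))

# The same rule turns [み][ません] into one gap, [みません].
print(merge_adjacent_gaps([("み", True), ("ません", True)]))
```

This sidesteps the question of which segmentation is "right": the tokenizer can split ません off or not, and the learner still sees one blank.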