Linguist


I wrote this extension to give users an easy way to create a list of new candidate words for the existing spellcheck dictionary for their language. Just run the command "List Non-recognized Words" and you will get a list of all the words in your document that are not recognized during spellchecking. After quality-assuring this list (removing misspellings etc.), simply send the document to the people who maintain the dictionary. You can also use it if you want to make a personal wordbook.
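
A minimal sketch of the idea behind "List Non-recognized Words", in plain Python and without the OOo API. The function name `unrecognized_words` and the `is_valid` callback are hypothetical; in a real extension the check would presumably be delegated to the office spellchecker. This is an illustration, not the extension's actual code.

```python
import re

def unrecognized_words(text, is_valid):
    """Return the words in `text` that `is_valid` rejects, in first-seen order.

    `is_valid` is a hypothetical stand-in for a real spellchecker call.
    """
    seen = set()
    result = []
    # Letters only, allowing inner apostrophes and hyphens ("don't", "well-known").
    for word in re.findall(r"[^\W\d_]+(?:['-][^\W\d_]+)*", text):
        if word in seen:
            continue  # report each word only once
        seen.add(word)
        if not is_valid(word):
            result.append(word)
    return result

# Toy dictionary standing in for the installed spellcheck dictionary.
known = {"the", "cat", "sat", "on", "mat"}
print(unrecognized_words("The cat sat on the matt", lambda w: w.lower() in known))
```

The resulting list still needs a manual review, since genuine misspellings and valid but unknown words are reported alike.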

There is also a command that makes a complete list of all the words in the document, sorted alphabetically (no duplicates).
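
The complete word list can be sketched in a few lines of plain Python. The name `unique_words_sorted` is hypothetical, and treating words that differ only in case as distinct entries is an assumption; the extension may handle this differently.

```python
import re

def unique_words_sorted(text):
    """Return every distinct word in `text`, sorted alphabetically."""
    # A set drops duplicates; words differing only in case are kept apart (assumption).
    words = set(re.findall(r"[^\W\d_]+", text))
    # Sort case-insensitively, using the original spelling as a tie-breaker.
    return sorted(words, key=lambda w: (w.casefold(), w))

print(unique_words_sorted("b a B a c"))
```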

For several reasons I chose to write this extension in Python, which was probably not the easiest choice since Python support in OOo is not very well documented yet. You are welcome to study my code as an example of a simple Python extension.

Operating System: System Independent
Compatible with: OpenOffice.org 2.4 | StarOffice 8 Update 9 or higher.
Official release: 1.2.2
Date: 2008-Jul-08
Size: 14.36 KB
License: opensource

Comments

WrongWordsList macro

this extension works in a similar fashion to this macro.
http://user.services.openoffice.org/en/forum/viewtopic.php?f=20&t=1222&s...

did you use that macro for inspiration?

Sub WrongWordsList

    Dim oDocModel as Variant
    Dim oTextCursor as Variant
    Dim oLinguSvcMgr as Variant
    Dim oSpellChk as Variant
    Dim oListDocFrame as Variant
    Dim oListDocModel as Variant
    Dim sListaPalabras as String
    Dim aProp() As New com.sun.star.beans.PropertyValue

    ' Get the active document and make sure it is a text document
    oDocModel = StarDesktop.CurrentFrame.Controller.getModel()
    If IsNull(oDocModel) Then
        MsgBox("There's no active document." + Chr(13))
        Exit Sub
    End If

    If Not HasUnoInterfaces (oDocModel, "com.sun.star.text.XTextDocument") Then
        MsgBox("This document doesn't support the 'XTextDocument' interface." + Chr(13))
        Exit Sub
    End If

    oTextCursor = oDocModel.Text.createTextCursor()
    oTextCursor.gotoStart(False)

    ' Get the spellchecker from the linguistic service manager
    oLinguSvcMgr = createUnoService("com.sun.star.linguistic2.LinguServiceManager")
    If Not IsNull(oLinguSvcMgr) Then
        oSpellChk = oLinguSvcMgr.getSpellChecker()
    End If
    If IsNull (oSpellChk) Then
        MsgBox("It's not possible to access the spellchecker." + Chr(13))
        Exit Sub
    End If

    ' Walk through the document word by word
    Do
        If oTextCursor.isStartOfWord() Then
            oTextCursor.gotoEndOfWord(True)
            ' Check whether the word is spelled correctly
            If Not isEmpty (oTextCursor.getPropertyValue("CharLocale")) Then
                If Not oSpellChk.isValid(oTextCursor.getString(), oTextCursor.getPropertyValue("CharLocale"), aProp()) Then
                    sListaPalabras = sListaPalabras + oTextCursor.getString() + Chr(13)
                End If
            End If
            oTextCursor.collapseToEnd()
        End If
    Loop While oTextCursor.gotoNextWord(False)

    If Len(sListaPalabras) = 0 Then
        MsgBox("There are no errors in the document.")
        Exit Sub
    End If

    ' Reuse the list document if it is already open, otherwise create it
    oListDocFrame = StarDesktop.findFrame("fListarPalabrasIncorrectas", com.sun.star.frame.FrameSearchFlag.ALL)
    If IsNull(oListDocFrame) Then
        oListDocModel = StarDesktop.loadComponentFromURL("private:factory/swriter", "fListarPalabrasIncorrectas", com.sun.star.frame.FrameSearchFlag.CREATE, aProp())
        oListDocFrame = oListDocModel.CurrentController.getFrame()
    Else
        oListDocModel = oListDocFrame.Controller.getModel()
    End If

    ' Append the word list at the end of the list document
    oTextCursor = oListDocModel.Text.createTextCursor()
    oTextCursor.gotoEnd(False)

    oListDocModel.Text.insertString (oTextCursor, sListaPalabras, False)

    oListDocFrame.activate()

End Sub

WrongWordsList

No, I haven't seen that macro before. But as far as I can tell from the code, it basically does the same thing. It is easier, though, to install an extension than to install a piece of macro code. :-)

Finn

GUI language vs. document language.

that macro however does a better job since it retrieves just misspelled words.

your extension has problems because it checks the GUI language, not the document language.

i write in Italian and Linguist selects almost every word.

i'll keep using the macro. you should fix that feature.

GUI language vs. document language.

I just installed an Italian dictionary and tried to run Linguist on an Italian text after setting the default document language to Italian. In a text with 145 words it found just 4 unknown words.

Just set the default document language correctly in the Tools > Options > Languages menu, then it will work.

Btw: One of the four unknown words was 'Berlusconi' :-)

OOo 3.0 beta: ‘List Unrecognized Words’ lists all words

I've installed the latest release of this extension and have run it on several documents with Russian as the language of the Default style. In all instances, the result is the same: ‘List Unrecognized Words’ lists all words, not only misspelled ones. Is this because my GUI language is English (as Russian is not available so far)?

Please check the new version...

Finn Gruwier Larsen

OOo 3.0 beta: ‘List Unrecognized Words’ lists all words

Thanks for your comment. I am sorry to say that the current version still checks the GUI language, not the document language. I plan to make a new version that checks the document language instead. I will see if I can get it done soon.

Finn Gruwier Larsen

Nice tool!

I find this tool really useful!
Only, I would prefer the statistics to open in a popup instead of in a new document.
Thanks for this useful extension.
jo

New version ready

Version 1.1 adds readability (lix) measuring and other document statistics.
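
For readers unfamiliar with lix: it is commonly defined as the average number of words per sentence plus the percentage of long words (more than six letters). A rough sketch in plain Python follows; the function name `lix` and the simple sentence splitting on `.`, `!`, `?` and `:` are assumptions, and Linguist's own calculation may differ in the details.

```python
import re

def lix(text):
    """Rough lix score: words per sentence + percentage of words longer than 6 letters."""
    words = re.findall(r"[^\W\d_]+", text)
    # Simplified sentence detection (assumption): split on ., !, ? and :
    sentences = [s for s in re.split(r"[.!?:]+", text) if s.strip()]
    if not words or not sentences:
        return 0.0
    long_words = sum(1 for w in words if len(w) > 6)
    return len(words) / len(sentences) + 100.0 * long_words / len(words)

print(round(lix("The cat sat. The elephant wandered."), 1))
```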

It doesn't seem to work

For big documents (even not-so-big ones), the app freezes.
I tried with just one paragraph with a few unrecognized words, and it creates a new empty document.

Please send a document that

Please send a document that doesn't work to finn (at) gruwier.dk.

Finn Gruwier Larsen

Language: Danish

Finn,
Great script for many reasons. But you forgot to say that the language is Danish ;-)

Leif Lodahl
http://lodahl.blogspot.com

Language

Hi Leif,

It should definitely work with other locales. As mentioned in the description, Linguist uses the GUI locale setting to determine which locale to use for spellchecking. I've just verified that it works with US English. But there might be locales that it doesn't work with, especially if there are locales (other than Danish) that do not conform to the language-COUNTRY formula. In fact the script treats Danish as a special case, since for unknown reasons the GUI locale setting for Danish is not called 'da-DK', but just 'da'.

Finn
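
The language-COUNTRY handling described in the comment above could look roughly like this. It is only a sketch: the name `parse_gui_locale` and the fallback table are hypothetical, and mapping a bare 'da' to country 'DK' is an assumption about how the special case is resolved, not a quote from Linguist's code.

```python
def parse_gui_locale(code, default_countries=None):
    """Split a GUI locale string such as 'en-US' into (language, country)."""
    if default_countries is None:
        # Assumed special case: Danish is reported as plain 'da'.
        default_countries = {"da": "DK"}
    language, sep, country = code.partition("-")
    if not sep:
        # No country part given: fall back to the table, else leave it empty.
        country = default_countries.get(language, "")
    return language, country

print(parse_gui_locale("en-US"))
print(parse_gui_locale("da"))
```

Other locales that lack a country part would need their own entry in the table, which is where the "might not work" caveat comes from.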

Document language

Couldn't Linguist check the language of the document instead of the GUI? I often work with documents in languages other than the interface language and would like to use Linguist to create wordlists for spellcheckers.

Language settings

Hi Erdal,

I don't think there is a setting for the language of the document as such. In fact, each paragraph of a document can have its own language setting. But there is a setting called "default language for documents". If I used this, you could set it to the language you want to spellcheck against and still have another language in the GUI. I will consider that next time I make a release.

Finn
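
One way to combine the settings discussed in this thread is a simple fallback order: paragraph language first, then the default document language, then the GUI language. The sketch below only illustrates that idea; `pick_spellcheck_locale` is a hypothetical helper and not something Linguist is known to do.

```python
def pick_spellcheck_locale(paragraph_locale, document_default, gui_locale):
    """Return the first locale that is set, in order of specificity."""
    for candidate in (paragraph_locale, document_default, gui_locale):
        if candidate:  # skip None and empty strings
            return candidate
    return None

print(pick_spellcheck_locale(None, "it-IT", "en-US"))
```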