Tag: Wikimedia

  • Five ways to enrich Wiktionary

    Since 2010, I’ve been contributing to the Malagasy Wiktionary.
    It has become a habit now: every month, every week, every day, and almost every morning and evening, I turn on the web browser to check what’s going on on Wiktionary, and what I can do to add further content.
    Some days, I get so interested in adding some piece of information that I feel like writing a program to add it within the next few hours.
    And some days, I don’t feel like contributing, and then I just look at the recent changes to check whether pages have been vandalised in my absence, or whether some pages have been fixed by other users.
    Still there are several ways to contribute to Wiktionary. Here are five of them:
    (1) Write pages manually. This is the most basic yet most tedious work to do. This is how everyone starts, and this is probably how most of us will contribute for the next 30 years. By 2045, Wiktionary, or even Wikipedia, in its current form will probably have become obsolete or be self-editing.
    Before this happens, you’ve got to put in a lot of work. Still, you can increase your efficiency by learning to write code, then:
    (2) Write a program that writes pages that you may need to fix. Simple: for the last three years, I’ve been concentrating on how to do this. But as time passes a lot of pages get created, and even with a low error rate, you end up with thousands of pages of potentially wrong information. OK, but you also end up with even more pages with correct information. Coupled with a synonym dictionary and advanced NLP, such a program can write definitions of words that can’t be translated directly into the target language.
    (3) Write a program that reads newspapers to find the words to be created. With a very complete dictionary it gets difficult to find missing words. You won’t have the will to read dozens of newspaper articles every day, so have a program read them and find all the missing words for you. After that, write a program to detect all compound words and add them to the Wiktionary if you feel like it. The next level of this kind of program would be an almost-real-time word scraper which analyses a text stream from, e.g., Twitter and lists all missing words at the end of the day.
    Learning to code is one thing, but knowing which pieces of information to add is another. Whenever you have an idea, or an interesting lexicographic dataset in front of you, get coding and add those bits of information to the Wiktionary. Do so in compliance with copyright laws.
    (4) Navigate through dictionaries and add exotic words. Passionate about word etymology? Are you learning a language? Do the words not exist in Wiktionary? Feel free to add them. Always do so in compliance with copyright laws. Compiling several dictionaries and definitions may count as original work, but never copy word definitions verbatim. I did this once and almost got sued following a complaint from a copyright owner. If you feel you’re good enough at AI and NLP, write a program to reformulate and translate the sentences.
    Code is strong, code is powerful. It takes a lot of time to write good code. It takes a lot of time to become good at coding, and not everyone feels like learning it. So what to do?
    (5) Contribute to your native language Wiktionary. English aside, Wiktionary is written in 170 different languages. A huge number of them have fewer than 100,000 pages. Malagasy, my native tongue, has 3.75 million, thanks only to my efforts to create the biggest Malagasy dictionary that has ever existed. If your native language is English, get interested in other languages and add new words in them, be it on the English Wiktionary or elsewhere. What, you are not passionate about languages? Add obscure English slang terms then.
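
    As an illustration of approach (2), here is a minimal sketch of a page-writing function. It is not the code my bots run: the heading layout and the review category name are hypothetical placeholders, chosen only to show the idea of generating draft pages that a human can later find and fix.

```python
from typing import Sequence

# Hypothetical maintenance category; a real wiki would use its own name.
REVIEW_CATEGORY = "[[Category:Bot-created entries to review]]"


def build_entry(word: str, language: str, part_of_speech: str,
                definitions: Sequence[str]) -> str:
    """Return draft wikitext for one dictionary entry.

    The layout is a generic illustration, not the template scheme
    of any particular Wiktionary.
    """
    if not definitions:
        raise ValueError("an entry needs at least one definition")
    lines = [
        "== " + language + " ==",
        "=== " + part_of_speech + " ===",
        "'''" + word + "'''",
        "",
    ]
    # One numbered definition line per sense.
    lines.extend("# " + definition for definition in definitions)
    lines.append("")
    # Tag the draft so that pages needing a human check are easy to list.
    lines.append(REVIEW_CATEGORY)
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(build_entry("rano", "Malagasy", "Noun", ["water"]))
```

    Tagging every generated page is the design choice that matters here: with thousands of drafts, you need a way to get back to the ones nobody has checked yet.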

  • African language Wikimedia projects summary

    A few months ago I wrote an article which summarises my history on the Malagasy Wiktionary, and more generally my history on Malagasy language Wikimedia projects.
    I am back here to write a short summary of how African language WMF projects are currently progressing. In this article you’ll learn about the current state of African language projects and their trends.
    In terms of community size, the biggest African-language community is the Afrikaans language Wikipedia community, followed by the Egyptian Arabic and Swahili speaking communities.
    If we look more closely at the statistics, the award goes to the Afrikaans language Wikipedia community, which has 7 to 8 very active contributors (performing more than 100 edits per month).
    The Egyptian Arabic Wikipedia community counts 2 to 3 very active contributors, which is big for an African language but very small compared to the Standard Arabic community, which counts more than twenty times as many (83 very active users in June 2013), most of them being Egyptian contributors.
    As for Swahili, the number of very active users is one to two; over a 2-year period it averages 1. But the number of active users (i.e. making more than 5 edits per month) is 9 on average, which is good for a language spoken in countries where internet access is quite hard to come by.
    These numbers are averages over the period from July 2011 to June 2013, which smooths out short-term variations.
    In terms of raw article count, the biggest African language Wikimedia project is the Malagasy Wiktionary (which currently counts 2.5 million articles, smaller only than English and bigger than French!), followed by the Malagasy Wikipedia (40,000+ articles), the Yoruba Wikipedia (30,000+ articles), and the Afrikaans and Swahili Wikipedias (27,000+ and 25,000+ articles respectively).
    The Malagasy Wiktionary became very big for reasons you can read here; the Malagasy Wikipedia is big thanks to geography articles (~20,000 articles) and articles on celestial objects (~8,000 articles); the Yoruba Wikipedia is made big by articles about people and also about celestial objects (~15,000 objects).
    Wikimedians who consult the statistics should know that the number of content pages does not determine the quality or the comprehensiveness of an encyclopedia. Judging wikis by article count is like judging a book by its cover, and book readers and critics know that looking at the cover is not enough to judge a novel. Here, by raw size, the Malagasy language dominates in the two biggest projects (Wikipedia and Wiktionary), but that doesn’t mean it has a very active community.
    To judge the quality, comprehensiveness and completeness of the articles of such wikis, it is better to dive into this kind of statistics, where scores are given according to the absence or presence of vital articles and the size (number of characters) of those articles (if they exist). This kind of statistic is better than article count and page depth, which can be inflated by the use of bots and the generation of tons of non-article pages (talk pages, subpages, redirects…).
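
    To make the idea concrete, here is a minimal sketch of such a score. The size thresholds (10,000 and 30,000 characters) and the weights (1, 4 and 9) are assumptions chosen for illustration; the published list may weight and normalise differently.

```python
def sample_score(article_sizes, stub_limit=10_000, long_limit=30_000):
    """Score a wiki from 0 to 100 on a fixed list of vital articles.

    article_sizes maps each vital article title to its size in
    characters, or to None when the article does not exist.
    Thresholds and weights are illustrative assumptions.
    """
    if not article_sizes:
        raise ValueError("the sample of vital articles is empty")
    raw = 0
    for size in article_sizes.values():
        if size is None:
            continue          # absent article: no points
        if size < stub_limit:
            raw += 1          # stub
        elif size <= long_limit:
            raw += 4          # regular article
        else:
            raw += 9          # long article
    best_possible = 9 * len(article_sizes)
    return 100 * raw / best_possible


if __name__ == "__main__":
    sizes = {"Earth": 42_000, "Water": 12_500, "Music": 800, "Algebra": None}
    print(round(sample_score(sizes), 1))
```

    A score like this cannot be inflated by creating thousands of short bot-made pages, because only a fixed list of articles counts and short articles earn few points.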
    According to the List of Wikipedias by sample of articles, the best-scoring African language Wikipedia is the Afrikaans Wikipedia, which ranks 58th, followed by the Swahili Wikipedia (79th) and then the Egyptian Arabic, Yoruba and Somali Wikipedias. The Malagasy Wikipedia is quite far behind and ranks 155th, higher only than the Lingala (161st), Wolof (175th) and Shona (187th) Wikipedias, which have fewer than 5,000 articles. This means article count is only the cover of the book, and some effort is needed to make the Malagasy Wikipedia more comprehensive.
    What about the trend?
    Less than a year ago, some Wikipedias found a way to grow in number of articles thanks to species databases. The first ones I saw grow this way were the Winaray and Cebuano Wikipedias. The Winaray Wikipedia gained 100,000 articles primarily thanks to low-quality geography stubs (consisting of one or two sentences), and secondarily thanks to articles about animal and plant species, bringing it to 510,000 articles. Cebuano has grown more than tenfold in article count within the last 50 weeks, from 40,000 to more than 500,000 articles. This mania for creating articles about species has spread to the Swedish and Dutch Wikipedias. The Dutch Wikipedia has recently surpassed the German Wikipedia, and in response the German Wikipedia seems to have boycotted it by removing the link to the Dutch Wikipedia from its main page.
    Now let’s write about the growth trend of African language Wikimedia projects. First off, let’s talk about Wikipedias, then Wiktionaries and finally other «minor» Wikimedia projects.

    Wikipedia language edition    Current article count    Growth (in 300 days) (1)
    Malagasy                      40,619                   +2,415
    Yoruba                        30,624                   +582
    Afrikaans                     27,801                   +3,928
    Swahili                       25,368                   +1,232
    Amharic                       12,722                   +1,015
    Egyptian Arabic               10,764                   +1,939
    Somali                        2,830                    +383
    Lingala                       2,035                    +118
    Kinyarwanda                   1,816                    +7
    Kabyle                        1,517                    +778
    Wolof                         1,172                    +49
    Kongo                         826                      +135
    Northern Sotho                688
    Igbo                          739                      +44
    Zulu                          586                      +22
    Setswana                      496                      –1
    Bambara                       392                      +6
    Siswati                       368                      +6
    Ewe                           302                      +12
    Hausa                         291                      +17
    Oromo                         276                      +36
    Tigrinya                      259                      +2
    Tsonga                        250                      +7
    Sango                         204                      +17
    Kirundi                       192                      +8
    Sesotho                       189                      +44
    Akan                          179                      +17
    Fulfulde                      166                      +12
    Luganda                       166                      –2
    Twi                           157                      +12
    Chamorro                      157                      +6
    Xhosa                         151                      +10

    (1) Calculated following this site; data retrieved on July 26th, 2013.
    On Wikipedia, growth is slow compared to languages spoken in developed countries, where Internet access is easy and inexpensive for the ordinary citizen. The African language Wikipedia with the biggest community, Afrikaans, grows at approximately 5,000 articles per year, which is fairly high compared to Swahili, whose growth is less than half of that. If the current trend continues, the Afrikaans Wikipedia will surpass the Yoruba language Wikipedia next year, and the Malagasy Wikipedia in the next 2 years, as the two current biggest Wikipedias are stagnating in article growth.
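
    The forecast above is a simple extrapolation, which can be sketched as follows. This is a rough model that assumes each wiki keeps the constant growth rate measured over the last 300 days; the function name and parameters are mine, and the figures are those of the table.

```python
def days_to_catch_up(leader_count, leader_growth,
                     chaser_count, chaser_growth, period_days=300):
    """Days until the chaser reaches the leader under linear growth.

    Growth figures are articles gained over period_days.
    Returns None when the chaser never catches up.
    """
    gap = leader_count - chaser_count
    if gap <= 0:
        return 0.0
    # Articles per day by which the gap shrinks.
    closing_speed = (chaser_growth - leader_growth) / period_days
    if closing_speed <= 0:
        return None
    return gap / closing_speed


if __name__ == "__main__":
    # Afrikaans (27,801 articles, +3,928) chasing Yoruba and Malagasy.
    print(days_to_catch_up(30_624, 582, 27_801, 3_928))
    print(days_to_catch_up(40_619, 2_415, 27_801, 3_928))
    # Same comparison if the Malagasy Wikipedia stopped growing entirely.
    print(days_to_catch_up(40_619, 0, 27_801, 3_928))
```

    The result depends heavily on the growth assumed for the leading wiki, which is why the third call models full stagnation by setting the leader's growth to zero.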
    On smaller Wikipedias, the trend is positive, though slow. All open Wikipedias have more than 100 articles.
    Among the Wiktionaries, the biggest is the Malagasy Wiktionary, whose growth is sustained by Bot-Jagwar. Owned by myself, Bot-Jagwar runs from the Cloud, so it works regardless of the health of my computer and my internet connection. Thanks to it, the Malagasy Wiktionary gains 300 to 500 content pages daily. Automation eases many things in many ways, but automated processes can fail. So I have to keep an eye not only on the source code but also on the entries generated by that source code.
    African language Wikipedias are slowly but surely gaining articles as time passes. There seems to be a moratorium on closing African language Wikipedias, and this is fine because languages mainly spoken in developing countries need time to develop a community. Furthermore, the official language in these countries, especially African ones, is very often not the local language.

    Kurzweil Curve showing growth of computing power. It shows that all human brains can be simulated by 2050. What about having billions of “virtual” contributors on Wikipedia in 2050? Source (kraxinglogic.com)

    The increasing share of bot-made articles (which nowadays constitute 20% of the articles created on Wikipedia) may indicate that in the near future, perhaps in 25 or 30 years, a bot will be able to write articles like humans do. Ray Kurzweil predicts that simulating the human brain will be possible in twelve years, and today’s computers already have the computing power that supercomputers had in the 1990s.
    What about me? Well, it’s been a while since my last big article on the Malagasy Wikipedia. And according to the List of Wikipedias by sample of articles, several hundred of the articles needed in all Wikipedias are missing, so my first goal for Wikipedia is to fill these gaps, slowly but surely. I prefer contributing about geography, but as I am the only contributor to the wiki, I have to fill gaps a bit everywhere: biography, chemistry, sports, etc. I can barely create three or four articles per day. At that pace, I can complete the list of 1,000 articles that every Wikipedia should have within the year.
    It’s been a while since the last time I blogged in Malagasy, so this article will be followed by a Malagasy language article. Perhaps a translation of this one, perhaps a new one.
    Useful resources
    To read further about what’s mentioned here.

    1. The Law of Accelerating Returns by Kurzweil
    2. http://www.wikistatistics.net for all statistics about Wikimedia projects


  • My story on the Malagasy Wiktionary

    It’s been a while since I posted on this blog. This article is about the mass addition of content to the Malagasy Wiktionary. The purpose of this post is to explain why and how the Malagasy Wiktionary has become so big.
    But first, allow me to introduce myself. My nickname on all Wikimedia projects is Jagwar. I have been a Wikimedia contributor since August 2008, and I am going to be 20 years old soon. I speak Malagasy as my mother tongue, French as a second language and English as a foreign language (soon my third language, since it is not quite perfect yet…).

    When I discovered the Malagasy language version quite by chance, the wiki was virtually dead, with no one adding interesting content and an active community made up mainly of non-native speakers. Without any knowledge of the rules of the wiki, and with almost no knowledge of how to write Malagasy correctly, I began an article. It grew to 20,000 characters, making it the biggest page of the wiki at that time. But unfortunately (or fortunately, for the sake of readers), a non-native-speaker administrator spotted the article’s lack of notability, which led to its deletion.
    I could have left the wiki, as tens of hours of work had literally vanished from it… But I didn’t. I still cannot figure out why, but deep in my mind a little voice told me to keep contributing. At that time, the Malagasy Wikipedia counted 550 articles, maybe fewer, but not more.
    So I continued this way for a while. To help me in my task I wrote to potential volunteers. These people didn’t see the point of contributing to a wiki in their mother tongue: some were unable to spell Malagasy words correctly or didn’t have enough time to do good work, while others asked for money to start contributing (times are hard in Madagascar, I know), and even with money, I am not sure they would have stayed long once the money was paid.
    In October 2008, I discovered the Malagasy Wiktionary. At the beginning I didn’t actually know what to do there, so I continued to work on the Malagasy Wikipedia just to become more skilled and more used to writing Malagasy.

    In July 2009, I was on vacation in my fatherland: Madagascar. I took this occasion to deepen my knowledge of written Malagasy, though my means were quite limited: reading newspapers and the Bible (I am Christian), and following news broadcasts on TV as well as on the radio… I almost forgot French (!), though it was present almost everywhere as the second official language.
    Back in France, I decided to encourage potential volunteers who were able to write Malagasy to contribute to the Malagasy language Wikimedia projects. But you know, Madagascar was in crisis, and some people asked for money to contribute, others blamed me for my spelling mistakes, and others simply ignored the request. I had less and less time to dedicate to the projects, and I had no money to give this way. One day, I decided that I couldn’t wait any longer for someone to arrive: the progress of my skills in Malagasy and in programming languages, and the prospect of a very busy future (bringing a chronic lack of time), pushed me to do something for my mother tongue, even a tiny little thing.

    In 2010, when I could write in my mother tongue without too many spelling mistakes, I started to write bots. Once they were written, I ran them at full speed: fifty thousand edits per day was the normal pace. At the beginning the work was importing foreign-language entries from other wikis. It consisted mainly of importing verb forms, first through an import form, and later through a script that copied other wikis’ content pages to the equivalent pages on the Malagasy Wiktionary. I went slowly at first, but I did it more and more often, until the wiki reached 200,000 content pages. About these possibly copyright-infringing imports, I received a warning from a user who had almost got his mother tongue wiki closed by creating thousands of useless pages.

    In 2011, I got mad: after discovering how astonishingly easy Volapük is, I wrote a script to upload the word forms of that language. At full speed, i.e. around 50,000 edits per day, three weeks were enough to make the Malagasy Wiktionary the third biggest Wiktionary in the world. But months passed, and no one, absolutely no one, contributed: one day the number of active users dropped to two, for a wiki that contained 1.19 million content pages (in comparison, the German Wikipedia, which had a comparable article count, had no fewer than 25,000 active users)!

    In July of the same year, I wrote a new script that created translations based on foreign language entries. With that script, up to 5,000 articles were created, mainly lemma entries. Just a few weeks later, the import of all Malagasy words was completed. But its effect on the article count was not visible because of the mass deletion of Volapük entries. Why this mass deletion? Because many entries turned out to be wrong: they were presented as verb conjugations when the words were in fact nouns (-.-‘). So the decision was taken to delete them all and re-create them later, with better quality if possible. Since then, my activity on the Malagasy Wikipedia has been put on hold so that I can dedicate all my wiki time to the renovation of the Malagasy Wiktionary.

    During the summer vacation, I took the time to restructure the Malagasy Wiktionary. The article and category structures were inspired by those of the French Wiktionary: using templates for languages and parts of speech allowed Malagasy Wiktionary entries to be categorized automatically. Time passed and a routine set in.
    One night, I discovered an online Malagasy monolingual dictionary. Having no idea whether the content was copyrightable (the copyright seemed to apply only to the design), I decided to reuse the content of that dictionary to complete the entries on the Malagasy Wiktionary. The problem arrived just a few weeks later, when I received a mail from a Wikimedia Foundation staff member, P. Beaudette. In his mail, he asked me about the origin of the Malagasy language entries. I answered that they came from various bilingual dictionaries and from the online monolingual dictionary… A copyright infringement investigation was carried out, and my bot was blocked during the whole process. At the end of it, the staff member told me to remove the 30,000 entries that infringed the original dictionary’s copyright, which I did.

    After this copyright infringement episode, I decided to shift my contributions towards adding Malagasy language content to other wikis. But before that, I did some work on the Fijian and Tagalog Wiktionaries, which was more or less appreciated… In particular, someone editing from an IP address was checking my contributions on the Fijian and Tagalog Wiktionaries, and told me to stop mass-adding content in languages of which I speak not a word. I stopped working on both wikis a few weeks later, once the work was finished.

    But this mass addition of content, especially in languages I didn’t speak at all, seemed to annoy people, who decided to discuss the case on the MetaWiki forum. No conclusion was reached, and things stayed as they were before.

    With most of the hard work removed, and with behaviour that had been criticised by many users, I decided to take a break of indefinite duration. It actually lasted 5 months, during which I tried to work on my written Malagasy outside Wikimedia projects. The progress of my skills, in spelling as well as in programming, was respectable, allowing me to come back and make the Malagasy Wikimedia projects, and especially the Malagasy Wiktionary, evolve again. In July 2012, I built a new tool that finds entries missing from the Malagasy Wiktionary by reading the daily online newspapers. Only two newspapers are currently supported, because they provide RSS feeds. But the ability to make the script read websites without RSS support is coming soon.
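
    The principle of such a tool can be sketched in a few lines. This is not the actual script: it assumes a plain RSS 2.0 feed already downloaded as a string and a list of existing page titles obtained elsewhere (for example from a dump), and all the names are illustrative.

```python
import re
import xml.etree.ElementTree as ET

# A word is a run of letters, optionally joined by hyphens or apostrophes.
WORD_RE = re.compile(r"[^\W\d_]+(?:[-'’][^\W\d_]+)*")

# Tiny stand-in for a newspaper feed.
SAMPLE_FEED = """<rss version="2.0"><channel>
<item><title>Rano sy vary</title>
<description>Lafo ny vary androany</description></item>
</channel></rss>"""


def words_in_feed(rss_xml):
    """Return the set of lowercased words found in item titles and descriptions."""
    root = ET.fromstring(rss_xml)
    words = set()
    for item in root.iter("item"):
        for tag in ("title", "description"):
            text = item.findtext(tag) or ""
            words.update(word.lower() for word in WORD_RE.findall(text))
    return words


def missing_words(rss_xml, existing_titles):
    """Return the words of the feed that have no page yet, sorted alphabetically."""
    known = {title.lower() for title in existing_titles}
    return sorted(words_in_feed(rss_xml) - known)


if __name__ == "__main__":
    print(missing_words(SAMPLE_FEED, ["rano", "ny", "sy"]))
```

    Real feeds often embed HTML in their descriptions, so a production version would strip the markup before tokenising; the sketch leaves that out.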

    In September, I developed a new, improved translation retriever that gets the translations in all languages on a given page (the previous version could only handle one language at a time), which multiplies the translation harvest almost tenfold. This function is embedded in an XML dump reader that amplifies the efficiency of the script: translations are retrieved quickly, with no need to be connected to the server during processing. Done every month, the dump processing and uploading have let the wiki gain more than 100,000 lemmata in a few months. These lemmata may contain translation errors, but the error rate is low enough (<1%) to be disregarded. The hardest cases can be resolved by a single check on the source wiki (which is indicated by a template).
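
    Here is a simplified sketch of the two parts described above, a streaming dump reader and a translation extractor. It is not the real retriever: it assumes the source wiki marks translations with templates of the form {{t|language code|word}} or {{t+|language code|word}}, as English Wiktionary-style pages do, and the function names are illustrative.

```python
import io
import re
import xml.etree.ElementTree as ET
from collections import defaultdict

# Assumed template shape: {{t|fr|eau}} or {{t+|fr|eau|f}}.
T_TEMPLATE = re.compile(r"\{\{t[+-]?\|([^|}]+)\|([^|}]+)")


def iter_pages(dump_file):
    """Yield (title, wikitext) pairs from a MediaWiki XML export, page by page."""
    title, text = None, ""
    for _event, elem in ET.iterparse(dump_file, events=("end",)):
        tag = elem.tag.rsplit("}", 1)[-1]  # drop the XML namespace, if any
        if tag == "title":
            title = elem.text
        elif tag == "text":
            text = elem.text or ""
        elif tag == "page":
            yield title, text
            title, text = None, ""
            elem.clear()  # free the memory used by the finished page


def extract_translations(wikitext):
    """Return {language code: [words]} for every translation template found."""
    found = defaultdict(list)
    for lang, word in T_TEMPLATE.findall(wikitext):
        found[lang.strip()].append(word.strip())
    return dict(found)


if __name__ == "__main__":
    dump = io.BytesIO(
        b"<mediawiki><page><title>water</title><revision>"
        b"<text>{{t+|fr|eau|f}} {{t|mg|rano}}</text>"
        b"</revision></page></mediawiki>"
    )
    for page_title, page_text in iter_pages(dump):
        print(page_title, extract_translations(page_text))
```

    Reading the dump as a stream is what removes the need for a server connection and keeps memory use flat, whatever the size of the file.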

    In October, I thought about building a bot that completes tasks as scheduled in a parameter file. This is particularly useful for keeping the list of wikis up to date. Currently, the list of wikis on the Malagasy Wiktionary is updated four times a day, i.e. every six hours.
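
    A scheduler of this kind can be sketched as follows. The JSON layout of the parameter file and the task names are assumptions made up for this example, not the format the bot actually uses.

```python
import json
from datetime import datetime, timedelta

# Hypothetical parameter file content.
PARAMS = """{"tasks": [
    {"name": "update_wiki_list", "every_hours": 6},
    {"name": "process_dump", "every_hours": 720}
]}"""


def due_tasks(params_json, last_runs, now):
    """Return the names of the tasks whose interval has elapsed.

    last_runs maps a task name to the datetime of its last run;
    a task that has never run is always due.
    """
    params = json.loads(params_json)
    due = []
    for task in params["tasks"]:
        last = last_runs.get(task["name"])
        interval = timedelta(hours=task["every_hours"])
        if last is None or now - last >= interval:
            due.append(task["name"])
    return due


if __name__ == "__main__":
    history = {"update_wiki_list": datetime(2013, 1, 1, 6, 0),
               "process_dump": datetime(2012, 12, 25)}
    print(due_tasks(PARAMS, history, datetime(2013, 1, 1, 12, 0)))
```

    Keeping the schedule in a file means the pace of a task can be changed without touching the bot's code.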

    At the end of January 2013, I thought about a more efficient use of the translation retriever that I had written a few months earlier. Then came the IRC bot: it retrieves in real time all the edits made on selected wikis and does its best to translate each edited entry into Malagasy, in real time! When it was first developed, it only used the traditional translation retriever, but since March it has also featured a basic entry processor that allows the IRC bot to translate foreign language entries into Malagasy as well, using the same dictionary. This latest version of the IRC bot is currently in use, and it creates hundreds of entries and content pages on the Malagasy Wiktionary every day. I have no precise idea of the error rate, but I am pretty sure it is less than 5%. The strength of the bot is its ability to keep pace when several edits are made within a minute. Nevertheless, as it needs to be online and connected to Wikimedia servers, the processing frequency is limited to one page per second. I am thinking about ways to let the bot process more pages.
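
    The one-page-per-second limit can be pictured with a small throttle like the one below. This is only a sketch of the idea, not the bot's code; the class and its parameters are illustrative, and the clock and sleep functions are injectable so the behaviour can be checked without waiting.

```python
import time


class Throttle:
    """Make sure successive calls to wait() are at least min_interval seconds apart."""

    def __init__(self, min_interval=1.0, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._last = None  # time of the previous processed page

    def wait(self):
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now


def process_queue(titles, handle, throttle):
    """Handle each queued page title, never faster than the throttle allows."""
    for title in titles:
        throttle.wait()
        handle(title)


if __name__ == "__main__":
    process_queue(["rano", "vary"], print, Throttle(min_interval=1.0))
```

    When several edits arrive within the same second, they simply queue up and are handled one per second, which is how the bot keeps pace without exceeding the limit.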

  • Quality or quantity?

    Quantity or quality? Some will say that quality is better than quantity, some say «quality in quantity», and others “quantity only works when you are piling up money, you know”… So what is the truth? Quantity or quality?…

    I am wondering about this now because on Wiktionary there is a debate about quantity and quality or, more precisely, about entries created by bots versus entries created by hand, that is, through the wiki’s editor rather than its API.

    I have experimented with creating content through bots on the Malagasy Wiktionary, then on the Fijian Wiktionary and finally on the Tagalog one, and here is the real verdict: using a script is fine if you are going to add a great many English words, that is, around two or three thousand, and that is all. If you try to translate other languages, there is a risk, because the meaning of the word in English may not match its meaning in, say, Indonesian. Just look at the English word star, which can be translated as «kintana» in Malagasy or «bintang» in Indonesian; but another meaning of «star» can be rendered as «olo-malaza» (celebrity) in Malagasy or «pesohor» in Indonesian. The problem with a single word that can mean unrelated things is that the machine may mix up its definitions: «olo-malaza», for example, becomes «bintang», or «kintana» becomes «pesohor». What I have shown here is only a simple case; in reality the problem can go much further. If this small problem affects more than five percent of the entries in a Wiktionary, it can pose a huge problem for those learning Malagasy and for Malagasy people learning foreign languages. And such a thing, even if it is only one entry in ten thousand, always damages the quality of the dictionary, especially when the total number of entries approaches one million.

    So we cannot talk about quality on this Wiktionary if very few articles have been written here (that is, if the number of articles is in the tens). Until the number of articles reaches the hundreds, we should not debate the quality of an encyclopedia, and the same goes for a dictionary.

    Nor can we talk about quality if there are only two or three articles, however well made they are. We can start talking about quality once the number of entries reaches around two or three thousand; at that point the content can be assessed. Below that, the number is too small.

    So here is my problem with examining quality and quantity: I would like to contribute to a Wiktionary in a language I do not know, which has about forty entries, and not all of those entries are a complete mess. If you want to grow it, you do what it takes to become an administrator, then you clean up the content of the wiki. Only then do you start creating the entries that do not exist yet. That can be done, but it takes time, and for some people, of whom there are a great many, the rule is «honour your own language and help that of others». There are also those who think like this: «honour the language most used on the internet and leave the others to fend for themselves».

    There are also others who want to grow other wikis, but they are few and have little time for it, that is, for researching the language and then filling the Wiktionary with the knowledge gathered. That takes time, and where I live, that is, here in Europe, and above all since I am still a student, I do not have time to do all of that. So when I do have time, I write a Python script to do the work in my place, even though the quality is worse when entries are written by a bot, because odd errors can turn up (just look at my talk page and at the deletion log). These can still be corrected by anyone who wants to contribute through corrections… I know many Malagasy users who like to contribute by correcting articles, so this is where they can do it well.

    I am one of those people who would like to contribute a great deal but have no time. So I want other wikis to benefit from my ability to write software. That is part of what made two wikis grow; most of them are Wiktionaries, but there is also Wikipedia. The Malagasy Wikipedia and the Malagasy, Fijian and Tagalog Wiktionaries are where I tried out the scripts I wrote (about thirteen of them), and they work well, even though they produce odd errors. Those errors, which number in the tens, are what other users criticise me for, even though my primary aim is not to damage the wiki but to add content from sources that can be mined: online dictionaries, paper dictionaries and other Wiktionaries (above all the English one). Now

  • Tetikasa Wikimedia amin’ny fiteny afrikana

    Amin’izao fotoana izao, tsy mbola misy ny Wikipedia amin’ny teny afrikanina izay manana isan-dahatsoratra mihoatra ny iray hetsy. Mety ny fanjakazakan’ny Wikipedia amin’ny fiteny eoropeana lehibe angamba, ary koa ny habetsahan’ny “details” ao amin’ny lahatsorany angamba no mety mahatonga izany. Ary noho izany, ny fiteny ofisialy rehetra ny Vondrona Eoropeana (afatsy ny fiteny maltey) dia manana isan-dahatsoratra mihoatra ny efatra alina. Ny Wikipedia amin’ny fiteny eoropeana lehibe indrindra dia ny teny anglisy mazava ho azy, arahan’ny fiteny alemana, ary ny fiteny frantsay.

    Ny Wikipedia amin’ny teny anglisy moa dia manana lahatsoratra lava be saika mikasika ny zava-drehetra. Jereo fotsiny ny zavatra mikasika an’i Madagasikara eo amin’ny Wikipedia amin’ny anglisy : raha amin’ny resaka antsipirihany aloha dia efa resiny lavitra ireo Wikipedia amin’ny teny frantsay maika fa ny teny malagasy… Ary tsy mikasika ny zavatra momban’i Madagasikara ihany izany, fa saika amin’ny zava-drehetra mihitsy.

    Na dia efa nihena betsaka aza ny lanjan’ny Wikipedia amin’ny teny anglisy eo anivon’ny tetikasa Wikipedia amin’ny ankapobeny, dia mbola voasoratra amy fiteny indo-eoropeanina foana ny enina amin’ny dimampolo (56) isan-jaton’ny lahatsoratra eo amin’ny tetikasa Wikipedia.

    Ny olana amin’ny fiteny afrikanina dia izy ireo tsy manana sata ofisialy : tsy ampiasaina ho teny ofisialy, tsy fampiasa an-tsoratra… satria betsaka amin’ireo firenena afrikana no mampiasa na ny teny frantsay na ny teny anglisy. Ireo firenena afrikanina tavaratra moa dia mampiasa ny teny arabo avokoa (fa tsy ny teny berbera). Ilay tsifananana sata ofisialy io miampy ny tsifananana fomba fanoratana iray no mahatong any fiteny afrikana tsy mivoatra eo amin’ny aterineto. Resaka amin’ny ankapobeny izany ataoko izany, ary fantatro fa misy fiteny afrikana mivoatra eo amin’ny aterineto.

    Wikipedia amin’ny fiteny amin’izao fotoana izao

    Dia ahoana ihany izany ny momban’ny Wikipedia amin’ny fiteny afrikana ? Ny teny malagasy no voalohany amin’ny lafin’ny isan-dahatsoratra voatahiry, ny fiteny yoruba no manaraka azy, arahan’ny fiteny soahily ary ny fiteny afrikaans. Ireo Wikipedia roa farany ireo no nanjakazaka tamin’ny Wikipedia nandritry ny taona maromaro. Ny manarana ny fiteny afrikaans ary ny fiteny swahily dia ny fiteny amharika izay miisa lahatsoratra iray alina mahery kely.

    Andao àry jerena akaikikaiky ny zava-mitranga ao amin’ny Wikipedia. Ao amin’ny WIkipedia amin’ny teny malagasy dia misy olona tokana ihany no mandray anjara : tanatin’ny efa-taona, eo amin’ny telo alina eo no isan’ny lahatsoratra nosoratany. Izy irery ve no nanoratra ireo telo alina ireo ? “no life” amin’i Wikipedia ve izy io ? Tsia ! Mampiasa ny atao hoe “mpikambana rôbô” izy : Bot-Jagwar, izay namorona lahatsoratra eo amin’ny telo alina latsaka eo ho eo, mikasika ireo tanàna any Frantsa, Brazila ary Madagasikara. Manome fampahalalana ara-tstatistika ny akamaroan’ireo lahatsoratra ireo. Mbola tsy afa-manoratra lahatsoratra irery ny rôbô, sa ahoana ?

    Wiktionary in African languages

    Wiktionary is in the same situation as Wikipedia: Malagasy dominates overwhelmingly, with about 1.4 million entries, a hundred times as many as the second-largest African-language Wiktionary. Most of these entries were created within 18 months. That is something eighteen people could not do, let alone one…

    The way these entries are created is the same too: by bot. That way, about eighty thousand entries can be created per day if the bot runs day and night. After 13 days, the million is reached. The Malagasy Wiktionary could be called a “Botovolana” (or, in English, a “Bottionary”).
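
    As a quick check of the figures above (about 80,000 entries per day, one million entries in 13 days), here is a minimal Python sketch. The function name days_to_reach is only illustrative, and the daily rate is assumed to be constant, which a real bot run would not guarantee.

```python
import math


def days_to_reach(target_entries: int, entries_per_day: int) -> int:
    """Whole days needed to create target_entries at a constant daily rate."""
    if entries_per_day <= 0:
        raise ValueError("entries_per_day must be positive")
    return math.ceil(target_entries / entries_per_day)


# Rounded figures quoted in the paragraph above (assumed constant rate).
print(days_to_reach(1_000_000, 80_000))
```

    With these rounded inputs, the result is 12.5 days rounded up to a whole day.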

    Wikibooks in African languages has seen little progress. The most advanced African language is Afrikaans, with fifty chapters, followed by Malagasy (32 chapters), Swahili (12 chapters) and Bambara (7 chapters).

  • African language Wikimedia projects

    At this time, no African-language Wikipedia has passed the 100,000-article mark. This is certainly due to the dominance of the European-language Wikipedias and the prominence of their articles across the Internet: almost every official EU language (except Maltese) has more than 40,000 articles. The largest European-language Wikipedia is the English one, since English is the international language and is very widely spoken. As a consequence, that edition has very detailed information about almost everything, where other Wikipedias have stubs and quite often nothing. This creates a vicious circle that favours English more and more. Despite that, the gap between English and the other Wikipedias has narrowed, but in favour of other Indo-European languages: 56% of all articles in Wikimedia projects are still written in an Indo-European language.

    The problem with African languages is that they have almost no official recognition and are not used as official languages in many, many African countries, which use French, English or Arabic instead. Second, there is almost nothing that allows local African languages to spread over the Net: many African languages have no standardised written form. But I am drifting away from my subject, and will write about this later.

    Situation of African language Wikipedias

    So what about African-language Wikipedias? Malagasy and Yoruba come first, having recently overtaken Swahili and Afrikaans, the dominant African Wikipedias since African-language Wikipedias were opened. Next after Swahili and Afrikaans comes Amharic, which quite recently passed its 10,000th article.

    OK, it looks good, but let’s take a closer look. On the Malagasy Wikipedia there is only one, but very, very active user: in less than four years of contribution, he has racked up more than 30,000 articles on his own. Is he a Wikipedia “no-lifer”? Actually not. He uses his bot, Bot-Jagwar, to create tons and tons of articles about cities all around the world: France, Brazil, Madagascar, etc. These articles give general and statistical facts about the cities. A bot is not yet able to write prose, huh?

    Situation of African language Wiktionaries

    As for Wiktionary, the picture is the same as for Wikipedia: Malagasy strongly dominates. But here, the Malagasy Wiktionary has about 1.4 million entries, which is almost one hundred times the size of the next-largest African-language Wiktionary. This was done in less than 18 months, which means that tens of thousands of entries are created there every month. That is physically as well as statistically implausible, given how few people speak Malagasy and how few active users the Wiktionary has: the number hovers around 18, which would mean that each of these active users wrote nearly 80,000 pages in that time span, which is simply impossible.
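
    For a rough sanity check, the averages can be worked out from the rounded figures quoted above (1.4 million entries, 18 months, about 18 active users). This is only a back-of-the-envelope sketch in Python: the constant and function names are illustrative, and it assumes the entries were spread evenly over the whole period and over all users.

```python
# Rounded figures quoted in the paragraph above; treated here as exact.
TOTAL_ENTRIES = 1_400_000
MONTHS = 18
ACTIVE_USERS = 18


def average_share(total: int, parts: int) -> float:
    """Average number of entries per part (per month or per user)."""
    if parts <= 0:
        raise ValueError("parts must be positive")
    return total / parts


entries_per_month = average_share(TOTAL_ENTRIES, MONTHS)
entries_per_user = average_share(TOTAL_ENTRIES, ACTIVE_USERS)

print(f"about {entries_per_month:,.0f} entries per month")
print(f"about {entries_per_user:,.0f} entries per active user")
```

    Both averages come out a little under 80,000, since the number of months and the number of active users happen to be the same in these rounded figures.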

    The fact is that there is only one user (or more exactly, one bot) doing all these edits: Bot-Jagwar. This bot has performed more than 5 million edits in less than 20 months, is now the most active “user” across all Wikimedia projects, and has by itself made more than 70% of the total edits to the Malagasy Wiktionary. This is what we call a “Bottionary” (I have seen this word somewhere, but don’t know exactly where… that is what Google is for, if you know what I mean).

    Situation of African language Wikibooks

    As for Wikibooks, African languages, even those of European origin, are not very advanced. The most advanced African language is Afrikaans with 50 chapters, followed by Malagasy with 32 chapters, Swahili with 12 and Bamanankan with only 7. There is no doubt that Wikibooks is a hard project to develop, and a less interesting one than projects such as Wiktionary or Wikipedia. But this shows that Afrikaans dominates the African-language Wikimedia projects, along with Swahili very often, though not always.