Category: English

  • From rock bottom to new beginnings: my journey through the early 2020s

    The 2020s have undeniably been one of the most challenging decades in recent memory, perhaps second only to the Cold War era, or even WWII. Born in 1993, I didn’t experience WWII or the Cold War firsthand, but I’ve certainly felt the full force of the early 2000s, the 2010s, and especially the turbulence of the 2020s. Even though we’re just halfway through, the decade has already tested me profoundly.

    In early 2020, the COVID pandemic turned our lives upside down. Overnight, I went from routine office work to remote work with no preparation whatsoever. Initially, the adjustment was tough: I developed unhealthy habits, notably becoming excessively sedentary due to reduced physical activity. However, one positive aspect emerged: I realized the absurdity of commuting two hours daily when remote work was feasible. This realization reshaped my working life, and I’ve continued to work remotely ever since.

    By 2021, things had begun to stabilize. I overcame my initial remote work struggles and became comfortable with this new normal, although pandemic-related travel restrictions remained challenging. The year also marked a significant turning point when my aunt proposed starting a business in the United States. This idea became a major focus for me, occupying significant mental bandwidth for the following two years.

    In January 2022, this dream began taking tangible shape. I traveled to Boston, established our Delaware company, and started preparations for relocating to the U.S. by August. Interestingly, just two weeks before the planned move, I met Oli, my future wife.

    However, the excitement of launching a business came with immense stress. In September and October 2022, I worked tirelessly—up to 18 or 19 hours daily, losing weight rapidly due to intense stress and exhaustion. Despite substantial investment—over $50,000 in franchise fees alone—the business struggled to generate sufficient revenue. Hiring reliable staff was an enormous challenge, exacerbated by my visa situation, forcing me to operate within restrictive conditions.

    The strain culminated when, after returning briefly to France to reset my visa status, I was unexpectedly denied entry upon my return to the U.S. A grueling 12-hour interrogation led to the cancellation of my visa waiver due to suspicions of unauthorized employment. Suddenly, I found myself back in France with no job, no luggage, and few resources.

    Forced to reassess, I sought employment urgently to stabilize financially. After initial setbacks, including unsuccessful attempts at various jobs, I ultimately returned to my previous company, filling the position I’d left months earlier.

    In January 2023, my aunt returned to France and decided not to go back, as she too risked facing visa complications. Together, we waited in France, managing our U.S. business remotely and maintaining operations at approximately $3,000 monthly, hoping to secure new visas by June.

    In June 2023, I went to the U.S. Embassy with my aunt with hopeful anticipation—we were ready for our visas to be approved and eager to continue building our business in the United States. Unfortunately, things didn’t turn out as planned.

    Our visa applications were rejected. The embassy cited concerns that we hadn’t provided sufficient evidence of strong ties to our home country—ties that would compel us to return after our visas expired. Essentially, they treated our business visa application similarly to how they review applications for tourism or short-term visits. The rejection felt abrupt, unfair, and devastating—especially considering we had invested heavily in this project.

    Financially, the blow was significant. Collectively, we had spent more than $160,000, and personally, I had invested around $90,000 into setting up the business—obtaining licenses, meeting certification requirements, and keeping ample cash reserves for running expenses. To have it all dismissed in a brief 15-minute interview felt surreal and painful.

    With our visas denied, we urgently needed a new plan. Keeping the U.S. business was no longer viable; it made sense to sell quickly, even at a substantial discount, simply to recover some portion of our investment. My aunt recommended a realtor friend in Chicago who had experience not only with properties but also businesses. We entrusted him with the sale.

    Weeks went by with little progress. Interested buyers were scarce—and understandably so. Our business had been inactive for months, generating zero revenue while we continued incurring fixed costs like electricity, internet, utilities, and annual company fees. Keeping it closed yet “operational” was costing us roughly $3,000 a month—a relentless financial drain that was unsustainable.

    After several unsuccessful attempts, the realtor returned the keys. We then had an emergency meeting to assess our options. It was clear we faced an impossible choice: continuing to lose thousands of dollars monthly on an inactive business, or paying another $800 plus additional fees to officially close it down. Ultimately, we decided the best path forward was simply to walk away—even though this meant becoming tax delinquent with the state of Delaware and losing our entire investment without ever having opened our doors.

    From an outside perspective, this entire experience felt like a cruel joke—or worse, a scam. We did everything by the book, followed all rules and requirements meticulously, yet found ourselves locked out of both the country and our own investment. By the end of June 2023, the feeling of loss, disappointment, and frustration was overwhelming. It felt like rock bottom.

    But, as they say, when you hit rock bottom, the only way left is up. During this challenging period, my relationship with Oli continued to grow stronger. She had already proposed marriage, and I had asked for some time to think. In August, after careful reflection, I said “yes.”

    We planned a beautiful vacation together spanning late September and early October—a memorable and joyful break from the hardships we’d recently endured. During those special weeks, our daughter was conceived, marking a beautiful new beginning amidst the turbulence.

    Today, looking back, it feels like everything unfolded exactly as it needed to—perhaps not smoothly, certainly not painlessly—but ultimately guiding me toward an unexpected and profoundly meaningful new chapter of my life.

    The end of 2023 turned out to be another major turning point in my life. When I met Oli in August 2022, we began envisioning our future together: a shared life, marriage plans, and discussions about how many children we hoped to have. Those were beautiful, exciting conversations that deepened our connection and strengthened our commitment.

    However, turning these dreams into reality wasn’t without its challenges. The paperwork required for her to come live with me here in France was extensive. Even if we wanted to marry as early as September 2023, administrative procedures meant we’d have to wait at least six months just to complete the necessary paperwork. By the time we finally filed everything in December 2023, we knew we wouldn’t receive an official response until June 2024 at the earliest.

    In the meantime, life continued to unfold beautifully. In March 2024, we celebrated our traditional betrothal ceremony—a deeply meaningful and emotional moment for both of us and our families. Then, in June, we joyfully welcomed our daughter into the world, forever changing our lives in the most wonderful way. Finally, in November 2024, we officially held our civil marriage ceremony, completing another critical step towards building our future together. Immediately afterward, we initiated the transcription of our marriage certificate, laying the groundwork for Oli’s immigration to France.

    On the professional front, however, 2024 presented new obstacles. My employment contract at the renowned red-and-black bank, where I’d worked nearly three years, was set to expire in May 2024 and wouldn’t be renewed except under exceptional circumstances. With mutual understanding, the bank and I parted ways amicably, marking the beginning of a challenging job search.

    I quickly discovered I’d re-entered the job market at an exceptionally difficult moment. Competition was fierce, standing out had become increasingly challenging, and many experienced professionals—including myself—were forced to consider roles involving salary cuts. For over six stressful months, I searched extensively for a suitable position.

    Eventually, my persistence paid off. I received a compelling offer from another company willing to increase my salary by 20%. When I approached my managers at my current job to give the required three-month notice (customary in France), they fought vigorously to retain me. Recognizing the difficulty of hiring skilled professionals, my employer surprised me by immediately matching the new company’s offer.

    Finding myself in an unexpectedly advantageous position, I decided to stay. After a tumultuous period of uncertainty, it felt rewarding and reassuring to be recognized and valued professionally.

    Reflecting now, 2023-2024 was a truly transformative period, filled with tremendous ups, challenging downs, and beautiful new beginnings. From marriage and fatherhood to job change, it’s clear this journey was never straightforward, but it led me exactly where I was meant to be.

  • Optimised Malagasy Keyboard (version 3.0)

    This post is a follow-up to an older post from 2016: Finding an optimised keyboard for Malagasy

    I wrote that post back in November 2016 about how inefficient the AZERTY keyboard, currently in use by most Malagasy people, is. It performs abysmally and may even lead to finger joint problems after extensive use over the years.

    Feedback from first versions

    After spending a few days iterating over layouts, I came up with a first version that scored really well with the test corpus, and even used it for a couple of years. But besides not being popular at all, it suffered from some flaws:

    • Accented characters in Malagasy are OK, but accented letters in French, which is also used by most Malagasy people using a computer, were severely lacking
    • Programming was hard as some characters such as the backslash were not present.
    • Money symbols like the Euro or the Pound were absent. While not a major inconvenience, their absence can sometimes be felt when writing about the UK (which uses the Pound Sterling) or France (which uses the Euro), for instance.
    • The characters “<” and “>” were not type-able on certain keyboards, including the laptop I was using back in 2017.

    After spending a couple of years getting used to the first iteration and noting its flaws, I have come up with another version, which incorporates some improvements suggested by Ian Douglas (see comments on the older post).

    Version 3

    This keyboard is a major change compared to the previous iteration, as several keys have been moved or swapped. Most notably:

    • The U key has been moved to the right-hand side of the keyboard. U is not used in native words.
    • The apostrophe and double quote have been moved to the left side of the keyboard. The most common word using the apostrophe is amin’ny, which can now be typed by alternating left and right hands.
    • The accented O has its own key. Like accented letters in French, Ô is not considered a separate letter, but it’s often used.

    Analysis Results

    When accessing the analysis results, we have the following winners:

    The heatmap for the Version 3 is as follows:

    The row usage is as follows

    Below is the hand usage for our sample text based on Sarasara Tsy Ambaka. It heavily favours the left hand over the right hand, as Malagasy uses a lot of vowels, which are all on the left-hand side of the home row, right below the user’s fingers.

    The pie chart above is obtained by having the left thumb hit the spacebar. We can swap that with the right thumb and have the result below for the Malagasy v3:

    Hand usage is now a lot more balanced. The space bar accounted for roughly 13% of all keyboard hits in the sample text I used.
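
    The effect of the spacebar thumb on hand balance can be sketched in a few lines of Python. This is only an illustration: the `left_keys` set passed in is a hypothetical hand assignment, not the actual Malagasy v3 layout, and every non-space character outside that set is simply counted for the right hand.

```python
def hand_usage(text: str, left_keys: set, space_hand: str = "left") -> tuple:
    """Return (left_share, right_share) of key hits for a text.

    left_keys  -- characters assumed to be typed by the left hand
                  (hypothetical assignment, not the real v3 layout)
    space_hand -- which thumb hits the spacebar: "left" or "right"
    """
    left = right = 0
    for ch in text.lower():
        if ch == " ":
            # The spacebar goes to whichever thumb we decided to use.
            if space_hand == "left":
                left += 1
            else:
                right += 1
        elif ch in left_keys:
            left += 1
        elif not ch.isspace():
            # Anything else (other letters, punctuation) counts as right hand.
            right += 1
    total = left + right
    if total == 0:
        return (0.0, 0.0)
    return (left / total, right / total)


# Toy assignment: vowels on the left hand, everything else on the right.
vowels = set("aeioy")
sample = "nandray ny asa ny zandary"
print(hand_usage(sample, vowels, space_hand="left"))
print(hand_usage(sample, vowels, space_hand="right"))
```

    Since the space is the single most frequent key, moving it from one thumb to the other shifts the balance by its whole share of the hits.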

    On multilingual typing

    The most used language pair in Madagascar when it comes to multilingual typing is Malagasy and French, or more likely French and Malagasy. Office workers most often use French as a work language, and use Malagasy for other everyday communication. When it comes to bilingual usage, here is how the Malagasy v3.0 keyboard performs. The tests were made with a 5,000-character text in French followed by another 5,000 characters in Malagasy. Informational density per character is higher in French than in Malagasy: in French we have an average of 6 characters per word whereas in Malagasy we are closer to 10. Nevertheless, both passages were truncated to the same length.

    Here are the detailed results. I will present the most interesting parts here.

    The v3 is still the winner here, but as you can see, the difference between the winner and the runner-up no longer seems to be significant, so let’s use another metric:

    In the table above, we have the distance covered by our fingers dancing on the keyboard, in centimetres. The less, the better.

    Let’s start with the loser here, the AZERTY layout (will this AZERTY-bashing post ever stop?), with over 33,000 centimetres for ten thousand characters, where the left pinky and the index fingers travel a lot. If these were metres and not centimetres, that would be almost 80% of a marathon.

    A surprising-but-not-so-surprising contender here is the BEPO layout, which already has some notoriety and a nice total distance of 17,241 centimetres. That makes writing 10,000 characters look less like a marathon and more like 40% of a marathon. Good runners could run 40% of a marathon on a weekday after a day of work.

    Malagasy v1.0 also gets away with 16,765 cm.

    Malagasy v2.2 and v3 are quite close to each other, with 15,901 cm and 15,299 cm respectively. Version 3 has some nice keymaps allowing it to type some keys that were absent in version 2.
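
    For the curious, the marathon comparison is simple arithmetic and can be checked with a small Python sketch. It takes the distances quoted above (using 33,000 as a lower bound for AZERTY, since the exact figure is only given as "over 33,000") and reads the centimetres as if they were metres; the only outside figure is the official marathon distance of 42,195 m.

```python
MARATHON_M = 42_195  # official marathon distance, in metres


def marathon_share(distance: float) -> float:
    """Fraction of a marathon covered if finger travel in cm were read as metres."""
    return distance / MARATHON_M


# Finger travel for 10,000 characters, in centimetres, as quoted in this post.
# The AZERTY value is a lower bound ("over 33,000").
travel_cm = {
    "AZERTY": 33_000,
    "BEPO": 17_241,
    "Malagasy v1.0": 16_765,
    "Malagasy v2.2": 15_901,
    "Malagasy v3": 15_299,
}

for layout, cm in travel_cm.items():
    print(f"{layout}: {marathon_share(cm):.0%} of a marathon")
```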

    On shortcuts

    We office workers like to use shortcuts. The most famous are Ctrl+A (select all), Ctrl+C (copy), Ctrl+F (search in file), Ctrl+K (cut line after cursor), Ctrl+N (new file), Ctrl+S (save), Ctrl+U (cut line before cursor), Ctrl+V (paste), Ctrl+X (cut) and Ctrl+Z (undo last action).

    Where do we stand on these for the Malagasy Keyboard v3?

    Well, here we have to use both Control keys, or use two hands if we don’t want to do that.

    Conclusion

    Finding the optimal combination is very much a work in progress, but version 3 has already come a long way. I especially need to find a way to re-balance right-hand and left-hand usage, but that won’t be easy given how we use vowels.


  • On the state of things

    Malagasy culture/history on Malagasy Wikipedia

    Unlike English, French or Chinese, Malagasy is practically unused as a medium of international communication. Those foreigners who have a good command of Malagasy are far more interested in the national culture than the average Joe, or have acquired the language as part of very specific training to eventually spread Christianity in Madagascar. It is therefore more than justified to center our interest on Malagasy culture, all the more so because many upper-class people in Madagascar have been schooled in international schools and thus tend to ignore the very culture of the country they live in, or at least not to give it as much weight as it deserves.

    That might seem very inward-looking in our vision of the sum of human knowledge, but our culture is poorly documented, especially online (and, surprisingly, even more so in its original language), as most of it is passed on through oral transmission. In that regard, there truly is a deadline.

    Popular culture, science and technology, on the other hand, can always be translated from English or French. Doing so correctly requires specialised vocabulary which the average Rakoto doesn’t always have, and written resources to make up for that lack of knowledge are rare.

    Malagasy on Wiktionary

    Following a discussion on Wikimedia Metawiki (which is basically a wiki to talk about other wikis), the Malagasy Wiktionary was targeted by a so-called “small-wiki audit” which aims to assess, as its name implies, the quality of the content in a small wiki. What is meant by “small” here is the community. I used to be the only contributor there, and made very extensive use of bots to fill the wiki with as much content as possible. I had done so by implementing a parser coupled with a basic machine translation engine.

    The effort was spread over 8 years, and a lot of mistakes were made in the process. The conclusion of the discussion was that all the content, with the exception of already-created Malagasy entries, was going to be deleted. That deletion was mostly complete by 2021.

    In the meantime, NLLB (No Language Left Behind), a machine translation technology from Meta (Facebook) targeted at lesser-documented languages, was published, and I swiftly adopted it to create foreign-language entries, with supervision.

    On Artificial Intelligence

    Since the end of 2022, generative AI has been the hottest topic in the tech world, the biggest since the first iPhone and the smartphone revolution that ensued. The public has been mostly hyped by the impressive capacity of Stable Diffusion to generate images in seconds where a commissioned artist would take days, or by the ability of OpenAI’s ChatGPT to answer users’ questions and compose prose in seconds where poets or writers would also have taken days. ChatGPT has changed the tech world quite a lot since 2022, when it was made publicly available.

    Generative AI powers an ever-increasing range of apps and aims to commoditize drawings, paintings, and images by taking in a prompt and outputting an image. In addition to its closed-source nature, the sheer size of the model (176 billion parameters) makes it impractical to run on commodity hardware.

    Unfortunately, I have the impression that ChatGPT has been dumbed down by an ever-increasing number of rules on sensitive topics it’s not allowed to give a decisive answer on. Following this, I’ve been paying more attention to its smaller competitors like LLaMA and other models that I won’t name here, which can run on a beefed-up laptop (for what it’s worth, the one I’m currently using has 48 GB of RAM) and without limitations on controversial topics.

    It is said that the human brain has between 150,000 and 300,000 billion synapses, and we could need a similar number of “parameters” to achieve something that looks like a whole-brain emulation (WBE). The current GPT-3 is 3 orders of magnitude below that number. Given the current trend, which is to gain 2 orders of magnitude every 2 years, we could bet on WBE being practical by the end of the decade. Fantastic times lie ahead!
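
    The back-of-the-envelope reasoning above can be written down as a short Python sketch. The 175 billion parameter count for GPT-3 is the publicly reported figure and is my assumption here, as is treating "2 orders of magnitude every 2 years" as a steady rate of one order per year.

```python
import math

SYNAPSES_LOW = 150_000e9   # lower estimate quoted above
SYNAPSES_HIGH = 300_000e9  # upper estimate quoted above
GPT3_PARAMS = 175e9        # assumption: publicly reported GPT-3 size


def orders_of_magnitude_gap(target: float, current: float) -> float:
    """How many powers of ten separate the current size from the target."""
    return math.log10(target / current)


def years_to_close(gap: float, orders_per_year: float = 1.0) -> float:
    """Years needed to close the gap at a steady growth rate (assumed constant)."""
    return gap / orders_per_year


gap_low = orders_of_magnitude_gap(SYNAPSES_LOW, GPT3_PARAMS)
gap_high = orders_of_magnitude_gap(SYNAPSES_HIGH, GPT3_PARAMS)
print(f"gap: {gap_low:.1f} to {gap_high:.1f} orders of magnitude")
print(f"years at 1 order/year: {years_to_close(gap_low):.1f} to {years_to_close(gap_high):.1f}")
```

    Whether parameter counts keep growing at that pace is of course the big unknown; the sketch only checks that the numbers hang together.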

  • Using GPT-2 for Malagasy

    Long ago I became interested in natural language processing. From 2010 until 2014 I had been actively developing various programs to increase content coverage of the Malagasy Wiktionary. The result now is 5.9 million words in 4,100 languages.

    From 2014 to this day, I have been researching ways to improve the quality of translations provided by the bot. In 2019, OpenAI released GPT-2, a language model able to generate news-like articles. Those generated articles were so believable that OpenAI refrained from releasing the full model until the end of 2019, as there were fears that fine-tuning the full model could lead to fake news or dangerous propaganda being published en masse. As a result, it was only released once detection techniques were accurate enough to tell generated and non-generated articles apart.

    Once the full model was released, I began fine-tuning it on Malagasy language text. The target was to generate news-like articles from the existing corpus scraped from 4 major news websites, resulting in 49 MB of training data. In comparison, the English language model was trained using 40 GB of data.

    Scraping Malagasy language sources

    On the internet, data sources and diversity for Malagasy are relatively scarce compared to English or any other European language. The main reason for that is that most Malagasy sites use French as their publishing language. As a consequence, the sources used were daily newspapers such as NewsMada, Madagascar Tribune, Aoraha and la Gazette de la Grande Ile. It is worth noting that two of these newspapers are bilingual, so articles had to be filtered.

    Filtering out French articles

    The next task was to detect and remove French language articles since we are training the model to generate Malagasy and not French.

    How?

    Since both languages use the Latin alphabet, using Unicode to our advantage won’t do the job. Language detection using machine learning, while attractive, is clearly overkill and would further divert us from our goal.

    Instead, to keep things simple, I relied on the single biggest difference between written Malagasy and French. Our version of the Latin alphabet rules out the letters C, Q, U, W and X, as well as accented characters like É or È. In other words, no native Malagasy word contains any of these.

    I also fetched all French words and inflections to be spot on every single time. And in less than 100 lines, I could filter out anything French.
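
    To give an idea of what such a filter can look like, here is a minimal Python sketch of the letter-based test only. It is not the actual script: the function names, the list of accented letters and the 20% threshold are assumptions for illustration, and the French word list mentioned above is left out.

```python
import re

# Letters that native Malagasy words never use, plus a few French accented
# letters. The accent list is an assumption; extend it as needed.
NON_MALAGASY = set("cquwx" + "éèêëçùû")


def foreign_ratio(text: str) -> float:
    """Share of words containing at least one letter absent from native Malagasy."""
    words = re.findall(r"[^\W\d_]+", text.lower())  # runs of letters only
    if not words:
        return 0.0
    flagged = sum(1 for word in words if NON_MALAGASY & set(word))
    return flagged / len(words)


def looks_french(text: str, threshold: float = 0.2) -> bool:
    """Flag an article as French when enough of its words look non-Malagasy.

    The threshold is a hypothetical value; it leaves room for the odd
    loanword or proper noun in a genuine Malagasy article.
    """
    return foreign_ratio(text) >= threshold


print(looks_french("Le conseil des ministres a adopté un nouveau décret"))
print(looks_french("Nandray ny asa famonoana ho faty ny zandary"))
```

    Working on a ratio rather than on a single forbidden letter keeps Malagasy articles that quote a French name or two from being thrown away.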

    Using GPT-2

    As expected, training takes time and space. Lots of it. Each model checkpoint takes 1.3 GB and is saved to disk every 50 iterations. At 21,000 iterations, further progress seems hard, but this is what the model can generate (the article below does not exist):

    ANTSIRABE: SARONA TANTERAKA NY FITAFIANA MPANAO SINTO-MAHERY | NEWSMADA

    Par Taratra sur 08/12/2019

    Nandray ny asa famonoana ho faty ny zandary nandray anjara tamin’ny fanafihana nitafiana mpanao
    sy toeram-piantsonan’ny taxi-be nandritra ny fanarahan-dia, tao amin’ny kaompania Ambositra,
    faran’ny herinandro teo, ka nanao ny fanarahan-dia.

    Tsiahivina fa efa nisy ny nahafantarana fa nanafika mpandraharaha an’ilay mpandraharaha ny
    tao Andranohazo Antsirabe. Raikitra ny fitifirana ka vokatry ny fanarahan-dia avy hatrany ity mpandraharaha
    ity. Tsy fantatra mazava hatrany na ny sasany aza tambajotran-javatra malemy na koa raha tsy izany
    mitohy na miaro ny kolikoly rehetra na mpanao sinto-mahery na manana ny anton-diany

    Conclusions

    Should we go further with our model, we would end up creating a “thismalagasynewsarticledoesnotexist.com” website to host them all. The source code is available on GitHub; the news articles used as training data cannot be made public for copyright reasons.

    Another use for a good-enough model would be to illustrate the Malagasy Wiktionary with unique examples for word usage.

  • Finding an optimised keyboard layout for Malagasy

    In the 21st century, people type. They type a lot.

    Office workers and the Jane Does and John Does from all over the world, speaking various languages, type on electronic keyboards. An average typist types 30-40 words per minute, depending mostly on their typing language and the layout they use. The best typists can achieve speeds up to 100 words per minute.

    The current keyboard layout in use by most Malagasy language speakers puts whoever wants to write in Malagasy at a huge disadvantage. It is impossible to write quickly in their language without straining their hand muscles. A typical Malagasy sentence is quite often longer than a French one due to word length. Depending on the text sample, it may vary from 7% longer (compare the first 10 verses of Chapter 1 of the Gospel of John) to 20% longer for more complex texts. A text that required 10 hours to be written in French will easily take 11 to 14 hours in Malagasy. At the scale of a company, or even a country, that is a huge waste of time, mostly due to a legacy that has lost all its relevance, as keyboards do not have the same constraints as typewriters.
    To tell you my story: since I got my Samsung tablet, I’ve almost never used the default Samsung keyboard. So what did I write my text messages with? I’m using my own keyboard layout; I’ll show you why and how.

    A quick review on Malagasy uses

    Before I get to the point, let’s see what my fellow Malagasy citizens type their Malagasy language text with:

    azerty.jpg
    Fig. 1: AZERTY keyboard, made by French as an imitation of the American QWERTY

    This, ladies and gentlemen, is the layout that is currently used and known by most of the 24 million people in Madagascar. Needless to say, their fellow citizens who have emigrated to France also use it.
    The problem is that this layout is not suitable for Malagasy. At all.

    heatkey.jpg
    Fig.2: Heat map on an AZERTY keyboard used to type in Malagasy.

    The heat map above has been generated using the Malagasy version of the Rainilaiarivony Wikipedia article. As a Wikimedia contributor, I’ve had the pleasure of typing it… using the AZERTY keyboard. It was a real pain, and it felt like putting in a lot of effort only to end up with less than the English version from which I had been translating.

    azerty1.jpg
    Fig. 3: In an AZERTY keyboard, when typing in Malagasy, your left pinky travels A LOT

    That is also felt by my fellow citizens, a lot of whom have picked up bad writing habits such as SMS-style abbreviations. That habit is sometimes taken to a new level, so that an inexperienced reader may find it difficult or even impossible to read a text written in that SMS style.
    Even though most people browse the Web in French or English far more often than in Malagasy, using the QWERTY/AZERTY layouts is a pain, even if this is all we have, and even if this is what most people will ever know. Even if it’ll never have the success of the traditional layouts, I’ll give my two cents for a layout optimised for the Malagasy language.

    Solutions

    To make up for this strong typing-speed disadvantage, I’d been using the German Neo keyboard layout. This was already a good alternative to the QWERTY which I’d been using for 4 years, but it was still sub-optimal, as my left pinky rests above a letter that is never used in Malagasy, my mother tongue.

    neo
    Fig. 4: German Neo Layout (see: neo-layout.org)

    While looking for a solution to my problem, I discovered patorjk.com. From a given text, this website basically calculates which keys are hit most while the text is typed. From those keys’ positions, a rating is given. One third of that rating comes from the distance your fingers have moved, one third from how you use your fingers, and one third from how often you have to switch fingers and hands while typing. The higher the rating, the less your hands have to travel to type the text; so mechanically you’d be less tired typing the text on an optimal keyboard than on a standardised one.
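
    As a rough illustration of that three-way weighting (and not the site’s actual code, which I haven’t reproduced), the overall rating can be thought of as a plain average of three sub-scores. The function name and the 0-100 scale are assumptions made for this sketch.

```python
def combined_rating(distance: float, finger_use: float, switching: float) -> float:
    """Combine three sub-scores into one rating, each counting for one third.

    Every sub-score is assumed to be on a 0-100 scale where higher is better:
    distance   -- how little the fingers travel
    finger_use -- how well the work is spread over the fingers
    switching  -- how well typing alternates between fingers and hands
    """
    for value in (distance, finger_use, switching):
        if not 0.0 <= value <= 100.0:
            raise ValueError("sub-scores must lie between 0 and 100")
    return (distance + finger_use + switching) / 3.0


# A layout that is average on distance but alternates hands well.
print(combined_rating(60.0, 75.0, 90.0))
```

    With equal weights, a layout cannot win on short finger travel alone; it also has to spread the work across fingers and hands.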
    So for our Rainilaiarivony text, here are the ratings for the keyboards:

    rating
    Fig. 5: Layout ratings

    The loser here is clearly the AZERTY, used by most of my fellow citizens. The standardised Dvoraks are good candidates for typing Malagasy, and maybe we should consider those keyboards since they are widely supported in modern operating systems.
    Here is what the programmer Dvorak looks like:

    dvorak.jpg
    Fig. 6: Programmer Dvorak Keyboard

    Setting the Malagasy Optimised Layout

    First version (7 November 2016)

    The Dvorak score was impressive at first sight, but the Dvorak was not the optimal layout for Malagasy. The one which the algorithm had found optimal was the following:

    malagasy1.jpg
    Fig.7: Algorithmically generated Layout from patorjk.com (some keys’ positions have been frozen for more practicality)

    That layout looks pretty decent, but the keys are arranged in a slightly messy way. On the basis of that keyboard, the German Neo and the arrangement of a bunch of standard ergonomic keyboards, I’ve come up with the following layout:

    malagasy
    Fig. 8: Own-made keyboard (the Malagasy Keyboard)

    I’ve rerun the analysis on the same Rainilaiarivony article on that keyboard and a couple others. Here are the ratings:

    ratings2
    Fig. 9: Ratings of the Malagasy keyboard layout on the basis of the Malagasy version of the Rainilaiarivony article

    Well, to say the least, it looks like I’ve done better than what the algorithm managed to find. I’m pretty sure the layout I’ve designed is not very far from the perfect Malagasy-optimised Dvorak. Let’s go further into the report and see the row usage comparison.

    row-usage.jpg
    Fig. 10: Row usage comparison.

    Yes, the AZERTY is an absolute typist horror when it comes to Malagasy.
    The home row usage rate of our Malagasy keyboard is not very far from that of the optimal/personalised layout generated by the algorithm.

    Second version (13 November 2016)

    ergo1
    Fig. 11: Hot keys on the second attempt.

    Well, after a few days of testing the keyboard layout from the first attempt, I felt some re-tuning of the optimised keyboard was mandatory. That implied moving some keys to get the hot ones (the ones I have to hit most to type my text) right under my index and my right middle finger. Since the left fingers almost always type vowels, I’ve made them stay on the home row as much as possible, unless you want to type some foreign words, in which case you’ll have some gymnastics to do.

    ergo3.jpg
    Fig.12: Finger usage of various keyboards.

    As shown in fig. 12, the total number of hits in the Rainilaiarivony article is distributed as such: ~53% for the left hand and ~47% for the right hand. This excludes the thumb hitting the spacebar.

    ergo4.jpg
    Fig.13: Second attempt’s rating.

    We’re getting better. Though the article is the same, I’ve switched to selecting the article from its HTML form. Since working on the same article over and over again may introduce some bias, I’ve tried using some text samples from the Sarasara Tsy Ambaka.
    I took quite a huge text sample (containing ~260,000 characters). It took a while to process, but it removes much of the bias related to the Rainilaiarivony article. The results still make our Malagasy optimised keyboard the best layout ever to exist for the Malagasy language (cf. figure 14).

    ergo5.jpg
    Fig. 14: Layout ratings comparison.

    I have to note that the calculated optimised layout gets closer and closer to the one I’ve designed, at least for the home row. Have a look:

    ergo6.jpg
    Fig. 15: The calculated layout. Looks a bit familiar, right?

    As of this second version, we have a fairly optimised layout for the Malagasy language, i.e. you’ll gradually type faster as your hand muscles get used to the new layout. Even for typing other languages such as French, this layout surpasses the AZERTY, as the latter had initially been designed to avoid the jamming of typewriters.

    My conclusions

    I can never say it enough: the AZERTY keyboard is the absolute worst keyboard to type Malagasy with. Even the QWERTY does better. The Dvorak is a pretty good candidate for a widespread “more ergonomic” layout due to its presence in all modern widespread operating systems, but there is better.
    Even if the French have designed the BÉPO layout for their language, it has failed to replace the omnipresent, inherited and slow AZERTY layout. There is only one person I know who uses it on a daily basis. We also have to add the fact that BÉPO has been around since 2008, while the Klavie Malagasy (“Malagasy Keyboard”) has only been written about just now, on 7 November 2016. As heavy as it is, the legacy left by AZERTY is highly likely to live on in Madagascar, probably for decades, as long as keyboard typing exists, even though we know full well that the AZERTY layout is totally unsuitable for writing French, let alone Malagasy.
    Right now I’m typing this article in English on a QWERTY keyboard. I’m planning to translate it to Malagasy as it gets more complete in order to reach more of the target audience.
    I’ve already implemented that layout on my tablet so I’ve got all the time I need to adapt my fingers from the old Neo layout to the new Klavie Malagasy.

    Updates

    v2.1 as of 19 December 2017

    Attached a PDF file containing the test corpus. A slightly better version has been proposed in the comments (thanks Ian!); even though it has a lower score than v2.0, it has the really awesome idea of putting the T on the home row.
    To better track all the changes, the project now has its own repository on GitHub. Long live open source!

    Resources

  • Five ways to enrich Wiktionary

    Since 2010, I’ve been contributing to the Malagasy Wiktionary.
    It has become a habit now: every month, every week, every day, and almost every morning and evening, I turn on the web browser to check what’s going on on Wiktionary, and what I can do to add further content.
    Some days, I get so interested in adding some piece of information that I feel like writing a program to add it within the next few hours.
    And some days, I don’t feel like contributing, and then I just look at the recent changes to check whether pages have been vandalised in my absence, or whether some pages have been fixed by other users.
    Still there are several ways to contribute to Wiktionary. Here are five of them:
    (1) Write pages manually. This is the most basic yet most tedious work to do. This is how everyone starts, and this is how most of us will probably contribute for the next 30 years. In 2045, Wiktionary or even Wikipedia in its current form will probably become obsolete or be self-editing.
    Before this happens, you’ve got to put in a lot of work. Still, you can increase your efficiency by learning to write code, then:
    (2) Write a program that writes pages that you may need to fix. Simple: for the last three years, I’ve been concentrating on how to do this. But as time passes a lot of pages get created, and even with a low error rate, you end up with thousands of pages of potentially wrong information. OK, but you also end up with even more pages with correct information. Coupled with a synonym dictionary and advanced NLP, you can have it write definitions of words that can’t be translated directly into the target language.
    (3) Write a program that reads newspapers to find the words to be created. With a very complete dictionary, it gets difficult to find missing words. You won’t have the will to read dozens of newspaper articles every day, so have a program read them and find all the missing words for you. After that, write a program to detect all compound words and add them to the Wiktionary if you feel like it. The next level of this kind of program would be an almost-real-time word scraper which analyses text streams from e.g. Twitter and lists all missing words at the end of the day.
    Learning to code is one thing; knowing which pieces of information to add is another. Whenever you have an idea, or interesting lexicographic datasets in front of you, get coding and add those bits of information to the Wiktionary. Do so in compliance with copyright laws.
    (4) Navigate through dictionaries and add exotic words. Passionate about word etymology? Are you learning a language? Do the words not exist in Wiktionary? Feel free to add them. Always do so in compliance with copyright laws. Compiling several dictionaries and definitions may count as original work, but never copy word definitions verbatim. I did this once and almost got sued following a complaint from a copyright owner. If you feel you’re good enough at AI and NLP, write a program to reformulate and translate the sentences.
    Code is strong, code is powerful. It takes a lot of time to write good code. It takes a lot of time to become good at coding, and not everyone feels like learning it. So what to do?
    (5) Contribute to your native language Wiktionary. English aside, Wiktionary is written in 170 different languages. A huge number of them have fewer than 100,000 pages. Malagasy, my native tongue, has 3.75 million, solely thanks to my efforts in trying to create the biggest Malagasy dictionary that has ever existed. If your native language is English, get interested in other languages and add new words in them, be it on the English Wiktionary or elsewhere. What, you are not passionate about languages? Then add obscure English slang terms.
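    The idea in (3) can be sketched in a few lines of Python. This is a minimal illustration, not the program I actually run: the function name `find_missing_words` is hypothetical, the article text is assumed to be already downloaded as a plain string, and the set of existing entries is assumed to be available locally (for instance from a list of page titles).

```python
import re


def find_missing_words(text, known_words):
    """Return words from the text that are absent from known_words.

    Words are lowercased, and each missing word is reported once,
    in order of first appearance.
    """
    # A word is a run of letters, optionally joined by apostrophes or hyphens,
    # so that forms such as "isan-jaton’ny" stay in one piece.
    tokens = re.findall(r"[^\W\d_]+(?:['’-][^\W\d_]+)*", text.lower())
    known = {word.lower() for word in known_words}
    seen = set()
    missing = []
    for token in tokens:
        if token not in known and token not in seen:
            seen.add(token)
            missing.append(token)
    return missing


# Illustrative only: a tiny "dictionary" and one sentence of text.
existing_entries = {"ny", "ao", "dia", "mponina"}
article = "Ny mponina ao Madagasikara dia tombanana"
candidates = find_missing_words(article, existing_entries)
```

    The resulting list still needs a human eye: typos, proper nouns and foreign words will show up next to genuinely missing entries.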

  • Google translate now available in Malagasy

    Good news, if it can be said, for my fellow Malagasy citizens: since 6th December 2014, Google Translate has allowed them to see almost any web page in their mother tongue, in addition to 89 other languages. Many people, myself included, had been waiting for this moment, which was bound to come sooner or later. First of all, I would like to say a big thank you to all the people who made this possible. Thanks to you, the Malagasy language is becoming further integrated into the polyglot Web. You’ve also given the 15 million monolinguals a chance to get an approximate understanding of what other people have written in other languages.

    Before Google Translate

    Before we got Google Translate to translate almost anything into our language, including curse words, several websites helped us Malagasy, and other language enthusiasts, to write corpora properly in our mother tongue: many of us have already heard of Freelang, tenymalagasy.org and so on. The only drawback of these websites is that they do not work in a collaborative way: they are not «crowdsourced». Wikibolana is a crowdsourced Malagasy-language dictionary, but so far I have been the one who has generated most of its content.

    Is it really that good?

    Well, let’s be honest: no machine translation system has ever had absolute accuracy as its motto. But for a brand new language on Google Translate, Malagasy is… quite good. Daring to translate a language with a syntax as unusual as Malagasy’s is already a huge challenge, and one worth accepting. At first sight, idiomatic sentences and expressions are fairly well handled. Still, when it comes to very complex sentences, it is a mess: verbs are in the wrong place, which either gives the sentence a completely different meaning or makes it look incomplete. There are also some failures, like the one in the screenshot below.

    GTfail
    “ahave” does not mean anything in Malagasy. But this is not the opinion of Google Translate

    Let’s see an example of a translation of a paragraph of the article Madagascar in the English Wikipedia:

    Original in English:
    In 2012, the population of Madagascar was estimated at just over 22 million, 90 percent of whom live on less than two dollars per day. Malagasy and French are both official languages of the state. […] The island’s elephant birds, a family of endemic giant ratites, went extinct in 17th century or earlier, most probably due to human hunting of adult birds and poaching of their large eggs for food.

    Google-translated in Malagasy (as of December 2014):
    Tamin’ny 2012, ny mponina ao Madagasikara dia tombanana ho 22 tapitrisa mahery kely, 90 isan-jaton’ny izay [no] miaina amin’ny [vola] latsaky ny roa dolara isan’andro. Malagasy sy Frantsay dia samy fiteny ofisialy ao amin’ny fanjakana. […] Ny nosy vorona ny elefanta, ny fianakaviana ny fizahantany ratites goavana, dia efa lany tamingana tamin’ny taonjato faha-17, na teo aloha, indrindra noho ny olona angamba ny olon-dehibe ny fihazana sy ny vorona lehibe Fihazana ny atodiny ho sakafo.

    The green-coloured sentences are syntactically correct without correction. The first one required the red words in square brackets to sound correct. The third one hurt my brain: “The elephants are a bird island, the family of big tourists, have gone extinct in 17th century, or before, perhaps because of people, adults, hunting and adult birds who have their eggs hunted for food.” It hurt to understand, and it also hurt to back-translate. Astonishingly, a round-trip translation gave a correct sentence in English, so please always have your translations checked by human translators.

    Efforts to be continued

    One can help improve translation accuracy by translating articles with the Google Translator Toolkit, or by using and correcting the translations provided by Google Translate itself.

  • Switching to Linux: good or bad choice?

    Last updated on July 13, 2014
    Do you want to switch to Linux? Before doing so, I invite you to consider all the consequences of switching to another operating system.
     
    Linux? What is that?
    But in the first place, what is Linux? It is the kernel of the GNU/Linux operating system. To be frank with you, «Linux» is a generic name for a few dozen distributions that have one thing in common: the Linux kernel. What is a kernel? It is the software that manages your hardware (motherboard, CPU, hard disk, networking, etc.) so that it works with the applications you use. The current Microsoft Windows kernel is NT. In the past, Windows ran on top of MS-DOS, from Windows 1 up to Windows ME. I could write about this at length, but then we’d be off-topic.
    So, Linux is an operating system competing with Windows. Note that the desktop computer market is the «final frontier» for Linux: most desktop computers nowadays come with Microsoft Windows pre-installed.
    Because they use different kernels, Windows software will not run natively on Linux. There’s still a (poor) workaround for this problem, but I’ll talk about it later. This is also a blessing, because Windows viruses can’t run on Linux either.
    I’m not saying Linux is totally free of viruses (people have already created viruses that have successfully infected Linux systems), but with the right reflexes you’ll avoid most problems. The most basic tip is to never run a Linux-based system as the root user unless you know exactly what you’re doing. You can still run tasks requiring root privileges by entering your own user password, but this will mostly happen when you install programmes.
     
    Linux is Free
    First of all, Linux distributions can be used legally and free of charge by anyone. This means you don’t need to install an «anti-product activation» thing picked from a weird site to use your operating system at will. That practice, common among Windows users, is not only illegal but can also compromise your security by letting weird software from a weird site dig «holes» (backdoors) in your firewall. For people who like computer DIY, Linux is also open-source, developed by a community of thousands of programmers and code reviewers. Have you found a bug in the software? You have the freedom to patch it and share your patch with other people. Yes, the Linux licence allows this.
    You also have a vast array of choices regarding distributions (commonly known as «distros»).
    Linux distros are all built to do things in a certain way, so you have to think about what you’ll be doing with the OS, and then you download the distro that fits your needs. It is not like Windows, where you first install your OS, and then figure out what you need.
    All distros (eleven) have their own software repository and desktop environment (DE), but they all have something in common: the Linux kernel, hence the generic name. As of May 2014, the most recent kernel version is 3.14, released two months ago.
    What distinguishes each distro at first sight is its desktop environment, then its default software. Ubuntu itself has six official variants (Edubuntu, Kubuntu, Mythbuntu, Ubuntu Studio, Xubuntu, Lubuntu). You choose your DE depending on your taste: Unity has a very «modern» appearance; KDE is a very flexible desktop that can look almost any way you want (you can even rotate icons on the desktop!); LXDE offers a lightweight DE, as does XFCE. Updates are done through an update manager. Also, most distros issue a new version every year.
     
    The switch
    So you’ve finally decided to switch. Your CD is burnt (or your USB key is configured), and you are about to shut down your PC. Please don’t do it yet, as there are some matters to think about: do you use specific software for your videos? Do you play games? Do you have hardware whose installation requires a driver supplied on a CD?
    To answer these questions, you’ll have to do some research on the Web. If you use common software, then you are likely to find a free and/or open-source equivalent on some distro. If you use something like AutoCAD or Photoshop, then you’ll still find a «free» equivalent on Linux, but it won’t always be as powerful. Furthermore, chances are that the Photoshop format will not be compatible with its free equivalents.
    About games, forget about playing Call of Duty, Battlefield or League of Legends on Ubuntu. The Steam Machine is on its way, so gaming will soon be possible and become more and more common on Linux.
    If you cannot part with your Windows software, there’s still a workaround: Wine. This piece of software allows you to run simple Windows programmes on Linux. It is not guaranteed that everything will work with it, but still, it’s better than nothing. If you depend on Windows-only software to do your business, I advise you to dual-boot your computer. Then you’ll have both a Windows OS to run your software and a Linux distro to do your other things. Note that Windows files can be accessed easily from Linux, whereas the opposite requires you to download software and manually mount the Linux partitions from it. This is what most people do when switching to Linux, as it avoids the inconvenience of having to back up data to another HDD.
    Did your hardware come with an installation CD? The best way to proceed in this case is to check whether the distro you’re going to install supports it.
    Still, the best way to know whether the distro you’ve chosen suits your hardware is to boot from the CD, which is most of the time a Live CD. Live CDs allow you to test the operating system on your computer without changing a single byte on the hard disk drive, as all required data is loaded into memory. You can then choose to install the OS on your hard drive once you’re satisfied with how it behaves on your computer.
    If you decide to switch, take the time to check that you’ve successfully backed up all your data. We never know if something is going to fail, and having the same data twice is always better than not having the data. If you can’t migrate your data because you don’t have an external HDD, you can still choose to dual-boot your computer, so you’ll still have access to the data stored on the Windows NTFS (or FAT32, or FAT) partition. You can even choose to install your OS on an external drive, if you need all the space on your computer’s HDD for your data. But to boot, do not forget to plug in that drive!
    Usually, installation won’t take a long time. To install Kubuntu 12.04, I only needed 50 minutes to format the entire disk (500 GB) and get the PC ready for work.
     
    My personal story
    Because I got fed up with the inefficiency of my (free) anti-virus programme and with Trojans, key-loggers and root-kits compromising personal data security (my credit card number somehow leaked when I made an online purchase on a well-known financial transaction platform), I decided to make the big switch by changing the OS of my 4-year-old laptop to some Linux distribution.
    Because I care a lot about hardware support and user-friendliness, I decided on Kubuntu 12.04, first because it is a long-term support version (i.e. updates will be provided for this OS for 5 years), and secondly because I am familiar with Ubuntu distros and have had positive experiences with them in terms of hardware support.
    I made the switch a month ago by changing my laptop OS from Win7 to Kubuntu 12.04. The most annoying thing I’ve had to face since the switch is (still) hardware support. If your hardware is a little complex, crap happens quite a lot. Before definitely switching to Kubuntu, I tried Ubuntu (Unity desktop), Mandriva (now OpenMandriva), Mint, Mageia and Debian. The latter three were unable to support my networking hardware, and (perhaps I have deficient research skills, but…) I found no workaround for it. Same problem with my printer: once connected, it refuses to do its job when I ask it to, which is quite frustrating for the average user.
    Once the switch was complete, I noticed that Kubuntu, or at least the 12.04 version, has a serious memory leak: the kded4 process occupies more and more memory as time passes, and after a week of activity it ‘eats’ up to two gigabytes of memory. The PC then gets slower and slower until it is totally unusable, so I’ve had to find a workaround to stop the inflation. The price of this has been the inability to put the PC to sleep, which turns out to be quite impractical, especially when you are working outside without an accessible plug to keep your laptop charged.
    Even if Ubuntu supports all the laptop’s hardware fairly well, some hardware problems still arise when you don’t expect them: I wanted to set up an ad-hoc connection to a friend’s laptop, but Kubuntu prevented me from doing it because of kernel bugs. Also, a friend of mine had Ubuntu 12.10, and I was really astonished to see such an unstable Ubuntu version: random errors popped up every 10 minutes! I finally advised him to install another version.
    Despite the lack of hardware support, switching to some Linux distribution is something great, especially when your hardware can’t support the latest Windows version. It is also a good choice for people who don’t want to invest tens of euros (or dollars) in an anti-virus solution.
    Useful links

  • African language Wikimedia projects summary

    A few months ago I wrote an article which summarises my history on the Malagasy Wiktionary, and more generally my history on Malagasy language Wikimedia projects.
    I am back here to write a short summary recapitulating the current progression of African language WMF projects. In this article you’ll learn about the current stage of African language projects and their trend.
    In terms of community size, the biggest African-language community is the Afrikaans language Wikipedia community; followed by Egyptian Arabic speaking community and Swahili speaking community.
    If we look closer at the statistics, the award goes to the Afrikaans language Wikipedia community, which has 7 to 8 very active contributors (performing more than 100 edits per month).
    The Egyptian Arabic Wikipedia community counts 2-3 very active contributors, which is big for an African language but very small compared to the Standard Arabic community, which counts more than twenty times as many (83 very active users in June 2013), most of them being Egyptian contributors.
    For Swahili, the number of very active users is one to two. Over a 2-year period, this number averages 1. But the number of active users (i.e. making more than 5 edits per month) is 9 on average, which is a fine thing for a language spoken in countries where internet access is quite hard.
    These numbers were averaged from July 2011 to June 2013, which smooths out short-term variations.
    In terms of raw article count, the biggest African language Wikimedia project is the Malagasy Wiktionary (which currently counts 2.5 million articles, only smaller than English and bigger than French!), followed by the Malagasy Wikipedia (40,000+ articles) and the Yoruba Wikipedia (30,000+ articles), then the Afrikaans and the Swahili language Wikipedias (respectively 27,000+ and 25,000+ articles).
    The Malagasy Wiktionary became very big for reasons you can read here; the Malagasy Wikipedia is big thanks to geography articles (~20,000 articles) and celestial objects (~8,000 articles); the Yoruba Wikipedia is made big by articles about people and also celestial objects (~15,000 objects).
    Many Wikimedians who consult the statistics should know that the number of content pages does not determine the quality or the comprehensiveness of an encyclopedia. Judging wikis by article count is like judging a book by its cover. And many book readers and critics know that looking at the cover is not enough to judge a novel. Here, by raw size, the Malagasy language dominates in the two biggest projects (Wikipedia and Wiktionary), but that doesn’t mean it has a very active community.
    To judge the quality, comprehensiveness and completeness of the articles of such wikis, it is better to dive into this kind of statistics, where scores are given by the absence/presence of vital articles and the size (number of characters) of such articles (if they exist). Statistics of that kind are better than article count and page depth, which can be inflated by the use of bots and the generation of tons of non-article pages (talk pages, subpages, redirects…).
    According to the List of Wikipedias by sample of articles, the best scored African language Wikipedia is the Afrikaans Wikipedia, which ranks 58th, then the Swahili Wikipedia (79th), followed by the Egyptian Arabic, Yoruba and Somali Wikipedias. The Malagasy Wikipedia is quite far behind and ranks 155th, which is only higher than the Lingala (161st), Wolof (175th) and Shona (187th) Wikipedias, which have fewer than 5,000 articles. This means article count is only the cover of the book, and some effort has to be made to make the Malagasy Wikipedia more comprehensive.
    What about the trend?
    Less than a year ago, some Wikipedias found a way to grow in number of articles thanks to species databases. The first ones I saw grow this way are the Winaray and Cebuano Wikipedias. The Winaray Wikipedia gained 100,000 articles primarily thanks to low quality geography stubs (consisting of one or two sentences), and secondarily thanks to articles about animal and plant species, bringing it to 510,000 articles. Cebuano has increased more than tenfold in article count within the last 50 weeks, from 40,000 to more than 500,000 articles. This mania for creating articles about species has spread to the Swedish and Dutch Wikipedias. The Dutch Wikipedia has recently surpassed the German Wikipedia, and in response the latter seems to have boycotted the Dutch Wikipedia by deleting the link to it from the German language Wikipedia main page.
    Now let’s write about the growth trend of African language Wikimedia projects. First off, let’s talk about Wikipedias, then Wiktionaries and finally other «minor» Wikimedia projects.

    Wikipedia language edition    Current article count    Growth (in 300 days) (1)
    Malagasy                      40,619                   +2,415
    Yoruba                        30,624                   +582
    Afrikaans                     27,801                   +3,928
    Swahili                       25,368                   +1,232
    Amharic                       12,722                   +1,015
    Egyptian Arabic               10,764                   +1,939
    Somali                        2,830                    +383
    Lingala                       2,035                    +118
    Kinyarwanda                   1,816                    +7
    Kabyle                        1,517                    +778
    Wolof                         1,172                    +49
    Kongo                         826                      +135
    Northern Sotho                688
    Igbo                          739                      +44
    Zulu                          586                      +22
    Setswana                      496                      –1
    Bambara                       392                      +6
    Siswati                       368                      +6
    Ewe                           302                      +12
    Hausa                         291                      +17
    Oromo                         276                      +36
    Tigrinya                      259                      +2
    Tsonga                        250                      +7
    Sango                         204                      +17
    Kirundi                       192                      +8
    Sesotho                       189                      +44
    Akan                          179                      +17
    Fulfulde                      166                      +12
    Luganda                       166                      –2
    Twi                           157                      +12
    Chamorro                      157                      +6
    Xhosa                         151                      +10

    (1) Calculated following this site, data retrieved on July 26th, 2013.
    On Wikipedia, growth is slow compared to languages spoken in developed countries, where Internet access is easy and inexpensive for the ordinary citizen. The African language with the biggest community grows at approximately 5,000 articles per year, which is fairly high compared to Swahili, whose growth is roughly three times lower. If the current trend continues, the Afrikaans Wikipedia will surpass the Yoruba language Wikipedia next year, and the Malagasy Wikipedia in the next 2 years, as the two current biggest Wikipedias are stagnating in article growth.
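    The kind of projection made above can be reproduced with a simple linear extrapolation. The sketch below is only an illustration: it assumes each wiki keeps growing at the rate observed over the 300-day window of the table, and the function name `days_to_catch_up` is hypothetical.

```python
def days_to_catch_up(chaser_count, chaser_growth, leader_count, leader_growth, window_days=300):
    """Estimate how many days a smaller wiki needs to reach a bigger one.

    Growth figures are article gains over window_days. Returns 0.0 if the
    chaser is already ahead or level, and None if it is not closing the gap.
    """
    gap = leader_count - chaser_count
    if gap <= 0:
        return 0.0
    # Articles gained on the leader per day.
    relative_rate = (chaser_growth - leader_growth) / window_days
    if relative_rate <= 0:
        return None
    return gap / relative_rate


# Figures taken from the table: Afrikaans (27,801; +3,928) vs Yoruba (30,624; +582).
afrikaans_vs_yoruba = days_to_catch_up(27801, 3928, 30624, 582)
```

    With the table’s figures, Afrikaans would catch up with Yoruba in about 253 days, which matches the “next year” estimate.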
    On smaller Wikipedias, the trend is positive, though slow. All open Wikipedias have more than 100 articles.
    Among Wiktionaries, the biggest is the Malagasy Wiktionary, whose growth is sustained by Bot-Jagwar. Owned by myself, Bot-Jagwar runs from the Cloud, so it works regardless of the health of my computer and my internet connection. Thanks to it, the Malagasy Wiktionary gains 300 to 500 content pages daily. Automation eases many things in many ways, but automated processes can fail. So I have to keep an eye not only on the source code but also on the entries generated by that source code.
    African language Wikipedias are slowly but surely gaining articles as time passes. There seems to be a moratorium on closing African language Wikipedias, and this is fine because languages mainly spoken in developing countries need time to develop a community. Furthermore, the official language in these countries, especially African ones, is very often not the local language.

    Kurzweil Curve showing growth of computing power. It shows that all human brains can be simulated by 2050. What about having billions of “virtual” contributors on Wikipedia in 2050? Source (kraxinglogic.com)

    The increase in bot-made articles (which nowadays constitute 20% of articles created on Wikipedia) may indicate that in the near future, perhaps in 25 or 30 years, a bot will be able to write articles like humans do. This is because Ray Kurzweil predicts that simulating the human brain will be possible in twelve years, and because the computing power of today’s computers was that of supercomputers in the 1990s.
    What about me? Well, it’s been a while since my last big article on the Malagasy Wikipedia. And according to the list of Wikipedias by sample of articles, several hundred of the articles needed in all Wikipedias are missing, so my first goal for Wikipedia is to fill these gaps, slowly but surely. I prefer contributing about geography, but as I am the only contributor to the wiki, I have to fill gaps a bit everywhere: Biography, Chemistry, Sports, etc. I can barely create three or four articles per day. At that pace, I can complete the list of 1,000 articles that every Wikipedia should have within the year.
    It’s been a while since the last time I blogged in Malagasy, so this article will be followed by a Malagasy language article. Perhaps a translation of this one, perhaps a new one.
    Useful resources
    To read further about what’s mentioned here.

    1. The Law of Accelerating Returns by Kurzweil
    2. http://www.wikistatistics.net for all statistics about Wikimedia projects


  • My story on the Malagasy Wiktionary

    It’s been a while since I posted on this blog. This article is about the mass addition of content to the Malagasy Wiktionary. The purpose of this post is to explain why and how the Malagasy Wiktionary has become so big.
    But first, allow me to introduce myself. My nickname on all Wikimedia projects is Jagwar. I have been a Wikimedia contributor since August 2008, and I am going to be 20 years old soon. I speak Malagasy as my mother tongue, French as a second language and English as a foreign language (soon my third language, since it is not quite perfect yet…).

    When I discovered the Malagasy language version completely by chance, the wiki was virtually dead, with no one adding interesting content, and an active community mainly made up of non-native speakers. Without any knowledge of the rules of the wiki, and with almost no knowledge of how to write Malagasy correctly, I began an article. It grew to 20,000 characters, making it the biggest page of the wiki at that time. But unfortunately (or fortunately, for the sake of readers), a non-native speaker administrator spotted the article’s lack of notability, which led to its deletion.
    I could have left the wiki, as tens of hours of work had literally vanished from it… But I didn’t. I still cannot figure out why, but deep in my mind, a little voice told me to continue contributing. At that time, the Malagasy Wikipedia counted 550 articles, maybe fewer, but not more.
    So I continued this way for a while. To help me in my task, I wrote to potential volunteers. These people didn’t see the point of contributing to a wiki in their mother tongue: some were unable to spell Malagasy words correctly, or didn’t have enough time to do good work, while others required money to start contributing (times are hard in Madagascar, I know), and even with money, I am not sure they would have stayed long once the money was paid.
    In October 2008, I discovered the Malagasy Wiktionary. At the beginning I actually didn’t know what to do there, so I continued to work on the Malagasy Wikipedia just to become more skilled and used to writing Malagasy.

    In July 2009, I was on vacation in my fatherland: Madagascar. I took this occasion to learn the written Malagasy language more deeply, though my means were quite limited: reading newspapers and the Bible (I am Christian), and following news broadcasts on TV as well as on the radio… I almost forgot French (!), though it was present almost everywhere as the second official language.
    Back in France, I decided to encourage potential volunteers who were able to write to contribute to the Malagasy language Wikimedia projects. But you know, Madagascar was in crisis and people sometimes asked for money to contribute; others blamed me for my spelling mistakes, and others simply ignored the request. I had less and less time to dedicate to the projects and I had no money to give this way. One day, I decided that I couldn’t wait anymore for someone to arrive: the progress of my skills in Malagasy and in programming languages, and the promise of a very busy future (implying a chronic lack of time), mentally forced me to do something for my mother tongue, even a tiny little thing.

    In 2010, when I could write in my mother tongue without too many spelling mistakes, I started to write bots. Once they were written, I ran them at full speed: fifty thousand edits per day was the pace, the normal pace. At the beginning it was the import of foreign language entries from other wikis, and it consisted mainly of importing verb forms, first through an import form, and later through a script that copy-pasted other wikis’ content pages to the equivalent Malagasy Wiktionary page. I went slowly at the beginning, but I did it more and more often, till the wiki got 200,000 content pages. Regarding these possibly copyright-infringing imports, I received a warning from a user who had almost got his mother tongue wiki closed due to the creation of thousands of useless pages.

    In 2011, I got mad: after discovering the astonishing easiness of Volapük, I wrote a script to upload the word forms of that language. At full speed, i.e. around 50,000 edits per day, three weeks were required to make the Malagasy Wiktionary the third biggest Wiktionary in the world. But months passed, and no one, absolutely no one, contributed: one day, the number of active users on the wiki dropped to two, for a wiki that contained 1.19 million content pages (in comparison, the German Wikipedia, which had a comparable article count, had no fewer than 25,000 active users)!

    In July of the same year, a new script was written. That script made it possible to create translations based on foreign-language entries. With that script, up to 5,000 articles were created, mainly lemma entries. Just a few weeks later, the import of all Malagasy words was completed. But its effect on the article count was not visible because of the mass deletion of the Volapük entries. Why this mass deletion? Because many entries seemed to be wrong, as they were not conjugated verb forms but nouns (-.-‘), so the decision was taken to delete them all and re-create them later, with better quality if possible. Since then, my activity on the Malagasy Wikipedia has been put on hold so that I can dedicate all my wiki time to the renovation of the Malagasy Wiktionary.

    During the summer vacation, I took the time to restructure the Malagasy Wiktionary. The article and category structures were inspired by those of the French Wiktionary: the use of templates for languages and parts of speech allowed Malagasy Wiktionary entries to be categorized automatically. Time passed, and a routine set in.

    One night, I discovered an online Malagasy monolingual dictionary. Having no idea whether the content could be copyrighted (the copyright seemed to apply only to the design), I decided to reuse the content of that dictionary to complete the entries on the Malagasy Wiktionary. The problem arrived just a few weeks later, when I received an email from a Wikimedia Foundation staff member, P. Beaudette. In his email, he asked me about the origin of the Malagasy-language entries; I answered that they came from various bilingual dictionaries and from the online monolingual dictionary… A copyright infringement investigation was conducted, and my bot was blocked during the whole process. At the end of it, the staff member told me to remove the 30,000 entries that infringed the original dictionary’s copyright, which was done.

    After this copyright infringement episode, I decided to reorient my contributions toward adding Malagasy-language content to other wikis. But before that, I did some work on the Fijian and Tagalog Wiktionaries, which was more or less appreciated… In particular, there was an IP address checking my contributions on the Fijian and Tagalog Wiktionaries. This IP user told me to stop mass-adding content in these languages, of which I speak not a word. I stopped working on both wikis a few weeks later, as the work was finished.

    But this mass-adding of content, especially in languages I didn’t speak at all, seemed to annoy people, who decided to discuss the case on a Meta-Wiki forum. No conclusive result came out of it, and things stayed as they were before.

    With most of the hard work removed, and with behaviour that had been reproved by many users, I decided to take a break of indefinite duration. It actually lasted five months, during which I tried to work on my written Malagasy outside Wikimedia projects. The progress of my skills, in spelling as well as in programming, was honourable, allowing me to come back and make the Malagasy Wikimedia projects, and especially the Malagasy Wiktionary, evolve again. In July 2012, I built a new tool that lets me find the non-existing entries/pages on the Malagasy Wiktionary by consulting the daily online newspapers. Only two newspapers are currently supported, because of their use of RSS feeds. But the ability to make the script read websites without RSS support is coming soon.
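To give an idea of how such a tool can work, here is a minimal sketch in Python. It is not my actual script: the function names (`headlines_from_rss`, `missing_entries`) are hypothetical, it only reads the `<title>` of each RSS item, and it assumes the list of existing page titles has already been loaded from somewhere (a dump, for instance).

```python
import re
import xml.etree.ElementTree as ET

# A "word" here is a run of letters, optionally joined by hyphens or apostrophes.
WORD_RE = re.compile(r"[^\W\d_]+(?:[-'][^\W\d_]+)*")


def headlines_from_rss(rss_xml):
    """Return the <title> text of every <item> in an RSS 2.0 document."""
    root = ET.fromstring(rss_xml)
    return [item.findtext("title", default="") for item in root.iter("item")]


def missing_entries(texts, existing_titles):
    """List the words found in `texts` that have no page yet.

    Words are lower-cased, and each missing word is reported only once,
    in order of first appearance.
    """
    existing = {title.lower() for title in existing_titles}
    seen = set()
    missing = []
    for text in texts:
        for word in WORD_RE.findall(text.lower()):
            if word not in existing and word not in seen:
                seen.add(word)
                missing.append(word)
    return missing


if __name__ == "__main__":
    # Made-up feed content, for illustration only.
    feed = (
        "<rss><channel>"
        "<item><title>Vaovao androany</title></item>"
        "<item><title>Ny trano</title></item>"
        "</channel></rss>"
    )
    print(missing_entries(headlines_from_rss(feed), ["ny", "trano"]))
```

A real version would also read the article body and download the feeds itself; the sketch keeps the network part out so that the word-matching logic is easy to check.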

    In September, I developed a new, improved translation retriever that allows the script to get all translations in all languages on a given page (the previous version could only handle one language at a time), which increases the translation harvest almost tenfold. This function is embedded in an XML dump reader that amplifies the efficiency of the script: fast translation retrieval and no need to be connected to the server while processing. Done every month, the dump processing and uploading made the wiki gain more than 100,000 lemmata in a few months. These lemmata may contain translation errors, but the error rate is low enough (<1%) to be negligible. The hardest cases can be resolved by a single check on the source wiki (which is indicated by a template).
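The idea of retrieving every language at once can be sketched as follows. This is only an illustration, not the retriever itself: it assumes that translations are marked with templates of the form `{{t|lang|word}}` or `{{t+|lang|word}}`, as on the English Wiktionary, and the function name `all_translations` is hypothetical. Other wikis use other template names, so the pattern would have to be adapted for each source.

```python
import re
from collections import defaultdict

# Assumption: translations look like {{t|fr|maison}} or {{t+|fr|maison|f}}.
TRANSLATION_RE = re.compile(r"\{\{t\+?\|([a-z-]+)\|([^|}]+)")


def all_translations(wikitext):
    """Map each language code to the translations found in one page's wikitext."""
    found = defaultdict(list)
    for lang, word in TRANSLATION_RE.findall(wikitext):
        word = word.strip()
        if word and word not in found[lang]:
            found[lang].append(word)
    return dict(found)


if __name__ == "__main__":
    page = (
        "* French: {{t+|fr|maison|f}}, {{t|fr|demeure}}\n"
        "* German: {{t|de|Haus|n}}\n"
    )
    print(all_translations(page))
```

Since the function only needs the page text, it can be fed from an XML dump read offline, which is what removes the need for a server connection during processing.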

    In October, I thought about building a bot that completes tasks as scheduled by a parameter file. This is particularly useful for keeping the list of wikis up to date. Currently, the list of wikis on the Malagasy Wiktionary is updated four times a day, i.e. every six hours.
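A parameter-driven scheduler can be very small. The sketch below is an assumption about how one could look, not a description of my bot: the file format (one `task_name interval_in_hours` pair per line, with `#` comments) and the names `parse_schedule`, `due_tasks` and `update_wiki_list` are invented for the example.

```python
def parse_schedule(text):
    """Parse lines of the form 'task_name interval_in_hours'.

    Returns a dict mapping each task name to its interval in seconds.
    Blank lines and '#' comments are ignored.
    """
    schedule = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()
        if not line:
            continue
        name, hours = line.split()
        schedule[name] = float(hours) * 3600
    return schedule


def due_tasks(schedule, last_run, now):
    """Return the tasks whose interval has elapsed since their last run.

    `last_run` maps task names to timestamps in seconds; a task that has
    never run is always due.
    """
    return [
        name
        for name, interval in schedule.items()
        if now - last_run.get(name, float("-inf")) >= interval
    ]


if __name__ == "__main__":
    params = "# hypothetical parameter file\nupdate_wiki_list 6\n"
    schedule = parse_schedule(params)
    # Six hours (21600 s) after the last run, the task is due again.
    print(due_tasks(schedule, {"update_wiki_list": 0}, 21600))
```

With an interval of 6 hours, the task runs four times a day, which matches the update pace mentioned above.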

    At the end of January 2013, I thought about a more efficient use of the translation retriever that I had written a few months earlier. Then came the IRC bot: it retrieves in real time all the edits made on selected wikis and does its best to translate the latest entry into Malagasy, in real time! When it was first developed, it only used the traditional translation retriever, but later, in March, it also gained a basic entry processor that allows the IRC bot to translate entries in foreign languages into Malagasy as well, using the same dictionary. This latest version of the IRC bot is currently in use, and it creates hundreds of entries and content pages on the Malagasy Wiktionary every day. I have no precise idea of the error rate, but I am pretty sure it is less than 5%. The positive side of the bot is its ability to keep pace when several edits are made in a minute; nevertheless, as it needs to be online and connected to the Wikimedia servers, the processing frequency is limited to one page per second. I am thinking about ways to allow the bot to process more pages.
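The one-page-per-second limit can be enforced with a small throttle placed in front of each page request. The class below is a sketch under my own assumptions (the name `Throttle` and its parameters are hypothetical, and the IRC part is left out entirely); the clock and sleep functions can be swapped out, which makes the behaviour easy to test without actually waiting.

```python
import time


class Throttle:
    """Ensure that successive calls to wait() are at least min_interval apart."""

    def __init__(self, min_interval=1.0, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._last = None  # time of the previous call, None before the first one

    def wait(self):
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now


if __name__ == "__main__":
    throttle = Throttle(min_interval=1.0)
    for title in ["trano", "vaovao", "androany"]:  # made-up page titles
        throttle.wait()  # blocks so that pages are handled at most once per second
        print("processing", title)
```

When several edits arrive within the same minute, they can simply be queued and pulled through `wait()` one by one, so the bot keeps up without exceeding the limit.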